Backup vs. archiving: Make the break
Too many companies believe backup and archiving are one and the same; in fact, they're separate processes that can actually improve each other.
Our IT departments need another process like Bill Gates needs to borrow a few bucks. While there's little chance of ever spotting Bill queued up at your local ATM, it's very likely IT can benefit from evaluating existing processes to see if improvements can be made. One area that should command attention--if it hasn't been addressed already--is the separation of backup (data protection and recovery) from archiving (data retention and retrieval). This effort is especially important because external influences such as record-retention and information privacy regulations must be properly balanced with the need for better overall data protection strategies.
Yes, I'm recommending adding yet another process to IT's long, daily to-do list; but, in reality, most companies can achieve this process refinement by splitting backup into two separate initiatives. The reason is simple: Organizations should differentiate between copying data for recovery, and retaining data for future reference and retrieval.
It took a long time to formalize data protection operations. By now, most shops have ingrained procedures to back up data nightly. The biggest issue for data protection is backing up large volumes of data in a short period of time--and there won't be any relief as long as IT continues to protect unchanging data on a regular basis. Archives are still considered part of the backup process because IT is used to retaining historical backups for months or years, typically referring to this data as "corporate archives." Right now, many of you use these archives to meet regulatory requirements or satisfy electronic record-retention programs. The archiving process equates to saving old backups, usually on tape media.
The reasons for keeping archiving separate from backup may not always be evident, largely because the definitions of these terms are often misconstrued. The true definition of archiving was rarely understood or even questioned because it was usually considered something that occurred at the end of data protection operations. It became generally accepted that archives were just copies of historical data created from old backups. Only compliance issues or an electronic discovery request that required delving into the old data would create the need to use an archive. Because this older data was needed relatively infrequently, businesses rarely perceived archives as readily accessible information that could be leveraged in everyday activities.
Where to make the break
The combined process of backup and archiving served IT departments well up until the turn of the 21st century, but a change is needed. Record-retention regulations, such as HIPAA, require organizations to keep certain electronic business records, including e-mail, for specified periods of time. The increased scrutiny of corporate governance due to high-profile incidents involving Enron, ImClone and other major companies has compelled organizations to implement electronic records management programs to deter executive malfeasance, among other inappropriate activities.
As organizations were being required to retain more information, litigators and regulators began to target these formal data repositories, seeking a smoking gun for specific legal or regulatory matters under investigation. Recent changes to the Federal Rules of Civil Procedure (effective December 1, 2006) encourage organizations to produce more evidence in electronic format, creating a renewed urgency for organizations to keep data accessible in case a subpoena arrives.
Because backup and archiving were treated as one, IT departments probably didn't realize just how much of the same data they were copying repeatedly during the backup process. Backup solutions should be used for copying data from a primary storage system (or server) to a tertiary system. They enable IT to protect the primary copy of data from corruption, or to prevent data loss if the primary copy is accidentally or maliciously deleted. As backups get older, they become much more difficult to restore because the data is usually stored in larger data sets such as full monthly backups. Recovering large data sets is a slow and laborious process, so these backups should only be used in a worst-case scenario to recover data necessary to continue normal business operations. Worst-case scenarios are simply not appropriate for data retrieval for business sharing, electronic discovery and other scenarios.
Archived data needs to be searchable and accessible so that specific information and files can be found quickly. The key functions of archiving include setting how long the information should be retained, who may access or alter it, and where it should be kept. All of these functions create data attributes that can be indexed and used for search. The rich index created by archiving applications lets IT find data quickly to respond to an electronic discovery or regulatory inquiry. An archiving application may also automatically identify duplicate content and store only the attributes of the copies, thus saving storage space.
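To make those archive functions concrete, here is a minimal Python sketch, not a description of any particular archiving product. It assumes an in-memory store, and every class, field and parameter name (Archive, ArchiveEntry, retain_days, readers, location) is hypothetical. It records retention, access and location attributes for each item, keeps duplicate content only once by hashing it, and answers a simple attribute search.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ArchiveEntry:
    """Attributes recorded for one archived item (all names are illustrative)."""
    name: str
    content_hash: str
    retain_until: date
    readers: frozenset
    location: str


@dataclass
class Archive:
    blobs: dict = field(default_factory=dict)    # content hash -> bytes, stored once
    entries: list = field(default_factory=list)  # one attribute record per item

    def add(self, name, data, archived_on, retain_days, readers, location):
        digest = hashlib.sha256(data).hexdigest()
        # Duplicate content is detected by its hash; the bytes are kept once
        # and only a new attribute record is added for the copy.
        self.blobs.setdefault(digest, data)
        entry = ArchiveEntry(
            name=name,
            content_hash=digest,
            retain_until=archived_on + timedelta(days=retain_days),
            readers=frozenset(readers),
            location=location,
        )
        self.entries.append(entry)
        return entry

    def search(self, reader, name_contains="", as_of=None):
        """Entries the reader may see, matching by name, still under retention."""
        return [
            e for e in self.entries
            if reader in e.readers
            and name_contains.lower() in e.name.lower()
            and (as_of is None or e.retain_until >= as_of)
        ]
```

A real archiving application would index message bodies and file contents as well as names, and would enforce retention rather than merely record it. The sketch only shows why indexed attributes make a targeted query cheap compared with restoring a full backup set.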
When trying to rationalize splitting the archival and backup processes, organizations can take their lead from electronic discovery trends. On numerous occasions, regulators or litigators have turned up smoking guns within unmanaged, out-of-control backups. The Enterprise Strategy Group estimates that 46% of organizations have experienced an electronic discovery request in the past 12 months; the growing likelihood that a company will find itself on the receiving end of one of these actions has prompted IT departments and in-house counsel to find a better way to quickly locate relevant information. It costs approximately $2,000 to $3,000 to restore a backup tape and make it searchable, whereas an online archive has been indexed and is ready for attorneys to conduct keyword or other queries.
[Sidebar: The benefits of archiving]
Points of intersection
Backup and archiving processes can intersect at two specific points. First, IT should archive inactive data to free up capacity on primary storage and servers, and reduce the amount of data that needs to be backed up regularly from these systems. If the data being protected is old, unchanging or rarely accessed, but still needs to be retained, there's no reason to keep the information on production servers and storage. That data can be archived and moved to lower cost storage where it will still be accessible. This takes the aged data out of recurring backup operations. Organizations can complete backups much faster and save money on tertiary media by archiving. The brute-force alternative is to simply delete old data from primary systems. However, this would put an organization at risk of being out of compliance with regulations and limit the opportunity to leverage the information for other business purposes.
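The first point of intersection can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended policy: the FileInfo record, the split_for_archive and backup_bytes helpers, and the 365-day inactivity threshold are all hypothetical, and a real shop would set the threshold from its own retention and access requirements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    """Hypothetical per-file metadata gathered from a primary system."""
    path: str
    size_bytes: int
    last_modified: datetime
    last_accessed: datetime


def split_for_archive(files, now, inactive_days=365):
    """Separate files that stay on primary storage from archive candidates.

    A file is treated as inactive only if it has been neither modified
    nor accessed within the last `inactive_days` days (an assumed rule).
    """
    cutoff = now - timedelta(days=inactive_days)
    keep, archive = [], []
    for f in files:
        inactive = f.last_modified < cutoff and f.last_accessed < cutoff
        (archive if inactive else keep).append(f)
    return keep, archive


def backup_bytes(files):
    """Total size of the data that recurring backups still have to copy."""
    return sum(f.size_bytes for f in files)
```

Comparing backup_bytes for the full file list with backup_bytes for the keep list gives a rough estimate of how much a recurring backup would shrink once the inactive data moves to the archive tier.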
The second point of intersection involves adding information archive systems to the backup schema for data protection purposes. Efficient archiving mandates that the data doesn't reside anywhere else (because it was moved from primary systems). As such, IT must back up the archive system as part of the backup schema so that archived data is also protected appropriately.
By segregating the archive process from backup, IT will have another process to manage with its own infrastructure and resources. However, it should be easier to rationalize an investment in archiving, especially when an organization is trying to reduce backup windows, comply with regulations and expedite the electronic discovery process. The potential benefits of taking one large, laborious process (backup) and splitting it into two (backup and archiving) should be apparent.