One issue that almost every storage manager faces on an ongoing basis is accommodating and storing an ever-expanding dataset. Because primary storage tends to be expensive and has a finite capacity, most organizations move older data to an archive. This practice frees up space on an organization's primary storage to make room for new data.
On the surface, the concept of archiving data is simple. In practice, it often proves to be quite challenging. Careful planning is required before the first bits of data are ever moved. This article discusses some data archiving best practices.
Identify the data to be archived
The first step is to determine which data should be archived. As a general rule, this means archiving static data that hasn't been modified in a while, perhaps several months. Some organizations start this process by looking at the date on which the data was last accessed.
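As a rough sketch of that first pass, a script can walk a file tree and flag anything whose modification time is older than a cutoff. The six-month threshold and the function name here are illustrative choices, not part of any particular product:

```python
import os
import time

def find_archive_candidates(root, max_age_days=180):
    """Return paths of files not modified within max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    candidates.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return candidates
```

A real implementation would also consult last-accessed times and per-data-type rules, but the basic idea is the same: select by age, then hand the list to the archiving process.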
But a number of other considerations must be taken into account, such as the data type. For example, you'll likely need one archive policy and method for file server data, but a completely different policy and method for SQL Server data. Unfortunately, there's no universal archiving method that handles every data type equally well. Moving file data is straightforward, but you usually can't archive an entire database table outright, because an application likely still depends on that table. Instead, database archiving involves moving old rows out of a production table and into an archive table.
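A minimal illustration of that row-level approach, using SQLite and invented table and column names (`orders`, `orders_archive`, `order_date`), might look like:

```python
import sqlite3

def archive_old_rows(conn, cutoff_date):
    """Move rows older than cutoff_date from orders into orders_archive."""
    cur = conn.cursor()
    # Copy the old rows into the archive table, then remove them from
    # the live table; commit once so both steps land together.
    cur.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
        (cutoff_date,),
    )
    cur.execute("DELETE FROM orders WHERE order_date < ?", (cutoff_date,))
    conn.commit()
```

The production application keeps querying the now-smaller `orders` table, while reporting or e-discovery tools can still reach the historical rows in `orders_archive`.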
Deletion policies and data lifecycle management
Another data archiving best practice is to evaluate your overall data lifecycle management. Suppose you decide to archive data that hasn't been modified or accessed in three years. That decision raises a number of other questions about the data lifecycle. For example, should all data that meets the three-year criterion be archived, or can some types of data simply be deleted instead? Likewise, will data remain in your archives forever, or will it be purged at some point? You need specific plans that address the exact circumstances under which data should be archived, as well as what will eventually happen to archived data. Many companies assume that having an archiving policy means they have a deletion policy; they eventually wind up wishing they had spelled out exactly when data will be archived and when it will be deleted.
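Those lifecycle decisions can be captured as explicit, per-data-type rules. The sketch below uses hypothetical data types and retention periods purely to show the shape of such a policy:

```python
# Hypothetical lifecycle rules; the data types and ages (in years)
# are examples only, not recommendations.
LIFECYCLE_RULES = {
    "contract":  {"archive_after": 3, "delete_after": 10},
    "temp_file": {"archive_after": None, "delete_after": 1},  # never archived
    "email":     {"archive_after": 3, "delete_after": 7},
}

def disposition(data_type, age_years):
    """Return 'keep', 'archive', or 'delete' for an item of a given age."""
    rule = LIFECYCLE_RULES[data_type]
    if rule["delete_after"] is not None and age_years >= rule["delete_after"]:
        return "delete"
    if rule["archive_after"] is not None and age_years >= rule["archive_after"]:
        return "archive"
    return "keep"
```

Writing the rules down this way forces the organization to answer the archive-versus-delete question for every data type, rather than discovering the gaps later.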
Regulatory compliance also needs to be taken into account. Not every organization is subject to federal regulations surrounding data retention, but those that are can face severe penalties if they fail to properly retain required data. Multinational companies also need to be aware of varying regulatory policies.
Because administrators can be subject to both civil and criminal charges for failing to properly archive data, some archive far more data than is required by law and retain those archives forever.
The problem with this approach is that it can do more harm than good. Federal regulations require certain data to be retained so that it can be analyzed in the event an organization is accused of some wrongdoing. Many litigation experts who represent companies undergoing e-discovery requests say that preserving data beyond what is required by law can lead to trouble. For starters, it often means more money is spent sifting through more data. In addition, more data can mean more vulnerability.
Creating an archive policy for the ages
Once you have a clear idea about what data you want to archive, the next step is to develop a comprehensive archive policy. This is a formalized set of procedures dictating the rules for the archival process. The archive policy should contain things such as:
- The criteria for archiving data. There will likely be separate criteria for each data type.
- The mechanisms that will be used to facilitate the archival process.
- The type of media that will be used to store archived data.
- The duration for which data will remain in the archive. This can vary for each data type.
- Rules for who may access the archives, and under what circumstances.
In one interesting case, an organization mandated that only its IT director and HR director would be able to access its archives. While researching message archiving software, the organization discovered an application that provided end users with access to archived messages. The software kept the messages secure and prevented users from modifying or erasing archived messages, yet users could still view archived mail and could print or forward archived messages. Because this software allowed users continued access to old messages while keeping those messages secure, the organization decided to loosen its archive policy to permit users to access their own archived messages.
Another important consideration is protecting the archive's integrity. This concept has two separate aspects. First, the archives must be protected against tampering. They must be secure enough that an end user can't make modifications to archived data as a way of covering up unethical behavior.
The other aspect is guarding archived data against loss. Imagine if an organization moved all of its data from 2005 to a tape-based archive and then the tape became demagnetized. In such a situation, all the archive data from that year would be gone. Organizations must protect against this type of data loss.
To protect your archives against data loss, you should have multiple copies of the archived data. Some organizations create multiple copies of tape-based archives so that one tape can be stored on the premises while a duplicate tape resides safely off-site. Cloud storage gateways can provide similar functionality: a gateway appliance can store an on-premises copy of the archives while also replicating them to the cloud. Likewise, applications exist that can check tapes for reliability and copy questionable ones.
When it comes to securing archive data, your approach depends on the level of access that users will need. At the very least, archived data needs to be encrypted at the storage volume level, and the data needs to be read-only (to prevent tampering). Many organizations store archived data on storage servers (or on cloud storage) isolated from the rest of the production network. This isolation provides another level of security.
Regardless of how you choose to store your archives, they should be protected by an auditing mechanism. Auditing can be used to alert you anytime someone accesses (or attempts to access) the archives. If your archives are ever called into question, your audit logs can help you prove the archived data is authentic and that the data hasn't been tampered with.
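One common way to make such an audit trail tamper-evident is to chain each log entry to the previous one with a cryptographic hash, so that any later alteration or deletion breaks the chain. This is a generic sketch of the technique, not a feature of any particular archiving product:

```python
import hashlib
import json

def append_entry(log, entry):
    """Append an audit entry, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    record = {"entry": entry, "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    log.append(record)
    return log

def verify_chain(log):
    """Return True if no entry has been altered or removed mid-chain."""
    prev_hash = "0" * 64
    for record in log:
        if record["prev"] != prev_hash:
            return False
        expected = hashlib.sha256(
            json.dumps({"entry": record["entry"], "prev": record["prev"]},
                       sort_keys=True).encode()
        ).hexdigest()
        if record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True
```

If the archives are ever called into question, a chain that still verifies is strong evidence that the access log itself hasn't been rewritten after the fact.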
Archiving criteria: search, automation, flexibility
There are a number of data archiving products available on the market, ranging from backup applications with built-in archival capabilities to full-blown dedicated archive management applications. Regardless of the product you select, there are several key features you should look for.
Search is the first essential capability. The e-discovery process typically involves examining huge amounts of archived data. An efficient search engine can help to minimize your search times. The software's search engine should be flexible enough that it allows you to perform granular searches based on the following:
- Data type (Word documents, email and so on);
- Data sources (A good search engine should be able to perform searches across data platforms. For instance, a single search might contain results from Exchange, SharePoint and a file server.);
- Document author;
- Key pieces of data (bank account numbers, Social Security numbers and credit card numbers);
- Data that matches a specific data structure rather than a specific piece of data (e.g., any data containing a Social Security number, rather than one particular Social Security number).
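That last kind of structural search is typically implemented with pattern matching. For example, a regular expression can flag anything shaped like a U.S. Social Security number without knowing the number in advance; the pattern below is deliberately simple and illustrative:

```python
import re

# Matches the NNN-NN-NNNN shape of a U.S. Social Security number.
# Real detectors also validate number ranges to cut false positives.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def contains_ssn_like(text):
    """Return True if the text contains anything shaped like an SSN."""
    return bool(SSN_PATTERN.search(text))
```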
Audit tracking is another important feature. For reasons related to litigation holds and e-discovery, an audit trail can tell you which custodian has accessed the archives, when they were accessed and what specific data was accessed.
You should also pick a data archive product that supports as many data platforms as possible. While there's no such thing as a universal archive product, there are archival products on the market that are designed to work with a number of popular applications and platforms. Some of these even include the ability to archive social networking data, such as the contents of an organization's Facebook page.
A good data deduplication engine is another essential feature. Archives, by their very nature, can grow quite large, and deduplication can significantly reduce the amount of storage they consume. Fortunately, almost every modern archiving product supports deduplication.
Your archival product should be flexible with regard to data sources and data targets. Just because an organization is archiving to tape today doesn't mean it will still be doing that tomorrow. A good archival product should allow you to write archives to disk, tape, the cloud or any other medium.
Finally, the archival software should provide automation capabilities. You don't want to manually move data into or out of archives. A good archival product should be easily adaptable to your archive policy. Automation ensures data is always archived according to policy and that nothing slips through the cracks. The software should also create a detailed log of the archive process.
While the concept of archiving seldom-accessed data is simple, putting that concept into practice can be a big undertaking. A clear and well-documented plan can make the archiving process go much more smoothly.
About the author
Brien Posey is a Microsoft MVP with two decades of IT experience. Before becoming a freelance technical writer, Brien worked as a CIO for a national chain of hospitals and healthcare facilities. He has also served as a network administrator for some of the nation's largest insurance companies and for the Department of Defense at Fort Knox.