This article can also be found in the Premium Editorial Download "Storage magazine: What you need to know about data storage provisioning."
[Sidebar: Dos and don'ts for creating a long-term archive]
Planning for the future
To ensure you'll have a reasonable chance to recover archived data 30 or more years from now, you must first identify the different requirements for data retention and then use those requirements to define policies (see "Dos and don'ts for creating a long-term archive," this page). Then decide what kind of data management application you'll need and set up a test bed to validate rule sets and policies. You'll also need to work out a way to pull old unstructured archives into the new management system so that everything is managed in one place. Consider migrating archived data to current file and backup formats on a regular basis to avoid orphaned data that can no longer be read.
For the long term, data formats such as Adobe Systems Inc.'s PDF/A, Microsoft Corp.'s Office Open XML file format and XML-based standards like OpenDocument should help ensure that data continues to be readable. Developers are beginning to embrace these long shelf-life data formats, so there's a good chance the applications your company uses can be updated or enhanced to add these capabilities.
On the storage format side, it's critical to create metadata describing what files are being stored and how they were created. According to Forrester Research's Balaouras, the Storage Networking Industry Association (SNIA) will be instrumental in creating standards for both information lifecycle management (ILM) and the eXtensible Access Method (XAM), which gives ILM applications a standard interface and metadata structure for communicating with object-based storage systems. Metadata stored with each object identifies the owner, the application that created the file, the data format and so forth. The standard specifically addresses both long-term retention and data security.
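As a rough illustration of per-object metadata, the sketch below builds a small descriptive record and serializes it as JSON text. It is not the XAM interface: the `ArchiveMetadata` class, its field names and the `describe` and `to_sidecar` helpers are hypothetical, and the SHA-256 digest is an added assumption (a way to verify the object later) rather than something the article specifies.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ArchiveMetadata:
    """Descriptive record kept with one archived object (hypothetical schema)."""
    owner: str
    creating_application: str
    data_format: str
    created: str   # ISO 8601 date string, e.g. "2006-10-01"
    sha256: str    # assumption: content digest for later integrity checks


def describe(content: bytes, owner: str, application: str,
             data_format: str, created: str) -> ArchiveMetadata:
    """Build the metadata record for a blob of archived content."""
    digest = hashlib.sha256(content).hexdigest()
    return ArchiveMetadata(owner, application, data_format, created, digest)


def to_sidecar(meta: ArchiveMetadata) -> str:
    """Serialize the record as plain JSON text to store next to the object."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


if __name__ == "__main__":
    record = describe(b"quarterly report", "finance",
                      "ExampleWriter 1.0", "PDF/A", "2006-10-01")
    print(to_sidecar(record))
```

Keeping the record in a plain-text format is a deliberate choice here: the description of the data should be at least as easy to read decades from now as the data itself.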
Storage management products from companies like CA Inc. and IBM Corp./Tivoli can use the metadata associated with files to determine how long a piece of data is kept in the archive and which policies apply to it. This is increasingly important when administrators are faced with archiving millions of e-mails, as well as all of the other content created within an organization. There's no way a human can individually set policies for that much data.
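The idea of deriving retention from metadata can be sketched as a small, ordered rule set. This is not how any CA or IBM product works internally; the `Record` and `Rule` types, the rule names and the retention periods below are all hypothetical, and real periods would come from your legal and business requirements.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass(frozen=True)
class Record:
    """The metadata fields the rules look at (hypothetical)."""
    data_format: str
    owner: str
    archived_on: date


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Record], bool]
    retain_years: int


# Hypothetical rules, checked in order; the first match wins.
RULES: List[Rule] = [
    Rule("finance-email",
         lambda r: r.data_format == "email" and r.owner == "finance", 7),
    Rule("any-email", lambda r: r.data_format == "email", 3),
]
DEFAULT_YEARS = 1  # assumed fallback when no rule matches


def retention_years(record: Record, rules: List[Rule] = RULES) -> int:
    """Return the retention period of the first rule that matches."""
    for rule in rules:
        if rule.matches(record):
            return rule.retain_years
    return DEFAULT_YEARS


def expires_on(record: Record) -> date:
    """Compute the date on which the record may leave the archive."""
    target_year = record.archived_on.year + retention_years(record)
    try:
        return record.archived_on.replace(year=target_year)
    except ValueError:
        # Archived on Feb. 29 and the target year is not a leap year.
        return record.archived_on.replace(year=target_year, day=28)
```

Because the rules run against metadata rather than individual files, the same handful of rules can cover millions of e-mails without anyone setting a policy by hand.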
There should also be policies in place for long-term storage of encryption keys. As more regulations designed to protect customer and proprietary data require some form of encryption, the practice will undoubtedly become more prevalent. While it's possible that data-recovery companies will be able to bypass current encryption standards 50 years from now, paying them to do so might still cost more than re-creating the data. Archiving the necessary keys as part of the overall archive process should prevent this problem.
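One simple way to make key archiving part of the archive process is to audit the archive regularly for encrypted objects whose keys are missing. The sketch below assumes a hypothetical inventory that maps each object name to the ID of the key that encrypted it (or `None` for unencrypted objects); it does no cryptography and is not tied to any particular key management product.

```python
from typing import Dict, Iterable, List, Optional


def objects_missing_keys(object_keys: Dict[str, Optional[str]],
                         archived_key_ids: Iterable[str]) -> List[str]:
    """Return encrypted objects whose key ID is absent from the key archive.

    object_keys maps an object name to the ID of its encryption key,
    or to None if the object is stored unencrypted.
    """
    available = set(archived_key_ids)
    return sorted(
        name for name, key_id in object_keys.items()
        if key_id is not None and key_id not in available
    )


if __name__ == "__main__":
    # Hypothetical inventory and key archive contents.
    inventory = {
        "mail-2006-q3.tar": "key-17",
        "contracts-2005.zip": "key-09",
        "press-releases.pdf": None,
    }
    print(objects_missing_keys(inventory, ["key-17"]))
```

Any object the audit reports is one you would be unable to read without breaking its encryption, so the report is worth acting on while the key can still be located.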
This was first published in October 2006