This article can also be found in the Premium Editorial Download "Storage magazine: What you need to know about data storage provisioning."
Text documents formatted using XML, OpenDoc or RTF standard formats should be readable in the long term, and even Microsoft Word and Excel documents will probably be readable for quite a long time. Microsoft Corp. recently announced it will support the Open Document Format (ODF) standard via a translator that will convert documents from the proprietary Open XML format that Microsoft prefers.
Text documents are relatively easy to read, even if you have to do some conversion or drop the formatting. Graphics, on the other hand, are less simple. Even open standards like TIFF and JPEG have many varieties, not all of which can be read by any given program. Unfortunately, open graphics format standards like Portable Network Graphics (PNG) aren't yet widely supported by commercial applications. Adobe Systems Inc.'s PDF/A is supposed to address this issue, but given that many older PDFs can crash Adobe Reader 7.0, this is not yet a perfect solution.
Time is not on your side
According to Stephanie Balaouras, senior analyst in the computing systems research group at Forrester Research, Cambridge, MA, the first issue in creating an archiving plan is to define what data needs to be archived and why, whether it's for legal discovery, regulatory compliance or business requirements. Because discovery, compliance and business requirements may have extremely different or even conflicting criteria for data retention, the issue can become very complex. To manage the archiving process, large organizations are beginning to create specialized archiving positions with titles such as archiving officer or digital-preservation officer.
Long-term archiving can be very expensive. In addition to data, you must build what amounts to a museum of old tape drives, associated parts and software to restore the old tapes. Of course, you don't necessarily have to hoard old archival equipment; there are service companies that specialize in data recovery, but it's an expensive proposition and there's no guarantee they'll actually have the equipment your media requires. The process becomes even more complex when there's a requirement to keep data in a read-only format for its required retention period.
Databases are a special case for archiving. Unlike relatively small flat files, databases can be very large (up to many terabytes), change often and, in some cases, should be archived in a way that documents every change to each record over time. Additionally, restoring a single record to a previous state usually depends on a proprietary scheme unique to each database vendor. Because enterprises often have multiple databases on different software platforms, it's very difficult to develop a unified retention strategy.
Overlapping retention requirements are another large issue. Even within a single archiving application that's used to create retention policies, the process of retaining read-only copies of data for a period and then purging the data when it expires is complicated by the need to satisfy retention policies for different standards. Consider, for example, information on a single customer that's maintained across several files and in two databases. Three different data-retention policies require different pieces of the files and database records to be retained for a given time and then purged. One of the standards requires data to be retained for seven years and then purged, another has a 12-year period and the third requires retaining the data for 70 years. How do you keep track of which regulation has precedence, and which pieces of each file or database record apply to each standard? (See "Long-term formats," this page.)
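One way to make the bookkeeping concrete is to tag each piece of data with the policies that cover it and compute a purge date from those tags. The Python sketch below is only an illustration: the policy names, the item names and the rule that the longest applicable retention period takes precedence are all assumptions, not something the regulations or any archiving product dictates. Only the 7-, 12- and 70-year periods come from the example above.

```python
from datetime import date

# Retention periods in years, taken from the example above.
# The policy names are hypothetical placeholders.
RETENTION_YEARS = {"policy_a": 7, "policy_b": 12, "policy_c": 70}

# Hypothetical mapping of each piece of customer data to the policies covering it.
ITEM_POLICIES = {
    "billing_file": {"policy_a"},
    "orders_db.customer_row": {"policy_a", "policy_b"},
    "records_db.history_row": {"policy_b", "policy_c"},
}


def add_years(start, years):
    """Return the same calendar day `years` later (Feb. 29 falls back to Feb. 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Feb. 29 does not exist in the target year.
        return start.replace(year=start.year + years, day=28)


def purge_date(item, created):
    """Date on which the item's longest applicable retention period ends.

    Assumption: when policies overlap, the longest period takes precedence.
    """
    years = max(RETENTION_YEARS[policy] for policy in ITEM_POLICIES[item])
    return add_years(created, years)


def purgeable(item, created, today):
    """True once every policy covering the item has expired."""
    return today >= purge_date(item, created)


if __name__ == "__main__":
    created = date(2006, 10, 1)
    for item in ITEM_POLICIES:
        print(item, "may be purged on", purge_date(item, created))
```

Under these assumptions, a record covered by both the seven-year and the 12-year policy is held for 12 years. A real deployment would also have to handle a policy that requires purging at a fixed deadline, which can conflict directly with a longer retention period and isn't resolved by this simple rule.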
This was first published in October 2006