Economics rears its ugly head in a big way when dealing with legacy data. Most of the time all of that old data represents a cost sink rather than a profit center. You may need to keep it on hand in case the regulators or lawyers come calling, but you aren't going to be able to generate any more value from it. At the same time, the data may have been recorded on systems that are three or more generations back and you'd like to keep it in a form which is readable by your current systems.
Fortunately, there are a number of things you can do with that legacy data to reduce the cost of keeping it around in a readable format.
The first step is to figure out what you've got, how often you're going to need it and what form you're likely to need it in. "Legacy data" covers an enormous range of material, with an equally enormous range of value and accessibility requirements.
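For data that already sits on a readable file system, a first-pass inventory can be sketched in a few lines of Python. This is only an illustration: the function name `inventory` is hypothetical, and grouping by file extension and last-modified year is an assumed stand-in for whatever categories matter to your organization.

```python
import os
from collections import defaultdict
from datetime import datetime, timezone


def inventory(root):
    """Summarize file count and total bytes per (extension, last-modified year).

    Hypothetical helper: extension and modification year are assumed to be
    rough proxies for data format and age.
    """
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            year = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).year
            entry = summary[(ext, year)]
            entry["files"] += 1
            entry["bytes"] += info.st_size
    return dict(summary)
```

Calling `inventory` on an archive directory returns a dictionary keyed by (extension, year), which gives a rough picture of how much data of each kind and age you are holding. It says nothing about how often the data is needed; that still has to come from the people who use it.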
Some organizations, such as oil companies and scientific institutions, have large amounts, often multiple terabytes (TB), of data that may have been collected years, or even decades, ago -- and which is still frequently used.
Most enterprises will have data that must be preserved for regulatory or legal reasons and which will probably never be looked at again. However, some of that material, like archives of email messages, will need to be searched through quickly for specific message threads if it ever is needed. You need to know which category each set of data falls into before you decide how to store it.
Data formats are another important consideration. You not only need to have the data on media you can read, you need to have it in a format your current systems can handle. It doesn't do any good to carefully transfer those old files onto new media if the files are formatted for an application you discarded years ago. You may have to convert the data, as well as translate the media.
The real question is how much of this data do you want to protect? Typically a lot of 'legacy' data, perhaps 80% of it or more, isn't needed. It makes sense to do some serious housekeeping before you do any conversion.
Many of the decisions on what to keep and what to discard can't be made by IT alone. They require input from the people who generated the data in the first place, as well as other departments such as legal and accounting.
After pruning, the data you're left with may have to be kept around forever. That affects your choice of storage, because cost per gigabyte isn't the only consideration in choosing a technology for storing old data. All existing media have a limited lifespan, and preserving data permanently means rewriting it to fresh media before that lifespan expires. True, that will typically be 10 years or more down the road, but you need to consider the cost of transferring the data when the time comes. It may make sense to choose a longer-lived medium, such as optical disk, even if it has a higher cost, to cut down on the expense of later storage transfers.
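The trade-off between purchase price and media lifespan can be put into rough numbers. The sketch below is a deliberately simple model, and every figure in it is hypothetical: it assumes the media are repurchased at the same price each generation and that each transfer costs a fixed amount, and it ignores falling media prices, labor rates and storage conditions.

```python
import math


def total_cost(capacity_gb, cost_per_gb, media_life_years,
               migration_cost, horizon_years):
    """Estimate the cost of keeping data readable over a planning horizon.

    Simplified model (an assumption, not a vendor formula): media are bought
    once per generation, and each move to a new generation costs a flat
    migration fee.
    """
    # Number of media generations needed to cover the whole horizon.
    generations = math.ceil(horizon_years / media_life_years)
    purchases = generations * capacity_gb * cost_per_gb
    # The first generation is the initial write; only later ones are migrations.
    migrations = (generations - 1) * migration_cost
    return purchases + migrations


# Hypothetical figures for 1,000 GB kept for 30 years.
short_lived = total_cost(1000, 0.50, 10, 2000, 30)  # cheaper, 10-year medium
long_lived = total_cost(1000, 1.00, 30, 2000, 30)   # pricier, 30-year medium
print(short_lived, long_lived)  # 5500.0 1000.0
```

With these made-up inputs, the medium that costs twice as much per gigabyte comes out cheaper over 30 years because it avoids two transfers. Real numbers may well point the other way, which is why the calculation is worth doing.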
It also pays to think ahead, especially in the area of formats. For example, converting text-type data to XML will make it a lot more accessible and easier to manipulate in the future -- factors that may pay off. Similarly, it's obvious that you're probably going to want to convert EBCDIC data to ASCII, but you might want to consider taking that a step further and putting text data, whether EBCDIC or ASCII, into Unicode.
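As a minimal sketch of the EBCDIC-to-Unicode step, Python's standard codecs include several EBCDIC code pages. The example below assumes the data is plain text in code page 037 (`cp037`); your files may use a different code page, and the function name `ebcdic_to_utf8` is hypothetical. It handles character data only: mainframe records that mix text with binary or packed-decimal fields need field-by-field treatment that this does not attempt.

```python
def ebcdic_to_utf8(raw: bytes, codepage: str = "cp037") -> bytes:
    """Decode EBCDIC text and re-encode it as UTF-8.

    Assumes `raw` holds only character data in the given EBCDIC code page
    (cp037 is an assumption; check which code page your source used).
    """
    text = raw.decode(codepage)   # EBCDIC bytes -> Unicode string
    return text.encode("utf-8")   # Unicode string -> UTF-8 bytes


# The five bytes below spell "HELLO" in EBCDIC code page 037.
sample = bytes.fromhex("C8C5D3D3D6")
print(ebcdic_to_utf8(sample))  # b'HELLO'
```

Going straight to UTF-8 rather than ASCII costs nothing for plain English text, since the two encodings are identical for those characters, and it leaves room for characters that ASCII cannot represent.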
Even with careful pruning, legacy data can amount to several TB of information. In the case of large or complex data migration projects, it may be more cost-effective to outsource the conversion and associated services.
There are a number of companies which specialize in transferring data, including Disc Interchange Service Company (DISC), which has a number of brief articles on various aspects of file conversion available on its web site, and Appian Analytics.
Storing media under the proper conditions will significantly prolong its life. For most media, especially tape, the most important factors are temperature and humidity.
The other issue is making sure the media containing your legacy data are properly indexed and cataloged. Make sure all the media are properly labeled and that you have a catalog showing where each tape or disk is stored. Then make sure each one is actually kept in that place.
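A catalog and the check that media are where the catalog says they are can be sketched as follows. The record fields, labels and shelf locations here are all hypothetical; a real catalog would more likely live in a database or spreadsheet, but the comparison is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaRecord:
    """One catalog entry; the field names are illustrative assumptions."""
    label: str       # text written on the tape or disk
    media_type: str  # e.g. "tape" or "optical"
    contents: str    # short description of what is stored
    location: str    # where the catalog says the item is kept


def audit(catalog, observed):
    """Compare the catalog with locations found during a physical check.

    `observed` maps each label that was actually found to the place it was
    found. Returns a dict of label -> description of the problem.
    """
    problems = {}
    for record in catalog:
        actual = observed.get(record.label)
        if actual is None:
            problems[record.label] = "missing"
        elif actual != record.location:
            problems[record.label] = (
                f"expected {record.location}, found {actual}"
            )
    return problems


catalog = [
    MediaRecord("T-001", "tape", "1998 email archive", "Vault A, shelf 3"),
    MediaRecord("T-002", "tape", "1999 email archive", "Vault A, shelf 3"),
]
observed = {"T-001": "Vault A, shelf 3", "T-002": "Vault B, shelf 1"}
print(audit(catalog, observed))
```

Running a check like this on a schedule is one way to catch a misfiled tape long before anyone needs the data on it.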
About the author: Rick Cook has been writing about mass storage since the days when the term meant an 80 K floppy disk. The computers he learned on used ferrite cores and magnetic drums. For the last 20 years, he has been a freelance writer specializing in storage and other computer issues.
This was first published in August 2006