Is long-term archiving the Y2K problem for the 21st century? The Storage Networking Industry Association (SNIA) and others in the industry hope to bring attention to the archiving compatibility problem early in this century rather than at the end. The specific problem: how to make sure archived data remains readable down the road, after format changes in hardware and software.
A SNIA survey of 267 organizations found that 80% have information they must keep for more than 50 years because of legal and regulatory rules; 68% must keep information for more than 100 years; and more than 40% keep email for at least 10 years.
SNIA formed a 100-Year Archive Task Force, and one of its first recommendations, according to task force member Michael Peterson of Strategic Research Corp., Santa Barbara, CA, is to put the term "archiving" on ice.
"We need to abandon the term 'archive' and replace it with retention and preservation," says Peterson. "The term archiving denotes a dungeon into which I put things and never look at them again. Thinking of archival as a long-term problem turns out to be wrong thinking because of the concept of legal compliance."
Peterson says e-discovery and legal requirements are what will get people's attention. He also says long-term compatibility problems can be solved by handling data properly as it is stored, rather than waiting until retrieval.
Most of the technologies and products to do this already exist, says Peterson, including self-healing storage arrays; federated repositories that support tape, disk and optical media; and data deduplication and other migration methods. And standards such as the eXtensible Access Method (XAM) are emerging.
"It's not a technology problem; it's an operating practices problem," he says. "We call this process information-centric management. If you don't start the process, nothing will work on the back end."
IT consulting firm MindTree Ltd. has developed a set of best practices based on SNIA research. Rama Narayanaswamy, MindTree's VP who oversees its storage practice, breaks his recommendations into three areas: physical media, data and application levels.
His recommended best practices for media include storing all data on networked storage media, which makes it easier to read, manage and protect. He notes that every networked storage device can be uniquely identified by a vendor-supplied WWN (for Fibre Channel) or MAC address (for Ethernet), and that all data can be migrated to new media.
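The unique-identifier point can be sketched as a simple media inventory keyed by the vendor-supplied ID. This is a minimal illustration, not any vendor's API; the record fields and the `build_inventory` helper are hypothetical:

```python
# Hypothetical media-inventory sketch: each networked storage device is keyed
# by its vendor-supplied unique identifier -- a WWN for Fibre Channel devices,
# a MAC address for Ethernet devices -- so data can be tracked across migrations.
from dataclasses import dataclass

@dataclass(frozen=True)
class MediaDevice:
    transport: str    # "FC" or "Ethernet"
    unique_id: str    # WWN (FC) or MAC address (Ethernet)
    capacity_gb: int

def build_inventory(devices):
    """Index devices by unique ID, rejecting duplicates."""
    inventory = {}
    for d in devices:
        if d.unique_id in inventory:
            raise ValueError(f"duplicate identifier: {d.unique_id}")
        inventory[d.unique_id] = d
    return inventory

devices = [
    MediaDevice("FC", "50:06:01:60:90:20:1e:5a", 4000),   # WWN
    MediaDevice("Ethernet", "00:1b:21:3c:9d:f2", 8000),   # MAC
]
inventory = build_inventory(devices)
print(len(inventory))  # -> 2
```

Because the identifier travels with the device regardless of which host mounts it, an index like this survives media migrations in a way that host-local drive letters or mount paths do not.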
Narayanaswamy recommends storing all data on block-based rather than file-based storage to ensure successful migration.
At the application layer, he says information should be segregated based on longevity. Long-lived information should be stored in text format, because that format is the most likely to survive. Narayanaswamy says disk is a better medium than tape for long-term retention, but it's not without problems.
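The text-format recommendation can be illustrated with a minimal round trip: a record written as plain UTF-8 `key: value` lines is readable by any editor on any platform, with no proprietary reader. The layout here is an assumption for illustration, not a prescribed archival format:

```python
# Minimal sketch: preserve a record as plain UTF-8 text (key: value lines)
# instead of a binary, application-specific format. Layout is illustrative only.
def to_text(record: dict) -> str:
    """Serialize a flat record to sorted key: value lines."""
    return "\n".join(f"{k}: {v}" for k, v in sorted(record.items()))

def from_text(text: str) -> dict:
    """Rebuild the record by splitting each line on the first ': '."""
    record = {}
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        record[key] = value
    return record

record = {"title": "Board minutes", "retention_years": "100", "created": "2008-06-14"}
text = to_text(record)            # readable decades later with any text tool
assert from_text(text) == record  # round-trips without a special reader
```

The trade-off is that text is verbose and loses rich structure, which is why the recommendation applies to the long-lived subset of information rather than to everything.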
"Disk storage on a SAN makes it easier to migrate seamlessly," he says. "But let's say we're talking about Hitachi [Data Systems] or EMC. Twenty years from now, will they support that same type of migration? Standards have to be implemented now to make it work in the long run."