This article can also be found in the Premium Editorial Download "Storage magazine: Best storage products of the year 2002."
One of the most challenging tasks facing storage managers is the development of a strategy for archiving data. Deciding what should be archived, when it should be archived and for how long goes to the core of the storage management process. You also have to understand the business value of data, perhaps better than you currently do. But when done properly, archiving can be a lifesaver to businesses requiring access to historical information for regulatory or audit purposes. Conversely, when it isn't done right, it can cost a company dearly in lost revenue, fines and other penalties.
To avoid these problems, you need a comprehensive strategy built around solid policies about data retention, which you'll need to develop with business managers. But there is also a host of factors directly under the control of storage managers: the tools you use, the formats you choose and the procedures you follow to execute your strategy.
Archiving vs. backup
When many administrators hear the term archive, they think backup. That's where the trouble often begins.
"Sure, we do archiving," I was recently told by an IT manager. "Every quarter, we send full backups off for seven years," he stated confidently.
I asked him a few follow-up questions: How would he handle specific requests for three- or four-year-old data? What would the process be for retrieving it? This quickly left him feeling somewhat less confident.
One reason the term "archive" is so often misused is that products claiming to do archiving offer widely different capabilities. At one end of the spectrum, a number of backup products treat archiving as simply a backup followed by deletion of the data from primary storage, a rather scary thought. That definition of archiving is really aimed at clearing out old data that clutters up servers. A more effective way to address that particular problem is with storage resource management (SRM) or hierarchical storage management (HSM) tools.
So what is archiving, anyway?
A more useful definition of archiving is "the long-term storage of a point-in-time copy of information for a specific business purpose." This contrasts with backup in that backups are intended primarily to protect against short-term data loss, such as accidental deletion, device failure and data corruption.
Some strong candidates for archival data include periodic corporate financial information retained for auditing purposes, medical patient information retained for compliance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA), or data pertaining to clinical trials of a new drug wending its way through the FDA drug approval process.
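To make the definition concrete, here is a minimal sketch of how an archive entry might be recorded so that its business purpose and retention period travel with the data. The class name, field names and the sample entry are all hypothetical, and the seven-year figure simply echoes the IT manager's example above; actual retention periods have to come from the policies you develop with business managers.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchiveRecord:
    """One point-in-time copy kept for a stated business purpose (hypothetical layout)."""
    name: str
    business_purpose: str
    captured_on: date
    retention_years: int  # illustrative only; set by business policy

    def retain_until(self) -> date:
        """Return the anniversary of the capture date on which retention ends."""
        target_year = self.captured_on.year + self.retention_years
        try:
            return self.captured_on.replace(year=target_year)
        except ValueError:
            # Captured on Feb. 29 and the target year is not a leap year.
            return self.captured_on.replace(year=target_year, day=28)

    def is_expired(self, today: date) -> bool:
        """True once the retention period has fully elapsed."""
        return today > self.retain_until()


record = ArchiveRecord(
    name="Q4 2002 general ledger",      # made-up example entry
    business_purpose="financial audit",
    captured_on=date(2002, 12, 31),
    retention_years=7,
)
print(record.retain_until())  # 2009-12-31
```

Recording the purpose next to the retention period is what separates this from a backup catalog: it answers why the copy exists, not just when it was written.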
The long-term nature of archived data presents a number of problems. Some may seem obvious, while others are less so. Here are some fundamental concerns:
- Can the media format be read? How many of you still have QIC tape drives in-house? How about 9-track tape? Today, we have various tape formats and numerous generational variants within a given format. Tape drives typically can't read media older than a generation or two. For long-term retention, some thought must be given to maintaining devices for long-term recovery, or migrating data to newer media. This is further complicated in some regulated industries, where migration can raise validation and authentication issues.
- Is the media still valid? The lifespan of magnetic tape media is dependent on a number of factors, but the bottom line is that if data is being maintained for a long time, steps must be taken to ensure long-term integrity. This includes maintaining proper environmental control, refreshing volumes as needed and similar tasks.
- Can the data be used after it's restored? This goes to the heart of the matter. The data must be in a reasonably portable format, and not dependent on a now-obsolete version of an application or operating platform. Old data might depend on a particular version of an application, an operating system and even the architecture of the processor in use when the data was stored.
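One way to keep these three concerns visible over time is to store a small manifest with each archive volume. The sketch below is only an illustration under stated assumptions: the function names and manifest fields are hypothetical, SHA-256 checksums are one possible integrity check among many, and nothing here reflects how any particular archiving product works. It records the media format and the application, operating system and processor architecture the data depends on, plus a checksum per file that can be rechecked whenever a volume is refreshed or migrated.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MB chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, media_format: str, application: str,
                   operating_system: str, architecture: str) -> dict:
    """Describe an archive directory: what it depends on and what it contains."""
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {
        # Hypothetical field names; record whatever a future restore will need.
        "media_format": media_format,
        "application": application,
        "operating_system": operating_system,
        "architecture": architecture,
        "files": files,
    }


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose contents have changed."""
    damaged = []
    for relative_path, expected in manifest["files"].items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(relative_path)
    return damaged
```

The manifest is a plain dictionary, so it can be written out with `json.dumps` and kept apart from the volume it describes. Running `verify_manifest` after each refresh or migration gives a simple record that the copied data still matches what was originally archived, which may help with the validation questions migration raises in regulated industries.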
This was first published in January 2003