This article can also be found in the Premium Editorial Download "Storage magazine: Best storage products of the year 2002."
While there aren't many tools designed specifically for data archiving, some are worthy of consideration. Start with your current backup software. Some backup products - such as IBM's Tivoli Storage Manager - have specific features designed to handle archiving, including the ability to do the following:
- Attach a descriptive label to a group of archived files, browse by that label and, if desired, retrieve the entire group of files en masse;
- Designate dedicated storage pools for archiving;
- Easily define and assign retention policies for an archived file that differ from those of the backed-up copy;
- Track archived volume location and expiration;
- Archive files without requiring their deletion from primary storage.
One of the most problematic types of data to archive is that which is contained in a database. While it is relatively easy to archive files, how do you archive the records contained within a database file? The most common current practice is to retain a copy of the entire database. This has at least two disadvantages:
- The entire database needs to be restored to retrieve the desired data. Imagine all of the problems associated with restoring your five-year-old Oracle database. Suffice it to say, I don't have enough space to list them here.
- It doesn't address the need to prune databases of old information. One highly desirable outcome of database archiving is to slow the rate of database growth by removing unused or infrequently accessed records contained within the database tables.
Database growth is a problem that has a ripple effect throughout storage. As databases expand, they become slow and unwieldy, consume more disk space and are increasingly difficult to back up and restore. If there were a way to prune and store records from databases, it could have a significant impact on a storage infrastructure.
A more effective approach to database archiving is to export data using SQL. Exported data is more portable, easier to retrieve and readily available. In addition, there are tools available that improve on this process by making it more automated and manageable.
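As a rough illustration of the export-and-prune idea, the sketch below uses Python's built-in sqlite3 and csv modules. It is not taken from any particular archiving product: the orders table, its columns, the cutoff date and the archive_old_rows helper are all hypothetical, and a production job would also need to handle related tables, verify the export and write it to durable archive storage before deleting anything.

```python
import csv
import io
import sqlite3


def archive_old_rows(conn, table, date_column, cutoff, out):
    """Write rows older than `cutoff` to `out` as CSV, then delete them.

    Returns the number of rows archived. Table and column names cannot be
    bound as SQL parameters, so they are interpolated here and must come
    from trusted configuration, never from user input.
    """
    where = f"{date_column} < ?"
    cur = conn.execute(f"SELECT * FROM {table} WHERE {where}", (cutoff,))
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # header row
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    # Prune only after every matching row has been written out.
    conn.execute(f"DELETE FROM {table} WHERE {where}", (cutoff,))
    conn.commit()
    return count


# Hypothetical sample data: an in-memory database with three orders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_on TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "1997-03-14", 120.0),
        (2, "1998-11-02", 75.5),
        (3, "2002-06-30", 310.0),
    ],
)

archive = io.StringIO()  # stands in for a file on archive storage
archived = archive_old_rows(conn, "orders", "placed_on", "2000-01-01", archive)
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Because the archived rows end up in a plain CSV file rather than a full database image, they can be read back without restoring the original database, which is the portability advantage described above.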
Another promising development in data archiving is the emergence of content management tools. Designed to work at the application level, these products are aware of the relationships and context of data within a specific application and can be used to store data in a readily retrievable form (see a sampling of these in "Some applications have tailored archive tools," this page).
Know your data
The key requirement for effective archiving is to develop an understanding of your data - or more accurately, the value of your data. A system of data classification leads to intelligent policy management with regard to primary storage (e.g., disk) and secondary storage (backup and archive).
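To make the link between classification and policy concrete, here is a minimal sketch of a classification table driving an archive decision. The class names, storage tiers, retention periods and age thresholds are invented for illustration; real values would come out of the business-unit discussions that classification requires.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Policy:
    primary_tier: str        # where the data lives while active
    retention_years: int     # how long the archived copy is kept
    archive_after_days: int  # idle time before data moves to the archive


# Hypothetical classification scheme; every value is an assumption.
POLICIES = {
    "critical": Policy("high-performance disk", 7, 365),
    "important": Policy("midrange disk", 3, 180),
    "routine": Policy("low-cost disk", 1, 90),
}


def should_archive(data_class, last_accessed, today):
    """Return True if data of this class has been idle past its threshold."""
    policy = POLICIES[data_class]
    return (today - last_accessed) > timedelta(days=policy.archive_after_days)


# The same idle period leads to different decisions for different classes.
today = date(2003, 1, 15)
last_used = date(2002, 9, 1)
routine_due = should_archive("routine", last_used, today)
critical_due = should_archive("critical", last_used, today)
```

The point of the table is that the archive decision becomes a lookup rather than a judgment call made file by file.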
Data classification isn't an easy undertaking for an organization. It requires business units and other application owners to make decisions about what's important and what isn't. Data classification complicates life for IT organizations - and especially storage administrators - by forcing them to consider tiers of offerings instead of the simple one-size-fits-all approach. Most challenging, perhaps, is that it forces diverse groups in an organization to communicate with one another.
Is it worth all this trouble? The alternative to data classification is what I refer to as the cross-your-fingers approach to storage management. With regard to archiving, this translates to a policy of "save everything, and hope that you never need to retrieve it." This may work with small quantities of data, but it's extremely costly in most organizations and can prove just as risky. The result could be essentially the same as having no long-term protection. The question you must answer is: How good are you at finding needles in haystacks?
This was first published in January 2003