This article can also be found in the Premium Editorial Download "Storage magazine: Upgrade path bumpy for major backup app."
Which is best for archiving: Disk or tape?
An archive system encounters the same issues as a backup system if tape is used as its primary storage medium. One solution might be to use content-addressed storage (CAS) as the primary storage device for archives. If the product supports a standard file-system interface, such as NFS or CIFS, as well as single-instance storage and delta-block technologies, it could solve a number of problems.
First, a disk product using single-instance storage and delta-block incremental technologies will be less expensive to operate than a tape-based system. It stores each unique file only once and, when a file changes, stores only the changed blocks, while tape-based systems can't apply delta-block technologies and typically rewrite every changed file in full. Second, if the CAS device supports a file-system interface, then migrating between storage systems should be relatively simple. With a tape-based system, you have to copy all data from the old tape format to the new tape format. With a file-system-based system, you simply copy data from the older device to the newer device.
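To make single-instance storage and delta-block incrementals more concrete, here is a minimal Python sketch of a content-addressed store. It assumes fixed-size 4 KB blocks identified by their SHA-256 digest; the class and method names (ContentAddressedStore, archive, retrieve) and the block size are hypothetical, and commercial CAS products use their own, often variable-size, chunking schemes. A new version of a file consumes space only for the blocks that differ, and an exact duplicate consumes none.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; real products vary


class ContentAddressedStore:
    """Toy CAS: each block is stored under the SHA-256 digest of its bytes."""

    def __init__(self) -> None:
        self.blocks = {}    # digest -> block contents
        self.versions = {}  # (path, version) -> list of block digests

    def _put_block(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Single-instance storage: a block already present is not stored again.
        self.blocks.setdefault(digest, data)
        return digest

    def archive(self, path: str, version: int, content: bytes) -> int:
        """Store one file version and return the number of new blocks written."""
        before = len(self.blocks)
        digests = [
            self._put_block(content[i:i + BLOCK_SIZE])
            for i in range(0, len(content), BLOCK_SIZE)
        ]
        self.versions[(path, version)] = digests
        # Only blocks that were not already stored (the "delta") use new space.
        return len(self.blocks) - before

    def retrieve(self, path: str, version: int) -> bytes:
        """Reassemble a file version from its block digests."""
        return b"".join(self.blocks[d] for d in self.versions[(path, version)])


store = ContentAddressedStore()
v1 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
v2 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"D" * BLOCK_SIZE  # last block edited

new_v1 = store.archive("report.doc", 1, v1)   # 3 new blocks
new_v2 = store.archive("report.doc", 2, v2)   # 1 new block (the changed one)
new_copy = store.archive("copy.doc", 1, v1)   # 0 new blocks (exact duplicate)
print(new_v1, new_v2, new_copy)  # 3 1 0
```

Hashing block contents is what makes the store "content-addressed": the address of a block is derived from what it contains, so identical data always lands at the same address and deduplicates itself.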
Finally, you could potentially solve the proprietary-format problem. If archive products can discover and read existing CAS systems, you could theoretically switch archive products with no ill effects. The raw data would still be accessible via the file-system interface, and the meta data could be imported, or the new archive system could read the meta data directly from the CAS device. Your mileage will definitely vary, but solutions are available.
Other backup bugaboos
Backups are also an extremely inefficient way to store archives. While an archive system will make sure it has only one or two copies of a particular version of a file, a backup system usually has no such logic. If a company is using weekly full backups as archives (or creating "archives" with its backup product but not deleting the original files), and storing its archives for seven years, it will have 364 copies (52 weekly fulls a year for seven years) of many of its files stored on tape, even if those files never changed. This leads to an incredible amount of media waste.
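The copy count above is simple arithmetic, sketched here in Python. The function names and the 10 MB file size are hypothetical and chosen only for illustration; the two-copies-per-version default reflects the "one or two copies" an archive system typically keeps.

```python
WEEKS_PER_YEAR = 52


def backup_copies(retention_years: int, fulls_per_year: int = WEEKS_PER_YEAR) -> int:
    """Copies of an unchanged file held when every weekly full is retained."""
    # Each retained full backup contains its own copy of the file.
    return retention_years * fulls_per_year


def archive_copies(versions: int, copies_per_version: int = 2) -> int:
    """Copies held by an archive that keeps a fixed number per distinct version."""
    return versions * copies_per_version


file_mb = 10  # hypothetical file size, for illustration only
print(backup_copies(7), archive_copies(1))                      # 364 2
print(backup_copies(7) * file_mb, archive_copies(1) * file_mb)  # 3640 20
```

For a file that never changes, seven years of retained weekly fulls hold 182 times as many copies as a two-copy archive.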
Another strike against using backups as archives is the number of times a company changes backup formats and tape formats over the years. Almost every company using backups as its archives has a number of older tape and backup formats it must continue to support for archive purposes. While older tape formats can be converted with a lot of copying, converting older backup formats is another story. Most people choose to hold onto both old tape formats and old backup formats, and hope they never have to read them.
The most important feature of an archiving system is that it contain enough meta data to allow information to be retrieved in logical ways. For example, meta data can include the author or business unit that created an item. (An item can be any piece of archived information, such as a file, a record from a database or an e-mail.) Meta data might also contain the project the item is attached to or some other logical grouping. An e-mail archive system would record who sent and received an e-mail, the subject of the e-mail and other appropriate meta data. Finally, an archive system may import the full text of the item into its database, allowing for full-text searches against the archive. This is especially useful if the system can index multiple formats, because it is particularly expedient to be able to run a single full-text search against all e-mails, Word documents, PDF files, etc.
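As a rough illustration of meta data lookup and full-text search, here is a minimal Python sketch. The field names (author, business_unit, project, extra) and the ArchivedItem/ArchiveIndex classes are hypothetical, and it assumes text has already been extracted from Word, PDF and other formats upstream; real archive products use their own schemas and search engines. Meta data queries match fields exactly, while full-text search uses a simple inverted index from words to item IDs.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ArchivedItem:
    item_id: str
    kind: str           # e.g. "file", "database record", "e-mail"
    author: str
    business_unit: str
    project: str = ""
    extra: dict = field(default_factory=dict)  # e.g. e-mail sender, recipients, subject
    text: str = ""      # full text, assumed already extracted from Word/PDF/etc.


class ArchiveIndex:
    def __init__(self) -> None:
        self.items = {}                   # item id -> ArchivedItem
        self.postings = defaultdict(set)  # word -> set of item ids

    def add(self, item: ArchivedItem) -> None:
        self.items[item.item_id] = item
        # Build an inverted index over the item's full text.
        for word in re.findall(r"\w+", item.text.lower()):
            self.postings[word].add(item.item_id)

    def find(self, **criteria: str) -> list:
        """Meta data lookup: ids whose fields (or extra keys) match every criterion."""
        def value(item: ArchivedItem, key: str):
            return getattr(item, key) if hasattr(item, key) else item.extra.get(key)
        return sorted(
            item_id for item_id, item in self.items.items()
            if all(value(item, k) == v for k, v in criteria.items())
        )

    def search(self, *words: str) -> list:
        """Full-text search: ids whose text contains every query word."""
        hits = [self.postings.get(w.lower(), set()) for w in words]
        return sorted(set.intersection(*hits)) if hits else []


index = ArchiveIndex()
index.add(ArchivedItem("doc-1", "file", "alice", "legal", project="merger",
                       text="Draft merger agreement"))
index.add(ArchivedItem("mail-7", "e-mail", "bob", "finance",
                       extra={"to": "alice", "subject": "Merger budget"},
                       text="Budget for the merger project"))
print(index.find(business_unit="legal"))  # ['doc-1']
print(index.find(to="alice"))             # ['mail-7']
print(index.search("merger"))             # ['doc-1', 'mail-7']
print(index.search("merger", "budget"))   # ['mail-7']
```

Keeping the meta data as structured fields alongside the extracted text is what lets the same archive answer both "everything the legal department created" and "every item that mentions the merger."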
Another important feature of archive systems is the ability to keep a configurable number of copies of each archived item, so a company can decide how many copies to keep and on which media. For example, if a firm is storing its archives on a RAID-protected system, it may choose to have one copy on disk and another on a removable medium such as optical or tape.
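A copy policy like the one in the example above can be expressed as a simple check. This Python sketch is hypothetical (CopyPolicy and policy_gaps are illustrative names, not any product's API): it assumes the archive knows which medium each existing copy lives on and reports how an item falls short of a policy that requires a minimum number of copies and at least one copy on each required medium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyPolicy:
    min_copies: int
    required_media: frozenset  # media that must each hold at least one copy


def policy_gaps(policy: CopyPolicy, copies: list) -> list:
    """Return the ways an item's current copies fall short of the policy.

    `copies` lists the medium of each existing copy, e.g. ["disk", "tape"].
    """
    gaps = []
    if len(copies) < policy.min_copies:
        gaps.append(f"need {policy.min_copies} copies, have {len(copies)}")
    for medium in sorted(policy.required_media - set(copies)):
        gaps.append(f"no copy on {medium}")
    return gaps


# One copy on RAID-protected disk plus one on removable media (tape here).
policy = CopyPolicy(min_copies=2, required_media=frozenset({"disk", "tape"}))
print(policy_gaps(policy, ["disk"]))          # ['need 2 copies, have 1', 'no copy on tape']
print(policy_gaps(policy, ["disk", "tape"]))  # []
```

Separating the count from the required media lets a company require, say, two copies total while still insisting that one of them sits on removable media.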
This was first published in September 2006