This article can also be found in the Premium Editorial Download "Storage magazine: Tips for unifying storage management."
[Sidebar: Saving duplicate data once]
Backups are performed inefficiently
Many companies do a regular full backup every week or month, which further aggravates the problems I've mentioned. This means more tapes are created (and hopefully copied), more data is sent across the network and more CPU resources are consumed on the backup client. Why, for example, make another copy of files that haven't changed and are already securely ensconced on tape? The answer is that restores would take even longer than they already do. Without periodic full backups, restoring a large file system would require the full backup tape from months or years ago, followed by every incremental tape created since then that contains the last version of any file that was in the file system before it was damaged. That could add up to hundreds of tapes. So full backups are performed every once in a while to overcome this problem. However, not every backup product works this way.
A similar question is: Why make additional backup copies of files that have already been backed up from other systems? For example, how many times do you need to back up wordpad.exe or /bin/ksh? The answer is "only once," but no one does it that way. The reason is that tape-based backup software packages don't perform what's called single instance store backups--storing only one copy of each unique file. If they did, can you imagine how many tapes you would need to restore a large file system? Because most backup systems have been tape-based for years, until recently vendors hadn't even thought of adding such features to their software, but that's beginning to change.
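The core idea behind single instance store is simple: identify each file by a hash of its contents and keep only one copy per unique hash. The following is a minimal sketch of that idea, using temporary directories and sha1sum in place of any real product's on-disk layout (the layout here is purely illustrative):

```shell
# Single-instance-store sketch: files are kept once under their content
# hash; a duplicate file adds no new data to the store.
store=$(mktemp -d)    # content-addressed store
src=$(mktemp -d)      # stand-in for a source file system
printf 'same bytes' > "$src/a.txt"
printf 'same bytes' > "$src/b.txt"   # duplicate content
printf 'other'      > "$src/c.txt"

for f in "$src"/*; do
  h=$(sha1sum "$f" | cut -d' ' -f1)
  [ -e "$store/$h" ] || cp "$f" "$store/$h"   # store unique content only once
done

ls "$store" | wc -l    # two unique blobs for three source files
```

Three files go in, but only two blobs are stored; restoring b.txt is just a matter of looking up its hash. Real products also have to track the mapping from file names to hashes, which this sketch omits.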
Now ask yourself another question: Why is the backup copy of the data in a different format than the original? The answer is simple. The commands cp filename.txt /dev/rmt/0cbn and copy myfile.txt \\.\Tape0 don't work; you can't simply copy files to a tape drive. Of course, in the Unix world, you could use something like this:
find . -print | while read i
do
    dd if="$i" of=/dev/rmt/0cbn bs=32k
done

That simple loop writes every file on disk to tape in its native format, similar to the way ANSI tapes were made for the mainframe. But this is an inefficient use of tape: every file writes a file mark on the tape, which translates into extremely slow throughput. So some bright Unix developer came up with the idea of putting several files into a single backup file that could be written to tape, and cpio, dump and tar were born. Now it's 30 years later, and we're stuck with this handed-down way of doing things.
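The tar approach is easy to demonstrate. This sketch writes the archive to an ordinary file created with mktemp; on a real system you would point tar at a tape device such as /dev/rmt/0cbn, paying one file mark for the whole archive instead of one per file:

```shell
# tar bundles many files into one sequential archive stream.
src=$(mktemp -d)
arc=$(mktemp)
for i in 1 2 3; do echo "data $i" > "$src/file$i"; done

tar cf "$arc" -C "$src" .   # one archive, not three separate writes
tar tf "$arc"               # list the archive's contents without extracting
```

The same invocation works unchanged whether "$arc" is a disk file or a tape device, which is exactly why the format stuck around.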
While writing regular files into a backup file makes perfect sense for tape, it doesn't need to be done when writing to disk. In fact, it adds unnecessary overhead. Not only does it slow down the creation of the backup copy, it slows down the restore of individual files from that backup. If the files were left in their native format, these problems would be eliminated. You could restore as many files at a time as is needed; the only limitation is the disk throughput. In fact, if the files were left in their native format, you could even use the backup copy as the production copy in a crisis. This is exactly what happens with replication products.
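A native-format disk copy behaves just as described. The sketch below uses plain cp -a between temporary directories to stand in for a replication product (real products use continuous replication or tools like rsync, but the effect on the copy's format is the same):

```shell
# Native-format replication sketch: the backup copy is an ordinary
# file system, so restoring is just copying back -- or using it directly.
src=$(mktemp -d)
dst=$(mktemp -d)
echo 'production data' > "$src/db.dat"

cp -a "$src/." "$dst/"   # mirror, preserving permissions and timestamps
cat "$dst/db.dat"        # the replica is immediately usable as-is
</imports>```

There's no archive format to unpack: restoring one file, a thousand files, or promoting the whole replica to production is limited only by disk throughput.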
So, what's changed?
In case you've missed the recent news flashes, ATA disk arrays have changed the face of storage forever. Terabytes of disk-based storage now cost less than three cents per megabyte. Inexpensive ATA disk opens storage options that were never possible with higher-priced disk.
ATA disk arrays have created a market for several new software storage products that change the backup equation. Everything from disk-to-disk-to-tape systems to real-time protection systems has recently come onto the market. Some of these new products allow you to change how you protect your company's data.
I'm dividing the products mentioned in this article into two categories: "enhanced traditional backup" and "new ideas." Enhanced traditional backup products perform traditional backup and recovery in a somewhat non-traditional way. Either they simply enhance it with disk or they perform incremental backups forever--forgoing occasional full backups. "New ideas" products approach the backup problem in very different ways.
Traditional backup is the way backups have been done for years: an occasional full backup to tape, followed by daily incremental backups to tape. However, such a system is fraught with problems.
Most backup systems can be enhanced by simply placing disk storage in front of tape storage. These systems use tape storage as a way to create backups for off-site purposes, but completely forgo the creation of tape for on-site purposes. Products to use in this scenario come in two flavors: file system-based devices and virtual tape products.
A file system device is simply a large disk array with a file system; your backup software writes to this file system. Each backup creates a backup file in the file system. These backups can also be duplicated to tape. While any disk array can be used for this, it's most common to use ATA-based arrays for this purpose. Copan Systems, in Longmont, CO, and Nexsan Technologies, in Woodland Hills, CA, are companies to watch in this space.
A virtual tape system is a bit more complicated and interesting. Several companies have built disk arrays that pretend to be tape drives. This allows you to continue to do backups the way you're used to, but with all the advantages of disk. Of course, tape-based backups need to be converted to real tape for off-site storage. Vendors to watch here include Advanced Digital Information Corp. (ADIC), Alacritus Software, FalconStor Software and Quantum Corp.
The major benefit of these products is that restores are quicker and easier. Whether you're using a file system device or a virtual tape system, you can perform full backups less frequently, because doing so doesn't increase your recovery time the way it would with tape. Restoring from a three-month-old full backup and 90 days of incremental backups takes no longer than restoring from a full backup done yesterday--if all those backups are on disk.
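The incremental side of that equation can be sketched with nothing more than find. This illustrative example uses a timestamp file to mark the last backup and copies only files modified since then; real backup software tracks changes in a catalog rather than a stamp file:

```shell
# Incremental backup sketch: copy only files newer than the last run.
src=$(mktemp -d)
dst=$(mktemp -d)
echo one > "$src/old.txt"
touch "$dst/.stamp"            # records when the previous backup ran
sleep 1
echo two > "$src/new.txt"      # modified after the stamp

find "$src" -type f -newer "$dst/.stamp" -exec cp {} "$dst"/ \;
ls "$dst"                      # only the changed file was copied
```

With the target on disk, stacking many such incrementals on top of an old full costs nothing at restore time, which is what makes less-frequent fulls practical.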
Anyone who is familiar with IBM Corp.'s Tivoli Storage Manager (TSM) would read the first section of this article and say, "We don't perform regular full backups!" That's right, TSM users don't do that. And apparently Veritas Software Corp.'s NetBackup 5.0 users might not need to do it either.
The problem with never doing full backups is that you'll need hundreds of tapes to restore an individual system. TSM and NetBackup deal with this in two ways. The first is by creating new full backups from older full backups. Why go back to the original system and move a file across the network to create a new full, when you can simply move it from tape to tape? (See "Saving duplicate data once".) TSM calls this reclamation; NetBackup calls it synthetic full backups.
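Conceptually, a synthetic full is just the old full overlaid with each incremental in order, built entirely on the backup server. This sketch shows the merge logic with temporary directories standing in for backup images (the real products work tape-to-tape or within their own storage pools):

```shell
# Synthetic full sketch: build a new full from the old full plus an
# incremental, without ever contacting the original client.
full=$(mktemp -d)    # old full backup image
inc=$(mktemp -d)     # incremental taken after the full
synth=$(mktemp -d)   # the new, synthesized full
echo v1 > "$full/a"
echo v1 > "$full/b"
echo v2 > "$inc/b"   # b changed after the full was taken

cp -a "$full/." "$synth/"   # start from the old full
cp -a "$inc/."  "$synth/"   # overlay newer versions, oldest first
cat "$synth/b"              # the newest version of each file wins
```

Overlaying incrementals from oldest to newest guarantees the synthesized image matches what a fresh full backup of the client would contain, with no client CPU or network cost.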
This was first published in February 2004