Surprise! Cheap disks cure slow backup


This article can also be found in the Premium Editorial Download "Storage magazine: Is storage virtualization ready for the masses?"


Multiplexing lets you send multiple backup jobs to a single tape drive simultaneously. These multiple data streams are then multiplexed, or interleaved, onto the tape to supply the tape drive with enough throughput to keep it streaming. However, multiplexing hurts restore performance. Restoring one stream of data that has been interleaved with other streams of data doesn't allow the tape drive to stream, significantly slowing a restore.
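The restore penalty is easier to see in a toy model. The Python sketch below is an illustration only, not any backup product's actual tape format. It assumes blocks from each stream are interleaved strictly round-robin, and the stream names are hypothetical. Restoring one stream means reading past every block that belongs to the others.

```python
from itertools import zip_longest


def multiplex(streams: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Interleave blocks from several backup streams round-robin onto one 'tape'.

    Each tape block is tagged with the name of the stream it came from.
    Streams of unequal length simply drop out once they run dry.
    """
    names = list(streams)
    tape: list[tuple[str, str]] = []
    for row in zip_longest(*(streams[name] for name in names)):
        for name, block in zip(names, row):
            if block is not None:  # None means this stream has no more blocks
                tape.append((name, block))
    return tape


def restore(tape: list[tuple[str, str]], wanted: str) -> tuple[list[str], int]:
    """Read the tape sequentially, keeping only blocks from one stream.

    Returns the restored blocks and how many blocks had to be skipped.
    """
    restored: list[str] = []
    skipped = 0
    for name, block in tape:
        if name == wanted:
            restored.append(block)
        else:
            skipped += 1  # belongs to another stream: read past it
    return restored, skipped


streams = {"mail": ["m1", "m2", "m3"], "db": ["d1", "d2", "d3"], "web": ["w1", "w2", "w3"]}
tape = multiplex(streams)
print(restore(tape, "db"))  # (['d1', 'd2', 'd3'], 6)
```

Restoring `db` returns its three blocks but forces the drive to read past six blocks that belong to the other two streams.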

Tape-to-tape copying takes too much time. For reasons beyond the scope of this article, copying from tape to tape isn't easy - it's difficult to keep two tape drives streaming simultaneously when one is the source and the other is the destination of the copy. As a result, many people don't make off-site copies.


Sampling of disk backup products
A number of companies manufacture large, multiterabyte ATA/IDE-based arrays that are addressable via SCSI or Fibre Channel, and at least one software company makes a product designed specifically to use these arrays for backups. The following is a sample of some of these products (in alphabetical order).

ALACRITUS SECURITUS I. This is a tape library virtualization product. That is, it takes one or more disk arrays and presents them to your backup server as one or more virtual tape libraries. This product is the first independent software product of its type.

BAKBONE NETVAULT. The makers of NetVault also have a tape library virtualization product built into their software. Similar to the Alacritus product, it allows you to treat disk arrays as virtual tape libraries.

LEGATO NETWORKER'S disk staging feature has been around for some time, but has not been widely used - perhaps due to the high cost of disk. However, with the advent of these new ATA-based disk arrays, this "old" feature is now in vogue.

MAXTOR MAXATTACH SVS. The MaxAttach product is called a storage virtualization solution because it lets you build a larger virtual filer out of smaller, inexpensive arrays. Maxtor has recently announced that the product is certified to work with BakBone's software.

NETWORK APPLIANCE NEARSTORE. Network Appliance recently released the R100 (12TB raw, 9TB usable), the first appliance in its NearStore product line. According to the company, it offers all of the functionality of a traditional Network Appliance filer at a fraction of the cost. This is the first Network Appliance product to use ATA disk drives.

QUANTUM/ATL DX30. Imagine over 4TB of ATA disk drives that fit into 2U (3 1/2 inches) and appear to your backup server as a tape library. That's the DX30. An interesting twist is that Quantum uses RAID3 to stripe the drives together. RAID3 performs extremely well with large sequential I/O operations such as backups.

In addition to the above products, many disk vendors now produce ATA-based disk arrays that are addressable via Fibre Channel or SCSI. They are often available as JBOD, RAID or NAS. These vendors include, but are not limited to, 3ware, Atto, ExaDrive, LSI Logic, Nexsan, Raidzone, and Zzyzx.
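The DX30 entry mentions RAID3. In textbook RAID3, data is striped byte by byte across the data drives and an XOR parity byte for each stripe is kept on one dedicated parity drive, so every large transfer engages all the data drives at once. The Python sketch below models only that textbook scheme, not Quantum's implementation, and its function names are hypothetical. It also shows how a failed drive is rebuilt from the survivors.

```python
from functools import reduce
from operator import xor


def raid3_stripe(data: bytes, n_data_disks: int) -> list[bytearray]:
    """Stripe data byte by byte across n_data_disks, then add an XOR parity disk.

    The data is zero-padded so every disk holds the same number of bytes.
    The last element of the returned list is the parity disk.
    """
    padded = data + bytes(-len(data) % n_data_disks)
    disks = [bytearray(padded[i::n_data_disks]) for i in range(n_data_disks)]
    parity = bytearray(reduce(xor, column) for column in zip(*disks))
    return disks + [parity]


def rebuild(disks: list[bytearray], failed: int) -> bytearray:
    """Recreate one failed disk (data or parity) by XOR-ing all the survivors."""
    survivors = [disk for i, disk in enumerate(disks) if i != failed]
    return bytearray(reduce(xor, column) for column in zip(*survivors))


disks = raid3_stripe(b"backup data!", 4)
print(disks[0])                       # bytearray(b'bua')
print(rebuild(disks, 2) == disks[2])  # True
```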

Most people know that the number of tapes required for a restore grows with the length of time between full backups. The longer you wait to perform a full backup, the more tapes you'll need for a complete restore. And the more tapes you need, the greater the chance that one of them will fail and ruin the entire restore. This is why many people perform weekly full backups, even though the media costs are considerably higher.
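The relationship is easy to model for the common case of a full backup followed by daily incrementals. The sketch below assumes, for simplicity, that each night's backup occupies exactly one tape; in practice one tape may hold several nights, or one backup may span several tapes. The function names are hypothetical.

```python
def tapes_for_restore(day: int, full_interval: int) -> int:
    """Tapes needed to restore everything as of `day` (days counted from 0).

    Assumes a full backup on day 0 and every `full_interval` days after that,
    an incremental on every other day, and one tape per night's backup.
    A complete restore needs the last full plus every incremental since it.
    """
    if full_interval < 1:
        raise ValueError("full_interval must be at least 1")
    days_since_full = day % full_interval
    return 1 + days_since_full


def worst_case_tapes(full_interval: int) -> int:
    """The day just before the next full backup needs the most tapes."""
    return tapes_for_restore(full_interval - 1, full_interval)


print(worst_case_tapes(7))   # 7: weekly fulls
print(worst_case_tapes(30))  # 30: monthly fulls
```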

Another common problem is the way incremental backups work with many backup software packages. An incremental backup of a large file system may run for over an hour yet supply only a few hundred megabytes of data, which makes it impossible to stream the tape drive. And although dynamic drive sharing software, such as Legato's DDS or Veritas' SSO, allows multiple servers to share a tape drive, these programs don't permit multiple servers to write to the same tape drive simultaneously.
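A quick calculation shows why. The sketch below computes the average rate an incremental delivers and compares it with a tape drive's minimum streaming rate. The 5 MB/s threshold is a hypothetical placeholder, not the specification of any particular drive; substitute your drive's figure.

```python
def average_rate_mb_s(megabytes: float, seconds: float) -> float:
    """Average throughput a backup job delivers, in MB per second."""
    return megabytes / seconds


def can_stream(megabytes: float, seconds: float, min_streaming_mb_s: float) -> bool:
    """True if the job's average rate meets the drive's minimum streaming rate."""
    return average_rate_mb_s(megabytes, seconds) >= min_streaming_mb_s


MIN_STREAMING_MB_S = 5.0  # hypothetical drive threshold, not a real spec

# A 300MB incremental that runs for an hour averages well under 0.1 MB/s.
print(round(average_rate_mb_s(300, 3600), 3))     # 0.083
print(can_stream(300, 3600, MIN_STREAMING_MB_S))  # False
```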

As mentioned previously, a single bad tape can cause a large restore to fail, and the more tapes your backup resides on, the greater the chance that one of them is bad. Worse, you never know that a tape has failed until you need it - probably one of the biggest disadvantages of tape compared with disk.
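If each tape is assumed to fail independently with the same probability p, the chance that a restore from n tapes hits at least one bad tape is 1 - (1 - p)^n. The sketch below evaluates that formula. The 1% per-tape failure rate in the example is purely hypothetical, and real tape failures are not necessarily independent.

```python
def restore_failure_probability(n_tapes: int, p_bad: float) -> float:
    """Probability that at least one of n_tapes is unreadable.

    Assumes every tape fails independently with probability p_bad.
    """
    if not 0.0 <= p_bad <= 1.0:
        raise ValueError("p_bad must be between 0 and 1")
    return 1 - (1 - p_bad) ** n_tapes


for n in (1, 7, 30):
    print(n, round(restore_failure_probability(n, 0.01), 3))
# 1 0.01
# 7 0.068
# 30 0.26
```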

ATA/IDE disk arrays
Disks can solve the limitations of tape mentioned above, but server-class SCSI disk drives cost too much to use as a backup device in most environments.

However, someone realized that SCSI disk drives aren't the only game in town, and ATA/IDE disk arrays were born. Ranging from $8,000 to $10,000 per terabyte, ATA/IDE arrays (see "Sampling of disk backup products") are inexpensive when compared to other disk arrays - approximately one-third to one-fourth of their cost. When comparing these arrays to tape libraries, you must include the price of the tape library and its accompanying media. The robot is often the most expensive part of a tape library, and the more slots the robot can manage, the less it costs per terabyte. Consequently, you will find prices ranging from $10,000 per terabyte for smaller libraries, all the way down to $3,000 per terabyte for larger libraries.

On the software side, almost any backup package is capable of backing up to tape; however, some offer better overall solutions if you are going to use disk. For example, Legato NetWorker's disk staging feature is nice, and BakBone NetVault's tape virtualization feature, which uses disk as if it were tape, works well, too.

What to do?
OK, here's the punch line: Suppose you need to store about 3TB in your on-site tape library, and you need room for about 1TB of off-site copies. Instead of buying a 4TB tape library, purchase a 3TB disk solution and a 1TB tape library. Make all your on-site backups to disk and leave them there. Just as you would make on-site backups to tape and allow the tapes to expire and be overwritten, you will do the same with the virtual tape library on disk. All on-site recoveries then come straight from disk. There are no tapes to swap, and no robots to be repaired - just a fast virtual tape library.
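The two purchases can be bracketed with simple arithmetic using the per-terabyte price ranges quoted earlier ($8,000 to $10,000 for ATA arrays and $3,000 to $10,000 for tape libraries). The sketch below only multiplies those 2002 ranges. It does not try to place a given library within the range, although smaller libraries cost more per terabyte, and it leaves out every other cost. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRange:
    """A low/high price in dollars per terabyte."""
    low: int
    high: int

    def cost(self, terabytes: int) -> tuple[int, int]:
        return (self.low * terabytes, self.high * terabytes)


# Per-terabyte ranges quoted in the article (2002 prices).
ATA_DISK = PriceRange(8_000, 10_000)
TAPE_LIBRARY = PriceRange(3_000, 10_000)  # larger libraries cost less per TB


def hybrid_cost(disk_tb: int, tape_tb: int) -> tuple[int, int]:
    """Low/high total for an ATA disk array plus a smaller tape library."""
    disk_low, disk_high = ATA_DISK.cost(disk_tb)
    tape_low, tape_high = TAPE_LIBRARY.cost(tape_tb)
    return (disk_low + tape_low, disk_high + tape_high)


print(hybrid_cost(3, 1))     # (27000, 40000): 3TB disk + 1TB tape library
print(TAPE_LIBRARY.cost(4))  # (12000, 40000): 4TB tape library
```

The ranges overlap, so which setup costs less depends on where your actual quotes fall within them.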

For off-site and archival purposes, copy each night's backups from the virtual tape library to the real tape library. Those tapes are then ejected and stored off-site for archival restores, and for use if a disaster destroys your virtual tape library. This virtual tape library system has a number of advantages over a traditional library. It doesn't require a constant data stream: disk drives can go as fast or as slow as you need them to. The virtual drives don't need to be multiplexed - since you don't need to stream them, you don't need to multiplex them.
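The cycle described above (write on-site backups to virtual cartridges, let them expire and be overwritten, and copy each night's cartridges to real tape) can be sketched as a toy model. Everything here is hypothetical: the class names, the one-job-per-cartridge rule and the fixed retention period are simplifications, not the behavior of Alacritus, BakBone or any other product.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class VirtualCartridge:
    label: str
    written: Optional[date] = None  # None means the cartridge has never been used
    contents: list[str] = field(default_factory=list)


@dataclass
class VirtualTapeLibrary:
    cartridges: list[VirtualCartridge]
    retention: timedelta

    def write(self, job: str, today: date) -> VirtualCartridge:
        """Put a backup job on the first blank or expired cartridge, overwriting it."""
        for cart in self.cartridges:
            if cart.written is None or today - cart.written >= self.retention:
                cart.written = today
                cart.contents = [job]
                return cart
        raise RuntimeError("no blank or expired virtual cartridges")

    def written_on(self, day: date) -> list[VirtualCartridge]:
        return [cart for cart in self.cartridges if cart.written == day]


def nightly_offsite_copy(vtl: VirtualTapeLibrary, day: date,
                         physical_tapes: list[list[str]]) -> None:
    """Copy each cartridge written on `day` to a new physical tape for off-site storage."""
    for cart in vtl.written_on(day):
        physical_tapes.append(list(cart.contents))


vtl = VirtualTapeLibrary([VirtualCartridge(f"V{i:03d}") for i in range(3)],
                         retention=timedelta(days=2))
night1 = date(2002, 6, 1)
vtl.write("mail-full", night1)
vtl.write("db-full", night1)
offsite: list[list[str]] = []
nightly_offsite_copy(vtl, night1, offsite)
print(offsite)  # [['mail-full'], ['db-full']]
print(vtl.write("db-incr", night1 + timedelta(days=2)).label)  # V000 (expired, reused)
```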

The virtual drives can be shared among servers. Some of these arrays let you create as many virtual tape drives as you could possibly need, so each backup client can be given its own virtual tape drive. Disk drives are also quicker to copy from. Unlike tape-to-tape copies, disk-to-tape copies let the tape drive easily stream at its maximum throughput, since the data is a local copy coming from a random access device.

There seems to be some controversy about whether restoring from disk is really as fast as people think. During a restore, disk will be faster than tape simply because a disk doesn't have to be loaded, fast-forwarded and skipped through. Loading and fast-forwarding account for 30 to 250 seconds per tape in a restore, depending on the drive type. The skipping has to do with multiplexing. Since people tend to send several backups to one tape drive simultaneously, restoring a single backup from that tape means reading data, skipping data, reading data, skipping data and so on. First, you probably won't need to multiplex to disk, so there will be no need to read and skip. Second, even if you did, the read/skip cycle would be lightning fast because it's disk.
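The per-tape load and fast-forward time adds up quickly when a restore spans several tapes. The sketch below just multiplies the 30-to-250-second range above by the number of tapes. It ignores the read/skip time caused by multiplexing and assumes tapes are handled one after another rather than in parallel drives; the function name is hypothetical.

```python
def positioning_overhead_s(n_tapes: int, low_s: int = 30,
                           high_s: int = 250) -> tuple[int, int]:
    """Low/high seconds spent loading and fast-forwarding tapes during a restore.

    Defaults are the per-tape range quoted in the article; tapes are assumed
    to be handled one at a time.
    """
    return (n_tapes * low_s, n_tapes * high_s)


low, high = positioning_overhead_s(7)
print(low, high)            # 210 1750
print(round(high / 60, 1))  # 29.2 (minutes, worst case)
```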

Disks don't require media management. There are no tapes to load. And, last but not least, you don't have to perform full backups as frequently. A virtual tape library might actually give you greater capacity than a similarly sized tape library. That's because, on disk, performing full backups less often doesn't increase restore time or decrease the integrity of your backup, and fewer full backups consume less space.
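The capacity argument can be checked with a rough model. The sketch below assumes one backup per day over a fixed retention window that starts on a full backup, with a full at the start of each cycle and an equally sized incremental on every other day. The 1,000GB full and 50GB incremental sizes are hypothetical.

```python
def retained_gb(retention_days: int, full_interval: int,
                full_gb: int, incremental_gb: int) -> int:
    """Disk space needed to keep every backup taken within the retention window.

    Assumes one backup per day: a full every `full_interval` days (starting on
    the first day of the window) and an incremental on each of the other days.
    """
    fulls = -(-retention_days // full_interval)  # ceiling division
    incrementals = retention_days - fulls
    return fulls * full_gb + incrementals * incremental_gb


print(retained_gb(28, 7, 1_000, 50))   # 5200: weekly fulls
print(retained_gb(28, 28, 1_000, 50))  # 2350: one full per 28 days
```

Under these assumptions, moving from weekly fulls to one full per 28 days cuts the retained data by more than half.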

The Gartner Group says that backups are still the most expensive storage application. Maybe that's about to change. Imagine the time and money you could save if you only had to swap tapes for off-site backups. The possibilities are limitless.

This was first published in June 2002
