Pros and cons of VTLs

For disk-to-disk backup, virtual tape libraries (VTLs) treat disk as tape and offer many advantages compared to disk-as-disk backup targets. But VTLs aren't perfect, and there are some caveats about the technology that you need to know before implementing a VTL.

Virtual tape libraries (VTLs), which treat disk as tape, offer two main advantages over disk-as-disk backup targets: ease of management and better performance. As described in Part 1 of this three-part series on the various ways to use disk to protect data (see D2D backup: Disk's dual role), a disk-as-disk target requires all of the usual provisioning steps of standard shared storage arrays. In contrast, if you tell a VTL how many virtual tape drives and virtual cartridges it should emulate, the VTL software automatically handles all of the provisioning and allocates the appropriate amount of disk to each virtual cartridge.

If the VTL needs to be expanded (not all VTLs are expandable), you simply connect the additional storage, tell the VTL it's there and the VTL will automatically begin using the new storage. There's no volume manager to run and no RAID groups to administer.

Another important management advantage of VTLs is how easy it is to share VTLs among multiple servers and applications (see Should a virtual tape library be shared). To share a VTL among multiple backup servers running the same software, use the built-in library sharing capability that most commercial backup products have. To share a VTL among multiple servers running different apps, partition the VTL into multiple smaller VTLs, assign a certain number of virtual cartridges to each VTL and associate each VTL with a different backup server. Both of these scenarios are much easier than what's required to share a disk-as-disk target among multiple backup servers.

Better performance

To understand the performance advantages of VTLs, think of how backup apps write data to tape. A backup application typically continues writing to a tape until it hits the physical end of tape (PEOT). It will append to a tape, even if some of the previously written data has expired. Once the backup app hits PEOT, the tape is considered full. Most backup applications leave everything on the tape until all of the backups on that tape have expired; then they expire the whole tape and write to it from the beginning. Other backup applications wait until a certain percentage of the backups on a tape have expired before "reclaiming" that tape by migrating the non-expired backups to a second tape. The first tape is then expired and ready to be overwritten. The bottom line is that portions of a tape can't be overwritten.
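The append-only behavior described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not the logic of any particular backup product: the `Tape` class, the `reclaim` function, the gigabyte units and the 50% reclamation threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tape:
    """Toy model of an append-only tape (hypothetical, for illustration)."""
    capacity_gb: int
    backups: list = field(default_factory=list)  # [size_gb, expired] pairs

    def used_gb(self):
        # Expired backups still occupy tape; space is never reused in place.
        return sum(size for size, _ in self.backups)

    def append(self, size_gb):
        """Append a backup; return False if it would run past end of tape."""
        if self.used_gb() + size_gb > self.capacity_gb:
            return False  # the tape is considered full
        self.backups.append([size_gb, False])
        return True

    def expire(self, index):
        self.backups[index][1] = True

    def expired_fraction(self):
        used = self.used_gb()
        expired = sum(size for size, is_expired in self.backups if is_expired)
        return expired / used if used else 0.0


def reclaim(source, target, threshold=0.5):
    """Migrate non-expired backups to a second tape, then expire the first.

    The threshold is an assumed setting; real products differ.
    """
    if source.expired_fraction() < threshold:
        return False
    live = [size for size, is_expired in source.backups if not is_expired]
    if sum(live) > target.capacity_gb - target.used_gb():
        return False  # not enough room on the second tape
    for size in live:
        target.append(size)
    source.backups.clear()  # whole tape expired; rewritten from the beginning
    return True
```

In this model, expiring a backup frees no space on the tape: a later `append` still fails until the whole tape has been expired or reclaimed, which is the "portions of a tape can't be overwritten" rule in code form.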

This differs from how backup applications write to a file system. The application tells the operating system that it wants to write to a certain file name and then begins writing data to that file. Each backup gets its own file and when that file expires, it's deleted. The backup application has no knowledge of how this data is actually written to disk. Underneath the covers, the bytes of any given file are fragmented all over the disk, which results in performance degradation of the backup.

Because a VTL treats disk like tape, it eliminates fragmentation by writing backups to contiguous sections of disk. The blocks allocated to a tape stay allocated to that tape until the backup app starts overwriting that tape, at which point the VTL can once again write to contiguous sections of disk--just like data is written to tape. Because VTL vendors control the RAID volumes, they ensure that a given RAID group is only written to by a single virtual tape. A disk can perform much better if it's only writing/reading for a single application using contiguous sections of disk. This key difference explains why the fastest file systems write in hundreds of megabytes per second, while the fastest VTLs write in thousands of megabytes per second.

VTLs offer other advantages, as well. With one exception (see the next section), VTLs work with all existing backup software, processes and procedures (see NetBackup's inline tape copy, Do IBM Tivoli Storage Manager users need a VTL? and EMC/Legato's NetWorker understands disk, too). In other words, everything works exactly as it would with a physical tape library (PTL). That isn't the case with disk-as-disk targets, where backup software can behave quite differently.

VTL disadvantages

The disadvantage of VTLs most cited by storage admins is cost. They believe that if a disk array costs x, a disk array made to look like a VTL will cost x + y. But the y factor can vary from one VTL vendor to another. Most VTLs use capacity-based pricing, which means the cost is $x/GB. At least one VTL vendor uses throughput-based pricing, so the price is determined by the number of Fibre Channel (FC) connections. The actual price of VTLs with disk included ranges from less than $4/GB to a little more than $12/GB. Disk-as-disk units fall into roughly the same price range, so it's basically a misconception that a VTL will always cost more than a disk-as-disk device.
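As a rough illustration of the two pricing models, the arithmetic can be sketched in Python. The $4/GB and $12/GB bounds come from the range quoted above; the function names, the 10,000GB capacity and the per-connection price are hypothetical values chosen only for the example.

```python
def capacity_price_range(usable_gb, low_per_gb=4.0, high_per_gb=12.0):
    """Return the (low, high) price for capacity-based pricing."""
    return usable_gb * low_per_gb, usable_gb * high_per_gb


def throughput_price(fc_connections, price_per_connection):
    """Price when the vendor charges per Fibre Channel connection.

    price_per_connection is a made-up input, not a quoted vendor figure.
    """
    return fc_connections * price_per_connection


low, high = capacity_price_range(10_000)
print(f"10,000GB VTL, capacity-based: ${low:,.0f} to ${high:,.0f}")
print(f"4 FC connections, throughput-based: ${throughput_price(4, 15_000):,.0f}")
```

Running the same calculation against quotes for disk-as-disk arrays is a quick way to check whether the "x + y" assumption actually holds for your shortlist.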

Another issue is the price of backup software licensing. If a VTL sits next to an existing tape library, it will most likely require an additional tape library license for a library that's actually not there. This adds to the price of the VTL. How much you pay is based on how the VTL is configured and how your backup software charges for libraries. Some backup software products have a single license for all tape libraries, while others charge for the number of slots or drives. When deciding how to configure your VTL, you should consider how your backup software charges for libraries. When comparing VTLs to disk-as-disk targets, you also need to remember that backup software products are beginning to charge to back up to disk-as-disk targets.

However, these licensing challenges will probably go away as backup software vendors move toward capacity-based pricing in an effort to appear more VTL friendly.

Should a virtual tape library be shared?

Partitioning makes it possible to share a virtual tape library (VTL) among backup servers running the same application; however, this can increase costs if your backup software charges by the drive. For example, assume you have seven servers, each of which needs 10 tape drives once a week for its full backup. You could create 10 virtual tape drives and share them, or you could create 70 virtual drives and give each server the 10 tape drives it needs. Unlike a physical tape library, a VTL can do this with no problem. But if your backup software charges by the drive, you'll pay for seven times as many tape drives (a 600% increase in drive licensing costs).
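The drive-count arithmetic in this example can be checked with a short Python sketch. The function name is hypothetical, and the model assumes the simplest per-drive licensing scheme, where cost scales linearly with the number of virtual drives.

```python
def drive_counts(servers, drives_per_server):
    """Compare one shared pool of drives with dedicated drives per server."""
    shared = drives_per_server               # one pool, used by one server at a time
    dedicated = servers * drives_per_server  # every server gets its own drives
    return shared, dedicated, dedicated / shared


shared, dedicated, multiple = drive_counts(servers=7, drives_per_server=10)
print(f"shared: {shared} drives, dedicated: {dedicated} drives "
      f"({multiple:.0f}x the licensed drives)")
```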

NetBackup's inline tape copy

Veritas NetBackup supports a feature called inline tape copy, which allows sending a backup to two tape drives simultaneously--creating an original and copy in one step. An alternative is to use a standalone VTL, and to send one copy to physical tape and one to the virtual tape library. The shortcoming with this approach is that it causes the VTL to run at the speed of the tape drive--defeating the purpose of going to disk backup in the first place. A more interesting approach would be to use an integrated VTL, send both backups to virtual tape, and then use the integrated VTL to copy one to physical tape.

Do IBM Tivoli Storage Manager users need a VTL?

While IBM's Tivoli Storage Manager (TSM) backs up directly to disk quite well, TSM administrators will experience provisioning and fragmentation issues if they begin storing all onsite backups on disk. (Most TSM disk storage pools aren't fragmented because they're immediately migrated to tape every night.) So, the advantages of virtual tape libraries (VTLs) apply to TSM as much as they apply to other backup products. In addition, a VTL would let TSM users create thousands of small virtual tapes, allowing them to turn on collocation for all clients without the usual penalty of hundreds of partially used tapes. It would also allow users to have dozens of virtual tape drives to perform reclamation at any time without causing contention for tape resources.

EMC/Legato's NetWorker understands disk, too

In Version 7, EMC/Legato's NetWorker introduced support for simultaneous reads and writes to a file type device. EMC/Legato realized that a disk can obviously read and write at the same time, so they simply needed to allow the application to do that. This allows for some interesting activities, like initiating cloning before a backup is complete.

VTLs offering compression use in-band software compression that saves space, but results in a significant performance hit--as much as 50%. If your backup speed is throttled by the speed of your clients and/or network, you may not see this performance hit. But in local or LAN-free backups, speed tends to be most affected by the backup device. Some vendors perform their compression after the fact, attempting to give you the benefits of compression without the performance loss. As of this writing, only Quantum Corp. supports hardware compression that doesn't impact performance. It accomplishes this by using the same chip used in front of its tape drives.
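One way to reason about when the compression penalty matters is to treat a backup as a pipeline that runs at the speed of its slowest stage. The Python sketch below is a simplified model under that assumption; the function name and all of the MB/sec figures are hypothetical, and only the 50% penalty is taken from the text above.

```python
def effective_backup_rate(client_mb_s, network_mb_s, vtl_mb_s,
                          compression_penalty=0.0):
    """Return the backup rate in MB/sec, limited by the slowest stage.

    compression_penalty is the fraction of VTL throughput lost to
    in-band software compression (0.5 means a 50% hit).
    """
    vtl_rate = vtl_mb_s * (1.0 - compression_penalty)
    return min(client_mb_s, network_mb_s, vtl_rate)


# LAN backup: the client is the bottleneck, so the penalty is invisible.
print(effective_backup_rate(60, 100, 400))
print(effective_backup_rate(60, 100, 400, compression_penalty=0.5))

# LAN-free backup: the VTL is the bottleneck, so the penalty is felt in full.
print(effective_backup_rate(500, 1000, 400))
print(effective_backup_rate(500, 1000, 400, compression_penalty=0.5))
```

Plugging in your own measured client, network and device rates gives a first estimate of whether in-band compression will slow your backups; it doesn't replace testing.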

Ejecting virtual tapes

How you eject virtual tapes will determine whether you require a standalone (see Standalone virtual tape library) or integrated (see Integrated virtual tape library) VTL. As discussed previously, a major advantage of VTLs is that they don't require any changes to your existing backup process or configuration. The one exception is if you don't copy your backup tapes and send the copies offsite. Although it isn't a best practice to do so, many environments eject their original tapes and send them offsite. This works fine with a PTL but, as of this writing, only one VTL (Spectra Logic) supports the ejection of virtual tapes. Therefore, companies that eject their original tapes and wish to use a VTL must usually do one of two things: learn how to copy tape or use an integrated VTL. The approach that's best for your environment will be based on individual preference.

Some observers believe the tape-to-tape copy method with standalone VTLs is the only proper way to create physical tapes from virtual tapes. (Standalone VTLs include those from Diligent Technologies Corp., Quantum and Sepaton Inc.) The tape-to-tape copy method allows the backup software to control the copy process, therefore integrating the copy process into normal reporting procedures. However, there are two challenges. The first is the difficulty related to automating this process. Some backup products require the purchase of an additional license, and some need a custom script for this process.

The second challenge is that many environments don't have enough time and resources to copy their backup tapes quickly enough. For many companies, it's all they can do to get their backups done in time to be picked up by Iron Mountain. If you know how to copy your backup tapes, and have sufficient resources to do so, this won't be an issue.

If the challenge of copying virtual tapes to physical tapes is a concern, you should consider an integrated VTL, such as those offered by Advanced Digital Information Corp. (ADIC), Alacritus Software, EMC Corp., FalconStor Software, Maxxan Systems Inc., Neartek Inc. and Spectra Logic.

An integrated VTL sits between your backup server and PTL. It inventories the PTL and represents its contents as virtual tapes in the VTL. For example, if you have physical tape X01007 in your PTL, virtual tape X01007 will appear in your VTL. Your backup software will then back up to virtual tape X01007. At some user-configurable point, virtual tape X01007 is copied to physical tape X01007. When the backup software tells the VTL to eject virtual tape X01007, physical tape X01007 appears in the PTL's mail slot. An important point is that physical tape X01007 looks just like it would if the backup software had backed up to it directly. The backup software thinks it backed up to and ejected physical tape X01007 and, in the end, that's what it did.

Bar-code matching maintains the consistency between the backup software's media manager and the physical tapes. But you need to remember that this method doesn't result in two copies of the tape. The virtual copy of the tape is deleted when the physical copy is successfully created.
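The bar-code matching flow can be sketched as a toy Python model. This isn't any vendor's implementation: the `IntegratedVTL` class and its method names are hypothetical, tape contents are represented as simple lists, and the choice to trigger the copy at eject time (if it hasn't already happened) is an assumption standing in for the "user-configurable point."

```python
class IntegratedVTL:
    """Toy model of an integrated VTL sitting in front of a PTL."""

    def __init__(self, physical_barcodes):
        # Inventory the PTL and present each physical tape as a virtual tape
        # with the same bar code.
        self.physical = {code: None for code in physical_barcodes}
        self.virtual = {code: [] for code in physical_barcodes}
        self.mail_slot = []

    def backup(self, barcode, data):
        # The backup software writes to the virtual tape only.
        self.virtual[barcode].append(data)

    def copy_to_physical(self, barcode):
        self.physical[barcode] = list(self.virtual[barcode])
        # The virtual copy is deleted once the physical copy exists,
        # so this method never leaves two copies of the tape.
        del self.virtual[barcode]

    def eject(self, barcode):
        if barcode in self.virtual:  # assumed: copy now if not yet copied
            self.copy_to_physical(barcode)
        self.mail_slot.append(barcode)  # physical tape lands in the mail slot


vtl = IntegratedVTL(["X01007"])
vtl.backup("X01007", "full-backup-1")
vtl.eject("X01007")
print(vtl.mail_slot, vtl.physical["X01007"])
```

From the backup software's point of view, only the bar code X01007 ever exists, which is why its media manager stays consistent with the physical tapes.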

There are, however, some issues with this method. For example, what happens when the copy from the virtual tape to the physical tape fails? If the copy failed because the actual tape is bad, you'll need to remove the tape, swap its bar code to a new tape, put the new tape in the PTL and tell the VTL to try the copy again. (This will only work if your bar codes are removable.) If this happens occasionally, it's not a major disadvantage. But if it happens every day, it becomes disruptive. You also need to realize that this process is happening without the knowledge of the backup software, so if something happens with a tape copy, the VTL will need to notify you of the problem. This results in another reporting interface, which might be considered a disadvantage. Another potential problem arises if the VTL puts more data on the virtual tape than can fit on the physical tape, preventing creation of a physical copy of the tape. Integrated VTL vendors ensure that this doesn't happen by stopping before the normal PEOT. However, standalone vendors might say this practice increases the number of tapes to purchase and handle, and adds to your costs.

Important VTL features

There are a number of differences among the major VTLs. Some (Alacritus, Diligent, FalconStor) are software only, so you can buy the software and run it on a regular disk array. Other VTL vendors (Maxxan, Neartek) sell a VTL head, which is analogous to a filer head. You use their software and head, but supply your own disk. Finally, some VTL vendors (ADIC, EMC, Quantum, Sepaton and Spectra Logic) offer an entire solution: software, head and disk. Software-only and filer head vendors allow you to redeploy an existing array, reducing your cost. Turnkey products cost more, but have the fewest integration issues.

Most VTLs offer replication or cascading, which replicates one VTL's backups to another VTL. But the tapes in the second VTL won't be considered duplicates by your backup software because they'll have the same bar codes as the original tapes. Also, remember that you'll probably be replicating the entire backup, and most backups aren't block level. Even incremental backups take up roughly 1% to 5% of the amount of data being backed up. This means you'll need to replicate 1% to 5% of your data center every night--a significant undertaking for many environments. Therefore, it may only be possible to use this feature within a campus, as opposed to including data from remote sites.
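To get a feel for what replicating 1% to 5% of a data center every night means for the link between sites, the required bandwidth can be estimated with a short Python sketch. The function name, the 50TB data center size and the eight-hour replication window are assumptions chosen for illustration; protocol overhead and compression are ignored.

```python
def required_mbit_per_s(protected_tb, change_rate, window_hours):
    """Estimate the sustained link speed needed for nightly replication.

    protected_tb  -- total data being backed up, in decimal terabytes
    change_rate   -- fraction of that data in each incremental (0.01 to 0.05)
    window_hours  -- time available for replication each night
    """
    nightly_bytes = protected_tb * 1e12 * change_rate
    seconds = window_hours * 3600
    return nightly_bytes * 8 / seconds / 1e6


for rate in (0.01, 0.05):
    mbit = required_mbit_per_s(protected_tb=50, change_rate=rate, window_hours=8)
    print(f"{rate:.0%} nightly change: about {mbit:,.0f} Mbit/sec sustained")
```

Under these assumptions the low end of the range already calls for well over 100 Mbit/sec of sustained throughput, which helps explain why replication of full backup streams is often limited to a campus.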

Some VTL vendors are beginning to offer a feature where their VTLs will examine the incremental backup, identify the changed blocks within that backup and replicate only the changed blocks. When that functionality becomes more widely available, replication between data centers will be much easier to accomplish. Today, this block-level incremental replication is offered by Alacritus- and FalconStor-based VTLs.

If you have a heterogeneous environment with mainframe, AS/400 and open systems, you might consider a VTL that supports all three environments. Only Neartek currently offers this functionality.

A few integrated VTLs (FalconStor and Neartek) offer a feature called stacking. Stacking copies multiple virtual tapes onto one physical tape, a feature borrowed from mainframe virtual tape systems (VTS). Stacking was important to mainframes because applications were unable to append to a tape. The VTS would present hundreds of small virtual tapes to the app and then stack those virtual tapes onto one physical tape, significantly cutting media costs.

However, the value of stacking in most open-systems environments is questionable because any decent backup product can append to a tape until it's full. But you should be aware that the use of stacking breaks the relationship between the backup software's media manager and the physical tape. And products that support stacking must read the entire stacked tape to read just one of the virtual tapes included on that tape. This feature is only useful if you gain a benefit akin to that achieved in the mainframe environment.

You also need to think about which type of notification the VTL supports, especially if you're considering an integrated VTL. Some support SNMP traps, a few support e-mail notification, while others require you to log into a Web page to be notified of any issues.

If high-end performance is important, you should look for a VTL with a multiple data-mover architecture. Most VTLs run all software on one VTL head. Some vendors use the VTL head as a control mechanism, while passing the movement of the data on to one or more data movers. Need more performance? Simply purchase more data movers. This allows scaling to a much higher level without having to add and administer another VTL (Alacritus, Neartek and Sepaton use this approach).

Finally, remember that VTLs don't perform at the same level, so it's important to conduct performance testing in your environment.
