Virtual tape libraries in depth

Virtual tape libraries (VTLs) have been a relatively easy way to replace traditional tape libraries, but as other disk backup targets emerged, many thought VTLs would disappear. Now, with added features such as dedupe, they can be an attractive alternative to other disk target systems.

Virtual tape libraries (VTLs) are dead, right? Weren't they supposed to be temporary solutions that would be long forgotten once everyone started backing up to "real" disk? While that might be what the VTL naysayers had in mind, we're more than a few years into the VTL "fad" and many of the products are doing just fine.

What happened was that an industry segment morphed to encompass both VTLs and intelligent disk targets (IDTs), a segment that was ultimately validated when EMC Corp. acquired Data Domain for $2.4 billion. We'll review some of the factors that led to the development of VTLs, the current state of VTL technologies and products (including the newer features they now offer), and then we'll end with a look into the future of VTLs and IDTs.

Why VTLs came about

The VTL/IDT market has become so overshadowed by the data deduplication craze that some people may have forgotten why the industry developed VTLs in the first place.

Tape was (and is) too fast. The core problem vendors were trying to solve with virtual tape libraries is the mismatch between the speed of tape and the speed of the disk drives, file systems and databases they're backing up. In approximately 15 years, the sustained throughput of open system disk drives has gone from approximately 4 MBps to 70 MBps -- an increase of 1,700%. In roughly the same amount of time, the sustained throughput of open system tape drives has grown from 256 KBps (Exabyte EXB-8200 drive) to 180 MBps (LTO-4) -- an increase of 70,000%.
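The growth percentages above can be checked with a few lines of arithmetic. This is a minimal sketch: the helper name `percent_increase` is made up for illustration, and it assumes 256 KBps is treated as 0.256 MBps (decimal units).

```python
def percent_increase(old: float, new: float) -> float:
    """Return the growth from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Throughput figures quoted in the paragraph above, in MBps.
# Assumption: 256 KBps is converted as 0.256 MBps (decimal units).
disk_growth = percent_increase(4.0, 70.0)
tape_growth = percent_increase(0.256, 180.0)

print(f"disk: {disk_growth:,.0f}%")
print(f"tape: {tape_growth:,.0f}%")
```

Under those assumptions the disk figure comes out at 1,650% (rounded to 1,700% in the text) and the tape figure at a little over 70,000%, roughly 40 times the relative growth of disk.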

VTLs made the unfamiliar familiar. For many backup administrators and their backup software, backing up to disk was a foreign concept. Knowing that progress in backup systems is an incremental process, VTL vendors felt they could take the unfamiliar (disk) and make it seem like an old friend (tape).

Scalable. Demand for VTLs has been driven by the needs of large enterprise customers. With a tape library, they had hundreds or thousands of tapes and dozens or hundreds of tape drives, and they could just throw all their backups at this big tape library and it would sort it out. To use disk, however, they would need to manage and load balance their backups across dozens to hundreds of discrete disk systems.

The VTL solved this problem by presenting disk as large tape libraries, something they were already familiar with. In various ways, VTL vendors made dozens of individual disk arrays look like one or more tape libraries that could scale to almost limitless levels.

Shareable. Because backup software already knew how to share tape libraries, it could easily share VTLs. Instead of using extra-cost sharing software (such as Symantec Corp.'s Veritas NetBackup Shared Storage Option or EMC Corp.'s NetWorker Dynamic Drive Sharing Option), you could create as many "tape drives" as you needed to give each backup server its own tape drives, while dynamically sharing the VTL. And if you have multiple backup applications that refuse to share, a VTL can be carved into separate virtual libraries.

Fragmentation issues with file system devices. VTLs also avoid the fragmentation issues associated with backing up to file systems. They solved this problem using proprietary file systems that wrote data contiguously.

State of the VTL industry

VTLs came about to address specific backup issues. Let's look at how they've progressed in those areas they were supposed to fix.

Scalability. Scalability isn't just an issue for big enterprises; it's also necessary to meet the needs of small- and medium-sized businesses (SMBs). When the VTL market was in its early days, there were very few products that could scale well for either of these segments. But times have changed, and there are now several products that scale both up and down. With some notable exceptions -- Copan Systems Inc., IBM Corp., NEC Corp. of America and Sepaton Inc. -- all VTL/IDT vendors offer products for SMBs. Companies with less than 20 TB of data to back up each night can choose from a number of products -- some less than $5,000 -- that offer a lot of the same functionality available in high-end products. Offering products to the SMB market before they're deemed bulletproof typically spells failure, so the arrival of these SMB virtual tape libraries and intelligent disk targets is a sign that vendors have done a good job of working out any kinks in their products.

Midsized enterprises with 20 TB to 40 TB to back up each night can choose from almost every vendor. To back up that kind of data you need a system capable of handling 500 MBps to 1,000 MBps. Almost every vendor listed in the "Product sampler: VTLs and IDTs" (below) sidebar has a product with that capability.

Click here to view a PDF of "Product sampler: VTLs and IDTs."

The high end of the enterprise (companies with 40 TB or more to back up every night) has fewer products to choose from. Users with that much data to back up connect large servers to a Fibre Channel storage-area network (FC SAN) and back them up using local-area network (LAN)-free backups. The last thing those users want to do is send those backups over IP; therefore, a product targeting this market segment must have FC as a transport.

Another reason why there are only a few products appropriate for this market is the lack of global data deduplication in some products. A user with 100 TB to back up each night needs 2,300 MBps throughput. They won't want to (nor should they have to) create and maintain three separate 33 TB backup collections that'll back up to three devices that can only handle 40 TB per night each. They need a single system that can handle this load over FC without splitting it into multiple backup collections. There are only a few companies with products capable of doing that: FalconStor Software Inc. and Sepaton (and their respective OEM partners Copan Systems and Sun Microsystems Inc., and Hewlett-Packard [HP] Co.). The aggregate throughput of NEC's Hydrastor is actually much higher than 2,300 MBps, but it doesn't yet offer Fibre Channel as a transport. If you need this kind of throughput over FC, but don't need deduplication, EMC, Fujitsu and Gresham Storage Solutions Inc. have products that can help.
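The throughput figures quoted for each market tier follow from simple arithmetic. The sketch below assumes a 12-hour nightly backup window and decimal units (1 TB = 1,000,000 MB). The article does not state the window; 12 hours is inferred because it reproduces the quoted numbers. The function names are illustrative only.

```python
import math


def required_throughput_mbps(nightly_tb: float, window_hours: float = 12.0) -> float:
    """Sustained MBps needed to move nightly_tb terabytes within the window.

    Uses decimal units (1 TB = 1,000,000 MB). The 12-hour default is an
    assumption inferred from the article's figures, not a stated value.
    """
    total_mb = nightly_tb * 1_000_000
    return total_mb / (window_hours * 3600)


def appliances_needed(nightly_tb: float, per_appliance_tb: float) -> int:
    """Number of independent appliances needed when no single one can take the load."""
    return math.ceil(nightly_tb / per_appliance_tb)


for tb in (20, 40, 100):
    print(f"{tb} TB/night -> {required_throughput_mbps(tb):,.0f} MBps")

print("appliances for 100 TB at 40 TB each:", appliances_needed(100, 40))
```

With that window, 20 TB to 40 TB works out to roughly 460 MBps to 930 MBps, and 100 TB to about 2,300 MBps. Without global deduplication, the 100 TB load would have to be split across three 40 TB-per-night appliances, each with its own backup collection.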

Noticeably absent from the list is EMC/Data Domain. Their fastest FC-based VTL runs at 900 MBps. Data Domain's DDX "array" boasts a number much higher than that, but it's actually 16 separate DDR units in the same rack that aren't integrated as far as deduplication goes. Data Domain doesn't support global deduplication, although the company has said it's on its roadmap. However, there's been no indication as to when this feature may become available.

Ease of use. VTL and IDT products range from "ridiculously easy to use" to "so hard you can't believe it passed any kind of functionality testing," but most are relatively easy to use. Because ease of use varies considerably, you should definitely test any product you're considering.

Integration with backup applications. All VTLs and IDTs can be backup targets for just about any backup software product on the planet, and most can also replicate their data to another VTL/IDT. But few products today integrate with the backup software so that it knows about replicated copies and can use them for restores and copies to tape.

Symantec's NetBackup OpenStorage (OST) API offers one solution to this problem. With this API, the disk target isn't addressed as a virtual tape or a file system; the backup job is named and passed to the target, and the target stores it however it wants to. Once the backup is stored on the target, NetBackup can tell the IDT to replicate the data; when the replication is done, the IDT tells NetBackup. So, NetBackup is aware of the replicated data and the replication process, and can use it to create a tape copy. The process yields an onsite copy, an offsite disk copy and an offsite tape copy without anyone ever touching a tape. Today, only Data Domain, FalconStor and Quantum Corp. support this API -- and only FalconStor supports it via Fibre Channel; Data Domain and Quantum use IP as their transport.

CommVault Systems Inc. has a similar feature that works with network-attached storage (NAS)-based IDTs (but not VTLs). A media agent watches a directory that you're replicating to and looks for changes. It communicates with the CommServe (the main backup server) and tells it about the other copy, resulting in both copies being available for restores. If this other media agent were located offsite, you could then use that replicated copy to create an offsite tape copy of your replicated backup.

HP also offers this capability for its Data Protector software and the HP Virtual Library System (VLS). The product is similar to CommVault's, except it uses a completely separate Data Protector backup server (with its own catalog) to watch for newly replicated virtual tapes. Once those tapes are detected, it asks the other Data Protector server for its catalog information. Both servers can then use those virtual tapes, which would allow creation of a tape copy of the replicated backup.

Software-based VTLs vs. VTL appliances. Because all appliances are just servers running software, the difference between a software VTL and an appliance is more a matter of packaging than a technical issue. It comes down to preferences: prepackaged or build your own. Most VTLs and IDTs are prepackaged, but there are some exceptions, such as the software-only versions of FalconStor's and Gresham's products.

You may also opt to buy a virtual tape library/intelligent disk target with its disk already attached or choose to add your own. In the latter case, options include software-only products or gateway products such as those offered by Data Domain and IBM.

Interoperability with tape libraries. A VTL may provide a direct connection to and integration with a physical tape library. The appeal of this feature has diminished with the increased interest in data deduplication. VTL-tape library integration made it easier to stage data from disk to tape to save space on expensive disk. But with deduplication, there's less need to do this. Products that integrate with physical tape are available from FalconStor, Fujitsu, Gresham, HP and Quantum.

IDTs vs. VTLs. Whether you should back up to a file system device or a virtual tape library truly boils down to personal preference. If you want FC as a transport along with deduplication, your choice is easy: only VTLs offer that combination today.

File system-based devices have two advantages over virtual tape libraries: space reclamation when your backup software expires a backup, and simultaneous read and write support.

When backup software deletes an expired backup file from an IDT, the IDT automatically reclaims the space. But a VTL has no idea that the tape it's holding has expired. A workaround is to manually re-label tapes when they expire. When the VTL sees a new label being written to the tape, it knows it can throw away the rest of the data on that tape.
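The re-label workaround can be pictured with a toy model. This is a sketch of the idea only, not any vendor's implementation; the `VirtualTape` class, its methods and the bar codes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualTape:
    """Toy virtual tape: a label followed by appended backup blocks."""

    barcode: str
    label: str = ""
    blocks: list = field(default_factory=list)

    def append(self, data: bytes) -> None:
        """Append backup data after whatever is already on the tape."""
        self.blocks.append(data)

    def write_label(self, label: str) -> int:
        """Write a new label at the start of the tape.

        A fresh label implies everything after it is obsolete, so this toy
        VTL discards the old blocks and returns the number of bytes freed.
        """
        freed = sum(len(block) for block in self.blocks)
        self.blocks.clear()
        self.label = label
        return freed


tape = VirtualTape("A00001")
tape.write_label("A00001-gen1")
tape.append(b"x" * 1024)
tape.append(b"y" * 2048)

# The backup software has expired this tape; re-labeling tells the VTL so.
freed = tape.write_label("A00001-gen2")
print(f"freed {freed} bytes")
```

Until the label is rewritten, the toy VTL keeps all 3,072 bytes even though the backup software considers them expired, which is the gap the manual re-label step closes.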

File system devices support simultaneous read and write, but VTLs don't. If a backup is writing to one virtual tape, another process can't read that tape to do a restore or copy. But this only happens if you're backing up and restoring/copying at the same time -- probably a rare occurrence that can be made even less likely by using smaller virtual tapes.

Why dedupe and FC disk don't mix

Fibre Channel (FC) is essential to the enterprise and data deduplication is important as well. But the only way to get both in a backup appliance is to buy a virtual tape library (VTL). Why don't they just make a deduplicated logical unit number (LUN) that's accessible via FC? The short answer is that it's a lot harder than it sounds. Giving you a LUN allows you to pick your own file system, which the appliance would then need to support. Windows, Linux, Solaris, HP-UX, AIX, MacOS, etc., all have their own completely incompatible file systems. The IDT vendor would have to test deciphering all of the various backup formats on all of the various file systems as well. Think of that test matrix.

But GreenBytes Inc. has gotten close. It's about to offer an iSCSI deduplicated LUN with its GB-X Series of storage appliances.

New features of VTLs and IDTs

Virtual tape libraries and intelligent disk targets continue to evolve; here are some of the areas where these products are developing.

Data deduplication. The biggest game-changing feature has been deduplication. It changes a VTL from a disk staging device with only a few days of backups (due to the cost of disk) to a device that can affordably hold all onsite backups. And dedupe built the IDT market; without dedupe, an intelligent disk target is truly just

Deduplication can reduce backup size by 10:1 or 20:1 without significantly affecting the performance of restores and copies from disk to tape. But not all data dedupes well. Applications such as imaging, audio, video or seismic processing systems generate new data every time they run, so there's little detectable duplication. Dedupe systems also use compression, but not all data compresses well either.
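To illustrate why repeated full backups dedupe well while constantly changing data does not, here is a minimal sketch of hash-based chunk deduplication. It assumes fixed 4 KB chunks and SHA-256 fingerprints, chosen only for illustration; the products discussed here use their own, often more sophisticated, methods, and compression is not modeled. `ChunkStore` is a hypothetical name.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


class ChunkStore:
    """Toy dedupe target that stores each unique chunk exactly once."""

    def __init__(self) -> None:
        self.chunks = {}        # SHA-256 hex digest -> chunk bytes
        self.logical_bytes = 0  # bytes the backup software believes it wrote

    def ingest(self, data: bytes) -> list:
        """Store a backup stream; return the digest list that rebuilds it."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Reassemble a backup from its list of chunk digests."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def ratio(self) -> float:
        """Logical bytes written divided by bytes physically stored."""
        stored = sum(len(chunk) for chunk in self.chunks.values())
        return self.logical_bytes / stored


# Ten identical "full backups" of the same data.
nightly_full = os.urandom(CHUNK_SIZE * 100)
repeated = ChunkStore()
for _ in range(10):
    recipe = repeated.ingest(nightly_full)
rebuilt = repeated.restore(recipe)

# Ten backups of data that is entirely new every night (e.g., video).
unique = ChunkStore()
for _ in range(10):
    unique.ingest(os.urandom(CHUNK_SIZE * 100))

print(f"repeated fulls: {repeated.ratio():.0f}:1")
print(f"all-new data:   {unique.ratio():.0f}:1")
```

In this toy case the identical fulls reduce 10:1 while the all-new data stays at 1:1, which mirrors the point about imaging, audio, video and seismic workloads.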

There are other significant differences among target dedupe systems (VTLs/IDTs). The IBM ProtecTIER product, for example, has a single-stream restore speed limitation of approximately 90 MBps. Although Quantum has made significant progress with restore speed, the restore speeds from their "block pool" (i.e., deduped data) are still nowhere near those possible when restoring from the last few backups stored in native format. Sepaton's dedupe system is backup product-specific, and the firm has yet to release support for CA ARCserve Backup, CommVault Simpana, EMC NetWorker and Symantec Backup Exec, among others. And the lack of global deduplication from some of the major vendors (e.g., Data Domain, NetApp and Quantum) means that users must continue to slice their backups into chunks that are manageable by a single appliance.

Deduplicated replication. Deduplication also makes replication much more affordable and feasible. Without dedupe, you might need 10 times to 100 times more bandwidth to replicate a full backup. With dedupe, a typical full backup only stores and replicates 1% to 10% of its native size.
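The bandwidth effect can be estimated with a short calculation. The sketch below uses a hypothetical 10 TB full backup and a hypothetical 1,000 Mbit/s link; only the 1% to 10% replicated fraction comes from the text above. It ignores protocol overhead and assumes the link is fully available.

```python
def replication_hours(full_backup_tb: float, link_mbit_per_s: float,
                      replicated_fraction: float = 1.0) -> float:
    """Hours needed to replicate one full backup over a WAN link.

    replicated_fraction is the share of the native size actually sent:
    1.0 without dedupe, 0.01 to 0.10 with the dedupe range cited above.
    Decimal units are used (1 TB = 1e12 bytes, 1 Mbit = 1e6 bits).
    """
    bytes_to_send = full_backup_tb * 1e12 * replicated_fraction
    bytes_per_second = link_mbit_per_s * 1e6 / 8
    return bytes_to_send / bytes_per_second / 3600


# Hypothetical example: 10 TB full backup over a 1,000 Mbit/s link.
without_dedupe = replication_hours(10, 1000)
with_dedupe_10pct = replication_hours(10, 1000, 0.10)
with_dedupe_1pct = replication_hours(10, 1000, 0.01)

print(f"no dedupe:      {without_dedupe:.1f} h")
print(f"10% replicated: {with_dedupe_10pct:.1f} h")
print(f"1% replicated:  {with_dedupe_1pct:.2f} h")
```

Read the other way around, holding the replication window fixed means the undeduplicated copy needs 10 to 100 times the link speed.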

Tape consolidation and virtualization. Some vendors, notably Fujitsu and Gresham, tend to use the term tape virtualization rather than VTL. They see tape virtualization as a way to enhance your continued use of tape while removing many of tape's limitations, especially if you want to use tape as a long-term storage device. If you store data on tape for multiple years, you're supposed to occasionally "retension" your media and move backups around to keep all the bits fresh. Updating your tape technology is another issue: What do you do with the old tapes and drives?

A tape virtualization system solves these issues by employing what's often referred to as a hierarchical storage management (HSM) system for tape. Newer backups are stored on disk; older backups are stored on tape. When you buy new tape drives and bigger tapes, you simply tell the tape virtualization system that you want to retire the older tapes. It migrates them by stacking the smaller tapes onto the larger ones and keeping track of which "tapes" are stored on which physical tapes. If the backup application requests a bar code that's been stacked onto a bigger tape, the system loads the appropriate tape and positions to the point on the physical tape where the requested "tape" resides; the application doesn't know the difference.
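The bookkeeping behind tape stacking amounts to a catalog that maps each retired bar code to a position on a larger physical tape. The following is a rough sketch of that idea with hypothetical class names, bar codes and capacities; it is not how any particular product implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    physical_tape: str  # bar code of the larger physical tape
    offset_gb: int      # where the stacked "tape" begins on it
    length_gb: int      # size of the stacked "tape"


class StackingCatalog:
    """Toy catalog tracking which old "tapes" live on which new tapes."""

    def __init__(self, physical_capacity_gb: int) -> None:
        self.capacity = physical_capacity_gb
        self.locations = {}  # old bar code -> Location
        self.fill = {}       # physical bar code -> GB used so far

    def stack(self, old_barcode: str, size_gb: int, physical_tape: str) -> Location:
        """Append an old tape's contents to a larger physical tape."""
        used = self.fill.get(physical_tape, 0)
        if used + size_gb > self.capacity:
            raise ValueError(f"{physical_tape} has no room for {old_barcode}")
        location = Location(physical_tape, used, size_gb)
        self.locations[old_barcode] = location
        self.fill[physical_tape] = used + size_gb
        return location

    def locate(self, old_barcode: str) -> Location:
        """Answer a request for an old bar code with its new physical position."""
        return self.locations[old_barcode]


# Hypothetical migration: three 200 GB tapes stacked onto one 800 GB tape.
catalog = StackingCatalog(physical_capacity_gb=800)
for barcode in ("OLD001", "OLD002", "OLD003"):
    catalog.stack(barcode, 200, "NEW001")

print(catalog.locate("OLD003"))
```

When the backup application asks for `OLD003`, the catalog tells the system to load `NEW001` and position 400 GB in, so the application still sees the bar code it expects.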

The future of VTL technology

Virtual tape library technology continues to develop and expand, but just being a VTL may not be enough anymore. With so many users replicating backups offsite, the industry must find a solution to the challenges posed by using replicated backups. Unfortunately, in the near term we're likely to see more product-specific approaches such as Symantec's NetBackup OpenStorage and HP's Data Protector/Virtual Library System.

There have also been predictions that as data deduplication becomes more pervasive in backup software, the need for intelligent disk targets will be reduced. But that's only likely to happen if source deduplication software products can address their restore speed limitations. These products were designed to back up remote sites and, as such, their restore speeds are slow (10 MBps to 20 MBps). Unless that changes, there will continue to be a market for high-speed disk targets.

BIO: W. Curtis Preston is the executive editor for and an independent backup expert.
