VTL data management issues

As disk libraries become the primary backup target for near-term data recoveries, storage managers are exploring new ways to exploit tape's high capacity, low cost and mobility. Disk is the best medium for fast backups and recoveries, and many companies have turned to virtual tape libraries as a way to put disk in their backup process. On the surface, it may seem easy to implement a VTL, but there are many subtle operational issues that must be dealt with to ensure that your data can be recovered quickly when needed.

This article can also be found in the Premium Editorial Download: Storage magazine: Hot storage technology for 2008.

With the proliferation of virtual tape libraries in storage environments, storage managers face more decisions about managing and cataloging the transfer of data to tape.


As disk libraries become the primary backup target for near-term data recoveries, storage managers are exploring new ways to exploit tape's strengths: high capacity, low cost and mobility. Disk is the best medium for fast backups and recoveries, but tape is often the best option for long-term data storage and retention. Companies continue to use disk and tape in their backup process, but managing both media creates issues such as:

  • Compression algorithm differences in disk and tape libraries
  • Encrypting and decrypting backed up data
  • Keeping data on specific media to meet application recovery point objectives and recovery time objectives
  • Managing size differences between virtual and physical tape cartridges
  • Optimizing data placement based on power savings or energy costs
  • Reconstructing deduplicated data before moving it to tape
  • Scheduling data movement so it doesn't impact backup windows
  • Updating and protecting the backup software catalog
Although some virtual tape libraries (VTLs) include an option to manage data movement from disk to tape, most VTLs leave it to the backup software to initiate data movement and track where data is stored. VTLs' roots for managing disk and tape are found in systems originally designed for mainframe backup such as Sun Microsystems Inc.'s StorageTek Virtual Storage Manager and IBM Corp.'s Virtualization Engine TS7700. These systems were among the first to offer a disk cache to store backed up data. But these systems work only with mainframe OSes, support only FICON connections and often have prices starting in the hundreds of thousands of dollars.

Still, the same principles they introduced to manage disk and tape in mainframe environments are applicable for open-systems VTLs without mainframe costs and restrictions. Some open-systems VTLs include data management software, which manages the copying of data from disk to tape and back again. Two types of software are available to handle this: proprietary management software and third-party backup software.




Moving from virtual to physical tape
A few products, including IBM's Virtualization Engine TS7520 and Quantum Corp.'s Pathlight VX 650 and DXi7500, use proprietary management software, although they manage tape differently. For instance, Quantum's Pathlight VX 650 functions strictly as a VTL and creates virtual tape cartridges within it. Data backed up to Pathlight VX 650's virtual cartridges is stored in native tape format and each virtual cartridge is assigned its own virtual bar code. By storing data this way, the Pathlight VX 650 can directly export virtual tape cartridges to physical tape cartridges in their native tape format, which lets backup software recover data directly from tape without any dependencies on the VX 650.

Quantum's new DXi7500 uses different management software and is configurable as a VTL, a NAS backup target or both. When used as a NAS backup target, the DXi7500 appears as a disk pool to the backup application; when it copies data from its disk pool to physical tape, it optimizes the amount of data stored on each tape by filling it. This alleviates a current problem when copying data from a virtual to a physical cartridge: If a virtual tape cartridge isn't completely filled, the corresponding physical tape cartridge won't be either.
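
The difference between the two copy behaviors can be sketched in a few lines of Python. This is only an illustration, not Quantum's software: the function names, the backup image sizes and the 800 GB cartridge capacity are all assumptions chosen for the example, and the sketch assumes no single image is larger than a cartridge.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalTape:
    capacity_gb: float
    used_gb: float = 0.0
    contents: list = field(default_factory=list)


def copy_one_to_one(virtual_cartridges, tape_capacity_gb):
    """Each virtual cartridge gets its own physical tape, however full it is."""
    tapes = []
    for name, size_gb in virtual_cartridges:
        tape = PhysicalTape(tape_capacity_gb)
        tape.used_gb = size_gb
        tape.contents.append(name)
        tapes.append(tape)
    return tapes


def copy_from_disk_pool(images, tape_capacity_gb):
    """Write images back to back; start a new tape only when the next image won't fit."""
    tapes = []
    current = None
    for name, size_gb in images:
        if size_gb > tape_capacity_gb:
            raise ValueError(f"{name} is larger than one cartridge")
        if current is None or current.used_gb + size_gb > tape_capacity_gb:
            current = PhysicalTape(tape_capacity_gb)
            tapes.append(current)
        current.used_gb += size_gb
        current.contents.append(name)
    return tapes


# Hypothetical backup images (name, size in GB) and cartridge capacity.
images = [("mon", 300), ("tue", 250), ("wed", 200), ("thu", 400), ("fri", 350)]

print(len(copy_one_to_one(images, 800)))      # tapes used, one per virtual cartridge
print(len(copy_from_disk_pool(images, 800)))  # tapes used when filling from a pool
```

With these made-up sizes, the one-to-one copy uses five partly filled cartridges, while the pooled copy packs the same data onto two.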

Using a VTL to manage data copying from disk to tape creates other possible problems (see "Six questions to ask before buying a VTL"). When the VTL copies data from disk to tape, the data written to the physical tape may be in the same format as that stored on disk. In this format, it lacks a tape header and other information the backup software needs to read the physical tape. This means you have to restore data from the physical tape to the VTL before the backup software can recover it. A mechanism is also required to copy the VTL's catalog from the production site to a VTL at the disaster recovery site so that site can recover the data.

The greater concern with permitting the VTL to manage the data copy from disk to tape is that the VTL needs to connect to the backup software to update the backup software's catalog with the information about the new physical tape. Usually, admins manually update the backup software catalog with the information each time the VTL creates a physical tape copy, although some VTLs can handle the chore; for example, Quantum's DXi7500 has an interface to Symantec Corp.'s Veritas NetBackup 6.5 for updates. If the catalog isn't updated, you can only recover data from physical tape by first using the VTL to read the tape or forcing the software to read each tape and then catalog the data on it.

Six questions to ask before buying a VTL
Before buying a virtual tape library (VTL), you should consider the following.
  1. Is backup software supported within the VTL? Installing backup software on the VTL lets it function as a media server recognizable by the backup software. The backup app can then automatically record the creation of physical tape copies in its catalog. The downside is that finger-pointing between vendors can result if support issues arise and backup software upgrades become dependent on what operating system the VTL supports.


  2. How is the backup software catalog updated? Most VTLs don't internally support backup software, so the backup software must be made aware of tape-copy processes so that it knows a tape copy exists and where it's located. Although some VTLs manage the disk-to-tape copy themselves, they lack any mechanism to update most backup software catalogs. Admins must then either update the catalog manually or have the backup software read each tape in the tape library to determine what's on it.


  3. When do compression and deduplication occur? Compression and deduplication are becoming must-haves when implementing VTLs, but they're problematic when copying data from disk to tape. Data must be decompressed, reconstituted or both when moved from disk to tape, introducing performance overhead and lengthier tape copy windows. The best way to avoid this is to use a VTL that can first back up data in an uncompressed, native format for easy movement to tape.


  4. In what format is the data stored on tape? Data that's copied to tape must be in a format recognized by the backup software. In situations where the VTL controls copying data from disk to tape, the backup software may not recognize the format of the data on the tape, rendering it unreadable or requiring the VTL that created the tape to first recover the data. This creates a dependency on the VTL in recovery scenarios.


  5. How does the VTL device manage existing physical tape libraries? VTLs may treat physical tape libraries in one of three ways: ignore them entirely, treat them as a backup target or virtualize them. While one way isn't necessarily better than another, knowing how or if a VTL manages tape libraries can help you optimize your tape library in conjunction with the VTL.


  6. Does the VTL present itself as a NAS target, a VTL or both? NAS targets appear as one large disk pool to the backup software, while VTLs present virtual tape targets and manage disk as virtual tape cartridges. Using a large disk pool eliminates some of the inefficiencies associated with virtual tape cartridges (unused space on virtual cartridges) and allows admins to entirely fill physical tape cartridges when data is moved to tape. However, VTLs associate a bar code with each virtual tape cartridge; with a NAS target, the bar code is created only when the data is moved to a physical tape.

Backup software
The need to keep the backup software and VTL catalogs in sync for recovery from tape has led many VTL vendors to leave this responsibility to the backup software. Sepaton Inc. considered incorporating a tape management feature in its VTLs, but has abandoned that for now. "We found that most customers want the backup software to manage the creation and copy of physical tape copies in order to maintain backup catalog consistency," says Jay Livens, Sepaton's director of marketing.

Most VTLs appear only as a disk pool or a tape library to backup software, although some, like EMC Corp.'s and Spectra Logic Corp.'s, support backup software within their VTLs. With the EMC DL6000 Series VTLs, admins may configure multiple nodes to host EMC's NetWorker or Symantec's Veritas NetBackup (but not both), which operate as media managers; admins can install backup software directly onto Spectra Logic's nTier.

Jay Krone, EMC's director of Clariion platform marketing, finds that by placing the backup software inside the disk library the software always knows what's going on with the disk and physical tape. EMC took the additional step of more tightly integrating its EMC DL6000 management software with NetWorker and Veritas NetBackup so admins can manage physical tape creation and movement through the backup software or the DL6000's native VTL management interface. "In this configuration, no matter which way physical tapes are cloned and ejected, the backup software catalog always knows where they are," says Krone.

Spectra Logic took a different approach to support backup software on its nTier family. Because nTier runs on Windows Storage Server 2003, admins may install any backup software that runs on the Windows OS on an nTier Series disk library. Admins may configure the backup software to back up data directly to the disk cache on the nTier and then copy or move the data to any disk or tape library that's external to the nTier using the installed software.

Here again problems can surface. Because the backup software server must handle data movement from the VTL to the tape library and back again, the backup software needs to insert itself into the data path. Server performance issues can emerge as the amount of backed up data increases or as data is moved from disk to physical tape, which can impact backup and restore windows. Although backup software media servers external to the VTLs can be upgraded to meet those requirements, this task becomes complicated when upgrading backup software media servers that reside within VTLs.

Leaving the management of backed up data on disk entirely up to backup software doesn't completely alleviate other problems that may arise after using disk over time. Dave Kenyon, Sun's VP of storage marketing, finds that backup software does a "lousy job" of managing disk in VTLs because it provides no method to defragment disks or control access to data on a VTL. But he recognizes that using disk as a primary means for recovery with backup software is becoming a prerequisite for firms. "Companies are really screwing themselves if they use tape as their primary means of recovery," he says.

Virtual tape director
Fujitsu Siemens Computers' CentricStor and Gresham Enterprise Storage Solutions' Clareti VTL are virtual tape directors that are a subset of the broader class of VTLs. Like other VTLs, they virtualize and present disk as virtual tape cartridges, but they also virtualize external physical tape libraries and even other VTLs. Residing in the backup data path, virtual tape directors aggregate a firm's physical and virtual tape resources to present a single backup target or mount point to the backup software (see "Consolidating VTLs," below).

Consolidating VTLs
Owners of virtual tape libraries (VTLs) are experiencing some of the same frustrations that early adopters of NAS filers experienced, with "Loved the first, hated the third" a common sentiment. As companies add to the number of VTLs they deploy and start to consolidate them, VTLs that can manage disk and tape offer a couple of new ways to consolidate.

Treat existing VTLs as backup targets. VTLs that host backup software can treat external VTLs as another backup target just like a physical tape library. Although the storage on new and old VTLs can't be aggregated, this capacity isn't wasted.

Virtualize the VTL. VTLs that double as virtual tape directors, such as Fujitsu Siemens Computers' CentricStor and Gresham Enterprise Storage Solutions' Clareti VTL, can virtualize the interface on existing VTLs. Administrators only need to configure the backup software to interface with one VTL, the virtual tape director, which controls all of the other devices behind it.

During backups, a virtual tape director behaves like a VTL storing data to its local disk cache. Once cached, however, the data is copied to the appropriate VTL or tape library based on policies set in the backup software. Because the virtual tape director appears like the physical tape library to the software, the virtual tape director can respond to tape library commands issued by the backup software and copy data from disk to tape. This allows the backup software to offload the performance overhead associated with the data movement to the virtual tape director while keeping the backup software's catalog up to date with the creation of physical tape.

Because they virtualize physical tape libraries, virtual tape directors such as the Gresham Clareti VTL may also integrate with physical tape libraries and facilitate faster data recoveries. When the backup software requests data from the Clareti VTL, it will pull the data directly from the disk if it still resides on its disk cache. If the requested data is no longer on disk, the Clareti VTL's integration with tape libraries allows it to recall data from tape faster than using backup software.

When backup software requests data directly from a physical tape library, the software sends the library the information about where the data is positioned on the tape. Because most backup software doesn't know how the tape media in the cartridge is physically positioned, the tape drive must rewind the tape to the beginning of the cartridge before it can start to look for the data. However, because the Clareti VTL maintains information about the tape's position in the tape cartridge in its own catalog, it can immediately go to the position on the tape where the data is located without rewinding it.
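
The restore decision described above can be sketched as follows. This is a hypothetical model, not Gresham's implementation: the class, its method names and the idea of recording a block offset per backup are assumptions made for illustration.

```python
class VirtualTapeDirector:
    """Toy model: serve restores from disk cache first, then from tape by known position."""

    def __init__(self):
        self.disk_cache = {}      # backup_id -> data still held on disk
        self.tape_positions = {}  # backup_id -> (tape bar code, block offset)

    def store(self, backup_id, data):
        # Backups land on the local disk cache first.
        self.disk_cache[backup_id] = data

    def migrate_to_tape(self, backup_id, barcode, block_offset):
        # Record where the copy sits on tape in the director's own catalog.
        self.tape_positions[backup_id] = (barcode, block_offset)

    def evict(self, backup_id):
        # Free disk cache space; the tape copy (if any) remains.
        self.disk_cache.pop(backup_id, None)

    def plan_restore(self, backup_id):
        """Return (source, bar code, block offset) for a restore request."""
        if backup_id in self.disk_cache:
            return ("disk", None, None)
        if backup_id in self.tape_positions:
            barcode, offset = self.tape_positions[backup_id]
            # The offset is already known, so the drive can seek to it directly.
            return ("tape", barcode, offset)
        raise KeyError(f"no copy of {backup_id} on disk or tape")


director = VirtualTapeDirector()
director.store("payroll-full", b"...")
director.migrate_to_tape("payroll-full", "A00042", 18_500)
print(director.plan_restore("payroll-full"))  # still cached, served from disk
director.evict("payroll-full")
print(director.plan_restore("payroll-full"))  # now served from tape at a known offset
```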

Dealing with deduplication
New VTL features dramatically increase the amount of data that can be stored on disk, but they add to the complexity of copying data from disk to tape. The compression algorithm in the VTL may not be the same as the one used by the target tape drive. This forces an admin to do one of three things when copying data to tape:
  1. Decompress the data on the VTL and then compress it again at the tape drive.
  2. Copy data directly from disk to tape with compression on the tape drive turned off.
  3. Turn off disk compression on the VTL.
None of these options is particularly desirable. The first adds overhead on both the VTL and the tape drive during the copy, although it may be the most acceptable option depending on the time admins have to copy data from disk to tape and the performance of the VTL. The second option eliminates performance overhead during the tape copy, but forces firms to first recover data from tape to the VTL and then from the VTL. The third option, turning off compression on the VTL, may double or triple the amount of capacity needed to store backed up data.
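
To make the first two options concrete, here is a minimal sketch that uses Python's zlib and lzma modules as stand-ins for the VTL's and the tape drive's compression. Real VTLs and tape drives use their own algorithms, often in hardware, so the choice of codecs and the helper name drive_can_read are purely assumptions for illustration.

```python
import lzma
import zlib

payload = b"nightly backup block " * 1000

# Data as the VTL holds it on disk, compressed with the VTL's algorithm (zlib here).
on_vtl_disk = zlib.compress(payload)


def drive_can_read(blob):
    """True if the 'tape drive' codec (xz here) can decode the blob by itself."""
    try:
        lzma.decompress(blob, format=lzma.FORMAT_XZ)
        return True
    except lzma.LZMAError:
        return False


# Option 1: decompress on the VTL, then recompress with the drive's algorithm.
tape_option_1 = lzma.compress(zlib.decompress(on_vtl_disk))
restored_1 = lzma.decompress(tape_option_1)

# Option 2: copy the VTL's compressed bytes to tape unchanged, drive compression off.
tape_option_2 = on_vtl_disk
restored_2 = zlib.decompress(tape_option_2)  # needs the VTL's codec to read

print(drive_can_read(tape_option_1), drive_can_read(tape_option_2))
```

In the second case the tape can only be read by something that understands the VTL's format, which is the recovery dependency on the VTL described above.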

Deduplication on VTLs creates similar issues. Because tape drives don't natively support deduplication, a VTL with deduplicated data must first reconstruct the data in its native format before sending it to tape. This requires reserving sufficient time and ensuring that the VTL's performance is sufficient to reconstruct the deduplicated data before copying it off to tape. Technically, the deduplicated data can be copied to tape, but that reintroduces the dependency on the VTL for recoveries.
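
The reconstruction step can be illustrated with a toy deduplicating store. This is a sketch under stated assumptions, not any vendor's design: fixed-size chunks, SHA-256 fingerprints and the DedupStore class are all choices made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for illustration only


class DedupStore:
    def __init__(self):
        self.chunks = {}   # fingerprint -> unique chunk bytes
        self.recipes = {}  # backup_id -> ordered list of fingerprints

    def ingest(self, backup_id, data):
        """Split data into chunks and keep only chunks not already stored."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.recipes[backup_id] = recipe

    def rehydrate(self, backup_id):
        """Rebuild the full backup stream, as needed before writing it to tape."""
        return b"".join(self.chunks[fp] for fp in self.recipes[backup_id])


monday = b"AAAABBBBAAAACCCC"
tuesday = b"AAAABBBBDDDDCCCC"

store = DedupStore()
store.ingest("mon", monday)
store.ingest("tue", tuesday)

stored_bytes = sum(len(c) for c in store.chunks.values())
print(stored_bytes, len(monday) + len(tuesday))  # bytes on disk vs. logical bytes
print(store.rehydrate("tue") == tuesday)
```

Every chunk of a backup has to be looked up and reassembled before the stream goes to tape, which is where the extra time and VTL performance are spent.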

Some vendors are providing workarounds to these problems. The simplest method might be found in Copan Systems Inc.'s Revolution 300T/TX, which stores the most recent backup in native backup format with no compression or deduplication. While the Revolution 300T/TX supports compression and deduplication, it performs these functions after the backup is complete or at a later time scheduled by the admin. This avoids the need to reconstruct the data in its native format when copying it to tape, although firms will need sufficient storage on the Revolution 300T/TX to keep an entire backup of all of their data in native format.

Most firms aren't encountering problems with encryption when copying data from disk to tape because encryption is primarily used just prior to moving data offsite. In that scenario, either the backup software or the tape drive encrypts the data just as it's stored to tape. While most VTLs offer encryption as an option, "A practical use case for encrypting data on a widespread basis in the VTL has not yet been made," says EMC's Krone.

As disk assumes a larger role in backup, tape remains a part of most data protection operations. While some VTL vendors have taken measures to incorporate tape management into their products, in the near term, give priority to products that integrate backup software with their VTLs, such as EMC's DL6000 for the large enterprise, and Quantum's DXi7500 and Spectra Logic's nTier for SMBs. But the recent emergence of virtual tape directors like Gresham Enterprise Storage Solutions' Clareti VTL and Fujitsu Siemens Computers' CentricStor offers a compelling alternative: virtual tape directors let firms introduce disk into their backup process, use their existing physical tape libraries and let their backup software manage it all.

This was first published in December 2007
