Virtual Tape Libraries (VTLs) are being deployed in ever-increasing numbers as the primary target for backups and the source for recoveries. But because VTLs are inserted before tape drives in the backup and recovery process, performance and management issues are cropping up that VTL vendors are just beginning to address.
Before jumping on the VTL bandwagon, you need to answer the following questions:
- How well does the VTL scale to meet higher connectivity or capacity needs?
- What management features does a particular VTL offer?
- How well does it integrate with backup software?
- What impact do features like compression, data deduplication and encryption have on VTL performance and capacity?
VTLs are built around four different architectures that, in various ways, address these questions.
- VTL storage appliance. The VTL is a single appliance that contains both the disk drives and the controllers that host the VTL software. Examples include units from Copan Systems Inc., EMC Corp., Quantum Corp. and Network Appliance (NetApp) Inc.
- VTL server appliance. A dedicated server is loaded with VTL software and connected to external disk arrays over Fibre Channel (FC). Diligent Technologies Corp., FalconStor Software Inc., Maxxan Systems Inc., Neartek Inc. and Sun Microsystems Inc. each provide this type of technology.
- Grid-based architecture. The S2100-ES2 from Sepaton Inc. lets users add server nodes to their VTLs and scale to meet demand for more performance or additional FC ports.
- Tape library based. ADIC's Pathlight VX series and Spectra Logic Corp.'s Spectra T950 allow users to store data to tape or disk media within their tape libraries.
Each architecture has its pros and cons. VTL storage appliances are easy to set up and deploy; but when performance or capacity limits are reached, users need to deploy more VTL storage appliances, thereby introducing management headaches. VTL server appliances may allow users to scale to larger capacity and performance thresholds and use different vendors' storage, but may become performance bottlenecks or introduce untested configurations. Sepaton's S2100-ES2 grid-based approach scales economically when capacity, connectivity or performance thresholds are reached, but is eventually subject to the same limitations as storage appliances. ADIC's Pathlight VX series and Spectra Logic's Spectra T950 manage disk and tape in a single frame, but create a dependency on the VTL to manage tapes created and exported from the VTL.
VTL storage appliances
Several vendors, including Copan Systems and EMC, build their VTL appliances on FalconStor's VirtualTape Library software. However, these two vendors add their own software features to their VTL products. For instance, EMC includes an Active Engine Failover feature in its Clariion Disk Library (DL) 700 line that allows a backup job to fail over from one controller to the other without restarting.
Copan's Revolution 220T and 220TX embed FalconStor's VTL software into their controllers and use it with Copan's Disk Aerobics software. Disk Aerobics manages Copan's massive array of idle disks (MAID) technology, which reduces power and cooling costs by powering down disk drives when they're not in use.
NetApp's NearStore VTL600 and VTL1200 VTL storage appliances integrate NetApp's VTL software (obtained with its acquisition of Alacritus Software in 2005) with its Data Ontap OS. One of the benefits of this integration is its "self-tuning" feature. This allows the VTL software, through its integration with the Data Ontap OS, to directly communicate with the back-end disk and see all available LUNs. It can then ascertain which LUN will provide the highest performance and assign the backup stream to that LUN. It then monitors the incoming backup stream and checks with the operating system after every 1GB of data to see if the backup stream should be redirected to another LUN that will provide better performance.
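The self-tuning behavior described above can be sketched roughly as follows. This is a minimal illustration and not NetApp's implementation: the function names, the idea of scoring LUNs by a single throughput estimate, and the sample readings are all assumptions. Only the 1GB re-check interval comes from the description above.

```python
GB = 1024 ** 3  # bytes; the 1GB re-check interval is taken from the article

def pick_fastest_lun(lun_throughput):
    """Return the name of the LUN with the highest estimated throughput (MB/sec)."""
    return max(lun_throughput, key=lun_throughput.get)

def route_backup_stream(total_bytes, probe, check_interval=GB):
    """Assign a backup stream to a LUN, re-evaluating after every check_interval bytes.

    probe is a hypothetical callable returning {lun_name: estimated MB/sec};
    a real appliance would obtain this from its operating system.
    Returns a list of (lun_name, bytes_written) segments in write order.
    """
    segments = []
    written = 0
    while written < total_bytes:
        lun = pick_fastest_lun(probe())
        chunk = min(check_interval, total_bytes - written)
        if segments and segments[-1][0] == lun:
            # The same LUN is still the best choice: extend the current segment.
            segments[-1] = (lun, segments[-1][1] + chunk)
        else:
            # A different LUN now looks faster: redirect the stream.
            segments.append((lun, chunk))
        written += chunk
    return segments

# Hypothetical readings: lun_a is fastest for two checks, then lun_b overtakes it.
readings = iter([
    {"lun_a": 120, "lun_b": 90},
    {"lun_a": 110, "lun_b": 95},
    {"lun_a": 60, "lun_b": 100},
])
plan = route_backup_stream(3 * GB, lambda: next(readings))
```

With these made-up readings, the first 2GB of the stream land on `lun_a` and the final 1GB is redirected to `lun_b`.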
Coping with capacity
All VTLs have limited amounts of capacity. They address this limitation through compression, data deduplication, scalable back-end disk capacity, and tape libraries that integrate disk and real tape.
Data is compressed using the backup software on the host or compression software on the VTL. Compression lets VTLs increase their effective storage capacity by ratios of 2:1 or more, but it introduces latency on the backup server or VTL during backups and recoveries that most users will find unacceptable.
Quantum's DX-Series of VTL storage appliances circumvents the performance problems that compression normally introduces. Using a combination of its software and hardware, Quantum embeds its Optyon compression technology (a $2,500 option) into its DX-Series controllers and then dedicates hardware adapter cards to compress incoming data and decompress outgoing data.
Another technology some vendors are starting to employ to reduce the amount of data stored is data deduplication, which stores similar blocks of data together and identical blocks only once. The technology uses metadata to track specific blocks of data and reconstruct the data in the appropriate order during recoveries. As with compression, the primary concern with this approach is the performance hit that comes with the deduplication process. Vendors address this issue in one of the following two ways.
Diligent Technologies' ProtecTier VT software is loaded onto a Red Hat Linux server that is configured as a VTL server appliance and processes data in real time. Its HyperFactor technology detects recurring data within sets of data and then creates a single-instance store. Unlike most hashing techniques, which introduce significant performance overhead when data is stored and retrieved, HyperFactor maintains an index in the server's RAM; because index lookups require no disk I/O, the ProtecTier appliance can support a high throughput rate.
HyperFactor also enables the mapping of up to 1 petabyte (PB) of physical storage using just 4GB of RAM on the VTL server appliance. The deduplication feature allows users to achieve compression ratios of 25:1 or greater over time because only new data is stored. But, as with all compression algorithms, Diligent Technologies can't entirely escape the performance hit its data deduplication introduces; the highest rated throughput for a ProtecTier cluster of four servers is 800MB/sec or approximately 2.9TB/hour.
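The figures quoted above can be checked with a little arithmetic. The sketch below assumes decimal units (1TB = 1,000,000MB, 1PB = 10^15 bytes, 1GB = 10^9 bytes) and assumes the cluster's throughput divides evenly across its four servers; neither assumption is stated by the vendor.

```python
MB_PER_TB = 1_000_000  # decimal units (an assumption)

def mb_per_sec_to_tb_per_hour(mb_per_sec):
    """Convert a sustained MB/sec rate into TB/hour."""
    return mb_per_sec * 3600 / MB_PER_TB

# Four-server ProtecTier cluster rated at 800MB/sec.
cluster_tb_per_hour = mb_per_sec_to_tb_per_hour(800)
per_server_mb_per_sec = 800 / 4  # assumes an even split across servers

# Index density: bytes of physical storage mapped per byte of RAM
# when 4GB of RAM maps 1PB of disk.
PB = 10 ** 15
GB = 10 ** 9
bytes_mapped_per_ram_byte = PB / (4 * GB)

# Logical (pre-deduplication) data that 1PB of disk would represent at 25:1.
logical_pb_at_25_to_1 = 1 * 25
```

The conversion gives 2.88TB/hour, which the article rounds to 2.9TB/hour, and each byte of RAM in the index would be mapping about 250,000 bytes of disk.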
An alternative way to deduplicate data is to execute the routine after the data is already backed up. Sepaton's S2100-ES2 deduplication doesn't impede the backup process, but rather stores data directly to disk in its native form. Its ContentAware DeltaStore software then identifies new, changed and unchanged data, and performs forward differencing, keeping the newer copy of the data and eliminating the old data. This approach speeds recoveries because data is stored intact, rather than fragmented as with backward differencing.
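One way to picture post-process forward differencing is the sketch below. It is an interpretation of the description above, not Sepaton's algorithm: the fixed 4-byte blocks, the function names and the ("ref"/"raw") encoding are all assumptions made for illustration. The newer backup stays intact on disk, and the older one is re-encoded as references into it plus whatever blocks no longer exist there.

```python
BLOCK = 4  # tiny block size, for illustration only

def blocks(data, size=BLOCK):
    """Split data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def forward_difference(old, new):
    """Re-encode an older backup against a newer one after both are on disk.

    The newer backup is left untouched. Each block of the older backup becomes
    ("ref", index) if an identical block exists in the newer backup, or
    ("raw", bytes) if the block is no longer present there.
    """
    newest = {}
    for idx, blk in enumerate(blocks(new)):
        newest.setdefault(blk, idx)
    return [("ref", newest[blk]) if blk in newest else ("raw", blk)
            for blk in blocks(old)]

def restore_old(encoded, new):
    """Rebuild the older backup from its encoding and the intact newer backup."""
    new_blocks = blocks(new)
    return b"".join(new_blocks[v] if kind == "ref" else v for kind, v in encoded)

monday = b"AAAABBBBCCCC"
tuesday = b"AAAAXXXXCCCCDDDD"
encoded_monday = forward_difference(monday, tuesday)
```

Restoring the most recent backup (`tuesday`) needs no reassembly at all, which is the recovery advantage the paragraph above attributes to this approach; only the older copy has to be rebuilt from references.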
VTL server appliances offer another option to increase capacity. Unlike VTL storage appliances that have a fixed upper amount of internal disk they can support, VTL server appliances generally allow users to discover, add and manage more storage through their FC interfaces; they're also available in a variety of configurations.
Neartek's Virtual Storage Engine (VSE) 3.0 runs on any Intel-based system that supports the Linux 2.6 kernel, and lets you cluster up to 32 servers in a single logical configuration. A 32-server configuration, when fully populated with FC host bus adapters, will also deliver the highest throughput of any VTL available, topping out at more than 11GB/sec.
Neartek's VSE is highly scalable and allows users to create a heterogeneous server, storage and software configuration; however, mixed configurations also introduce the possibility of incompatibilities among devices.
A better way to attain infinite capacity is to use a VTL tape library. Tape library-based architectures logically integrate disk and tape, though vendors implement this technology differently. ADIC's Pathlight VX 450 and VX 650 models integrate and manage existing tape library models using their VTL management software. Data is stored initially on the disk within the VTL; movement of the data between disk and real tape is then handled and managed by Pathlight VX. Only the VX 650 supports other vendors' tape library models, including Sun's StorageTek L180 and L700, and IBM Corp.'s 3584. The Pathlight VX 650 is sold as a self-contained appliance; pricing starts at $118,700 for a 3.8TB system with a single controller.
Spectra Logic's Spectra T950 fits more cleanly into the traditional tape library category but provides a disk media option, RAID eXchangeable TeraPack (RXT) SabreMedia, that, when used, effectively converts the T950 library into a VTL. The RXT SabreMedia, priced at $1,595 for 500GB in a RAID 0 configuration or $6,995 for 1.2TB in a RAID 5 configuration, comprises SATA disks housed in a container that can be inserted, ejected and moved just like tape media. However, this configuration requires special RXT drives, which sell for approximately $16,650, or about the same price as some LTO-3 drives.
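To compare the two RXT SabreMedia options, it helps to reduce the list prices above to a cost per gigabyte. The short calculation below uses only the prices and capacities quoted in the article and assumes 1TB = 1,000GB; it covers the media only and leaves out the RXT drives and the library itself.

```python
def price_per_gb(price_usd, capacity_gb):
    """Return the media cost in dollars per gigabyte."""
    return price_usd / capacity_gb

raid0_per_gb = price_per_gb(1595, 500)    # 500GB pack, RAID 0
raid5_per_gb = price_per_gb(6995, 1200)   # 1.2TB pack, RAID 5 (1TB taken as 1,000GB)
```

That works out to roughly $3.19 per GB for the RAID 0 pack and about $5.83 per GB for the RAID 5 pack.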
The different virtual tape library (VTL) architectures offer various ways to fine-tune performance by taking advantage of the unique characteristics disk has to offer. For instance, EMC Corp.'s Clariion Disk Library allows users to increase performance by using its write-cache consolidation feature, which consolidates blocks of data in backup streams into 1MB blocks and then writes the blocks directly to disk. This allows the data to be laid down sequentially rather than randomly; random writes degrade performance on SATA drives.
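The general idea behind write-cache consolidation can be sketched as a buffer that collects small incoming writes and only emits full 1MB blocks. This is a simplified illustration, not EMC's implementation: the class name, the 64KB incoming write size and the in-memory list standing in for the disk are assumptions.

```python
MB = 1024 * 1024

class WriteConsolidator:
    """Buffer small incoming writes and flush them as full 1MB blocks."""

    def __init__(self, block_size=MB):
        self.block_size = block_size
        self.buffer = bytearray()
        self.flushed = []  # stands in for sequential writes to disk

    def write(self, data):
        """Accept a write of any size; flush whenever a full block is available."""
        self.buffer.extend(data)
        while len(self.buffer) >= self.block_size:
            self.flushed.append(bytes(self.buffer[:self.block_size]))
            del self.buffer[:self.block_size]

    def close(self):
        """Flush any remaining partial block at the end of the stream."""
        if self.buffer:
            self.flushed.append(bytes(self.buffer))
            self.buffer.clear()

cache = WriteConsolidator()
for _ in range(40):
    cache.write(b"x" * (64 * 1024))  # forty 64KB writes, 2.5MB in total
cache.close()
```

Here forty small writes reach the disk as two full 1MB blocks plus one 512KB remainder, so the drive sees three large sequential writes instead of forty small ones.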
Sepaton Inc.'s S2100-ES2 employs two different technologies to optimize performance on its VTL. First, Sepaton allows users to group shelves of disk drives into pools that are written to and read by the scalable replication engines (SREs) that host its VTL software. Next, the SREs break up incoming backup jobs into 32MB chunks called extents and sequentially write one extent to each shelf in the pool.
This approach provides two performance benefits. First, distributing data across all disks in the pool using extents doesn't hurt sequential read and write performance. The 32MB size of the extents ensures that random reads or writes are spread across enough disk drives on different shelves that performance doesn't suffer. The other performance benefit shows up when the throughput limit of the existing SREs is reached. Because the S2100-ES2 supports up to nine SREs, another SRE can usually be added to the S2100-ES2 (unless it's already fully populated) without the need to introduce a new VTL into the equation.
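The extent placement described above can be sketched as a simple round-robin layout. The 32MB extent size comes from the article; the strict round-robin order, the function name and the shelf names are assumptions, and Sepaton's actual placement logic may differ.

```python
EXTENT_MB = 32  # extent size from the article

def place_extents(job_size_mb, shelves, extent_mb=EXTENT_MB):
    """Split a backup job into extents and assign them to shelves round-robin.

    Returns a dict mapping each shelf name to a list of (extent_number, size_mb).
    """
    layout = {shelf: [] for shelf in shelves}
    extent_count = -(-job_size_mb // extent_mb)  # ceiling division
    for n in range(extent_count):
        size = min(extent_mb, job_size_mb - n * extent_mb)
        layout[shelves[n % len(shelves)]].append((n, size))
    return layout

# Hypothetical 200MB job written to a pool of three shelves.
layout = place_extents(200, ["shelf_1", "shelf_2", "shelf_3"])
```

A 200MB job becomes seven extents (six of 32MB and one of 8MB) spread across all three shelves, so no single shelf handles the whole stream.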
Diligent Technologies Corp.'s VTL server appliance approach gives users the option to place backed up data on Fibre Channel (FC) disk drives, not just SATA drives. The company finds that, on average, one of its nodes can achieve about 200MB/sec when connected to back-end SATA drives; if that same node uses FC drives on a high-end array, performance can climb to as high as 350MB/sec. However, Diligent generally sees performance increases of 10% to 20% when users back up to FC drives instead of SATA drives.
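The gap between the best case and the typical case is easier to see as numbers. The calculation below uses only the figures quoted above; applying the 10% to 20% gain to the 200MB/sec SATA baseline is an assumption about how the vendor's percentages should be read.

```python
sata_mb_per_sec = 200      # typical node throughput on SATA, per the article
fc_peak_mb_per_sec = 350   # best case on a high-end FC array, per the article

# Best-case improvement of FC over SATA, as a percentage.
peak_gain_pct = (fc_peak_mb_per_sec - sata_mb_per_sec) / sata_mb_per_sec * 100

# Typical case: a 10% to 20% gain applied to the SATA baseline (assumed reading).
typical_fc_range = tuple(sata_mb_per_sec * (100 + pct) / 100 for pct in (10, 20))
```

The 350MB/sec peak is a 75% improvement, while the typical gain would put a node at only about 220MB/sec to 240MB/sec.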
Once data is stored on a VTL, copying or moving the data from the VTL to other media for long-term archiving or offsite data protection becomes an issue. There are three basic ways to move or replicate data from a VTL:
- Use the VTL to manage the movement of data between disk and tape
- Use backup software to move VTL-based data to tape
- Replicate data to an offsite VTL
This is one of the reasons why it makes sense to use tape library-based architectures. Products such as ADIC's Pathlight VX 450 and VX 650 create real tapes for export, but only under the control and direction of the backup software. In this way, the backup software catalog remains consistent and data transfers between disk and tape occur without introducing SAN traffic or overhead on the backup server to perform the copies from virtual tape to real tape. Vendors of the other VTL architectures generally recommend letting the backup software manage and move the data between disk and tape. However, using this approach creates a performance hit on both the backup server and the SAN, and should be scheduled during periods of low backup activity to minimize impact.
To eliminate additional overhead to the backup software server, EMC introduced a new feature on its Clariion DL700 series that lets users address this specific data management problem. By including an optional storage node that contains a version of EMC's NetWorker backup software, the node handles the processing of the movement of virtual tapes to physical tapes while sending updates to NetWorker's master backup software catalog.
Sending updates from the storage node to the master catalog allows the master catalog to maintain its consistency. As the storage node moves data back and forth between virtual and real tape, the node updates the catalog on the master backup server with the location of the tapes. Although currently only available on NetWorker, EMC plans to offer storage nodes that support other backup software products and to extend this feature to its Clariion DL200 line of VTL products. Other VTL hardware appliance vendors like NetApp and Sepaton also intimated that they plan to announce similar functionality in the near future.
The final option for moving and archiving data offsite is to simply install a second VTL and replicate data between the two, taking tape out of the equation altogether. With VTL products from Copan Systems, Dynamic Solutions International, EMC, NetApp and Sun, among others, users can asynchronously copy or move data between VTLs at two sites and nearly eliminate the need for real tape. However, this approach doesn't scale easily and requires significant network bandwidth. You should employ this approach only when limited amounts of data need to be moved or copied.
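A rough estimate shows why bandwidth limits this approach. The sketch below is illustrative only: the data volume, link speeds and the 80% usable-bandwidth figure are hypothetical values chosen for the example and do not come from any vendor.

```python
def replication_hours(data_tb, link_mbps, efficiency=0.8):
    """Hours needed to copy data_tb terabytes over a link of link_mbps megabits/sec.

    efficiency is an assumed fraction of the link usable for replication.
    Decimal units are used throughout (1TB = 10**12 bytes).
    """
    bits = data_tb * 10**12 * 8
    usable_bits_per_sec = link_mbps * 10**6 * efficiency
    return bits / usable_bits_per_sec / 3600

hours_45mbit = replication_hours(1, 45)    # 1TB over a 45Mbit/sec WAN link
hours_gigabit = replication_hours(1, 1000)  # 1TB over a 1Gbit/sec link
```

Under these assumptions, a single terabyte takes more than 60 hours over the 45Mbit/sec link and just under three hours over the gigabit link, so nightly replication of multi-terabyte backups quickly outgrows a modest WAN connection.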
VTL vendors are implementing a host of features to make their VTLs look and act more like real tape libraries. But only VTLs that deliver the benefits of real tape libraries--infinite capacity and data portability--should be considered enterprise ready. For now, only tape library-based architectures from ADIC, Spectra Logic and EMC's Clariion DL700 line with its storage node option appear to meet those requirements.