Most VTLs offer replication or cascading, which replicates one VTL's backups to another VTL. But the tapes in the second VTL won't be considered duplicates by your backup software because they'll have the same bar codes as the original tapes. Also, remember that you'll probably be replicating the entire backup, and most backups aren't block level. Even incremental backups take up roughly 1% to 5% of the amount of data being backed up. This means you'll need to replicate 1% to 5% of your data center every night--a significant undertaking for many environments. Therefore, it may only be possible to use this feature within a campus, as opposed to including data from remote sites. Today, replication is offered by Alacritus- and FalconStor-based VTLs.
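To get a feel for what replicating 1% to 5% of a data center every night means, the arithmetic can be sketched as follows. The 50TB data size and the 100Mbps usable link speed are hypothetical figures chosen for illustration, not numbers from any vendor, and the calculation ignores compression and protocol overhead.

```python
def nightly_replication_hours(total_tb: float, change_rate: float, link_mbps: float) -> float:
    """Hours needed to replicate one night's incremental backup over a link.

    total_tb    -- size of the data being backed up, in decimal terabytes
    change_rate -- fraction of that data in the incremental (0.01 to 0.05 in the text)
    link_mbps   -- usable link throughput in megabits per second (an assumption)
    """
    bytes_to_send = total_tb * 1e12 * change_rate
    bits_to_send = bytes_to_send * 8
    seconds = bits_to_send / (link_mbps * 1e6)
    return seconds / 3600


# Hypothetical 50TB data center replicating over a 100Mbps link.
low = nightly_replication_hours(50, 0.01, 100)
high = nightly_replication_hours(50, 0.05, 100)
print(f"1% change: {low:.1f} hours, 5% change: {high:.1f} hours")
```

Under these assumptions, the 1% case takes roughly 11 hours and the 5% case more than two days, which is why this kind of replication tends to stay within a campus.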
Some VTL vendors are beginning to offer a feature where their VTLs will examine the incremental backup, identify the changed blocks within that backup and replicate only the changed blocks. When that functionality becomes more widely available, replication between data centers will be much easier to accomplish. Diligent is the first to announce such a product with its ProtecTier offering.
If you have a heterogeneous environment with mainframe, AS/400 and open systems, you might consider a VTL that supports all three environments. Only Neartek currently offers this functionality.
A few integrated VTLs (FalconStor and Neartek) offer a feature called stacking. Stacking copies multiple virtual tapes onto one physical tape, a feature borrowed from mainframe virtual tape systems (VTS). Stacking was important to mainframes because apps were unable to append to a tape. The VTS would present hundreds of small virtual tapes to the app and then stack those virtual tapes onto one physical tape, significantly cutting media costs.
However, the value of stacking in most open-systems environments is questionable because any decent backup product can append to a tape until it's full. You should be aware that the use of stacking breaks the relationship between the backup software's media manager and the physical tape. Products that support stacking must read the entire stacked tape to read just one of the virtual tapes included on that tape. This feature is useful only if you gain a benefit akin to that achieved in the mainframe environment.
You also need to think about which type of notification the VTL supports, especially if you're considering an integrated VTL. Some support SNMP traps, a few support e-mail notification, while others require a storage admin to log into a Web page to be notified of any issues.
If high-end performance is important, you should look for a VTL with a multiple data-mover architecture. Most VTLs run all software on one VTL head. Some vendors use the VTL head as a control mechanism, while passing the movement of the data on to one or more data movers. Need more performance? Simply purchase more data movers. This allows scaling to a much higher level without having to add and administer another VTL (Diligent, Neartek and Sepaton use this approach).
Finally, remember that VTLs don't perform at the same level, so it's important to conduct performance testing in your environment.
Alternative backup methods
If you have a centralized data center with a four-hour recovery time objective (RTO), a 24-hour recovery point objective (RPO), a 24-hour synchronicity requirement and an eight-hour backup window, you can stop reading now. But if your backup requirements include remote, unattended data centers, a five-minute RTO, a 15-minute RPO or a non-existent backup window, alternative backup systems can help bring some needed sanity to your storage environment.
Alternative backup options include snapshots, replication, continuous data protection (CDP) and data reduction backup (DRB). These technologies will reduce backup and restore times, and help meet requirements such as RTO, RPO, backup window and synchronicity.
RTO--how long it takes to recover a system--can range from zero seconds to several days or even weeks. Each piece of information serves a business function, so the question is how long the business can live without that function. If the business can't live without it for one second, then the RTO is zero.
RPO is determined by how much data a business can afford to lose. If the business can lose three days' worth of a set of data, then the RPO is three days. If the data is real-time transactions essential to the business, the RPO is zero for that application.
There can also be an RPO for a group of machines. If several systems are related to each other, they may need to be recovered to the same point in time. This is the synchronicity requirement; to meet it, all related systems have to be backed up at exactly the same time. In disaster recovery circles, such a set of related systems is referred to as a consistency group.
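The relationship between RPO and synchronicity can be illustrated with a small sketch. It assumes each system keeps a list of recovery points (backups or snapshots), finds the newest point shared by every system in the group, and checks whether the data lost since that point fits within the RPO. The system names, dates and function names are hypothetical.

```python
from datetime import datetime, timedelta


def latest_common_recovery_point(points_by_system):
    """Return the newest recovery point shared by every system, or None."""
    if not points_by_system:
        return None
    common = set.intersection(*(set(p) for p in points_by_system.values()))
    return max(common) if common else None


def meets_rpo(failure_time, recovery_point, rpo):
    """True if the data lost between the recovery point and the failure fits the RPO."""
    return recovery_point is not None and failure_time - recovery_point <= rpo


# Hypothetical consistency group: two related systems and their recovery points.
group = {
    "orders_db": [datetime(2005, 6, 1, 8), datetime(2005, 6, 1, 12), datetime(2005, 6, 1, 13)],
    "billing_db": [datetime(2005, 6, 1, 8), datetime(2005, 6, 1, 12)],
}
failure = datetime(2005, 6, 1, 14, 30)

# The 13:00 point exists on only one system, so the group recovers to 12:00.
point = latest_common_recovery_point(group)
ok_4h = meets_rpo(failure, point, timedelta(hours=4))  # 2.5 hours lost: within a 4-hour RPO
ok_1h = meets_rpo(failure, point, timedelta(hours=1))  # 2.5 hours lost: misses a 1-hour RPO
```

The point of the example is that an extra recovery point on one system does nothing for the group; only points that every related system shares count toward the synchronicity requirement.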
Setting RPO, RTO requirements
All RPO, RTO and synchronicity requirements must be business-centric. Before deciding what these requirements are, you should first analyze and prioritize the business functions, and assign each computer system the recovery priority of the business function it serves. Next, decide on an RTO and RPO for each system and type of disaster--from the loss of a disk to the loss of a metropolitan area. Some systems will have the same requirements for all types of disasters; others may have tougher requirements for specific types of disasters.
Once you've determined an RTO and RPO for each system and disaster type, the final step is to determine how long it will take to back up the system and how much the backup will impact the production system.
Everything should start with RTO and RPO, although very few people do it that way. Most people go right to the backup window. Instead, you should concentrate on meeting your RTO and RPO requirements, and the backup window will almost always fall right in line. The reverse isn't necessarily true, however. There are many things that will shrink your backup window but not help your recovery objectives. If your requirements are impossible to meet with a traditional backup system, the following technologies are worth considering.
SNAPSHOTS. The most common type of snapshot is a virtual copy of an original volume or file system. The reliance on the original volume is why snapshots must be backed up to provide recovery from physical failures (see "Match snaps to apps," p. 46). Snapshot functionality resides in a number of places, including advanced filesystems, volume managers, enterprise arrays, NAS filers and backup software.
A TALE OF TWO DIVISIONS
First American Trust Federal Savings Bank, Santa Ana, CA, handles up to $2 billion worth of wire transfers each day. The bank was recently asked by the Securities and Exchange Commission (SEC) to restore one year's worth of Microsoft Exchange e-mail data--a significant request.
One division used Network Appliance (NetApp) Inc.'s unified storage solution, SnapManager for Exchange, and Single Mailbox Recovery software, while another division used traditional backup and tape. The results from the two divisions couldn't have been more different. "The SEC request made the need for using nearline storage to easily recover and access e-mail undisputable," says Henry Jenkins, chief technology officer at First American. "Our disk-based solution rose to the occasion, but damaged tapes and botched backups made restoring from tape excruciating for our sister division."
It took the bank only a few days to restore roughly 360GB of e-mail using the combination of hardware and software from NetApp. In contrast, it took one bank IT staffer several months to restore a smaller volume of e-mail from tape.
First American also uses offsite replication of critical SQL Server databases, Exchange e-mail and flat-file data that's used to perform routine wire services. All of this critical data creates only 200MB of changed data blocks per day, which are then asynchronously replicated to a remote system located at a disaster recovery (DR) site approximately 100 miles away. The DR system has an RPO of four hours in the event of a site failure.
"SnapMirror software saves us time by not having to replay logs and data at the remote site is, on average, less than 15 minutes behind," says Jenkins. "Every year for the past three years, we've done a disaster recovery test and every year it's just a matter of bringing up the warm servers," he adds.
Snapshots can help you to meet aggressive backup requirements. For example, some snapshots can satisfy an RTO of a few seconds by simply changing a pointer. An aggressive RPO can be achieved by creating several snapshots per day and, because snapshots can be created in seconds, you can also meet stringent backup window requirements. For instance, it's possible to create a stable, virtual backup of a multiterabyte database in seconds--reducing the impact on the application to potentially nothing--which leaves hours to perform a backup of that snapshot. Finally, creating synchronized snapshots on multiple systems is also fairly easy.
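Why a snapshot can be created in seconds, and why it still depends on the original volume, is easier to see in a toy model. The sketch below uses copy-on-write, one common way to implement a virtual snapshot; it's an illustration under that assumption, not a description of any particular vendor's implementation, and the `Volume` class is hypothetical.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = {}  # snapshot name -> {block index: original contents}

    def snapshot(self, name):
        # Creating a snapshot copies no data; it only starts tracking changes.
        self.snapshots[name] = {}

    def write(self, index, data):
        # Preserve the old block for every snapshot that has not saved it yet.
        for saved in self.snapshots.values():
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, name):
        # Unchanged blocks are read from the live volume, which is why a
        # snapshot alone cannot survive the loss of the original volume.
        saved = self.snapshots[name]
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]


vol = Volume(["a", "b", "c"])
vol.snapshot("noon")
vol.write(1, "B")
noon_view = vol.read_snapshot("noon")   # ["a", "b", "c"]
live_view = vol.blocks                  # ["a", "B", "c"]
```

Only the one overwritten block is stored on behalf of the snapshot, so the snapshot costs almost nothing to create and grows only as the volume changes.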
There's a growing list of APIs that allow different vendors' products to interface with snapshots; the Network Data Management Protocol (NDMP) and Microsoft Corp.'s Volume Shadow Copy Service (VSS) are examples. NDMP lets backup products create a snapshot, and catalog and restore from its contents. VSS allows storage vendors with snapshot capability to have the files in those snapshots listed in and restored from the Previous Versions tab in Windows Server 2003. Hopefully, this capability will be added to workstation versions of Windows and more NAS vendors will support VSS.
Another interesting development is the creation of database agents that work with snapshots. The database agent communicates with the database so that the database believes it's being backed up, when all that's really happening is the creation of a snapshot. Recoveries can be incredibly fast when the process is controlled by the database application.
REPLICATION. Replication is the practice of continually copying from a source system to a target system all files or blocks that have changed on the source system. Replication used to be what companies implemented after everything was completely backed up and redundant, which meant that few used replication. However, many people are now using replication as their first line of defense for providing backup and disaster recovery.
Replication by itself is not a good backup strategy; it copies everything, including viruses and file deletions. Therefore, a replication-based backup system must be able to provide a history by either occasionally backing up the replicated destination or through the use of snapshots. It's usually preferable to make a snapshot on the source and replicate that snapshot to the destination. That way, you can prepare database applications for backup, take a snapshot and then have that snapshot replicated.
ADAPTEC SWITCHES FROM TAPE TO DISK
Kelly Overgaard, systems manager at Adaptec Inc., was fed up with tape. "Our old system was at capacity, and something was always breaking," he says. "When we looked at disk-based solutions, our goal was to completely get rid of tape--especially for remote sites."
Adaptec chose an Avamar Technologies Inc. Axion system that uses "commonality factoring" to identify duplicate blocks of data throughout its enterprise and to transmit only the new, unique blocks of data each time it backs up. This allows Adaptec to back up and recover smaller remote offices directly to its central data center. Larger offices, or those with shorter recovery time objectives, can be backed up to a local target device at the remote site, which then replicates to a second device in its central data center. This flexibility to use (or not use) a local recovery device let Adaptec deploy this solution to several sites.
Overgaard says that because the commonality factoring is performed on the client, it requires slightly more CPU than traditional backup, but "no one has mentioned any ill effects." He considers himself a happy customer, but says he's unsure if the system will be able to back up Adaptec's large databases.
But Overgaard doesn't believe he can afford to store his firm's backups with long-term retention on the Axion system, so he also performs a monthly full tape backup of Axion clients using Adaptec's previous tape system, and then sends that offsite for several years. Avamar says he'll soon be able to make such tape backups by simply exporting the appropriate data directly from the Axion system.
When used with snapshots, replication requires only tiny backup windows. The snapshot takes just seconds to create, and replication is the quickest way to back up that snapshot to another device. You can also cascade replication to provide multiple copies, such as an onsite and offsite copy. If you want to provide a tape copy of the replicated snapshot, just back up one of the destination devices. But replication software doesn't usually provide recovery features. The RTO, RPO and synchronicity requirements that you'll be able to meet will be based on how you're performing snapshots or backups, and how quickly you'll be able to recover from them.
DRB SYSTEMS. DRB systems were designed to answer the following questions: If only a few bytes in a file change, why back up the entire file? If the same file resides in two places on the same system, why back it up twice? Why not store a reference to the second file? And why waste server and network resources by backing up the same file across multiple systems?
By backing up a file once, and then backing up only the changed bytes, backup windows are reduced. Tape copies of disk-based backups can usually be created at any time, depending on your requirements. Some DRB products can meet aggressive RTO requirements by restoring only the blocks that have changed since the file was last backed up. The RPO and synchronicity abilities of DRB products are based on how often you back up, but it's common to back up hourly.
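The idea behind data reduction can be sketched with a small content-addressed store. The example below splits files into fixed-size chunks and stores each unique chunk once, keyed by its SHA-256 digest. The `ChunkStore` class, the tiny 4-byte chunk size and the file names are hypothetical; real DRB products use their own chunking and indexing schemes.

```python
import hashlib


class ChunkStore:
    """Toy backup store that keeps each unique chunk of data only once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}     # digest -> chunk bytes, stored once
        self.manifests = {}  # file name -> ordered list of digests

    def backup(self, name, data: bytes) -> int:
        """Back up a file and return the number of new bytes actually stored."""
        new_bytes = 0
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            digests.append(digest)
        self.manifests[name] = digests
        return new_bytes

    def restore(self, name) -> bytes:
        """Rebuild a file from its list of chunk references."""
        return b"".join(self.chunks[d] for d in self.manifests[name])


store = ChunkStore(chunk_size=4)
first = store.backup("report.txt", b"AAAABBBBCCCC")          # 12 new bytes
duplicate = store.backup("report_copy.txt", b"AAAABBBBCCCC")  # 0 new bytes
changed = store.backup("report_v2.txt", b"AAAAXXXXCCCC")      # 4 new bytes
```

The second copy of the file costs nothing but a list of references, and the modified version costs only the one chunk that changed, which is where the savings in backup window and bandwidth come from.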
The biggest advantage to DRB products is that, from the user adoption perspective, they're the closest to what users know. Their interfaces are similar and they often have database agents like traditional backup software. They're also able to back up faster and more often, and use much less bandwidth.
CDP. A CDP system is basically an asynchronous, replication-based backup system. The software runs continuously on the client to be backed up, and each time a file changes, the new bytes are sent to the backup server within seconds or minutes. But unlike replication, a CDP system can roll back changes to any point in time.
Pros and cons of alternative backup methods
CDP products transfer data to the backup server in different ways. Some transfer changed blocks immediately, while others collect changed blocks and send them every few minutes. They also differ in how they do recoveries. Some products are able to restore only the blocks that have changed from a particular point in time, while other programs operate in a more traditional manner by recovering the entire file or filesystem. The first method accommodates more aggressive RTOs and RPOs than the second method. Also, CDP products can meet any type of synchronicity requirement because they can recover one, 10 or 100 systems to any synchronized point in time.
Another difference in CDP products is that some are database-centric and work only with a particular database, such as Microsoft Exchange or SQL Server. Most file-based CDP products aren't going to provide interfaces for your database apps. These CDP products copy blocks to the backup destination in the same order they're changed on the client. Restarting your database causes it to go into the same mode that it would go into if the server were to crash (i.e., crash recovery mode). It examines the data files, figures out what's inconsistent, rolls backward or forward any necessary transactions or blocks, and then the database is up. If the CDP product puts the blocks back in the exact order in which they were changed, then the database should be able to recover from any point in time. Some products can even present a logical unit number or volume to your database that it can mount and test before you do the recovery.
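The role that write order plays in point-in-time recovery can be shown with a toy journal. The sketch assumes the CDP product records every block change with a timestamp, in the order the changes occurred, and recovers by replaying the journal up to the requested time. The `CdpJournal` class is hypothetical, and timestamps are plain numbers for simplicity.

```python
class CdpJournal:
    """Toy CDP journal: replay block changes in write order up to a point in time."""

    def __init__(self, base_blocks):
        self.base = list(base_blocks)  # contents when protection started
        self.entries = []              # (timestamp, block index, data) in write order

    def record(self, timestamp, index, data):
        # Assumes changes arrive in the order they happened on the client.
        self.entries.append((timestamp, index, data))

    def recover(self, point_in_time):
        """Return the volume contents as of the given time."""
        blocks = list(self.base)
        for timestamp, index, data in self.entries:
            if timestamp > point_in_time:
                break
            blocks[index] = data
        return blocks


journal = CdpJournal(["a", "b"])
journal.record(10, 0, "A")
journal.record(20, 1, "B")
journal.record(30, 0, "Z")

before_any_change = journal.recover(5)   # ["a", "b"]
mid_afternoon = journal.recover(25)      # ["A", "B"]
latest = journal.recover(30)             # ["Z", "B"]
```

Because the replay stops at a single timestamp and preserves write order, the recovered volume looks exactly as it would have after a crash at that moment, which is the state a database's crash recovery is designed to handle.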
Some CDP vendors, like Kashya and Mendocino, integrate with database vendors. In addition to continuously copying blocks from source to destination, they integrate with your apps to create consistent recovery points that can be used to recover your database without it having to go into crash recovery mode. Keeping the app out of recovery mode can save a lot of time during a restore.
Your database vendor may have a different opinion about CDP: If you're not using their supported backup method, they may not be helpful if something goes wrong. Discuss the support issue with your database vendor and include your DBA in the discussion.
Aggressive requirements
You should consider switching backup products only if your current backup product can't meet your requirements (see "Pros and cons of alternative backup methods"). There are many requirements--such as remote office data protection, backing up large databases, and an app with an RPO of zero--that might have you considering alternatives.
The most common area where backup requirements are difficult to meet is the remote office. Traditional backup schemes can't meet remote office RTO/RPO requirements. There's either too much data or not enough bandwidth to support a reasonable RTO or backup window. Any CDP product can provide backup and recovery of a remote office; most offer two methods. If long RTOs are acceptable, remote sites can back up directly to your central office. In the case of a disaster, just copy the data from the central data center to a disk or tape and send it to the remote site. If this meets RTO requirements, it's the least-expensive option. For tighter RTO requirements, install a backup device at the remote office. The remote office systems can back up to it, and it can then replicate the data to the central site. This provides local recovery and disaster recovery without touching a tape.
CDP products are also superior to traditional backup methods when backing up very large databases. There isn't enough time or horsepower available to transfer several terabytes of data to tape every day. A CDP product could continually back up a database throughout the day, with no noticeable backup window or application impact. Depending on the product, a stringent RTO and short RPO could also be met. Some products also provide a disk-based copy that can be used in a disaster situation while the real volume is being recovered.
Finally, some database applications require a zero RPO. Most databases can meet such a requirement if they're configured correctly, and if the transaction log is backed up throughout the day. If your database supports that kind of functionality, it's probably best to stick with it. If not, try one of these newer methods.