If you're not enhancing your data protection system by using ATA disk arrays, you should be. New ATA disk arrays connected to SCSI or Fibre Channel (FC) converters offer high-capacity, SAN- or NAS-addressable storage for as little as $5,000/TB. I wrote about these arrays in the June ("Surprise! Cheap disks cure slow backup") and September ("Pick the right ATA array for backup") issues of Storage, and explained how people are using them as a disk cache for their tape-based backup and recovery systems. This article concentrates on other ways to use these incredibly inexpensive arrays to increase the level of data protection in your environment.
ATA disk arrays make an excellent target for off-site replication. Traditionally, replication is relatively high on the adoption curve. First, people make sure their data is being backed up. Then they make sure that it's protected against disasters by moving and backing it up off site. After that, most storage administrators begin to look at high-availability designs. Most high-availability systems are designed to get around individual component failure. They aren't usually designed to withstand a disaster that destroys the entire data center.
There are high-availability systems that now incorporate off-site replication, but they're extremely expensive. The cost of such a system is derived from four areas:
1. The off-site data center
2. The connectivity to the off-site data center
3. The servers within the data center
4. The storage connected to those servers
The use of colocation facilities helps make off-site data centers affordable for small- to medium-sized businesses. Companies wishing to replicate can also save money on servers, as they don't need to be the same speed or power as the servers in their production data center. However, until now there wasn't a way to save much money on storage. Enter ATA disk arrays: They behave much like the SCSI disk arrays populating most data centers today. They are a little slower in some applications, but a lot less expensive.
Using colocation facilities, cheap servers and an ATA disk array, even the smallest business can afford to replicate its data off site. There are even free software replication products for Unix and Windows. A directory of these products is available here.
Remote site replication
It's also possible to use a large ATA disk array as a target for the replication of several remote sites. Today, the typical backup and recovery solution for several remote sites is to place an inexpensive tape drive and some type of backup and recovery software at each location. The problem with such a setup is that it's subject to human error. A recent customer of mine found that the backups of their 100+ remote offices failed more than 50% of the time simply because the managers of the remote offices failed to insert a tape.
Replicating remote offices' data to a central ATA disk array saves thousands of dollars in tape drives, backup software and people hours. Such a system is also more reliable than a system that relies on a person to insert a tape. Of course, you could also solve that problem by leaving the tape in the tape drive, overwriting last night's backup with tonight's backup. But you know that's not a smart protocol to follow.
Another use of ATA disks and colocation facilities is to use a disk-based backup and recovery system to back up your servers off site. As I first explained in "Pick the right ATA array for backup," a true disk-based backup and recovery system understands that it's backing up to disk and takes advantage of the medium's random access capabilities. This lets you store only one copy of each unique file or block of data--no matter how many times that file or block appears on your network's computers. It can also do block-level incremental backups that back up only the changed blocks of each file, and forgo occasional full backups. Since all the backups are instantly accessible, restoring from hundreds of incremental backups--even block-level incremental backups--takes the same amount of time as restoring from a single full backup. In fact, it could be faster to read hundreds of smaller backups from many disks instead of reading one large backup from one disk.
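To make the single-instance idea concrete, here is a minimal sketch in Python. It is not how any particular product works: the class name, the fixed 4 KB block size and the use of SHA-256 digests as block identifiers are all assumptions made for illustration. It shows only that a second copy of the same file, arriving from a different computer, adds no new blocks to the store.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products may chunk differently


class SingleInstanceStore:
    """Toy content-addressed store: each unique block is kept exactly once."""

    def __init__(self):
        self.blocks = {}   # SHA-256 hex digest -> block bytes
        self.backups = {}  # (client, path) -> ordered list of digests

    def backup(self, client, path, data):
        """Store one file and return how many new bytes had to be written."""
        digests, new_bytes = [], 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:      # first time this block is seen
                self.blocks[digest] = block
                new_bytes += len(block)
            digests.append(digest)
        self.backups[(client, path)] = digests
        return new_bytes

    def restore(self, client, path):
        """Rebuild a file by looking up each of its blocks by digest."""
        return b"".join(self.blocks[d] for d in self.backups[(client, path)])


# Hypothetical sample file made of four distinct 4 KB blocks.
report = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))

store = SingleInstanceStore()
first = store.backup("pc-01", "report.doc", report)
second = store.backup("pc-02", "copy-of-report.doc", report)
restored = store.restore("pc-02", "copy-of-report.doc")
```

In this sketch the first backup writes all four blocks and the second writes none, yet either copy can be restored in full. Because a restore is just a series of lookups by digest, it doesn't matter how many backup runs contributed the blocks.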
These disk-based backup and recovery systems offer two main benefits: They conserve disk space and they significantly decrease the amount of data transferred during a backup. This second benefit is why many of these products are designed to back up remote users. The amount of data transferred during a backup is small enough to be sent across a dial-up Internet connection.
What if we turned the tables on this setup? Instead of using a disk-based backup in your data center to back up remote computers, what if you made the backup system remote, and used it to back up the local computers? Place one of these disk-based backup and recovery products in a remote data center or colocation facility, and back up your computers across the Internet. Once the initial backup has been completed, the amount of data that's transferred during a single-instance-store, block-level incremental backup is actually quite small.
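A rough sketch in Python of why a block-level incremental moves so little data follows. The block size, the 1 MB sample file and the nominal 56 kbit/s link speed are assumptions chosen for illustration, not measurements, and the comparison by block position is a simplification that ignores files that shrink or have data inserted in the middle.

```python
import hashlib

BLOCK_SIZE = 4096               # assumed block size
DIALUP_BITS_PER_SEC = 56_000    # nominal modem speed; real throughput is lower


def block_digests(data):
    """Digest of every fixed-size block, in order."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(previous_digests, data):
    """Return {block_index: block_bytes} for blocks that differ from the last backup."""
    changed = {}
    for index, digest in enumerate(block_digests(data)):
        if index >= len(previous_digests) or previous_digests[index] != digest:
            start = index * BLOCK_SIZE
            changed[index] = data[start:start + BLOCK_SIZE]
    return changed


def transfer_seconds(num_bytes, bits_per_sec=DIALUP_BITS_PER_SEC):
    """Idealized transfer time, ignoring protocol overhead and compression."""
    return num_bytes * 8 / bits_per_sec


yesterday = bytes(1024 * 1024)          # hypothetical 1 MB file of zeros
edited = bytearray(yesterday)
edited[10_000:10_010] = b"new record"   # a 10-byte change inside block 2
today = bytes(edited)

delta = changed_blocks(block_digests(yesterday), today)
delta_bytes = sum(len(block) for block in delta.values())
full_time = transfer_seconds(len(today))
incremental_time = transfer_seconds(delta_bytes)
```

Here only one 4 KB block has to cross the wire instead of the whole file, which under these assumptions is the difference between a few minutes and well under a second on a dial-up link.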
Putting the backup system off site lets you recover from a disaster without constantly moving tapes between your on-site and off-site locations. In this case, tapes would only need to be created for archival purposes. If you want an additional copy of your backups located off site, there are products that support automated replication to a second or third facility. Similar to the replication products mentioned earlier, these products range from Windows-only products costing less than $1,000 (NetBackup Professional and NetWorker Laptop) to costly heterogeneous products designed to back up your whole data center to disk (EVault's InfoStage and Avamar's Axion).
Figure: Server-free and client-free backups. Server-free backups are represented by the green arrow and client-free backups travel the red arrow path.
Server-free and client-free backup
ATA disk arrays can be used in server- and client-free backup implementations. No backup is completely LAN-free, client-free or server-free, of course. The backup server is always communicating with the backup client, even if it's just to get the metadata about the backup. These terms indicate that the bulk of the data is transferred via a path that doesn't involve a server or the client that's being backed up.
If the SAN connected to the disk storage supports a SCSI feature called extended copy, then the data can be sent directly from disk to tape, without going through any server. There are also other--more proprietary--methods for doing this that don't involve the extended copy command. This is the newest area of backup and recovery functionality being added to SANs. Server-free backups are represented (see "Server-free and client-free backups") by arrow No. 1, which shows a data path starting at the disk array, traveling through the SAN switch and router and arriving at the shared tape library. The data path doesn't include a server, which is why it's called server-free backup.
If a backup client has its disk storage on the SAN, and that storage creates a mirror that's split off and made visible to the backup server, that client's data can be backed up via the backup server--the data never travels via the backup client. This is called client-free backup. Client-free backups are represented in "Server-free and client-free backups" by arrow No. 2, showing a data path starting at the disk array, traveling through the backup server, followed by the SAN switch and router, and finally arriving at the shared tape library.
Both client- and server-free backups require a consistent, frozen image of the backed-up file system in order to safely transfer it to the backup system via a non-traditional data path. One of the biggest barriers to implementing a client- or server-free backup is the cost of the disks required to create this frozen image. There are some technologies that allow a frozen image to be created without splitting off a mirror, but they create additional I/O on the primary disks.
Most client- and server-free backup systems use the mirroring capabilities built into enterprise-class disk arrays, such as EMC's TimeFinder or Hitachi's ShadowImage. While these products are good, they can only mirror between like disks. That is, if you want to create a mirror for use in a client- or server-free backup system, you'll need to purchase enough SCSI or FC disks to create that mirror.
But if you only need this disk array for backup, does it need to be high-speed, expensive SCSI disks? The answer is no. What if you used a software volume manager (e.g., Veritas' Volume Manager) and created the third mirror on an ATA disk array? There are recently available products that can make one manufacturer's array look like another manufacturer's array. (Check out Rhapsody--a company currently being acquired by Brocade.)
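The split-mirror idea can be modeled in a few lines of Python. This is a toy model only: the class, its method names and the mirror names are hypothetical and don't correspond to the interface of Veritas' Volume Manager or any array product. It shows a third mirror, imagined as living on an ATA array, being split off as a frozen image while production writes continue on the other two.

```python
class MirroredVolume:
    """Toy volume manager: every write goes to every attached mirror."""

    def __init__(self, num_blocks, mirror_names):
        # Each mirror gets its own independent list of blocks.
        self.attached = {name: [b""] * num_blocks for name in mirror_names}
        self.split_off = {}

    def write(self, block_no, data):
        for blocks in self.attached.values():
            blocks[block_no] = data

    def split(self, name):
        """Detach a mirror; it stops receiving writes and becomes a frozen image."""
        self.split_off[name] = self.attached.pop(name)
        return self.split_off[name]

    def resync(self, name):
        """Reattach a split mirror by copying a surviving mirror (full copy here)."""
        source = next(iter(self.attached.values()))
        self.split_off.pop(name)
        self.attached[name] = list(source)


# Two mirrors on the primary array plus a third on a hypothetical ATA array.
volume = MirroredVolume(4, ["fc-a", "fc-b", "ata-backup"])
volume.write(0, b"orders v1")

frozen = volume.split("ata-backup")   # the application should be quiesced first
volume.write(0, b"orders v2")         # production keeps writing to fc-a and fc-b
backed_up = list(frozen)              # the backup server reads the frozen image

volume.resync("ata-backup")           # ready for the next backup cycle
```

The backup sees the data as it stood at the moment of the split, and the client that owns the volume takes no part in reading it. A real volume manager would normally resynchronize only the blocks that changed while the mirror was split, not copy the whole volume as this sketch does.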
ATA disk arrays give you several ways to increase the level of data protection in your environment. You can use them as a replication target and replicate your entire data center to an off-site facility. You can replicate remote offices to your main data center, allowing you to back up from there. You can create a disk-based backup and recovery system and place it off site, which gives you a completely automated backup and recovery system for off-site storage--without having to constantly move tapes back and forth. You can also use these arrays to create an additional mirror of important data that can be split off for use in a client-free or server-free backup system.
About the author:
W. Curtis Preston is the president of The Storage Group. He is the author of Unix Backup and Recovery and Using SANs and NAS.