Traditionally, backups have been targeted to tape drives, but disk storage is increasingly being incorporated into the backup process. One common approach is to use disk as a staging area before copying data to tape, which is known as a two-stage backup architecture.
Other strategies include using disk as a complete replacement for tape media. Data is archived to disk in the form of virtual copies (space-saving snapshots) or full volume, byte-for-byte copies. Often hosts will have their disk backups stored on one or two dedicated disk arrays exclusively used for backup storage.
Another backup approach balances the mix of disk and tape for data protection. Instead of implementing a two-stage backup, disk and tape are both used as archival media and are incorporated into the tape rotation and retention schedule. For example, rather than relying on tape as the sole backup medium, an organization can use disk to store and age incrementals and reserve tape exclusively for full backups.
Disk backups have many advantages, but there's a balance between cost and business continuance that must be considered. Depending on the total data requirement, the change rate of the data and the current storage investment, replacing tape with disk might not be realistic. As a general rule, most organizations will find a sweet spot in storing and aging incremental data on disk and using tape for full backups only.
[Figure: Rotation schedules with incremental backups]
In the rotation schedule on the right, incrementals are backed up and aged on disk. The schedule has the same retention policy as the one on the left, but with 257 fewer tapes.
State of the union
A typical storage area network (SAN) architecture often includes SAN-attached tape devices. However, the majority of the servers both on and off the SAN are backed up over Ethernet. As a rule of thumb, servers that have fewer than 200GB of data can back up over Ethernet sending their data to servers that have locally attached tape devices. Even in light of SANs, most organizations don't need--or can't afford--to implement SAN-based backups on all their servers. This is largely due to the extra expense of software for tape drive sharing and robotic controls, as well as additional backup hardware, such as SCSI to Fibre Channel (FC) routers, host bus adapters (HBAs) or Fibre Channel Arbitrated Loop (FC-AL) edge switches. Except for large database and application servers, most hosts are perfectly suited for backups over Ethernet.
It's not uncommon during incremental backups for servers to spend an inordinate amount of time examining the file system to identify and capture backup data and then copy only 10MB to tape. Moreover, even without compression, tape drives are too fast for a single host to stream over Ethernet. As a result, most backup jobs are multiplexed. Multiplexing allows multiple backup jobs to be sent concurrently to a single tape drive in order to minimize start-stop repositioning, or back-hitching. The result is a backup tape on which multiple hosts' data is interleaved.
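The interleaving that multiplexing produces can be illustrated with a small sketch. This is a toy model rather than how any particular backup product lays out tape: the host names, the block labels and the round-robin ordering are all assumptions made for illustration.

```python
def multiplex(streams):
    """Interleave blocks from several hosts onto one tape, round-robin.

    `streams` maps a (hypothetical) host name to its list of data blocks.
    Returns the tape as an ordered list of (host, block) pairs.
    """
    queues = {host: list(blocks) for host, blocks in streams.items()}
    tape = []
    while any(queues.values()):
        for host, blocks in queues.items():
            if blocks:
                tape.append((host, blocks.pop(0)))
    return tape


def restore(tape, host):
    """Recover one host's blocks from an interleaved tape.

    Returns the host's data plus the number of tape blocks the drive
    had to pass over, including other hosts' blocks, to reach it all.
    """
    data = []
    blocks_passed = 0
    for position, (owner, block) in enumerate(tape, start=1):
        if owner == host:
            data.append(block)
            blocks_passed = position
    return data, blocks_passed


# Illustrative streams only; real jobs would carry far more data.
streams = {
    "web": ["w0", "w1"],
    "mail": ["m0", "m1", "m2"],
    "db": ["d0"],
}
tape = multiplex(streams)
web_data, web_blocks_passed = restore(tape, "web")
```

In this model, restoring the two "web" blocks means the drive passes over four tape blocks, because blocks belonging to the other hosts sit in between.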
This greatly speeds up the backup process, but unfortunately results in less-than-optimal recovery performance. Restoring one stream of data that's been interleaved with other streams doesn't allow the tape drive to run at full speed during the recovery process. Tape backups also carry an inherent risk of media failure. Because tapes aren't RAID protected, most organizations keep multiple copies of tapes and minimize the use of incremental backup sets. A single bad tape can cause a large restore to fail, so the more tapes a backup resides on, the greater the chance that one of them will cause a failed recovery. Knowing this, your policy for a typical tape rotation should be to perform weekly full backups.
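The link between tape count and recovery risk can be made concrete with a rough model. The sketch below assumes each tape fails independently with the same probability; both that assumption and the 1% figure are illustrative placeholders, not measurements from the article.

```python
def restore_failure_probability(tapes_needed, per_tape_failure_rate):
    """Chance that at least one of the tapes needed for a restore is bad.

    Assumes tapes fail independently with an identical failure rate,
    which is a simplification made for illustration.
    """
    return 1.0 - (1.0 - per_tape_failure_rate) ** tapes_needed


ASSUMED_FAILURE_RATE = 0.01  # hypothetical 1% chance that any given tape is unreadable

# A full backup spanning 25 tapes vs. the same full plus four incremental tapes.
full_only = restore_failure_probability(25, ASSUMED_FAILURE_RATE)
full_plus_incrementals = restore_failure_probability(29, ASSUMED_FAILURE_RATE)
```

Whatever failure rate is plugged in, the risk grows with every tape a restore depends on, which is the reasoning behind keeping tape-based incremental sets short.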
Assuming a 2TB storage environment with a 50GB daily change rate and 80GB tape media, at least 541 tapes would be required under this rotation/retention schedule, as follows:
(1 tape per daily incremental x 4 incrementals/week x 4 weeks' retention) + (25 tapes per weekly full x 3 weekly fulls/month x 3 months' retention) + (25 tapes per monthly full x 1 monthly full/month x 12 months' retention) = 16 + 225 + 300 = 541.
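The same arithmetic can be reproduced in a few lines, which makes it easy to plug in your own capacity and retention figures. This is a minimal sketch: the function and variable names are illustrative, 2TB is taken as 2,000GB to match the article's 25 tapes per full backup, and no tape compression is assumed.

```python
import math


def tapes_per_backup(data_gb, tape_gb):
    """Whole tapes needed to hold one backup of the given size."""
    return math.ceil(data_gb / tape_gb)


TAPE_GB = 80          # native tape capacity, no compression assumed
FULL_GB = 2000        # 2TB environment, taken as 2,000GB
DAILY_CHANGE_GB = 50  # size of one daily incremental

full_tapes = tapes_per_backup(FULL_GB, TAPE_GB)                # 25
incremental_tapes = tapes_per_backup(DAILY_CHANGE_GB, TAPE_GB)  # 1

# tapes per backup x backups per period x periods retained
daily_set = incremental_tapes * 4 * 4   # 4 incrementals/week, kept 4 weeks
weekly_set = full_tapes * 3 * 3         # 3 weekly fulls/month, kept 3 months
monthly_set = full_tapes * 1 * 12       # 1 monthly full/month, kept 12 months

total_tapes = daily_set + weekly_set + monthly_set
print(total_tapes)  # 541
```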
As a side note, tape media is inherently prone to overconsumption. Tapes are often used for personal copies, thrown away at the first sign of wear, rarely filled to capacity and occasionally discarded once their retention period expires. When determining the amount of tape necessary for your backups, it's best to overestimate.
The advantage of a disk-tape backup solution is that it changes the backup schedule and the rotation/retention policy. Backing up daily incrementals to disk obviously saves on tape media, but it also eliminates the need to conduct full backups every week. Disk-based backups don't suffer from the disadvantages that tape backups have. Disk backups are RAID-protected (no loss in data integrity when using multiple incremental backup sets), don't require multiplexing (disk is a random-access device, tape a serial one) and are always online for quick recoveries.
In fact, using disk for backup data offers better business continuance. Disk storage can be made highly available, whereas most tape solutions are prone to failure. If a tape library has a channel failure with the robotics, its tape devices are unavailable for backup and recovery. With disk-based backups, the rotation schedule can leverage incremental backups more effectively by eliminating the need for weekly full backups.
To maintain proper data retention, weekly full backups are replaced with cumulative incrementals kept on disk for three months. After that time, the backup images expire and are purged from the file system (see "Rotation schedule with incremental backups"). Not only does this save 257 tapes, it also minimizes media management associated with those tapes, improves backup and recovery performance and can reduce the need to invest in additional tape technology.
To read the extended version of this tip and get a better understanding of the costs of using both disk and tape for backups, go to Storage magazine.
About the author: Mark Teter is the CTO of Advanced Systems Group, an enterprise computing and storage consulting firm in Denver, CO.
This was first published in March 2003