Advanced Systems Group
Published: 10 Mar 2003
Traditionally, backups have usually been targeted at tape drives, but disk storage is increasingly being incorporated into the backup process. One common approach is to use disk as a staging area before copying data to tape, which is known as a two-stage backup architecture.
Other strategies include using disk as a complete replacement for tape media. Data is archived to disk in the form of virtual copies (space-saving snapshots) or full volume, byte-for-byte copies. Often hosts will have their disk backups stored on one or two dedicated disk arrays exclusively used for backup storage.
Another backup approach balances the mix of disk and tape for data protection. Instead of implementing a two-stage backup, disk and tape are both used as archival media and are incorporated into the tape rotation and retention schedule. For example, instead of using tape solely as the backup medium, disk is substituted to store and age incrementals, allocating tape exclusively for full backups.
Disk backups have many advantages, but there's a balance between cost and business continuance that must be considered. Depending on the total data requirement, change rate of data and current storage investment, substituting tape with disk might not be realistic. As a general rule, most organizations will find a sweet spot storing and aging incremental data on disk, and use tape for full backups only.
Rotation schedules with incremental backups
In the rotation schedule, incrementals are backed up and aged on disk. The schedule has the same retention policy as the one on the left, but with 257 fewer tapes.
State of the union
A typical storage area network (SAN) architecture often includes SAN-attached tape devices. However, the majority of servers, both on and off the SAN, are backed up over Ethernet. As a rule of thumb, servers that have fewer than 200GB of data can back up over Ethernet, sending their data to servers that have locally attached tape devices. Even with SANs in place, most organizations don't need--or can't afford--to implement SAN-based backups on all their servers. This is largely due to the extra expense of software for tape drive sharing and robotic controls, as well as additional backup hardware, such as SCSI-to-Fibre Channel (FC) routers, host bus adapters (HBAs) or Fibre Channel Arbitrated Loop (FC-AL) edge switches. Except for large database and application servers, most hosts are perfectly suited for backups over Ethernet.
It's not uncommon during incremental backups for servers to spend an inordinate amount of time examining the file system to identify and capture backup data and then only copy 10MB to tape. As a result, most backup jobs are multiplexed. Even without compression, tape drives are too fast for a single host to stream over Ethernet. Multiplexing allows multiple backup jobs to be sent concurrently to a single tape drive in order to minimize the start-stop reposition, or back-hitching. The result is having a backup tape interleaved with multiple hosts' data.
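The interleaving that multiplexing produces can be illustrated with a small sketch. This is a simplified model, not how any particular backup product lays out tape: the host names, block labels and the round-robin ordering are all assumptions made for illustration.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several hosts round-robin onto one tape.

    `streams` maps a (hypothetical) host name to its list of backup blocks.
    Returns the tape as an ordered list of (host, block) pairs.
    """
    tape = []
    hosts = list(streams.keys())
    # Take one block from each host per round; shorter streams run out early.
    for round_blocks in zip_longest(*streams.values()):
        for host, block in zip(hosts, round_blocks):
            if block is not None:
                tape.append((host, block))
    return tape


def restore(tape, host):
    """Read the tape front to back and keep only one host's blocks.

    Returns the host's blocks plus the fraction of tape blocks
    that were actually useful for this restore.
    """
    blocks = [block for owner, block in tape if owner == host]
    return blocks, len(blocks) / len(tape)


# Illustrative data only.
streams = {
    "hostA": ["A0", "A1", "A2"],
    "hostB": ["B0", "B1"],
    "hostC": ["C0", "C1", "C2", "C3"],
}
tape = multiplex(streams)
blocks_a, useful_fraction = restore(tape, "hostA")
```

In this model, restoring hostA means passing over all nine blocks on the tape to recover three, so only a third of what the drive reads is useful to that restore.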
This greatly speeds up the backup process, but unfortunately results in less-than-optimal recovery performance. Restoring one stream of data that's been interleaved with other streams doesn't allow the tape drive to run at full speed during the recovery process. Tape backups also carry an inherent risk of media failure. Because tapes aren't RAID-protected, most organizations keep multiple copies of tapes and minimize the use of incremental backup sets. Because a single bad tape can cause a large restore to fail, the more tapes the backup resides on, the greater the chance that a tape will cause a failed recovery. Knowing this, your policy for a typical tape rotation should be to perform weekly full backups.
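To see why the number of tapes in a restore matters, consider a rough model in which each tape fails independently with the same probability. Both the independence assumption and the 1% failure rate used below are illustrative assumptions, not measured figures.

```python
def restore_failure_probability(tapes_needed, per_tape_failure):
    """Chance that at least one of the tapes needed for a restore is bad.

    Assumes every tape fails independently with the same probability,
    which is a simplification made for illustration.
    """
    return 1.0 - (1.0 - per_tape_failure) ** tapes_needed


# Hypothetical 1% per-tape failure rate.
full_only = restore_failure_probability(25, 0.01)           # one full backup
full_plus_incr = restore_failure_probability(25 + 4, 0.01)  # full + 4 incrementals
```

Under these assumptions the risk climbs with every additional tape the restore depends on, which is the reasoning behind keeping tape-based incremental sets short.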
Assuming a 2TB storage environment with a 50GB daily change rate using 80GB tape media, more than 541 tapes would be required using this rotation/retention schedule, as follows:
(One tape for daily incrementals x four incrementals/week x four-week retention) + (25 tapes for weekly full backups x three weeks/month x three-month retention) + (25 tapes for monthly full backups x one monthly full/month x 12-month retention) = 16 + 225 + 300 = 541.
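The tape count above can be reproduced with a short calculation. This is a minimal sketch: the function and parameter names are made up for illustration, and 2TB is taken as 2,000GB so that a full backup fits on exactly 25 tapes, as in the example.

```python
import math


def tape_rotation_count(
    total_gb=2000,                  # 2TB environment, taken as 2,000GB (assumption)
    daily_change_gb=50,
    tape_gb=80,
    incrementals_per_week=4,
    incremental_retention_weeks=4,
    weekly_fulls_per_month=3,
    weekly_retention_months=3,
    monthly_fulls_per_month=1,
    monthly_retention_months=12,
):
    """Count the tapes needed by the tape-only rotation schedule."""
    tapes_per_incremental = math.ceil(daily_change_gb / tape_gb)  # 1 tape
    tapes_per_full = math.ceil(total_gb / tape_gb)                # 25 tapes

    incrementals = (tapes_per_incremental * incrementals_per_week
                    * incremental_retention_weeks)
    weekly_fulls = (tapes_per_full * weekly_fulls_per_month
                    * weekly_retention_months)
    monthly_fulls = (tapes_per_full * monthly_fulls_per_month
                     * monthly_retention_months)

    return {
        "incrementals": incrementals,
        "weekly_fulls": weekly_fulls,
        "monthly_fulls": monthly_fulls,
        "total": incrementals + weekly_fulls + monthly_fulls,
    }


counts = tape_rotation_count()
```

With the default values, the three terms come to 16, 225 and 300 tapes. Changing `tape_gb` or the retention periods shows how sensitive the total is to media capacity and policy.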
As a side note, tape media is inherently prone to overconsumption. Tapes are often used for personal copies, thrown away at the first sign of wear, rarely used to their full media capacity and are occasionally discarded after their retention period expires. When determining the amount of tape necessary for your backups, it's best to overestimate.
The advantage of a disk-tape backup solution is that it changes the backup schedule and the rotation/retention policy. Backing up daily incrementals to disk obviously saves on tape media, but it also eliminates the need to conduct full backups every week. Disk-based backups don't suffer from the disadvantages that tape backups have. Disk backups are RAID-protected (no loss in data integrity using multiple incremental backup sets), don't require multiplexing (random-access device vs. serial) and are always online for quick recoveries.
In fact, using disk for backup data offers better business continuance. Disk storage can be made highly available, whereas most tape solutions are prone to failure. If a tape library has a channel failure with the robotics, tape devices are unavailable for backup and recovery. Using disk-based backups, the rotation schedule can leverage incremental backups more effectively by eliminating the need for weekly full backups.
To maintain proper data retention, weekly full backups are replaced with cumulative incrementals kept on disk for three months. After that time, the backup images expire and are purged from the file system (see "Rotation schedule with incremental backups"). Not only does this save 257 tapes, it also minimizes media management associated with those tapes, improves backup and recovery performance and can reduce the need to invest in additional tape technology.
Looking at the numbers
It's generally accepted that disk storage has a lower TCO than tape-based storage due to its ease of use and lower management overhead. Disk is also multifunctional compared to single-use tape. However, the inconvenience of having to manage and move tapes is greatly offset by the lower cost of the media. In our example, disk storage requirements will increase by roughly 3TB: (50GB daily for incrementals x four incrementals/week x four-week retention) + (250GB weekly for cumulative incrementals x three weeks/month x three-month retention) = 800GB + 2,250GB = 3,050GB. Consequently, there's a balance between what data is backed up to disk and what data is sent to tape.
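The disk capacity estimate can be worked through the same way. This is a sketch using the figures from the example; the function and parameter names are hypothetical, and the result ignores RAID overhead, file system overhead and compression.

```python
def disk_backup_capacity_gb(
    daily_incremental_gb=50,
    incrementals_per_week=4,
    incremental_retention_weeks=4,
    weekly_cumulative_gb=250,
    cumulatives_per_month=3,
    cumulative_retention_months=3,
):
    """Estimate the disk space needed to hold incrementals for their retention period."""
    daily = (daily_incremental_gb * incrementals_per_week
             * incremental_retention_weeks)
    cumulative = (weekly_cumulative_gb * cumulatives_per_month
                  * cumulative_retention_months)
    return {"daily": daily, "cumulative": cumulative, "total": daily + cumulative}


capacity = disk_backup_capacity_gb()
```

With the default values, daily incrementals account for 800GB and cumulative incrementals for 2,250GB, so most of the added disk goes to the cumulative sets that replace the weekly full backups.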
Using both disk and tape for backups is not only cost-effective, but also provides a higher level of availability and business continuance (see "Using disk and tape for incremental backups"). Specific to database environments, another trade-off is the management of database transaction logs. Normally, full database backups truncate log files after they're copied to tape. Any logs that have already committed their transactions to the database are either archived or deleted. As indicated in "Rotation schedule with incremental backups," full backups are performed only at the end of every month, so the database log files remain on disk between them. As a result, database log files will need to be monitored and managed.
Today, most organizations perform multiple backups during the day. Generally, this is accomplished through snapshot copies generated from the file system (Network Appliance's WAFL, Veritas VxFS), volume manager and utilities (Veritas VxVM, Sun StorEdge Instant Image) or through storage arrays (Sun/HDS ShadowImage, HP Business Copy VA). Either way, the data is readily available and safely stored on disk. Most organizations will then make a tape copy from the end-of-day snapshot--or better yet, replicate the snapshot to an off-site facility where tape copies are created.
Generating incremental snapshots during the day provides an extra level of data availability and is highly recommended. These point-in-time copies, however, don't replace the need for third-party backup software. Commercial backup software provides proper auditing, management and configuration controls as well as extensive recoverability options for data protection requirements.
Mixing disk and tape for backups provides better flexibility, improved performance and enhanced business continuance. Disk backups are based on open standards and seamlessly integrate into an existing storage network infrastructure. The costs associated with using disk for backup purposes need to be balanced against service level agreements and budgetary constraints. It's best for organizations to invest in extra disk capacity when budgeting, designing or building a SAN. Investing in additional capacity during a SAN deployment offers a lower cost of entry for incorporating disk storage into backup processes. In the long run, disk and tape together offer a cost-effective, highly available data protection solution.