Backup applications vary considerably in their implementation, performance and management. Here are some tuning factors to consider for leading backup applications:
VERITAS NETBACKUP RESTORE CONSIDERATIONS
Fragment size. Some applications like NetBackup let you specify backup fragment size. For instance, a fragment size of 2GB means a 100GB backup will be broken into 50 separate fragments when written to tape. When restoring data, rather than scanning the entire 100GB, NetBackup forwards the tape to the specific fragment containing the requested data, resulting in faster restores.
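The fragment arithmetic above can be sketched quickly. This is an illustrative calculation using the 100GB/2GB figures from the example, not NetBackup code, and the seek model is deliberately simplified:

```python
# Illustrative sketch of why smaller fragments speed restores.
# Figures (100GB backup, 2GB fragments) come from the NetBackup
# example above; the linear-scan model is a simplification.

def fragment_count(backup_gb, fragment_gb):
    """Number of tape fragments a backup image is split into."""
    return -(-backup_gb // fragment_gb)  # ceiling division

backup_gb = 100
fragment_gb = 2
print(fragment_count(backup_gb, fragment_gb))  # 50 fragments

# Without fragment indexing, a worst-case restore scans the whole
# 100GB image. With fragments, the application fast-forwards to the
# one fragment holding the requested data, so the scan is bounded
# by the fragment size.
print(f"scan per restore: at most {fragment_gb}GB instead of {backup_gb}GB")
```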
TIVOLI STORAGE MANAGER (TSM) RESTORE CONSIDERATIONS
Often, full TSM client restores take forever because the client data is spread across too many tape volumes. As dozens, or hundreds, of volumes are mounted for a restore, critical restore time is lost to tape mounts, dismounts, mount-wait settings and data seek time. The key to successfully leveraging tape for restore in a TSM environment boils down to application policies and data classification. Streaming high-volume/large-file client data directly to tape maximizes tape drive performance in most environments. The same holds true for data restores from tape. TSM application policies also play an integral role in optimizing tape performance. TSM policies that directly affect tape restore performance include collocation, maximum number of mount points and resource utilization.
Collocation is a storage pool configuration parameter that can be configured to collocate data by client, by group of clients (available in v. 5.2 only) or by file space. Collocation by client means that a particular client's backup data is stored on tapes only for that client. This reduces the number of mount points required for a large restore and lowers the restore time. The downside: lower tape utilization requires more library space, and space reclamation processing can become time-consuming with a greater distribution of client data across physical volumes.
A handful of server and client settings also dramatically affect the amount of data TSM can move to and from tape for a given client. Increasing the client's maximum number of tape drive mount points and also increasing the client's resource utilization setting allows a client to run multiple data sessions to or from multiple tape devices.
For critical clients, many TSM users also run the occasional selective (full) backup to further reduce the number of mounts required for restores. Other critical considerations include network infrastructure settings and policy so that your network doesn't become the bottleneck once you optimize the application's tape use.
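A back-of-the-envelope model illustrates why collocation and selective backups pay off at restore time. The mount, seek and throughput figures below are hypothetical placeholders, not TSM measurements; only the general shape (per-volume overhead plus streaming time) reflects the behavior described above:

```python
def restore_minutes(data_gb, volumes, mount_min=2.0, seek_min=1.0,
                    drive_mb_s=30.0):
    """Rough restore-time model: per-volume mount/dismount and seek
    overhead, plus time to stream the data off tape. All overhead
    figures are illustrative assumptions, not TSM measurements."""
    overhead = volumes * (mount_min + seek_min)
    streaming = (data_gb * 1024) / drive_mb_s / 60  # minutes
    return overhead + streaming

# The same 200GB client restore, collocated (4 tapes) vs.
# scattered across the library (80 tapes):
scattered = restore_minutes(200, volumes=80)
collocated = restore_minutes(200, volumes=4)
print(f"scattered: {scattered:.0f} min, collocated: {collocated:.0f} min")
```

The streaming term is identical in both cases; everything the scattered restore loses, it loses to mounts and seeks, which is exactly what collocation and periodic selective backups attack.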
LEGATO NETWORKER RESTORE CONSIDERATIONS
Pools. Most backup applications allow data to be classified based on various criteria such as data type, backup start time, etc. Within Legato NetWorker, pools are used to distinguish what data is sent to specific volume sets. By default, NetWorker sends all data to one pool. Once a volume is assigned to a pool, only data that meets the specific criteria will be written to that volume. Most users split data based on data type, retention period or for off-site purposes when cloning is not in use. Pools should be used with caution. They provide a great way to separate mission-critical data, but can make NetWorker more difficult to administer.
Dedicated Storage Nodes. NetWorker provides the option to dedicate tape drives to specific clients, thereby upgrading them to storage nodes. The client will need to directly attach--via a SCSI or a storage area network (SAN) connection--to the library/tape drives. There are two ways to license a storage node. First, a full storage node license can be purchased, which allows backing up all local data directly to tape, as well as the ability to send other client data to the library/tape drives directly attached to the storage node. The second option is a dedicated storage node license, which doesn't allow other client machines to send their data to the storage node. The dedicated storage node license is considerably less expensive than a full storage node license, making it a wise investment when performing backups of large servers. Users also can implement dynamic drive sharing (via a SAN) to allow any system attached to the SAN to upgrade to dedicated storage nodes and to share multiple tape drives for backup and restore. The data is not multiplexed between clients--only one storage node can allocate the tape drive at a given time--allowing faster client restores.
Cloning and staging. All enterprise backup software can create duplicate copies of data. Cloning, combined with an off-site media rotation schedule, ensures that copies of data are available in a disaster recovery situation while retaining an onsite data set for recovery. Cloning in NetWorker demultiplexes the data, creating contiguous savesets on the cloned volume. In restore situations, using the clone volume lets NetWorker spend less time seeking through the tape and more time restoring the data. Users with large amounts of disk available can also stage data to disk first (which also demultiplexes it) and then clone the same demultiplexed data to tape later. In NetWorker v. 7, the advanced file type device option allows simultaneous read/write, which can greatly speed up the cloning process.
--Natalie Mead, with Nate Kosta
Exacerbating the tape restore problem is that few companies proactively monitor, report and remediate issues within their tape-based backup environments. This requires a great deal of effort and manpower, and an understanding of the tape infrastructure. In many cases, time is the limiting factor, leaving risky restores as an unavoidable consequence. In order to minimize the chances of an unsuccessful or time-consuming restore, it's essential that you prepare by optimizing your backup infrastructure for recovering data. This involves developing best practices for the backup infrastructure and refining overall operational approaches.
Best practices for better restores
Critical factors that are related to restore performance include backup application configuration, network configuration, media management and the client environment during a restore. The following guidelines will help increase the likelihood of successful file restores.
It's important to reduce disk drive contention for restored data. During restores, you should disable applications that may be accessing the same disks to which the data is being restored, and disable packet-reading software as well. Then there's virus-protection software, which, when set to its highest protection level, scans every incoming and newly created file. During a restore, the recovered files appear as new files and are scanned, significantly slowing down the restore.
Some clients have too much data to back up over the network within an allocated backup window. For those hosts, backing up to dedicated tape drives can reduce the amount of time required to back up and recover data. Also, when possible, tune the network buffer size of the client's network card to match the tape drive buffer. This ensures that the recovered packets do not overrun or underfill the buffers. It also will help to modify the data transfer buffer sizes to match the tape drives. If data is sent in packets that are too small, the drives will end up spinning cycles waiting for data, and there will be empty space between data blocks on the tape. The further data is spread out on the tape, the longer it will take to restore.
You should also make sure you match the throughput of the host bus adapter to the drive throughput. If you attach 10 LTO-2 drives (30MB/sec each, for a total of 300MB/sec) to one 1Gb/sec host bus adapter (a theoretical maximum of 128MB/sec), data won't stream to the drives during backup. The sporadic nature of the data transfer will spread the data blocks across the tape, requiring even more time to restore the data.
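As a quick sanity check for this kind of sizing, using the figures from the text (this is an illustrative calculation, not vendor tooling):

```python
def hba_oversubscribed(num_drives, drive_mb_s, hba_mb_s):
    """True if the aggregate drive throughput exceeds the HBA's
    bandwidth, meaning the drives cannot all stream at once."""
    return num_drives * drive_mb_s > hba_mb_s

# Figures from the text: 10 LTO-2 drives at 30MB/sec each on a
# 1Gb/sec HBA with a theoretical maximum of 128MB/sec.
print(hba_oversubscribed(10, 30, 128))  # True: 300MB/sec offered vs. 128
print(hba_oversubscribed(4, 30, 128))   # False: 120MB/sec fits
```

When the check returns True, the drives start and stop instead of streaming, scattering data blocks across the tape and lengthening restores.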
It's also important for you to regularly expire your media. While the type of media you use does not typically affect restore times, the condition of the media does. As media experiences more and more read/write passes, the integrity of the media begins to break down, which can cause media errors. It's possible that data will be written to tape successfully, but then won't be readable because of the media's degradation. You should also be sure to clean your drives, too. If the backup fails because of dirty tape drives, all the preparation in the world won't help you.
This next tip may sound obvious, but you should make certain that your drives can actually be kept streaming. The throughput of new tape drives often exceeds the total throughput of the data sent to them (slower networks or too few host bus adapters are typical causes). The result may be that the 10 new LTO-2 drives you just purchased actually run slower than the DLT7000 drives they replaced. There are several ways to remedy this problem, but most involve additional expenditures, such as upgrading the network infrastructure or replacing the backup server with a higher-end system.
An alternative way to balance the tape, network and backup server infrastructure is to reduce the total number of drives used during backups. Not only will this improve backup performance, it will also leave some drives free for restores in case they're needed. If a restore request arrives during the backup window and all the drives are actively backing up data, either the restore waits until the backups finish, or active backup jobs are killed to handle the restore request.
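As a rough sizing sketch of that trade-off (the 90MB/sec aggregate figure is a hypothetical example, not from the article):

```python
def drives_to_stream(aggregate_mb_s, drive_mb_s):
    """How many tape drives a given aggregate backup throughput can
    keep streaming; the remainder are better left free for restores."""
    return int(aggregate_mb_s // drive_mb_s)

# Hypothetical numbers: a backup server pushing 90MB/sec to LTO-2
# drives rated at 30MB/sec can keep only three drives streaming.
usable = drives_to_stream(90, 30)
print(usable)  # 3

total_drives = 10
print(f"leave {total_drives - usable} drives free for restores")
```

Running more drives than the data stream can feed doesn't speed up backups; it just guarantees that none of the drives streams and that none is free when a restore request arrives.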
If the data is critical to your business operations (and therefore has a short recovery time objective), you should consider implementing additional solutions, such as snapshot or raw-disk backups, to improve backup and recovery performance. Either of these may incur additional expenditures, but if the data truly is mission-critical, it becomes easier to justify the cost of going beyond traditional tape backup.
This was first published in October 2004