This article can also be found in the Premium Editorial Download "Storage magazine: Tips for unifying storage management."
|Three ways to use disk for backup|
Another way to reduce the effects of forever incremental (or progressive incremental, as TSM calls it) is to keep the backups of a given system together on tape. If you mix the backups of several different systems onto a set of tapes, you increase the number of tapes that must be loaded to restore each system. However, if you tell each system's backups to stay together, you minimize the number of tapes needed to restore each system.
NetBackup users could create a pool for each critical system. TSM users can turn on collocation. Just remember that either of these procedures requires at least one tape per system. However, you could specify several smaller pools of tapes, and point subsets of clients to each pool. That would reduce the effects of forever incremental without having to go to the extreme of complete collocation. You could also mix and match these methods based on the criticality of a given client.
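To make the trade-off concrete, here is a minimal sketch that counts the tapes a restore would have to mount under three layouts: fully mixed, fully collocated, and a middle ground with a dedicated pool for one critical client. The tape labels, client names and the tapes_needed helper are all hypothetical; nothing here is TSM or NetBackup syntax.

```python
# Hypothetical model: each tape label maps to the set of clients whose
# backups are stored on that tape. These names are illustrative only.

def tapes_needed(tapes, client):
    """Return the labels of the tapes that must be mounted to restore one client."""
    return sorted(label for label, clients in tapes.items() if client in clients)

# Mixed: every night's incrementals from all clients land on the current tape.
mixed = {
    "T01": {"web", "db", "mail"},
    "T02": {"web", "db", "mail"},
    "T03": {"web", "db", "mail"},
}

# Fully collocated: each client's backups stay on that client's own tape.
collocated = {
    "T01": {"web"},
    "T02": {"db"},
    "T03": {"mail"},
}

# Middle ground: the critical client "db" has its own pool,
# and the remaining clients share a second, smaller pool.
pooled = {
    "T01": {"db"},
    "T02": {"web", "mail"},
    "T03": {"web", "mail"},
}

for name, layout in (("mixed", mixed), ("collocated", collocated), ("pooled", pooled)):
    print(name, {c: len(tapes_needed(layout, c)) for c in ("web", "db", "mail")})
```

In this toy layout, restoring "db" mounts three tapes when backups are mixed and one tape in either of the other two arrangements, while the less critical clients in the shared pool each need two.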
Note that incremental-forever techniques, when used with a traditional backup system, currently only work with traditional file system backups. Databases still require the occasional full backup. Only replication-based backup systems can perform incremental-forever backups of databases.
Although many of the following disruptive technologies can be integrated into a traditional backup system, many of them could completely replace an older backup system. In other words, these technologies are turning the backup world upside down.
One example is replication-based backup. If you were to look at the typical data protection hierarchy diagram, it would start with backups, followed by mirroring and RAID, high availability and finally replication. Replication used to be the thing you did once you'd done everything else. With replication-based backup, that's no longer the case. The big advantage of replication-based backup systems is that they usually use block-level incremental backups to constantly maintain a full native-format copy of the data on the backup system. There is no need to perform full backups again.
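The idea of keeping a full copy current by shipping only changed blocks can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular product works: the 4-byte block size is chosen for readability, the function names are hypothetical, and the changed blocks are found by comparing old and new data, whereas real products typically track writes as they happen.

```python
BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow

def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def changed_blocks(old, new):
    """Return ({index: block} for blocks that differ, block count of new)."""
    old_blocks = split_blocks(old)
    new_blocks = split_blocks(new)
    delta = {}
    for i, block in enumerate(new_blocks):
        if i >= len(old_blocks) or old_blocks[i] != block:
            delta[i] = block
    return delta, len(new_blocks)

def apply_delta(replica, delta, block_count):
    """Bring the replica up to date using only the changed blocks."""
    blocks = split_blocks(replica)[:block_count]       # drop blocks past the new end
    blocks += [b""] * (block_count - len(blocks))      # make room for appended blocks
    for i, block in delta.items():
        blocks[i] = block
    return b"".join(blocks)

primary_v1 = b"AAAABBBBCCCC"
replica = primary_v1                      # initial full copy, made once
primary_v2 = b"AAAAXXXXCCCCDDDD"          # one block changed, one appended

delta, count = changed_blocks(primary_v1, primary_v2)
replica = apply_delta(replica, delta, count)
```

Only two of the four blocks cross the wire here, yet the replica ends up as a complete, native-format copy of the new data.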
One disadvantage is that if data is fully replicated, logical corruption--such as the deletion of a file--is replicated as well. The system needs to be able to create and maintain states of the replicated data. This can be done via copy-on-write snapshots, by logging or by backing up the replicated data using a traditional backup system.
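The copy-on-write approach can be illustrated with a toy replica that preserves the old version of a file the first time it changes after a snapshot. The class and method names below are hypothetical, and real products generally work on blocks rather than whole files; the sketch only shows why a replicated delete need not destroy the earlier state.

```python
_ABSENT = object()  # marks "this path did not exist when the snapshot was taken"

class SnapshotReplica:
    """Toy replication target that keeps point-in-time states via copy-on-write."""

    def __init__(self):
        self.live = {}        # current replicated state: path -> content
        self.snapshots = []   # one dict per snapshot: path -> preserved old content

    def snapshot(self):
        """Start a new snapshot; nothing is copied until data changes."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def _copy_on_write(self, path):
        # Before changing a path, save its old value in every snapshot
        # that has not already preserved it.
        old = self.live.get(path, _ABSENT)
        for saved in self.snapshots:
            saved.setdefault(path, old)

    def replicate_write(self, path, content):
        self._copy_on_write(path)
        self.live[path] = content

    def replicate_delete(self, path):
        self._copy_on_write(path)
        self.live.pop(path, None)

    def read(self, path, snapshot_id=None):
        """Read the live copy, or the copy as of a given snapshot."""
        if snapshot_id is not None:
            saved = self.snapshots[snapshot_id]
            if path in saved:
                value = saved[path]
                return None if value is _ABSENT else value
        return self.live.get(path)

replica = SnapshotReplica()
replica.replicate_write("report.doc", b"version 1")
before_delete = replica.snapshot()
replica.replicate_delete("report.doc")   # the mistake is replicated faithfully
```

After the delete, replica.read("report.doc") finds nothing in the live copy, but replica.read("report.doc", before_delete) still returns the original content.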
Replication-based backup comes in a variety of flavors, and is being provided by a number of companies that can be divided into three groups. There is storage-based replication (see "Storage and host-based replication"), such as EMC Corp.'s SRDF and Network Appliance Inc.'s SnapVault. The largest group contains host-based replication products, such as Veritas' Volume Replicator or NSI Software's Double-Take. A newer group contains independent products that are trying to combine the features of both (see "Replication-based, disk-based backup system").
Another difference with the implementation in "Storage and host-based replication" is that the tape-based backup of the replicated system is replaced with another replicated system in an off-site location. If this system is also maintaining state data using snapshots or logging, there's now an on- and off-site backup without tape. The only reason to use tape here is for archiving.
In order to consider a product a replication-based product, the data must remain in its native format for at least one leg of the system. However, there is another type of product that is similar, but does not maintain data in its native format. Storactive Inc. provides real-time backup (with logging) of applications such as Microsoft Exchange. While the backup system does not maintain the data in its native format, it provides constant, replication-type backup of Exchange without ever performing a full backup again.
Another very interesting product area contains those products that recognize that all data consists of blocks of ones and zeros, many of which are duplicated throughout your environment. In various ways, these products treat each file (or sometimes each block) as a backup object. The big advantage of these products is that if a particular object has been backed up before, it doesn't need to be backed up again. This is also referred to as single instance store. Of course, a true single instance store system would never need to perform a full backup again, as most of the files that would be backed up in a full backup already reside on the backup system.
|Block-based single instance store|
A block-based single instance store stores only unique blocks or files in the backup system.
File-based single instance store systems compare files of similar names to ensure that they are the same, and only store one copy of a given file on the backup system. Block-based systems actually look inside a file, and store only unique blocks on the backup system (see "Block-based single instance store"). File 1 consists of blocks a, b, c and d. When it is backed up, all of these blocks are new, so they are transferred and stored by the backup system. When file 2 is backed up, it contains blocks e and f plus a block that is identical to block d in file 1. Blocks e and f are new, so they are transferred to the backup system. However, block d is not sent again. The backup system only takes note that block d resides somewhere else as well. Since all blocks are simply patterns of ones and zeros, block d could be inside any two (or more) files, regardless of file, application or operating system type.
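The file 1 and file 2 example above can be sketched as a small in-memory store. This is an illustration only: the article does not say how products recognize identical blocks, so the use of SHA-256 fingerprints is an assumption, as are the BlockStore class, its method names, and the choice to model file 2 as exactly blocks d, e and f.

```python
import hashlib

class BlockStore:
    """Toy block-based single instance store (hypothetical, in-memory)."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block contents, stored once
        self.files = {}    # file name -> ordered list of fingerprints

    def backup(self, name, blocks):
        """Back up a file given as a list of blocks; return how many blocks were sent."""
        sent = 0
        recipe = []
        for block in blocks:
            # Assumption: a SHA-256 digest identifies a block's contents.
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.blocks:
                self.blocks[fingerprint] = block   # new block: transfer and store it
                sent += 1
            recipe.append(fingerprint)             # known block: just note the reference
        self.files[name] = recipe
        return sent

    def restore(self, name):
        """Rebuild a file's blocks from the stored unique blocks."""
        return [self.blocks[fp] for fp in self.files[name]]

store = BlockStore()
sent_file1 = store.backup("file1", [b"a", b"b", b"c", b"d"])
sent_file2 = store.backup("file2", [b"d", b"e", b"f"])
```

Backing up file 1 sends four blocks and backing up file 2 sends only two, leaving six unique blocks in the store, yet both files can be restored in full.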
There are some systems that are designed from the ground up to provide file or block-level single instance store, such as Avamar or Connected. However, such functionality is starting to creep into other products such as TSM and NetBackup.
All of the backup systems mentioned in this article are available today. (There are many companies that provide similar systems that are not listed in this article due to reasons of space. Check the software directory at http://www.storagemountain.com/software-directory.html for a comprehensive, up-to-date listing of such products.) Whether or not any of them are right for your storage environment will depend on your particular application, and how far along you fall on the adoption curve. Some environments prefer systems that are tried-and-true; others prefer cutting-edge technology. The choice, along with the advantages and disadvantages, is up to you.
This was first published in February 2004