This article can also be found in the Premium Editorial Download "Storage magazine: Is storage virtualization ready for the masses?."
A snapshot is an image or copy of a defined collection of data, created instantly at a point in time. Copies are made almost immediately within the disk subsystem, regardless of the size of the volume.
A primary use for a snapshot is to facilitate non-disruptive backups. Essentially, the snapshot image becomes the source of the backup. Once the application has been quiesced, the copy takes only a moment to create, so the user shouldn't notice any delay.
Traditional backups require the application to be shut down during the backup routine. This process typically occurs at night or during off-hours. As more data has to be copied to tape, the operations staff runs a race to sunrise each night. Systems and processes must be adjusted periodically to meet the morning production deadline.
Since the snapshot provides a near-line, or additional disk-based copy of the data, the snapshot can be used as a source for restoring information. The most common reason to restore information is user error. For example, a user may inadvertently delete a file or make changes that need to be reversed. The ability to have another copy of the data readily available on disk provides a quick and easy way to locate and reinstate selected files.
Snapshot images also provide a convenient source for testing and training environments and for data mining purposes. Traditional methods of duplicating large amounts of data can be expensive and time-consuming, so the efficiency of snapshots is becoming increasingly valuable.
Know what you're getting
"Not all snapshots are the same - in fact, they vary from product to product," says Dennis Martin, storage management software analyst for the Evaluator Group. "The best advice is to make sure you know what you're trying to accomplish, and then investigate how each product meets your objectives so you can select the best fit."
Implementations of snapshots vary from vendor to vendor. Some implementations allow the snapshot image to be written or updated, while others may be tightly integrated with the backup software. Additionally, some techniques require less disk space for the copy. The two primary techniques are copy-on-write and split-mirror.
When a copy of data is requested using the copy-on-write technique, the disk subsystem simply sets up a second pointer - a snapshot index - and represents it as a new copy. Just as a Windows shortcut to an application appears to be a complete copy of the application, the snapshot volume appears to be a full copy of the data. To the user, it's the same.
Here's how it works: A snapshot is a logical copy of the data, created by saving the original data to a snapshot index whenever data in the base volume is updated. Essentially, the snapshot process creates an empty snapshot index, which then holds the original values of any data that changes in the base volume after the snapshot is created. Creating the snapshot takes only as long as needed to build the snapshot index, which again is nearly instantaneous. It's recommended that the base volume be quiesced during the snapshot, so a stable image of that moment in time is available.
The snapshot is seen by combining the base volume data with the snapshot index, which contains the original values of data that has since changed in the base volume. Thus, the snapshot gives an exact image of the data at the moment the snapshot was taken. This copy-on-write technology enables the instantaneous nature of the snapshot, while only requiring a fraction of the base volume disk space (see "Taking a copy-on-write snapshot").
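The mechanism described above can be illustrated with a minimal Python sketch. It is a toy model, not any vendor's implementation: the class names `BaseVolume` and `Snapshot`, the use of a list as a block device and the dictionary used as the snapshot index are all assumptions made for illustration.

```python
class BaseVolume:
    """A toy block device: a fixed-size list of block values (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []  # active snapshots that must see original data

    def write(self, block_no, value):
        # Copy-on-write: before overwriting a block, save its current value
        # into the index of every active snapshot.
        for snap in self.snapshots:
            snap.preserve(block_no, self.blocks[block_no])
        self.blocks[block_no] = value

    def read(self, block_no):
        return self.blocks[block_no]


class Snapshot:
    """A logical copy: an initially empty index layered over the base volume."""

    def __init__(self, volume):
        self.volume = volume
        self.index = {}  # block number -> original value at snapshot time
        volume.snapshots.append(self)

    def preserve(self, block_no, original):
        # Only the first overwrite matters; later writes to the same block
        # must not replace the value saved from the moment of the snapshot.
        self.index.setdefault(block_no, original)

    def read(self, block_no):
        # Combine the snapshot index with the base volume: changed blocks
        # come from the index, unchanged blocks from the base volume.
        if block_no in self.index:
            return self.index[block_no]
        return self.volume.read(block_no)


volume = BaseVolume(["a", "b", "c", "d"])
snap = Snapshot(volume)   # creating the snapshot copies no data
volume.write(1, "B")      # original "b" is saved to the index first
volume.write(1, "BB")     # block 1 is already preserved; index is unchanged
snapshot_view = [snap.read(i) for i in range(4)]
print(snapshot_view)      # ['a', 'b', 'c', 'd']
print(volume.blocks)      # ['a', 'BB', 'c', 'd']
```

In this sketch the snapshot still presents the data as it was when the snapshot was taken, while its index holds a single block, which is why the copy needs only a fraction of the base volume's space.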
Beyond being nearly instantaneous, a copy-on-write snapshot is space-efficient. The disk space requirements for a snapshot copy average 10% to 20% of the base volume space. The actual space depends on how long the snapshot is active and how many writes are made to the base volume during that time, because each changed piece of original data must be saved to the snapshot index. Except in a heavy-write environment or when the copy must remain active for a long time, copy-on-write is efficient.
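As a rough illustration of how write activity drives the space requirement, the following Python sketch estimates the size of the snapshot index as a fraction of the base volume. It assumes the index stores one whole block per distinct block overwritten and ignores any metadata overhead; the function name `snapshot_space_fraction` and the sample numbers are hypothetical.

```python
def snapshot_space_fraction(written_blocks, total_blocks):
    """Estimate snapshot index size as a fraction of the base volume.

    written_blocks: block numbers written while the snapshot was active
                    (repeats allowed; only the first write to a block
                    adds an entry to the index).
    total_blocks:   number of blocks in the base volume.
    """
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    distinct_blocks = set(written_blocks)
    return len(distinct_blocks) / total_blocks


# A 1,000-block volume: a few blocks rewritten repeatedly, plus one
# contiguous run of 150 blocks updated once.
writes = [5, 5, 7, 42, 7] + list(range(100, 250))
fraction = snapshot_space_fraction(writes, total_blocks=1000)
print(f"{fraction:.1%}")  # 15.3%
```

Repeated writes to the same block do not increase the estimate, so a workload that keeps updating a small set of blocks stays cheap, while one that touches much of the volume over a long-lived snapshot does not.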
A copy-on-write snapshot is effective as a backup source image. Since the disk space requirements are less than a full volume copy, periodic snapshots can be made throughout the day as copy points to reference in the event a restore is needed - for example, to restore a file that was inadvertently corrupted or deleted. If only one hour of lost productivity can be tolerated, a snapshot could be taken every hour and copied at night to tape for archival or disaster recovery purposes, using the snapshot as the source of the backup.
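The hourly schedule can be made concrete with a short Python sketch that picks the restore point for a given incident. The function name `latest_snapshot_before`, the dates and the times are hypothetical, and the list of snapshot times is assumed to be sorted in ascending order.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def latest_snapshot_before(snapshot_times, incident_time):
    """Return the most recent snapshot taken strictly before the incident,
    or None if no snapshot is that old. snapshot_times must be sorted."""
    position = bisect_left(snapshot_times, incident_time)
    if position == 0:
        return None
    return snapshot_times[position - 1]


# Hourly snapshots from 08:00 through 17:00 on a hypothetical workday.
start = datetime(2002, 6, 3, 8, 0)
hourly = [start + timedelta(hours=h) for h in range(10)]

# A file is accidentally deleted at 14:20.
incident = datetime(2002, 6, 3, 14, 20)
restore_point = latest_snapshot_before(hourly, incident)
lost_work = incident - restore_point

print(restore_point)  # 2002-06-03 14:00:00
print(lost_work)      # 0:20:00
```

With snapshots taken every hour, the work lost in a restore is bounded by the snapshot interval, here at most one hour.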
Managing multiple snapshots
Checkpointing is a tool to manage multiple snapshot images in aggregate. This is beneficial because only one index update is required for the group. It is similar to notifying a family doctor of an address change: only one address change needs to be entered, and the files of all associated family members reflect it.
This was first published in June 2002