This article can also be found in the Premium Editorial Download "IT in Europe: Tips for cost-effective disaster recovery."
Download it now to read this article plus other related content.
The most common type of snapshot is the copy-on-write snapshot. A copy-on-write snapshot system copies a block before it overwrites it with a new block. Typically, the previous version of the block is copied to another volume, which has the advantage of leaving the structure of the source volume unchanged. One would think this would have performance advantages, but the opposite is true. That's because each write requires three separate I/O operations: a read of the previous block, a write of the previous block and a write of the new block. Over time, this can create quite a performance degradation on the primary storage system, which is why it's extremely rare to use a copy-on-write snapshot system for this purpose. Typically, copy-on-write snapshots are only used to create a stable image as a source for another backup system. This can be a traditional backup system that copies the snapshotted volume to a backup system or a more advanced system that replicates the snapshot to another storage system. If the snapshots are replicated, this allows you to leave very few snapshots on the primary system with all previous snapshots stored on the secondary system. This has the effect of minimizing the performance impact of the snapshots on the primary system while maintaining historical versions for operational recovery. If you're using a copy-on-write storage system and wish to move to a near-CDP-style backup, you'll need to adopt one of the approaches that
allows you to limit the number of snapshots on the primary volume.
Redirect-on-write is a less common type of snapshot that writes the new block in a new location, leaving the previous version of the block in its place. The advantage of this approach is that it requires only one I/O operation to update a block (as opposed to three I/O operations with copy-on-write). This is why storage systems using this style of snapshot can store dozens or hundreds of snapshots without a significant degradation in performance. And it's precisely that feature that makes redirect-on-write-style snapshots the preferred snapshot method to use for a near-CDP backup system.
Two caveats about redirect-on-write snapshots
There are two disadvantages to the redirect-on-write-style snapshots. The first is that all blocks -- both current and all previous versions -- are stored in the same volume. Over time, this can cause the current versions of the blocks that comprise a given volume to become fragmented. Be sure to consult with the vendor whose product you're considering to see how they deal with this fundamental design issue of redirect-on-write volumes.
The second, and much more dangerous, disadvantage of redirect-on-write snapshots is that the historical versions of blocks can cause the volume to become full, stopping all further writes to the volume until the issue is corrected. Copy-on-write systems avoid this issue by storing the historical versions of blocks in a different volume. If the history volume becomes full, it only stops updating the snapshots -- the current version of the volume is unaffected. However, redirect-on-write snapshots must keep the current and historical blocks in the same volume, creating the risk of filling up the volume with historical blocks. This is why users who opt for this approach to snapshots on their volumes must keep extra space in reserve, and must constantly monitor the volumes to ensure there's enough reserve space to keep up with the level of changes of any given volume. The more blocks change and the more frequently they're changed, the more space you're going to need for snapshots.
Vendors that don't offer redirect-on-write snapshots often use these disadvantages as FUD (fear, uncertainty and doubt) when talking to potential customers. Don't believe the FUD, but consider it a source of information that must be verified.
The first potential disadvantage (fragmentation) is easy to test for in a proof-of-concept test: Test the performance before/after the creation of dozens or hundreds of snapshots -- after updating thousands of blocks, of course.
The second potential disadvantage is a very real one and simply must be monitored. If you run out of space because of your snapshot data, your volume will stop updating and your application will crash. If you're not experienced with this type of snapshot, follow the vendor's most conservative estimates on how much space to keep in reserve. Over time, monitoring how much space is taken up by snapshots should allow you to develop a much better estimate that's more appropriate for your environment. If you monitor things properly, the worst that should ever happen is that you have to delete more snapshot history than you would to make sure your volume doesn't stop functioning (see "Sampler: Storage systems with redirect-on-write snapshots," below).
Hold that pose
Any type of structured data requires special treatment before creating a snapshot of the volume it's stored on. At a minimum, without this special attention, your app will go into crash recovery mode after recovery and possibly cause a given snapshot to be completely worthless for restore. Therefore, be sure to research the proper way to prepare your application prior to creating a snapshot.
This was first published in January 2011