- W. Curtis Preston, Druva
How do you beat the backup window? Use snapshots and you can forget about backup windows.
Snapshot-based backup, also known as near-continuous data protection (or near-CDP), currently offers one of the most cost-effective, worry-free and efficient ways to provide operational recovery. Before we get into why, let's first look at a few of the alternatives.
Tape systems, when used in conjunction with a traditional data backup system, are the most cost-efficient way to provide long-term storage of backups and archives. However, backups often can't send data as fast as a modern tape drive must receive it to keep streaming. That mismatch creates quite a challenge when performing operational backup and recovery directly to and from tape.
A popular option is to augment tape with a data deduplication target for your traditional backup system. This significantly improves the efficiency of traditional backup systems and reduces the amount of ongoing maintenance they need; however, dedupe systems aren't cheap. And while deduplication technology makes backups more efficient, it does so using a very inefficient process: Move 100 GB across the network, process it to shrink it to 10 GB and store it, and then process it again to restore or copy the 100 GB to tape.
Source deduplication is arguably more efficient than traditional backup as it removes the duplicate data before it's sent across the network. It's also a completely disk-based system that tends to offer good performance for both backup and restore. But restores from source dedupe systems are still performed just like those from traditional systems -- a bulk copy of data from one place to another.
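To make the difference concrete, here is a minimal Python sketch of source-side deduplication. It assumes fixed-size 4 KiB chunks and SHA-256 fingerprints; real products often use variable-size chunking and their own protocols. The client fingerprints each chunk, asks the target which fingerprints it lacks, and sends only those. `DedupeServer`, `source_dedupe_backup` and `restore` are hypothetical names. Note that `restore` still reassembles and returns every byte, which is the bulk copy described above.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; many products use variable-size chunking


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeServer:
    """Hypothetical backup target that stores each unique chunk once, keyed by fingerprint."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        # The client asks which fingerprints the server has never seen.
        return {fp for fp in fingerprints if fp not in self.store}

    def put(self, fp: str, chunk_data: bytes) -> None:
        self.store[fp] = chunk_data


def source_dedupe_backup(data: bytes, server: DedupeServer) -> tuple[list[str], int]:
    """Back up data, sending only chunks the server lacks.

    Returns the recipe (ordered fingerprints needed for a restore) and the
    number of payload bytes that actually crossed the network.
    """
    chunks = chunk(data)
    recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = server.missing(recipe)
    sent = 0
    for fp, c in zip(recipe, chunks):
        if fp in needed:
            server.put(fp, c)
            needed.discard(fp)  # a chunk repeated within this backup is sent only once
            sent += len(c)
    return recipe, sent


def restore(recipe: list[str], server: DedupeServer) -> bytes:
    # A restore still reassembles and copies every byte of the original data.
    return b"".join(server.store[fp] for fp in recipe)


if __name__ == "__main__":
    server = DedupeServer()
    data = bytes(range(256)) * 160  # 40,960 bytes = ten identical 4 KiB chunks
    recipe, sent = source_dedupe_backup(data, server)
    print(len(recipe), sent)  # 10 4096
```

A second backup of the same unchanged data sends nothing at all, because every fingerprint is already on the server.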
Continuous data protection (CDP) systems are also very efficient, and they can offer tighter recovery point objectives (RPOs) than any other type of system. However, they come with their own set of challenges. Most true CDP products are specialized, so they work only with a particular operating system or application. There are a few CDP products designed to work with any operating system or application, but they're often expensive. Still, if you need RPOs measured in seconds or minutes, you should absolutely consider a true CDP system.
Near-CDP: Snap and replicate
Near-CDP systems, which do snapshot and replication-based backups, are very efficient because, like CDP systems, they only transfer and store new blocks of data. The changes (or "deltas") can be easily stored on the primary system and replicated to a secondary system for backup. Snapshots must be replicated or backed up to tape because they depend on the primary volume for their data.
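The following toy model shows why snap-and-replicate transfers so little. Each snapshot is represented as a map from block number to contents, only blocks that differ from the previous snapshot are sent, and the secondary rebuilds the new snapshot from its own copy of the previous one. It's a sketch under simplifying assumptions: real systems track changed blocks as they are written rather than comparing contents, and this model ignores deleted or trimmed blocks. The function names are hypothetical.

```python
# Simplified model: a snapshot is a map from block number to block contents.
Snapshot = dict[int, bytes]


def changed_blocks(prev: Snapshot, curr: Snapshot) -> Snapshot:
    """Blocks that are new or different since the previous snapshot (the "delta")."""
    return {n: data for n, data in curr.items() if prev.get(n) != data}


def apply_delta(base: Snapshot, delta: Snapshot) -> Snapshot:
    """Rebuild the newer snapshot on the secondary from its copy of the older one."""
    rebuilt = dict(base)
    rebuilt.update(delta)
    return rebuilt


if __name__ == "__main__":
    snap1 = {0: b"boot", 1: b"data-v1", 2: b"logs-v1"}
    snap2 = {0: b"boot", 1: b"data-v2", 2: b"logs-v1", 3: b"new"}
    delta = changed_blocks(snap1, snap2)
    print(sorted(delta))  # [1, 3]: only two blocks cross the network
    secondary_history = [snap1]  # the secondary keeps every replicated snapshot
    secondary_history.append(apply_delta(secondary_history[-1], delta))
    print(secondary_history[-1] == snap2)  # True
```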
The true value of a near-CDP system is demonstrated during operational recovery. Its RPO can be measured in minutes, and its recovery time objective (RTO) is measured by how long it takes you to point the application from the primary storage system to the secondary storage system.
CDP vs. near-CDP
Just when is "continuous" not really continuous? In the real world the definition is pretty straightforward, but when it comes to continuous data protection (CDP) and near-CDP, things get a little hazy. The Storage Networking Industry Association (SNIA), a leading storage vendors' trade organization, offers its own definition of CDP:
"Continuous data protection (CDP) is a methodology that continuously captures or tracks data modifications and stores changes independent of the primary data, enabling recovery points from any point in the past."
A true CDP system captures every change or new piece of data as soon as it's committed to disk and immediately replicates that change to another system. Near-CDP performs a similar function but does it periodically -- every 15 minutes, 30 minutes, hour, etc. -- so it's truly not "continuous" at all. Because it offers RTOs and RPOs that are very close to what CDP offers, many people refer to it as near-CDP; however, the term "near-CDP" isn't officially recognized by SNIA.
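The practical difference shows up as the data-loss window: after a failure, you can only go back to the most recent recovery point. Here is a minimal sketch, assuming snapshots are replicated as soon as they're taken. If replication lags, pass the times of the last replicated snapshots instead. The function name is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def data_loss_window(recovery_points: list[datetime], failure: datetime) -> timedelta:
    """Time between the newest recovery point at or before the failure and the failure itself.

    recovery_points must be sorted in ascending order.
    """
    i = bisect_right(recovery_points, failure)
    if i == 0:
        raise ValueError("no recovery point exists before the failure")
    return failure - recovery_points[i - 1]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    every_15 = [start + timedelta(minutes=15 * k) for k in range(8)]  # 09:00 through 10:45
    print(data_loss_window(every_15, datetime(2024, 1, 1, 10, 14)))  # 0:14:00
```

With 15-minute snapshots, the window approaches 15 minutes just before the next snapshot. A true CDP system, which records every committed write, keeps it close to zero.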
Not all snapshots are the same
If you're using one of the leading backup software products, you should ask the vendor how they accomplish near-CDP functionality. Some of them provide it completely within their product, but most accomplish it by controlling and reporting on the snapshot replication capabilities of a storage or virtualization system.
If you're going to rely entirely on snapshots for historical preservation of data, you need to ensure that the existence of dozens or hundreds of snapshots doesn't degrade the performance of your storage system. Therefore, the feasibility of a snapshot-based backup system depends heavily on what type of snapshots your storage system provides.
The most common type of snapshot is the copy-on-write snapshot. A copy-on-write snapshot system copies a block before it overwrites it with a new block. Typically, the previous version of the block is copied to another volume, which has the advantage of leaving the structure of the source volume unchanged. One would think this would have performance advantages, but the opposite is true. That's because each write to a block that hasn't changed since the most recent snapshot requires three separate I/O operations: a read of the previous block, a write of the previous block and a write of the new block. Over time, this can create quite a performance degradation on the primary storage system, which is why it's extremely rare to use copy-on-write snapshots as the long-term store of backup history.
Typically, copy-on-write snapshots are only used to create a stable image as a source for another backup system. This can be a traditional backup system that copies the snapshotted volume to its own storage, or a more advanced system that replicates the snapshot to another storage system. If the snapshots are replicated, you can leave very few snapshots on the primary system and store all previous snapshots on the secondary system. This minimizes the performance impact of the snapshots on the primary system while maintaining historical versions for operational recovery. If you're using a copy-on-write storage system and wish to move to a near-CDP-style backup, you'll need to adopt one of the approaches that allows you to limit the number of snapshots on the primary volume.
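A toy model makes the I/O cost visible. This Python sketch assumes a single active snapshot, keeps preserved blocks in a separate snapshot area, and counts data I/Os only; metadata updates are ignored. The class and method names are hypothetical.

```python
class CopyOnWriteVolume:
    """Toy copy-on-write volume that counts data I/Os (metadata updates are not counted).

    Simplification: only one snapshot is active at a time.
    """

    def __init__(self, nblocks: int) -> None:
        self.blocks = [b""] * nblocks          # live data, updated in place
        self.snap_area: dict[int, bytes] = {}  # original contents of blocks changed since the snapshot
        self.has_snapshot = False
        self.io_ops = 0

    def take_snapshot(self) -> None:
        self.snap_area = {}                    # nothing is copied when the snapshot is created
        self.has_snapshot = True

    def write(self, n: int, data: bytes) -> None:
        if self.has_snapshot and n not in self.snap_area:
            old = self.blocks[n]               # I/O 1: read the previous block
            self.io_ops += 1
            self.snap_area[n] = old            # I/O 2: write it to the snapshot area
            self.io_ops += 1
        self.blocks[n] = data                  # I/O 3 (or the only I/O): write the new block
        self.io_ops += 1

    def read_snapshot(self, n: int) -> bytes:
        # Unchanged blocks are read from the live volume, so the snapshot depends on it.
        return self.snap_area.get(n, self.blocks[n])


if __name__ == "__main__":
    vol = CopyOnWriteVolume(nblocks=8)
    vol.take_snapshot()
    vol.write(3, b"new")
    print(vol.io_ops)  # 3
    vol.write(3, b"newer")
    print(vol.io_ops)  # 4
```

In this model, the first overwrite of a block after a snapshot costs three I/Os and later overwrites of the same block cost one, until the next snapshot resets the preservation area. Reading an unchanged block from the snapshot falls through to the live volume, which is why such a snapshot can't survive the loss of the primary.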
Redirect-on-write is a less common type of snapshot that writes the new block in a new location, leaving the previous version of the block in its place. The advantage of this approach is that it requires only one I/O operation to update a block (as opposed to three I/O operations with copy-on-write). This is why storage systems using this style of snapshot can store dozens or hundreds of snapshots without a significant degradation in performance. And it's precisely that feature that makes redirect-on-write-style snapshots the preferred snapshot method to use for a near-CDP backup system.
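For comparison, here is the same kind of toy model for redirect-on-write. It again counts data I/Os only and ignores metadata updates and space reclamation, and all names are hypothetical. A snapshot is just a frozen copy of the logical-to-physical block map, and every write goes to a new location in the shared pool.

```python
from typing import Optional


class RedirectOnWriteVolume:
    """Toy redirect-on-write volume that counts data I/Os.

    Simplifications: metadata (block-map) updates and space reclamation are not modeled.
    """

    def __init__(self) -> None:
        self.pool: list[bytes] = []             # physical blocks: current AND historical versions
        self.live: dict[int, int] = {}          # logical block -> physical location
        self.snapshots: list[dict[int, int]] = []
        self.io_ops = 0

    def take_snapshot(self) -> int:
        self.snapshots.append(dict(self.live))  # freeze the map; no data is copied
        return len(self.snapshots) - 1

    def write(self, n: int, data: bytes) -> None:
        self.pool.append(data)                  # the only data I/O: write to a new location
        self.io_ops += 1
        self.live[n] = len(self.pool) - 1       # redirect; the old version stays where it was

    def read(self, n: int, snapshot: Optional[int] = None) -> bytes:
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.pool[block_map[n]]


if __name__ == "__main__":
    vol = RedirectOnWriteVolume()
    vol.write(0, b"v1")
    snap = vol.take_snapshot()
    vol.write(0, b"v2")
    print(vol.io_ops, vol.read(0), vol.read(0, snap))  # 2 b'v2' b'v1'
```

Each write costs a single I/O no matter how many snapshots exist. Note that the pool keeps growing, though: old versions live in the same space as current data, which leads directly to the caveats below.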
Sampler: Storage systems with redirect-on-write snapshots
- Cirtas Systems Bluejet Cloud Storage Controller
- Compellent Technologies Inc. Storage Center
- IBM XIV Storage System
- NetApp FASxxxx Series
- Nimble Storage Inc. CS-Series
- Oracle Corp. ZFS Storage 7xxx Series
Two caveats about redirect-on-write snapshots
There are two disadvantages to the redirect-on-write-style snapshots. The first is that all blocks -- both current and all previous versions -- are stored in the same volume. Over time, this can cause the current versions of the blocks that comprise a given volume to become fragmented. Be sure to consult with the vendor whose product you're considering to see how they deal with this fundamental design issue of redirect-on-write volumes.
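The fragmentation effect can be illustrated with a small simulation. It assumes the volume starts fully contiguous, that every overwrite is redirected to the next free physical block because snapshots pin the old locations, and that no defragmentation runs. It then counts how many contiguous extents a sequential read of the live volume would need. The function names are hypothetical.

```python
import random


def count_extents(block_map: list[int]) -> int:
    """Number of physically contiguous runs needed to read logical blocks 0..n-1 in order."""
    if not block_map:
        return 0
    return 1 + sum(1 for a, b in zip(block_map, block_map[1:]) if b != a + 1)


def simulate_row_fragmentation(nblocks: int, rounds: int, writes_per_round: int,
                               seed: int = 0) -> list[int]:
    """Extent count of the live volume initially and after each round of random overwrites."""
    rng = random.Random(seed)
    live = list(range(nblocks))  # logical block i starts at physical block i
    next_free = nblocks          # assumption: old locations stay pinned by snapshots
    history = [count_extents(live)]
    for _ in range(rounds):
        # a snapshot would be taken here; the overwrites below are redirected
        for n in rng.sample(range(nblocks), writes_per_round):
            live[n] = next_free
            next_free += 1
        history.append(count_extents(live))
    return history


if __name__ == "__main__":
    print(simulate_row_fragmentation(nblocks=10_000, rounds=5, writes_per_round=500))
```

A rising extent count is the fragmentation. How much it costs in practice depends on the media and on the vendor's layout and cleanup strategy, which is exactly what a proof-of-concept test should measure.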
The second, and much more dangerous, disadvantage of redirect-on-write snapshots is that the historical versions of blocks can cause the volume to become full, stopping all further writes to the volume until the issue is corrected. Copy-on-write systems avoid this issue by storing the historical versions of blocks in a different volume. If the history volume becomes full, it only stops updating the snapshots -- the current version of the volume is unaffected. However, redirect-on-write snapshots must keep the current and historical blocks in the same volume, creating the risk of filling up the volume with historical blocks. This is why users who opt for this approach to snapshots on their volumes must keep extra space in reserve, and must constantly monitor the volumes to ensure there's enough reserve space to keep up with the level of changes of any given volume. The more blocks change and the more frequently they're changed, the more space you're going to need for snapshots.
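A rough way to size the reserve is to count how many distinct blocks change between snapshots, since each one leaves its old version behind in the same volume. The functions below are a hypothetical back-of-the-envelope sketch, not a vendor formula. They ignore compression, deduplication and trimmed blocks, so start from your vendor's guidance.

```python
def estimate_history_bytes(unique_blocks_changed: list[int], block_size: int) -> int:
    """Estimate space pinned by retained redirect-on-write snapshots.

    unique_blocks_changed[i] is the number of distinct blocks overwritten between
    retained snapshot i and the next snapshot (or the live volume). Each such block
    leaves its old version behind in the same volume. Assumption: versions written
    and overwritten within one interval are freed, since no snapshot references them.
    """
    return sum(unique_blocks_changed) * block_size


def has_headroom(volume_bytes: int, live_bytes: int, history_bytes: int,
                 reserve_fraction: float) -> bool:
    """True if free space is still at least the reserve you chose to keep."""
    free = volume_bytes - live_bytes - history_bytes
    return free >= reserve_fraction * volume_bytes


if __name__ == "__main__":
    # 48 half-hourly snapshots, 50,000 distinct 4 KiB blocks changed per interval
    print(estimate_history_bytes([50_000] * 48, 4096))  # 9830400000
```

In that example the retained history pins about 9.8 GB; double the change rate and the history doubles too, which is why the reserve has to track the volume's actual rate of change.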
Vendors that don't offer redirect-on-write snapshots often use these disadvantages as FUD (fear, uncertainty and doubt) when talking to potential customers. Don't believe the FUD, but consider it a source of information that must be verified.
The first potential disadvantage (fragmentation) is easy to test for in a proof-of-concept test: Test the performance before/after the creation of dozens or hundreds of snapshots -- after updating thousands of blocks, of course.
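The measurement half of that test can be scripted. In this sketch, `take_snapshot` and `update_blocks` are callables you supply, for example wrappers around your array's CLI, which is product-specific and not shown. `random_overwrites` is a hypothetical helper that rewrites random 4 KiB regions of a test file, and throughput is measured with one sequential read pass.

```python
import os
import random
import time
from typing import Callable, Optional


def sequential_read_mb_per_s(path: str, block_size: int = 1024 * 1024) -> float:
    """Read path once from start to end and return throughput in MB/s (10**6 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")


def random_overwrites(path: str, count: int, size: int = 4096,
                      seed: Optional[int] = None) -> None:
    """Hypothetical helper: overwrite count random, size-aligned regions of an existing file."""
    length = os.path.getsize(path)
    if length < size:
        raise ValueError("test file is smaller than one region")
    rng = random.Random(seed)
    with open(path, "r+b") as f:
        for _ in range(count):
            f.seek(rng.randrange(0, length - size + 1, size))
            f.write(os.urandom(size))
        f.flush()
        os.fsync(f.fileno())


def run_poc(path: str, take_snapshot: Callable[[], None],
            update_blocks: Callable[[], None], snapshots: int = 100) -> tuple[float, float]:
    """Measure read throughput before and after creating many snapshots with changes in between."""
    before = sequential_read_mb_per_s(path)
    for _ in range(snapshots):
        take_snapshot()   # supplied by you: your storage system's snapshot call
        update_blocks()   # change blocks so each snapshot pins some history
    after = sequential_read_mb_per_s(path)
    return before, after
```

Keep the OS page cache in mind: reading a file that was just written mostly measures memory. Use a test data set larger than the host's RAM or clear caches between passes, and compare the before and after numbers rather than trusting either in isolation.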
The second potential disadvantage is a very real one and simply must be monitored. If you run out of space because of your snapshot data, your volume will stop accepting writes and your application will crash. If you're not experienced with this type of snapshot, follow the vendor's most conservative estimates on how much space to keep in reserve. Over time, monitoring how much space is taken up by snapshots should allow you to develop a much better estimate that's more appropriate for your environment. If you monitor things properly, the worst that should ever happen is that you have to delete more snapshot history than you would like in order to keep your volume from running out of space (see "Sampler: Storage systems with redirect-on-write snapshots," above).
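Monitoring can be paired with an automatic fallback that deletes the oldest history first. This hypothetical sketch assumes your storage system can report how much space each snapshot holds exclusively (`unique_bytes`). The names and the reserve fraction are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SnapshotInfo:
    name: str
    unique_bytes: int  # assumption: space held only by this snapshot, as reported by the array


def plan_pruning(snapshots_oldest_first: list[SnapshotInfo], volume_bytes: int,
                 used_bytes: int, reserve_fraction: float) -> tuple[list[str], bool]:
    """Choose the oldest snapshots to delete until free space reaches the reserve.

    Returns the names to delete and whether the reserve would be restored.
    """
    target_free = reserve_fraction * volume_bytes
    free = volume_bytes - used_bytes
    to_delete: list[str] = []
    for snap in snapshots_oldest_first:
        if free >= target_free:
            break
        to_delete.append(snap.name)
        free += snap.unique_bytes
    return to_delete, free >= target_free
```

Deleting one snapshot can make blocks it shared with its neighbor exclusive to that neighbor, so summing the original `unique_bytes` values understates what is freed. The plan therefore errs toward deleting more history than strictly necessary. A `False` result means even deleting every snapshot would not restore the reserve, which needs a human right away.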
Hold that pose
Any type of structured data, such as a database, requires special treatment before you create a snapshot of the volume it's stored on. Without this special attention, at best your application will go into crash recovery mode after a restore, and at worst a given snapshot may be completely worthless for restore. Therefore, be sure to research the proper way to prepare each application before creating a snapshot.
Windows solves this problem using Volume Shadow Copy Service (VSS). A backup system that's about to create a snapshot of a volume simply needs to communicate its intention to VSS. (To do this, it must be able to act as a VSS requestor.) VSS then provides the requestor a list of applications for which it requires VSS intervention prior to taking snapshots. The requestor then communicates with each application's VSS writer. Once each application has been prepared for the snapshot, the requestor asks VSS to create the snapshot. VSS then instructs the VSS snapshot provider to create the snapshot. (The snapshot provider can be Windows itself, or a storage or virtualization system like those discussed earlier.) Once the snapshot has been successfully created, the requestor can inform the supported applications (via their VSS writers) that they have been backed up, which allows them to do things like truncate their transaction logs.
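The sequence above can be modeled in a few lines. This is a simplified Python simulation, not the VSS API: on Windows a requestor works through the COM interface `IVssBackupComponents`, and the method names here only loosely echo its `GatherWriterMetadata`, `DoSnapshotSet` and `BackupComplete` steps. In this model, VSS itself sequences the writers. It freezes each one, asks the provider for the snapshot, and thaws the writers even if the snapshot fails. All classes are hypothetical stand-ins.

```python
class Writer:
    """Stand-in for one application's VSS writer (a database, a mail server...)."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self.log = log

    def freeze(self) -> None:
        # The application flushes its buffers and briefly holds writes.
        self.log.append(f"{self.name}:freeze")

    def thaw(self) -> None:
        self.log.append(f"{self.name}:thaw")

    def backup_complete(self) -> None:
        # Told the backup succeeded, the application may truncate its transaction logs.
        self.log.append(f"{self.name}:backup-complete")


class Provider:
    """Stand-in for the snapshot provider (Windows itself, or a storage/virtualization system)."""

    def __init__(self, log: list[str]) -> None:
        self.log = log

    def create_snapshot(self, volume: str) -> str:
        self.log.append(f"provider:snapshot {volume}")
        return f"snapshot-of-{volume}"


class Vss:
    """Simplified coordinator: sequences freeze -> snapshot -> thaw."""

    def __init__(self, writers: list[Writer], provider: Provider) -> None:
        self.writers = writers
        self.provider = provider

    def gather_writers(self, volume: str) -> list[Writer]:
        # Toy model: every registered writer has data on every volume.
        return list(self.writers)

    def do_snapshot_set(self, volume: str, writers: list[Writer]) -> str:
        for w in writers:
            w.freeze()
        try:
            return self.provider.create_snapshot(volume)
        finally:
            for w in writers:  # thaw even if the snapshot fails
                w.thaw()

    def backup_complete(self, writers: list[Writer]) -> None:
        for w in writers:
            w.backup_complete()


def requestor_backup(vss: Vss, volume: str) -> str:
    writers = vss.gather_writers(volume)             # announce intent; learn which apps need help
    snapshot = vss.do_snapshot_set(volume, writers)  # apps prepared, snapshot taken, apps resume
    # ... copy or replicate the snapshot to the backup system here ...
    vss.backup_complete(writers)                     # apps may now truncate their logs
    return snapshot
```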
Unfortunately, VSS functionality (or any meaningful equivalent) doesn't exist for Unix- or Linux-based operating systems. So if you plan to use snapshots with Unix systems, you'll need to use an application that can accomplish the same steps, or you'll have to write a script that communicates directly with the applications.
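On Linux, such a script can be built around a small prepare/resume guard. In this sketch, `app_prepare` and `app_resume` are placeholders for whatever your application needs, for example a database's begin-backup and end-backup commands. `snapshot_cmd` is your storage system's snapshot command (hypothetical; not shown). The util-linux `fsfreeze` command, which requires root, flushes and freezes the file system while the snapshot is taken. The guard guarantees the application and file system are released even if the snapshot fails.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def quiesced(prepare: Callable[[], None], resume: Callable[[], None]) -> Iterator[None]:
    """Run prepare, yield while the data is consistent, and always run resume afterward."""
    prepare()  # if this fails, nothing was paused, so nothing needs resuming
    try:
        yield
    finally:
        resume()


def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)


def consistent_snapshot(mountpoint: str, snapshot_cmd: list[str],
                        app_prepare: Callable[[], None],
                        app_resume: Callable[[], None]) -> None:
    """Application first, then file system, then snapshot; release in reverse order."""
    with quiesced(app_prepare, app_resume):
        with quiesced(lambda: run(["fsfreeze", "--freeze", mountpoint]),
                      lambda: run(["fsfreeze", "--unfreeze", mountpoint])):
            run(snapshot_cmd)  # hypothetical: your array's or volume manager's snapshot command
```

Run the snapshot command from a location that isn't on the frozen file system, since any write to it blocks until it's unfrozen.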
Restoring snapshot data
There are a number of ways to do restores with near-CDP systems. The most common is to make the historical versions of files available as a subdirectory underneath the originating directory. When a previous version of a file is needed, a user can simply point their file browser to the appropriate directory, locate the file, and then copy and paste it.
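Finding the available versions of a file is easy to script as well. This sketch assumes a NetApp-style layout in which every directory contains a `.snapshot` subdirectory holding one folder per snapshot. Other products differ; ZFS, for example, exposes snapshots under `.zfs/snapshot` at the root of each file system. Treat the layout and the function name as assumptions to adjust.

```python
from pathlib import Path


def previous_versions(file_path: str, snapdir_name: str = ".snapshot") -> list[tuple[str, Path]]:
    """Return (snapshot name, path) for each snapshot holding a copy of file_path.

    Assumed layout: <dir>/<snapdir_name>/<snapshot>/<filename>.
    """
    target = Path(file_path)
    snapdir = target.parent / snapdir_name
    if not snapdir.is_dir():
        return []
    versions = []
    for snap in sorted(p for p in snapdir.iterdir() if p.is_dir()):
        candidate = snap / target.name
        if candidate.is_file():  # the file may not have existed when this snapshot was taken
            versions.append((snap.name, candidate))
    return versions
```

Restoring a version is then an ordinary copy, for example with `shutil.copy2(path, destination)`.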
Another type of restore happens when a user is looking for a file and isn't quite sure where or when it was last seen. This type of restore is very easy to do in traditional backup products because they have a database that tracks the location of all files and all versions of those files. However, most near-CDP storage systems don't have a similar capability. That's one reason why many companies use their traditional backup product to configure, schedule and report on their near-CDP backups. Depending on the capabilities of your backup product, it can create a catalog of all snapshots it's controlling, allowing you to search this catalog during restores.
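A minimal version of such a catalog simply records every file name in every snapshot, so users can search by name without knowing where or when the file lived. In this sketch, `snapshot_roots` (a hypothetical input) maps each snapshot name to the directory where that snapshot is exposed. A real backup product's catalog also records sizes and timestamps and is updated incrementally instead of walking every snapshot.

```python
import fnmatch
import os
from collections import defaultdict


def build_catalog(snapshot_roots: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Map each file name to every (snapshot, path inside that snapshot) where it appears."""
    catalog: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for snap, root in sorted(snapshot_roots.items()):
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                catalog[name].append((snap, rel))
    return dict(catalog)


def search(catalog: dict[str, list[tuple[str, str]]], pattern: str) -> list[tuple[str, str, str]]:
    """Case-insensitive shell-style search, e.g. 'budget*.xls*'; returns (name, snapshot, path)."""
    hits = []
    for name, places in catalog.items():
        if fnmatch.fnmatchcase(name.lower(), pattern.lower()):
            hits.extend((name, snap, rel) for snap, rel in places)
    return sorted(hits)
```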
The most valuable type of restore a near-CDP system can perform is when you lose an entire volume or a directory containing the virtual disk volumes that comprise a virtual machine (VM). While in most cases it must be performed manually, it's a relatively simple process to point NFS or CIFS clients to a different server, or to mount a VM from a different location. This is when a near-CDP system truly pays for itself, because it allows you to perform this "restore" in a few moments rather than several hours. Once the problem with the production volume has been corrected, you can do a reverse restore from the backup system to the primary system and switch back to the primary system once that restore has been completed. After you've done this type of restore once, you won't want to go back to the "old days."
Finally, it's critical to monitor and report on the success/failure of your near-CDP backups. This functionality may be provided by your storage vendor, but it's most likely provided by your backup software vendor and their partners. This is another reason why you should consider controlling your near-CDP backups via your backup software product, even if all it's doing is acting as a traffic cop. Having all your backup functionality in one place is a good thing.
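Whatever tool does the reporting, the core check is simple: flag any volume whose newest good recovery point is older than its RPO. This sketch assumes you can export the last successful replication time and the last error for each volume from your storage or backup product. The `ReplicationStatus` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReplicationStatus:
    volume: str
    last_success: Optional[datetime]  # newest snapshot known to be safely on the secondary
    last_error: Optional[str] = None


def overdue_volumes(statuses: list[ReplicationStatus], now: datetime,
                    rpo: timedelta) -> list[tuple[str, str]]:
    """Return (volume, reason) for every volume that is outside its RPO."""
    problems = []
    for s in statuses:
        if s.last_success is None:
            problems.append((s.volume, "never replicated"))
            continue
        age = now - s.last_success
        if age > rpo:
            reason = f"last good copy is {age} old"
            if s.last_error:
                reason += f"; last error: {s.last_error}"
            problems.append((s.volume, reason))
    return problems
```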
Don't embark on a near-CDP backup project hastily. Check out your vendor's capabilities and perform a proof-of-concept test before signing any purchase orders. And make sure all the good things about your current backup system -- centralized scheduling, cataloging, monitoring and reporting -- don't disappear when you deploy your shiny new near-CDP system.
BIO: W. Curtis Preston is an executive editor in TechTarget's Storage Media Group and an independent backup expert. Curtis has worked extensively with data deduplication and other data reduction systems.