Continuous data protection; it's back!


This article can also be found in the Premium Editorial Download "Storage magazine: Good match: iSCSI and vSphere."

Download it now to read this article plus other related content.

Each write is transferred to the first recovery device, which is typically another appliance and storage array somewhere else within the data center. This proximity to the data being protected allows the writes to be either synchronously replicated or asynchronously replicated with a very short lag time. Even if a CDP system supports synchronous replication, most users opt for asynchronous replication to avoid any performance impact on the production system. A CDP system may support an adaptive replication mode where it replicates synchronously when possible, but defaults to asynchronous during periods of high activity.

The data is stored in two places: the recovery volume and the recovery journal. The recovery volume is the replicated copy of the volume being protected and will be used in place of the protected volume during a recovery. The recovery journal stores the log of all writes in the order they were performed on the protected volume; it's used to roll the recovery volume forward or backward in time during a recovery. It may also be used as a high-speed buffer where all writes are stored before they're applied to the recovery volume. This design allows the recovery volume to be on less-expensive storage as long as the recovery journal uses storage that is as fast as or faster than the protected volume.

Once data has been copied to the first recovery device it can then be replicated off-site. Due to the behavior of WAN links, the CDP system

Requires Free Membership to View

needs to deal with variances in the available bandwidth. So it has to be able to "get behind" and "catch up" when these conditions change. With some systems you can define an acceptable lag time (from a few seconds to an hour or more), which translates into the RPO of the replicated system. The CDP system sends all of the writes that happened as one large batch. If an individual block was modified several times during the time period, you can specify that only the last change is sent in a process known as "write folding." This obviously means that the disaster recovery copy won't have the same level of recovery granularity as the on-site recovery system, but it may also mean the difference between a system that works and one that doesn't.

Modern continuous data protection also offers a built-in, long-term storage alternative. You can pick a short time range (e.g., from 12:00:00 pm to 12:00:30 pm every day) and tell the CDP system to keep only the blocks it needs to maintain only those recovery points, and to delete the blocks that were changed in between. Users who take application-level snapshots typically coordinate them to coincide with their recovery points for consistency purposes. This deletion of extraneous changes allows the CDP system to retain data for much longer periods of time. For longer retention periods, it's also possible to back up one of these recovery points to tape and then expire it from disk. Many companies use all three approaches: retention of every change for a few days, hourly recovery points for a week or so, then daily recovery points after that, followed by tape copies after 90 days or so.

The true wonder of continuous data protection is how it handles a recovery. A CDP system can instantaneously present a LUN to whatever application needs to use it for recovery or testing, rolled forward or backward to whatever point in time desired. (As noted, many users choose to roll the recovery volume back to a point in time when they created an application consistent image. Although this means they'll lose any changes between that point in time and the current time, many prefer rolling back to a known consistent image rather than going through the crash recovery process.)

Depending on the product, the recovery LUN may be the actual recovery volume (rolled forward or backward), a virtual volume designed mainly for testing a restore, or something in the middle where the recovery volume is presented to the application as if it has already been rolled forward or backward, when in reality the actual rolling forward or backward is happening in the background. Some systems can simultaneously present multiple points in time from the same recovery volume.

Once the original production system has been repaired, the recovery process is reversed. The recovery volume is used to rebuild the original production volume by replicating the data back to its original location. (If the system was merely down and didn't need to be replaced, it's usually possible just to update it to the current point in time by sending over only the changes that have happened since the outage.) With the original volume brought up to date, the application can be moved back to its original location and the direction of replication reversed.

Compare that description of a typical CDP-based recovery scenario to the recovery process required by a traditional backup system, and you should get a good idea of why continuous data protection is the future of backup and recovery.

BIO: W. Curtis Preston is an executive editor in TechTarget's Storage Media Group and an independent backup expert.

This was first published in August 2010

There are Comments. Add yours.

TIP: Want to include a code block in your comment? Use <pre> or <code> tags around the desired text. Ex: <code>insert code</code>

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
Sort by: OldestNewest

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to: