Continuous data protection: It's back!

When CDP products first appeared a few years ago, the benefits were clear, but implementation and other issues quickly stifled interest. Now CDP is making a comeback, and it might just be the future of data backup.

Continuous data protection (CDP) and related products are the future of backup. There's no question CDP products failed to live up to the hype when they first appeared several years ago. But it's also true that the way CDP was (and is) designed solves virtually every major problem that has plagued backup and recovery systems for decades, and offers recovery time objectives (RTOs) and recovery point objectives (RPOs) that traditional backup systems can only dream of. Current CDP products have also addressed most of the shortcomings the first batch of products had. The CDP buzz may be gone, but the reality of CDP is stronger than ever.

A few years ago it seemed like every other booth at storage trade shows was occupied by a CDP vendor, and a steady stream of technical articles extolled the virtues of continuous data protection. But hardly anybody bought the story or the products. Some pundits even joked that CDP stood for "Customers Didn't Purchase." The failure of continuous data protection was so complete that only two of the original CDP vendors were left standing. The others were acquired by larger companies that believed in the technology enough to buy a product that often had few or no customers.

Why CDP 1.0 tanked

So, if CDP was such a good idea, why didn't anyone buy it? There are several reasons.

First, most of the companies offering CDP were startups. The worry, of course, was that you would invest money, time and energy in a startup and its product only to see the company go out of business. Sadly, those fears were realized in this case: Asempra Technologies Inc., Double-Take Software Inc., FilesX, Kashya Inc., Mendocino Software Inc., Revivio Inc., Storactive Inc. and XOsoft Inc. were all acquired by other companies, and some (although not all) of these acquisitions resulted in very rocky experiences for the few customers who had purchased their CDP products.

Continuous data protection was also a big pill to swallow. While you could technically run a CDP system in parallel with your traditional backup system, very few people had the budget or time to do that. Therefore, you had to justify replacing your production backup system with CDP. But because it was so different from what people were used to, CDP was hard to fully understand and even harder to sell as a replacement for traditional backup.

Another real problem was that the products sometimes weren't fully up to the task. For example, users were often forced to choose between an on-site and an off-site copy of their data because most CDP products couldn't deliver both. This meant one product had to be used for operational recovery and another for disaster recovery (DR). Many CDP products were also ignorant of the applications they were backing up. Continuous data protection vendors said they had no more of a requirement to understand applications than a storage array did. Technically true perhaps, but it didn't give users the warm fuzzy feeling they were used to; they wanted a CDP product that was application-aware. CDP also required a lot more storage than traditional backup products or snapshots, so CDP customers were unable to have very long retention periods. This required them to have a separate solution for long-term retention.

Finally, many people viewed CDP as the Star Trek of the backup industry -- a great idea before its time. Star Trek, perhaps not fully understood when it first aired, was canceled after three seasons. Similarly, many people thought CDP was a solution looking for a problem: most shops could meet their backup and recovery requirements without completely changing the way they did backups, which continuous data protection required.

What is near-CDP?

When continuous data protection (CDP) products first appeared, they created quite a buzz, and marketing departments love buzz. But other companies had snapshot-based products that also protected data throughout the day, and they wanted to use the CDP moniker, too.

CDP vendors like Kashya Inc. and Revivio Inc. objected, saying that snapshots weren't CDP. They also noted that snapshots can only recover to a particular point in time, while continuous data protection can recover to any point in time. Hence the term near-CDP was coined, allowing snapshot-based vendors to steal some of the CDP buzz.

But years later, the term near-CDP is still not in the Storage Networking Industry Association (SNIA) lexicon. Purists say you're either continuous or you're not, but others think it's still the best term we have to describe snapshots coupled with replication.

Near-CDP systems have more in common with CDP than with traditional backup. CDP and near-CDP systems transfer only changed blocks to the backup system. There are no repeated full backups, and if only a few bytes change in a file, only a few bytes are sent to the recovery system. They also transfer the changed blocks to the recovery systems throughout the day, rather than in a large batch process at night. And both CDP and near-CDP systems provide instantaneous recovery and can offer recovery points from a few seconds to an hour, depending on implementation.
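The changed-block idea can be illustrated with a minimal sketch. It assumes a fixed 4 KB block size and compares two in-memory copies of the data; real products track changes as writes happen rather than comparing copies, and this sketch sends whole blocks rather than individual bytes. The names BLOCK_SIZE and changed_blocks are hypothetical and don't come from any product.

```python
from __future__ import annotations

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: new_block} for every block that differs between old and new."""
    changes = {}
    total = max(len(old), len(new))
    for offset in range(0, total, block_size):
        old_block = old[offset:offset + block_size]
        new_block = new[offset:offset + block_size]
        if old_block != new_block:
            changes[offset // block_size] = new_block
    return changes


if __name__ == "__main__":
    before = bytes(10 * BLOCK_SIZE)       # a 40 KB "file" of zeros
    after = bytearray(before)
    after[5 * BLOCK_SIZE + 7] = 0xFF      # change a single byte inside block 5
    delta = changed_blocks(before, bytes(after))
    print(sorted(delta))                  # indexes of the blocks that need to be sent
```

Only the block that contains the modified byte ends up in the delta, which is why there's no need for repeated full backups.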

The only important differences between CDP and near-CDP are that continuous data protection can offer a recovery point objective (RPO) of zero (or almost zero) and that it doesn't require the creation of application-aware snapshots up front. However, most CDP users create snapshots anyway and recover to those snapshots, preferring a known stable point in time to a more recent recovery point that will require a crash recovery process. So, maybe the CDP vs. near-CDP debate is a lot of arguing over nothing.

New life for CDP

There are now several CDP products that are doing quite well, so what changed? Perhaps the most important change is that most of today's CDP products are offered by mainstream backup vendors. In fact, almost every major backup software company now has a CDP offering. Users don't have to accept an all-new paradigm and an all-new backup vendor to get CDP functionality.

The next big reason for the resurgence of CDP is that the products have come a long way since they first appeared on the market. For example, you no longer have to choose between an on-site and off-site copy; you can have both with a single product.

Today's successful CDP systems also know a lot more about the data they're backing up. They offer integration points with many popular applications such as Microsoft Exchange, Oracle and SQL Server. While a true CDP product doesn't need to create snapshots and can recover to any point in time, this integration allows the application or backup system administrator to create points in time where a known good copy of the data resides. Administrators may opt to not use these known good recovery points during a recovery operation, but they have the peace of mind of knowing they're there.

And, like Star Trek, it may be time for CDP: The Next Generation. Some servers have grown tremendously in just the last few years, and the RTOs and RPOs for those large servers have become more stringent. Consider a 300 TB database that's mission critical for a company, with potentially millions of people using its service 24/7. The database backup system has to provide an instant recovery with no loss of records; this is only possible with CDP.

Also figuring into the picture are data loss notification laws, enacted by 35 states and the European Union, that require many companies to add encryption systems to allow them to safely transport personal information on backup tapes. However, encryption systems can be expensive, cause slow backups and require management of encryption keys. With CDP, a company can have on-site and off-site copies of its data without ever touching a tape, thus avoiding tape encryption entirely.

Server virtualization has taken off during the last few years, and the technology could benefit from continuous data protection. While you may not have individual servers with data stores in the double-digit terabyte range, it's possible the storage used by VMware, Microsoft Hyper-V or Citrix Systems XenServer is indeed that big. Consider what would happen if a 15 TB storage array containing virtual machine (VM) images suddenly disappeared -- it could take out dozens or hundreds of virtual machines. Couple that with the fact that backing up and recovering those virtual machines using traditional methods is one of the more difficult tasks a backup system architect has to consider. Physics is your enemy; 20 virtual machines on a single physical machine perform like one physical machine during backup.

But if physics is your enemy, CDP is your best friend. A good CDP product places no more load on your VM than a typical virus protection package, and it's able to recover one or all of your VMs instantaneously with no data loss. Server virtualization alone could herald the comeback of continuous data protection.

CDP product sampler

AppAssure Software Inc. Replay 4
Atempo Inc. Live Backup
CA Technologies CA ARCserve Replication
Cofio Software Inc. AIMstor CDP
EMC Corp. RecoverPoint
FalconStor Software Inc. FalconStor Continuous Data Protector
IBM Tivoli Storage Manager FastBack
InMage DR-Scout
Symantec Corp. NetBackup RealTime
Vision Solutions Inc. Double-Take Backup

A look inside CDP

The Storage Networking Industry Association (SNIA) defines CDP "as a methodology that continuously captures or tracks data modifications and stores changes independent of the primary data, enabling recovery points from any point in the past . . . data changes are continuously captured . . . stored in a separate location . . . [and RPOs] are arbitrary and need not be defined in advance of the actual recovery."

Please note that you don't see the word "snapshot" above. While it's true that many of today's CDP systems allow users to create known recovery points in advance, they're not required. To be considered CDP, a system must be able to recover to any point in time, not just to when snapshots are taken.

CDP systems start with a data tap or write splitter. Writes destined for primary storage are "tapped" or "split" into two paths; each write is sent to its original destination and also to the CDP system. The data tap may be an agent in the protected host or it can reside somewhere in the storage network. Running as an agent in a host, the data tap has little to no impact on the host system because all the "heavy lifting" is done elsewhere. CDP products that insert their data taps in the storage network can use platforms designed for this purpose, such as Brocade Communications Systems Inc.'s Storage Application Services API, Cisco Systems' MDS line and its SANTap Service feature or EMC Clariion's built-in splitter functionality. Some CDP systems offer a choice of where their data tap is placed.
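A rough sketch of the write-splitting idea follows. It models primary storage as a dictionary and the path to the CDP system as an in-memory list; the Write and WriteSplitter names are hypothetical, and a real data tap would sit in a host driver or in the storage network rather than in application code.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    seq: int      # order in which the splitter saw the write
    volume: str
    block: int
    data: bytes


class WriteSplitter:
    """Hypothetical data tap: every write goes to primary storage and to the CDP system."""

    def __init__(self):
        self.primary = {}      # (volume, block) -> data, stands in for primary storage
        self.cdp_queue = []    # ordered copies waiting to be shipped to the CDP system
        self._seq = itertools.count(1)

    def write(self, volume: str, block: int, data: bytes) -> Write:
        w = Write(next(self._seq), volume, block, data)
        self.primary[(volume, block)] = data   # path 1: the original destination
        self.cdp_queue.append(w)               # path 2: the copy for the CDP system
        return w


if __name__ == "__main__":
    tap = WriteSplitter()
    tap.write("vol1", 0, b"first")
    tap.write("vol1", 0, b"second")
    print(tap.primary[("vol1", 0)])            # primary holds only the latest data
    print([w.seq for w in tap.cdp_queue])      # the CDP side sees every write, in order
```

The point of the sketch is that the tap itself only copies and sequences writes; anything expensive happens on the receiving side.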

Users then need to define a consistency group of volumes and hosts that have to be recovered to the same point in time. Some CDP systems allow the creation of a "group of groups" that contains multiple consistency groups, creating multiple levels of granularity without sacrifice. Users may also choose to perform application-level snapshots on the protected hosts, such as placing Oracle in backup mode or performing Volume Shadow Copy Service (VSS) snapshots on Windows. (Remember, snapshots aren't required.) Some CDP systems simply record these application-level snapshots when they happen, while others provide assistance to perform them. It's very helpful when the continuous data protection system maintains a centralized record of application-level snapshots, because those are the known consistent points many users choose to recover to.

Each write is transferred to the first recovery device, which is typically another appliance and storage array somewhere else within the data center. This proximity to the data being protected allows the writes to be either synchronously replicated or asynchronously replicated with a very short lag time. Even if a CDP system supports synchronous replication, most users opt for asynchronous replication to avoid any performance impact on the production system. A CDP system may support an adaptive replication mode where it replicates synchronously when possible, but defaults to asynchronous during periods of high activity.
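The adaptive mode described above might be sketched as follows. This is an assumption-laden toy: the AdaptiveReplicator class, the sync_limit threshold and the idea of passing in a measured write rate are all hypothetical, and real products decide when to switch modes in their own ways.

```python
class AdaptiveReplicator:
    """Toy model: replicate synchronously when quiet, asynchronously when busy."""

    def __init__(self, sync_limit: int = 100):
        self.sync_limit = sync_limit   # assumed writes/sec above which we go asynchronous
        self.replica = {}              # block -> data on the recovery device
        self.backlog = []              # writes acknowledged locally but not yet replicated

    def replicate(self, block: int, data: bytes, writes_per_sec: int) -> str:
        # Synchronous only if the load is low AND nothing is queued, so ordering is preserved.
        if writes_per_sec <= self.sync_limit and not self.backlog:
            self.replica[block] = data
            return "sync"
        self.backlog.append((block, data))
        return "async"

    def drain(self) -> int:
        """Apply queued writes in order; returns how many were shipped."""
        shipped = len(self.backlog)
        for block, data in self.backlog:
            self.replica[block] = data
        self.backlog.clear()
        return shipped


if __name__ == "__main__":
    r = AdaptiveReplicator(sync_limit=100)
    print(r.replicate(1, b"a", writes_per_sec=10))    # quiet period
    print(r.replicate(1, b"b", writes_per_sec=500))   # burst of activity
    print(r.drain(), r.replica)
```

Note that the sketch stays asynchronous until the backlog is drained, even if the write rate drops, so the replica never applies writes out of order.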

The data is stored in two places: the recovery volume and the recovery journal. The recovery volume is the replicated copy of the volume being protected and will be used in place of the protected volume during a recovery. The recovery journal stores the log of all writes in the order they were performed on the protected volume; it's used to roll the recovery volume forward or backward in time during a recovery. It may also be used as a high-speed buffer where all writes are stored before they're applied to the recovery volume. This design allows the recovery volume to be on less-expensive storage as long as the recovery journal uses storage that is as fast as or faster than the protected volume.
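To make the relationship between the recovery volume and the recovery journal more concrete, here is a minimal sketch. It assumes each journal entry keeps both the previous and the new contents of a block so the volume can be rolled in either direction; the JournalEntry and RecoveryStore names are hypothetical, and actual products may structure their journals quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JournalEntry:
    ts: float
    block: int
    before: Optional[bytes]   # previous contents, None if the block didn't exist yet
    after: bytes


class RecoveryStore:
    def __init__(self):
        self.volume = {}    # recovery volume: block -> data
        self.journal = []   # every write, in the order it was performed
        self.applied = 0    # how many journal entries the volume currently reflects

    def record(self, ts: float, block: int, data: bytes) -> None:
        self.roll_to(float("inf"))   # bring the volume fully up to date first
        self.journal.append(JournalEntry(ts, block, self.volume.get(block), data))
        self.volume[block] = data
        self.applied += 1

    def roll_to(self, ts: float) -> None:
        # Roll backward: undo entries newer than ts, newest first.
        while self.applied > 0 and self.journal[self.applied - 1].ts > ts:
            e = self.journal[self.applied - 1]
            if e.before is None:
                self.volume.pop(e.block, None)
            else:
                self.volume[e.block] = e.before
            self.applied -= 1
        # Roll forward: redo entries at or before ts, oldest first.
        while self.applied < len(self.journal) and self.journal[self.applied].ts <= ts:
            e = self.journal[self.applied]
            self.volume[e.block] = e.after
            self.applied += 1


if __name__ == "__main__":
    store = RecoveryStore()
    store.record(1.0, 0, b"A")
    store.record(2.0, 0, b"B")
    store.record(3.0, 1, b"C")
    store.roll_to(2.5)
    print(store.volume)   # the volume as it looked at time 2.5
```

Because the journal is ordered, any timestamp is a valid recovery point; nothing has to be declared in advance.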

Once data has been copied to the first recovery device it can then be replicated off-site. Due to the behavior of WAN links, the CDP system needs to deal with variances in the available bandwidth, so it has to be able to "get behind" and "catch up" when these conditions change. With some systems you can define an acceptable lag time (from a few seconds to an hour or more), which translates into the RPO of the replicated system. The CDP system then sends all of the writes that happened during that lag window as one large batch. If an individual block was modified several times during the time period, you can specify that only the last change is sent in a process known as "write folding." This obviously means that the disaster recovery copy won't have the same level of recovery granularity as the on-site recovery system, but it may also mean the difference between a system that works and one that doesn't.
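Write folding is simple enough to show in a few lines. The sketch below assumes a batch is an ordered list of (block, data) pairs; the fold_writes name is hypothetical.

```python
def fold_writes(batch):
    """Keep only the last write to each block, in the order those final writes occurred."""
    last = {}
    for position, (block, data) in enumerate(batch):
        last[block] = (position, data)   # a later write to the same block replaces the earlier one
    ordered = sorted(last.items(), key=lambda item: item[1][0])
    return [(block, data) for block, (_, data) in ordered]


if __name__ == "__main__":
    batch = [(1, b"a"), (2, b"b"), (1, b"c"), (1, b"d")]
    folded = fold_writes(batch)
    print(len(batch), "writes folded to", len(folded))
    print(folded)
```

The folded batch reproduces the same end state at the remote site while sending fewer writes, at the cost of losing the intermediate versions of each block.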

Modern continuous data protection also offers a built-in, long-term storage alternative. You can pick a short time range (e.g., from 12:00:00 pm to 12:00:30 pm every day) and tell the CDP system to keep only the blocks it needs to maintain only those recovery points, and to delete the blocks that were changed in between. Users who take application-level snapshots typically coordinate them to coincide with their recovery points for consistency purposes. This deletion of extraneous changes allows the CDP system to retain data for much longer periods of time. For longer retention periods, it's also possible to back up one of these recovery points to tape and then expire it from disk. Many companies use all three approaches: retention of every change for a few days, hourly recovery points for a week or so, then daily recovery points after that, followed by tape copies after 90 days or so.
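The thinning of older changes down to chosen recovery points can be sketched like this. It assumes the journal is a time-ordered list of (timestamp, block, data) tuples and that the recovery points to keep are given as a sorted list of timestamps; thin_journal is a hypothetical name, and writes newer than the last kept point are left untouched.

```python
import bisect


def thin_journal(journal, keep_points):
    """Drop intermediate changes, keeping what's needed to rebuild each kept recovery point.

    journal:     list of (ts, block, data), sorted by ts
    keep_points: sorted list of timestamps that must remain recoverable
    """
    survivors = {}   # (interval index, block) -> last write to that block in the interval
    tail = []        # writes newer than every kept point are retained as-is
    for ts, block, data in journal:
        idx = bisect.bisect_left(keep_points, ts)   # first kept point at or after this write
        if idx == len(keep_points):
            tail.append((ts, block, data))
        else:
            survivors[(idx, block)] = (ts, block, data)
    return sorted(survivors.values()) + tail


if __name__ == "__main__":
    journal = [(1, 0, b"a"), (2, 0, b"b"), (3, 1, b"c"),
               (11, 0, b"d"), (12, 0, b"e"), (25, 1, b"f")]
    thinned = thin_journal(journal, keep_points=[10, 20])
    print(len(journal), "entries thinned to", len(thinned))
```

Replaying the thinned journal up to any kept point yields the same volume contents as replaying the full journal, which is what allows much longer retention on the same storage.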

The true wonder of continuous data protection is how it handles a recovery. A CDP system can instantaneously present a LUN to whatever application needs to use it for recovery or testing, rolled forward or backward to whatever point in time desired. (As noted, many users choose to roll the recovery volume back to a point in time when they created an application-consistent image. Although this means they'll lose any changes between that point in time and the current time, many prefer rolling back to a known consistent image rather than going through the crash recovery process.)
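Choosing a known consistent image for a recovery amounts to finding the latest recorded application-level snapshot at or before the desired time. A minimal sketch, assuming the snapshot timestamps are kept in a sorted list (latest_consistent_point is a hypothetical name):

```python
import bisect


def latest_consistent_point(bookmarks, target_ts):
    """Return the newest application-consistent timestamp at or before target_ts, or None."""
    i = bisect.bisect_right(bookmarks, target_ts)
    return bookmarks[i - 1] if i else None


if __name__ == "__main__":
    bookmarks = [100, 200, 300]            # times at which app-level snapshots were recorded
    target = 250                           # the administrator wants "just before the failure"
    point = latest_consistent_point(bookmarks, target)
    print("roll back to", point, "and accept losing", target - point, "time units of changes")
```

The gap between the requested time and the chosen point is the data an administrator gives up in exchange for skipping crash recovery.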

Depending on the product, the recovery LUN may be the actual recovery volume (rolled forward or backward), a virtual volume designed mainly for testing a restore, or something in the middle where the recovery volume is presented to the application as if it has already been rolled forward or backward, when in reality the actual rolling forward or backward is happening in the background. Some systems can simultaneously present multiple points in time from the same recovery volume.

Once the original production system has been repaired, the recovery process is reversed. The recovery volume is used to rebuild the original production volume by replicating the data back to its original location. (If the system was merely down and didn't need to be replaced, it's usually possible just to update it to the current point in time by sending over only the changes that have happened since the outage.) With the original volume brought up to date, the application can be moved back to its original location and the direction of replication reversed.
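The "send only the changes since the outage" case can be sketched as follows. It assumes the recovery side kept a time-ordered journal of (timestamp, block, data) tuples while it was serving the application; failback_delta is a hypothetical name.

```python
def failback_delta(recovery_volume, journal, outage_ts):
    """Return {block: current data} for blocks written on the recovery side after outage_ts."""
    dirty = {block for ts, block, _ in journal if ts > outage_ts}
    return {block: recovery_volume[block] for block in sorted(dirty)}


if __name__ == "__main__":
    # State of the original production volume when it went down at time 10.
    production = {0: b"x", 1: b"y-old", 2: b"z-old"}
    # The recovery volume kept taking writes while production was offline.
    recovery = {0: b"x", 1: b"y", 2: b"z"}
    journal = [(5, 0, b"x"), (12, 1, b"y-tmp"), (15, 1, b"y"), (20, 2, b"z")]

    delta = failback_delta(recovery, journal, outage_ts=10)
    production.update(delta)               # ship only the changed blocks back
    print(sorted(delta), production == recovery)
```

Block 0 never changed during the outage, so it isn't sent; a full rebuild is needed only when the original system has to be replaced.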

Compare that description of a typical CDP-based recovery scenario to the recovery process required by a traditional backup system, and you should get a good idea of why continuous data protection is the future of backup and recovery.

BIO: W. Curtis Preston is an executive editor in TechTarget's Storage Media Group and an independent backup expert.

This was first published in August 2010
