Many experts say CDP will replace traditional backup. But before you take the plunge, here are some key points...
Continuous data protection (CDP) is getting almost as much press as Apple's iPod. CDP tracks data modifications and stores changes independent of the primary data, and lets you recover data in seconds from any point in the past. Some pundits have postulated that CDP will replace backup. But is that just new technology hyperbole or is CDP a must-have technology?
CDP promises a recovery point objective (RPO) of essentially zero data loss and a very short recovery time objective (RTO). Defined another way, CDP is a time-stamped backup stored on secondary disk. The appeal of CDP is its ability to quickly rewind applications to any point in time to find a consistent image of the data.
But don't confuse CDP with mirroring, which provides data protection only from hardware failures. If data is corrupted or deleted on the primary system, it will be on the mirrored copy, too. CDP provides protection from both hardware and data failures.
CDP is also sometimes confused with so-called "fine-grain" snapshots. Snapshot products capture changes as points in time, with every snapshot checked for consistency before the next one is taken. This is analogous to a digital camera rapidly taking multiple photographs vs. a video camera capturing every nuance of the entire sequence. There are time gaps between snapshots, but CDP products capture changes continuously--without any gaps or missing data.
CDP ensures there aren't any gaps because it captures every file, block or table change as it occurs. And while it's possible to restore data to any point in time with CDP, there's a consistency issue: the captured changes alone don't show which point in time holds consistent data. Until recently, most CDP products weren't able to identify the most recent point in time when the data was verifiably consistent.
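To make the difference between snapshots and continuous capture concrete, here is a minimal sketch in Python. The `ChangeJournal` class, its method names and the sample data are all hypothetical and are not taken from any CDP product; the sketch assumes changes are recorded in time order.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeJournal:
    """Hypothetical in-memory journal: every write is kept with its timestamp."""
    entries: list = field(default_factory=list)  # (timestamp, key, value), in time order

    def record(self, timestamp, key, value):
        # Capture the change as it occurs, separately from the primary data.
        self.entries.append((timestamp, key, value))

    def view_at(self, timestamp):
        # Rebuild the data as it looked at `timestamp` by replaying earlier writes.
        state = {}
        for ts, key, value in self.entries:
            if ts > timestamp:
                break
            state[key] = value
        return state


journal = ChangeJournal()
journal.record(1, "a.txt", "v1")
journal.record(2, "b.txt", "v1")
journal.record(5, "a.txt", "v2")
journal.record(9, "a.txt", "corrupt")

# A snapshot product that ran at t=4 and t=8 could only return those two views.
# The journal can return the view at any time, including t=5 or t=8.
before_corruption = journal.view_at(8)
```

The sketch can return a view for any timestamp, but nothing in it says which timestamp holds consistent data. That is the gap described above.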
So restoring CDP data has become a trial-and-error process; an administrator must guess a point in time to restore from. If the guess ends up being a recovery point after the corruption occurred, the data must be recovered again from an earlier point in time--greatly increasing the recovery time. If the administrator plays it safe and chooses a recovery point too far back before the known corruption, the CDP recovery time can be worse than when using frequent snapshots or backup to disk. This guessing game to find the exact point in time from which to recover can nullify the fast recovery CDP is supposed to provide.
Raising the CDP bar
The "what point in time to recover" problem is being addressed by at least five CDP vendors--Asempra Technologies, Asigra Inc., Atempo Inc. (through its acquisition of Storactive Inc.), Mendocino Software and XOsoft Inc. (recently acquired by CA)--that offer a feature known as enhanced recovery management. Mendocino (including its OEM vendors EMC Corp. and Hewlett-Packard [HP] Co.) inserts event markers into the collection process, which are then monitored by a policy engine. By incorporating CDP-awareness of business processes and events, it's more likely data will be correctly restored on the first attempt.
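The idea of event markers can be illustrated with a short Python sketch. It is not Mendocino's policy engine or any vendor's implementation. The event format, the `"marker"` label and the function name are assumptions made for illustration, and the sketch assumes the administrator knows roughly when the corruption occurred.

```python
def last_consistent_point(events, corruption_time):
    """Return the timestamp of the latest marker before corruption_time, or None."""
    best = None
    for ts, kind in events:
        if kind == "marker" and ts < corruption_time:
            if best is None or ts > best:
                best = ts
    return best


# Hypothetical captured stream: ordinary writes plus markers inserted at
# application events (for example, when a database was quiesced).
events = [
    (10, "write"),
    (12, "marker"),
    (15, "write"),
    (18, "marker"),
    (21, "write"),   # corruption introduced here
    (25, "marker"),  # ignored because it follows the corruption
]

recovery_point = last_consistent_point(events, corruption_time=21)
```

With markers in the stream, the recovery point is looked up rather than guessed, which is why a first-attempt restore becomes more likely.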
CA XOsoft's CDP takes a different approach. It continuously captures application- or database-specific writes and update events. For recovery purposes, CA XOsoft creates a journal entry for each write and event. When corruption occurs, damaged data is rewound to the last consistent state. Because only changes since the latest consistent state are rolled back from the journal, recovery is fast. Atempo's CDP is aimed at the Windows environment (Microsoft Exchange and file systems) with technology that works similarly to CA XOsoft's.
Asempra's CDP is transaction-aware and application-specific. Its technology communicates directly with Exchange, SQL Server and Windows file systems (including CIFS). Before a transaction is forwarded to the recovery server, Asempra checks the integrity of all of the data. This allows Asempra to detect data corruption as it occurs and provide a marker to determine the best recovery point.
Asigra has a completely different CDP methodology. It's a two-stage continuous backup that agentlessly backs up any changes on Windows servers to the local collector (DS-Client) as they occur. The local collector aggregates the changes, compresses and encrypts them, and then sends them to the central collector (DS-System). The central collector automatically checks and verifies the data for consistency and recoverability. If it determines a file can't be recovered, it automatically asks for it again from the local collector. This provides a known consistency point for all recoveries, again providing a quick recovery time.
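The two-stage flow described above (aggregate and compress locally, verify centrally, ask again on failure) can be sketched in Python as follows. This is not Asigra's code or protocol: the function names and package layout are hypothetical, a SHA-256 checksum stands in for the consistency check, and encryption is left out to keep the sketch free of third-party dependencies.

```python
import hashlib
import zlib


def package_changes(changes):
    """Local collector (hypothetical): aggregate changes, compress, attach a checksum."""
    payload = "\n".join(changes).encode("utf-8")
    return {"blob": zlib.compress(payload), "sha256": hashlib.sha256(payload).hexdigest()}


def verify_package(package):
    """Central collector (hypothetical): confirm the blob decompresses to the expected data."""
    try:
        payload = zlib.decompress(package["blob"])
    except zlib.error:
        return False
    return hashlib.sha256(payload).hexdigest() == package["sha256"]


def receive(package, resend):
    """Accept a package, asking the local collector to resend once if verification fails."""
    if verify_package(package):
        return package
    retry = resend()
    if not verify_package(retry):
        raise ValueError("package failed verification after resend")
    return retry


changes = ["write a.txt", "write b.txt"]
good = package_changes(changes)
# Simulate damage in transit by truncating the compressed blob.
damaged = {"blob": good["blob"][:-4], "sha256": good["sha256"]}
accepted = receive(damaged, resend=lambda: package_changes(changes))
```

Because only verified packages are accepted, everything held centrally is known to be recoverable, which is the property the article attributes to this design.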
Asempra, Asigra, Atempo and CA XOsoft also allow an app (such as Exchange) to be recovered first, and to be up, running and writing transactions even if all the data, mailboxes and transactions haven't been recovered. As the data is recovered, the CDP system continues to protect the live app that's running.
Three types of CDP
Block-based continuous data protection (CDP) excels at transparent data capture and presenting views of different points in time. It can require additional integration work with the application. Some CDP products support tagging specific points in time that are matched with application events, such as quiescing a database, to allow for discrete recovery points.
Application-based CDP is specific to a particular app, such as Oracle Corp.'s 10g. It's responsible for performing all of the continuous protection and journaling necessary to roll back to any point in time. Queries, rows, columns, tables, transactions or the entire database can be rewound to any point in time without disrupting the running application. Neither block- nor file-based CDP usually has that level of visibility. The value of application-based CDP is its extensive application-awareness.
File-based CDP products typically run on the protected app's servers or workstations. Most are agent-based and are conceptually similar to application-based CDP but, for these, the app is the file system. A key advantage of file-based CDP is its flexibility in setting policies for different file groups. Recovery point objectives and recovery time objectives can vary widely based on each group. Block-based CDP doesn't have that level of granularity. Recovery is also more granular because it can be to any point in time on a specific file or group of files.
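As a rough illustration of per-group policies in file-based CDP, the Python sketch below maps file-name patterns to protection settings. The patterns, setting names and values are invented for the example and do not come from any product.

```python
from fnmatch import fnmatchcase

# Hypothetical policy table: the first matching pattern wins.
POLICIES = [
    ("*.pst", {"capture": "continuous", "retention_days": 90}),
    ("*.docx", {"capture": "continuous", "retention_days": 30}),
    ("*", {"capture": "hourly", "retention_days": 7}),
]


def policy_for(filename):
    """Return the protection policy for a file, based on its name."""
    for pattern, policy in POLICIES:
        if fnmatchcase(filename, pattern):
            return policy
    return None


mail_policy = policy_for("mailbox.pst")
temp_policy = policy_for("scratch.tmp")
```

Each file group gets its own capture and retention settings, so recovery objectives can differ from one group to the next.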
The many faces of CDP
A CDP product can be file-, block- or application-based (see "Three types of CDP"). CDP may be provided as a standalone product; as a feature of backup, replication or database management software; as a separate backup appliance; or as array-based data protection software. Some CDP apps are agent-based, while others don't use agents at all. A CDP product may run on a LAN appliance or on an intelligent switch (see "A sampling of CDP products").
And some products that claim to be CDP don't really fit into the category because they lack either a continuous protection capability or any-point-in-time recoverability. For example, Microsoft Corp.'s Data Protection Manager (DPM) lacks continuous protection and the ability to recover from any point in time. Numerous server replication products capture every change continuously but can't rewind beyond the most current replication. That type of replication is analogous to mirroring and protects only against hardware failures.
CDP and Microsoft Exchange
CDP is designed to provide the highest level of data protection for apps that can't afford to lose any data, such as database management systems, point-of-sale systems, financial transaction systems and e-mail.
The main app driving CDP sales these days is Exchange, which is extraordinarily difficult to restore with most data protection apps. Restoring Exchange is a complex and frustrating endeavor, and can be particularly time consuming depending on the number of mailboxes and messages that need to be restored. Typically, these are the major steps of an Exchange restore:
- Apply the last full backup.
- Apply the transaction logs (if they're available).
- Restore messages and transactions to each mailbox; this is time consuming, so this step is often skipped, leaving a lot of data that's never restored.
- During the restoration process, Exchange is down and a temporary server is required.
[Sidebar: A sampling of CDP products]
The benefits of CDP for apps other than Exchange are less certain. For example, most databases include an older, well-known variation of CDP called journaling. In a disaster, a DBA can restore the database from a known good snapshot or backup and then journal forward through the transactions that occurred after the good copy was recorded to recreate an up-to-date restored database. Oracle Corp. has gone beyond simple journaling by incorporating CDP into its latest 10g database. A DBA can rewind the database to a consistent version at any point in time with the click of a mouse and then journal forward through all of the transactions that have occurred since the time the version was created.
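The restore-then-journal-forward procedure can be sketched in a few lines of Python. This is a simplified model, not how Oracle or any other database implements journaling; the function name, the dictionary-based "database" and the sample transactions are assumptions for illustration.

```python
def restore_and_roll_forward(backup, backup_time, journal, target_time):
    """Start from a known good copy and replay journaled changes up to target_time."""
    state = dict(backup)  # copy so the backup itself is never modified
    for ts, key, value in sorted(journal):
        if backup_time < ts <= target_time:
            state[key] = value
    return state


backup = {"acct_1": 100, "acct_2": 50}  # known good copy taken at t=10
journal = [
    (8, "acct_1", 100),   # already contained in the backup
    (12, "acct_1", 80),
    (15, "acct_2", 70),
    (20, "acct_1", 0),    # unwanted change
]

# Roll forward to t=15, stopping before the unwanted change at t=20.
restored = restore_and_roll_forward(backup, 10, journal, 15)
```

Choosing `target_time` is the same "which point in time" decision discussed earlier: the journal makes any target reachable, but does not pick it.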
The key issue with CDP is how much value it provides over other data protection products. If the value of the restored data in the RPO timeframe is greater than the added cost of CDP, then a CDP product makes sense. If file-recovery speed is an issue, CDP also makes sense.
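One way to work through that comparison is a back-of-the-envelope calculation like the Python sketch below. The model (value per hour of data, multiplied by the RPO window and the expected number of incidents per year) and every figure in it are assumptions chosen for illustration, not numbers from any study or vendor.

```python
def cdp_is_worthwhile(value_per_hour, rpo_hours_without_cdp,
                      incidents_per_year, added_cdp_cost_per_year):
    """Compare the yearly value of data inside the RPO window with CDP's added cost."""
    expected_loss = value_per_hour * rpo_hours_without_cdp * incidents_per_year
    return expected_loss > added_cdp_cost_per_year, expected_loss


# Hypothetical inputs: nightly backup (24-hour RPO), data worth 5,000 per hour,
# one data-loss incident every two years, CDP adding 40,000 per year in cost.
worthwhile, expected_loss = cdp_is_worthwhile(
    value_per_hour=5000,
    rpo_hours_without_cdp=24,
    incidents_per_year=0.5,
    added_cdp_cost_per_year=40000,
)
```

With these made-up inputs the expected yearly loss is 60,000, which exceeds the 40,000 added cost, so CDP would make sense. Different inputs can easily reverse the result.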
A CDP product can provide value by helping to meet regulatory compliance recovery timeframes. Some regulations require certain types of files, e-mails, messages and other data to be locatable and restorable within a specified period of time, and CDP can help meet those compliance requirements. Of course, the value of CDP soars if it makes the difference between not being in compliance and having a compliant storage operation.
When evaluating a CDP product, you want to be certain that it can resolve consistency issues. Other capabilities to look for in a CDP product include:
- Support for specific server operating systems and the company's critical applications.
- The ability to scale to accommodate three years of data growth.
- Data-retention policies that match the organization's primary data-retention policies.
- The elimination of protected data based on time or policy rules while providing digital certification.
- The capability to automatically roll up older data copies into aggregated master copies, which reduces storage space and saves time during restores.
- The ability to encrypt CDP data.
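The roll-up capability in the list above can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not a product's algorithm: changes are (timestamp, key, value) tuples, and everything at or before a hypothetical cutoff is collapsed into a single master copy.

```python
def roll_up(journal, cutoff):
    """Collapse entries at or before cutoff into one master copy; keep later entries."""
    master = {}
    recent = []
    for ts, key, value in sorted(journal):
        if ts <= cutoff:
            master[key] = value  # later writes overwrite earlier ones
        else:
            recent.append((ts, key, value))
    return master, recent


journal = [
    (1, "a.txt", "v1"),
    (2, "a.txt", "v2"),
    (3, "b.txt", "v1"),
    (9, "a.txt", "v3"),
]

master, recent = roll_up(journal, cutoff=5)
```

Three old entries shrink to a two-file master copy, which saves space and shortens restores. The trade-off in this sketch is that individual points in time before the cutoff can no longer be recovered.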