continuous data protection

Continuous data protection (CDP), also called continuous backup, is a storage system in which all the data in an enterprise is backed up whenever any change is made. In effect, CDP creates an electronic journal of complete storage snapshots, one storage snapshot for every instant in time that data modification occurs.

A major advantage of CDP is the fact that it preserves a record of every transaction that takes place in the enterprise. In addition, if the system becomes infected with a virus or Trojan, or if a file becomes mutilated or corrupted and the problem is not discovered until some time later, it is always possible to recover the most recent clean copy of the affected file.

A CDP system with disk storage offers data recovery in a matter of seconds, much less time than tape backups or archives require. Installing CDP hardware and software is straightforward and does not put existing data at risk.

How CDP works

CDP was originally introduced as a mechanism to circumvent the problem of shrinking backup windows. Prior to the introduction of continuous data protection software, most organizations performed a nightly tape backup. The problem, however, was that many organizations found themselves having to protect a constantly growing data set within a strict backup window. Although there are a number of techniques for expediting tape backups, there is a limit to the amount of data that can be backed up within a given period of time.

Continuous data protection software sought to solve this problem by transitioning from tape-based backup to disk-based backup. The technology works by creating an initial data copy and then using changed block tracking to back up only the storage blocks that have been modified, or newly created, since the previous backup. This technique minimizes the amount of data that must be backed up within each cycle and effectively eliminates the backup window. As a result, backups can run every few minutes, or as each write occurs, rather than once per night.
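
The changed-block idea can be illustrated with a minimal Python sketch. It is not how any particular product works: real systems usually record changed blocks as writes happen, whereas this sketch compares block hashes after the fact as a stand-in. The block size and the names `split_blocks`, `block_hashes` and `changed_blocks` are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example is easy to follow


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a volume image into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(blocks: list[bytes]) -> list[str]:
    """Fingerprint each block so a later pass can detect changes."""
    return [hashlib.sha256(block).hexdigest() for block in blocks]


def changed_blocks(previous_hashes: list[str], current: bytes) -> dict[int, bytes]:
    """Return {block_index: data} for blocks modified or newly created."""
    changes = {}
    for index, block in enumerate(split_blocks(current)):
        digest = hashlib.sha256(block).hexdigest()
        is_new = index >= len(previous_hashes)
        if is_new or previous_hashes[index] != digest:
            changes[index] = block
    return changes


# Initial full copy: every block is stored and fingerprinted once.
initial = b"AAAABBBBCCCC"
baseline_hashes = block_hashes(split_blocks(initial))

# Later, block 1 is overwritten and block 3 is appended.
modified = b"AAAAXXXXCCCCDDDD"
delta = changed_blocks(baseline_hashes, modified)
```

Only blocks 1 and 3 end up in `delta`; the unchanged blocks 0 and 2 are not copied again, which is what keeps each backup cycle small.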

Recovery response times

CDP works by incrementally backing up the changes in the state of the data over some period of time or when a record, file or block of information is created or updated. In some cases, there is only one initial full backup and all subsequent backups are incremental to the original backup. This "incremental forever" approach is in contrast to the standard techniques for data backup, but has been gaining greater adoption.

CDP systems may be block-, file- or application-based. They can restore objects at a fine granularity, and to effectively any recovery point, by providing:

  • A baseline reference to the original state of the data.
  • Continuous or near-continuous tracking of the state of a file, block or volume to recognize when a change has occurred.
  • Granular recovery for multiple point-in-time states of the data.
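
The three elements listed above can be sketched as a small journal in Python. This is a simplified model under stated assumptions, not a description of a real product: the class `CdpJournal`, its methods and the numeric timestamps are hypothetical, and writes are assumed to arrive in time order.

```python
from dataclasses import dataclass, field


@dataclass
class CdpJournal:
    """A baseline copy plus a time-ordered log of every block write."""
    baseline: dict[int, bytes]
    entries: list[tuple[float, int, bytes]] = field(default_factory=list)

    def record_write(self, timestamp: float, block_index: int, data: bytes) -> None:
        # Assumption: callers record writes in increasing timestamp order.
        self.entries.append((timestamp, block_index, data))

    def restore(self, point_in_time: float) -> dict[int, bytes]:
        """Rebuild the block state as it was at the requested time."""
        state = dict(self.baseline)
        for timestamp, block_index, data in self.entries:
            if timestamp > point_in_time:
                break  # later writes are not part of this recovery point
            state[block_index] = data
        return state


journal = CdpJournal(baseline={0: b"good", 1: b"data"})
journal.record_write(10.0, 1, b"more")   # a legitimate update
journal.record_write(20.0, 0, b"BAD!")   # a corrupting write

clean = journal.restore(15.0)    # state just before the corruption
latest = journal.restore(20.0)   # state including the corruption
```

Restoring to time 15.0 keeps the legitimate update but excludes the corrupting write, which is the granular point-in-time recovery the list describes.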

Benefits of continuous data backup

Continuous data protection software allows for tiered storage, and the rise of hierarchical storage management (HSM) has reduced the manual effort that storage tiering once required. Today, software automation shuttles the data dynamically between different storage systems, drive types or RAID groups in real time, in ways that are largely transparent to the user. Recovery points are typically kept in two tiers. The short-term tier stores the most current data and usually exists on a high-performance storage array whose disks can collectively deliver a sufficient level of IOPS for efficient data protection and recovery. As recovery points age, the continuous data protection software moves them out of the short-term storage tier and on to the long-term storage tier. This tier may use commodity disks, but it is more common for it to use tape or cloud storage.
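
The age-based movement between tiers can be sketched in Python as follows. The seven-day retention period, the tier names and the functions `assign_tier` and `plan_migration` are illustrative assumptions; actual products expose their own policies.

```python
from datetime import datetime, timedelta

# Assumed policy: recovery points stay on the fast tier for seven days.
SHORT_TERM_RETENTION = timedelta(days=7)


def assign_tier(created_at: datetime, now: datetime,
                retention: timedelta = SHORT_TERM_RETENTION) -> str:
    """Pick the tier for a recovery point based on its age."""
    return "short-term" if now - created_at <= retention else "long-term"


def plan_migration(recovery_points: dict[str, datetime], now: datetime) -> list[str]:
    """List the recovery points that should move to the long-term tier."""
    return sorted(
        name for name, created_at in recovery_points.items()
        if assign_tier(created_at, now) == "long-term"
    )


now = datetime(2024, 1, 31)
recovery_points = {
    "rp-a": datetime(2024, 1, 30),  # 1 day old
    "rp-b": datetime(2024, 1, 10),  # 21 days old
    "rp-c": datetime(2024, 1, 24),  # exactly 7 days old, still short-term
}
to_move = plan_migration(recovery_points, now)
```

Here only `rp-b` is old enough to be moved off the high-performance array.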

CDP vs. near-continuous backup

Both CDP and near-CDP support instantaneous recovery, allowing an application to immediately mount a recovery image when the primary image is damaged. The difference between the two is the recovery point objective (RPO) that they offer: CDP offers an RPO of zero, while near-CDP offers an RPO equal to the interval between snapshots (typically one hour).
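
The effect of the RPO difference can be shown with a short Python sketch. It assumes snapshots are taken at fixed multiples of the interval starting at time zero, and it models true CDP as an interval of zero; the function names and the sample write times are hypothetical.

```python
def last_recovery_point(failure_time: int, snapshot_interval: int) -> int:
    """Latest recovery point at or before the failure.

    A snapshot_interval of 0 models true CDP, where every write is captured.
    """
    if snapshot_interval == 0:
        return failure_time
    return (failure_time // snapshot_interval) * snapshot_interval


def lost_writes(write_times: list[int], failure_time: int,
                snapshot_interval: int) -> list[int]:
    """Writes made after the last recovery point and up to the failure."""
    recovery_point = last_recovery_point(failure_time, snapshot_interval)
    return [t for t in write_times if recovery_point < t <= failure_time]


# Times are in seconds since protection started.
write_times = [100, 3500, 3700, 7100]
failure_time = 7150

near_cdp_loss = lost_writes(write_times, failure_time, snapshot_interval=3600)
cdp_loss = lost_writes(write_times, failure_time, snapshot_interval=0)
```

With hourly snapshots, the writes at 3700 and 7100 seconds fall after the last snapshot at 3600 seconds and are lost; with an RPO of zero, nothing is lost.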

CDP vs. disk mirroring

A mirror backup, like any type of full backup, requires a lot of storage capacity. Disk mirroring, also known as RAID 1, fully replicates data to two or more disks so if one drive fails, the organization can use the mirror copy. A mirror also copies every change, including accidental deletions and corruption, to the mirror copy immediately, so unlike CDP it cannot roll data back to an earlier point in time. Until the advent of cloud storage, SMBs that were running only one server and a handful of laptops were less likely to implement CDP due to cost and complexity.

This was last updated in July 2009

