Continuous data protection (CDP), also known as continuous backup, is a backup and recovery storage system in which all the data in an enterprise is backed up whenever any change is made. In effect, CDP creates an electronic journal of complete storage snapshots, one storage snapshot for every instant in time that data modification occurs.
A major advantage of CDP is that it preserves a record of every transaction that takes place in the enterprise. In addition, if the system becomes infected with a virus or Trojan, or if a file becomes damaged or corrupted and the problem isn't discovered until some time later, it's still possible to recover the most recent clean copy of the affected file.
A CDP system with disk storage offers data recovery in a matter of seconds -- much less time than is the case with tape backups or archives. Installing CDP hardware and software is straightforward and doesn't put existing data at risk.
How does CDP work?
CDP was originally introduced as a mechanism to circumvent the problem of shrinking backup windows. Prior to the introduction of continuous data protection software, most organizations performed a nightly tape backup. The problem was that many organizations found themselves having to protect a constantly growing data set within a fixed backup window. Although there are several techniques for expediting tape backups, there's a limit to the amount of data that can be backed up in a given period.
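The backup window limit comes down to simple arithmetic: data size divided by sustained throughput must fit inside the window. The sketch below illustrates this with purely hypothetical numbers (an 8-hour window and 300 MB/s of throughput are assumptions chosen for the example, not figures from any real product).

```python
def hours_to_back_up(data_tb: float, throughput_mb_per_s: float) -> float:
    """Return the hours needed to copy data_tb terabytes at a sustained rate."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = data_mb / throughput_mb_per_s
    return seconds / 3600


def fits_in_window(data_tb: float, throughput_mb_per_s: float, window_hours: float) -> bool:
    """True if a full backup of data_tb completes within window_hours."""
    return hours_to_back_up(data_tb, throughput_mb_per_s) <= window_hours


# Hypothetical figures: the same window and throughput, but a growing data set.
WINDOW_HOURS = 8
THROUGHPUT_MB_PER_S = 300

for size_tb in (5, 10):
    hours = hours_to_back_up(size_tb, THROUGHPUT_MB_PER_S)
    ok = fits_in_window(size_tb, THROUGHPUT_MB_PER_S, WINDOW_HOURS)
    print(f"{size_tb} TB needs {hours:.2f} h; fits in window: {ok}")
```

Under these assumptions, doubling the data set pushes a full backup past the window even though nothing else has changed, which is the situation described above.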
Continuous data protection software sought to solve this problem by transitioning from tape-based backup to disk-based backup. This reliance on disk has the added benefits of overcoming tape capacity limitations and reducing the amount of time required for data restorations.
Continuous data protection technology works by creating an initial data copy to a protection server, usually residing in the organization's own data center, and then using changed block tracking to back up the storage blocks that have been modified -- or newly created -- since the previous backup. This approach minimizes the amount of data that must be backed up in each cycle and effectively eliminates the backup window. As such, backups occur every few minutes, as opposed to once per night.
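The idea of backing up only modified or newly created blocks can be sketched as follows. Note that this is an illustration under stated assumptions, not how any particular product works: it detects changes by comparing block hashes against a baseline, which is a stand-in for the write-level tracking that changed block tracking typically relies on. The block size and function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tiny hypothetical block size so the example is easy to read


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(data: bytes) -> list[str]:
    """Hash every block; this baseline is recorded during the initial full copy."""
    return [hashlib.sha256(block).hexdigest() for block in split_blocks(data)]


def changed_blocks(previous_hashes: list[str], current: bytes) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks that are new or modified."""
    changed = {}
    for index, block in enumerate(split_blocks(current)):
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            changed[index] = block
    return changed


initial = b"AAAABBBBCCCC"
baseline = block_hashes(initial)   # taken at the time of the initial data copy
modified = b"AAAAXXXXCCCCDDDD"     # block 1 modified, block 3 newly created
delta = changed_blocks(baseline, modified)
print(sorted(delta))               # indexes of the blocks that need backing up
```

Only two of the four blocks are sent in this cycle, which is why each backup pass stays small enough to run every few minutes.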
Although there are exceptions, most modern CDP platforms work by creating incremental forever backups. Once an initial full backup has been written to physical disk storage, there's no need to back up the data again. Instead, only modified or newly created storage blocks are backed up. This approach makes it easy to perform a bulk or granular recovery of data as it existed at a previous point in time.
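A minimal sketch of the incremental forever model, assuming a store that keeps one full backup plus a time-ordered list of block-level increments. The class name, integer timestamps and block maps are hypothetical and chosen only to show how a point-in-time recovery can be rebuilt from the full copy and the increments recorded up to that time.

```python
class IncrementalForeverStore:
    """Hypothetical store: one full backup plus an ordered list of block deltas."""

    def __init__(self, full_backup: dict[int, bytes]) -> None:
        self.full_backup = dict(full_backup)
        # Each entry is (timestamp, {block_index: block_bytes}).
        self.increments: list[tuple[int, dict[int, bytes]]] = []

    def add_increment(self, timestamp: int, changed: dict[int, bytes]) -> None:
        """Record the blocks that changed since the previous backup."""
        if self.increments and timestamp <= self.increments[-1][0]:
            raise ValueError("increments must be added in increasing time order")
        self.increments.append((timestamp, dict(changed)))

    def restore(self, as_of: int) -> dict[int, bytes]:
        """Rebuild the block map as it existed at time as_of."""
        blocks = dict(self.full_backup)
        for timestamp, changed in self.increments:
            if timestamp > as_of:
                break  # later increments did not exist yet at as_of
            blocks.update(changed)
        return blocks


store = IncrementalForeverStore({0: b"AAAA", 1: b"BBBB"})
store.add_increment(10, {1: b"XXXX"})   # block 1 modified at t=10
store.add_increment(20, {2: b"CCCC"})   # block 2 created at t=20

print(store.restore(as_of=15))          # state after the first increment only
```

The full backup is written once and never repeated; every restore, whether of everything or of a single block, replays increments on top of it.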
To maintain business continuity (BC), organizations must have the ability to create off-site backups. Although CDP servers generally reside in an organization's own data center, most can create secondary tape backups or replicate backups to the cloud or to a backup data center. That way, if something were to happen to the organization's primary backup and recovery server, a secondary backup copy exists elsewhere that can be used for disaster recovery purposes.
What are the benefits and drawbacks of continuous data backup?
As with any other technology, there are both advantages and disadvantages to using continuous data protection. In most cases, however, the advantages far outweigh the disadvantages.
Advantages:
- CDP backups eliminate the need for a backup window.
- CDP backup servers are generally scalable and overcome the capacity limitations associated with tape-based backups.
- Unlike tape, disk isn't a linear medium, which often makes it possible to restore data more quickly than might be possible using a tape-based system.
- CDP systems enable point-in-time recoveries without needing to retrieve a tape from off-site storage.
- Many modern CDP platforms can perform instant recovery of virtual machines by running the VM directly on the backup server while a more traditional restoration occurs in the background.
Disadvantages:
- CDP platforms can be cost-prohibitive for smaller organizations.
- If not properly architected, a CDP backup server can become a single point of failure.
True CDP vs. near-CDP
The primary difference between CDP and near-continuous backup is that the two offer differing recovery point objectives (RPOs). True CDP systems guarantee that all newly created data is backed up. These systems, which tend to be designed for the protection of structured data, are more costly and complex than near-continuous backup platforms. They are heavily used in financial services and other industries that must guarantee the protection of all data.
When most people use the term CDP or continuous data protection, they're usually referring to near-continuous backup platforms. Rather than performing instantaneous backups as a true continuous data protection platform does, near-continuous backup platforms perform block-level backups on a scheduled basis. The frequency of these scheduled backups varies based on the platform, but most have an RPO in the range of 30 seconds to 15 minutes.
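The difference between the two approaches can be illustrated with a small simulation. This is a sketch under simplifying assumptions: times are whole seconds, the near-continuous platform backs up at exact multiples of its interval starting from time 0, and a write that lands exactly on a backup time is captured. The function names and sample writes are hypothetical.

```python
Write = tuple[int, str]  # (time in seconds, label of the data written)


def recoverable_true_cdp(writes: list[Write], failure_time: int) -> list[Write]:
    """True CDP captures every write as it happens, up to the failure."""
    return [w for w in writes if w[0] <= failure_time]


def recoverable_near_cdp(writes: list[Write], failure_time: int, interval: int) -> list[Write]:
    """Near-CDP only captures writes made before the last scheduled backup."""
    last_backup = (failure_time // interval) * interval
    return [w for w in writes if w[0] <= last_backup]


writes = [(5, "a"), (40, "b"), (65, "c"), (80, "d")]
FAILURE_TIME = 85   # hypothetical failure 85 seconds in
INTERVAL = 30       # hypothetical 30-second backup schedule

true_cdp = recoverable_true_cdp(writes, FAILURE_TIME)
near_cdp = recoverable_near_cdp(writes, FAILURE_TIME, INTERVAL)
lost = [w for w in true_cdp if w not in near_cdp]
print("lost with near-CDP:", lost)
```

With these inputs the last scheduled backup ran at 60 seconds, so the two writes made after that point are the ones a near-continuous platform would not be able to recover.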
CDP vs. disk mirroring
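Disk mirroring, also known as RAID 1, fully replicates data to two or more disks, so if one drive fails, the organization can use the mirror copy. A mirror, like any full backup, requires a lot of storage capacity. Mirroring protects against drive failure, but because every write is replicated to the mirror immediately, corrupted or deleted data is replicated as well, and the mirror offers no way to return to an earlier point in time. CDP, by contrast, retains earlier versions of the data. Until the advent of cloud storage, small and medium-sized businesses (SMBs) running only one server and a handful of laptops were less likely to implement CDP due to cost and complexity.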
CDP vs. traditional backup
CDP effectively solves the biggest challenges associated with traditional backups, most notably the backup window. Whereas traditional backups often copied data at the file level, CDP is a block-level technology. It backs up newly created or modified storage blocks as soon as they change, so there is no need to reserve a nightly window for backups.
CDP also helps address traditional backup challenges by reducing the RPO. A traditional nightly backup occurs once every 24 hours, and any data created since the time of the most recent backup is potentially subject to loss. If an organization's nightly backup completes at midnight and there is a major data loss event at noon, then any data created between midnight and noon will be lost. In contrast, CDP platforms back up data almost immediately, meaning that an organization should never lose more than a few minutes' worth of data.
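The midnight-to-noon example above can be worked through in a few lines. The calendar date and the 5-minute CDP interval are hypothetical values chosen for illustration; the helper simply measures the gap between the last completed backup and the failure.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much recent data is exposed: time since the last completed backup."""
    if failure < last_backup:
        raise ValueError("failure must not precede the last backup")
    return failure - last_backup


# Hypothetical date; only the times of day matter for the example.
midnight = datetime(2024, 1, 1, 0, 0)
noon = datetime(2024, 1, 1, 12, 0)

# Traditional nightly backup that completed at midnight.
nightly_loss = data_loss_window(midnight, noon)

# Near-continuous platform with an assumed 5-minute interval:
# in the worst case the last backup finished 5 minutes before the failure.
cdp_loss = data_loss_window(noon - timedelta(minutes=5), noon)

print("nightly backup exposure:", nightly_loss)
print("CDP exposure (worst case):", cdp_loss)
```

The same failure exposes 12 hours of data under the nightly schedule and, under the assumed interval, 5 minutes with CDP.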