Compared to the pre-SAN era, the shift is quite dramatic. But amid this data management revolution, backup is a notable exception. Despite advances such as SAN-based shared tape drives and disk technology like virtual tape libraries, the backup process is fundamentally the same as it was 20 years ago. Backup remains a costly and highly intrusive batch operation that's prone to error and consumes an exorbitant amount of time and resources.
--From the SNIA Data Management Forum CDP Special Interest Group
Based on this definition, products like Microsoft Data Protection Manager (DPM) and other snapshot-based solutions aren't technically CDP because they're not continuous--they don't immediately store every change. But if you view data recovery as a continuum with nightly backups at one end of the scale and true CDP at the other, snapshot management tools must be viewed as dramatic enhancements to recoverability. Each environment has specific recovery needs--for those currently dependent on backups, DPM represents a leap forward despite falling short of the CDP ideal.
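To make that continuum concrete, here is a minimal Python sketch that measures how much work each approach loses when data is corrupted late in the day. The schedules, the corruption time and the helper names (`schedule`, `latest_recovery_point`, `data_loss`) are hypothetical. True CDP is approximated as a capture every second, since it journals every write rather than taking periodic copies.

```python
from datetime import datetime, timedelta


def schedule(start, end, every):
    """Capture times from start to end (inclusive) at a fixed interval."""
    times, t = [], start
    while t <= end:
        times.append(t)
        t += every
    return times


def latest_recovery_point(capture_times, corruption_time):
    """Newest capture taken strictly before the corruption, or None."""
    candidates = [t for t in capture_times if t < corruption_time]
    return max(candidates) if candidates else None


def data_loss(capture_times, corruption_time):
    """Work lost by rolling back to the latest clean capture (the effective RPO)."""
    point = latest_recovery_point(capture_times, corruption_time)
    return None if point is None else corruption_time - point


day_start = datetime(2006, 5, 1)
corruption = datetime(2006, 5, 1, 16, 45)  # hypothetical corruption at 4:45 p.m.

strategies = {
    "nightly backup": schedule(day_start - timedelta(days=1), corruption, timedelta(days=1)),
    "hourly snapshot": schedule(day_start, corruption, timedelta(hours=1)),
    # CDP captures every change; modeled here as one capture per second.
    "continuous (CDP)": schedule(corruption - timedelta(minutes=1), corruption, timedelta(seconds=1)),
}

for name, times in strategies.items():
    print(f"{name:18} loses {data_loss(times, corruption)}")
```

Under these assumptions the nightly backup loses 16 hours 45 minutes of work, an hourly snapshot tool loses 45 minutes and the CDP journal loses about a second, which is why snapshot products sit well toward the CDP end of the scale without reaching it.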
The emerging technology lifecycle
Technologies exist today to replace traditional backup and to effectively eliminate the nightly backup cycle. It's possible to provide data protection in an integrated and transparent manner without the invasiveness of nightly backups. This can be accomplished in a number of ways. More importantly, it can be done affordably.
I'm referring to continuous data protection (CDP) and snapshot-based CDP-like products that have emerged in the market. Perhaps the greatest promise of these products is their ability to shift the focus of data protection from backup to where it should be--recoverability.
Although quite promising, these products aren't considered part of the data protection mainstream yet. All new technologies face hurdles, but the adoption curve here seems particularly long compared with that of VTLs, for example. At what point does a technology evolve from "emerging" to "arrived"? The ultimate metric is the number of adopters, but that raises the question, "What compels people to become adopters?" Here are some considerations:
- The technology must provide significant benefits over current approaches
- There must be multiple vendors of the technology
- The adoption risk can't be too high
Let's measure CDP against those criteria:
- Initial reaction to CDP products is typically excitement about the possibility of eliminating backups and having near-zero recovery time objectives (RTOs) and recovery point objectives (RPOs) without spending a fortune.
- The number of vendors in the CDP arena is expanding. The Storage Networking Industry Association's "CDP Buyer's Guide" lists approximately nine vendors/products. If you also include snapshot-based, near-CDP products, the list grows.
- Adoption risk is the sticking point for CDP-type technologies. Initial positive reactions may be replaced by skepticism or questions about a product's maturity and reliability.
Enter the giants
There are signs CDP is gaining traction, as evidenced by the large vendors embracing the concept. Oracle has incorporated CDP-like functionality, called Flashback, into Oracle 10g (see "Oracle Flashback," at right) that enables fast rewind of databases to earlier points in time. IBM is testing the waters with IBM Tivoli CDP for Files, a product focused primarily on protecting desktops and laptops. Symantec/Veritas and EMC are also talking about introducing CDP products.
Since the introduction of RMAN in Oracle 8.0, Oracle has steadily improved data protection functionality in its products. Introduced in Oracle 9i and significantly enhanced in Oracle 10g, Flashback provides a set of SQL commands that lets users view data as it existed at various points in time. This allows you to quickly identify points of corruption and to restore a database or table to a point immediately prior to the corruption.
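In Oracle itself this is done in SQL, for example with Flashback Query (`SELECT ... AS OF TIMESTAMP`) and, in 10g, `FLASHBACK TABLE ... TO TIMESTAMP`; Flashback Query reconstructs older row images from undo data. The Python below is not Oracle's implementation. It is a toy versioned table, with a hypothetical `VersionedTable` class and integer commit times standing in for timestamps or SCNs, that shows the two ideas in the paragraph: viewing data as of a point in time, and rewinding to the point just before a bad change.

```python
from bisect import bisect_right


class VersionedTable:
    """Toy table that keeps every committed version of each row (hypothetical)."""

    def __init__(self):
        self._history = {}  # key -> list of (commit_time, value); value None = deleted

    def write(self, key, value, commit_time):
        # Assumes commits arrive in time order, so each version list stays sorted.
        self._history.setdefault(key, []).append((commit_time, value))

    def as_of(self, when):
        """Return every row as it existed at time `when`."""
        snapshot = {}
        for key, versions in self._history.items():
            times = [t for t, _ in versions]
            i = bisect_right(times, when)  # number of versions committed at or before `when`
            if i and versions[i - 1][1] is not None:
                snapshot[key] = versions[i - 1][1]
        return snapshot

    def rewind(self, when):
        """Flashback-style rewind: discard every version committed after `when`."""
        for key in list(self._history):
            kept = [(t, v) for t, v in self._history[key] if t <= when]
            if kept:
                self._history[key] = kept
            else:
                del self._history[key]


accounts = VersionedTable()
accounts.write("alice", 100, commit_time=1)
accounts.write("bob", 50, commit_time=2)
accounts.write("alice", -999, commit_time=5)  # logical corruption
accounts.write("bob", None, commit_time=6)    # accidental delete

print(accounts.as_of(4))   # view the data just before the corruption
accounts.rewind(4)         # rewind the table to that point
print(accounts.as_of(10))  # the corrupted and deleted rows are back to their earlier values
```

Both calls to `as_of` print `{'alice': 100, 'bob': 50}`. Like Flashback, the sketch only undoes logical damage; if the storage holding `_history` were lost, nothing here would help.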
Designed to protect against logical corruption only, Flashback must be combined with other technologies, such as backup and replication, to protect against physical loss. While not a complete continuous data protection solution, it's a dramatic step toward a significantly reduced recovery point objective and recovery time objective.
Microsoft DPM works with Windows 2000 Server, Windows Server 2003 and Windows Storage Server 2003 to protect server volumes, folders or shares. A DPM agent initially creates and sends a replica of each protected object to a DPM server. The agent then logs byte-level changes and periodically (typically hourly) replicates those changes to the DPM server. The DPM server catalogs this information in its SQL Server database and uses the Volume Shadow Copy Service (VSS) to create point-in-time copies of protected objects based on administrator-defined policies.
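The replicate-then-sync flow can be modeled in a few lines of Python. This is a conceptual sketch only: `DPMAgentModel`, `DPMServerModel` and their methods are hypothetical names, not Microsoft's API, and the model says nothing about how the real agent tracks changes or how VSS stores its copies. It shows why the scheme is cheap: after the first full replica, only the changed bytes cross the network.

```python
from dataclasses import dataclass, field


def _write_at(buf: bytearray, offset: int, payload: bytes) -> None:
    """Overwrite bytes at `offset`, zero-extending the buffer if the write runs past its end."""
    end = offset + len(payload)
    if end > len(buf):
        buf.extend(b"\x00" * (end - len(buf)))
    buf[offset:end] = payload


@dataclass
class DPMServerModel:
    """Hypothetical stand-in for the DPM server's copy of one protected object."""
    replica: bytearray = field(default_factory=bytearray)
    shadow_copies: list = field(default_factory=list)

    def receive_full_replica(self, data: bytes) -> None:
        self.replica = bytearray(data)

    def apply_changes(self, changes) -> None:
        for offset, payload in changes:
            _write_at(self.replica, offset, payload)

    def create_shadow_copy(self, label: str) -> None:
        # Stand-in for a VSS point-in-time copy: a frozen snapshot of the replica.
        self.shadow_copies.append((label, bytes(self.replica)))


@dataclass
class DPMAgentModel:
    """Hypothetical stand-in for the agent running on a protected file server."""
    server: DPMServerModel
    data: bytearray
    change_log: list = field(default_factory=list)

    def initial_replica(self) -> None:
        # Step 1: send a full copy of the protected object once.
        self.server.receive_full_replica(bytes(self.data))

    def write(self, offset: int, payload: bytes) -> None:
        # Step 2: apply the local write and log only the bytes that changed.
        _write_at(self.data, offset, payload)
        self.change_log.append((offset, bytes(payload)))

    def synchronize(self) -> int:
        # Step 3: periodic (e.g. hourly) sync ships the logged changes, then clears the log.
        shipped = sum(len(payload) for _, payload in self.change_log)
        self.server.apply_changes(self.change_log)
        self.change_log = []
        return shipped


server = DPMServerModel()
agent = DPMAgentModel(server, bytearray(b"quarterly report v1"))
agent.initial_replica()
agent.write(17, b"v2")             # a byte-level change on the file server
shipped = agent.synchronize()      # only the 2 changed bytes are sent
server.create_shadow_copy("12:00")
print(shipped, server.shadow_copies[-1])
```

Keeping the change log on the agent and the point-in-time copies on the server mirrors the division of labor the paragraph describes: the protected server does little more than record writes, while the DPM server owns the history.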
The default policy is to create shadow copies three times a day. Given the VSS limit of 64 shadow copies for each protected object, this provides approximately 30 calendar days (20 business days) of disk-based DPM recoverability. The granularity can range from hourly to daily, and the oldest copy is removed when the 64-copy limit is reached. Because of this limit, an organization should establish an archiving policy to retain older data.
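The retention figure follows from simple arithmetic: 64 copies at three per day is about 21 days of copies. The 30-calendar-day figure only works if copies are taken on business days, so the Python sketch below assumes a weekday-only schedule; that assumption, the start date and the `retained_window` helper are illustrative, not taken from DPM. A `deque` with `maxlen=64` reproduces the "oldest copy is removed" behavior.

```python
from collections import deque
from datetime import date, timedelta

MAX_SHADOW_COPIES = 64   # VSS limit per protected object, as cited in the text
COPIES_PER_DAY = 3       # default DPM policy, as cited in the text
# Assumption (not stated in the text): copies are taken on weekdays only,
# which is what the "20 business days / 30 calendar days" figure implies.


def retained_window(start: date, days: int):
    """Run the schedule for `days` calendar days; return (oldest, newest) retained copy dates."""
    copies = deque(maxlen=MAX_SHADOW_COPIES)  # appending a 65th copy drops the oldest
    day = start
    for _ in range(days):
        if day.weekday() < 5:                 # Monday=0 ... Friday=4
            for slot in range(COPIES_PER_DAY):
                copies.append((day, slot))
        day += timedelta(days=1)
    return copies[0][0], copies[-1][0]


oldest, newest = retained_window(date(2006, 5, 1), days=90)
print(f"{MAX_SHADOW_COPIES / COPIES_PER_DAY:.1f} business days")  # 21.3 business days
print(f"{(newest - oldest).days + 1} calendar days")              # 30 calendar days
```

So "20 business days" is a round figure for 64 ÷ 3 ≈ 21, and anything older than that window has to come from the archive.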
Administrators can perform data restores, or users can be permitted to browse previous versions themselves using Windows Explorer or Microsoft Office 2003 applications.
Some factors to consider:
- DPM depends on Active Directory to manage access to data. Because of this, it can also identify systems or volumes that aren't protected, which can be valuable for discovering "orphan" systems and ensuring they're properly protected.
- In its initial version, DPM protects only files. It doesn't handle e-mail or databases, although that functionality will be added in subsequent versions.
- DPM is a "Windows Server-only" solution. It doesn't protect desktops, laptops or non-Windows servers.
- Microsoft is positioning DPM as a solution for businesses with 10 to 99 file servers, and for enterprises implementing centralized data protection for branch offices.
- The DPM server must be protected through replication, backup or a combination of the two. It's important to note that Microsoft is positioning DPM in a disk-to-disk-to-tape architecture and has provided an interface to enable backup software vendors to integrate DPM support into their products. With a fully integrated backup product, client restores from tape, if required, can be performed directly without the intermediate step of recovery to DPM.
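The restore path in a disk-to-disk-to-tape layout can be sketched as a small decision function. This is a hypothetical Python model, not any vendor's interface: `plan_restore`, the sample dates and the assumption that older recovery points exist only on tape are all illustrative. It picks the newest recovery point at or before the requested time, prefers disk on a tie, and shows the extra hop through the DPM server when the backup product isn't integrated.

```python
from datetime import datetime


def plan_restore(target, disk_points, tape_points, integrated_backup=True):
    """Return the restore steps for the newest recovery point at or before `target`.

    Hypothetical D2D2T model: recent points live on DPM disk, older ones on tape.
    """
    candidates = [(p, "disk") for p in disk_points if p <= target]
    candidates += [(p, "tape") for p in tape_points if p <= target]
    if not candidates:
        raise LookupError("no recovery point at or before the requested time")
    # Newest point wins; on a tie, prefer the disk copy (True sorts above False).
    point, source = max(candidates, key=lambda c: (c[0], c[1] == "disk"))
    label = f"{point:%Y-%m-%d %H:%M}"
    if source == "disk":
        return [f"restore {label} from DPM disk to client"]
    if integrated_backup:
        # An integrated backup product restores from tape straight to the client.
        return [f"restore {label} from tape directly to client"]
    # Otherwise the tape copy must first be recovered to the DPM server.
    return [f"restore {label} from tape to DPM server",
            f"restore {label} from DPM server to client"]


disk = [datetime(2006, 7, d, h) for d in (27, 28) for h in (8, 12, 18)]
tape = [datetime(2006, 6, 2, 22), datetime(2006, 6, 9, 22)]
print(plan_restore(datetime(2006, 7, 28, 13), disk, tape))
print(plan_restore(datetime(2006, 6, 5), disk, tape, integrated_backup=False))
```

The first request is served from DPM disk; the second falls outside the disk window and, without an integrated backup product, takes two steps instead of one.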
A number of promising storage technologies have taken time to gain traction: virtualization almost died but has been reborn; iSCSI was long awaited and is now growing; and intelligent switches--well, they're still emerging. Is it finally time to think about dumping your batch-oriented backup infrastructure? For most companies, the answer is "Not yet." However, it's clearly time to assess where these new technologies can be applied. It'll take a few years, but I expect that one day we'll be referring to transparent backup as a "best practice."