Host-based solutions require installing an agent on each protected server to capture changes at either the block or file level. How these agents capture and copy changed information varies, so it's important to understand both the architecture of the CDP application and the characteristics of its agents.
CDP products create a client/server architecture with a central server that communicates with an agent on each server. This configuration allows the central server to store a replica of each server's data. The central server can then manage the replication and restore of each server's replicated data. CDP products put an internally generated time stamp on each image, rather than relying on each server's time stamping. Administrators can pick any point in time to recover data.
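To make the central time-stamping idea concrete, here is a minimal sketch of a central server that stamps each incoming write with its own clock and rebuilds an image for any chosen point in time. The class name `CentralReplica`, its methods, and the use of a simple counter in place of a real clock are assumptions for illustration; no vendor's product is being described.

```python
import itertools


class CentralReplica:
    """Hypothetical central-server store for one protected host."""

    def __init__(self):
        # A logical counter stands in for the central server's clock (assumption).
        self._clock = itertools.count(1)
        self.journal = []  # entries of (server_stamp, block, data), in arrival order

    def receive(self, block, data):
        # Stamp with the central server's clock, never the agent's.
        stamp = next(self._clock)
        self.journal.append((stamp, block, bytes(data)))
        return stamp

    def image_at(self, point_in_time):
        # Replay journaled writes up to and including the chosen stamp.
        image = {}
        for stamp, block, data in self.journal:
            if stamp > point_in_time:
                break
            image[block] = data
        return image


replica = CentralReplica()
t1 = replica.receive(0, b"alpha")
t2 = replica.receive(1, b"beta")
t3 = replica.receive(0, b"gamma")

# Recover the image as it stood just after the second write.
earlier = replica.image_at(t2)
```

Because every entry carries a stamp from a single clock, images from different hosts can be compared on one timeline even when the hosts' own clocks disagree.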
Topio Inc.'s Data Protection Suite typifies this new architecture. Its replication client agents reside on servers and communicate back to the central replication server manager. Each client makes an initial copy of its server's data, sends it to the replication server and, once that initial copy is completed, journals all write I/Os. The replication server houses recoverable copies of each server's data; once the data is copied back to the original server, the application can resume activity. While this approach doesn't provide instantaneous application or data recovery, it can reduce the recovery time to just a few minutes.
CDP products use a variety of different architectures and techniques to provide data protection; here's how to determine which one might be right for your storage environment.
Administrators also need to examine where these products' agents install in the host server's application stack. For instance, both Softek Storage Solutions Corp. and Veritas Software Corp. install their drivers below the file system and just above the host's volume manager. These drivers ignore all read I/Os but capture every block write I/O just before it would be written to disk, copying the write I/O to a reserved memory buffer. The drivers then make a journal entry for that write in memory and move the copy to a secondary data store, building a copy of the data there.
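The capture sequence described above (ignore reads, copy each write to a buffer, journal it, then move the copy to secondary storage) can be sketched in a few lines. This is a toy model under stated assumptions: the class `WriteCaptureDriver`, its method names, and the dictionaries standing in for disks are all hypothetical and do not reflect Softek's or Veritas' actual drivers.

```python
from collections import deque


class WriteCaptureDriver:
    """Hypothetical block-level agent sitting just above the volume manager."""

    def __init__(self, primary, secondary):
        self.primary = primary      # dict standing in for the primary volume
        self.secondary = secondary  # dict standing in for the secondary data store
        self.buffer = deque()       # reserved memory buffer holding copied writes
        self.journal = []           # in-memory journal entries: (sequence, block)
        self._seq = 0

    def read(self, block):
        # Reads pass straight through; nothing is captured.
        return self.primary.get(block)

    def write(self, block, data):
        # Copy and journal the write just before it reaches the primary volume.
        self._seq += 1
        self.buffer.append((self._seq, block, bytes(data)))
        self.journal.append((self._seq, block))
        self.primary[block] = bytes(data)

    def drain(self):
        # Move buffered copies to the secondary store in their original order.
        moved = 0
        while self.buffer:
            _, block, data = self.buffer.popleft()
            self.secondary[block] = data
            moved += 1
        return moved


driver = WriteCaptureDriver(primary={}, secondary={})
driver.write(7, b"v1")
driver.write(7, b"v2")
driver.write(9, b"x")
driver.read(7)            # not journaled
moved = driver.drain()    # secondary now mirrors the captured writes
```

Draining in sequence order matters: replaying the two writes to block 7 out of order would leave the secondary copy holding stale data.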
Topio's product allows administrators to configure the agent to reside either between the server's file system and volume manager or between the volume manager and the LUN level. While most block-level agents install just above the volume manager, some administrators may need to protect just an individual LUN as opposed to an entire volume group. An administrator can configure this option after installation.
XOsoft Inc.'s Data Rewinder takes a file-based approach. Rather than trying to capture every write, it protects only designated files on a server. Running at the file level gives it a couple of advantages over its block-based competitors: It integrates more deeply with e-mail and database applications and needs only to journal changes that affect those applications, as opposed to every change to a volume or LUN. And it doesn't need to make an entire copy of the primary data store as the block-based products do.
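As a rough illustration of why a file-level approach journals less, the sketch below records only writes that touch designated files and drops everything else. The class `FileLevelJournal`, the glob patterns, and the file paths are invented for the example; this is not how Data Rewinder is implemented.

```python
from fnmatch import fnmatchcase


class FileLevelJournal:
    """Hypothetical file-level agent that protects only designated files."""

    def __init__(self, protected_patterns):
        self.protected_patterns = list(protected_patterns)
        self.entries = []  # journaled changes: (path, offset, data)

    def is_protected(self, path):
        # A path is protected if it matches any designated glob pattern.
        return any(fnmatchcase(path, p) for p in self.protected_patterns)

    def on_write(self, path, offset, data):
        # Journal the change only when it affects a protected file.
        if not self.is_protected(path):
            return False
        self.entries.append((path, offset, bytes(data)))
        return True


journal = FileLevelJournal(["/mail/*.mbox", "/db/*.dbf"])
kept_mail = journal.on_write("/mail/alice.mbox", 0, b"From: bob")
kept_tmp = journal.on_write("/tmp/scratch.dat", 0, b"noise")
kept_db = journal.on_write("/db/orders.dbf", 4096, b"row")
```

A block-level agent on the same volume would have captured all three writes, including the scratch file.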
Block-based CDP products require as much secondary storage as the primary storage assigned to the application. Block-level products may employ different technologies to ensure the integrity of the data at the secondary site, but they all follow the same basic process. The CDP software inserts checkpoints at moments when the primary data is in a consistent state. When the data is consistent, the CDP program inserts a token into the data stream copied to the secondary copy, signaling that the data up to that point is consistent and recoverable. Another token is then inserted to indicate that new writes are coming and should be appended to the consistent secondary image.
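The checkpoint-and-token process can be modeled as a stream of records arriving at the secondary site. The sketch below is a simplification under stated assumptions: it uses a single hypothetical `CONSISTENT` marker rather than the pair of tokens described above, and the record layout and the function `recoverable_image` are invented for illustration.

```python
WRITE = "write"
CONSISTENT = "consistent"  # hypothetical token marking a consistent checkpoint


def recoverable_image(stream):
    """Build the image as of the last consistency token in the stream."""
    image = {}    # state as of the most recent consistency token
    pending = {}  # writes received since that token, not yet known consistent
    for record in stream:
        if record[0] == WRITE:
            _, block, data = record
            pending[block] = data
        elif record[0] == CONSISTENT:
            # Everything received so far is now part of a recoverable image.
            image.update(pending)
            pending.clear()
    return image


stream = [
    (WRITE, 0, b"a"),
    (WRITE, 1, b"b"),
    (CONSISTENT,),
    (WRITE, 0, b"c"),  # arrives after the last checkpoint
]
image = recoverable_image(stream)
```

Writes that arrive after the last token are held back, so the recovered image reflects the most recent checkpoint rather than the most recent write.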
Still, administrators need to read the fine print on all of these products to ensure a safe recovery. Softek cautions that unless the appropriate database transaction logs and journaled file system entries are applied, the image is consistent only up to the most recent checkpoint. To prevent data corruption, Veritas doesn't allow reads at the secondary site and encourages use of its FlashSnap product to create a recoverable image.
This was first published in October 2004