This article can also be found in the Premium Editorial Download "Storage magazine: Evaluating the benefits of IP SANs."
|Why network replication makes sense|
Because replication is more communications-intensive than storage-intensive, network-based replication devices would be an efficient way to manage the process.
Using replicated data
Remote data replication systems need to be able to fulfill their primary purpose if a disaster strikes. The replication system will need to be switched from its receiving-writer mode to a live production mode. When this happens, the remote storage target will be used by a different production system.
Data replication may be implemented in such a way that the remote replicated volumes are established as read-only volumes. This is done to ensure data integrity--if the only writer to the volume is the replication writer, then there's no chance that external processes would inadvertently corrupt the replicated data. Obviously, before the replication storage target can be used for live I/O, it must be changed from read-only to read-write.
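As a rough illustration of that read-only rule, the sketch below models a remote volume that accepts writes only from the replication writer until it is promoted. The class and method names are hypothetical and don't correspond to any vendor's interface.

```python
class ReplicaVolume:
    """Toy model of a remote replicated volume (hypothetical, not a vendor API)."""

    def __init__(self):
        self.blocks = {}          # block number -> data
        self.read_write = False   # starts life as a read-only replication target

    def apply_replicated_write(self, block, data):
        # Path used only by the replication writer while in receiving mode.
        if self.read_write:
            raise RuntimeError("volume was promoted; replicated writes are refused")
        self.blocks[block] = data

    def host_write(self, block, data):
        # Any other writer is turned away until the volume is promoted.
        if not self.read_write:
            raise PermissionError("volume is a read-only replication target")
        self.blocks[block] = data

    def promote(self):
        # Failover step: switch from receiving-writer mode to live production.
        self.read_write = True


replica = ReplicaVolume()
replica.apply_replicated_write(0, b"payroll")

try:
    replica.host_write(1, b"stray write")
    rejected = False
except PermissionError:
    rejected = True

replica.promote()
replica.host_write(1, b"production write")
```

The stray write is refused before promotion and the production write succeeds afterward, which is the whole point of keeping the target read-only until it's actually needed.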
If the remote replication storage target must come online for production purposes, it may be desirable to replicate new write I/Os to another remote site. This isn't the sort of thing that happens as a result of being lucky--it can only be done as a result of diligent disaster planning. Companies that have more than one data center will often replicate data between them. Many of the remote data replication products allow bidirectional operations, so that two sites can form a replication pair, protecting the information assets of both sites.
Another common use for replicated data is the creation of snapshots. In general, the remote storage target has far less activity than the primary storage target. This makes it much easier to perform snapshot operations. There are different methods for creating snapshots, which are beyond the scope of this article, but the general idea is to provide either historical, read-only access to replicated data or to generate additional copies of replicated data that can be used for analysis, testing and backup.
One technique that's sometimes used in remote data replication is time stamping. A data replication product that uses this approach attaches a system time stamp to each write I/O that's transmitted to remote storage. These time stamps can be incorporated in the algorithms of the data replication logic, or they can be used by an administrator who is managing the replication or failover process.
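To make the idea concrete, here is a minimal sketch of how time stamps might be used on the receiving side. It assumes each write carries the source system's time stamp, and the `StampedWrite` record and `replay` function are invented for illustration. Writes that arrive out of order are applied in time-stamp order, and an administrator can stop the replay at a chosen point in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StampedWrite:
    timestamp: float   # source system time when the write was issued
    block: int         # target block number
    data: bytes


def replay(writes, up_to=None):
    """Apply writes in time-stamp order, optionally stopping at a cutoff."""
    volume = {}
    for w in sorted(writes, key=lambda w: w.timestamp):
        if up_to is not None and w.timestamp > up_to:
            break
        volume[w.block] = w.data
    return volume


# Writes as received at the remote site, not in the order they were issued.
received = [
    StampedWrite(3.0, 7, b"v2"),
    StampedWrite(1.0, 7, b"v1"),
    StampedWrite(2.0, 9, b"log"),
]

latest = replay(received)                 # full replay
before_v2 = replay(received, up_to=2.5)   # administrator picks an earlier point
```

In the full replay, block 7 ends up holding `v2`. With the cutoff it holds `v1`, which is the kind of choice an administrator might make during a failover.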
Just as virtual storage can be implemented in the host systems, storage subsystems and network devices, remote data replication can also be implemented in any of these locations along the I/O path.
Historically, remote data replication has been sold by enterprise disk subsystem vendors as an optional software offering. These subsystem-based solutions tend to be pricey, but customers who have used them will tell you that they were worth every penny.
On the negative side, one of the shortcomings with subsystem-based replication is that these solutions are homogeneous: They require similar subsystems from the same vendor on both ends of the connection. This means that you can't use cheaper storage from another vendor on the remote site. Disk subsystems aren't exactly open platforms encouraging third-party inventions and open-systems pricing. It's expensive for the subsystem vendors to develop for these platforms, and the task is an impossible proposition for anybody else. To raise the ante on the pricing front, the subsystem-based products tend to lock customers into their subsystem vendors for extended periods of time.
|Fresh approaches cut costs|
Architecturally, the main problem with subsystem-based replication is that the replication process is contained within a single subsystem. While this may not seem like such a big deal, restricting an important function within the confines of a particular resource is certainly a scalability limitation. In order to maintain write ordering and data integrity, the applications must be restricted to using only storage exported by a single subsystem. The reason for this is the lack of synchronization between subsystems performing remote data replication. If the systems, subsystems and all their communications are not all tightly coupled, there's no way to guarantee write ordering and data integrity.
One of the most compelling arguments for adopting SANs is the ability to connect virtually any storage resource to the network and make it available to almost any application. However, if the application requires availability insurance via remote data replication, subsystem-based replication severely restricts the flexibility that SANs were designed to deliver.
Another way to replicate remotely is to place replication software in the host system. Several companies, such as NSI, have provided host-based file replication over the years. However, only one vendor, Veritas, with its Veritas Volume Replicator (VVR), currently sells real-time store-and-forward remote volume replication that competes with subsystem-based remote data replication.
Host-based remote volume replication partially solves the big limitation of subsystem-based replication: To work together, all the subsystems must come from the same vendor. Volume replication at the host, on the other hand, can be established to work with nearly any storage target that the host can address in the SAN. However, the host software approach restricts the replication function to applications running on that particular host. Applying remote volume replication to multiple host systems may require a separate license for each system. Special cluster-aware volume replication that uses time stamps would be necessary to synchronize write I/Os between cluster nodes to ensure write ordering.
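The cluster case can be sketched as a merge of per-node write streams. This example assumes the cluster nodes share a synchronized clock, which is a big assumption in practice. It also assumes each node emits its writes in time-stamp order. The tuple layout and the tie-break on node name are illustrative choices, not a description of any shipping product.

```python
import heapq

# Each write is (timestamp, node, block, data); each node's stream is already sorted.
node_a = [(1.0, "a", 10, b"A1"), (4.0, "a", 11, b"A2")]
node_b = [(2.0, "b", 10, b"B1"), (3.0, "b", 12, b"B2")]

# Merge the streams into one cluster-wide order; ties fall back to the node name.
merged = list(heapq.merge(node_a, node_b, key=lambda w: (w[0], w[1])))

# Apply the merged stream to the remote volume.
volume = {}
for _, _, block, data in merged:
    volume[block] = data

order = [w[3] for w in merged]
```

Both nodes wrote to block 10, and the merged order ensures the later write from node b is the one that survives at the remote site.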
Host-based remote volume replication depends on the availability of system resources to manage the queuing and transmission of replicated write I/Os. This is much more resource-intensive than most volume management software functions. Organizations that want to use host-based remote volume replication should plan to oversize their systems to make sure they have both the disk capacity for queuing write I/Os (including situations where the remote communications link is down) and the CPU capacity to process replication functions. As these systems mature, unless additional disk and CPU resources can be added, the burden of host-based remote volume replication can become more noticeable.
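A back-of-the-envelope calculation shows why queue capacity deserves attention. The figures below (a sustained write rate of 5MB/s, a four-hour link outage and a 1.5x headroom factor) are made-up planning inputs, not measurements from any product.

```python
def queue_capacity_gb(write_rate_mb_s, outage_hours, headroom=1.5):
    """Disk needed to hold queued writes during a link outage, in GB (1 GB = 1000 MB)."""
    queued_mb = write_rate_mb_s * outage_hours * 3600   # MB written during the outage
    return queued_mb * headroom / 1000


# Hypothetical host: 5 MB/s of sustained writes, remote link down for 4 hours.
needed = queue_capacity_gb(write_rate_mb_s=5, outage_hours=4)
print(f"Queue space to reserve: {needed:.0f} GB")
```

Under these assumptions the host would need roughly 108GB set aside just for queued writes, before counting any growth in the application's write rate.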
Network device-based replication
The third architectural choice offered by SANs is to put the data replication function inside a network device, such as a router, switch or dedicated appliance. Unlike in-band storage virtualization, which I have viewed suspiciously for years, remote data replication may be an excellent application to run within the network. Whereas virtualization is primarily a storage function, remote data replication is mostly a communication function. There's very little storage work done with remote data replication, and the difficult aspects encompassing data integrity and write ordering can be addressed most directly with effective communications technology, not storage technology.
A dedicated system or appliance for remote data replication could be established and used by almost any application running on any server in the SAN. In addition, the remote data replication system or appliance could work with a wide variety of storage products, circumventing the vendor lock-in that exists with subsystem-based solutions today. The ability to manage data replication, including the failover process, from a single management point shouldn't be overlooked (see "Why network replication makes sense," above).
Furthermore, placing remote data replication in a dedicated system or appliance almost completely removes the burden of replication from host systems. While it would likely be necessary to include host write I/O redirectors that would mirror writes to a replication appliance, these redirection agents would likely be thin, and they would consume far fewer resources than a host-based volume replication system.
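A thin redirector of this kind might look something like the sketch below. It is only a guess at the shape of such an agent: the `WriteRedirector` class is hypothetical, the local volume is a plain dictionary, and the appliance is stood in for by an in-memory queue. The agent writes locally, hands a sequence-numbered copy to the appliance and does nothing else. Queuing, transmission and recovery would all be the appliance's job.

```python
from collections import deque


class WriteRedirector:
    """Hypothetical thin host agent that mirrors each write to a replication appliance."""

    def __init__(self, local_volume, appliance_queue):
        self.local = local_volume          # dict standing in for local storage
        self.appliance = appliance_queue   # deque standing in for the appliance link
        self.sequence = 0                  # preserves write order for the appliance

    def write(self, block, data):
        self.sequence += 1
        self.local[block] = data
        # Hand off a copy; no buffering or retry logic lives on the host.
        self.appliance.append((self.sequence, block, data))


local_volume = {}
appliance_queue = deque()
agent = WriteRedirector(local_volume, appliance_queue)

agent.write(5, b"order-1")
agent.write(5, b"order-2")
agent.write(8, b"index")
```

The host keeps only a counter and a reference to the appliance, which is why an agent like this could be much lighter than a full host-based volume replicator.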
On the flip side, while there are many success stories for subsystem-based remote data replication products, there are none yet for network device-based replication products. They seem to have key architectural advantages, but they aren't yet proven in the market. (For an overview of remote data replication companies and their products, see "Key remote data replication companies.")
This was first published in July 2003