Replication appliances can work across heterogeneous storage devices, but not all appliances are appropriate for every environment.
With growing heterogeneous server and storage environments, some enterprises are finding that host- and storage array-based replication approaches no longer satisfy their requirements. A replication appliance may be the solution. Operating system- and storage array-agnostic, replication appliances give companies increased flexibility to address one-time and continuous data replication needs while allowing applications to continue processing without data loss or to recover with minimal downtime.
Replication appliances allow administrators to:
- Deploy the appliance in their existing networked storage infrastructure
- Choose from a variety of replication options that best suit particular applications
- Replicate data between different brands and tiers of storage
- Replicate data locally or remotely to provide different recovery options
- Create consistent database snapshots for near real-time application recovery
But not all replication appliances offer all of these features or deliver them in the same way. FalconStor Software's IPStor, McData Corp.'s UltraNet Replication Appliance (obtained with its acquisition of Computer Network Technology Corp.) and Sun Microsystems Inc.'s MirrorStore (part of its StorageTek acquisition) provide an in-band Fibre Channel (FC) architecture: the appliance resides in the data path between the server and storage array and mirrors data to different storage arrays in a pass-through configuration. Kashya Inc.'s KBX5000, Topio Inc.'s Topio Data Protection Suite (TDPS) and Xiotech Corp.'s TimeScale use both in-band and out-of-band techniques. These appliances use FC connections to present disk targets to hosts, but also introduce out-of-band, host-based agents that copy write I/Os to the logical unit numbers (LUNs) that the appliances present to the host.
Replication appliances bring along their own set of administrative headaches, however. In-band FC disk targets take time to set up because administrators may be unfamiliar with how to configure the LUNs the appliances present or uncertain as to how an appliance that intercepts write I/Os will impact their applications. In addition, host-based agents require administrative time to install, possible server reboots and the introduction of drivers that may be incompatible with hosts or applications. There may also be a performance hit during an application's most intensive write I/O periods.
Replication appliance vendors are trying to address these concerns by combining in-band and out-of-band techniques to give administrators more ways to implement and manage the data replication process. But even with new options that attempt to address changing storage environments, replication appliances may still fall short in providing enterprise-wide solutions.
There are five general configuration options for implementing a replication appliance:
- Pass-through. The replication appliance sits between the server and storage, and presents the storage array LUNs to the host. All I/O passes through the appliance, but only write I/Os are copied by it. This approach doesn't require host-based agents, but it does require more time to set up and configure; it may also present problems for path failover software (see "Pass-through").
- Data Mover. The appliance replicates data from one storage array to another in the background with minimal host or application disruption. This implementation is ideal for data migrations from one storage array to another without the use of host-based agents (see "Data mover").
- Split Write over TCP/IP. An agent runs on the host and sends copies of the write I/Os to the replication appliance over the host's TCP/IP interface. This is a cost-effective approach that works well for low-end or lightly used servers that aren't connected to an FC SAN, but need the same replication benefits offered to FC-attached servers (see "Split Write over TCP/IP").
- Split Write to Fibre Channel Target. An agent runs on the host and copies write I/Os to the FC targets presented by the replication appliance to the host's FC interface. This technique offloads the I/O processing from the server to the FC host bus adapter (HBA), and sends data to the LUNs presented by the replication appliance without directly introducing the appliance in the data path between the server and the storage (see "Split Write to Fibre Channel target").
- Split Write at Fibre Channel Switch. The replication appliance sends commands to a service running on the FC director to copy and send write I/Os to the replication appliance. Unlike the other Split Write methods, this approach doesn't require a host-based agent because commands are sent from the replication appliance to the FC director over FC (see "Split Write at Fibre Channel switch").
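The trade-offs among the five options can be summarized as a small lookup table. The following Python sketch is only an illustration: the class, field and function names are hypothetical, the three attributes are a simplification of the descriptions above, and the assumption that a data mover needs FC attachment is ours rather than something the vendors state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationOption:
    name: str
    needs_host_agent: bool  # must software be installed on the server?
    in_data_path: bool      # does all server I/O pass through the appliance?
    needs_fc_host: bool     # must the server itself be attached to the FC SAN?


# Attribute values are read off the option descriptions in the list above.
OPTIONS = [
    ReplicationOption("Pass-through", False, True, True),
    ReplicationOption("Data Mover", False, False, True),  # FC attachment assumed
    ReplicationOption("Split Write over TCP/IP", True, False, False),
    ReplicationOption("Split Write to Fibre Channel Target", True, False, True),
    ReplicationOption("Split Write at Fibre Channel Switch", False, False, True),
]


def candidates(agents_allowed: bool, host_on_fc_san: bool) -> list[str]:
    """Return the option names compatible with two simple site constraints."""
    return [
        o.name
        for o in OPTIONS
        if (agents_allowed or not o.needs_host_agent)
        and (host_on_fc_san or not o.needs_fc_host)
    ]
```

For example, `candidates(agents_allowed=False, host_on_fc_san=True)` leaves the pass-through, data mover and switch-based options, while a server with no FC connection is left with only Split Write over TCP/IP.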
FalconStor's IPStor, McData's UltraNet Replication Appliance and Sun's MirrorStore each offer pass-through as one of the ways users may implement their replication appliance for continuous replication. This option may be selected when one or more of the following circumstances exist:
- Host-based replication products don't support the operating system or the version of the operating system on the host.
- Placing host-based replication software on the hosts is problematic in the environment.
- The replication requires an approach that's minimally disruptive to the host.
- Users are moving from one vendor's storage array to another vendor's storage array.
- Users want to place data on a larger sized volume on a new array than on the existing volume on the current array.
Users, however, will generally want to avoid the pass-through approach for ongoing data replication. First, load balancing and path failover software like EMC Corp.'s PowerPath, Hitachi Data Systems' Hitachi Dynamic Link Manager (HDLM) or IBM Corp.'s Subsystem Device Driver (SDD) may not work with the LUNs presented by these appliances or, if it does work, may not result in a vendor-certified configuration. Second, appliances using the pass-through method limit the number of LUNs that can be presented on each port. Although McData's UltraNet Replication Appliance offers 256 LUNs per port, administrators must account for this limit in larger deployments, especially if the ports will be shared among multiple servers. Finally, all I/O between the server and storage array must go through the replication device. So even though only the write I/O is intercepted and copied by the appliance, putting this device in front of the storage array in effect virtualizes the LUNs presented by that array, while the only benefit realized is the new replication capability.
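The per-port LUN limit lends itself to a quick back-of-the-envelope check. The Python sketch below uses the 256-LUNs-per-port figure cited for McData's appliance; the function name and the sample deployment are hypothetical, and the calculation assumes LUNs can be packed freely across ports, ignoring zoning, redundancy and bandwidth.

```python
import math

LUNS_PER_PORT = 256  # per-port figure cited for McData's appliance


def ports_needed(luns_per_server, luns_per_port=LUNS_PER_PORT):
    """Minimum appliance ports needed to present every server's LUNs."""
    total_luns = sum(luns_per_server)
    return math.ceil(total_luns / luns_per_port)


# Hypothetical deployment: 40 servers with 16 LUNs each (640 LUNs in total).
demo_ports = ports_needed([16] * 40)
```

In this made-up case the 640 LUNs need at least three ports, and any requirement for redundant paths would raise that number.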
Sun partially circumvents this situation by deploying most of its MirrorStore replication appliances as data movers and using them for one-time replication requirements. Used that way, MirrorStore leaves the primary I/O path between the server and storage array intact. Instead, MirrorStore is given read-only access to the LUNs assigned to the server on the storage array. With this block-level access, MirrorStore migrates LUNs from the existing storage array to the new one in the background, while tracking changes to LUNs on the old array and moving those blocks as the changes occur.
Once all of the data on the LUNs is on the new array, the application is stopped so that the final writes can be synchronized by MirrorStore to the LUNs on the new storage array. The SAN is then reconfigured to allow the server to access the LUNs on the new array, and the server discovers the data on these new LUNs dynamically or is rebooted to allow LUN discovery to occur. The application is then restarted using data that resides on the LUNs on the new storage array.
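The migration sequence described above (background copy, change tracking, then a final synchronization once the application is stopped) can be modeled in a few lines of Python. This is a toy sketch, not MirrorStore's implementation: LUNs are modeled as lists of blocks, the class and method names are hypothetical, and the way changed blocks are detected is simply assumed, because the mechanism isn't described here.

```python
class DataMoverSketch:
    """Toy model of a background LUN migration with change tracking."""

    def __init__(self, old_lun, new_lun):
        self.old_lun = old_lun  # blocks on the existing array
        self.new_lun = new_lun  # blocks on the new array
        self.dirty = set()      # blocks changed since they were last copied

    def note_host_write(self, block, data):
        # The host still writes directly to the old array; the mover only
        # records which block changed (detection mechanism is assumed).
        self.old_lun[block] = data
        self.dirty.add(block)

    def copy_block(self, block):
        self.new_lun[block] = self.old_lun[block]
        self.dirty.discard(block)

    def background_pass(self):
        for block in range(len(self.old_lun)):
            self.copy_block(block)

    def final_sync(self):
        # Run after the application is stopped, so no new writes arrive.
        for block in sorted(self.dirty):
            self.copy_block(block)
        return self.old_lun == self.new_lun


old = [b"a", b"b", b"c"]
new = [b""] * 3
mover = DataMoverSketch(old, new)
mover.background_pass()
mover.note_host_write(1, b"B")   # application is still running
in_sync_before = (old == new)    # the new array is now one write behind
in_sync_after = mover.final_sync()
```

The point of the dirty set is that the final outage only has to cover the blocks written since the last pass, not the whole LUN.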
To replicate data on a continuous basis and alleviate user concerns about in-band implementations, vendors offer a host agent that performs a copy of the write I/O--a Split Write--that gets sent to their replication appliance. This Split Write serves two purposes. First, it eliminates the need to put the appliance directly in the data path between the server and the storage array. Second, it takes advantage of the replication appliance's connection to the FC SAN and its ability to present LUNs to the host's FC interface.
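In outline, a Split Write agent behaves like the following Python sketch. It is a simplified illustration with hypothetical names: real agents are drivers in the host's I/O stack, and the two dictionaries here merely stand in for the production LUN and the LUN presented by the replication appliance.

```python
class SplitWriteAgentSketch:
    """Toy host agent: every write goes to production and is copied to the appliance."""

    def __init__(self, production_lun, appliance_lun):
        self.production_lun = production_lun  # LUN on the primary storage array
        self.appliance_lun = appliance_lun    # LUN presented by the appliance

    def write(self, block, data):
        self.production_lun[block] = data     # normal write path, unchanged
        self.appliance_lun[block] = data      # the split (copied) write

    def read(self, block):
        # Reads never involve the appliance, which stays out of the data path.
        return self.production_lun[block]


production, replica = {}, {}
agent = SplitWriteAgentSketch(production, replica)
agent.write(7, b"order-1001")
agent.write(8, b"order-1002")
```

After the two writes, `replica` holds the same blocks as `production`, yet the server's primary path to its storage array was never rerouted.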
There are two ways to implement the Split Write feature: at the host or at the fabric layer. Loading the agent on the host may affect the host in two ways. From an implementation perspective, it may require a reboot of the server to load the driver that copies the write I/Os or to discover the LUNs presented by the replication appliance. And copying the write I/Os may affect application performance, although vendors universally claim these agents affect only applications that have high write I/O transaction rates.
For servers without FC connections, Topio's TDPS offers configurations that send the Split Write over a TCP/IP connection. TDPS copies each write and stores it in cache or on disk before transmitting it over IP. If IP connectivity is lost, the writes are stored until the link is restored. This option gives servers that have only internal disk or SCSI-attached external disk replication capabilities similar to those available to servers with FC-attached disk.
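The store-until-the-link-returns behavior is a classic store-and-forward queue. The Python sketch below illustrates the idea only; it is not TDPS code. The class and method names are hypothetical, the remote appliance is modeled as a plain list, and link state is set by the caller rather than detected.

```python
from collections import deque


class StoreAndForward:
    """Toy model of an agent that buffers copied writes while the IP link is down."""

    def __init__(self):
        self.pending = deque()  # copied writes waiting to be transmitted
        self.remote = []        # stands in for the remote replication appliance
        self.link_up = True

    def submit(self, write):
        self.pending.append(write)
        self._drain()

    def set_link(self, up):
        self.link_up = up
        self._drain()

    def _drain(self):
        # Transmit in arrival order so the remote copy sees writes in sequence.
        while self.link_up and self.pending:
            self.remote.append(self.pending.popleft())


queue = StoreAndForward()
queue.submit("w1")
queue.set_link(False)        # IP connectivity is lost
queue.submit("w2")
queue.submit("w3")
held = list(queue.pending)   # writes held locally during the outage
queue.set_link(True)         # link restored; the backlog is sent in order
```

Preserving write order matters here: replaying `w3` before `w2` at the remote site could leave the replica in a state the application never produced.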
Ease of implementation and performance impact will be the two key factors when choosing between a TCP/IP and an FC connection to send the Split Writes. Using a TCP/IP connection is easier to implement, but it generates additional CPU and memory overhead on the host and more traffic on the network, so it should be used only for lower performance applications. Conversely, sending the Split Writes through the server's FC HBA can take more time to set up, but it offloads the I/O processing to the HBA, minimizing application impact. But re-introducing a host-based agent brings back the same issues as host-based replication products from NSI Software Inc., Softek Storage Solutions Corp. and Symantec Corp.: agents aren't available for some host operating systems and, in other cases, implementation may be disruptive to hosts.
To address these shortcomings, replication appliances now interact with services that run at the fabric level on FC director blades. Both Kashya's KBX5000 and Xiotech's TimeScale replication appliances work with Cisco Systems Inc.'s SANTap Service that runs as a feature on Cisco's Storage Services Module (SSM) line cards. By sending SCSI-FCP commands over the FC interface from the replication appliance to the SANTap Service on Cisco's SSM, these implementations can capture and copy all reliable write I/Os to the replication appliance. Cisco is currently the only vendor to support this type of feature. But replication appliance vendors say they're working with Brocade Communications Systems Inc. and McData, who plan to offer similar functionality on their FC directors.
The main advantage of this approach is that write I/Os may be captured without virtualizing a portion of the storage environment or deploying host agents. But it also has its downsides. Configuring specific point-in-time recoverable images for applications is more difficult because without an agent on the host there's no certain way to stop and start I/Os to ensure the creation of a recoverable image. And some replication appliance vendors are still not entirely clear on how to isolate write I/Os from specific servers or applications, so the practical benefits of this feature may be minimal in the near term. Still, this approach seems to have the inside track to become the preferred method to implement data replication on a wider scale in large, heterogeneous networked storage environments.
As storage networks grow, the types of applications they host necessitate different types of recovery scenarios. Because any application, server, network, storage array or site can fail, replication appliances need to offer different replication options. Although replication appliances are deployed to recover from large disasters like site failures, they also include options that permit local replication to recover from local system failures.
Topio's TDPS out-of-band appliance typifies the approach enterprises would take for disaster recovery or to centralize all backups to one site. Topio's architecture calls for a replication appliance at an offsite location to function as both a management workstation and a central target to which all of the agents send their data over IP in asynchronous mode. The snapshot option can also be configured so that the server agent holds the copied writes on the server until a consistent, recoverable image is constructed on the central management server. This option allows the regular production write I/Os to continue without impacting the application.
In-band approaches such as FalconStor's IPStor or McData's UltraNet Replication Appliance allow synchronous replication between two different storage arrays. This option lets users keep a real-time copy of the data on two arrays in the same location; if one array fails or needs to go offline for maintenance, the server can continue its processing using data on the other array. Alternatively, these appliances include snapshot functionality that breaks off copies of the data periodically--in the event of data corruption, the application can recover from a specified point in time.
But there are limitations to both in-band and out-of-band approaches to replication. Appliances from most vendors are limited in the number of appliances they can replicate across and have only limited ways of synchronizing the data if multiple replication appliances exist. Conversely, products like TDPS, which synchronize the data from multiple incoming replication data streams, fail to adequately address local replication needs and are not well suited for high write I/O applications. Even the less-intrusive, fabric-based techniques are currently limited to Cisco SAN director deployments, and there's little practical knowledge at this point about the impact they have on write I/Os or even how the SANTap feature fully works.
Replication appliances continue to gain new features and functions as vendors adapt to myriad user environments. Yet most replication appliances remain limited to handling either a small number of high-performing apps or a large number of lightly performing apps, while fabric-based replication features continue to evolve. As a result, enterprises should resist the urge to deploy any replication appliance on an enterprise-wide basis, and install them only as application demands dictate.