By Jeff Boles, Contributor
Remote replication used to be just a one-to-one volume replication technology for offsite disaster recovery. But the technology
has matured and can now be used throughout the data center for myriad tasks, including continuous data protection (CDP) and host failover support. In this podcast, Jeff Boles, senior analyst and director, validation services at Hopkinton, Mass.-based Taneja Group, describes remote replication's role in enterprise data storage, the many uses of the technology and available replication products.
Download for later:
Remote replication podcast
Table of contents:
>>What is remote replication and how is it used?
>>Where does remote replication fit in the enterprise data storage market?
>>How to match storage needs to available remote replication products
Replication is a great technology. In my opinion, it's one of the more interesting areas of storage technology. Over the past several years, we've seen an enormous number of solutions come into both the disaster recovery (DR) and data protection markets with replication technology at their foundation. Those solutions give end users a whole new range of capabilities compared with what replication could do a decade or more ago.
Replication started out as a one-to-one, full-copy technology: take this volume and copy it here or there, maybe at the block level, maybe at the file level, but the full volume every time, in a one-to-one relationship. Because full copies are expensive to move, some vendors began optimizing replication so it could run asynchronously, making better use of bandwidth and tolerating intermittent connections. Replication also came to market as a really affordable host-based technology from vendors like Double-Take [Software Inc.] and SteelEye Technology Inc., and that has become the basis for host failover systems and host-based clustering.
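To make the asynchronous idea concrete, here is a minimal, hypothetical Python sketch (the class and method names are invented for illustration, not taken from any product): writes are acknowledged against the primary immediately and shipped to the replica by a background worker, which is what lets asynchronous replication smooth over bandwidth limits and intermittent links.

```python
import queue
import threading

class AsyncReplicator:
    """Toy model of asynchronous volume replication."""

    def __init__(self):
        self.primary = {}            # block number -> data
        self.replica = {}
        self._log = queue.Queue()    # changes pending for the remote site
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, block, data):
        """Acknowledge the write locally, then queue it for the replica."""
        self.primary[block] = data
        self._log.put((block, data))

    def _drain(self):
        # A real product would batch, compress, and retry over the WAN;
        # here the worker just applies each change to the replica in order.
        while True:
            block, data = self._log.get()
            self.replica[block] = data
            self._log.task_done()

    def wait_in_sync(self):
        """Block until the replica has caught up (for the demo only)."""
        self._log.join()

rep = AsyncReplicator()
rep.write(0, b"boot sector")
rep.write(7, b"user data")
rep.wait_in_sync()
print(rep.replica == rep.primary)  # True once the change log has drained
```

The point of the sketch is the decoupling: the host sees its write complete as soon as the change is journaled, and the replica lag (the queue depth) is the trade-off an administrator tunes against bandwidth.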
In the past few years, we've seen replication really come into its own, and you can have it in all of these flavors. But you can also get replication systems or solutions that can replicate multiple volumes in a site, replicate heterogeneous systems, replicate full volumes as well as snapshots, and even replicate any point-in-time data, so your replicated volume can serve as a big collection of all of your data changes, and you can select any point in time at which to recover and serve up a volume. In fact, in my view, one of the biggest changes in this market has been the enormous amount of innovation around CDP and the application of continuous data change capture to these replication layers. So today we see a range of patterns in the data center, with some flavor of these broad replication technologies found in 95% of midsized and larger enterprises. That might be local replication; it might be replication behind a geocluster, a cluster scattered across two sites, typically for DR; and it might be complex inter- and intra-site replication topologies with all kinds of systems and volumes involved.
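The any-point-in-time idea is easiest to see in a toy model. Below is a hypothetical Python sketch (names invented for illustration) of the journal a CDP-style replication layer maintains: every change is recorded as it happens, and any past image of the volume can be rebuilt by replaying the journal up to the chosen point.

```python
class CDPJournal:
    """Toy model of a continuous data protection change journal."""

    def __init__(self):
        self._journal = []   # ordered list of (sequence, block, data)

    def record(self, seq, block, data):
        """Capture one write; seq stands in for a timestamp."""
        self._journal.append((seq, block, data))

    def volume_at(self, seq):
        """Replay the journal up to seq to rebuild that point-in-time image."""
        image = {}
        for s, block, data in self._journal:
            if s <= seq:
                image[block] = data
        return image

j = CDPJournal()
j.record(1, "A", "v1")
j.record(2, "B", "v1")
j.record(3, "A", "v2")   # block A is overwritten at sequence 3

print(j.volume_at(2))  # {'A': 'v1', 'B': 'v1'} -- before the overwrite
print(j.volume_at(3))  # {'A': 'v2', 'B': 'v1'}
```

Real CDP engines bound the journal's size and mark application-consistent points in it, but the recovery principle is the same: the replica is not one copy, it is every copy.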
The typical enterprise customer today really needs to reassess replication in general, and take a big-picture look at what requirements they see replication able to fulfill in their enterprise today. Replication has the ability to be a broad framework within which you perform all of your near-line data protection as well as DR protection. You can use replication management frameworks to orchestrate snapshots, and then retain and move these snapshots in a policy-based lifecycle. Or you can cost-effectively obtain replication as a single point solution for protecting a single host. That's a tremendous range, so make sure you understand what you're trying to accomplish and plan for the long term when you start making investments.
The first step is thinking about what you're trying to accomplish. Then you need to get an understanding of the capabilities that are available to you in the marketplace. I think the market breaks down into three categories of solutions fairly neatly, even though there's a pretty broad range of capabilities in each of these categories.
My first category is array-based solutions. Today, almost every storage array vendor has some array-centric replication technology to offer. For SAN or NAS customers, this is often deployed to meet basic DR or cross-array, near-line data protection requirements. Using such technologies, organizations can replicate full volumes or snapshots locally or remotely, and keep them online for nearly instant access in case of a disaster or failure. Some organizations also use such technologies to create replicas from which they perform data protection, so that backup jobs don't interfere with active hosts. This is especially important with virtual servers. Examples of things I put in this category are EMC's Symmetrix Remote Data Facility, Hitachi Data Systems' TrueCopy, IBM's Peer-to-Peer Remote Copy and NetApp's SnapMirror. Midrange arrays have a whole other set of technologies usually associated with them.
The second category is host-based replication or application-based replication. Lots of organizations have host-based replication today for specialized protection needs, such as Web server and application server failover and geoclustering. Host-based replication can give you pretty granular host clustering capabilities and automate some things, like the startup of a passive server and the assumption of the primary server's identity. There are lots of capabilities here, but if your protection needs are broad and you need to protect a whole bunch of systems, host-based replication may become very hard to manage. Running many of these from one site to another may [also] bring you bandwidth management headaches. Examples in this category include Double-Take -- the classic example, with broad recognition. There's also open-source rsync, which can do some version of this -- not in an active-active way -- but it can help you move and replicate data from one site to another. NEC [Corp.] has ExpressCluster, and there's SteelEye's LifeKeeper.
There's another layer above host-based, and that's the application layer, and you find unique technologies there, too, like technologies specially integrated for SQL Server, Oracle and Exchange. Third-party solutions are integrated there as well, like things from Neverfail, Mimosa's NearPoint and more.
The third category is site-wide technologies. This is where replication technology really gets cool and cutting edge. We've seen a lot of innovation here. There are a number of [remote replication] solutions on the market today that can aggregate the data from multiple systems, optimize that data, and replicate it to other storage volumes or sites. You can buy a single replication solution that can handle heterogeneity across hosts and storage, and give you tremendous anywhere-to-anywhere capabilities. Moreover, a lot of these solutions have enabled these capabilities with an underlying continuous data protection technology. This means these systems can capture every change as it happens and replicate a storage volume in any way you want: they can create an any-point-in-time copy at specific consistency points, as snapshots, whole volumes or more. These technologies today include things like InMage Systems Inc.'s DR-Scout and EMC's RecoverPoint.
And then separately, but not to be forgotten, nearly all of the storage virtualization solutions also bring this capability to market: products like FalconStor, IBM SVC and DataCore frequently have these replication capabilities built in at a CDP-like level. But not every technology is the same. Some site-wide technologies can also just help you orchestrate replication, which is important and shouldn't be forgotten. [Those products include] Hitachi's Replication Manager and EMC's Replication Manager. Orchestration is especially important when you move into more complex applications that might have several volumes that need to be consistently replicated from one place to another, such as a big database or an ERP suite. That's when you'd think about an orchestration layer.
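To illustrate why that orchestration matters, here is a hypothetical Python sketch (names invented for illustration) of a consistency-group snapshot: every volume in the group is quiesced before any of them is snapshotted, so a database's data and log volumes are captured at the same instant rather than a few writes apart.

```python
class Volume:
    """Toy model of one replicated volume."""

    def __init__(self, name):
        self.name = name
        self.frozen = False
        self.data = {}

    def freeze(self):
        # Pause new writes; real arrays do this in firmware or via
        # an agent that quiesces the application first.
        self.frozen = True

    def snapshot(self):
        assert self.frozen, "snapshot taken outside a consistency group"
        return dict(self.data)

    def thaw(self):
        self.frozen = False

def consistency_group_snapshot(volumes):
    """Freeze every volume, snapshot them all, then resume I/O."""
    for v in volumes:
        v.freeze()
    try:
        return {v.name: v.snapshot() for v in volumes}
    finally:
        for v in volumes:
            v.thaw()

db = Volume("db-data"); db.data = {"row": 42}
log = Volume("db-log"); log.data = {"lsn": 42}
snaps = consistency_group_snapshot([db, log])
print(sorted(snaps))  # ['db-data', 'db-log'] -- captured together
```

Without the freeze-all-then-snapshot ordering, the log replica could record a transaction the data replica never saw; the orchestration layer exists to enforce exactly that ordering across arrays and hosts.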
This was first published in September 2009