What are the benefits and drawbacks of using host-based and array-based data replication techniques?
Host-based replication typically operates at a much more granular level than array-based replication products, but array-based replication is typically easier to manage and does not consume host processing power.
Host-based replication is software that is installed as an agent or driver on a host operating system (OS) or within a virtual machine (VM). Since the replication software is installed locally on the host or machine image, it gains the advantage of observing the active changes to data inside the host or VM. This means replication tasks can be customized to the type of applications running within that host, and the software can also perform additional data safety functions, such as periodically pausing (quiescing) a database so that a consistent copy is captured.
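To make the idea of an application-aware agent concrete, here is a minimal Python sketch. It is not how any particular replication product works. The `quiesced` helper, the `replicate_changed_files` function and the pause/resume callbacks are hypothetical names invented for illustration, and a real agent would typically intercept writes through a driver rather than scan file timestamps.

```python
import os
import shutil
from contextlib import contextmanager


@contextmanager
def quiesced(pause, resume):
    """Pause the application while copying, and always resume it afterward.

    `pause` and `resume` are hypothetical callbacks standing in for whatever
    the application offers (for example, a database flush-and-hold command).
    """
    pause()
    try:
        yield
    finally:
        resume()


def replicate_changed_files(src_dir, dst_dir, since):
    """Copy files under src_dir modified after `since` (epoch seconds) to dst_dir.

    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) > since:
                rel = os.path.relpath(src, src_dir)
                dst = os.path.join(dst_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 preserves timestamps
                copied.append(rel)
    return sorted(copied)
```

Because the agent runs inside the host, it can choose exactly which directories to copy and can wrap the copy in `with quiesced(pause_db, resume_db):`. An array has no comparable view into the application.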
It also has the advantage of being hardware-independent and, as a result, can replicate data regardless of the type or combination of arrays in use. This is especially helpful in keeping disaster recovery site costs to a minimum, since data can be replicated from an expensive tier-one array at the primary data center to an inexpensive tier-two or tier-three array at the DR site.
The downside to host-based replication is that there must be specific support for the OS and potentially for the underlying application. In other words, the replication software can't be installed universally. This means you might not only have to deploy a separate replication product for each operating system, but also manage each product separately.
If only a few hosts or VMs need replication, this is a relatively straightforward process to manage. But if dozens of hosts or VMs need to be replicated, the management and daily monitoring of replication processes could be overwhelming.
Array-based replication has the advantage of being universal, at least to the data on that array. It's simply replicating bits of data as they change from one array to another, so any host that connects to the array can be replicated regardless of the OS that it's running. It also doesn't consume any of the host resources, so they can remain dedicated to application performance.
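The block-level view an array takes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a vendor's algorithm: the function names and the tiny 4-byte block size are made up for the example, and real arrays usually track writes as they happen rather than comparing two full copies of a volume.

```python
BLOCK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for every block that differs."""
    if len(old) != len(new):
        raise ValueError("volumes must be the same size")
    delta = {}
    for offset in range(0, len(new), block_size):
        block = new[offset:offset + block_size]
        if block != old[offset:offset + block_size]:
            delta[offset // block_size] = block
    return delta


def apply_blocks(target, delta, block_size=BLOCK_SIZE):
    """Write the changed blocks into the replica volume (a bytearray)."""
    for index, block in delta.items():
        start = index * block_size
        target[start:start + len(block)] = block


before = b"AAAABBBBCCCC"      # primary volume at the last sync
after = b"AAAAXXXXCCCC"       # primary volume now: only block 1 changed
replica = bytearray(before)   # replica volume on the second array

delta = changed_blocks(before, after)
apply_blocks(replica, delta)
print(delta)                    # {1: b'XXXX'}
print(bytes(replica) == after)  # True
```

Nothing in the sketch knows which file system or OS wrote the bytes, which is why this approach works for any host attached to the array.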
The downside to an array-based data replication technique is that it does not have the granularity that host-based replication does. Typically, the entire volume or logical unit number (LUN) needs to be replicated, rather than a specific data set on that LUN. This is especially problematic in a virtualized environment, where multiple VMs can reside on a single LUN.
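A rough way to see the cost of that coarser granularity is to compare how much changed data each approach would send. The Python sketch below uses invented VM names and change sizes purely for illustration, and it assumes that host-based replication protects only the VMs flagged as needing it while array-based replication sends every change on the shared LUN.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    changed_gb: float  # data changed since the last replication cycle
    protected: bool    # True if this VM actually needs DR protection


def array_based_transfer_gb(vms):
    """LUN-level replication sends every change on the LUN."""
    return sum(vm.changed_gb for vm in vms)


def host_based_transfer_gb(vms):
    """Host-based replication sends changes only from protected VMs."""
    return sum(vm.changed_gb for vm in vms if vm.protected)


# Hypothetical VMs sharing one LUN; the figures are made up.
lun = [
    VM("db01", 40.0, True),
    VM("web01", 5.0, False),
    VM("test01", 25.0, False),
]

print(array_based_transfer_gb(lun))  # 70.0
print(host_based_transfer_gb(lun))   # 40.0
```

In this made-up case, only the database VM matters for recovery, yet LUN-level replication still carries the web and test VMs' changes across the link.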
Another drawback is that most array-based replication utilities can only replicate to a like array from the same manufacturer. This may be more expensive than the host-based alternative, which can replicate to less-expensive arrays.
About the expert:
George Crump is a longtime contributor to TechTarget, as well as president and founder of Storage Switzerland LLC, an IT analyst firm focused on the storage and virtualization segments.