This article can also be found in the Premium Editorial Download "Storage magazine: Storage Products of the Year 2010."
Array-based replication's greatest shortcoming is its requirement for similar source and target arrays, which limits its use to homogeneous storage environments. Most storage vendors don't even support replication between their own array families. Among major storage vendors, NetApp is the lone exception, supporting array-based replication between any of its arrays. Another noteworthy vendor is Hitachi Data Systems, whose Virtual Storage Platform (VSP) and Universal Storage Platform (USP) are able to reach out to other arrays via storage virtualization. And with very few exceptions, such as Dell EqualLogic arrays, replication is an extra-cost option that's priced per device or by replicated capacity.
Block-based FC and iSCSI arrays replicate block changes on volumes and LUNs. Since only changed blocks of a few hundred bytes need to be replicated, block-based replication is very fast and efficient. Executed beneath the file system, it is operating system agnostic and supports replication between any platforms attached to the array. Block-based replication has the potential to take advantage of advanced array features such as deduplication, compression and encryption, and some vendors have enhanced their replication offerings accordingly. For instance, NetApp, with the 8.0.1 release of Data ONTAP, added the ability to replicate only data changes in FlexClone volumes between parent and clone images. A FlexClone volume is a thin-provisioned clone, requiring very little actual disk space; but until this latest release, the complete volume had to be replicated instead of the disk-efficient FlexClone.
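The changed-block idea can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the Volume class, the 4 KB block size and the replicate_changes function are hypothetical names chosen for illustration, and the "arrays" are simply in-memory lists.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


class Volume:
    """Toy in-memory volume that remembers which blocks changed since the last sync."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.dirty = set()  # block numbers written since the last replication pass

    def write(self, block_no, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block long")
        self.blocks[block_no] = data
        self.dirty.add(block_no)


def replicate_changes(source, target):
    """Copy only the dirty blocks to the target and return how many were sent."""
    sent = 0
    for block_no in sorted(source.dirty):
        target.blocks[block_no] = source.blocks[block_no]
        sent += 1
    source.dirty.clear()  # the next pass starts from a clean slate
    return sent
```

Because the sketch works purely on block numbers, it never needs to know which file system or operating system wrote the data, which is the property the paragraph above describes.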
NAS systems usually replicate at the file-system level. This gives the replication process access to file system metadata and enables replication based on criteria such as file size and file type. But it's slower and usually less efficient than block-based replication. The performance impact increases with the number of files and folders in a replication set, because the larger the tree, the longer it takes to parse. For that reason, BlueArc introduced the object-based JetMirror technology, replacing time-consuming sequential file parsing with an object-based metadata store. "Backups with JetMirror are 2.8 times faster than with NDMP [Network Data Management Protocol] and replication times for very large file stores can be reduced by an order of magnitude," BlueArc's Chalaka claimed.
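To make the trade-off concrete, here is a hedged Python sketch of criteria-based file selection. The select_files function and its max_bytes and allowed_exts parameters are hypothetical, not part of any NAS product; the point is that every file and folder under the root has to be visited before the filter can be applied.

```python
import os


def select_files(root, max_bytes, allowed_exts):
    """Walk the tree under root and return relative paths that pass the size and type filters."""
    selected = []
    # os.walk visits every directory, so the cost grows with the size of the tree.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            if ext in allowed_exts and os.path.getsize(path) <= max_bytes:
                selected.append(os.path.relpath(path, root))
    return sorted(selected)
```

A call such as select_files("/export/projects", 10 * 1024 * 1024, {".doc", ".pdf"}) would pick only small documents for replication; the path here is an example, not a reference to a real system.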
KEY CRITERIA FOR SELECTING A REPLICATION PRODUCT
Network-based replication usually comes into play in heterogeneous storage environments. It works with any vendor's array and supports any host platform. The splitting of I/Os is performed in the network, between hosts and arrays, in either an inline appliance or a Fibre Channel fabric. The I/O splitter looks at the destination address of an incoming write I/O and, if it's part of a replication volume, forwards a copy of the I/O to the replication target. In many ways, network-based replication combines the merits of array- and host-based replication. Having only arrived on the market several years ago, it has the smallest market share, trailing both array-based and host-based replication in revenue and numbers, but it's growing at a quicker rate than array-based replication, according to IDC.
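The splitter's decision logic can be sketched in a few lines of Python. This is a simplified illustration, not how any appliance or switch is actually programmed: the IOSplitter class, the volume IDs and the use of plain lists as stand-ins for the primary array and the replication target are all assumptions.

```python
class IOSplitter:
    """Toy write splitter: every write goes to primary storage, and writes to
    replicated volumes are also copied to the replication target."""

    def __init__(self, replicated_volumes, primary, replica):
        self.replicated_volumes = set(replicated_volumes)
        self.primary = primary  # list standing in for the primary array
        self.replica = replica  # list standing in for the replication target

    def handle_write(self, volume_id, offset, data):
        """Forward the write and return True if a copy was also sent to the replica."""
        write = (volume_id, offset, data)
        self.primary.append(write)
        # Inspect the destination: only replicated volumes are split.
        if volume_id in self.replicated_volumes:
            self.replica.append(write)
            return True
        return False
```

Writes addressed to volumes outside the replication set pass straight through to the primary, which is why the splitter is indifferent to the array brand or the host platform behind it.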
Compared to the multitude of array- and host-based replication offerings, there are fewer network-based replication products on the market, and they can be broken into two groups: inline appliances and fabric-based replication products.
Inline appliances, such as the IBM SVC, don't depend on intelligent switches from Brocade Communications Systems Inc. or Cisco Systems Inc. for splitting I/Os; instead, I/Os are terminated and forwarded in the appliance to storage targets. Unlike the wire-speed splitting of fabric-based products, the overhead of terminating and initiating new I/Os causes a small delay. Fabric-based products use a split-path architecture in which data that isn't part of a replication or virtualized volume is simply passed through, whereas in an inline appliance all traffic has to traverse the replication appliance. As a result, inline appliances are more likely to hit a scalability threshold than their fabric-based counterparts. "A variety of hardware options, including cache and number and speed of processors, have enabled IBM to address scalability and performance concerns for the most part," said Greg Schulz, founder and senior analyst at Stillwater, Minn.-based StorageIO Group.
While fabric-based replication products may be technologically superior, with better performance and scalability, they're significantly more complex and require intelligent switches. To use them in environments that don't have intelligent switches, fabric-based replication products usually provide host agents that perform the splitting of I/Os on hosts instead of in the fabric. EMC RecoverPoint, with its continuous data protection (CDP) and remote replication capabilities, is the most prominent fabric-based replication product.
HP StorageWorks SVSP and LSI StoreAge SVM -- the former being an OEM product of the latter -- combine the simplicity of an inline appliance with the performance and scalability of a fabric-based product. The products use a split-path approach where management is handled in-band; however, data movement and normal data flow occur out of band, leading to improved scaling and performance.
Other network-based replication players are FalconStor Software Inc. with its MicroScan and Delta Resync replication features, and InMage Systems Inc.
This was first published in February 2011