WAN mirroring and replication

This overview of WAN mirroring and replication covers synchronous and asynchronous approaches, reliability issues, rolling disasters, and security and cost concerns.

One of the easiest ways to protect vital data is to mirror (or copy) the data to a second location. If one copy fails, the second copy can be used. In many cases, data is mirrored locally in a storage array using a RAID 1 disk group, but this basic replication method does not offer any protection against disasters that affect the whole site.

Data can also be mirrored to one or more remote locations across a wide area network (WAN), such as the Internet. Remote replication helps to ensure business continuity by placing company data in secure locations outside of potential danger zones. Mirroring can also allow offices in a particular region to access company data that has been replicated to a local site. This enhances storage performance by reducing the number of routers and other intervening networking equipment between a remote office and a data center possibly located in another region.

Chapter 1 of this WAN guide highlights general WAN concerns. Now, let's look at the major considerations involved in remote mirroring and replication. (Editor's note: For this article, the terms mirroring and replication are used interchangeably.)

Synchronous vs. asynchronous

Data can be moved between sites synchronously or asynchronously. Both approaches allow data to be moved safely outside of danger zones while still supporting shorter recovery point objectives (RPO) and recovery time objectives (RTO). Replicated data is an exact duplicate of the original file setup -- data is not restrained by backup formats -- so it's possible to recover individual files, folders or entire systems from the remote copies. Still, it's important to maintain security so that replicated data is available only to authorized users.

Synchronous replication makes the shortest RTOs and RPOs possible by holding the local write acknowledgement until the remote write acknowledgement is received. This ensures that the remote copy is written properly in real time, so the RPO remains essentially zero and the RTO is typically on the order of minutes. However, synchronous replication performance is tied to latency between sites, because every write must wait for a round trip to the remote site. Since latency is related to distance and the networking hardware between sites, synchronous replication is normally limited to distances of less than about 100 miles.
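To see why distance matters, the round-trip delay can be estimated with simple arithmetic. The Python sketch below assumes signals travel through fiber at roughly 200 km per millisecond (about two-thirds the speed of light) and that each write waits for one full round trip. The function names, the 0.5 ms local write time and the equipment delay parameter are illustrative assumptions, not figures from any vendor.

```python
FIBER_KM_PER_MS = 200.0   # assumption: light in fiber covers ~200 km per millisecond
KM_PER_MILE = 1.609344


def round_trip_ms(distance_miles, equipment_ms=0.0):
    """Estimate the round-trip delay between two sites in milliseconds.

    equipment_ms is a hypothetical allowance for routers, switches and
    protocol overhead along the path.
    """
    one_way_ms = distance_miles * KM_PER_MILE / FIBER_KM_PER_MS
    return 2 * one_way_ms + equipment_ms


def max_serial_writes_per_sec(distance_miles, local_write_ms=0.5, equipment_ms=0.0):
    """Upper bound on back-to-back synchronous writes for a single stream.

    Each write costs the local write time plus one full round trip,
    because the local acknowledgement is held until the remote one arrives.
    """
    per_write_ms = local_write_ms + round_trip_ms(distance_miles, equipment_ms)
    return 1000.0 / per_write_ms


if __name__ == "__main__":
    for miles in (10, 100, 1000):
        print(miles, "miles:",
              round(round_trip_ms(miles), 3), "ms round trip,",
              round(max_serial_writes_per_sec(miles)), "serial writes/sec at most")
```

Under these assumptions, 100 miles adds roughly 1.6 ms to every write before any equipment delay is counted, and 1,000 miles adds about 16 ms, which is why long-distance links are usually served asynchronously.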

Asynchronous replication uses a "store and forward" approach where local writes are acknowledged before the remote write acknowledgements arrive. Consequently, remote writes often fall behind local writes, which can push out an RPO to minutes or even hours. But since asynchronous replication is not time sensitive, it works for remote sites at virtually any distance.
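The store-and-forward behavior can be modeled with a simple queue. The sketch below is a toy model, not any product's implementation: the class name AsyncReplicator and its methods are hypothetical, and timestamps are passed in explicitly so the result is easy to follow. The age of the oldest unsent write approximates how much data would be lost if the primary site failed at that moment.

```python
from collections import deque


class AsyncReplicator:
    """Toy store-and-forward replicator (hypothetical names throughout)."""

    def __init__(self):
        self.pending = deque()   # writes acknowledged locally but not yet sent
        self.remote = []         # writes that have reached the remote site

    def write(self, data, now):
        """Queue the write and acknowledge at once, without waiting for the remote site."""
        self.pending.append((now, data))
        return "ack"

    def forward(self, max_writes):
        """Send up to max_writes queued writes to the remote site, oldest first."""
        sent = 0
        while self.pending and sent < max_writes:
            _, data = self.pending.popleft()
            self.remote.append(data)
            sent += 1
        return sent

    def rpo_exposure(self, now):
        """Seconds of data at risk: the age of the oldest write not yet replicated."""
        if not self.pending:
            return 0.0
        return now - self.pending[0][0]


if __name__ == "__main__":
    rep = AsyncReplicator()
    rep.write("block-1", now=0.0)
    rep.write("block-2", now=10.0)
    rep.forward(max_writes=1)            # the link only had capacity for one write
    print(rep.rpo_exposure(now=30.0))    # block-2 has been waiting since t=10
```

When the link cannot drain the queue as fast as writes arrive, the exposure grows, which is how an RPO drifts from seconds to minutes or hours.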

Remote replication reliability concerns

When it comes to remote replication tasks, reliability is a real concern. No WAN link is 100% reliable, so network administrators must account for disruptions. Obviously, a complete loss of WAN service will shut down the replication process completely. But mere service degradation can also interfere with replication plans, and even a slight loss of service quality can grind the performance of synchronously replicated applications to a halt.

Although asynchronous replication can tolerate some degradation in connectivity, the effects will still inflate an RTO and RPO, possibly to unacceptable levels. Users suggest regularly monitoring WAN quality and immediately taking steps if line performance drops below safe Quality of Service (QoS) levels.
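A monitoring check of this kind can be as simple as comparing measured link metrics against agreed limits. In the sketch below, the metric names and threshold values are hypothetical placeholders; real limits depend on the replication product and the service agreement with the WAN provider.

```python
# Hypothetical thresholds; real limits come from the replication vendor and the WAN SLA.
THRESHOLDS = {"latency_ms": 20.0, "packet_loss_pct": 0.5, "jitter_ms": 5.0}


def qos_violations(sample, thresholds=THRESHOLDS):
    """Return the sorted names of metrics in `sample` that exceed their threshold.

    Metrics missing from the sample are treated as 0.0 (no violation).
    """
    return sorted(
        name for name, limit in thresholds.items()
        if sample.get(name, 0.0) > limit
    )


if __name__ == "__main__":
    measured = {"latency_ms": 35.0, "packet_loss_pct": 0.1, "jitter_ms": 6.0}
    problems = qos_violations(measured)
    if problems:
        print("WAN link below safe QoS levels:", ", ".join(problems))
```

How the samples are collected (ping probes, router counters or a provider portal) is left open here; the point is only that the comparison should run regularly and raise an alert as soon as any limit is crossed.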

If a mirror site falls out of synchronization, it will need to be resynchronized. This process can demand a considerable amount of bandwidth. Savvy administrators will reserve enough extra bandwidth to handle resynchronization. Another option is to negotiate an agreement with the WAN provider to provide extra bandwidth on demand.
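A rough estimate of resynchronization time helps when deciding how much bandwidth to reserve. The sketch below assumes decimal units (1 GB = 8,000 megabits), ignores protocol overhead, compression and any new changes that arrive during the resync, and uses a hypothetical function name and example figures.

```python
def resync_hours(data_gb, link_mbps, reserved_fraction):
    """Estimate the hours needed to resynchronize `data_gb` gigabytes.

    link_mbps         -- total WAN link speed in megabits per second
    reserved_fraction -- share of the link set aside for resync (0 < f <= 1)
    Assumes 1 GB = 8,000 megabits and no protocol overhead.
    """
    if not 0 < reserved_fraction <= 1:
        raise ValueError("reserved_fraction must be in (0, 1]")
    megabits = data_gb * 8 * 1000
    seconds = megabits / (link_mbps * reserved_fraction)
    return seconds / 3600


if __name__ == "__main__":
    # Example figures only: 500 GB out of sync, 100 Mbps link, half of it reserved.
    print(round(resync_hours(500, 100, 0.5), 1), "hours")
```

With these example figures the resync takes a little over 22 hours, which shows how quickly a modest reservation turns into a day-long exposure and why on-demand bandwidth from the provider can be attractive.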

Rolling disasters

While replication can protect data against accidental loss, it will not prevent rolling disasters where faults or data corruption take place over a period of time. For example, if the data center is infected with a virus that is deleting files, the virus will eventually appear at the replication site. Once that happens, the files that were changed or deleted at the data center will be changed or deleted at the replication site as well. In other words, the damage rolls through to the replication site(s).

Similarly, if a user accidentally deletes a file, that file will eventually be deleted from the replication site. Some replication software uses transaction time stamps to support recovery points, but this feature is suitable only for asynchronous replication, because changes are made immediately in synchronous replication. Consequently, most data centers implement another, more traditional form of backup or other data protection to guard against damaged or deleted data.
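The time-stamp approach mentioned above can be pictured as replaying a journal of changes only up to a chosen moment. The sketch below is a simplified illustration, not the method of any particular replication product: the journal format (timestamp, operation, file name, content) and the function name restore_to are assumptions made for the example.

```python
def restore_to(journal, recovery_point):
    """Rebuild the file set as it stood at `recovery_point`.

    journal is a list of (timestamp, op, name, content) tuples, where op is
    "write" or "delete". Entries stamped later than recovery_point are
    ignored, which is how an accidental or malicious deletion is rolled back.
    """
    files = {}
    for timestamp, op, name, content in sorted(journal, key=lambda entry: entry[0]):
        if timestamp > recovery_point:
            break
        if op == "write":
            files[name] = content
        elif op == "delete":
            files.pop(name, None)
    return files


if __name__ == "__main__":
    journal = [
        (1, "write", "orders.db", "v1"),
        (2, "write", "report.doc", "v1"),
        (3, "delete", "orders.db", None),   # the unwanted deletion
    ]
    print(restore_to(journal, recovery_point=2))   # state just before the deletion
```

Restoring to a point just before the bad change recovers the file, but only if the journal reaches back far enough, which is one reason a separate backup is still kept.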

Security and cost concerns

Security is always a concern when valuable corporate data is placed in a remote location. In most cases, remote replication sites are little more than unmanned bunkers where service personnel can be hours away. Accordingly, precautions must be taken to limit access to authorized personnel and to guard servers and storage systems against theft. Remote storage providers typically handle issues of physical security and access.

Remote replication also carries costs, such as leasing or acquiring the physical site, along with facilities costs, such as power, backup power, cooling and security. Storage costs can also be significant, since storage systems often duplicate the equipment used in the data center (e.g., Symmetrix-to-Symmetrix). Replication also requires software to manage the mirroring process, and such software must be upgraded and maintained over time through service agreements. Installation, maintenance and repairs must also be handled, often with personnel who may incur travel expenses to reach the remote site. An organization must weigh all the costs before making any replication choice.

