Replication is the copying of data from one system to another across a network. In high availability circles, data is most often replicated from a primary site to a secondary site so that if something happens to the primary, the data remains safe and available for almost immediate use at the secondary.
Of course, backups serve the same function. Properly administered and distributed backups ensure that data is available at the secondary site as well. The difference is that with replication, data is copied on an ongoing basis, while backup data is copied to tape only occasionally, and in many enterprises, sent off site even less often than that.
When data is protected via offsite backups only, and a calamity occurs that causes the primary site to become unavailable, any data that was written to disk after the backup completed is lost and probably irretrievable. When that same data is replicated to the secondary site, it can be available there mere minutes or even seconds after it was written to the primary site's disks. How out of date the replicated data is on the secondary side depends on how often it is replicated. If the data is replicated synchronously, then it is written to the remote side as often as it is written to the local side, but with a potential performance penalty as the writes queue up.
For the ultimate in data protection, you can't beat synchronous replication. However, there is almost always a performance impact. Asynchronous replication can fall a handful of writes behind the primary, but in return for the potential loss of those few transactions, you get the data protection without as much performance impact.
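The trade-off above can be sketched in a few lines of code. This is a toy illustration, not a real replication product: the "remote site" is just a second in-memory dictionary, network latency is simulated with a sleep, and all names (ReplicatedStore, write, flush) are invented for the example. The point is only to show where the two modes differ: a synchronous write blocks until the remote copy is acknowledged, while an asynchronous write queues the update and returns immediately, so the remote side may lag by a few writes.

```python
import queue
import threading
import time

class ReplicatedStore:
    """Toy key-value store illustrating sync vs. async replication.

    The 'remote' site is simulated with an in-process dict and an
    artificial network delay. Purely illustrative.
    """

    def __init__(self, mode="sync", remote_latency=0.01):
        self.mode = mode                  # "sync" or "async"
        self.remote_latency = remote_latency
        self.local = {}                   # primary site's copy
        self.remote = {}                  # secondary site's copy
        self._pending = queue.Queue()     # async replication queue
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def _replicate(self, key, value):
        # Simulated network round trip to the secondary site.
        time.sleep(self.remote_latency)
        self.remote[key] = value

    def _drain(self):
        # Background worker that ships queued writes to the remote side.
        while True:
            key, value = self._pending.get()
            self._replicate(key, value)
            self._pending.task_done()

    def write(self, key, value):
        self.local[key] = value
        if self.mode == "sync":
            # Synchronous: do not return until the remote copy
            # is acknowledged. Safe, but every write pays the
            # network latency.
            self._replicate(key, value)
        else:
            # Asynchronous: queue the update and return at once.
            # The remote copy may be a few writes behind.
            self._pending.put((key, value))

    def flush(self):
        # Wait for all queued async writes to reach the remote side.
        self._pending.join()
```

In synchronous mode, the remote copy is guaranteed current the moment `write()` returns; in asynchronous mode you would need `flush()` (or simply time) before the remote side catches up, which is exactly the handful-of-writes exposure described above.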
Different implementations of replication will introduce different levels of performance impact. In my experience, software-based synchronous replication can be significantly faster than hardware-based. And most hardware vendors do not offer asynchronous replication, delivering periodic replication instead. If you are considering implementing replication to protect your data against loss, be certain that you run performance benchmarks that reflect the actual quantity and size of the data you'll be replicating, over the same kind of network. Otherwise, the benchmark is pretty much invalid.
Replication is a far more complex topic than we can cover thoroughly in a brief tip like this one. Before you go too far down the road toward implementing it, be sure you do your research, and that you understand all of your options.
Evan L. Marcus is the Data Availability Maven for VERITAS Software. You can reach him at firstname.lastname@example.org.
This was first published in March 2003