There are quite a few replication products on the market today. Replication techniques fall into two categories: host-based (software) replication and hardware-based replication. There are also several ways to replicate the data itself; synchronous and asynchronous are the main techniques (semi-synchronous and multi-hop are others). In this reply I will focus on synchronous and asynchronous replication, both host-based and hardware-based.
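The practical difference between the two modes is when the application's write is acknowledged. The following minimal Python sketch illustrates this under simplifying assumptions. The Volume, SyncReplicator, and AsyncReplicator classes are hypothetical toys that model no particular product, and they ignore network latency, link failures, and crash recovery.

```python
from collections import deque

# Hypothetical sketch: these classes model no real replication product.


class Volume:
    """A toy block device: maps block numbers to data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class SyncReplicator:
    """Synchronous mode: acknowledge only after BOTH copies are written."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote

    def write(self, block, data):
        self.local.write(block, data)
        self.remote.write(block, data)  # a real system waits a network round trip here
        return "ack"


class AsyncReplicator:
    """Asynchronous mode: acknowledge after the local write; ship to the remote site later."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.pending = deque()  # acknowledged writes not yet sent to the remote site

    def write(self, block, data):
        self.local.write(block, data)
        self.pending.append((block, data))
        return "ack"  # the application continues without waiting on the link

    def drain(self, max_writes=None):
        """Send up to max_writes queued writes (all of them if None), oldest first."""
        sent = 0
        while self.pending and (max_writes is None or sent < max_writes):
            block, data = self.pending.popleft()
            self.remote.write(block, data)
            sent += 1
        return sent


if __name__ == "__main__":
    # Synchronous: the remote copy matches the primary at every acknowledgment.
    p, r = Volume("primary"), Volume("remote")
    sync_rep = SyncReplicator(p, r)
    for i in range(3):
        sync_rep.write(i, f"txn-{i}")
    print(r.blocks == p.blocks)  # True

    # Asynchronous: if the primary site is lost before the queue drains,
    # the acknowledged-but-unsent writes are lost with it.
    p, r = Volume("primary"), Volume("remote")
    async_rep = AsyncReplicator(p, r)
    for i in range(3):
        async_rep.write(i, f"txn-{i}")
    async_rep.drain(max_writes=1)  # the link has shipped only one write so far
    print(sorted(r.blocks))        # [0]
    print(len(async_rep.pending))  # 2 (writes at risk)
```

The trade-off it illustrates: synchronous replication adds a round trip to the remote site to every write, which in practice limits how far apart the sites can be, while asynchronous replication keeps local write latency low but can lose whatever is still queued when the primary site is lost.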
The reasons for data replication are obvious, especially in light of this past week's tragedy. The idea is to separate data and computing resources to protect against hardware failure, and to separate them by distance to protect against a disaster. Clustering provides hardware fault tolerance for the computing resources, but it requires centralized storage and does not always provide for disaster recovery. Data redundancy is therefore imperative.
One of the better recovery techniques is "wide-area" clustering, such as Digital provided under the VMS operating system. Up to 32 nodes of a single cluster could be spread out over distance, connected by Ethernet or CI (Computer Interconnect), with all data resources shared among all nodes. If one node failed, the remaining nodes automatically absorbed its load. Data could reside either on disks directly attached to an individual node and shared with the other cluster members, or on central storage shared by all nodes.
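The failover behavior described above can be modeled very roughly as reassigning the failed node's work to the survivors. The sketch below is a hypothetical illustration, not how VMS clustering is implemented: the node names, the service lists, and the least-loaded placement policy are all assumptions made for the example.

```python
# Hypothetical sketch of "surviving nodes absorb the failed node's load".
# Not modeled on any real cluster manager; the placement policy is an assumption.


def fail_over(assignments, failed_node):
    """Reassign the failed node's services to the surviving nodes.

    assignments maps node name -> list of service names. Each orphaned
    service goes to whichever surviving node currently carries the fewest
    services (a simple least-loaded policy chosen for illustration).
    The input mapping is not modified; a new mapping is returned.
    """
    if failed_node not in assignments:
        raise KeyError(f"unknown node: {failed_node}")
    survivors = {node: list(svcs) for node, svcs in assignments.items() if node != failed_node}
    if not survivors:
        raise RuntimeError("no surviving nodes to absorb the load")
    for service in assignments[failed_node]:
        # ties are broken by node name so the result is deterministic
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(service)
    return survivors


if __name__ == "__main__":
    cluster = {
        "NODEA": ["payroll", "mail"],
        "NODEB": ["orders"],
        "NODEC": ["inventory", "reports", "web"],
    }
    print(fail_over(cluster, "NODEC"))
    # {'NODEA': ['payroll', 'mail', 'reports'], 'NODEB': ['orders', 'inventory', 'web']}
```

A real cluster would also have to detect the failure, agree on cluster membership, and make the failed node's data reachable from the survivors; this sketch covers only the redistribution step.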
You may be hearing of this today as "stretched clusters". In open (non-proprietary) systems environments, Microsoft Cluster Server and Veritas Cluster Server application resources can be stretched between two sites, with the data replicated between those sites by either hardware- or software-based solutions. On Solaris, you can use SNDR (Network Data Replicator) from Sun or VVR (Volume Replicator) from Veritas for host-based replication to your disaster site. Under SNDR, each write I/O is sent synchronously to both the local disk and the remote disk. VVR can also do this, but it additionally offers asynchronous replication that uses time stamps and sequence IDs to maintain transactional data integrity at the remote site. These solutions provide seamless failover of resources to a remote site, or to a "hot-standby" site that can be brought up within minutes of a disaster.
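Sequence IDs matter for asynchronous replication because writes can reach the remote site out of order, and applying them as they arrive could leave the remote copy in a state the primary never had (for example, a credit recorded without its matching debit). The following minimal Python sketch shows one way a receiver could use sequence IDs to apply writes strictly in order. It is an illustration under stated assumptions, not VVR's actual protocol: OrderedApplier is a hypothetical class, sequence IDs are assumed to start at 1 and be assigned without gaps at the primary, and time stamps are omitted.

```python
# Hypothetical sketch of in-order apply at the remote site using sequence IDs.
# Not any vendor's implementation; assumes gap-free sequence IDs starting at 1.


class OrderedApplier:
    """Remote-site side of async replication: apply writes strictly in sequence order."""

    def __init__(self):
        self.blocks = {}    # the remote volume: block number -> data
        self.next_seq = 1   # the next sequence ID we are allowed to apply
        self.pending = {}   # seq -> (block, data) for writes that arrived early

    def receive(self, seq, block, data):
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate: already applied or already buffered
        self.pending[seq] = (block, data)
        # apply every buffered write that is now contiguous with what we have
        while self.next_seq in self.pending:
            blk, payload = self.pending.pop(self.next_seq)
            self.blocks[blk] = payload
            self.next_seq += 1


if __name__ == "__main__":
    remote = OrderedApplier()
    # the primary issued: 1 = debit, 2 = credit, 3 = second update to block 7
    arrivals = [(2, 9, "credit"), (3, 7, "debit-2"), (1, 7, "debit")]
    for seq, block, data in arrivals:
        remote.receive(seq, block, data)
        print(seq, remote.blocks)
    # 2 {}
    # 3 {}
    # 1 {7: 'debit-2', 9: 'credit'}
```

Because nothing is applied until every earlier write has arrived, the remote copy always matches some past point-in-time state of the primary; it may lag behind, but it is never internally inconsistent.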
This was first published in September 2001