In a recent expert response, Christopher Poelker offered the following approach to data replication and an overview of the replication market.
Your approach to data replication should be determined by the following:
1. Amount of data needing to be copied on a daily basis
The amount of data to be copied determines the network bandwidth required to move it. This is the physics of remote replication and usually the most expensive part. It is also the part people don't really think about until they find out how much bandwidth they actually need.
2. Available network bandwidth between locations
If you only have a dial-up connection between sites, you may as well back up the Chevy truck and start loading tapes to be shipped to your disaster site. As a good rule of thumb, you will need about 10 Mbit/s of bandwidth for each MB/s of data you need to copy (8 bits per byte, plus an allowance for protocol overhead). For example, a T3 link (about 45 Mbit/s) can handle almost 5 MB of data per second.
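To make the rule of thumb concrete, here is a minimal Python sketch that turns a daily change volume into the link bandwidth needed to replicate it. Only the 10:1 ratio and the T3 example come from the text; the function names, the 200 GB daily volume, the replication windows and the use of decimal units (1 GB = 1,000 MB) are assumptions for illustration.

```python
# Rule of thumb from the text: ~10 Mbit/s of link bandwidth per MB/s of data.
MBIT_PER_MBYTE = 10
T3_MBIT_PER_S = 44.736  # nominal T3 (DS3) line rate


def required_link_mbit_s(daily_gb: float, window_hours: float = 24.0) -> float:
    """Link bandwidth (Mbit/s) needed to copy daily_gb within window_hours.

    Assumes decimal units (1 GB = 1000 MB) and a steady transfer rate.
    """
    mbyte_per_s = daily_gb * 1000 / (window_hours * 3600)
    return mbyte_per_s * MBIT_PER_MBYTE


def link_capacity_mbyte_s(link_mbit_s: float) -> float:
    """Data rate (MB/s) a link can carry under the same rule of thumb."""
    return link_mbit_s / MBIT_PER_MBYTE


if __name__ == "__main__":
    # Hypothetical example: 200 GB of changed data per day.
    for window in (24.0, 8.0):
        need = required_link_mbit_s(200, window)
        print(f"200 GB in {window:.0f} h -> {need:.1f} Mbit/s "
              f"({need / T3_MBIT_PER_S:.2f} T3 links)")
    print(f"One T3 carries about {link_capacity_mbyte_s(T3_MBIT_PER_S):.2f} MB/s")
```

Note how shrinking the window (say, replicating only overnight) multiplies the bandwidth you have to buy.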
3. Distance between locations
The distance will determine what kind of remote copy solution you can use, synchronous or asynchronous. Under sync replication an I/O is not complete until it is written to both sides. This is a good thing because your transactions stay consistent. Every write written to the primary side is written "in-order" to the remote side before the application sees an "I/O complete" message.
The problem here is that the Fibre Channel protocol requires four round trips to complete every I/O under sync replication. Even with dark fiber between sites, the speed of light becomes your limiting factor because of those four round trips: you lose about a millisecond for every 25 miles (roughly 40 km). Sync is limited in distance to about 100 kilometers; beyond that, application performance goes in the toilet. Async can go around the planet, so the farther apart your sites are, the more you need async remote copy.
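The distance penalty can be estimated from propagation delay alone. The sketch below assumes light in fiber travels at about 200,000 km/s (roughly 5 microseconds per km one way) and uses the four round trips per I/O cited above; it ignores switch and array processing time, so real latency will be higher. Under these assumptions, 25 miles works out to about 1.6 ms per write, the same order of magnitude as the rule of thumb. The function names are hypothetical.

```python
# Assumption: ~5 microseconds per km one way in optical fiber.
US_PER_KM_ONE_WAY = 5.0
KM_PER_MILE = 1.609344


def sync_write_penalty_ms(distance_km: float, round_trips: int = 4) -> float:
    """Propagation delay (ms) added to each synchronous write.

    round_trips defaults to the four round trips per I/O cited in the text.
    Processing time in switches and arrays is ignored.
    """
    one_round_trip_us = 2 * distance_km * US_PER_KM_ONE_WAY
    return round_trips * one_round_trip_us / 1000.0


def max_serial_writes_per_s(distance_km: float, round_trips: int = 4) -> float:
    """Upper bound on writes/s for an application that waits for each
    write to complete before issuing the next one."""
    return 1000.0 / sync_write_penalty_ms(distance_km, round_trips)


if __name__ == "__main__":
    for km in (10, 25 * KM_PER_MILE, 100, 500):
        penalty = sync_write_penalty_ms(km)
        print(f"{km:6.1f} km: +{penalty:5.2f} ms per write, "
              f"<= {max_serial_writes_per_s(km):7.0f} serial writes/s")
```

At 100 km, a strictly serial writer is capped at about 250 writes per second from distance alone, which is why sync tops out around that range.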
4. Types of operating systems and number of servers involved
Software-based replication products work great. The problem arises when you have hundreds of servers to copy data from. Buying a software license for 200 servers at the primary location and another 200 licenses for the servers at the remote site can get very expensive. Also, I don't yet know of a software package that supports every operating system. If you have AIX, Solaris, NetWare, NT, Windows 2000 and VMS, you may need several separate software solutions. For a homogeneous NT or Unix environment, though, software works great and can save you money.
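The licensing arithmetic scales linearly with server count, which is what makes large estates expensive. As a rough sketch, the Python below compares paired per-host licenses against a solution whose cost is treated as independent of server count (a simplification of array-based replication). Both prices and the function names are hypothetical placeholders, not vendor figures.

```python
def software_license_cost(servers: int, price_per_license: float) -> float:
    # One license per server at the primary site plus one for its
    # counterpart at the remote site.
    return 2 * servers * price_per_license


def breakeven_servers(flat_cost: float, price_per_license: float) -> float:
    # Server count at which paired per-host licenses cost as much as a
    # solution priced independently of server count.
    return flat_cost / (2 * price_per_license)


if __name__ == "__main__":
    PRICE = 2_000.0   # hypothetical price per host license
    FLAT = 500_000.0  # hypothetical array-based replication cost
    print(f"200 servers: ${software_license_cost(200, PRICE):,.0f} in licenses")
    print(f"Break-even at about {breakeven_servers(FLAT, PRICE):.0f} servers")
```

Below the break-even point, host software tends to win; above it, the per-server cost dominates.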
5. Whether or not clustering is being used
Most cluster solutions require real-time connectivity for the heartbeat and for locking of quorum resources. If you use clustering software like Microsoft Cluster Service (MSCS) and want to stretch the cluster between locations so that all your applications fail over transparently, you will need to stay within sync replication distances.
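Criteria 3 and 5 interact: the distance limit on sync replication also limits where a stretched cluster can live. A small decision helper, sketched in Python under the assumption of the roughly 100 km sync limit given above (the function name and return strings are hypothetical), captures that check.

```python
SYNC_MAX_KM = 100.0  # approximate sync replication limit from the text


def replication_options(distance_km: float, stretched_cluster: bool) -> str:
    """Return which remote copy modes fit the given site distance."""
    sync_ok = distance_km <= SYNC_MAX_KM
    if stretched_cluster and not sync_ok:
        # Heartbeat and quorum locking need sync-range connectivity.
        return "redesign: stretched cluster needs sync distance"
    return "sync or async" if sync_ok else "async only"


if __name__ == "__main__":
    print(replication_options(40, stretched_cluster=True))
    print(replication_options(800, stretched_cluster=False))
    print(replication_options(800, stretched_cluster=True))
```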
6. Availability of storage, servers and floor space at the remote site
If you have your own data center for your remote site, you're fine. If you need to lease space from a provider, you want to make sure your solution is as compact as possible. Server and storage consolidation must be considered prior to introducing hosted disaster recovery solutions. Hey, when you're paying by the foot you want to have very small feet!
7. And last but not least, your available budget
This is a no-brainer. Many companies, when faced with the real-world costs of disaster recovery, tend to get shell-shocked. Consider the costs:
- Floor space
- Servers for the recovery site
- Staff for the recovery site
- Storage hardware and licenses
- Software licenses
- Services to implement the solution
- Services to determine what needs to be copied, and why
- Network links (this is usually the most expensive part)
- Network-based SAN extension gear
The costs can add up quickly. This sometimes makes the CTAM method look like a wonderful idea. (CTAM = Chevy Truck Access Method: dump your backup tapes in the back of a truck and drive your data to the remote site.)
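CTAM looks attractive because a truck full of tapes has enormous throughput. The Python sketch below estimates it for a hypothetical load (100 tapes of 200 GB each, a 4-hour drive) and compares it with a T3 using the 10:1 rule of thumb. The function name and figures are assumptions, and only drive time is counted.

```python
T3_MBYTE_S = 44.736 / 10  # T3 line rate divided by the 10:1 rule of thumb


def truck_mbyte_s(tapes: int, gb_per_tape: float, hours_in_transit: float) -> float:
    """Effective throughput (MB/s) of driving a load of tapes to the remote site.

    Counts only drive time; writing and restoring the tapes takes extra time.
    """
    total_mb = tapes * gb_per_tape * 1000  # decimal units
    return total_mb / (hours_in_transit * 3600)


if __name__ == "__main__":
    # Hypothetical load: 100 tapes of 200 GB each, 4-hour drive.
    rate = truck_mbyte_s(100, 200, 4)
    print(f"Truck: {rate:,.0f} MB/s, about {rate / T3_MBYTE_S:,.0f}x a T3")
```

The catch is latency: the remote copy is only as current as the last load shipped, and recovery still waits on the drive and the restore.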
An overview of available replication solutions
- The most common hardware-based data replication solutions:
  - EMC SRDF
  - IBM PPRC
  - HP DRM
  - HDS TrueCopy sync
  - HDS TrueCopy async
- Adaptive (bulk) copy:
  - All the hardware vendors
- The most common appliance-based data replication solutions:
  - HP StorageApps
- The most common software-based solutions:
  - Host-based mirrorsets over iSCSI
  - Legato Octopus and RepliStor
  - Topio SANSafe (this also does async with write-order fidelity)
  - NSI Double-Take
  - Veritas VVR and VSR (Veritas VVR does async with write-order fidelity)
  - Fujitsu Softek TDMF
  - Peer Software's PeerSync
  - Diasoft File Replication Pro
  - Insession replication suite
  - Software Pursuits SureSync
  - XOsoft WANSync
  - Sunopsis Java replication suite
- All the data storage companies out there that let you connect to their storage over a fast connection. Let them worry about keeping the data safe.
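Two of the software products listed offer async replication with write-order fidelity. The toy Python model below illustrates what that property means, under loudly stated assumptions: the class and method names are hypothetical, and this is not how any listed product is implemented. Writes complete locally at once and go into a sequenced journal; the remote side applies them strictly in sequence order, so although it may lag, it always matches some state the primary actually passed through.

```python
from __future__ import annotations

from collections import deque


class AsyncReplicator:
    """Toy async replication with write-order fidelity (hypothetical model)."""

    def __init__(self) -> None:
        self.primary: dict[int, bytes] = {}
        self.remote: dict[int, bytes] = {}
        self.journal: deque[tuple[int, int, bytes]] = deque()
        self.next_seq = 0
        self.applied_seq = -1

    def write(self, block: int, data: bytes) -> None:
        # Async semantics: the write is complete once it lands locally;
        # it is journaled with a sequence number for later shipping.
        self.primary[block] = data
        self.journal.append((self.next_seq, block, data))
        self.next_seq += 1

    def ship(self, max_writes: int) -> None:
        # Apply up to max_writes journal entries, oldest first, modelling a
        # link that can only move so much data per interval.
        for _ in range(min(max_writes, len(self.journal))):
            seq, block, data = self.journal.popleft()
            assert seq == self.applied_seq + 1  # never skip or reorder
            self.remote[block] = data
            self.applied_seq = seq

    def lag(self) -> int:
        # Number of writes acknowledged locally but not yet applied remotely.
        return len(self.journal)


if __name__ == "__main__":
    r = AsyncReplicator()
    r.write(1, b"debit A")
    r.write(2, b"credit B")
    r.ship(1)
    # The remote holds the debit but not the credit: behind, yet consistent.
    print(r.remote, "lag:", r.lag())
```

Without the ordering guarantee, the remote could hold the credit without the debit, a state the primary never passed through.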