
A seven-step approach to data replication -- Part 1

I read your articles on replication, and we have a business-critical need for it. Could you suggest some current storage replication products, for example ones that replicate files over a LAN/WAN to a remote server? Which are the best ones to compare?

Your approach to data replication should be determined by:

1. Amount of data needing to be copied on a daily basis

The amount of data that needs to be copied determines the network bandwidth required to move it. This is the physics of remote replication and usually the most expensive part. It is also the part people don't really think about until they find out how much bandwidth they actually need.
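
As a rough illustration of turning a daily data volume into a sustained transfer rate, here is a minimal Python sketch. The function name, the 400 GB figure and the 8-hour window are hypothetical values chosen for the example, and the calculation assumes the data is spread evenly across the window, which real workloads rarely do.

```python
def sustained_rate_mb_per_s(daily_change_gb: float, window_hours: float = 24.0) -> float:
    """Average MB/s needed to copy daily_change_gb within window_hours.

    Assumes decimal units (1 GB = 1000 MB) and a perfectly even load;
    peaks will be higher than this average.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    megabytes = daily_change_gb * 1000
    seconds = window_hours * 3600
    return megabytes / seconds


if __name__ == "__main__":
    # Hypothetical site: 400 GB of changed data per day.
    print(f"Over 24 hours: {sustained_rate_mb_per_s(400):.2f} MB/s")
    print(f"Over an 8-hour window: {sustained_rate_mb_per_s(400, 8):.2f} MB/s")
```

Shrinking the window raises the required rate in direct proportion, which is why the copy window matters as much as the raw volume.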

2. Available network bandwidth between locations

If you only have a dial-up connection between sites, you may as well back up the Chevy truck and start loading tapes to be shipped to your disaster site. As a good rule of thumb, you will need about 10 Mbit/s of bandwidth for each MB of data you need to copy per second. As an example, a T3 link (about 45 Mbit/s) can handle almost 5 MB of data per second.
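
The rule of thumb can be sketched in a few lines of Python. The link table holds nominal line rates for a few common circuit types. The function names and the choice of links are assumptions made for this example, not part of any product.

```python
from typing import Optional

# Rule of thumb from the text: about 10 Mbit/s of link per MB/s of data.
MBIT_PER_MB = 10.0

# Nominal line rates in Mbit/s (illustrative selection of link types).
LINKS_MBIT = {"T1": 1.544, "T3": 44.736, "OC-3": 155.52}


def required_mbit(mb_per_s: float) -> float:
    """Link bandwidth in Mbit/s suggested by the rule of thumb."""
    return mb_per_s * MBIT_PER_MB


def link_capacity_mb_per_s(link_mbit: float) -> float:
    """Usable MB/s on a link of the given size, by the same rule."""
    return link_mbit / MBIT_PER_MB


def smallest_link(mb_per_s: float) -> Optional[str]:
    """Name of the smallest listed link that fits, or None if none does."""
    needed = required_mbit(mb_per_s)
    for name, rate in sorted(LINKS_MBIT.items(), key=lambda item: item[1]):
        if rate >= needed:
            return name
    return None


if __name__ == "__main__":
    print(f"T3 carries about {link_capacity_mb_per_s(LINKS_MBIT['T3']):.1f} MB/s")
    print(f"4 MB/s needs at least: {smallest_link(4.0)}")
```

The 10:1 ratio, rather than the raw 8 bits per byte, leaves some headroom for protocol overhead.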

3. Distance between locations

The distance will determine what kind of remote copy solution you can use: synchronous or asynchronous. Under sync replication, an I/O is not complete until it is written to both sides. This is a good thing because your transactions stay consistent. Every write to the primary side is written in order to the remote side before the application sees an "I/O complete" message. The problem is that the Fibre Channel protocol requires four round trips to complete every I/O under sync replication. Even over dark fiber between sites, the speed of light becomes your limiting factor: because of the four round trips, you lose about a millisecond for every 25 miles. Sync is limited in distance to about 100 kilometers. After that, application performance goes in the toilet. Async can go around the planet. So the farther you go, the more you need async remote copy.
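
To see what the distance penalty does to an application, the following Python sketch applies the figure above (about one millisecond per 25 miles, four round trips included) to a single stream of serialized writes. The 1 ms local write time is a hypothetical placeholder, and the model ignores switch, array and queuing delays.

```python
KM_PER_MILE = 1.609344
MS_PER_25_MILES = 1.0  # figure from the text; already covers the four round trips


def sync_penalty_ms(distance_km: float) -> float:
    """Extra latency per write, in ms, added by sync replication over distance."""
    miles = distance_km / KM_PER_MILE
    return miles / 25.0 * MS_PER_25_MILES


def max_serial_writes_per_s(distance_km: float, local_write_ms: float = 1.0) -> float:
    """Upper bound on back-to-back writes per second for one serialized stream.

    local_write_ms is an assumed local service time, not a measured value.
    """
    return 1000.0 / (local_write_ms + sync_penalty_ms(distance_km))


if __name__ == "__main__":
    for km in (10, 50, 100, 500):
        print(f"{km:>4} km: +{sync_penalty_ms(km):.2f} ms per write, "
              f"up to {max_serial_writes_per_s(km):.0f} serial writes/s")
```

At 100 km the penalty works out to roughly 2.5 ms per write under these assumptions, already more than the assumed local write time itself.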

4. Type of operating systems and how many servers involved

Software-based replication products work great. The problem arises when you have hundreds of servers to copy data from. Buying software licenses for 200 servers at the primary location and another 200 licenses for the servers at the remote site can get very expensive. Also, I don't know of a software package yet that can be used with EVERY operating system. If you have AIX, Solaris, NetWare, NT, Windows 2000 and VMS, you may need several separate software solutions. For a homogeneous NT or Unix environment, though, software works great and can save you money.

5. Whether or not clustering is being used

Most cluster solutions require real-time connectivity for heartbeat and for locking quorum resources. If you use clustering software like MSCS and want to stretch the cluster between locations so that all your applications fail over transparently, you will need to be within sync replication distances.
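
The way distance and clustering interact can be written down as a small decision helper. This is only a sketch: the 100 km cutoff is the approximate sync limit quoted under point 3, the function name is made up for the example, and a real design would also weigh latency tolerance and cost.

```python
SYNC_LIMIT_KM = 100.0  # approximate sync replication limit quoted in the text


def replication_mode(distance_km: float, stretch_cluster: bool) -> str:
    """Suggest 'sync' or 'async' for a site pair.

    Raises ValueError when a stretched cluster is requested beyond sync
    distance, since the cluster depends on real-time connectivity.
    """
    within_sync = distance_km <= SYNC_LIMIT_KM
    if stretch_cluster and not within_sync:
        raise ValueError(
            f"stretched cluster needs sites within about {SYNC_LIMIT_KM:.0f} km"
        )
    # Inside the limit sync is possible (async remains an option);
    # beyond it, async is the only practical choice.
    return "sync" if within_sync else "async"


if __name__ == "__main__":
    print(replication_mode(40, stretch_cluster=True))
    print(replication_mode(800, stretch_cluster=False))
```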

6. Availability of storage, servers and floor space at the remote site

If you have your own data center for your remote site, you're fine. If you need to lease space from a provider, you want to make sure your solution is as compact as possible. Server and storage consolidation MUST be considered prior to introducing hosted disaster recovery solutions. Hey, when you're paying by the foot you want to have very small feet!

Click for Part 2
