Seven steps to data replication

In a recent expert response, Christopher Poelker offered the following approach to data replication and an overview of the replication market.

Your approach to data replication should be determined by the following:

1. Amount of data needing to be copied on a daily basis

The amount of data that needs to be copied will determine the bandwidth of the network required to move that amount of data. This is the physics of remote replication and usually the most expensive part. This is also the part that people don't really think about until they find out how much bandwidth they really need.

2. Available network bandwidth between locations

If you only have a dial-up connection between sites, you may as well back up the Chevy truck and start loading tapes to be shipped to your disaster site. As a good rule of thumb, you will need about 10 Mbit/s of bandwidth for each megabyte of data you need to copy per second. As an example, a T3 link (about 45 Mbit/s) can handle almost 5 MB of data per second.
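The rule of thumb above is easy to turn into a quick sizing check. The following is a minimal sketch, not a sizing tool: the function names, the decimal gigabyte-to-megabyte conversion and the idea of spreading the daily change evenly over a copy window are all assumptions made for illustration.

```python
# Rule of thumb from the text: about 10 Mbit/s of link bandwidth
# for every 1 MB/s of data that has to be copied.
MBIT_PER_MBYTE_PER_SEC = 10.0

# Nominal T3 (DS3) line rate in Mbit/s.
T3_MBIT = 44.736


def required_link_mbit(daily_change_gb: float, window_hours: float = 24.0) -> float:
    """Approximate link speed (Mbit/s) needed to copy the daily change.

    Assumes the data is spread evenly over the copy window and that
    1 GB = 1000 MB (both are simplifications for this sketch).
    """
    mbytes = daily_change_gb * 1000.0
    mbytes_per_sec = mbytes / (window_hours * 3600.0)
    return mbytes_per_sec * MBIT_PER_MBYTE_PER_SEC


def link_capacity_mbytes_per_sec(link_mbit: float) -> float:
    """Approximate usable throughput (MB/s) of a link under the same rule."""
    return link_mbit / MBIT_PER_MBYTE_PER_SEC


if __name__ == "__main__":
    print(f"T3 carries about {link_capacity_mbytes_per_sec(T3_MBIT):.1f} MB/s")
    print(f"432 GB/day needs about {required_link_mbit(432):.0f} Mbit/s")
```

Under this rule a T3 works out to roughly 4.5 MB/s, which is in line with the "almost 5" figure above. Real traffic is rarely spread evenly over the day, so peak write rates will usually call for more headroom than this average suggests.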

3. Distance between locations

The distance will determine what kind of remote copy solution you can use, synchronous or asynchronous. Under sync replication an I/O is not complete until it is written to both sides. This is a good thing because your transactions stay consistent. Every write written to the primary side is written "in-order" to the remote side before the application sees an "I/O complete" message.

The problem here is that the Fibre Channel protocol requires four round trips to complete every I/O under sync replication. Even using dark fiber between sites, the speed of light becomes your limiting factor because of the four round trips -- you lose about a millisecond for every 25 miles. Sync is limited in distance to about 100 kilometers. After that, application performance goes in the toilet. Async can go around the planet. So the farther you go, the more you need async remote copy.
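To get a feel for what that distance penalty means per write, here is a minimal sketch that simply scales the figure quoted above (about one millisecond for every 25 miles). The function name and the linear scaling are assumptions for illustration; actual latency also depends on the equipment and the route the fiber takes.

```python
# Figure quoted in the text: roughly 1 ms lost per 25 miles under sync replication.
MS_PER_25_MILES = 1.0
KM_PER_MILE = 1.609344


def sync_write_penalty_ms(distance_km: float) -> float:
    """Estimated extra latency (ms) added to each write by sync replication.

    Scales the article's rule of thumb linearly with distance.
    """
    miles = distance_km / KM_PER_MILE
    return (miles / 25.0) * MS_PER_25_MILES


if __name__ == "__main__":
    for km in (10, 50, 100, 500):
        print(f"{km:>4} km: about {sync_write_penalty_ms(km):.2f} ms per write")
```

At the 100 kilometer mark this comes to roughly 2.5 ms added to every write, which a busy transactional application will notice.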

4. Type of operating systems and how many servers involved

Software-based replication products work great. The problem arises when you have hundreds of servers to copy data from. Buying a software license for 200 servers at the primary location and another 200 licenses for the servers that need to be at the remote site can get very expensive. Also, I don't know of a software package yet that can be used with every operating system. If you have AIX, Solaris, NetWare, NT, Windows 2000 and VMS, you may need several separate software solutions. For a homogeneous NT or Unix environment though, software works great and can save you money.

5. Whether or not clustering is being used

Most cluster solutions require real-time connectivity for heartbeat and locking for quorum resources. If you use clustering software like MSCS and want to stretch the cluster between locations so that all your applications transparently fail over, you will need to be within sync replication distances.
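Steps 3 and 5 combine into a simple decision: distance picks sync or async, and a stretched cluster only works inside sync distance. The sketch below encodes just that logic. The function name, the return strings and the hard 100 km cutoff are assumptions for illustration; the text gives 100 kilometers only as an approximate limit.

```python
# Approximate ceiling for sync replication quoted in the text.
SYNC_LIMIT_KM = 100.0


def plan_replication(distance_km: float, stretched_cluster: bool) -> str:
    """Suggest a replication mode from site distance and clustering needs."""
    within_sync = distance_km <= SYNC_LIMIT_KM
    if stretched_cluster and not within_sync:
        # Heartbeat and quorum locking need real-time connectivity.
        return "not feasible: stretched cluster needs sync distance"
    if within_sync:
        return "sync"
    return "async"


if __name__ == "__main__":
    print(plan_replication(50, stretched_cluster=True))
    print(plan_replication(500, stretched_cluster=False))
    print(plan_replication(500, stretched_cluster=True))
```

Treat the result as a first filter only; the bandwidth, operating system and budget questions in the other steps still apply.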

6. Availability of storage, servers and floor space at the remote site

If you have your own data center for your remote site, you're fine. If you need to lease space from a provider, you want to make sure your solution is as compact as possible. Server and storage consolidation must be considered prior to introducing hosted disaster recovery solutions. Hey, when you're paying by the foot you want to have very small feet!

7. And last but not least, your available budget

This is a no-brainer. Many companies, when faced with the real-world costs of disaster recovery, tend to get shell-shocked. Consider the costs:

  • Floor space

  • Servers for the recovery site

  • Staff for the recovery site

  • Storage hardware and licenses

  • Software licenses

  • Services to implement the solution

  • Services to determine what needs to be copied, and why

  • Network links (this is usually the most expensive part)

  • Network-based SAN extension gear

The costs can add up quickly. This sometimes makes the CTAM method look like a wonderful idea. (CTAM = Chevy Truck Access Method: dump your backup tapes in the back of a truck and drive your data to the remote site.)

An overview of available replication solutions

  • The most common hardware-based data replication solutions:

      • Synchronous:

          • EMC SRDF

          • IBM PPRC

          • HP DRM

          • HDS TrueCopy sync

      • Asynchronous:

          • HDS TrueCopy async

          • HP DRM

      • Adaptive (bulk) copy:

          • All the hardware vendors

  • The most common appliance-based data replication solutions:

      • HP StorageApps

      • FalconStor

  • The most common software-based solutions:

      • Host-based mirrorsets over iSCSI

      • Legato Octopus and RepliStor

      • Topio SANSafe (this also does async with write-order fidelity)

      • NSI Double-Take

      • Veritas VVR and VSR (Veritas VVR does async with write-order fidelity)

      • Fujitsu Softek TDMF

      • Peer Software's PeerSync

      • Diasoft File Replication Pro

      • Insession replication suite

      • SoftwarePursuits SureSync

      • XOsoft WANSync

      • Sunopsis Java replication suite

  • Outsourcing:

      • All the data storage companies out there that let you connect to their storage over a fast connection. Let them worry about keeping the data safe.

About the author: Christopher Poelker is a storage architect for Hitachi Data Systems and SearchStorage.com's resident SANs expert.

 


This was first published in January 2003
