How do we actually get the DR site up once a failure occurs at the primary site, and what procedures are involved? Also, what configuration prerequisites does the DR site need? Can we have automatic failover?
The answer to your first question is: it depends. There are several ways to restore a DR site, ranging from loading backup tapes of core systems at the DR site to recovering to a point in time with data replication software. Most sites today use replication tools such as HDS's TrueCopy/NanoCopy, EMC's SRDF/TimeFinder, or host-based tools such as Veritas' VVR and Compaq's DRM. These tools keep a current (synchronous) or near-current (asynchronous) copy of your data at the DR site. Check the vendors' Web sites to determine which product best fits your situation.
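To make the synchronous/asynchronous distinction concrete, here is a toy Python model. It is not any vendor's API; the class and method names are hypothetical. In synchronous mode a write counts as done only once the DR copy has it; in asynchronous mode writes queue up and ship later, so whatever is still queued when the primary fails is lost.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary volume mirrored to a DR volume (hypothetical)."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary: list[str] = []
        self.dr: list[str] = []
        self.pending: deque[str] = deque()  # writes not yet shipped (async only)

    def write(self, block: str) -> None:
        self.primary.append(block)
        if self.synchronous:
            # Synchronous: the write is acknowledged only after the DR copy is updated.
            self.dr.append(block)
        else:
            # Asynchronous: acknowledge now, ship to the DR site later.
            self.pending.append(block)

    def drain(self, n: int) -> None:
        """Ship up to n queued writes to the DR site, oldest first (async mode)."""
        for _ in range(min(n, len(self.pending))):
            self.dr.append(self.pending.popleft())

    def lost_on_failure(self) -> list[str]:
        """Writes the DR site would be missing if the primary failed right now."""
        # The DR copy is always an in-order prefix of the primary's writes.
        return self.primary[len(self.dr):]


if __name__ == "__main__":
    sync_vol = ReplicatedVolume(synchronous=True)
    async_vol = ReplicatedVolume(synchronous=False)
    for block in ["a", "b", "c"]:
        sync_vol.write(block)
        async_vol.write(block)
    async_vol.drain(1)
    print(sync_vol.lost_on_failure())   # []
    print(async_vol.lost_on_failure())  # ['b', 'c']
```

The trade-off the model leaves out is cost: the synchronous path makes every write wait on the remote copy, which is why asynchronous replication is common over long distances.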
The procedure for getting a DR site "up" involves:
1. Suspend the replication links between the primary and DR sites.
2. Set the DR disks as the primary storage. This is called a primary takeover.
3. Boot the DR systems (or run drvconfig) so they recognize the new disks.
4. Run scripts to ensure your databases and other applications are stable.
5. Once the systems are stable, the failover is complete.
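If you script the procedure, a minimal runbook driver might look like the Python sketch below. It assumes each step is wrapped in a hypothetical function that returns True on success; the real work (the replication product's takeover command, device reconfiguration, database recovery scripts) is product-specific and not shown. The driver stops at the first failed step so nobody brings up hosts against disks that were never taken over.

```python
from typing import Callable

# A step is a (description, action) pair; the action returns True on success.
Step = tuple[str, Callable[[], bool]]


def run_failover(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run DR failover steps in order, stopping at the first failure.

    Returns (success, log) so an operator can see how far the run got.
    """
    log: list[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False, log
    log.append("failover complete")
    return True, log


if __name__ == "__main__":
    # Placeholder actions; replace each lambda with a call to your own script.
    steps: list[Step] = [
        ("suspend replication links", lambda: True),
        ("primary takeover of DR disks", lambda: True),
        ("boot or rescan DR hosts", lambda: True),
        ("verify databases and applications", lambda: True),
    ]
    success, log = run_failover(steps)
    print("\n".join(log))
```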
I suggest automating as much of this as possible, since your ultimate goal is to minimize downtime. At a minimum, a DR site should have enough computing, storage and network capacity to support core business functions, plus a way to get the data onto those systems.
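One simple way to keep those prerequisites honest is to compare the DR site's resources against what core business functions need. The sketch below assumes you can express both as a few numbers; the `Resources` fields and their units are hypothetical placeholders for your own capacity plan.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """Capacity at a site, in whatever units you plan with (hypothetical fields)."""
    cpu_cores: int
    storage_tb: float
    network_mbps: int


def dr_gaps(core_needs: Resources, dr_site: Resources, has_data_path: bool) -> list[str]:
    """List the ways the DR site falls short of what core business functions need."""
    gaps: list[str] = []
    if dr_site.cpu_cores < core_needs.cpu_cores:
        gaps.append("compute")
    if dr_site.storage_tb < core_needs.storage_tb:
        gaps.append("storage")
    if dr_site.network_mbps < core_needs.network_mbps:
        gaps.append("network")
    if not has_data_path:
        # No replication link or restore path: the hardware has no data to run.
        gaps.append("data path")
    return gaps


if __name__ == "__main__":
    needs = Resources(cpu_cores=32, storage_tb=20.0, network_mbps=1000)
    dr = Resources(cpu_cores=16, storage_tb=40.0, network_mbps=1000)
    print(dr_gaps(needs, dr, has_data_path=False))
```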
Another consideration is the location of your DR site. In most cases a DR site 10-20 km away is sufficient, but be aware of the natural disasters that could strike your area. In the Bay Area, I've worked on DR sites as far away as Arizona and, in one case, Florida. (I never really understood that one, since they were counting on a hurricane and an earthquake not happening at the same time.)
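Distance also matters if you replicate synchronously, because every write waits for a round trip to the DR site. Light in optical fiber covers roughly 200 km per millisecond, so a rough lower bound on the added latency is easy to compute. This sketch counts propagation delay only (switches, arrays and extra protocol round trips add more), and the route lengths in the example are illustrative, not measured; real fiber routes are usually longer than the straight-line distance.

```python
FIBER_KM_PER_MS = 200.0  # light in glass travels roughly 200 km per millisecond


def min_sync_write_penalty_ms(route_km: float, round_trips: int = 1) -> float:
    """Lower bound on the latency synchronous replication adds to each write.

    Counts only propagation delay over `route_km` of fiber, times the number
    of round trips the replication protocol needs per write.
    """
    one_way_ms = route_km / FIBER_KM_PER_MS
    return 2 * one_way_ms * round_trips


if __name__ == "__main__":
    for km in (20, 1000):
        print(f"{km} km: >= {min_sync_write_penalty_ms(km):.1f} ms per write")
```

At 20 km the floor is about 0.2 ms per write; at 1,000 km it is about 10 ms, which is one reason very distant DR sites are usually fed asynchronously.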
Automatic failover is possible if you have a system up and running at the DR site, ready to assume the workload. Almost all of the replication products listed above support automatic failover to some degree, but true automatic failover depends on the application and on how cluster-aware it is. Clustering the two sites is possibly the only way to achieve true automatic failover.
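At its core, automatic failover needs a rule for deciding when the primary is really gone. A common approach is to watch a heartbeat and act only after several consecutive misses, so a single dropped packet doesn't trigger a takeover. The class below is a hypothetical sketch of that rule, not how any particular cluster product implements it.

```python
class FailoverMonitor:
    """Declare the primary dead only after `threshold` consecutive missed heartbeats.

    Hypothetical sketch: a real cluster also needs a quorum or tie-breaker so a
    broken inter-site link is not mistaken for a primary-site failure.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.missed = 0
        self.failed_over = False

    def tick(self, heartbeat_received: bool) -> bool:
        """Record one heartbeat interval; return True if the DR site should take over now."""
        if self.failed_over:
            return False  # already running at the DR site
        if heartbeat_received:
            self.missed = 0  # any heartbeat resets the count
            return False
        self.missed += 1
        if self.missed >= self.threshold:
            self.failed_over = True
            return True
        return False


if __name__ == "__main__":
    monitor = FailoverMonitor(threshold=3)
    for beat in [True, False, False, True, False, False, False]:
        if monitor.tick(beat):
            print("primary presumed down: start DR takeover")
```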
Editor's note: Do you agree with this expert's response? If you have more to share, post it in our Administrator Central discussion forum.
This was first published in February 2002