Real disaster recovery testing


This article can also be found in the Premium Editorial Download "Storage magazine: Lessons learned from creating and managing a scalable SAN."


Production workarounds
A common objection to DR testing is "We can't encroach on production." While I wholeheartedly agree with the principle, with proper planning there's no reason a DR test needs to affect production.

In a best-case scenario, the DR site "becomes" the production site, with servers, networks and storage all taking on identities similar or identical to those of the production platforms being replaced. However, there are some scenarios, most notably during testing, where production is still up and running while a recovery event or test is occurring. While much of this can be handled via networking and routing, sometimes the storage environment can't be completely isolated, especially if some type of real-time replication is being used. One way to alleviate this problem is to use snapshots of the replicated production data for DR testing, rather than the actual replicas. This allows detailed testing to a specific RPO without the need to terminate or alter the background replication processes during the test. Some observers may object that this requires more storage at the DR site, but if you're replicating, you'll probably already have snapshots or mirrors at your DR site, especially to protect against rolling disasters. Use a subset of those copies for your DR testing.
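As a minimal sketch of the snapshot approach, the following Python picks which existing DR-site snapshot to test against for a given RPO. The `Snapshot` class, the `pick_test_snapshot` function and the idea of a simulated failure time are all hypothetical names introduced for illustration; they don't correspond to any particular storage vendor's tooling, and the snapshot inventory is assumed to come from whatever your array or replication software reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    name: str            # hypothetical snapshot label
    taken_at: datetime   # point in time the snapshot captures


def pick_test_snapshot(
    snapshots: Sequence[Snapshot],
    failure_time: datetime,
    rpo: timedelta,
) -> Optional[Snapshot]:
    """Return the newest snapshot taken at or before failure_time,
    provided it is no older than the RPO; otherwise return None."""
    # Only snapshots that existed when the simulated failure hit are usable.
    candidates = [s for s in snapshots if s.taken_at <= failure_time]
    if not candidates:
        return None
    newest = max(candidates, key=lambda s: s.taken_at)
    # If even the newest usable copy is older than the RPO, the test
    # cannot demonstrate that the RPO is met.
    return newest if failure_time - newest.taken_at <= rpo else None
```

A result of `None` is itself a useful test finding: it suggests the snapshot schedule at the DR site can't support the stated RPO, without replication ever having been paused.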

There are times when you may need to recover from a DR event while production is still up. One scenario is a bomb threat or any other event that puts the entire data center at risk, even if it's still up and running. One such event I can recall was the "Chicago Flood" of 1992. This event was caused by a piling driven into the Chicago River bottom, which caused a leak in one of Chicago's underground freight tunnels. The rush of water spread through much of the system's miles of tunnels, flooding sub-basements and disrupting utility service throughout "the Loop." When my Chicago-based company was notified that our sub-basement might be affected by this flooding, we immediately declared a disaster. Throughout our entire recovery process, our primary data center was still up and running, because the anticipated flooding hadn't yet occurred. Due to the parallel "production environments," including live network links between them, our recovery team had to perform numerous workarounds for all networking and routing. Had we performed the recovery as planned, we could have caused downtime by broadcasting that the same hardware and apps were available in two distinct locations. Ultimately, our primary site wasn't affected, so this DR event was probably the best test we could have devised.
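The identity clash described above can be checked for before a recovery site is brought online alongside a live production site. The sketch below is only an illustration under stated assumptions: each site's inventory is assumed to be available as a simple mapping of hostname to IP address, and `find_identity_conflicts` is a hypothetical helper, not part of any networking product.

```python
from typing import Dict, List, Tuple


def find_identity_conflicts(
    production: Dict[str, str],
    recovery: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (hostnames, addresses) that are claimed at both sites.

    Each argument maps hostname -> IP address for one site (an assumed
    inventory format). Anything returned here would be advertised from
    two locations if both sites were live on connected networks.
    """
    shared_hosts = sorted(set(production) & set(recovery))
    shared_addrs = sorted(set(production.values()) & set(recovery.values()))
    return shared_hosts, shared_addrs
```

Any hostname or address in the result is a candidate for the kind of networking and routing workaround mentioned above, such as isolating the recovery network or temporarily re-addressing the recovered systems.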

Other software approaches for addressing the production issue use specialized snapshots of production data in the data center while the testing occurs at the DR site. While this software is an additional expense, it may be worthwhile, depending on test frequency and the criticality of production data.

The nontechnical and technical aspects of recovering from a disaster are equally important and should be a critical part of every DR test. The details and technical aspects of DR tests will ultimately be the foundation for successful testing. If you take the best policy and procedure, and map them to poor technology and infrastructure, the results will be disappointing at best.

This was first published in July 2006
