Real disaster recovery testing


This article can also be found in the Premium Editorial Download "Storage magazine: Lessons learned from creating and managing a scalable SAN."

Success criteria for initial DR tests
Test one
  • Did you successfully recover at least 50% of your applications?
  • Were you able to correct/update your disaster recovery (DR) plan based on initial results?
  • Did you come close to your recovery time objectives (RTOs) and recovery point objectives (RPOs)?
Test two
  • Did you successfully recover at least 75% of your applications?
  • Was your recovery plan 95% accurate and up to date?
  • Did you successfully recover application groups with data interdependencies?
  • Did you meet your RTOs and RPOs?
Test three
  • Did you successfully recover all of your applications, including multiple application groups?
  • Did you simulate a recovery glitch, such as corrupt or missing data?
  • Did you perform successful backups of your data recovery site?
  • Did you simulate and benchmark performance of your applications at the recovery site?
  • Did you identify opportunities to improve on your RTOs or RPOs?
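
The percentage thresholds in the checklist above can be tracked with a few lines of code. The following is a minimal sketch, not a prescribed tool: the `AppResult` record, the function names, and the sample application names are all hypothetical, and only the 50%, 75%, and 100% thresholds come from the checklist.

```python
from dataclasses import dataclass

# Minimum share of applications that must be recovered in each test.
# The three values mirror the checklist; everything else is illustrative.
RECOVERY_THRESHOLDS = {1: 0.50, 2: 0.75, 3: 1.00}


@dataclass
class AppResult:
    """Outcome of one application's recovery attempt (hypothetical record)."""
    name: str
    recovered: bool


def recovery_rate(results):
    """Return the fraction of applications that were recovered."""
    if not results:
        return 0.0
    return sum(r.recovered for r in results) / len(results)


def meets_recovery_threshold(test_number, results):
    """Check the recovery rate against the threshold for the given test."""
    return recovery_rate(results) >= RECOVERY_THRESHOLDS[test_number]


# Sample data, invented for illustration only.
results = [
    AppResult("sap", True),
    AppResult("order-processing", True),
    AppResult("email", False),
    AppResult("intranet", True),
]

for test_number in (1, 2, 3):
    verdict = "pass" if meets_recovery_threshold(test_number, results) else "fail"
    print(f"Test {test_number}: {recovery_rate(results):.0%} recovered, {verdict}")
```

A count like this covers only the first question in each test; plan accuracy, interdependencies and RTO/RPO results still need their own review.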

Application consistency groups
Regardless of whether DR is based on real-time replication or tape, the concept of application consistency groups is critical. The business continuity plan (BCP) should document an RTO and RPO for each application group (e.g., SAP or order processing). Taken together, these RTOs and RPOs become the DR service-level agreement with the application owners and end users. It's rare for a critical application group to be hosted on a single server or within one large database, so a consistent recovery from an application group perspective is vital. Application grouping and categorization require research and preparation from an architecture standpoint, and should be one of the primary drivers for all DR testing activities. If all components aren't recovered in a coherent manner, the application may not work at all, even if each individual component is "successfully" recovered (see "Success criteria for initial DR tests").
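
To illustrate why results should be judged per group rather than per component, here is a minimal sketch. It assumes components are recovered in parallel, so the group is usable only once its slowest component is up, and that the group's effective data loss is that of its worst component. The class names, field names, and sample figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ComponentResult:
    """Measured outcome for one component of an application group."""
    name: str
    recovered: bool
    recovery_time: timedelta  # elapsed time until the component was usable
    data_loss: timedelta      # age of the newest recovered data at the outage


@dataclass
class GroupObjective:
    """Documented objectives for one application group."""
    rto: timedelta
    rpo: timedelta


def evaluate_group(objective, components):
    """Judge the group as a whole: one failed component fails the group."""
    if not components:
        raise ValueError("an application group needs at least one component")
    all_recovered = all(c.recovered for c in components)
    # Assumption: parallel recovery, so the slowest component sets the pace.
    group_time = max(c.recovery_time for c in components)
    # Assumption: the worst component's data loss bounds the whole group.
    group_loss = max(c.data_loss for c in components)
    return {
        "all_recovered": all_recovered,
        "rto_met": all_recovered and group_time <= objective.rto,
        "rpo_met": all_recovered and group_loss <= objective.rpo,
    }


# Sample figures, invented for illustration only.
objective = GroupObjective(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
components = [
    ComponentResult("database", True, timedelta(hours=3), timedelta(minutes=10)),
    ComponentResult("app-server", True, timedelta(hours=2), timedelta(0)),
    ComponentResult("message-queue", True, timedelta(hours=5), timedelta(minutes=10)),
]

print(evaluate_group(objective, components))
```

In this sample every component is recovered, yet the group misses its RTO because the message queue took five hours.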

To ensure your DR testing recovers the entire application group, the architecture and recovery methodology must consider these questions:

  • If your recovery is based on tape, do all backups within a group complete at the same time?
    • What about any updates to data within the application group that occur during the backup?
    • How do the updates get incorporated?
  • If your recovery is based on real-time storage replication, is all data replicated at the exact same time?
    • If so, what if one of the applications within the group falls behind?
    • Is replication halted on the others until synchronization is re-established?
    • What about middleware apps such as Tuxedo or MQSeries messaging queues, which don't readily lend themselves to replication?
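
One way to make the timing questions above measurable is to record a recovery point for each member of a group (the backup completion time, or the time of the last replicated write) and compare them. The sketch below rests on that assumption; the function names, member names, timestamps and one-minute tolerance are hypothetical, and it does not reflect any particular backup or replication product.

```python
from datetime import datetime, timedelta


def recovery_point_spread(points):
    """Return the gap between the newest and oldest recovery point in a group."""
    return max(points.values()) - min(points.values())


def lagging_members(points, tolerance):
    """Return the members whose recovery point trails the newest one by more than tolerance."""
    newest = max(points.values())
    return sorted(name for name, ts in points.items() if newest - ts > tolerance)


# Sample recovery points, invented for illustration only.
points = {
    "erp-db": datetime(2006, 7, 1, 2, 0, 0),
    "erp-app": datetime(2006, 7, 1, 2, 0, 5),
    "mq-queue": datetime(2006, 7, 1, 1, 40, 0),
}
tolerance = timedelta(minutes=1)

print("Spread:", recovery_point_spread(points))
print("Lagging:", lagging_members(points, tolerance))
```

A large spread does not prove the group is unrecoverable, but it flags the members that deserve a closer look during the test.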

This was first published in July 2006
