Real disaster recovery testing


This article can also be found in the Premium Editorial Download "Storage magazine: Lessons learned from creating and managing a scalable SAN."


Regardless of the recovery approach, you need to define your DBA's responsibilities within the consistency group. Note, for example, whether they rely on Oracle archive and redo logs, whether they perform hot or cold backups, whether the database is replicated via a combination of methods, and how flat-file data is synchronized with database data.

Finally, you need to consider the truly heterogeneous application groups, which can include mainframe, open systems and even NAS platforms, all on different tiers of storage. It's not too difficult to replicate data on a single storage array, but it gets much more challenging when the data to be synchronized is spread across several different server and storage platforms.

So why ask these questions specific to DR testing? Because if your order-entry system has been recovered to a point five hours earlier than your warehouse and shipping systems, and you have thousands of dollars of inventory on the shipping dock with no customers attached to it, that's a big deal and potentially a huge expense. Data consistency and synchronization are bigger issues than just getting the data offsite. If the data isn't usable or if the recovery point becomes unreasonable, then you're putting your money in the wrong areas of DR.
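The five-hour gap in the example above is the kind of thing a DR test can check mechanically. The following is a minimal sketch, not a prescribed method: it assumes you can record the point in time each system's data was recovered to. The function name, the system names, the timestamps and the 15-minute tolerance are all hypothetical.

```python
from datetime import datetime, timedelta


def recovery_point_skew(recovery_points, tolerance):
    """Return (skew, laggards) for one application group.

    recovery_points: dict mapping a system name to the point in time
        its data was recovered to (hypothetical input format).
    tolerance: largest gap allowed between any system and the most
        recently recovered system in the group.
    """
    newest = max(recovery_points.values())
    oldest = min(recovery_points.values())
    # A system is a laggard if its data is older than the newest
    # recovery point by more than the tolerance.
    laggards = sorted(
        name
        for name, point in recovery_points.items()
        if newest - point > tolerance
    )
    return newest - oldest, laggards


# Illustrative values only: order entry is five hours behind the rest.
group = {
    "order-entry": datetime(2006, 7, 1, 7, 0),
    "warehouse": datetime(2006, 7, 1, 12, 0),
    "shipping": datetime(2006, 7, 1, 12, 0),
}
skew, laggards = recovery_point_skew(group, tolerance=timedelta(minutes=15))
print(skew, laggards)
```

With these sample values, the skew is five hours and order entry is the only system flagged, which matches the scenario of inventory sitting on the dock with no orders attached to it.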

Think of your company's most mission-critical application. Chances are pretty good that there's a core system or application that has hundreds, or maybe even thousands, of data feeds going in and out during a 24-hour period. And if it's a 24/7/365 application, at what point do you stop to synchronize? Recurring DR testing will help to address and remediate these issues.

When DR testing becomes routine
  • Include failback to the primary data center in your tests (and be careful not to affect production).
  • Emulate problems in the running of production at the disaster recovery (DR) site (e.g., host bus adapter failure, file restore requests, performance issues).
  • Do it "unscheduled."

Getting end users involved
Application group recovery is where the rubber really hits the road for DR testing, and it can be a costly endeavor for each DR test. I worked with a client whose primary success criterion was being able to place a new order, start to finish, on their proprietary, in-house application running fully in the DR site. That might have been that user's key success metric, but we also determined that equally important--to the tune of millions of dollars--was the ability to continue those in-process orders and ensure that all critical components within the application group, including all sales inputs, fulfillment, shipping, billing/invoicing and accounts receivable, were in synchronization in accordance with the documented RPO.
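One way to turn "in accordance with the documented RPO" into a pass/fail test criterion is to compare each component's data loss against its own RPO. This is a sketch under stated assumptions: the component names come from the list above, but the function name, the RPO values, the outage time and the recovery points are invented for illustration.

```python
from datetime import datetime, timedelta


def rpo_violations(outage_time, recovery_points, rpo):
    """Return {component: data_lost} for components that exceed their RPO.

    outage_time: when the (simulated) disaster occurred.
    recovery_points: component -> point in time its data was recovered to.
    rpo: component -> documented recovery point objective (max data loss).
    """
    violations = {}
    for component, point in recovery_points.items():
        lost = outage_time - point  # window of data that was not recovered
        if lost > rpo[component]:
            violations[component] = lost
    return violations


# Hypothetical test record for one application group.
outage = datetime(2006, 7, 1, 12, 0)
documented_rpo = {
    "sales inputs": timedelta(minutes=15),
    "fulfillment": timedelta(minutes=15),
    "shipping": timedelta(minutes=15),
    "billing/invoicing": timedelta(hours=1),
    "accounts receivable": timedelta(hours=4),
}
recovered_to = {
    "sales inputs": datetime(2006, 7, 1, 11, 50),
    "fulfillment": datetime(2006, 7, 1, 11, 50),
    "shipping": datetime(2006, 7, 1, 11, 0),
    "billing/invoicing": datetime(2006, 7, 1, 11, 30),
    "accounts receivable": datetime(2006, 7, 1, 9, 0),
}
failed = rpo_violations(outage, recovered_to, documented_rpo)
print(failed)
```

In this sample record, only shipping misses its objective: it lost an hour of data against a 15-minute RPO. A result like that belongs in the remediation action plan rather than in a footnote to the test report.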

So a DR test must allow adequate time for the application owners and end users to participate (see "When DR testing becomes routine"). Time must also be allocated to work through remediation of the identified critical issues, which includes interfacing with the IT recovery team, as well as programmers, DBAs, application owners and end users. This may take several hours or days. Make the remediation action plan a critical part of the test--don't treat it as an afterthought.

This was first published in July 2006
