Will your disaster recovery plan work?


This article can also be found in the Premium Editorial Download "Storage magazine: Better disaster recovery testing techniques."

Are you confident your disaster recovery (DR) plan will work if a disaster strikes? When TheInfoPro Inc., a NY-based research network, posed that question to several hundred IT executives, the results weren't exactly reassuring. Only 55% of the managers surveyed were confident they could recover their open-systems data in an emergency. The rest were only somewhat confident or not confident their DR system would work.

Losses incurred in a disaster
Direct losses
  • Loss of revenue
  • Loss of employee productivity
  • Possible regulatory penalties
  • Service commitment penalties
Indirect losses
  • Possible liability and litigation
  • Reduced customer satisfaction
  • Competitive disadvantage
  • Loss of goodwill, tarnished reputation

"It is a little disturbing," says Ken Male, founder and CEO of TheInfoPro. The lack of confidence, the study suggests, lies in the testing of DR plans or the lack of it. "Almost half of the respondents test only once a year and that's not really enough," says Male. The obstacles to more frequent testing are resources and money. "Almost all of the [survey's] respondents would test more often if they could. Resources and cost are overwhelmingly the barrier," he notes.

No matter how many checklists a company makes and distributes, the number of disaster scenarios it considers or even how assiduously it backs up its data, managers can't be confident in a firm's ability to recover data if the systems haven't been tested thoroughly. "You have to test to see if your disaster recovery processes really work," says Michelle Zou, research analyst with the storage software team at IDC, Framingham, MA. "Not everybody does enough testing."

Testing is difficult. "It's a complicated process. You're talking about mission-critical applications that companies don't want to take down," Zou says. As a result, tests have to be scheduled far in advance and, to do it right, the testing will likely require the involvement of a large number of people. All of this drives up costs. "And what if the tests don't work?" Zou asks. The organization has to go back through the entire process to identify and fix the problems and then test again--which means more time, money and disruption.

Large mainframe IT shops can often offer a model for DR testing. MasterCard International Inc., for instance, has been honing its DR processes for 15 years and continues to refine them. The current testing plan calls for two major test exercises a year, in April and October. Each exercise tests up to 40 of what MasterCard classifies as its Tier 1 systems, which lets the company meet a corporate DR mandate of testing every Tier 1 system at least once a year.
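A mandate like this is easy to state and easy to lose track of. The sketch below is one way to check a year's test schedule against it; it is not MasterCard's tooling. The system names and the schedule are invented for illustration, and only the two exercises a year and the 40-system ceiling come from the article.

```python
# Illustrative sketch: check a year's DR exercises against an
# "every Tier 1 system at least once a year" mandate.
MAX_SYSTEMS_PER_EXERCISE = 40  # ceiling cited in the article


def untested_systems(tier1_systems, exercises):
    """Return Tier 1 systems that appear in no exercise this year."""
    tested = set()
    for systems in exercises.values():
        tested |= set(systems)
    return sorted(set(tier1_systems) - tested)


def oversized_exercises(exercises, limit=MAX_SYSTEMS_PER_EXERCISE):
    """Return the names of exercises that schedule more systems than the limit."""
    return [name for name, systems in exercises.items() if len(systems) > limit]


# Hypothetical inventory and schedule (names are made up).
tier1 = ["authorization", "clearing", "settlement"]
schedule = {
    "April": {"authorization", "clearing"},
    "October": {"clearing"},
}

print(untested_systems(tier1, schedule))   # ['settlement']
print(oversized_exercises(schedule))       # []
```

Any system the first function returns has gone a full year without a test, which is exactly the gap the mandate is meant to close.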

Disaster recovery testing costs
  • IT staff time
  • Business staff time
  • Activating the hot site
  • Travel and accommodations
  • Shipping backup tapes
  • Redirecting communications links
  • Remapping the databases to test business transactions

The tests start on a Tuesday and extend through Sunday, and typically involve 50 to 70 people at an alternate site 200-plus miles away. To get the maximum value from this effort, "we do a lot of training during the test," notes Randy Till, MasterCard's vice president, global business-continuity management. "We make sure key recovery people run the procedures. We push for end-to-end tests and we don't test just one application alone, but all the system interdependencies, too."

DR costs
Such mainframe-style DR testing is expensive, something only the largest companies, and those that require bulletproof DR, can afford. "Even small tests can cost $30,000 per test," reports Male. "Large tests can run $1 million a test." Direct costs include the time of the people involved, telecommunications costs, the cost of activating a hot site or another remote facility, and travel (see "Disaster recovery testing costs," this page).

Obviously, testing costs would seem trivial if you couldn't recover from a disaster in a timely way (see "Losses incurred in a disaster," this page). According to a recent Forrester Research study, companies with annual revenues of at least $1 million from an online business average losses of $8,000 per hour during a systems outage, which comes to $192,000 for each 24-hour period the site is down. An earlier study by Meta Group (now Gartner Inc.) found that unplanned downtime of critical systems could cost a large company as much as $1 million per hour due to lost revenue, reduced employee productivity and possible regulatory penalties. And the $1 million figure doesn't include the negative impact on the business' reputation. It's no wonder companies like MasterCard don't skimp on DR testing.
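The arithmetic behind those figures is simple enough to rerun with your own numbers. The sketch below uses only the dollar amounts quoted in this article; the function names are illustrative, and pairing a particular test cost with a particular hourly loss is an assumption made for the sake of comparison, not something either study did.

```python
# Illustrative sketch: compare the cost of a DR test with the cost of downtime.
def outage_cost(hourly_loss, hours_down):
    """Estimated direct loss for an outage lasting hours_down hours."""
    return hourly_loss * hours_down


def break_even_hours(test_cost, hourly_loss):
    """Hours of outage whose direct loss equals the cost of one test."""
    return test_cost / hourly_loss


FORRESTER_HOURLY_LOSS = 8_000     # online business, per the Forrester study
META_HOURLY_LOSS = 1_000_000      # large company, upper bound per Meta Group
SMALL_TEST_COST = 30_000          # "even small tests," per Male
LARGE_TEST_COST = 1_000_000       # "large tests," per Male

print(outage_cost(FORRESTER_HOURLY_LOSS, 24))                    # 192000
print(break_even_hours(SMALL_TEST_COST, FORRESTER_HOURLY_LOSS))  # 3.75
print(break_even_hours(LARGE_TEST_COST, META_HOURLY_LOSS))       # 1.0
```

On these assumptions, a small test pays for itself if it shortens a single outage by less than four hours, and a large test is matched by one hour of downtime at the Meta Group's upper estimate. Neither figure counts indirect losses such as reputation.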

Even organizations like Harvard University feel a compelling need for DR, although cost is a significant issue governing how well each of the university's various apps is protected. "We do primary infrastructure recovery for those applications willing to pay for it," says Ron Hawkins, senior technical architect at the Cambridge, MA-based institution. Not surprisingly, the only business units willing to sign up for Harvard's IT infrastructure recovery are those responsible for critical apps such as payroll, financial and data warehousing. "Everybody wants DR until they see the price," he notes.

Faced with widespread demand for less-costly DR, Hawkins has explored a variety of technologies, including VMware, snapshots and remote replication, in an effort to provide a less-expensive recovery service that would still be effective. He's also tried working with business units to juggle their recovery point objectives (RPOs) and recovery time objectives (RTOs) to come up with something less costly to implement, test and maintain (see Disaster recovery testing tips).
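An RPO caps how much recent data, measured as a window of time, a business unit can afford to lose; an RTO caps how long it can wait for service to return. The sketch below shows how the results of a single recovery drill might be scored against two sets of objectives. It is not Harvard's process: the class names, timestamps and hour values are all hypothetical.

```python
# Illustrative sketch: score one recovery drill against RPO and RTO targets.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


@dataclass
class DrillResult:
    last_good_copy: datetime    # newest backup, snapshot or replica recovered
    outage_start: datetime      # when the simulated disaster began
    service_restored: datetime  # when the application was usable again


def evaluate(objectives, drill):
    """Compare measured data loss and downtime with the stated objectives."""
    data_loss_window = drill.outage_start - drill.last_good_copy
    downtime = drill.service_restored - drill.outage_start
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Hypothetical drill: 11 hours of data lost, 30 hours to restore service.
drill = DrillResult(
    last_good_copy=datetime(2005, 10, 3, 22, 0),
    outage_start=datetime(2005, 10, 4, 9, 0),
    service_restored=datetime(2005, 10, 5, 15, 0),
)
strict = RecoveryObjectives(rpo=timedelta(hours=4), rto=timedelta(hours=24))
relaxed = RecoveryObjectives(rpo=timedelta(hours=24), rto=timedelta(hours=48))

for name, objectives in (("strict", strict), ("relaxed", relaxed)):
    outcome = evaluate(objectives, drill)
    print(name, outcome["rpo_met"], outcome["rto_met"])
```

The same drill fails the strict objectives and passes the relaxed ones. That is the trade-off being negotiated: looser objectives can be met with cheaper recovery methods, and tighter ones cannot.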

The only alternative Harvard has come up with to sending a team to its hot site for several days is a collocation facility located off campus. "They'll sell us rack space and we can replicate some of our critical infrastructure systems there," says Hawkins. These systems include the e-mail hub and DNS service, and perhaps a half-dozen small, but critical, utility services that have to remain accessible to apps outside of Harvard even if all systems go down in a disaster. By replicating them to the collocation facility, they can be recovered nearly instantaneously.

This was first published in October 2005
