This article can also be found in the Premium Editorial Download "Storage magazine: The hottest storage technology for 2007."
There's recovery, and then there's recovery
When problems progress far enough that we declare a disaster rather than just an operational problem, we have to consider what we're recovering. This is what often trips up IT folks: True disaster recovery should include much more than simply data recovery. In fact, true disaster recovery must include resumption of all business activities, including personnel and facilities, as well as technologies like storage, servers, applications and communications.
But in IT infrastructure, we can only concern ourselves with the technical elements of the disaster plan. As long as we communicate with the rest of the business that other areas aren't included, we can focus on doing our part. Even when restricted to applications and system infrastructure, we can quickly see that disaster recovery extends further than the traditional storage realm. Whole applications must be considered, which can include multiple elements on many different systems. And we must understand the requirements of the systems' users, something IT is often ill-prepared to do.
It can be helpful to consider business expectations when thinking about recovery requirements. Operational recovery will often require shorter recovery times or involve less data loss than a true disaster. Of course, people tend to be understanding when a large-scale disaster hits a company. Frequent operational outages will need to be addressed much more quickly and data loss
Another factor is the time required to actually recover data and restart systems. Even with a crate of backup tapes and sufficient bare systems, as would be provided at a contract disaster recovery site, recovery can take a surprisingly long time. Assume it will take up to 10 times as long to restore data from tape as it took to write it to the tape. This may seem excessive, but my own experience in disaster recovery tests has shown that tape-based data recovery is finicky to the extent that it can become nightmarish. Identifying tapes, locating the correct equipment, loading cartridges, indexing data and restoring can lead to numerous restarts, especially when an unfamiliar location is used.
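To make the rule of thumb above concrete, here is a minimal planning sketch. It assumes the only input we have is how long each backup took to write; the system names, backup windows and function name are hypothetical, and the factor of 10 is the pessimistic upper bound suggested above rather than a measured value.

```python
def worst_case_restore_hours(backup_hours: float, slowdown_factor: float = 10.0) -> float:
    """Upper-bound restore estimate: backup write time times a slowdown factor."""
    if backup_hours < 0:
        raise ValueError("backup_hours must not be negative")
    if slowdown_factor < 1:
        raise ValueError("slowdown_factor must be at least 1")
    return backup_hours * slowdown_factor


# Hypothetical backup windows in hours (illustrative values, not real measurements).
backup_windows = {"file_server": 6.0, "mail": 2.5, "erp_database": 4.0}

# Worst-case restore time per system, using the 10x planning factor.
estimates = {name: worst_case_restore_hours(hours) for name, hours in backup_windows.items()}

for name, hours in estimates.items():
    print(f"{name}: plan for up to {hours:.0f} hours to restore from tape")
```

An estimate like this is only a ceiling for planning purposes; timing an actual disaster recovery test is the way to replace the assumed factor with a real one.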
When preparing a true disaster recovery plan, we must be pragmatic about the prospects of a disaster happening as well as our real capabilities. If only a few applications can be restored in less than four hours, a few more in less than a day, and most others only after a week or more, the applications must be clearly classified so that those with the shortest recovery-time requirements can be dealt with first.
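One way to picture this classification is a simple tiering of applications by how quickly each must be back. The sketch below is illustrative only: the application names, their required recovery times and the tier labels are assumptions, and the four-hour and one-day boundaries are taken from the thresholds mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    required_recovery_hours: float  # how soon the business needs it back


def recovery_tier(app: Application) -> str:
    """Assign a tier using the four-hour and one-day boundaries."""
    if app.required_recovery_hours < 4:
        return "tier 1 (under four hours)"
    if app.required_recovery_hours < 24:
        return "tier 2 (under a day)"
    return "tier 3 (a day to a week or more)"


def recovery_order(apps: list[Application]) -> list[Application]:
    """Shortest recovery-time requirement first."""
    return sorted(apps, key=lambda app: app.required_recovery_hours)


# Hypothetical inventory; real values come from talking to the systems' users.
inventory = [
    Application("payroll", 168),
    Application("point_of_sale", 2),
    Application("email", 12),
]

for app in recovery_order(inventory):
    print(f"{app.name}: {recovery_tier(app)}")
```

The value of writing the list down this way is less the code than the conversation it forces: each required recovery time has to come from the business, not from IT's guess.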
Payroll is a good example of an application that might be somewhat tolerant of downtime in the event of a real disaster. Most payroll companies will, on request, duplicate the previous period's payments to ensure that employees get paid despite the IT disaster. Also, many payroll processors keep the data at their end, so a reinstall of the application is all that's needed to update records.
In fact, most companies have very few applications that truly demand recovery in less than a few hours in the event of a site-wide disaster. Point-of-sale, flight operations, fund trading and similar time-sensitive systems will probably need this type of reliability, but many can be protected in more creative ways, such as by using a wide-area database cluster as a recovery solution for critical systems. If these ultra-critical applications can be removed from the disaster recovery solution, there may not be as much to protect as first thought.
This was first published in December 2006