This month's column continues the four-part mini-tutorial describing the activities involved in disaster recovery planning. Here, we dig into the second phase of the three-phase planning model introduced last time: the design phase.
Most of the tasks in the design phase (see the graphic below) involve developing strategies and logistics for restoring infrastructure and recovering the data required by mission-critical applications in the wake of an interruption event. Each task requires you to weigh multiple alternatives so that you can arrive at the most effective strategy within the constraints of your budget.
If a strategy entails a high cost, expect to have to justify the expense to senior management. Thus, wherever possible, seek strategies that offer dual-use value: strategies that not only cope with disaster exposures but also deliver value to the organization in its day-to-day operations.
For example, consolidating storage into a fabric to share a tape library may afford better data protection than what is currently being done, but it is a pricey fix. Discover what other value might accrue from the new topology, such as server consolidation, reduced software licensing costs or a longer useful life for legacy tape silos. The more robust the business case, the more likely your strategy is to get the nod from senior management.
People often ask me which of the design tasks should be given priority if you are doing a DR plan on a budget. The answer is two-fold: Disaster avoidance systems selection and data recovery strategy development.
The disaster avoidance system task focuses on the deployment of technologies that help to identify conditions or events that could develop into disasters so they can be dealt with before causing interruptions. Avoidance systems (including management software, security controls, fire detection and suppression systems, etc.) not only prevent avoidable disasters, they may also help to save the lives of personnel.
Data recovery planning is the other task I consider to be most important. Unlike networks, systems and user work areas, which avail themselves of strategies based either on redundancy or replacement, data protection is effectively limited to a single strategy: redundancy. To be safeguarded from loss or corruption, data must be replicated.
The approach selected for data redundancy typically falls somewhere on a spectrum between disk-to-tape (backup) and disk-to-disk (mirroring). The storage industry once presented these two alternatives as an either/or choice, but a growing number of options now sit between them. The figure below provides a current spectrum of data protection solutions organized by objective (time-to-backup versus time-to-restore).
In this spectrum, you can see the broad range of solutions that the industry has been bringing to the table to address the issues of time and cost in data protection.
To shorten backup times, various software features have been introduced, such as support for incremental backups, hot backups, inode snapshots, electronic vaulting and disk-to-disk functionality that emulates tape. Conversely, other technologies have been suggested to reduce the price tag of multi-hop, disk-to-disk mirroring. Examples include disk-to-disk replication, commonality factoring from Avamar, software- and hardware-based multi-targeting, and network caching.
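To make the idea of an incremental backup concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular backup product works: the function name `incremental_backup` and its parameters are hypothetical, and it assumes that a file's modification time is a good enough signal that the file has changed since the last backup.

```python
import shutil
from pathlib import Path
from typing import List


def incremental_backup(source: Path, target: Path, last_backup_time: float) -> List[Path]:
    """Copy only the files under `source` modified after `last_backup_time`.

    `last_backup_time` is a POSIX timestamp (seconds since the epoch).
    Returns the relative paths of the files that were copied.
    Assumption: modification time alone identifies changed files.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        # Skip directories and any file untouched since the last backup.
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            relative = path.relative_to(source)
            destination = target / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            # copy2 preserves timestamps along with the file contents.
            shutil.copy2(path, destination)
            copied.append(relative)
    return copied
```

Because unchanged files are skipped, the backup window shrinks roughly in proportion to the amount of data that actually changed. The trade-off is on the restore side: recovery requires the last full backup plus every incremental taken since, which is one reason time-to-backup and time-to-restore are treated as separate objectives.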
So, there are many alternatives to fit many environments and budgets, and a lot of options for planners to consider.
How important is data protection in disaster recovery planning? Increasingly, you'll see the metric of "time-to-data" used in DR planning. It measures the amount of time required after a disaster to restore data access for mission-critical applications. Time is money, so the longer the time-to-data, the greater the cost that accrues to a disaster.
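The relationship between time-to-data and cost can be sketched with some simple arithmetic. The Python below is a rough illustration under stated assumptions: the class `RecoveryOption`, the function `outage_cost`, and every figure in the example are hypothetical, and the model assumes that the cost of downtime accrues at a constant hourly rate, which real outages rarely do.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    time_to_data_hours: float  # hypothetical hours needed to restore data access
    annual_cost: float         # hypothetical yearly cost of the solution


def outage_cost(option: RecoveryOption, downtime_cost_per_hour: float) -> float:
    """Estimated cost of one outage, assuming a constant hourly downtime cost."""
    return option.time_to_data_hours * downtime_cost_per_hour


# Illustrative figures only; substitute numbers from your own impact analysis.
options = [
    RecoveryOption("tape restore", time_to_data_hours=48, annual_cost=20_000),
    RecoveryOption("disk-to-disk mirroring", time_to_data_hours=2, annual_cost=150_000),
]

for option in options:
    cost = outage_cost(option, downtime_cost_per_hour=10_000)
    print(f"{option.name}: outage cost {cost:,.0f}, annual cost {option.annual_cost:,.0f}")
```

Setting the estimated outage cost beside the annual cost of each option is one way to frame the business case for senior management: a more expensive solution may be justified when it cuts time-to-data enough to offset its price.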
If companies have limited funds for DR planning, they are best spent on a combination of disaster avoidance and data protection.
This is not to minimize the importance of planning for application (and host platform), network or end user recovery. The more logistics that you can preplan in these important dimensions of DR, the better.
However, many companies have found that, even when carefully defined system, network and user work area recovery schemes fall prey to the unplanned consequences of a disaster event, recovery of these elements can often be accomplished "on the fly." Data, however, cannot be replaced if it has not been duplicated in advance of a disaster.
After all of the strategies have been designed and documented, the final phase (phase 3) of the planning project involves:
- The creation of recovery teams and their training
- Testing of the plan
- Implementation of a change management process
On the final point, a change management process captures test results and provides a "360-degree feedback loop" to the data store created earlier. This allows the plan to be revised iteratively and kept up to date with changes in the business and in its IT infrastructure.
We will look at this phase in greater detail in next month's column.
About the author: Jon William Toigo has authored hundreds of articles on storage and technology along with his monthly SearchStorage.com "Toigo's Take on Storage" expert column and backup/recovery feature. He is also a frequent site contributor on the subjects of storage management, disaster recovery and enterprise storage. Toigo has authored a number of storage books, including "Disaster recovery planning: Preparing for the unthinkable, 3/e".