This article can also be found in the Premium Editorial Download "Storage magazine: Storage products of the year 2003."
Calculating recovery point objective
In this scenario, a particular system is backed up each night at 8:00 p.m., and the backup tape is sent offsite the following morning at 8:00 a.m. If a disaster resulting in loss of the site were to occur at 7:59 a.m., the previous night's tape would still be onsite and would be lost along with everything else. The only data available for recovery would be the backup from the night before that, which is already nearly 36 hours old at that point in time.
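The arithmetic behind that figure can be checked with a short sketch. It assumes the backup is a point-in-time copy taken at 8:00 p.m. and that each tape leaves the site at 8:00 the following morning; the function name `newest_offsite_backup` and the calendar date are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def newest_offsite_backup(disaster: datetime,
                          backup_hour: int = 20,
                          shipment_hour: int = 8) -> datetime:
    """Return the time of the newest backup whose tape was already offsite.

    Assumption: one backup per evening at backup_hour, and its tape is
    shipped at shipment_hour the next morning.
    """
    midnight = disaster.replace(hour=0, minute=0, second=0, microsecond=0)
    for days_back in range(8):
        backup = (midnight - timedelta(days=days_back)).replace(hour=backup_hour)
        # The tape from this backup leaves the site the morning after.
        shipped = (backup + timedelta(days=1)).replace(hour=shipment_hour)
        if shipped <= disaster:
            return backup
    raise ValueError("no offsite backup found within the last week")


# Hypothetical date; only the time of day (7:59 a.m.) comes from the scenario.
disaster = datetime(2004, 1, 15, 7, 59)
backup = newest_offsite_backup(disaster)
data_age = disaster - backup  # worst-case data loss for this schedule
```

With these inputs, `data_age` comes out one minute short of 36 hours. Moving the disaster to 8:01 a.m. drops it to just over 12 hours, which is why the worst case, not the typical case, is what defines the RPO.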
Aligning need and cost
To ensure an effective DR capability, you must understand your requirements, which are based on recovery point objective (RPO) and recovery time objective (RTO).
RPO is the worst-case data loss that's acceptable for a specific class of data (see "Calculating recovery point objective"). RTO is the time from the disaster to the resumption of business.
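To make the two definitions concrete, here is a minimal sketch of how a DR test result might be compared against its objectives. The class names, field names and the 24-hour and 48-hour targets are all hypothetical and are not taken from any real engagement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass
class Objectives:
    rpo: timedelta  # worst-case acceptable data loss
    rto: timedelta  # acceptable time from disaster to resumption of business


@dataclass
class RecoveryOutcome:
    disaster: datetime
    last_recoverable_copy: datetime  # newest data that survived the disaster
    business_resumed: datetime

    @property
    def data_loss(self) -> timedelta:
        # Everything written after the last recoverable copy is gone.
        return self.disaster - self.last_recoverable_copy

    @property
    def downtime(self) -> timedelta:
        return self.business_resumed - self.disaster


def meets_objectives(obj: Objectives, out: RecoveryOutcome) -> Tuple[bool, bool]:
    """Return (RPO met, RTO met) for one recovery outcome."""
    return out.data_loss <= obj.rpo, out.downtime <= obj.rto


# Hypothetical targets and a hypothetical test result.
targets = Objectives(rpo=timedelta(hours=24), rto=timedelta(hours=48))
result = RecoveryOutcome(
    disaster=datetime(2004, 1, 15, 7, 59),
    last_recoverable_copy=datetime(2004, 1, 13, 20, 0),
    business_resumed=datetime(2004, 1, 17, 7, 59),
)
rpo_met, rto_met = meets_objectives(targets, result)
```

In this made-up case the recovery finishes within the RTO but misses the RPO, because the newest surviving copy is almost 36 hours old. Tracking the two measures separately shows which one a given investment actually needs to improve.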
Some of the key trade-offs to consider are the availability of dedicated vs. allocated DR assets, online vs. tape recovery and the extent of automation in the recovery or failover process. A major cost element is the availability of assets, specifically standby servers on which to recover. Maintaining a hot site requires dedicating assets that are likely to have low utilization rates and therefore wouldn't be affordable in many situations. On the other hand, the lack of a functioning recovery site could increase RTO to such an extent that some companies could be mortally wounded by the time they recovered.
Many companies that have investigated building advanced DR infrastructures based on remote replication end up abandoning these plans because of the high recurring communications expense. Is it possible to build an effective DR strategy without replication? Where should the investment and focus be placed?
A recent GlassHouse Technologies engagement resulted in the recovery options shown in "RPO/RTO impact of potential DR solutions." The intent was to show potential RPO/RTO improvements based on an increasing level of technology investment. A variety of alternatives were being considered. Due to distance requirements, synchronous replication wasn't an option. Current DR tests showed a recovery capability of greater than eight days for both RTO and RPO because of problems in the backup infrastructure. Remediation steps would reduce the RTO/RPO to two days at a low cost. By weighing the incremental gains against the costs, the company developed a road map based on business requirements.
Process, process, process
While reductions in RPO are largely a technology investment, a significant improvement in RTO can be realized by having a well-executed DR process. Because this is usually a more difficult area to tackle, it often gets overlooked as people search for technology solutions. That's unfortunate because improvement in RPO by implementing technologies is measured in hours; gains in RTO can be measured in days.
In a true DR scenario, people are performing non-routine tasks, often in unfamiliar surroundings. There can be confusion over responsibilities and the sequence of recovery tasks. The potential for error is high.
For these reasons, you should invest as much in DR process as in technology. This doesn't mean just creating a DR planning document that sits on the shelf. It means developing a process that ensures that DR is considered in all IT planning, that recovery plans are reviewed and updated regularly, and that realistic testing is done on both a regular and an irregular basis.
This was first published in January 2004