Effective data backup and recovery depends on two important metrics: recovery time objective and recovery point objective. Both metrics are essential when developing data backup and recovery plans, as well as traditional business continuity and technology disaster recovery plans.
This article examines each metric, its role in those plans, how values are assigned, the cost implications and how to build both metrics into a variety of resilience plans.
What is RTO?
Recovery time objectives (RTOs) measure the amount of time from the occurrence of a disruptive event to when the affected resources must be fully operational and ready to support the organization's objectives. Figure 1 depicts the RTO metric.
When a resource is disrupted, several actions might be needed, e.g., replacing damaged components, reprogramming and testing, before the resource can be placed back in service and business as usual (BAU) can return. An inverse relationship exists between the time allowed for recovery and the cost needed to support recovery. Specifically, the shorter the RTO, the higher the cost of recovery, and vice versa. Therefore, it's very important to have business unit leaders involved when determining RTO values. They might want a 30-minute recovery, for example, as the target time, but the cost to achieve that goal might be prohibitive.
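As a minimal sketch of how an RTO can be checked after an incident, the following compares the time between disruption and restoration with the target. The function name `meets_rto` and the incident timestamps are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def meets_rto(disrupted_at: datetime, restored_at: datetime, rto: timedelta) -> bool:
    """Return True if the resource was back in service within the RTO."""
    if restored_at < disrupted_at:
        raise ValueError("restored_at must not be earlier than disrupted_at")
    return restored_at - disrupted_at <= rto


# Hypothetical incident: outage begins at 09:00, service is restored at 09:45.
disrupted = datetime(2024, 1, 15, 9, 0)
restored = datetime(2024, 1, 15, 9, 45)

print(meets_rto(disrupted, restored, timedelta(hours=1)))     # True: 45 min <= 60 min
print(meets_rto(disrupted, restored, timedelta(minutes=30)))  # False: 45 min > 30 min
```

The same 45-minute recovery satisfies a one-hour RTO but misses the 30-minute target mentioned above, which is why the target has to be agreed on before the recovery approach is chosen.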
What is RPO?
Recovery point objective (RPO) is especially important when it comes to data backup and recovery activities. It defines the maximum acceptable age of the most recent backup when a disruption occurs, which is also the maximum amount of data, measured in time, that the organization can afford to lose. Organizations -- e.g., banks, credit card firms -- that conduct many transactions over the course of a day will probably need backups to occur more frequently, almost in real time, so the most current critical data is available for future transactions. In other words, data must not age very much between backups. Figure 2 depicts the RPO and its relationship to the RTO.
Again, we see an inverse relationship between the RPO value and the cost to achieve it. A very short RPO, e.g., 10 to 30 seconds, means that data must be backed up very frequently, necessitating the use of high-speed backup technologies such as data mirroring or replication, especially if backups are stored off site in a cloud or other arrangement. Add to that the network bandwidth needed to transmit large quantities of data, and the cost can be significant to achieve the required data availability.
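The link between RPO, backup frequency and bandwidth can be illustrated with some simple arithmetic. The sketch below assumes a simplified model in which the worst-case data loss is the backup interval plus the time needed to move the backup off site; the function names and all figures are hypothetical.

```python
def worst_case_data_loss_s(backup_interval_s: float, transfer_time_s: float = 0.0) -> float:
    """Simplified model: worst-case age, in seconds, of the newest recoverable data."""
    return backup_interval_s + transfer_time_s


def meets_rpo(backup_interval_s: float, rpo_s: float, transfer_time_s: float = 0.0) -> bool:
    """Return True if the backup schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss_s(backup_interval_s, transfer_time_s) <= rpo_s


def required_bandwidth_mbps(changed_bytes: float, transfer_window_s: float) -> float:
    """Average megabits per second needed to ship the changed data within the window."""
    return changed_bytes * 8 / transfer_window_s / 1_000_000


# A nightly backup (every 86,400 seconds) cannot satisfy a 30-second RPO.
print(meets_rpo(backup_interval_s=86_400, rpo_s=30))                    # False

# Replication every 10 seconds with a 5-second transfer stays within 30 seconds.
print(meets_rpo(backup_interval_s=10, rpo_s=30, transfer_time_s=5))     # True

# Hypothetical load: 500 MB of changes every 10 seconds, shipped within 10 seconds.
print(required_bandwidth_mbps(500_000_000, 10))                         # 400.0
```

Real replication products behave in more complex ways, but the arithmetic shows why a very short RPO drives both backup frequency and network cost.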
Core similarities and differences
Both metrics are important elements used in data backup and data recovery plans. Ideally, both should be key backup and recovery features to ensure that critical data and systems are available when needed, especially in the aftermath of a disruptive event.
Aside from their shared use in recovery plans, the two metrics are quite different in practice. An RTO looks forward from a disruptive event: It sets how soon after the event the affected resources must be restored. An RPO looks backward from the event: It sets how far before the event the most recent recoverable backup can be and, therefore, how much data can be lost. However, when the two are linked, a short RTO usually calls for a similarly short RPO, particularly when data protection is the requirement. If we are considering the backup and recovery of systems only, an RTO value might be sufficient to determine how recovery will take place. However, if the system to be recovered also processes critical data, then both metrics should be synchronized.
Computing RPO and RTO
A business impact analysis (BIA) is designed to identify relevant RTO and RPO values. Risk analyses can also provide valuable input for assigning values to these metrics. BIAs identify mission-critical business processes and the technologies, people and facilities needed to ensure BAU. They might also identify the financial implications -- e.g., loss of revenue, imposition of fines -- caused by the disruption.
Based on input from business unit leaders and senior management, numeric values are defined that represent the best-case scenarios for recovering from disruptions from a business perspective. No mathematical formulae exist to compute RTO/RPO values. They are strictly numeric time values. For example, an RTO for a fairly critical server might be one hour, whereas the RPO for less-than-critical data transaction files might be 24 hours, a target loose enough that backup tape storage equipment might be sufficient.
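Because the values are plain time targets rather than computed results, they can be recorded per resource in a simple structure. The sketch below uses the two example values from the preceding paragraph; the `RecoveryTarget` class and `describe` helper are hypothetical names, not part of any standard.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RecoveryTarget:
    """RTO/RPO targets agreed on for one resource; either value may be unset."""
    resource: str
    rto: Optional[timedelta] = None
    rpo: Optional[timedelta] = None


def _hours(value: timedelta) -> str:
    """Format a timedelta as a compact number of hours, e.g. '1 h' or '24 h'."""
    return f"{value.total_seconds() / 3600:g} h"


def describe(target: RecoveryTarget) -> str:
    """Build a one-line summary of the targets set for a resource."""
    parts = []
    if target.rto is not None:
        parts.append(f"RTO {_hours(target.rto)}")
    if target.rpo is not None:
        parts.append(f"RPO {_hours(target.rpo)}")
    return f"{target.resource}: " + (", ".join(parts) or "no targets set")


targets = [
    RecoveryTarget("fairly critical server", rto=timedelta(hours=1)),
    RecoveryTarget("less-than-critical transaction files", rpo=timedelta(hours=24)),
]

for target in targets:
    print(describe(target))
```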
As mentioned earlier, as RTO/RPO numeric values decrease, costs to achieve those metrics are likely to increase. The only way to determine the true cost is to first identify the desired RTO/RPO values, then conduct research to determine what is needed to achieve the metric if a disruption occurs. It might then be necessary to advise business unit leaders and senior management of the added investment.
This is where potential conflicts might occur. If management doesn't want to spend additional funds to achieve the metrics they specified, they must understand that the organization might face additional risk if a disruptive event occurs. Ideally, management should be made aware of the potential financial issues and other implications from an event -- e.g., damage to reputation -- before they decide.
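The research step described above, in which the desired values are identified first and the cost is determined afterward, can be sketched as a simple filter over candidate approaches. The option names, achievable RTO/RPO figures and annual costs below are entirely hypothetical and serve only to show the shape of the comparison.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupOption:
    """A candidate approach with the RTO/RPO it can achieve and its yearly cost."""
    name: str
    rto_hours: float
    rpo_hours: float
    annual_cost: float


def cheapest_option(
    options: List[BackupOption], target_rto_hours: float, target_rpo_hours: float
) -> Optional[BackupOption]:
    """Return the lowest-cost option that meets both targets, or None if none does."""
    qualifying = [
        o for o in options
        if o.rto_hours <= target_rto_hours and o.rpo_hours <= target_rpo_hours
    ]
    return min(qualifying, key=lambda o: o.annual_cost, default=None)


# Hypothetical candidates; real figures would come from vendor research.
options = [
    BackupOption("nightly tape backup", rto_hours=48, rpo_hours=24, annual_cost=10_000),
    BackupOption("hourly cloud snapshots", rto_hours=4, rpo_hours=1, annual_cost=40_000),
    BackupOption("continuous replication", rto_hours=0.5, rpo_hours=0.01, annual_cost=150_000),
]

choice = cheapest_option(options, target_rto_hours=4, target_rpo_hours=1)
print(choice.name if choice else "no option meets the targets")  # hourly cloud snapshots
```

Tightening the targets to a 30-minute RTO leaves only the most expensive option in this made-up list, which is the kind of tradeoff management needs to see before deciding.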
Building RTO/RPO into data backup and recovery plans
The inclusion of RTO/RPO metrics in data backup, data recovery and other resilience -- e.g., BC/DR -- plans is essential, and ensures that the procedures, personnel and technology resources used to achieve the metrics are appropriate. RTO/RPO values can be included in plans for reference and as an indication of where the recovery bar has been set.
For data backup and recovery, these metrics are essential for planning, as they help determine the optimum data backup and technology configuration to achieve the goals. They are also important from compliance and audit perspectives, as auditors might look for evidence of these values as key data backup/recovery controls.