Time to change your thinking about data protection
If you're adding disk to your backup mix, try this step-by-step approach for faster backups and recoveries.
For years, IT administrators approached data protection the same way: They backed up data to tape, crossed their fingers that the backup process worked and was complete (that everything that needed to be backed up was actually backed up), and then prayed they'd never be asked to do a restore.
It wasn't pretty, but it was all administrators had available to them at the time, so it became the accepted practice. Everyone knew about the potential problems of backing up to tape, but no one said much about it. Backup had become "IT's dirty little secret." And, more importantly, no one did anything differently.
Today, we're more aware of the need to find and recover data quickly. The reason for our heightened awareness may be due to growing governmental and corporate scrutiny; the maturing of disk-based backup; the availability of higher density, lower cost disk drives in storage systems; the increasing availability of recovery-focused products; or the ongoing drain of rising data volumes. The reason doesn't matter; what's important is that a decisive shift has occurred and the industry no longer thinks about data protection in pure backup terms.
After all, what good is backing up data if you can't restore it when you need to? It's a question whose answer goes without saying, but given the history of the data protection market, it's one the Enterprise Strategy Group (ESG) believes organizations should continually ask themselves, especially as their business objectives change.
The bottom line is that it's no longer a question of whether data can be restored, but how quickly it can be recovered and how much data loss an organization can tolerate. It's about making sure that recovery time objectives and recovery point objectives (RTOs/RPOs) match the value of data at any given point in the data lifecycle. It's also about being "recovery minded."
Disk-based data protection environment
ESG sees being recovery minded as a four-step process:
- Think in terms of "3DR," the ESG framework that defines three levels of data protection.
- Build a data protection ecosystem, or continuum, within the 3DR construct.
- Grow that ecosystem over time.
- Drive additional value by layering in technologies such as continuous data protection (CDP), data deduplication and snapshots over time when it's clear they'll add value to your environment.
Tape's vanishing act
Tape replacement is happening now...
- 18% of organizations have already permanently replaced tape with disk (and that number is rising quickly)
- 58% would consider permanently replacing tape with disk
- Of the 58% who would consider replacing tape libraries, 80% believe they'll do so within 24 months
... and in a big way
- 50TB to 99TB is the mean capacity migrated to disk by early disk-based backup adopters
- 26% of disk-based backup adopters have migrated more than 100TB
- The majority of users say that 40% of their current tape-based capacity will be migrated to disk three years from now
Source: ESG Research, March 2005
ESG's 3DR concept describes three levels of data protection. 1DR is "data recovery," 2DR is "disaster recovery" and 3DR is "doomsday recovery" (see "Storage Bin: Introducing 3DR," Storage, June 2006).
The 3DR model is built on two basic premises: All recoveries should be from disk, and tape should be reserved for long-term and offsite archival. Let's take a closer look at each of the three DR levels.
1DR: This is the primary data protection tier. In an ideal world, this tier is 100% disk based, although it's likely to be a blend of both disk and tape. Over time, we expect (and advise) organizations to create an all-disk-based 1DR tier and to repurpose existing tape resources for long-term or offsite archival purposes (see "Tape's vanishing act").
The 1DR tier is all about recovery. It's about making sure that the data protection environment meets defined RPOs and RTOs for data over its changing lifecycle. It's also about improving backup performance and reducing management issues. Typical disk targets for this tier include virtual tape libraries (VTLs), "dumb" disk, NAS, content-addressed storage, midrange arrays and possibly even high-end disk systems.
2DR: This disaster recovery tier is also disk based. Its objective is to make a replica of 1DR data and keep it offsite for disaster recovery purposes. This replica of the data could be created using expensive and complex storage system-based, remote-mirroring solutions; less-expensive, host-based replication solutions; or even some type of disk-to-disk-to-disk scenario in which data is backed up to disk (an array, NAS device or VTL) and then moved offsite over the WAN to another disk system.
3DR: While 1DR and 2DR are the disk-based tiers in this scheme, 3DR is tape based. This is the end of the line, so to speak. The idea is to keep a finite number of tape copies for the worst-case scenario--when your 1DR and 2DR copies are destroyed or otherwise made unavailable.
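The way the three tiers back each other up can be sketched in a few lines of Python. This is only an illustration of the fallback order described above, not an ESG tool: the function name and the idea of passing in a set of "available" tiers are assumptions made for the example.

```python
# Tiers in the order a recovery would prefer them: local disk first,
# the offsite disk replica second, tape only as the last resort.
TIERS = ("1DR", "2DR", "3DR")


def recovery_source(available: set) -> str:
    """Return the first tier that still holds a usable copy (hypothetical helper)."""
    for tier in TIERS:
        if tier in available:
            return tier
    raise LookupError("no recoverable copy in any tier")


# Example: the primary site is lost, so 1DR is gone but 2DR and 3DR survive.
print(recovery_source({"2DR", "3DR"}))
```

With 1DR unavailable, the sketch falls through to the offsite disk replica; tape is consulted only when both disk tiers are gone.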
Build a data protection ecosystem
The next step is to build a data protection ecosystem, or continuum, that spans the 3DR framework. This is achieved by creating classes of data based on the value of the data to the organization at any given point in time and then creating policies for these data types based on the age of the data (or its "currency"), its frequency of access and so forth. Once you've done that, you can match tools (hardware and software) and services accordingly.
Classifying data is a difficult, but necessary, task. It's one that will become increasingly important over time as data volumes increase, corporate governance and regulatory compliance guidelines become stricter, and organizations look to differentiate themselves from their competitors by fully leveraging the data they generate.
Over the past year or so, a new category of management tools has emerged which, among other things, can help companies categorize their data (usually at creation) into information groups to which policies and rules can then be applied. ESG refers to this category as "intelligent information management." The key point is to make sure the right data protection tools are applied to the right data at the right time.
This type of data categorization or classification can have several important benefits, and can help organizations achieve the following:
- Better meet service-level agreements by making sure RPO and RTO objectives are not only aligned with the criticality (or value) of the data at hand, but are also achievable.
- Keep data protection costs down by ensuring that critical and non-mission-critical data isn't treated equally.
- Avoid or minimize potentially stiff regulatory penalties if data isn't recoverable when it needs to be.
- In general, provide insight into data repositories, which could lead to new business strategies, etc., that go beyond data protection.
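As a rough sketch of what classifying data and attaching policies might look like, the Python below assigns a data class from age, access frequency and business criticality, then looks up a policy for that class. Every class name, threshold, objective and target in it is a hypothetical value chosen for illustration; none comes from ESG's research or from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    data_class: str
    rpo_hours: float  # maximum tolerable data loss
    rto_hours: float  # maximum tolerable time to recover
    target: str       # where this class of data is protected


# Hypothetical policies: illustrative numbers only.
POLICIES = {
    "critical": Policy("critical", 1, 1, "1DR disk + 2DR replica"),
    "active": Policy("active", 12, 4, "1DR disk"),
    "aging": Policy("aging", 24, 24, "3DR tape"),
}


def classify(age_days: int, accesses_per_month: int, business_critical: bool) -> str:
    """Assign a data class from age ("currency"), access frequency and value."""
    if business_critical:
        return "critical"
    if age_days <= 90 or accesses_per_month >= 10:
        return "active"
    return "aging"


def policy_for(age_days: int, accesses_per_month: int,
               business_critical: bool = False) -> Policy:
    return POLICIES[classify(age_days, accesses_per_month, business_critical)]


# A year-old file that nobody opens any more lands in the "aging" class.
print(policy_for(age_days=400, accesses_per_month=0))
```

The point of keeping classification and policy separate is that the thresholds can change as business objectives change, without touching the tools each class maps to.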
Build as you grow
In an ideal world, organizations would implement all three of these DR levels from the get-go, but that's generally not practical from an end-user standpoint. There are many considerations, including budgeting and investment protection of existing tape-based infrastructures.
The good thing is that because 3DR is an adaptive and flexible framework, organizations don't have to rip out existing tape-based infrastructures to realize immediate benefits. In fact, many 1DR technologies, such as VTLs, complement tape-based infrastructures nicely. They leverage existing backup and recovery applications and, in many cases, backup processes. VTLs and other disk-based backup technologies are great backup targets for many, but not all, data types. Of course, this assumes the organization has categorized its data into data classes or information groups as described earlier. These technologies are also becoming more affordable thanks to data deduplication, which can significantly reduce the backup capacity footprint (and disk requirements).
As an example of this build-as-you-grow strategy, we've seen organizations insert 1DR technologies like VTLs into their existing backup environments and then change their backup policies so they're backing up daily to disk and monthly to tape, rather than daily to tape. The change is nondisruptive and has immediate benefits, including better backup performance, which means fewer backup window-induced headaches and faster, more granular recoveries (i.e., improved RTO and RPO).
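The policy change in that example (daily to disk, monthly to tape) is simple enough to express as a small Python sketch. The function name and the choice of the first of the month as the tape day are assumptions for illustration; a real backup application would express this in its own scheduling configuration.

```python
from datetime import date


def backup_targets(day: date, tape_day_of_month: int = 1) -> list:
    """Return where the backup for a given day should land.

    Every day goes to disk (the 1DR tier); one day a month, assumed here
    to be the first, also goes to tape.
    """
    targets = ["disk"]
    if day.day == tape_day_of_month:
        targets.append("tape")
    return targets


print(backup_targets(date(2006, 6, 1)))   # first of the month
print(backup_targets(date(2006, 6, 15)))  # any other day
```

Under this schedule, tape is written 12 times a year instead of every day, while every recovery within the month can come from disk.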
A good next step would be to add a 2DR component--that is, some type of remote replication capability for disaster recovery purposes.
Drive additional value
Once you've introduced some type of 1DR component and, possibly, a 2DR component, the next step is to drive additional value from your data protection environment by layering in technologies such as CDP, data deduplication, snapshots and compression. The idea is to leverage these technologies to achieve specific recovery, capacity or security objectives. For example, if your objective is to improve RTO and RPO, you could introduce some type of CDP or snapshot capability. If you want to reduce capacity requirements, you could try data deduplication or compression (you should also consider data deduplication when evaluating 1DR technologies). If your objective is to add a disaster recovery component, you should consider remote mirroring or replication.
Data deduplication, with its ability to reduce the backup capacity footprint by up to 25 times or more, can have a significant impact on both 1DR and 2DR processes. The benefits include:
- Lower backup-related costs. Simply put, more organizations are likely to implement disk-based products if they're more affordable. In fact, ESG's research shows that the cost of a disk-based solution is the No. 1 reason organizations don't implement these types of products. In the case of 1DR, data deduplication can lower total disk costs significantly; in some cases, to less than that of a similarly sized tape library.
- Longer retention periods. Organizations can keep data on 1DR technologies longer. Data deduplication frees up lots of disk space, which can be used to protect other data types that are still backed up to tape or to enable longer retention periods for 1DR data.
- Reduced WAN traffic. Less data to back up means less data to be moved over the WAN during the 2DR process. For some organizations, data deduplication may be the difference between replicating (or not replicating) data for disaster recovery purposes. For those firms already replicating data, it means a significant reduction in WAN traffic and WAN-related costs.
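The arithmetic behind these benefits is easy to check. The Python sketch below estimates the disk capacity left after deduplication and the time needed to move that data over a WAN link. The 25:1 ratio is the article's best-case figure; the 100TB data set and the 100Mbps link are assumed values for illustration, and actual ratios and usable bandwidth vary widely by environment.

```python
def physical_tb(logical_tb: float, dedup_ratio: float) -> float:
    """Disk capacity needed after deduplication (ratio of 25 means 25:1)."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return logical_tb / dedup_ratio


def wan_hours(data_tb: float, link_mbps: float) -> float:
    """Hours to move data over a link, ignoring protocol overhead (decimal TB)."""
    bits = data_tb * 1e12 * 8
    return bits / (link_mbps * 1e6) / 3600


logical = 100.0  # TB of backup data (assumed)
stored = physical_tb(logical, 25)

print(f"stored after dedup: {stored:.1f} TB")
print(f"replicate full set: {wan_hours(logical, 100):.0f} hours")
print(f"replicate deduped:  {wan_hours(stored, 100):.0f} hours")
```

At these assumed figures, 100TB shrinks to 4TB on disk, and the replication time drops by the same factor of 25, which is why deduplication can decide whether replicating for disaster recovery is feasible at all.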
Thirty-one percent of organizations say they'll experience significant revenue loss or another adverse business impact within one hour or less of application downtime. When you factor in traditional tape-based backup and recovery processes with RPOs averaging 12 hours or longer, and RTOs ranging from four to 24 hours, it's no wonder organizations are rethinking their data protection strategies.
If you've permanently replaced your tape-based infrastructure with some type of 1DR technology, kudos. If you haven't, now is the time. Start big or start small; it doesn't matter. Just do something. Thinking in terms of 3DR will help you get your feet wet.