Analysts regard the cost of data protection as a large component of annualized storage costs. Since 2005, they have noted that roughly half of the disk-based storage infrastructure companies deploy is used to hold copies of the data stored on the other half. This partly reflects the growing preference for disk, rather than tape, as a repository for data backups. However, recent studies suggest that a large percentage of the data replication happening today serves to facilitate the "template cut and pasting" or "on-the-fly re-hosting" of virtual machines in virtual server environments.
In many shops pursuing aggressive server virtualization strategies, existing storage fabrics (SANs) are being replaced by direct-attached storage (DAS), meaning storage kits cabled directly to each virtual server host, in an effort to circumvent the complexity of routing virtual machine (VM) I/O to SAN storage shares. So that VMs can move, or fail over, to other server hosts, their data must be replicated on an ongoing basis to each remote server's DAS array.
The need to replicate a VM's data to the storage of every potential server host has sharply accelerated storage demand, ostensibly for high availability and continuous operations. Leading analysts put the resulting growth in capacity demand at between 300% and 650% per year. Under these circumstances, the idea that any reasonable storage efficiency can be realized is absurd.
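The storage arithmetic behind this fan-out can be sketched in a few lines of Python. The sketch assumes, for illustration only, that each VM's data is copied in full to the DAS of every other host it might run on, with no deduplication or compression; the function name `replica_footprint` and the 10 TB figure are hypothetical.

```python
def replica_footprint(primary_tb: float, replica_hosts: int) -> tuple[float, float]:
    """Return (total capacity in TB, fraction of capacity holding copies).

    Assumes every VM's data is fully copied to the DAS of each of
    `replica_hosts` other servers, with no deduplication or compression.
    """
    if primary_tb < 0 or replica_hosts < 0:
        raise ValueError("inputs must be non-negative")
    total = primary_tb * (1 + replica_hosts)          # primary plus one copy per host
    copy_fraction = replica_hosts / (1 + replica_hosts)
    return total, copy_fraction

# Hypothetical 10 TB of primary VM data, replicated to 1, 3 or 7 potential hosts.
for hosts in (1, 3, 7):
    total, frac = replica_footprint(10.0, hosts)
    print(f"{hosts} replica host(s): {total:.0f} TB total, {frac:.0%} copies")
```

With a single replica host, copies account for half of all capacity, the same ratio analysts reported; each additional potential host raises the copy share while the primary data stays the same size.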
The simple fact is that not all applications are mission-critical, requiring high availability or "recovery point zero" (no data loss) protection for their associated data assets. Typically, fewer than 10% of all applications require nonstop continuity strategies. Most applications can tolerate hours, or even days, of access interruption, so their data can be protected adequately by less expensive, less storage-intensive methods.
Tape is still the protection medium for up to 80% of the world's data. Restoring from tape to disk may be slow, at between 1 TB and 2 TB per hour, but it is adequate for most applications and their data today. Plus, at roughly 10 cents to 20 cents per TB per tape vs. $100 per TB per SATA disk, it's much more affordable.
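To put the quoted restore rates in perspective, a rough estimate is simply data size divided by sustained throughput. This minimal sketch treats the 1 TB/h and 2 TB/h figures as sustained rates and ignores tape mount, locate and verification overhead, so it yields a lower bound; the function name and the 12 TB data set are hypothetical.

```python
def restore_hours(data_tb: float, rate_tb_per_hour: float) -> float:
    """Hours to stream `data_tb` back from tape at a sustained rate.

    Ignores mount, locate and verification overhead, so real
    restores will take at least this long.
    """
    if rate_tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_tb / rate_tb_per_hour

# Bracket a hypothetical 12 TB application data set with the
# 1 TB/h and 2 TB/h rates quoted in the text.
best = restore_hours(12, 2.0)
worst = restore_hours(12, 1.0)
print(f"12 TB restore: {best:.0f} to {worst:.0f} hours")
```

For that data set the estimate runs from 6 to 12 hours, which suits an application that can tolerate a day of interruption but not one that must stay up continuously.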
Conversely, disk-to-disk mirroring (within a 70 km range, for synchronous writes) and disk-to-disk replication (beyond 70 km, for asynchronous writes) provide data replication services better suited to critical, always-on applications and their data. These services carry a significant data protection cost for robust networks, duplicate hardware and multiple sites, but that cost may be justified by the per-hour cost of the outage an interruption would cause.
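The distance rule and the cost justification above can be expressed as two small checks. This is a simplified sketch, not a vendor sizing method: it uses the 70 km figure from the text as a hard cutoff, assumes the always-on service would eliminate the expected outage hours entirely, and treats all dollar figures and function names as hypothetical.

```python
SYNC_LIMIT_KM = 70.0  # cutoff taken from the text; in practice latency budgets set the limit

def replication_mode(distance_km: float) -> str:
    """Synchronous mirroring within the cutoff, asynchronous replication beyond it."""
    if distance_km <= SYNC_LIMIT_KM:
        return "synchronous mirror"
    return "asynchronous replication"

def mirroring_justified(annual_protection_cost: float,
                        expected_outage_hours_per_year: float,
                        outage_cost_per_hour: float) -> bool:
    """True if the outage losses the service would avoid meet or exceed its cost.

    Simplification: assumes the service avoids all expected outage hours.
    """
    avoided_loss = expected_outage_hours_per_year * outage_cost_per_hour
    return avoided_loss >= annual_protection_cost

# Hypothetical figures: $250,000 a year for mirroring, 8 expected outage hours a year.
print(replication_mode(40), "/", replication_mode(200))
print(mirroring_justified(250_000, 8, 50_000))  # $400,000 avoided loss
print(mirroring_justified(250_000, 8, 10_000))  # $80,000 avoided loss
```

At $50,000 per outage hour the mirror pays for itself; at $10,000 per hour the same workload becomes a candidate for cheaper protection.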
The trick is to match the right kind of data protection to the right application workload and data, based on criticality and on outage tolerance or cost. An efficient storage practice must give this requirement real attention.
The three keys to proper data protection service allocation are straightforward:
1. Know your applications and data characteristics. Applications derive their criticality from the business processes they serve; data derives its criticality from the applications it serves. You need to do the heavy lifting of business impact analysis to establish how critical your apps and data are, so you can determine what kind of protection the data requires.
2. Find some means to classify each data protection service so it can be applied efficiently to specific application workloads. You need to understand the protection each service affords in terms of how quickly it restores data access. Some use metrics such as recovery time objective (RTO) and recovery point objective (RPO) to characterize the fit of each data protection service, but a simpler metric is time-to-data: the total time it takes to recover data, re-host applications and reconnect access networks.
3. Find a mechanism that will associate a data protection service with a workload. There are many ways to do this. If you store a copy of data in a virtual tape library for fast restore of individual files that may be corrupted or deleted in the primary storage environment, that library gives you a location where services can be applied to data. Tributary Systems has done some groundbreaking work on this model. Alternatively, if you virtualize your storage, you can present applications with virtual volumes that are already fitted with associated data protection services. Data from critical applications can be written to virtual volumes that provide always-on synchronous replication, while data from less critical apps can go to volumes that rely on tape backup.
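The three steps can be sketched together as a small allocation routine: record each workload's outage tolerance (step 1), characterize each service by its time-to-data (step 2), and assign each workload the cheapest service whose time-to-data fits (step 3). Everything here is hypothetical and illustrative: the service names, hour figures, relative cost index and workloads are invented for the example, and the code does not model any Tributary Systems product or storage virtualization API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProtectionService:
    name: str
    time_to_data_hours: float  # recover data + re-host apps + reconnect networks
    relative_cost: float       # hypothetical cost index; higher means pricier

@dataclass(frozen=True)
class Workload:
    name: str
    outage_tolerance_hours: float  # result of business impact analysis

def time_to_data(recover_h: float, rehost_h: float, reconnect_h: float) -> float:
    """Time-to-data as the sum of the three restoration steps."""
    return recover_h + rehost_h + reconnect_h

def assign(workload: Workload, services: list[ProtectionService]) -> ProtectionService:
    """Cheapest service whose time-to-data fits within the workload's tolerance."""
    fitting = [s for s in services
               if s.time_to_data_hours <= workload.outage_tolerance_hours]
    if not fitting:
        raise LookupError(f"no service meets the tolerance of {workload.name}")
    return min(fitting, key=lambda s: s.relative_cost)

# Hypothetical service catalog (hours: recover, re-host, reconnect).
SERVICES = [
    ProtectionService("sync-mirror volume", time_to_data(0.0, 0.1, 0.1), 10.0),
    ProtectionService("virtual tape library", time_to_data(1.0, 1.0, 0.5), 3.0),
    ProtectionService("tape backup volume", time_to_data(8.0, 2.0, 1.0), 1.0),
]

for wl in (Workload("order entry", 0.5), Workload("reporting", 4), Workload("payroll", 24)):
    print(f"{wl.name}: {assign(wl, SERVICES).name}")
```

Choosing the cheapest service that still fits, rather than the fastest one available, is what keeps always-on replication reserved for the small share of workloads that actually need it.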
The bottom line is that safety copies of data are just as important to provision and manage as the original data. Given that 50% or more of your storage infrastructure hosts data copies, it makes sense to review what you have stored and ensure that all data receives the protection services it merits.