This article can also be found in the Premium Editorial Download "Storage magazine: Boosting data storage array performance."
Cut big backups down to size
Data-reduction technologies can slash the amount of data that gets backed up, making disk-based backup a cost-effective alternative.
The world of backup is in a state of flux, although it may not look that way at ground level. There's more innovation and a greater variety of choices than ever before. The underlying enabler for this torrent of change is the availability of low-cost, high-capacity disk. But it would be misleading to view disk-based backup as a monolithic approach.
If you've recently invested in disk-based backup--or are considering it--you may have experienced sticker shock at the overall cost of moving to disk. The benefits of disk may be apparent, but when your vendor plans a configuration for you, the amount of capacity required may surprise you. Accurately sizing a tape library and planning for growth are important tasks in tape-based architectures, but getting the sizing right is even more critical for disk.
Traditional backups require a significant multiple of the primary data being protected. A commonly used ratio for tape backup is 10:1, but depending on retention policies and administrative practices, this can grow to upwards of 50:1 in some dysfunctional cases. Now shift to a disk-based backup scenario and consider the impact. Realistically, you'll likely do fewer full backups and more incrementals with disk technology; with traditional compression techniques
Can you afford to buy sixfold capacity for backup? Factoring in data growth rates, this is a huge problem. The power and cooling impact alone, not to mention the equipment cost, could make you reconsider. Without a means to address this issue, hard-dollar total cost of ownership analysis would continue to favor tape, and the justification of disk-based backup purchases would be based largely on risk reduction or improved service rather than on cost savings--a much tougher sell to a cost-conscious CFO.
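To see how a capacity multiple like this arises, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the function name, the retention schedule, the daily change rate and the 2:1 compression ratio are all assumed values chosen for the example, not figures from any vendor or from the configuration discussed above.

```python
def backup_capacity_tb(primary_tb, fulls_retained, incrementals_retained,
                       daily_change_rate, compression_ratio):
    """Estimate the disk needed to hold all retained backups, in TB.

    Every full backup is a complete copy of primary data; every
    incremental holds roughly one day's worth of changed data.
    """
    full_tb = primary_tb * fulls_retained
    incremental_tb = primary_tb * daily_change_rate * incrementals_retained
    return (full_tb + incremental_tb) / compression_ratio


# Hypothetical inputs: 10 TB of primary data, 8 weekly fulls and
# 48 daily incrementals retained, 5% daily change, 2:1 compression.
primary = 10
needed = backup_capacity_tb(primary, 8, 48, 0.05, 2.0)
print(f"{needed:.1f} TB of backup disk, {needed / primary:.1f}x primary")
```

With these assumed inputs the estimate comes to 52 TB, or roughly five times the primary capacity, even after compression. Changing the retention counts or the change rate moves the multiple quickly, which is why sizing deserves so much care.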
But if backup data required dramatically less storage than primary data, the value proposition would swing heavily in favor of disk. Data-reduction technology makes that tantalizing possibility a reality. It's currently being deployed in a range of products and providing efficiencies of 10 times or more in storage utilization.
Factoring and commonality
Each time a file is modified and backed up, it's likely that most of the data in that file has been previously backed up. Backup vendors are aware of this, and while products like IBM's Tivoli Storage Manager (TSM) and Symantec/Veritas' NetBackup have options to enable subfile or block-level incremental backups, these have generally been deployed on a limited basis, largely because of the performance impact of restoring many subfiles from tape.
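The idea behind a block-level incremental can be illustrated with a short Python sketch. It is a simplified assumption of how such a scheme might work, not a description of how TSM or NetBackup implement it: the 4 KB block size, the use of SHA-256 and the function names are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(previous, current):
    """Return {block_index: block_bytes} for blocks that are new or differ
    from the previously backed-up version of the file.

    A real product would also need to record the new file length so that
    a file that shrank could be restored correctly.
    """
    old = block_hashes(previous)
    delta = {}
    for index, offset in enumerate(range(0, len(current), BLOCK_SIZE)):
        block = current[offset:offset + BLOCK_SIZE]
        if index >= len(old) or hashlib.sha256(block).hexdigest() != old[index]:
            delta[index] = block
    return delta


# A four-block file in which a single byte changes inside the second block.
previous = bytes(BLOCK_SIZE * 4)
modified = bytearray(previous)
modified[5000] = 1
current = bytes(modified)

delta = changed_blocks(previous, current)
print(f"{len(delta)} of {len(block_hashes(current))} blocks need to be backed up")
```

Only the block containing the changed byte is sent to backup. The restore-side cost is also visible here: rebuilding the file means collecting the base copy plus every later delta, which is slow when those pieces are scattered across tape.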
Other vendors have taken the concept of subfile "deltas" and added commonality factoring techniques that extend it much further. Unfortunately, it seems that each vendor has coined a different term to describe its approach, including de-duplication, commonality factoring, single-instance store, data coalescence, capacity-optimized storage, content-addressed storage (CAS), common file elimination and many others.
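Whatever the label, the common thread is storing each unique piece of data only once. The Python sketch below shows one generic way to do that, keyed by a hash of each chunk's content. It does not represent any particular vendor's method: the `DedupStore` class, the fixed 4 KB chunk size and the SHA-256 digest are assumptions made for the example, and products may chunk and index data quite differently.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


class DedupStore:
    """Toy single-instance store: each unique chunk is kept exactly once."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes
        self.recipes = {}  # backup name -> ordered list of digests

    def backup(self, name, data):
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if unseen
            recipe.append(digest)
        self.recipes[name] = recipe

    def restore(self, name):
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


# Two nightly backups of a ten-chunk data set; only the last chunk changes.
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
tuesday = monday[:9 * CHUNK_SIZE] + bytes([99]) * CHUNK_SIZE

store = DedupStore()
store.backup("monday", monday)
store.backup("tuesday", tuesday)

logical = len(monday) + len(tuesday)
print(f"logical {logical} bytes, stored {store.stored_bytes()} bytes")
```

Here 20 chunks of backup data occupy the space of 11, and both backups restore in full. The ratio improves as more backups of mostly unchanged data accumulate, since each new backup adds only the chunks the store has not seen before.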
This was first published in January 2006