This article can also be found in the Premium Editorial Download "Storage magazine: Top features in data backup applications."
Data deduplication: Disk backup game changer
It's hard to overemphasize the importance of data deduplication in today's backup systems. It's perhaps the biggest game changer since the introduction of network backup systems 15 years ago, and its popularity can be traced to a number of factors. First, data deduplication lets users make far greater use of disk in their backup systems. Tape had always been significantly cheaper than disk as a backup target, and while the cost of disk has dropped significantly in the last several years, so has the cost of tape. As a result, disk was typically used only as a staging area for tape, rather than for long-term backup or archive storage.
Deduplication changed that forever. Because disk supports random access, a data deduplication system can remove redundant segments of data and replace them with pointers to a single stored copy without significantly affecting restore performance. (There is some performance degradation, but restores are still much faster than from tape.)
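To make the pointer idea concrete, here is a minimal Python sketch of a deduplicating disk target. It assumes fixed-size 4 KiB segments and SHA-256 fingerprints; many products use variable-size segments and their own hashing schemes, and the names `DedupeStore`, `write_backup` and `restore` are hypothetical, not any vendor's API. Each backup is kept as an ordered list of fingerprints (the pointers), and a restore simply follows those pointers to the unique segments.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4096  # assumed fixed segment size; real products often use variable-size segments


class DedupeStore:
    """Hypothetical deduplicating target: unique segments stored once, backups kept as pointer lists."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}     # fingerprint -> segment bytes
        self.backups: dict[str, list[str]] = {}  # backup name -> ordered fingerprints (pointers)

    def write_backup(self, name: str, data: bytes) -> None:
        pointers = []
        for offset in range(0, len(data), CHUNK_SIZE):
            segment = data[offset:offset + CHUNK_SIZE]
            fp = hashlib.sha256(segment).hexdigest()
            # Store the segment only if this fingerprint has never been seen
            self.segments.setdefault(fp, segment)
            pointers.append(fp)
        self.backups[name] = pointers

    def restore(self, name: str) -> bytes:
        # Random access: follow each pointer to its stored segment, in order
        return b"".join(self.segments[fp] for fp in self.backups[name])

    def stored_bytes(self) -> int:
        return sum(len(s) for s in self.segments.values())


if __name__ == "__main__":
    store = DedupeStore()
    full = b"A" * 8192 + b"B" * 8192               # 16 KiB "Monday" backup
    store.write_backup("mon", full)
    store.write_backup("tue", full[:12288] + b"C" * 4096)  # last 4 KiB changed
    print(store.stored_bytes())  # 12288
```

In this run, two 16 KiB backups (32 KiB of logical data) occupy 12 KiB on disk, because every repeated segment is stored only once and referenced by pointer.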
Despite dedupe's indisputable benefits, many users waited to see whether the techniques used in target dedupe devices would eventually make their way into backup software, making such special-purpose appliances unnecessary. While most experts believe target deduplication appliances are still necessary, data deduplication has indeed made its way into mainstream backup software products.
EMC and Symantec were the first major backup software vendors to offer dedupe
EMC and Symantec both offer source deduplication products. That is, you can install the Avamar or PureDisk agent on a computer, and the client communicates with the backup server to identify and eliminate redundant data before it's transferred across the network. Only new or changed data is sent with each backup, which makes source deduplication well suited to smaller remote offices and mobile data.
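The exchange between a source-deduplicating agent and its server can be sketched in Python as follows. This is a simplified illustration, not the actual Avamar or PureDisk protocol: it assumes fixed 4 KiB segments, SHA-256 fingerprints and an in-process call standing in for the network, and `BackupServer`, `missing`, `receive` and `client_backup` are hypothetical names. The client fingerprints its data, asks the server which fingerprints it lacks, and ships only those segments.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4096  # assumed fixed segment size for illustration


def split(data: bytes) -> list[bytes]:
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def fingerprint(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


class BackupServer:
    """Hypothetical backup server that tracks which segments it already holds."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}
        self.backups: dict[str, list[str]] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        # Reply with the fingerprints the server has never stored
        return {fp for fp in fingerprints if fp not in self.segments}

    def receive(self, name: str, recipe: list[str], new_segments: dict[str, bytes]) -> None:
        self.segments.update(new_segments)
        self.backups[name] = recipe


def client_backup(server: BackupServer, name: str, data: bytes) -> int:
    """Hypothetical agent: dedupes before sending; returns segment bytes sent (fingerprints not counted)."""
    segments = split(data)
    recipe = [fingerprint(s) for s in segments]
    needed = server.missing(recipe)
    # Each missing segment is sent once, even if it repeats within this backup
    to_send = {fp: seg for fp, seg in zip(recipe, segments) if fp in needed}
    server.receive(name, recipe, to_send)
    return sum(len(seg) for seg in to_send.values())


if __name__ == "__main__":
    server = BackupServer()
    day1 = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    day2 = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # last segment changed
    print(client_backup(server, "day1", day1))  # 12288
    print(client_backup(server, "day2", day2))  # 4096
```

Only segment payloads are counted here; a real agent also sends the fingerprints themselves, which are small compared with the data they stand in for.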
Both vendors sell their source deduplication products on a standalone basis, so you don't have to purchase Symantec's NetBackup or EMC's NetWorker. Even if you don't use Symantec or EMC backup apps, you can take advantage of their deduplication technology. But if you wanted the functionality of both the backup app and dedupe, you had to purchase and manage two products (i.e., NetBackup and PureDisk, or NetWorker and Avamar). Symantec is the first to change this with NetBackup 7, which has built-in source dedupe that doesn't require a separate PureDisk installation. While you can manage Avamar via NetWorker, and a single installation of EMC's client software supports both NetWorker and Avamar backups, Avamar still requires a separate server to back up to.
Target deduplication is also available from backup software vendors. Symantec was the first to offer it, allowing NetBackup customers to send standard NetBackup backups to a media server where PureDisk would deduplicate them. (With NetBackup 7, this functionality is available without requiring a separate PureDisk installation.)
IBM entered the data deduplication space by introducing a post-process target deduplication feature in Tivoli Storage Manager (TSM) 6.1. TSM can natively deduplicate backups stored on disk after those backups have completed. IBM's target deduplication offering is unique in that it's included in the base product; however, the deduplication ratios it achieves may be modest compared with those of other vendors' extra-cost options.
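Post-process deduplication, the approach the TSM 6.1 feature takes, differs from inline deduplication in when the work happens: backups land on disk in full, and a separate pass removes redundancy afterward. The Python sketch below illustrates that general idea only; it is not how TSM implements it, and `PostProcessPool`, `ingest` and `dedupe_pass` are hypothetical names. It also assumes fixed 4 KiB segments and SHA-256 fingerprints.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4096  # assumed fixed segment size for illustration


class PostProcessPool:
    """Hypothetical disk pool: backups are written whole, then deduplicated later."""

    def __init__(self) -> None:
        self.raw: dict[str, bytes] = {}          # completed backups not yet deduplicated
        self.segments: dict[str, bytes] = {}     # unique segments after deduplication
        self.recipes: dict[str, list[str]] = {}  # deduplicated backups as pointer lists

    def ingest(self, name: str, data: bytes) -> None:
        # During the backup: data is written as-is, with no hashing on the write path
        self.raw[name] = data

    def dedupe_pass(self) -> None:
        # Runs after backups complete, e.g. outside the backup window
        for name, data in list(self.raw.items()):
            recipe = []
            for i in range(0, len(data), CHUNK_SIZE):
                seg = data[i:i + CHUNK_SIZE]
                fp = hashlib.sha256(seg).hexdigest()
                self.segments.setdefault(fp, seg)
                recipe.append(fp)
            self.recipes[name] = recipe
            del self.raw[name]  # reclaim the space held by the full copy

    def used_bytes(self) -> int:
        return sum(map(len, self.raw.values())) + sum(map(len, self.segments.values()))

    def restore(self, name: str) -> bytes:
        if name in self.raw:
            return self.raw[name]
        return b"".join(self.segments[fp] for fp in self.recipes[name])


if __name__ == "__main__":
    pool = PostProcessPool()
    pool.ingest("b1", b"X" * 8192)
    pool.ingest("b2", b"X" * 4096 + b"Y" * 4096)
    print(pool.used_bytes())  # 16384
    pool.dedupe_pass()
    print(pool.used_bytes())  # 8192
```

The trade-off shows up in `used_bytes()`: until the pass runs, the pool must hold full, undeduplicated copies, so a post-process design needs enough disk to land each backup before it shrinks.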
This was first published in March 2010