Should you go with a software-based approach that allows for policy-based deduplication or a hardware-based approach because it can be implemented quickly and easily?
Data growth is straining backup windows, and that strain is driving more organizations to implement disk-to-disk backup. The shift has created tremendous interest in data deduplication: because deduplication optimizes capacity, data can be retained longer on disk, which increases the likelihood of a disk-based recovery rather than a slower, manual, tape-based recovery.
While deduplication has been a feature of several backup offerings for years, the technology has been most widely adopted in backup hardware, such as virtual tape libraries (VTLs) and network-attached storage (NAS)-based disk targets. Deduplication implementations in backup software, by contrast, have required organizations to switch out their legacy solutions, and the hardware-based deduplication vendors have made sure to point out that this isn't always a desirable path. Now that mainstream backup software vendors such as CommVault, EMC Corp., IBM Corp. and Symantec Corp. are incorporating data deduplication into their backup products (reducing the disruption caused by implementing deduplication), the question is being asked again: Where does deduplication belong in backup?
Software-based approaches are differentiated in a few ways. First, they have knowledge
Second, integration with the backup software allows for policy-based deduplication. Deduplication can be disabled for selected data sets where it doesn't make sense to turn it on (such as an MRI image) or for other data types (like databases) where you don't want to interfere with performance.
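To make the idea of policy-based deduplication concrete, here is a minimal sketch in Python. It is not taken from any vendor's product: the `DedupRule` class, the `POLICY` list, the `should_deduplicate` function and the file patterns (`*.dcm` for medical images, `*.mdf` for database files) are all hypothetical names chosen for illustration. The sketch simply shows a backup job consulting an ordered rule list to decide whether a given file should be deduplicated.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class DedupRule:
    """One policy entry (hypothetical structure, for illustration only)."""
    pattern: str        # lowercase glob matched against the file name
    deduplicate: bool   # True = deduplicate matching files, False = skip


# Hypothetical policy: the first matching rule wins.
POLICY = [
    DedupRule("*.dcm", False),  # medical images: little redundancy expected
    DedupRule("*.mdf", False),  # database files: avoid a performance hit
    DedupRule("*", True),       # everything else: deduplicate
]


def should_deduplicate(filename, policy=POLICY):
    """Return True if the policy says this file should be deduplicated."""
    name = filename.lower()  # make the match case-insensitive
    for rule in policy:
        if fnmatch(name, rule.pattern):
            return rule.deduplicate
    return True  # assumed default when no rule matches


if __name__ == "__main__":
    for f in ("scan01.DCM", "sales.mdf", "report.docx"):
        print(f, should_deduplicate(f))
```

A real backup product would likely express such policies per client, per data set or per job rather than per file name, but the principle is the same: the software knows what it is backing up, so it can decide where deduplication is worth the effort.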
One of the drawbacks of a software-based approach is that adopting a deduplication feature could require an upgrade of the backup application, the client agents or both. Another factor is that deduplication may be processor-intensive; when it is performed at the source application server, it may compete with the applications running there and slow them down. The scalability and performance of the media server performing deduplication could also be limiting factors. It will be important to investigate the upper limits of deduplication "pools" and the performance capabilities for large volumes of data.
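The processor cost and the notion of a deduplication "pool" are easier to see in a small example. The sketch below uses fixed-size chunks and SHA-256 fingerprints, which is one generic way to deduplicate data and is an assumption made here for illustration; the products discussed in this article may use different chunking and indexing schemes. The function names `deduplicate` and `restore` and the 4 KB chunk size are likewise hypothetical.

```python
import hashlib


def deduplicate(data, chunk_size=4096, pool=None):
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns (recipe, pool): the recipe is the ordered list of chunk
    fingerprints needed to rebuild the data, and the pool maps each
    fingerprint to the single stored copy of that chunk.
    """
    pool = {} if pool is None else pool
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Every chunk is hashed, whether or not it turns out to be a
        # duplicate. This is where the processor time goes.
        digest = hashlib.sha256(chunk).hexdigest()
        pool.setdefault(digest, chunk)  # store only the first copy
        recipe.append(digest)
    return recipe, pool


def restore(recipe, pool):
    """Rebuild the original data from its recipe and the chunk pool."""
    return b"".join(pool[digest] for digest in recipe)


if __name__ == "__main__":
    data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
    recipe, pool = deduplicate(data)
    stored = sum(len(chunk) for chunk in pool.values())
    print(len(data), "bytes backed up;", stored, "bytes stored")
    assert restore(recipe, pool) == data
```

Two things follow from the sketch. The hashing work grows with the amount of data read, not with the amount of data stored, so a server doing this alongside production workloads pays the cost on every backup. And the pool's index grows with the number of unique chunks, which is why the upper limits of a pool are worth asking a vendor about.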
This was first published in March 2009