This article can also be found in the Premium Editorial Download "Storage magazine: What you need to know about data dedupe tools for backup."
When source deduplication approaches gained traction, the two key benefits touted were the end-to-end efficiency of backing up closer to the data source (content awareness, network bandwidth savings and faster backups) and the distribution of deduplication processing across the environment, rather than having the proverbial four-lane highway hit a one-lane bridge downstream at the target deduplication system. Both themes are evident in HP’s StoreOnce deduplication strategy and EMC Data Domain’s Boost approach.
While HP Data Protector software doesn’t have deduplication built into its backup architecture today, users can benefit from HP’s StoreOnce deduplication strategy. StoreOnce is a modular component that runs as a service in a file system. It can be integrated with HP Data Protector backup software and HP’s scale-out file system, or embedded in HP infrastructure components. The StoreOnce algorithm involves two steps: first, it samples large data sequences (approximately 10 MB) to determine the likelihood of duplicates and routes each sequence to the best node for deduplication; second, it performs a hash and compare on smaller chunks. HP positions its dedupe strategy as portable, scalable and global, meaning that dedupe deployments can extend across a LAN or WAN and among storage systems without flip-flopping data between rehydrated and deduplicated states.
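The two-step idea described above (sample a large sequence, route it to the node most likely to hold duplicates, then hash and compare smaller chunks there) can be sketched roughly as follows. This is an illustrative sketch only, not HP's StoreOnce implementation: the `Node` class, the `route` function, the tiny chunk size and the every-other-chunk sampling rule are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4     # bytes per small chunk (tiny for illustration; an assumption)
SAMPLE_EVERY = 2   # sample every 2nd chunk hash of a sequence (an assumption)


def chunk_hashes(sequence: bytes) -> list:
    """Split a sequence into fixed-size chunks and hash each one."""
    return [
        hashlib.sha256(sequence[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(sequence), CHUNK_SIZE)
    ]


class Node:
    """Hypothetical deduplication node holding a hash-indexed chunk store."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}  # chunk hash -> chunk bytes

    def overlap(self, sampled: list) -> int:
        """Count how many sampled hashes this node already holds."""
        return sum(1 for h in sampled if h in self.store)

    def ingest(self, sequence: bytes) -> int:
        """Step 2: hash and compare each chunk; return the number of new chunks stored."""
        new_chunks = 0
        for i in range(0, len(sequence), CHUNK_SIZE):
            chunk = sequence[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.store:
                self.store[digest] = chunk
                new_chunks += 1
        return new_chunks


def route(sequence: bytes, nodes: list) -> Node:
    """Step 1: sample the sequence and pick the node with the most matches.

    Ties go to the first node in the list.
    """
    sampled = chunk_hashes(sequence)[::SAMPLE_EVERY]
    return max(nodes, key=lambda node: node.overlap(sampled))


node_a, node_b = Node("a"), Node("b")
node_b.ingest(b"AAAABBBBCCCCDDDD")        # node_b already holds similar data

incoming = b"AAAABBBBCCCCEEEE"
best = route(incoming, [node_a, node_b])  # sampling favors node_b
stored = best.ingest(incoming)            # only the EEEE chunk is new
```

Routing on a small sample keeps the placement decision cheap, while the full hash and compare runs only on the node chosen for that sequence.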
EMC Data Domain’s Boost option enables Data Domain to perform deduplication pre-processing earlier in the backup flow with NetBackup, Backup Exec, EMC Avamar or EMC NetWorker. A Data Domain software component is installed on the backup server or application client. The tasks performed there improve deduplication performance by distributing the workload, and they make more efficient use of the network between the backup server or application client and the Data Domain system.
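As a rough illustration of why moving part of the deduplication work to the backup server or client saves bandwidth, consider the sketch below. It is not the actual Boost protocol or API: the `TargetSystem` class, the `client_backup` function and the fixed 8-byte segment size are hypothetical names and values chosen for the example. The client fingerprints its segments, asks the target which fingerprints it lacks, and sends only those segments.

```python
import hashlib


class TargetSystem:
    """Hypothetical target deduplication system keyed by segment fingerprint."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints) -> set:
        """Return the fingerprints the target does not yet hold."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, new_segments: dict) -> None:
        self.segments.update(new_segments)


def client_backup(data: bytes, target: TargetSystem, segment_size: int = 8) -> int:
    """Fingerprint segments on the client and send only unseen ones.

    Returns the number of payload bytes that crossed the network.
    """
    segments = {}
    for i in range(0, len(data), segment_size):
        segment = data[i:i + segment_size]
        segments[hashlib.sha256(segment).hexdigest()] = segment

    needed = target.missing(segments.keys())
    payload = {fp: segments[fp] for fp in needed}
    target.store(payload)
    return sum(len(segment) for segment in payload.values())


target = TargetSystem()
first_sent = client_backup(b"ABCDEFGH" * 2 + b"12345678", target)   # 24 bytes of data
second_sent = client_backup(b"ABCDEFGH" + b"87654321", target)      # 16 bytes of data
```

In this toy run the first backup sends 16 of its 24 bytes because one segment repeats, and the second sends only the 8 bytes the target has not seen.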
What’s in store for deduplication?
Disk-based data protection addresses backup window issues, and deduplication addresses the cost of the disk used in backup configurations. But new capture techniques, such as array-based snapshots, are emerging to meet the high-performance requirements of organizations with little or no backup window and minimal tolerance for downtime. In many cases, block-level incremental capture and deduplication are baked into snapshot products. NetApp’s Integrated Data Protection products (SnapMirror, SnapProtect and SnapVault), coupled with NetApp FAS-based deduplication, eliminate the need for deduplication in backup software or target deduplication systems.
Similarly, Actifio Virtual Data Pipeline (VDP) takes a full image-level backup followed by continuous block-level incrementals, and it deduplicates and compresses the data so a third-party data reduction application isn’t needed. Nimble Storage takes a similar approach. It combines primary and secondary storage in a single solution, leverages snapshot- and replication-style data protection, and employs capacity optimization techniques to reduce the footprint of backup data. These approaches undermine traditional-style backup and, therefore, traditional deduplication techniques.
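The general pattern of a full image followed by block-level incrementals, with deduplication and compression applied to the captured blocks, can be sketched as follows. This is a generic illustration, not how Actifio, Nimble Storage or NetApp implement it: the 4-byte block size, the function names and the in-memory `pool` dictionary are assumptions for the example.

```python
import hashlib
import zlib

BLOCK = 4  # bytes per block (tiny for illustration; an assumption)


def block_map(image: bytes) -> list:
    """Hash every fixed-size block of an image, in order."""
    return [
        hashlib.sha256(image[i:i + BLOCK]).hexdigest()
        for i in range(0, len(image), BLOCK)
    ]


def incremental(prev_map: list, image: bytes) -> dict:
    """Return {block index: block bytes} for blocks that changed since prev_map."""
    changed = {}
    for idx, offset in enumerate(range(0, len(image), BLOCK)):
        block = image[offset:offset + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        if idx >= len(prev_map) or prev_map[idx] != digest:
            changed[idx] = block
    return changed


def dedupe_and_compress(changed: dict, pool: dict) -> dict:
    """Store each unique changed block once, compressed; return index -> hash references."""
    refs = {}
    for idx, block in changed.items():
        digest = hashlib.sha256(block).hexdigest()
        if digest not in pool:
            pool[digest] = zlib.compress(block)
        refs[idx] = digest
    return refs


full_image = b"AAAABBBBCCCC"
previous = block_map(full_image)      # state after the initial full backup

current_image = b"AAAAXXXXCCCC"       # only the middle block changed
changed = incremental(previous, current_image)

pool = {}                             # hash -> compressed block
refs = dedupe_and_compress(changed, pool)
restored = zlib.decompress(pool[refs[1]])
```

Because only changed blocks are captured and each unique block is stored once, the data is already reduced before any separate backup-side deduplication step would see it.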
BIO: Lauren Whitehouse is a senior analyst focusing on data protection software and systems at Enterprise Strategy Group, Milford, Mass.
This was first published in August 2011