Data deduplication handbook
How to evaluate hardware-based data deduplication products
Hardware-based data deduplication products can relieve the processing burden associated with software-based data deduplication products. Data deduplication features are also incorporated into other data protection hardware, such as backup platforms, virtual tape library (VTL) systems and even general-purpose storage systems like network attached storage (NAS). This approach generally doesn't focus on shortening backup windows or recovery objectives, but users can typically achieve superior compression levels, making the most of their available storage.
In-band vs. out-of-band data deduplication
Data deduplication can be performed in-band or out-of-band. In-band deduplication reduces data while it's being written to storage. In-band deduplication can be efficient because it is performed only once, though the additional processing power needed to handle the process may actually extend the backup window.
Out-of-band deduplication is performed after data has been stored. This approach does not affect the backup window and alleviates concerns about CPU processing creating a bottleneck between the backup server and the storage. However, out-of-band deduplication uses slightly more disk space during the data deduplication process. Also, out-of-band deduplication may take longer than the actual backup window. Disk contention is another problem, reducing disk performance as users attempt to access storage during the deduplication process.
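The tradeoff between the two approaches can be illustrated with a minimal sketch. Everything in it is hypothetical: the DedupStore class, the fixed 4 KB chunk size and the use of SHA-256 as a fingerprint are assumptions chosen for illustration, not any vendor's implementation. The in-band path hashes each chunk before it is stored, while the out-of-band path lands the raw backup first and reduces it afterward, which is why it temporarily needs extra disk space.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a backup stream into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Toy chunk store that tracks current and peak disk usage in bytes."""

    def __init__(self) -> None:
        self.unique: dict[str, bytes] = {}  # fingerprint -> chunk kept on "disk"
        self.staging: list[bytes] = []      # raw chunks awaiting deduplication
        self.peak_bytes = 0

    def used_bytes(self) -> int:
        """Bytes currently held, counting both unique and staged chunks."""
        return sum(map(len, self.unique.values())) + sum(map(len, self.staging))

    def _record_peak(self) -> None:
        self.peak_bytes = max(self.peak_bytes, self.used_bytes())

    def write_in_band(self, data: bytes) -> None:
        """In-band: fingerprint on the write path; duplicates never reach disk."""
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.unique.setdefault(digest, chunk)
        self._record_peak()

    def write_raw(self, data: bytes) -> None:
        """Out-of-band, step 1: store the backup untouched (no hashing on write)."""
        self.staging.extend(split_chunks(data))
        self._record_peak()

    def dedupe_out_of_band(self) -> None:
        """Out-of-band, step 2: reduce the staged chunks after the backup ends."""
        for chunk in self.staging:
            digest = hashlib.sha256(chunk).hexdigest()
            self.unique.setdefault(digest, chunk)
        self.staging.clear()
```

In this toy model, writing the same backup through write_in_band() and through write_raw() followed by dedupe_out_of_band() ends with the same set of unique chunks; only the peak usage and the point at which the hashing work is done differ.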
Data deduplication hardware pros and cons
Where software-based deduplication focuses on eliminating redundancy at the source, hardware-based deduplication emphasizes data reduction at the storage system itself. Hardware-based deduplication does not offer the bandwidth savings that deduplicating at the source can provide, but compression levels are often better and hardware-based data deduplication products require less maintenance.
Hardware data deduplication appliances are noted for their high performance, scalability and relatively nondisruptive deployment. Backup software will normally see dedicated appliances as a generic "disk system" and remain totally unaware of the deduplication processes that are taking place under the covers. Small businesses or remote offices will often avoid appliances because they cost more than deduplication implemented in software, but appliances are ideal for enterprise-class deployments.
Hardware-based deduplication may also be incorporated into other storage (target) platforms. For example, data deduplication is often a feature of VTL systems. VTLs speed backup tasks by utilizing disk rather than tape for storage, and adding deduplication allows the VTL to maximize disk usage. In many cases, VTL deduplication is implemented as an out-of-band process. This is an advantage because all of the VTL's contents can be deduplicated to achieve very good compression ratios. The downside is that data deduplication is not immediate. However, some VTLs do incorporate the processing power to deduplicate backup data in-band as data is received from the backup server.
Popular data deduplication hardware products
Data Domain Inc. touts one of the most diverse product lines intended for VTL and NAS systems. Appliances range from the branch office DD410 to the enterprise-class DDX series. All deduplication is performed in-band using the SHA-1 algorithm along with a second proprietary algorithm to prevent hash collisions. The index itself is maintained on nonvolatile RAM within the appliance. Data Domain appliances are also relatively slow, offering a throughput of only 110 MBps, but the company claims that it is working to improve those data rates through clustering.
The enterprise-class ProtecTier VTL from Diligent Technologies Corp. also performs in-band deduplication using a single proprietary algorithm. The index is then stored on a Fibre Channel disk that can potentially improve indexing performance. The results show in Diligent's performance figures, which reach up to 400 MBps. Similarly, the DXi3500 and DXi5500 appliances from Quantum Corp. perform in-band indexing and data deduplication using a patented algorithm that has also been added to Quantum's StorNext file system. By comparison, the Single Instance Repository (SIR) on FalconStor Software Corp.'s VTLs uses out-of-band indexing with SHA-1 and MD5 algorithms.
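Several of these products pair a primary hash with a second check so that a collision in one algorithm alone cannot cause two different blocks to be treated as identical. The vendors' actual algorithms are proprietary or not described here, so the following is only a rough sketch of the general idea, using SHA-1 and MD5 from Python's standard library. The fingerprint() function and DualHashIndex class are hypothetical names introduced for illustration.

```python
import hashlib


def fingerprint(block: bytes) -> tuple[str, str]:
    """Return a (SHA-1, MD5) pair; both digests must match for a duplicate."""
    return hashlib.sha1(block).hexdigest(), hashlib.md5(block).hexdigest()


class DualHashIndex:
    """Toy deduplication index keyed on two independent digests."""

    def __init__(self) -> None:
        self.blocks: dict[tuple[str, str], bytes] = {}  # key -> stored block
        self.duplicates = 0                             # blocks not stored again

    def add(self, block: bytes) -> tuple[str, str]:
        """Store the block only if its digest pair is new; return the key."""
        key = fingerprint(block)
        if key in self.blocks:
            self.duplicates += 1
        else:
            self.blocks[key] = block
        return key
```

A backup catalog would then record the returned key for each block, so that a restore can look up the single stored copy.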
For backup appliances, ExaGrid Systems Inc. includes an out-of-band deduplication feature with its NAS backup appliance. ExaGrid works with bytes rather than bits, so the indexing is simpler, leading to faster search performance. ExaGrid also examines the common data patterns in backup software products, aiding search and indexing performance. The HydraStor grid backup appliance from NEC Corp. of America uses a proprietary process to deduplicate data at the subfile level. NEC claims a 75% reduction in storage utilization without impacting storage performance.
Network Appliance Inc. (NetApp) performs block-level data deduplication in its NearStore R200 and FAS storage systems. Deduplication is based on NetApp's Advanced Single Instance Storage (ASIS) feature that uses 16-bit checksums already stored with each data block to look for redundancy candidates. Those blocks are then compared at the bit level, and identical blocks are discarded. NetApp's storage systems will deduplicate primary storage.
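The two-stage approach described above, where a cheap checksum nominates candidates and a full comparison confirms them, can be sketched as follows. This is not NetApp's code: the checksum16() function (the low 16 bits of CRC-32) is an assumed stand-in for whatever checksum the storage system actually keeps, and deduplicate() is a hypothetical helper. A byte-for-byte equality test stands in for the bit-level comparison.

```python
import zlib
from collections import defaultdict
from typing import Callable


def checksum16(block: bytes) -> int:
    """Stand-in 16-bit checksum: the low 16 bits of CRC-32 (an assumption)."""
    return zlib.crc32(block) & 0xFFFF


def deduplicate(
    blocks: list[bytes],
    checksum: Callable[[bytes], int] = checksum16,
) -> tuple[list[bytes], list[int]]:
    """Return (unique, refs), where refs[i] indexes the stored copy of blocks[i]."""
    unique: list[bytes] = []
    buckets: dict[int, list[int]] = defaultdict(list)  # checksum -> indices in unique
    refs: list[int] = []
    for block in blocks:
        candidates = buckets[checksum(block)]
        # A matching checksum only nominates candidates; with 16 bits, different
        # blocks will often share a value, so the full contents are compared.
        match = next((i for i in candidates if unique[i] == block), None)
        if match is None:
            match = len(unique)
            unique.append(block)
            candidates.append(match)
        refs.append(match)
    return unique, refs
```

Because the final decision rests on the full comparison, a weak checksum affects only how many candidates must be examined, not whether the result is correct.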
This was first published in November 2007