This article can also be found in the Premium Editorial Download "Storage magazine: Primary storage dishes up dedupe."
NetApp Inc. NetApp was the first primary data storage vendor to offer deduplication, which leverages the company's existing write anywhere file layout (WAFL) file system technology. The WAFL file system already computes a CRC checksum for each block of data it stores, and has block-based pointers integrated into the file system. (It's the secret behind NetApp's ability to have hundreds of snapshots without any performance degradation.) An optional process that runs during times of low activity examines all checksums; if two checksums match, the filer does a block-level comparison of those blocks. If the comparison shows a complete match, one of the blocks is replaced with a WAFL pointer. The result is sub-file-level deduplication without a significant impact on performance. NetApp's deduplication system has been tested by many users against multiple data types, including home directories, databases and virtual images, and most users have reported positive results in both reduction percentages and performance. As of this writing, NetApp uses only deduplication and doesn't do compression.
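The checksum-then-compare pass described above can be sketched in a few lines of Python. This is an illustration only, not NetApp's implementation: the `dedupe_pass` function, the dictionary of physical blocks and the list of logical pointers are hypothetical stand-ins for WAFL's on-disk structures, and `zlib.crc32` stands in for whatever checksum the file system actually stores.

```python
import zlib
from collections import defaultdict


def dedupe_pass(physical, pointers):
    """Post-process deduplication sketch (hypothetical data model).

    physical: dict mapping block id -> bytes (blocks already on disk)
    pointers: list of block ids, one per logical block of the volume
    Both are modified in place. Returns the number of blocks freed.
    """
    # Step 1: group stored blocks by checksum (cheap candidate matching).
    by_checksum = defaultdict(list)
    for block_id, data in physical.items():
        by_checksum[zlib.crc32(data)].append(block_id)

    # Step 2: within each group, confirm matches with a full comparison,
    # because two different blocks can share a checksum.
    remap = {}  # duplicate block id -> id of the block that is kept
    for candidates in by_checksum.values():
        keepers = []
        for block_id in candidates:
            for keeper in keepers:
                if physical[keeper] == physical[block_id]:
                    remap[block_id] = keeper
                    break
            else:
                keepers.append(block_id)

    # Step 3: repoint logical blocks at the kept copy, then free duplicates.
    for i, block_id in enumerate(pointers):
        pointers[i] = remap.get(block_id, block_id)
    for block_id in remap:
        del physical[block_id]
    return len(remap)
```

Run against three 4 KB blocks where the first and third are identical, the pass frees one block and leaves both logical positions pointing at the surviving copy. The full comparison in step 2 is what makes a matching checksum safe to act on.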
Nexenta Systems Inc. Nexenta uses the Oracle Solaris ZFS file system in its NexentaStor family of storage system software products that are based on the open source OpenSolaris platform; however, the firm has added more than 30 additional features to its ZFS-based offering that are only available from Nexenta. Examples of these features include an integrated management
Ocarina Networks. Ocarina takes a very different approach to data reduction than many other vendors. Where most vendors apply compression and deduplication without any knowledge of the data, Ocarina has hundreds of different compression and deduplication algorithms that it uses depending on the specific type of data. For example, the company uses completely different techniques to compress images and Word documents. It also understands encapsulation systems such as the Digital Imaging and Communications in Medicine (DICOM) system. Ocarina will actually disassemble a DICOM container, examine and deduplicate the various components, and then reassemble the container. As a result, Ocarina can often achieve much greater compression and deduplication rates than other vendors can realize with the same data types.
Ocarina isn't a storage vendor; it works with existing data storage system vendors that will allow Ocarina to interface with their systems. Ocarina is currently partnering with BlueArc Corp., EMC, Hewlett-Packard, Hitachi Data Systems and Isilon Systems Inc.
Oracle-Sun. Oracle's Solaris ZFS file system also has sub-file-level data deduplication built into it. As of this writing, little information is available about how well it deduplicates data or how it performs in user production environments. However, the ZFS website does state that there shouldn't be a significant difference in performance between deduplicated and native data, as long as the hash table used for deduplication can fit into memory.
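Whether the hash table fits into memory comes down to simple arithmetic: one table entry per unique block. The sketch below shows that calculation. The per-entry size is an assumption supplied by the caller, not a figure from this article; the 320 bytes used in the example is an often-quoted rule of thumb and should be checked against current ZFS documentation before it's relied on.

```python
def dedup_table_bytes(unique_data_bytes, block_bytes, entry_bytes):
    """Estimate dedup table size: one entry per unique block."""
    unique_blocks = -(-unique_data_bytes // block_bytes)  # ceiling division
    return unique_blocks * entry_bytes


def fits_in_memory(unique_data_bytes, block_bytes, entry_bytes, ram_bytes):
    """True if the estimated table is no larger than the RAM set aside for it."""
    return dedup_table_bytes(unique_data_bytes, block_bytes, entry_bytes) <= ram_bytes
```

Under those assumptions, 1 TiB of unique data in 128 KiB blocks gives 8,388,608 entries, or 2.5 GiB of table at 320 bytes per entry. Smaller blocks find more duplicates but multiply the entry count: the same data in 8 KiB blocks would need 16 times as much table.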
New and growing fast
A little over a year ago, there were virtually no viable options for reducing data in primary storage. Now there are half a dozen or so, with more on the way. Given the runaway growth in file storage that most companies are experiencing, it shouldn't take long for data reduction technologies to find their way into many of the products offered by data storage systems vendors.
BIO: W. Curtis Preston is an executive editor in TechTarget's Storage Media Group and an independent backup expert. Curtis has worked extensively with data deduplication and other data reduction systems.
This was first published in April 2010