What you will learn in this tip: Choosing the right primary storage deduplication product can be tough. Here's a look at some popular vendors that offer primary storage dedupe products.
In my last column, you learned that implementing data reduction technologies in primary data storage is, in general, harder and less rewarding than deploying data deduplication in a backup environment. That's not to suggest you shouldn't consider primary storage data reduction, but rather to help you set realistic expectations before making the commitment. The following list will help you learn which primary storage data reduction products are available right now and figure out which one might be a good fit for your company.
The following vendors currently offer primary storage data reduction technologies and products (listed in alphabetic order):
EMC Corp. EMC introduced file-level deduplication and compression of inactive files in its EMC Celerra filer. Administrators can configure various settings, such as how old a file must be before it's a candidate for this process, and what file sizes the process should look for. While deduplication and compression of older data won't generate as much data reduction as compressing or deduplicating everything, EMC customers have reported significant savings using this data reduction implementation.
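Here is a minimal sketch of the kind of policy the EMC paragraph describes. Only files older than a configurable age and within a configurable size range are considered. Identical files are detected by hashing their whole contents, and a single compressed copy is kept. The function and parameter names (`reduce_inactive`, `min_age_days`, `min_size`, `max_size`) and the choice of SHA-256 and zlib are assumptions for illustration, not Celerra's actual implementation or settings.

```python
import hashlib
import time
import zlib
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    data: bytes
    mtime: float  # last-modified time, seconds since the epoch


def reduce_inactive(files, min_age_days=30, min_size=1, max_size=None, now=None):
    """Hypothetical policy: dedupe and compress only old files in a size range.

    Returns (store, index): store maps a content digest to one compressed copy,
    index maps each selected file path to the digest it now references.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    store, index = {}, {}
    for f in files:
        size = len(f.data)
        if f.mtime > cutoff or size < min_size:
            continue  # too new or too small: leave the file untouched
        if max_size is not None and size > max_size:
            continue  # outside the configured size range
        digest = hashlib.sha256(f.data).hexdigest()  # whole-file fingerprint
        if digest not in store:
            store[digest] = zlib.compress(f.data)  # first copy: compress and keep
        index[f.path] = digest  # every copy just references the stored one
    return store, index
```

Because only inactive files are processed, active data is never slowed down, at the cost of leaving recent duplicates unreduced.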
Exar Corp. Exar gained data deduplication technology with its April 2009 acquisition of Hifn Inc. End users may be unfamiliar with Exar, but they may already be using its products: many high-end virtual tape libraries (VTLs) and data deduplication systems for backups use Exar hardware compression cards. Exar has now released a card, designed to be placed in a Windows or Linux server, that deduplicates data as it's being written to any hard drive. Exar's Hifn BitWackr B1605R is a hardware and software product that offloads data deduplication and compression from a server's CPU and makes adding data reduction to a Windows or Linux server a relatively easy process.
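To illustrate what inline deduplication and compression on the write path means, here is a minimal in-memory sketch. It splits each write into fixed 4 KiB blocks, fingerprints each block, and compresses and stores only blocks it hasn't seen before. The `InlineDedupeVolume` class, the block size, and the use of SHA-256 and zlib are hypothetical. The BitWackr card does this work in hardware, and its internal design isn't described here.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical fixed block size


class InlineDedupeVolume:
    """Toy volume that dedupes and compresses blocks as they are written."""

    def __init__(self):
        self.blocks = {}  # digest -> compressed unique block
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for off in range(0, len(data), BLOCK_SIZE):
            chunk = data[off:off + BLOCK_SIZE]
            d = hashlib.sha256(chunk).digest()
            if d not in self.blocks:
                self.blocks[d] = zlib.compress(chunk)  # new block: compress, store
            digests.append(d)  # duplicate block: store only the reference
        self.files[name] = digests

    def read(self, name):
        # Reassemble the file by decompressing each referenced block in order.
        return b"".join(zlib.decompress(self.blocks[d]) for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())
```

The hashing and compression in `write` is exactly the CPU work that a card like the BitWackr is meant to take off the server's processor.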
GreenBytes Inc. GreenBytes is in a unique position: it's the first vendor attempting to build a single product that addresses the data reduction needs of both data backup and primary data storage, in its GB-X Series of network-attached storage (NAS) and storage-area network (SAN) storage devices. The firm uses hash-based data deduplication, but its hash algorithm differs from the one used by all other vendors. Instead of the widely used SHA-1, GreenBytes uses Tiger, which it says is better suited to general-purpose processors than SHA-1 and therefore offers significant performance advantages without reducing data integrity. Tiger's digest (192 bits) is significantly larger than SHA-1's (160 bits), which further reduces the chances of a hash collision. GreenBytes also makes extensive use of solid-state disk (SSD) as a cache in front of SATA disk so that it can better meet the performance needs of primary data storage users.
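The collision argument can be made concrete with the birthday bound. Among n random b-bit digests, the probability of at least one collision is approximately 1 - exp(-n(n-1)/2^(b+1)). The sketch below compares 160-bit and 192-bit digests for an assumed 2^40 blocks (4 PiB of 4 KiB blocks). It treats both hashes as ideal random functions, which is an assumption, and says nothing about their speed.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Birthday-bound estimate of at least one collision among num_blocks
    digests of hash_bits bits, assuming an ideal random hash."""
    pairs = num_blocks * (num_blocks - 1) / 2
    # -expm1(-x) == 1 - exp(-x), but stays accurate for very small x
    return -math.expm1(-pairs / 2.0 ** hash_bits)


n = 2 ** 40  # assumed block count: 4 PiB stored as 4 KiB blocks
for name, bits in (("SHA-1", 160), ("Tiger", 192)):
    print(f"{name:6} {bits} bits: {collision_probability(n, bits):.3e}")
```

Each additional digest bit halves the approximate collision probability, so at any realistic block count the 32 extra bits of a 192-bit digest make that probability about 2^32 times smaller.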
Microsoft Corp. With Windows Storage Server 2008, Microsoft offers file-level single-instance storage (SIS), a form of deduplication built into the operating system. A number of storage systems vendors are taking advantage of the built-in SIS, including Hewlett-Packard's StorageWorks X-series Network Storage Systems and Compellent's Storage Center with NAS. File-level deduplication alone, however, provides only modest space savings for users of these systems.
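Here is a toy model of file-level single-instance storage. Identical files share one copy in a common store. A reference count ensures that copy is reclaimed when the last file pointing at it is deleted or overwritten. The `SingleInstanceStore` class is hypothetical and far simpler than the Windows implementation.

```python
import hashlib


class SingleInstanceStore:
    """Toy file-level single-instance store: identical files share one copy."""

    def __init__(self):
        self.common = {}  # digest -> [content, reference count]
        self.names = {}   # file name -> digest of its content

    def put(self, name, content: bytes):
        if name in self.names:
            self._release(self.names[name])  # overwrite: drop the old reference
        d = hashlib.sha256(content).hexdigest()
        entry = self.common.setdefault(d, [content, 0])
        entry[1] += 1
        self.names[name] = d

    def get(self, name) -> bytes:
        return self.common[self.names[name]][0]

    def delete(self, name):
        self._release(self.names.pop(name))

    def _release(self, d):
        entry = self.common[d]
        entry[1] -= 1
        if entry[1] == 0:
            del self.common[d]  # last reference gone: reclaim the space

    def physical_bytes(self):
        return sum(len(content) for content, _ in self.common.values())
```

Changing a single byte of a file turns it into a new, entirely separate copy. That is why file-level deduplication alone yields only modest savings.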
NetApp Inc. NetApp was the first primary data storage vendor to offer deduplication, which leverages the company's existing Write Anywhere File Layout (WAFL) file system technology. WAFL already computes a CRC checksum for each block of data it stores, and has block-based pointers integrated into the file system. (It's the secret behind NetApp's ability to keep hundreds of snapshots without any performance degradation.) An optional process that runs during times of low activity examines all checksums; if two checksums match, the filer does a block-level comparison of those blocks. If the comparison shows a complete match, one of the blocks is replaced with a WAFL pointer. The result is sub-file-level deduplication without a significant impact on performance. Many users have tested NetApp's deduplication against multiple data types, including home directories, databases and virtual images, and most have reported positive results in both reduction percentages and performance. As of this writing, NetApp offers only deduplication and doesn't do compression.
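The post-process scheme described for NetApp can be sketched as follows: compare cheap per-block checksums, then confirm each match with a full block comparison before sharing the block. CRC-32 from Python's `zlib` stands in for WAFL's per-block checksum. The `post_process_dedupe` function and its list-of-indices block map are hypothetical stand-ins for WAFL's pointers.

```python
import zlib
from collections import defaultdict


def post_process_dedupe(blocks):
    """Map each logical block to the physical block it should reference.

    blocks: list of bytes, one entry per logical block.
    Returns a list whose entry i is the index of the block that logical
    block i now points to (itself if it was kept).
    """
    by_checksum = defaultdict(list)  # checksum -> indices of kept blocks
    block_map = list(range(len(blocks)))
    for i, data in enumerate(blocks):
        csum = zlib.crc32(data)
        for j in by_checksum[csum]:
            if blocks[j] == data:  # checksums match: confirm byte for byte
                block_map[i] = j   # duplicate: replace with a pointer
                break
        else:
            # No identical block yet (new data, or a mere checksum collision).
            by_checksum[csum].append(i)
    return block_map
```

A CRC is too weak to prove that two blocks are identical. The byte-for-byte comparison is what makes it safe to replace a block with a pointer; the checksum only narrows down the candidates.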
Nexenta Systems Inc. Nexenta uses the Oracle Solaris ZFS file system in its NexentaStor family of storage system software products, which are based on the open source OpenSolaris platform. However, the firm has added more than 30 additional features to its ZFS-based offering that are available only from Nexenta. Examples include an integrated management console, LDAP integration, continuous data protection (CDP) and synchronous replication. NexentaStor 3.0 offers deduplicated storage that's fully integrated with Citrix Systems Inc. XenServer, Microsoft Corp. Hyper-V and VMware Inc. VMware vSphere.
Ocarina Networks. Ocarina takes a different approach to data reduction than many other vendors. Whereas most vendors apply compression and deduplication without any knowledge of the data, Ocarina has hundreds of different compression and deduplication algorithms that it applies depending on the specific type of data. For example, the company uses completely different techniques to compress images and Word documents. It also understands container formats such as Digital Imaging and Communications in Medicine (DICOM). Ocarina actually disassembles a DICOM container, examines and deduplicates its various components, and then reassembles the container. As a result, Ocarina can often achieve much higher compression and deduplication rates than other vendors realize with the same data types.
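Here is a minimal sketch of the content-aware idea: detect the data type from its leading bytes, then pick a codec for that type. The type signatures are real: JPEG's `FF D8 FF`, ZIP's `PK\x03\x04` (which also covers .docx files), and the `DICM` marker that follows DICOM's 128-byte preamble. The codec table and function names are illustrative assumptions. Ocarina's own algorithms go much further, for example by unpacking containers and treating each component separately.

```python
import bz2
import lzma
import zlib


def _identity(data: bytes) -> bytes:
    # Already-compressed formats: recompressing them rarely helps.
    return data


def classify(data: bytes) -> str:
    """Guess the data type from well-known leading bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"PK\x03\x04"):
        return "zip"  # .docx and .xlsx files are ZIP containers
    if len(data) >= 132 and data[128:132] == b"DICM":
        return "dicom"  # 128-byte preamble, then the "DICM" marker
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "binary"


# Hypothetical per-type codec choices, for illustration only (not Ocarina's).
CODECS = {
    "text": (lzma.compress, lzma.decompress),
    "dicom": (bz2.compress, bz2.decompress),
    "binary": (zlib.compress, zlib.decompress),
    "jpeg": (_identity, _identity),
    "zip": (_identity, _identity),
}


def content_aware_compress(data: bytes):
    label = classify(data)
    compress, _ = CODECS[label]
    return label, compress(data)  # keep the label so we can decode later


def content_aware_decompress(label: str, payload: bytes) -> bytes:
    _, decompress = CODECS[label]
    return decompress(payload)
```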
Ocarina isn't a data storage vendor; it works with existing data storage system vendors that will allow Ocarina to interface with their systems. Ocarina is currently partnering with BlueArc Corp., EMC, Hewlett-Packard, Hitachi Data Systems and Isilon Systems Inc.
Oracle-Sun. Oracle's Solaris ZFS file system also has sub-file-level data deduplication built in. As of this writing, there's little information available about how well it deduplicates data or how it performs in production environments. However, the ZFS website does state that there shouldn't be a significant difference in performance between deduplicated and native data, as long as the hash table used for deduplication fits in memory.
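Whether the table fits in memory can be estimated with simple arithmetic: one dedupe-table entry per unique block, multiplied by the in-memory size of an entry. The figure of roughly 320 bytes per entry used below is a commonly quoted ZFS rule of thumb and is treated here as an assumption. `ddt_memory_bytes` is a hypothetical helper.

```python
def ddt_memory_bytes(unique_data_bytes: int, record_size: int = 128 * 1024,
                     bytes_per_entry: int = 320) -> int:
    """Rough RAM needed for a dedupe table: one entry per unique block.

    bytes_per_entry=320 is an assumed rule of thumb, not a guaranteed figure.
    """
    blocks = -(-unique_data_bytes // record_size)  # ceiling division
    return blocks * bytes_per_entry


TiB = 1024 ** 4
GiB = 1024 ** 3
for rs in (128 * 1024, 8 * 1024):
    need = ddt_memory_bytes(TiB, rs) / GiB
    print(f"recordsize {rs // 1024:>3} KiB: {need:.1f} GiB of RAM per TiB of unique data")
# recordsize 128 KiB: 2.5 GiB of RAM per TiB of unique data
# recordsize   8 KiB: 40.0 GiB of RAM per TiB of unique data
```

Smaller blocks multiply the number of entries, so a pool with small records needs far more RAM for the same amount of unique data.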
Storwize Inc. Storwize offers an appliance that sits in-band between your NAS (NFS/CIFS) filer and the systems accessing it. It works like the chips in your tape drives, which compress data in real time as it's being written and decompress it in real time as it's being read. Like the tape drive chip, it doesn't hurt performance because it compresses the data as fast as the data arrives. In fact, with certain applications it can even increase performance. Another interesting aspect of Storwize's system is that the files it compresses simply appear as compressed files on the filer. Therefore, if your worst fears came true and Storwize disappeared one day, all you would need is enough space to uncompress the files using standard algorithms.
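The in-band, compress-as-it-arrives behavior can be sketched with Python's streaming `zlib` objects. The sketch writes the gzip container format (`wbits=31`), so the stored bytes can be read back with an ordinary gzip tool. The use of gzip specifically is an assumption. The article says only that the files appear on the filer as compressed files that standard algorithms can uncompress.

```python
import gzip
import zlib


def compress_stream(chunks):
    """Compress byte chunks as they arrive, emitting gzip-format output."""
    comp = zlib.compressobj(level=6, wbits=31)  # wbits=31 selects the gzip container
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out  # forward compressed bytes to the filer as they are produced
    yield comp.flush()  # final block plus the gzip trailer


def decompress_stream(chunks):
    """Decompress gzip-format chunks as they are read back from the filer."""
    decomp = zlib.decompressobj(wbits=31)
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail


incoming = [b"log line %d\n" % i for i in range(1000)]
stored = b"".join(compress_stream(incoming))  # what lands on the filer
# No proxy needed to read it back: the standard gzip module can decode it.
assert gzip.decompress(stored) == b"".join(incoming)
```

Keeping the on-disk format standard is what makes the "vendor disappears" scenario survivable: recovery needs only disk space and a stock decompressor.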
Data reduction is still new and growing fast
A little over a year ago, there were virtually no viable options for reducing data in primary storage. Now there are half a dozen or so, with more on the way. Given the runaway growth in file storage that most companies are experiencing, it shouldn't take long for data reduction technologies to find their way into many of the products offered by data storage systems vendors.
About this author: W. Curtis Preston (a.k.a. "Mr. Backup"), Executive Editor and Independent Backup Expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to becoming one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of BackupCentral.com, the author of hundreds of articles, and the author of the books "Backup and Recovery" and "Using SANs and NAS."