Data reduction has become a popular feature for disk-based backup systems. As storage continues to grow, data reduction products are also emerging that deduplicate data on primary and nearline storage devices outside the backup environment.
It is still a new market, but there are some compelling considerations in favor of the use of data reduction beyond the backup infrastructure. While new product offerings will take some time to gain users' trust, users are also reporting that overall storage growth continues despite the use of secondary storage data reduction tools.
Meanwhile, the widespread adoption of tiered storage has created ever more specialized roles for the many storage systems in the production environment, many of which require different tools than those used for backup data. Accordingly, the already announced data reduction tools for active data take varying approaches depending on the type of data sets targeted.
Data Domain Inc. -- DD Series: Data Domain's data deduplication devices were originally positioned for disk-based backup, but last September the company tweaked its operating system with Data Domain OS version 4.3. The update improved how Data Domain's DD series boxes handle small files, as opposed to the large ones typical of backup, and Data Domain has since positioned its boxes for both disk-based backup and nearline NAS storage. This June, Data Domain added file-locking software for customers who use its devices for compliance file archiving; it prevents stored files from being deleted or altered for a set period.
Hifn Inc. -- Express DR 250 and Express DR 255: Hifn has developed a set of chip boards, announced in March, that will perform data deduplication, compression and encryption processing with the goal of eliminating some of the performance issues associated with software-based approaches in storage devices.
Both boards are built around a Hifn chip that provides network-based encryption and compression. The Express DR 250 board is an update to a product already on the market, while the DR 255 offers a faster four-lane PCIe interface. A third card, based on new silicon that the company has said will perform at 1.6 GBps, is due out by year's end.
Hifn has many OEMs for its data compression and encryption cards. The new cards could be used to offload inline data deduplication processing to hardware in order to support more performance-sensitive primary storage data sets.
NEC Corp. -- HydraStor: Positioned for both backup and archive, HydraStor takes a grid approach to secondary storage. HydraStor isn't marketed strictly for nearline data, but archiving support pulls it out of the strict backup category. "They don't market it this way," said Gartner analyst Dave Russell, "but there's nothing technically stopping you from using HydraStor for nearline data."
Microsoft -- Single Instance Storage: Microsoft has offered file-level single instancing for both file and email systems at least as far back as the Windows 2000 operating system. Single instancing has also been a part of Microsoft's Exchange email server, Windows XP and Windows Vista, as well as the Windows Storage Server 2003 R2 release. Most archiving software offerings also include this type of file-level single instancing.
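To make the idea of file-level single instancing concrete, here is a minimal Python sketch. It is not Microsoft's implementation; the class name `SingleInstanceStore`, its methods and the sample file names are all hypothetical, and using SHA-256 to identify identical files is an assumption made for illustration.

```python
import hashlib


class SingleInstanceStore:
    """Toy file-level single-instance store (hypothetical, for illustration).

    Files with byte-identical content share one stored copy.
    """

    def __init__(self):
        self.blobs = {}  # content digest -> file bytes, stored once
        self.names = {}  # file name -> content digest

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.names[name] = digest

    def get(self, name):
        return self.blobs[self.names[name]]

    def logical_bytes(self):
        # Size as users see it: every file counted in full.
        return sum(len(self.blobs[d]) for d in self.names.values())

    def physical_bytes(self):
        # Size actually stored: each unique content counted once.
        return sum(len(b) for b in self.blobs.values())


store = SingleInstanceStore()
attachment = b"quarterly report" * 1000
store.put("alice/report.doc", attachment)
store.put("bob/report.doc", attachment)                # exact duplicate
store.put("carol/report-v2.doc", attachment + b"!")    # one byte longer

print(store.logical_bytes(), store.physical_bytes())
```

The two identical copies are stored once, but the third file, which differs by a single byte, is stored in full. That limitation is what subfile deduplication is meant to address.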
NetApp -- NetApp Deduplication: NetApp used its WAFL file system to offer the first subfile-level data deduplication feature for a primary storage array last May. The feature is a free utility that comes as an option with a NearStore license. Within volumes on a NearStore system (flexible volumes, in NetApp terms), NetApp Deduplication requires a window each day to conduct its post-process data reduction. In July, NetApp added the capability for its V-Series storage gateways and claims it can now deduplicate production data on its rivals' storage systems.
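The difference between subfile, post-process deduplication and file-level single instancing can be sketched in a few lines of Python. This is an illustrative model only, not NetApp's code: the `Volume` class, the fixed 4,096-byte block size and the use of SHA-256 fingerprints are assumptions. Writes land unmodified, and a separate pass run later (the daily window) finds duplicate blocks and makes files share them.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class Volume:
    """Toy volume with post-process, block-level deduplication (hypothetical)."""

    def __init__(self):
        self.blocks = []  # physical blocks; None marks a freed block
        self.files = {}   # file name -> list of block indices

    def write(self, name, data):
        # No deduplication on the write path: every block is stored as-is.
        indices = []
        for off in range(0, len(data), BLOCK_SIZE):
            self.blocks.append(data[off:off + BLOCK_SIZE])
            indices.append(len(self.blocks) - 1)
        self.files[name] = indices

    def read(self, name):
        return b"".join(self.blocks[i] for i in self.files[name])

    def used_blocks(self):
        return sum(1 for b in self.blocks if b is not None)

    def dedup_pass(self):
        """Scheduled pass: point duplicate blocks at one copy, free the rest."""
        seen = {}   # fingerprint -> index of the block to keep
        remap = {}  # duplicate block index -> kept block index
        for i, blk in enumerate(self.blocks):
            if blk is None:
                continue
            keep = seen.setdefault(hashlib.sha256(blk).digest(), i)
            # Byte comparison guards against a fingerprint collision.
            if keep != i and self.blocks[keep] == blk:
                remap[i] = keep
        for name, indices in self.files.items():
            self.files[name] = [remap.get(i, i) for i in indices]
        for i in remap:
            self.blocks[i] = None
        return len(remap)


vol = Volume()
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))  # 4 distinct blocks
edited = base[:BLOCK_SIZE] + b"\xff" * BLOCK_SIZE + base[2 * BLOCK_SIZE:]
vol.write("original.dat", base)
vol.write("edited.dat", edited)  # differs from the original in one block

before = vol.used_blocks()
freed = vol.dedup_pass()
print(before, freed, vol.used_blocks())
```

In this example the edited file shares three of its four blocks with the original after the pass, so eight stored blocks shrink to five even though the two files are not identical.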
Ocarina Networks -- ECO System: Ocarina came out in April with an appliance that it claims can reduce file size on primary NAS storage, as well as the size of files that have already been compressed using industry-standard algorithms, such as .jpg photos.
The Ocarina system consists of two components. The Ocarina Optimizer is a 1U appliance with 16 processors that moves into the data path to crawl files on an existing NAS storage system, then moves back out again to process them using Ocarina's data reduction algorithms. The Ocarina Reader is a software agent used by workstations and servers to view the compressed files.
Ocarina is focusing on compressing large multimedia files for the oil and gas, medical imaging, and media and entertainment industries. A product update last month added data migration features, snapshots and support for virtual global namespaces. Ocarina also announced new partnerships with HP, Isilon, Ibrix and BlueArc.
Riverbed Technology Inc. -- Atlas: Atlas won't actually hit the market until 2009, but Riverbed announced plans last month to apply the data deduplication algorithms it offers in its Steelhead WAN optimization products to primary data. Atlas will initially support CIFS, but will eventually work with all file data and then extend to nonfile data via iSCSI two or three years down the road.
Storwize Inc. -- STN-6000: Storwize's primary storage data reduction appliance reportedly comes close to line speeds with its inline data reduction process through proprietary modifications to standard LZ compression algorithms. As such, it doesn't offer the most extreme data reduction ratios. "For nondatabase data that seldom changes, but needs to remain online, Ocarina would be a better fit; they trade off a higher compression ratio for less performance," said StorageIO Group founder and analyst Greg Schulz. Storwize makes the opposite tradeoff, offering up to 15:1 compression ratios and, the company claims, up to 600 MBps throughput. Storwize claims to have 100 customers so far.
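The ratio-versus-throughput tradeoff Schulz describes can be observed with any LZ-family compressor. The sketch below uses Python's standard `zlib` module (DEFLATE, which combines LZ77 with Huffman coding) purely as a stand-in; Storwize's and Ocarina's algorithms are proprietary and are not represented here. The `measure` helper and the synthetic log data are hypothetical, and the numbers printed depend on the machine and the data.

```python
import time
import zlib


def measure(data, level):
    """Compress `data` at a zlib level; return (compression ratio, MB per second)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Confirm the compression is lossless before reporting anything.
    assert zlib.decompress(packed) == data
    ratio = len(data) / len(packed)
    mbps = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return ratio, mbps


# Synthetic, repetitive sample data (an assumption; real data sets will differ).
sample = "".join(
    f"{i:08d} GET /images/frame_{i % 97}.jpg 200\n" for i in range(20000)
).encode()

results = {level: measure(sample, level) for level in (1, 6, 9)}
for level, (ratio, mbps) in results.items():
    print(f"level {level}: {ratio:.1f}:1 at {mbps:.0f} MB/s")
```

Lower levels typically spend less CPU time per byte and give up some compression ratio, while higher levels do the reverse. An inline appliance sitting in the data path has to favor the fast end of that range, while a post-process optimizer can afford the slow end.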
More primary storage data reduction players to come
A year from now you can expect to see a half dozen more solutions on the market. "I know of a couple more players still in stealth, and bigger companies are looking at this, they claim, in the shorter rather than longer term," Russell said. "A couple of years from now I would expect broad-scale adoption, at least for some of the systems." Also on the horizon for this space is support for data deduplication on block-based rather than file-based systems.
Meanwhile, IBM has indicated that it will spread the data deduplication IP it acquired with Diligent Technologies as broadly as possible across its product line, though the company has not divulged specific roadmap plans. EMC CEO Joe Tucci has promised features, including data deduplication and drive spindown, across all EMC product lines, a rollout that has begun with EMC's disk libraries.