Primary storage deduplication has yet to make its mark the way capacity-reducing technology has with backups, but all signs point to it picking up steam. Startups and long-established vendors will continue to build on the trends that took shape during the past year with appliances, traditional disk arrays and new solid-state systems.
Nimbus Data Systems Inc., Pure Storage Inc. and SolidFire Inc. offer their own inline deduplication in all-flash enterprise arrays, as does NexGen Storage Inc. in its flash/disk array. Solid-state drive (SSD) vendors are also testing the waters, because dedupe can help IT shops better utilize and preserve their expensive NAND flash storage, and flash can lessen the performance hit associated with deduplication.
Another new twist is the introduction of a primary dedupe appliance from balesio AG, which previously sold only software. Using dedupe and compression, balesio’s FMA-4800 Series appliance shrinks unstructured data in its native format, enabling the optimized data to pass to backup, archive and other systems without the need for the CPU-intensive process of rehydration, in which the system expands data back to its full size.
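Conceptually, chunk-level deduplication replaces repeated blocks with references to a single stored copy, and rehydration reverses the process by reassembling the full data from those references. A minimal sketch of the idea (the fixed-size 4 KB chunks and SHA-256 fingerprints here are illustrative assumptions, not any vendor's actual implementation):

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative; many real systems use variable-size chunking

def dedupe(data: bytes):
    """Split data into chunks and keep only one copy of each unique chunk."""
    store = {}    # fingerprint -> chunk bytes (unique copies only)
    recipe = []   # ordered fingerprints needed to rebuild the original data
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # store the chunk only the first time it is seen
        recipe.append(fp)
    return store, recipe

def rehydrate(store, recipe):
    """Expand deduplicated data back to its original, full size."""
    return b"".join(store[fp] for fp in recipe)

# Four 4 KB logical chunks, but only two are unique, so only two are stored.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = dedupe(data)
assert len(recipe) == 4 and len(store) == 2
assert rehydrate(store, recipe) == data
```

The `recipe` is what makes passing deduplicated data between systems tricky: it only means something to a system that shares the same chunk store and fingerprinting scheme.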
But the more significant emerging trend is the prospect of major vendors such as Dell Inc. and Hewlett-Packard Co. eventually expanding the use of data deduplication throughout the data center, from clients and servers to primary storage and backups.
“Deduplication eventually is likely to exist in many places: applications, OS [operating system], file systems, in the network, in primary storage devices, and in backups and archiving software and hardware,” said David Russell, a research vice president (VP) in storage technologies and strategies at Stamford, Conn.-based Gartner Inc.
“While scale and performance can be issues today with moving the data reduction upstream to the server,” Russell observed, “there are also potentially greater savings in resources that can be achieved the earlier and further upstream the data is reduced.”
Passing deduplicated data from one system to another without having to rehydrate or re-inflate it can be an advantage because the shrunken data consumes fewer network resources and takes up less space on the target device. Even so, the scenario is implausible unless an IT shop uses the same proprietary deduplication technology across its data center.
If a user deploys one vendor’s dedupe technology at the server and a different vendor’s dedupe technology for primary storage, the system would need to expend the time and resources to rehydrate the data to its original size before transferring it to another system.
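A toy illustration of that incompatibility, assuming each vendor simply fingerprints fixed-size chunks with SHA-256 (real products differ in chunking strategy, hash choice and metadata formats, which only widens the gap):

```python
import hashlib

def fingerprints(data: bytes, chunk_size: int):
    """One vendor's dedupe index: fingerprints taken at its chosen chunk boundary."""
    return {hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)}

data = b"x" * 32768
server_index = fingerprints(data, 4096)  # hypothetical server-side dedupe: 4 KB chunks
array_index = fingerprints(data, 8192)   # hypothetical array-side dedupe: 8 KB chunks

# The two indexes share no fingerprints even for identical data, so the array
# cannot reuse the server's chunk references -- the data must be rehydrated first.
assert server_index.isdisjoint(array_index)
```

Even this best case, where both sides hash identical bytes, produces no overlap; mismatched chunk boundaries alone are enough to force rehydration at every hand-off.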
“You have to have some sort of deduplication standards and APIs that allow people to hand off deduplicated data between solutions. We’re not anywhere near that -- not even close,” said Brian Babineau, VP, research and analyst services at Milford, Mass.-based Enterprise Strategy Group.
Latency remains a barrier to using dedupe with primary storage
The main technical challenge in using dedupe with primary storage is the latency associated with reads and writes. Marc Staimer, president at Dragon Slayer Consulting in Beaverton, Ore., said one way to address the problem is through a silicon-based dedupe implementation, similar to the approach taken with a TCP/IP offload engine. He claimed Hitachi Data Systems (HDS) Corp., for instance, is working on one with its BlueArc NAS platform. BlueArc signed an OEM deal to use Permabit Technology Corp.’s Albireo dedupe technology before HDS acquired the company.
“When the users aren't impacted, then I feel better about it,” Staimer said, noting he’s not a huge fan of primary dedupe as it exists today. “That’s why I like it when it’s on solid-state and in silicon, because it’s zero impact on the users.”
Craig Nunes, director of marketing for HP StorageWorks, said that while solid-state may be part of the formula to address the latency issue, the ideal approach won’t require flash because customers don’t want to increase their cost per gigabyte for primary storage.
“You have to have a smarter approach to deduplication than just throwing an expensive tier at it,” Nunes said. “The alternatives for primary storage deduplication are thin technologies that are proven in the marketplace, offered by 20-plus vendors. [Deduplication] has got to really come out ahead of that from a cost perspective to pay off.”
To date, HP has delivered its StoreOnce deduplication in its Data Protector backup application and its new B6200 scale-out disk backup system. Nunes declined to specify a timeline for porting the StoreOnce technology to clients, application servers and primary storage.
“The aim is to go as fast as we can to address the technical challenges,” he said.
Dell emphasizes importance of data reduction
Dell’s mid-2010 acquisition of data reduction specialist Ocarina Networks Inc. so far has produced the policy-based DX6000G Storage Compression Node for its Object Storage Platform, which largely handles large content stores, archives and secondary storage. Dell has also outlined plans to integrate both deduplication and compression into its Compellent and EqualLogic primary storage lines, and may license advanced compression for special vertical markets, according to Brett Roscoe, Dell’s general manager of data management solutions.
“We think of dedupe like RAID in the '80s. It’s kind of cool, and everybody talks about it today, but 10 years from now, everybody’s just going to expect it to be in storage devices,” Roscoe said. He declined to disclose the timeframe for Dell’s dedupe/compression expansion other than to say that announcements are imminent.
Dell eventually expects to offer Ocarina technology in backup-to-disk devices, too. Roscoe said the company plans to work with partners, such as CommVault Systems Inc. and Symantec Corp., to improve the way their backup products ingest data to eliminate the need for rehydration before data moves from one system to another.
NetApp, EMC, Oracle early primary storage deduplication leaders
In the meantime, the main suppliers of primary storage deduplication are EMC Corp., NetApp Inc. and Oracle Corp. (through its Sun Microsystems Inc. acquisition). The Sun-designed, open-source ZFS file system provides deduplication for vendors such as GreenBytes Inc. and Nexenta Systems Inc.
NetApp pioneered the use of dedupe in primary storage with the 2007 release of a free add-on for its FAS and NearStore customers. This year, the company added data compression for primary storage and raised the 16 TB volume-size limit for deduplication; the recent release of NetApp’s Data Ontap 8.1 operating system supports 100 TB volumes on FAS and V-Series systems.
Larry Freeman, a senior technologist at NetApp, said approximately 35% of the vendor’s customers turn on deduplication, up from about 25% a year ago. He said the predominant use case remains virtualized servers and desktops, although he’s also noted a slight expansion into the areas of Microsoft Exchange and SharePoint.
Freeman said the jury’s still out on the degree to which solid-state technology might improve performance because NetApp does deduplication in the background. NetApp offers deduplication for both file- and block-based applications, and can keep data in deduplicated form with its SnapVault and SnapProtect backup products.
EMC uses different dedupe technologies with its Data Domain and Avamar backup products than it does with its primary storage systems. With this year’s rollout of its VNX and VNXe unified storage systems, EMC offered up block compression and file deduplication and compression as a free feature with every system shipped. EMC offered the file-based technology three years ago with its Celerra NS Series, and last year added block-based data reduction to its Clariion CX4 and Celerra NS lines.
Jon Siegal, senior director of product marketing in EMC's unified storage division, said plans include a display to show compression-related capacity savings, further optimization of block compression performance and a deeper compression algorithm for file-level data.
IBM offers dedupe through its OEM deal with NetApp, but the company’s own technologists focus on compression. Late last year, IBM released its Real-Time Compression Appliance based on technology acquired from Storwize Inc. IBM is also looking to embed the technology in its midrange and enterprise systems for file and block storage, according to Dan Galvan, VP of storage systems marketing and strategy.
But Gartner’s Russell doesn’t think separate appliances will survive in the long run. “People want this to be supported by a big vendor and delivered as a feature inside of their equipment,” he said.