Primary storage data reduction advancing via data deduplication, compression

Primary storage data reduction is becoming more practical as hardware matures to offset the performance impact of data deduplication and compression techniques.

While not as hot as data deduplication for backup, primary storage data reduction, which includes data deduplication and data compression techniques, is getting warmer thanks to a scattering of products that try to shrink the data footprint on tier 1 disk.

The companies with offerings in this space are taking a variety of approaches to address the problem. For instance, one primary storage data-reduction approach searches for duplicates at the file level, while others are more granular, comparing data blocks or byte streams, of fixed or variable sizes. Some work post-process, storing the data writes before starting the data deduplication process. And one compression specialist operates inline, in the data path.
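The block-level, post-process approach described above can be reduced to a simple idea: fingerprint each block, keep one physical copy per unique fingerprint, and store references for duplicates. The following is a minimal sketch of that idea in Python, not any vendor's implementation; the fixed 4 KB block size, the `store` dictionary and the function names are illustrative assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks; some products use variable sizes instead


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed blocks, keep one copy of each unique block,
    and return the list of block fingerprints (the file's 'recipe')."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        if fp not in store:   # new block: store the single physical copy
            store[fp] = block
        recipe.append(fp)     # duplicates cost only a reference
    return recipe


def rehydrate(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its recipe."""
    return b"".join(store[fp] for fp in recipe)


store = {}
original = b"A" * 8192 + b"B" * 4096  # three blocks, only two unique
recipe = deduplicate(original, store)
assert rehydrate(recipe, store) == original
print(len(store), "unique blocks for", len(recipe), "logical blocks")
```

In a post-process design, the write lands on disk first and a sweep like `deduplicate` runs later, which is why the performance impact on the write path can be kept small.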

NetApp Inc.'s deduplication, which operates at the block level, is the most prominent of the offerings taking aim at primary storage. The company claims that more than 8,000 customers have licensed its free deduplication technology since its 2007 release.

Rival EMC Corp. followed NetApp into primary storage deduplication in early 2009 with the release of its Celerra Data Deduplication, which actually performs compression before tackling deduplication on file-based data.

Ocarina Networks Inc. also does both compression and deduplication but takes a different path than EMC. Ocarina's ECOsystem first extracts and decompresses file-based data, then deduplicates on a variable- or sliding-block basis before compressing it.
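The variable- or sliding-block technique mentioned above is commonly done with content-defined chunking: chunk boundaries are chosen from the data itself, so an edit early in a file shifts only nearby boundaries rather than every block that follows. The sketch below illustrates the concept only and is not Ocarina's algorithm; the window size, divisor, minimum chunk size and the MD5-based fingerprint are illustrative assumptions (real systems use a true rolling hash, such as a Rabin fingerprint, that updates in constant time per byte).

```python
import hashlib

WINDOW = 16      # bytes in the sliding window (illustrative)
DIVISOR = 64     # controls the average chunk size (illustrative)
MIN_CHUNK = 128  # avoid pathologically small chunks


def chunk_boundaries(data: bytes):
    """Yield chunk end offsets chosen by the content itself."""
    start = 0
    for i in range(len(data)):
        if i - start < MIN_CHUNK:
            continue
        window = data[i - WINDOW:i]
        # Stand-in fingerprint; a production system would use a rolling
        # hash recomputed in O(1) per byte instead of hashing each window.
        if int.from_bytes(hashlib.md5(window).digest()[:4], "big") % DIVISOR == 0:
            yield i
            start = i
    yield len(data)


def chunks(data: bytes) -> list:
    """Split data at content-defined boundaries."""
    out, prev = [], 0
    for end in chunk_boundaries(data):
        out.append(data[prev:end])
        prev = end
    return out


data = bytes(range(256)) * 16     # 4 KB of varied sample data
pieces = chunks(data)
assert b"".join(pieces) == data   # chunking is lossless
```

Each chunk would then be fingerprinted and deduplicated the same way as a fixed block, with compression applied afterward, matching the extract-dedupe-compress order described for Ocarina's ECOsystem.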

Finishing up the list of current entrants in this space is Storwize Inc., which has a compression-only offering. Storwize CEO Ed Walsh contends that primary storage is not the appropriate place for data deduplication. (Storwize was acquired by IBM in July 2010.)

"You dedupe what's repetitive, and you don't find in primary data the same repetition that you see in a backup data flow," said Walsh, who was formerly the CEO of Avamar, a deduplication backup software vendor acquired by EMC in 2006.

More primary storage deduplication products on the horizon

Yet some industry analysts expect more vendors to turn their attention to primary storage deduplication. Permabit Technologies Corp., for instance, offers inline, sub-file level deduplication. Permabit targets its dedupe at archiving but claims some customers use it for primary storage. Sun Microsystems Inc. recently added built-in deduplication to its ZFS file system. Other vendors that employ the open-source ZFS technology are likely to exploit it.

"Vendors that have solutions today in the market may not be the ones you'll see in five years," said Lauren Whitehouse, a senior analyst at Enterprise Strategy Group. "It's not that they're going to go away, but I don't think they'll even be the top ones. It might be the application vendor or the operating system vendor, someone closer to the creation of data, the storage of that data, policies around that data."

Valdis Filks, a research director for storage technologies and strategies at Gartner Inc., said he expects two or three more vendors to offer deduplication for primary storage in 2010, with more to follow in 2011. By 2012, primary storage dedupe will be ubiquitous, he predicted.

"Sometimes we say a technology turns the industry upside down or on its head. Allegorically and technically, dedupe on primary storage does that," Filks said. "We are so used to writing the data to a backup dedupe device and deduping it there. If everything is deduped at source, I expect the back-end dedupe vendors to start to have lots of trouble, and they will obviously have a marketing offensive saying primary deduplication is the wrong place."

Filks said software-intensive, modern-design storage devices with a file system and intelligent block-based architectures, which have the ability to store metadata pertaining to each data block, will be best suited to primary storage data deduplication. Performance issues can be overcome through a combination of multi-core high-speed processors, low-cost DRAM for cache and solid-state drive technology, he added.

"Designers have more performance-accelerating components in storage than they have ever had before, at an affordable price," Filks said.

In the meantime, the majority of end users have been content to hold off on primary storage data reduction.

"People are OK just buying more disk drives," said Greg Schulz, founder and senior analyst at StorageIO Group. "People understand and realize that they can go in and archive, pull the data out of databases, out of email, out of file systems, and then back it up onto a deduped disk or onto a compressed tape."

How primary storage deduplication products work

Understanding how the current crop of primary storage data-reduction products works and where each of their sweet spots lies can help an IT organization to decide if the technology might be a good fit to help curb the explosive growth of storage.

"Vendors doing sub-file reduction have a much higher hurdle to get over because they have to demonstrate that they can do that with very little performance impact in primary storage use cases," said Jeff Boles, a senior analyst and director, validation services at Taneja Group.
