
Deduplication now focusing on primary storage

IT managers have become as obsessed with reducing the amount of redundant data in their storage as Americans are with reducing their waistlines. But this trend has been focused mainly on secondary storage--backup and archiving apps, where most of the redundant data lives in storage infrastructures.

A handful of vendors are trying to take duplicate data out of primary storage even though there's a lot less redundant data in primary (tier 1) storage than in secondary storage. So data reduction ratios in primary storage will be much lower than the 15:1 or 20:1 ratios common when deduping secondary storage. "But you'll be getting a lot more bang for the buck because tier 1 disk is more expensive," says Eric Burgener, senior analyst and consultant at Taneja Group in Hopkinton, MA.

But as the use of virtualization increases, more and more virtual machines are running on one physical server. This creates multiple instances of OSes and apps, which in turn will increase the level of redundant data on expensive primary storage.
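A minimal sketch of why consolidated virtual machines raise redundancy, assuming a simple fixed-block dedupe scheme (this is not any vendor's method). The block size, the `dedupe_footprint` helper and the synthetic "VM images" are all hypothetical and chosen only for illustration: ten images share one OS image and each adds its own unique data.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedupe_footprint(images, block_size=BLOCK_SIZE):
    """Return (logical_bytes, stored_bytes) after fixed-block dedupe.

    Each block is identified by its SHA-256 digest; a block whose digest
    has already been seen is counted as logical data but not stored again.
    """
    seen = set()
    logical = 0
    stored = 0
    for image in images:
        for offset in range(0, len(image), block_size):
            block = image[offset:offset + block_size]
            logical += len(block)
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(block)
    return logical, stored


# Hypothetical data: one shared "OS image" of 64 blocks, plus 16 blocks of
# unique application data per virtual machine, across ten virtual machines.
os_image = os.urandom(BLOCK_SIZE * 64)
images = [os_image + os.urandom(BLOCK_SIZE * 16) for _ in range(10)]

logical, stored = dedupe_footprint(images)
print(f"logical: {logical} bytes, stored: {stored} bytes, "
      f"ratio: {logical / stored:.2f}:1")
```

In this toy setup the shared OS blocks are stored once, so the ratio grows as more machines are added. Real images differ in alignment and content, so actual ratios depend heavily on the data and on the dedupe technique used.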

The next question is: When data reduction is performed on primary storage, is it still dedupe or something else (usually compression)? One could claim that, at the file level, Microsoft Office offers some kind of generic dedupe functionality, according to John Matze, VP of business development at Hifn, which makes card-level data reduction accelerators. But "that's a partial dedupe that exists in Microsoft's file system," which he calls "poor man's data deduplication."

"Deduplication is well-suited for static, redundant data, but it's not well-suited for primary storage," says Peter Smails, VP of worldwide marketing at Storwize, which began shipping a primary storage data reduction appliance in 2005.

Ocarina Networks brought out the second major release of its storage optimization product in September. Ocarina's Extract, Correlate and Optimize (ECO) System combines compression, dedupe and more than 100 file-specific information extraction algorithms. But even though ECO makes some use of dedupe, "their optimizer doesn't work like a dedupe engine at all," says Burgener. "And they get some of the highest reduction ratios in primary storage."

According to Carter George, VP of products at Ocarina, primary storage data reduction is all about shrinking the size of files. "The file types driving storage growth are already compressed," he says. "You can't compress the same file twice with generic algorithms."
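George's point about compressing a file twice can be seen in a small experiment. The sketch below uses Python's standard `zlib` module as a stand-in for a "generic algorithm"; the word list and the synthetic text are made up for illustration and are not drawn from any product.

```python
import random
import zlib

# Hypothetical, seeded sample text built from a small vocabulary so the
# first compression pass has plenty of redundancy to remove.
rng = random.Random(0)
vocabulary = ["storage", "primary", "backup", "archive", "dedupe", "file",
              "block", "tier", "disk", "tape", "array", "data", "ratio",
              "server", "virtual", "machine", "latency", "capacity"]
text = " ".join(rng.choice(vocabulary) for _ in range(20000)).encode("ascii")

once = zlib.compress(text)   # first pass: removes most of the redundancy
twice = zlib.compress(once)  # second pass: input is already high-entropy

print(f"original:         {len(text)} bytes")
print(f"compressed once:  {len(once)} bytes")
print(f"compressed twice: {len(twice)} bytes")
```

The first pass shrinks the text substantially, while the second pass has almost nothing left to remove and typically produces output about the same size or slightly larger. That is the situation facing already-compressed file types such as images and media, and it is the gap that format-specific approaches aim to address.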

Ocarina offers what it calls content-aware compression. "It's easy to see the advantage of shrinking a file, but what about performance?" asks George. Application performance is far more critical in primary storage than in backup and archiving, which tend not to be performance oriented. George defines primary storage performance as "time to first byte," and says it differs by market and user. "You might be able to take 30 seconds to open Word, but in HPC, 1 [millisecond] latency might be death," he says.

According to Taneja Group's Burgener, anyone trying to figure out if it makes sense to do data reduction in primary storage has to answer two questions. What am I paying in terms of dollars/GB on the primary side? And how much less primary storage will I have to buy over time? "If you're buying EMC and paying $20/GB to $25/GB, you have 200TB of data and you can get a 10:1 reduction level, then it's simple to figure out if it will be worth it," he says.
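Burgener's example can be worked through with a few lines of arithmetic. The sketch below assumes decimal units (1TB = 1,000GB) and ignores the price of the data reduction product itself, along with growth, power and maintenance; the function name and parameters are hypothetical.

```python
def avoided_spend(capacity_gb, price_per_gb, reduction_ratio):
    """Estimate disk spend avoided by reducing data at the given ratio.

    Assumes cost scales linearly with capacity and ignores the cost of
    the reduction product itself.
    """
    cost_before = capacity_gb * price_per_gb
    cost_after = cost_before / reduction_ratio
    return cost_before - cost_after


CAPACITY_GB = 200 * 1000   # 200TB, assuming 1TB = 1,000GB
REDUCTION_RATIO = 10       # the 10:1 level from the quote

for price in (20, 25):     # dollars per GB, from the quote
    saved = avoided_spend(CAPACITY_GB, price, REDUCTION_RATIO)
    print(f"${price}/GB: about ${saved:,.0f} of disk spend avoided")
```

Under those assumptions, 200TB at $20/GB to $25/GB costs $4 million to $5 million unreduced, and a 10:1 reduction avoids roughly $3.6 million to $4.5 million of that. Whether it is "worth it" then comes down to comparing that figure with what the reduction product costs to buy and run.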

George notes that shrinking files changes other things in the storage equation. "The first wave of users will be people avoiding disk purchases," he says. "The second wave will be people storing things they never thought of archiving before, like transferring seismic archives from tape to disk."

Right now, the group of vendors offering capacity optimization for primary storage is small. In addition to the three vendors mentioned, NetApp bundles optimization into its Ontap GX OS; Greenbytes offers an appliance that combines the Sun Fire x4540 server and the ZFS file system; and Riverbed has announced a box that sits in the WAN pipeline. But Burgener thinks all of the major storage vendors will offer data reduction for primary storage in some way in a few years. Yet array vendors may be caught in a bit of a vise: "If they have it in their arrays," he says, "that means you buy less storage."

--Peter Bochner, with additional reporting by Rachel Kanner
