EMC Corp. added primary deduplication to its Isilon scale-out network-attached storage arrays today with the goal of making them a better fit for traditional enterprise workloads.
Smart Dedupe, previewed last May at EMC World 2013, is part of Isilon's OneFS 7.1 operating system launched today. The release is the first major upgrade to the OneFS 7.0 Mavericks OS that Isilon began shipping earlier this year. The Mavericks release added enterprise features to bring Isilon beyond its traditional use in high-performance computing (HPC) verticals such as media and entertainment, gas and oil exploration, and life sciences.
"We see two worlds coming together where organizations can turn to scale-out storage for big data requirements and for enterprise IT needs," said Sam Grocott, Isilon's vice president of product management.
Smart Dedupe falls on the enterprise IT side. It performs post-process, block-level deduplication on Isilon file storage, deduping along 8 KB fixed-block boundaries. Grocott said dedupe can run across an entire cluster or on individual directories, and is completely software-driven, with no aid from Isilon hardware.
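To make "post-process, fixed-block" concrete, here is a minimal Python sketch of the general technique, not Isilon's implementation. It assumes data has already been written, then a later pass cuts each file at 8 KiB boundaries, hashes each block with SHA-256 (an assumed hash; EMC has not said which one OneFS uses), and keeps a single physical copy per unique block. The names `dedupe_pass` and `read_file` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8 KiB fixed-block boundaries, as described above


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data along fixed block boundaries; the last block may be short."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def dedupe_pass(files: dict[str, bytes]) -> tuple[dict[bytes, bytes], dict[str, list[bytes]]]:
    """Hypothetical post-process pass over files that are already stored.

    Returns a block store keyed by digest (one physical copy per unique block)
    and, for each file, the ordered list of digests that reconstruct it.
    """
    store: dict[bytes, bytes] = {}
    layout: dict[str, list[bytes]] = {}
    for name, data in files.items():
        refs = []
        for block in split_blocks(data):
            # Assumption: SHA-256 digest identifies a block. A production system
            # might also byte-compare blocks on a digest match before sharing them.
            digest = hashlib.sha256(block).digest()
            # Keep the first physical copy; later duplicates become references.
            store.setdefault(digest, block)
            refs.append(digest)
        layout[name] = refs
    return store, layout


def read_file(store: dict[bytes, bytes], layout: dict[str, list[bytes]], name: str) -> bytes:
    """Rebuild a file's contents by following its block references."""
    return b"".join(store[d] for d in layout[name])


if __name__ == "__main__":
    a = b"A" * BLOCK_SIZE * 3
    b = b"A" * BLOCK_SIZE * 2 + b"B" * BLOCK_SIZE
    store, layout = dedupe_pass({"a": a, "b": b})
    logical = len(a) + len(b)
    physical = sum(len(blk) for blk in store.values())
    print(f"logical={logical} physical={physical}")
```

In this toy run, six logical blocks collapse to two unique ones. Fixed-size blocks keep the chunking step cheap, which matters when it is done purely in software, but an insert near the start of a file shifts every later boundary and can hide redundancy that variable-size chunking would find.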
Grocott said he doesn't expect Isilon customers to use dedupe much on traditional HPC workloads, because those big data files usually don't contain much redundancy. However, he said mainstream IT file storage can be reduced by up to 30%.
Smart Dedupe is a licensed feature, with pricing based on the number of nodes and capacity of deduped data.
There are other EMC products that perform dedupe -- including Data Domain, Avamar backup products and VNX unified primary storage arrays -- but Grocott said Isilon's deduplication was "home-grown" specifically for its scale-out network-attached storage (NAS) systems.
Isilon isn't the first scale-out NAS with primary deduplication. NetApp Inc. built primary dedupe into its Data ONTAP OS in 2007, and Hitachi Data Systems (HDS) added primary dedupe to its Hitachi NAS last May.
"You can see Isilon making an aggressive push toward having enterprise capabilities," said Ashish Nadkarni, research director for storage systems at Framingham, Mass.-based IDC. "The primary dedupe and the other things they're adding [with Mavericks] are about moving more into the enterprise NAS segment."
Nadkarni pointed out that Isilon's software dedupe approach is closer to NetApp's than to HDS', which uses hardware in the form of field-programmable gate arrays (FPGAs) to accelerate the hashing and chunking involved in primary dedupe.
Isilon's OneFS also includes a tool that estimates the dedupe ratio for specific directories and applications before users turn the feature on.
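A dry-run estimator of this kind can be sketched in a few lines. The Python below is a hypothetical illustration, not the OneFS tool: `estimate_dedupe` is an invented name, and it assumes the same 8 KiB fixed blocks and SHA-256 hashing as above. It walks a directory, hashes every block, and counts how many bytes would remain if each unique block were stored once, without modifying any files.

```python
import hashlib
import os

BLOCK_SIZE = 8 * 1024  # assumed to match the 8 KiB dedupe block size


def estimate_dedupe(root: str, block_size: int = BLOCK_SIZE) -> dict:
    """Hypothetical dry run: hash every fixed-size block under root.

    Nothing is rewritten; the result only estimates what a real dedupe
    pass might reclaim on this directory tree.
    """
    seen: set[bytes] = set()
    logical = physical = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            with open(os.path.join(dirpath, fname), "rb") as fh:
                while True:
                    block = fh.read(block_size)
                    if not block:
                        break
                    logical += len(block)
                    digest = hashlib.sha256(block).digest()
                    if digest not in seen:
                        # First time this block appears: it would be stored.
                        seen.add(digest)
                        physical += len(block)
    saved = logical - physical
    return {
        "logical_bytes": logical,
        "physical_bytes": physical,
        "savings_pct": 100.0 * saved / logical if logical else 0.0,
        "ratio": logical / physical if physical else 1.0,
    }


if __name__ == "__main__":
    import sys
    print(estimate_dedupe(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Savings percentage and dedupe ratio describe the same result in two ways: a 30% reduction, for example, corresponds to a ratio of about 1.43:1, since 1 / (1 - 0.30) ≈ 1.43.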
Nadkarni said users must keep in mind that dedupe ratios are only an estimate. "It's like a doctor telling you your surgery will take two hours and involve these procedures. Until you have the surgery and come out of the operating room, you won't really know whether it's a success or not," he said. "Dedupe ratios vary by data types. People won't really know the results until they do it for themselves."
Although Smart Dedupe is designed mainly for enterprise workloads, EMC claims there is one big data use case where it can help. An EMC big data blog argues that dedupe can significantly shrink the data set required to run the Hadoop Distributed File System (HDFS) on Isilon, which can be a capacity hog because it requires multiple copies of data.