Data reduction methods for primary storage, such as deduplication and compression, received a shot in the arm in mid-2010 when Dell Inc. and IBM acquired companies specializing in data reduction techniques. But these firms aren't alone in staking out the territory; most of the major data storage vendors are developing products that will have a greater impact on reducing primary storage capacity requirements. "If you had asked me a year ago, I would have said the outlook for primary storage optimization is boring. Today, that picture is much different. We're surrounded by deduplication and content optimization and compression technologies that are all distinctly different," said Jeff Boles, a senior analyst and director, validation services at Hopkinton, Mass.-based Taneja Group. "Primary storage optimization has become a checkbox item for everybody in the file space. When it comes to block [storage], we're sorting out our expectations."
While data reduction methods for primary storage have become more interesting, it's still early days for deduplication and data compression technologies, and there's no telling how long it will take for all the pieces to settle into place. A proven technology in the backup space, deduplication made inroads into primary storage with NetApp Inc.'s 2007 introduction of a free add-on for its FAS and NearStore customers. NetApp promoted its volume-based, post-process deduplication, in particular, as an effective way to remove redundant copies of virtual machine operating systems in VMware Inc. virtual server environments. NetApp is also planning to add compression and expand the 16 TB volume limit for data deduplication. Rival EMC Corp. made its initial pitch for primary data reduction two years ago with file-level deduplication and compression, on a per-file-system basis, as a free operating system feature with its EMC Celerra NS Series systems. In August, EMC extended its data reduction to block-based data with the introduction of LUN compression for its Clariion CX4 and Celerra NS lines. Beyond NetApp and EMC, primary data reduction rested largely with startups or niche vendors until July, when Dell bought Ocarina Networks Inc. Ocarina sold an appliance, which combines deduplication and sophisticated content-aware compression algorithms, but had also been working on an embedded deduplication product it intended to sell through OEM deals with storage vendors. Just 10 days after the Dell-Ocarina announcement, IBM reported plans to purchase Storwize Inc., which sold a real-time data compression appliance and also had OEM deals in the works.
In June, Permabit Technology Corp. had disclosed its intention to OEM its Albireo High Performance Data Optimization Software and has since added design wins with BlueArc Corp. for NAS and Xiotech Corp. for block-based storage. Permabit executives say more OEM deals are on the way.
Permabit's sub-file deduplication works inline, parallel or post-process, and is attracting attention on such claims as minimal performance impact and scalability to petabytes of information, thanks to its grid architecture. Other entrants in the primary storage data reduction space include Compellent Technologies Inc. (which plans to add dedupe next year), GreenBytes Inc., Nexenta Systems Inc. and Oracle Corp. (with its Sun ZFS file-system-based deduplication). David Russell, a research vice president at Stamford, Conn.-based Gartner Inc., said deduplication needed approximately five years to catch on in the backup space, but predicted a shorter timeframe for dedupe and compression in primary storage. Whether dedupe and/or data compression makes more sense in any given situation will depend on the use case or workload, he said. "While it's probably still fair to say that this is an evolving and somewhat nascent market, the idea of data reduction for primary storage is definitely taking hold," Russell said.
Data reduction methods: Adoption outlook for primary storage
A deeper look into the roadmaps that Dell and IBM have started to lay out sheds light on the direction primary data reduction methods may take in the coming years. But how effectively the technology will help IT shops address the storage problems associated with explosive data growth remains an open question.
"Many of these vendors have to stand up and deliver some significant proofs of concept about their performance under demanding production workloads before I'll say, 'Hey, I expect adoption everywhere,'" Taneja Group's Boles said. "The verdict is still out in my book." However, Tony Asaro, founder and senior analyst at The INI Group LLC, predicted that primary storage deduplication will be a "requisite going forward in the next few years" and that "customers have to start demanding it," even if it's currently not a priority item for them when they make a primary storage decision. "If primary performance is really important to you, you're probably going to want it as a post process; and if managing capacity in real-time is important to you, then you're going to want it inline," he said. But, he added, "Moore's Law is really going to make primary dedupe much more doable as time goes on. [Primary dedupe] is CPU-intensive, especially if you do it inline. So, you're going to need faster processors, and they keep getting faster."
Dell's sweeping primary data reduction vision
Perhaps the most comprehensive vision statement for primary data reduction comes from Dell. Carter George, formerly vice president of products at Ocarina and now director of strategy and business development at Dell, said the vendor plans "consistent and compatible dedupe in every product," including storage, servers and possibly certain types of applications. Moving data between products in its most compressed and/or deduped form can not only produce disk space savings, but speed storage management tasks such as replication, backups, archiving and tiering, as well as save network bandwidth and processing power, George said. He predicted that vendors offering data reduction methods for primary storage will differentiate themselves in two ways: by how well and how fast they deduplicate or compress data, and by their end-to-end IT infrastructure capabilities. Companies such as Hewlett-Packard (HP) Co. and IBM, which also sell servers and storage, would be potential competitors in such an end-to-end scenario. But because data deduplication and compression are proprietary technologies, such benefits hinge on customers buying multiple products from the same vendor, or choosing products that make use of the same technology. If not, the system must expand the data back to its full size, or rehydrate it, before passing it to a product that supports another vendor's deduplication and/or compression technology.
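The rehydration penalty described above can be illustrated with a toy model. The sketch below is not any vendor's implementation; the `DedupeStore` class, the `replicate` helper, and the block sizes and hash algorithms are all hypothetical stand-ins for a proprietary scheme. It assumes simple fixed-size block deduplication and only counts the bytes that would have to cross the wire.

```python
import hashlib


class DedupeStore:
    """Toy block store; block_size and algo stand in for a vendor's scheme."""

    def __init__(self, block_size: int, algo: str):
        self.block_size = block_size
        self.algo = algo
        self.blocks = {}  # digest -> unique block contents

    def write(self, data: bytes) -> list:
        """Store data, keeping one copy per unique block; return its recipe."""
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.new(self.algo, block).hexdigest()
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def read(self, recipe) -> bytes:
        """Rehydrate: expand a recipe back into the full-size data."""
        return b"".join(self.blocks[d] for d in recipe)


def replicate(src, dst, recipe):
    """Copy one object from src to dst; return (bytes sent, recipe on dst)."""
    if (src.block_size, src.algo) == (dst.block_size, dst.algo):
        # Same scheme: ship only the unique blocks the target lacks.
        missing = {d: src.blocks[d] for d in set(recipe) if d not in dst.blocks}
        dst.blocks.update(missing)
        return sum(len(b) for b in missing.values()), list(recipe)
    # Different scheme: rehydrate, send full size, let the target re-dedupe.
    full = src.read(recipe)
    return len(full), dst.write(full)


data = b"A" * 4096 * 10                      # ten identical 4 KB blocks
src = DedupeStore(4096, "sha256")
recipe = src.write(data)
same_vendor = DedupeStore(4096, "sha256")    # compatible target
other_vendor = DedupeStore(8192, "sha512")   # incompatible target
sent_same, _ = replicate(src, same_vendor, recipe)
sent_other, other_recipe = replicate(src, other_vendor, recipe)
print(sent_same, sent_other)                 # 4096 vs. 40960 bytes
```

In this contrived case the compatible target receives a single 4 KB block, while the incompatible one receives the full 40 KB before deduplicating it again on its own terms. Real products would differ in many ways (compression, metadata, variable-size chunks), so the numbers only show the direction of the effect.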
To realize its end-to-end vision, Dell will have to confront the challenge presented by the different backup targets its customers use, from CommVault Systems Inc., EMC/Data Domain and Symantec Corp. Customers would dedupe with Dell/Ocarina on primary storage and another vendor's technology on backup. "The strategy long-term is to see if we can get some compatibility on the backup side with all the primary storage," George said. "For the immediate future, the goal of [Dell/]Ocarina is to get integrated with the primary storage stuff, and maybe we'll work with partners on backup." In the primary storage space, Dell plans to focus on integrating the technology into both its NAS and block-based storage products, with an eye toward making the technology invisible to customers in much the same way that a RAID level is, George said. "We very much think that embedding dedupe right into a file system makes a ton of sense," he said. "Insofar as Dell products have file systems, you'll see our dedupe embedded in there in very much the way that dedupe is in ZFS today." George said customers will likely see two tiers of offerings: "Ocarina Basic" with straightforward deduplication and simple compression as a built-in feature of a storage system, possibly for free; and "Ocarina Advanced" with content-aware compressors and more sophisticated policy options for dedupe, likely at an extra charge. Dell may sell an appliance for NAS, as Ocarina Networks did before the acquisition, and remains interested in continuing Ocarina's plans for an embeddable edition for NAS vendors, a direct-attached storage (DAS) option and a port for Windows servers, according to George.
IBM embeds Storwize compression; HP StoreOnce to primary storage?
IBM's recent direction statement on its Storwize acquisition is similar to the roadmap articulated by Dell, with the exception of the server discussion. IBM executives said the company will embed Storwize compression throughout the IBM storage portfolio. IBM also rebranded Storwize's main product as the IBM Real-time Compression Appliance for NAS and discussed plans to use the Storwize brand for a new storage virtualization disk array, which does not yet include the data reduction capabilities. "You won't see Storwize/IBM come out with a Fibre Channel appliance. Customers don't want the complexity of having a Brocade switch, an IBM Fibre Channel compression appliance and an EMC array sitting in a stack," said Steve Kenniston, Storwize's former vice president of technology strategy and now IBM's global storage efficiency evangelist. "The future is all of this technology will be embedded." Echoing the sentiments of Dell's George, Kenniston claimed the file system approach, as exemplified by Oracle's ZFS, is the right model for data reduction methods in primary storage. "If I could do this in the file system, why not?" he said. IBM plans early next year to add a file system interface to its ProtecTier inline block-level deduplication backup product, according to Victor Nemechek, IBM's market segment manager for deduplication. Nemechek said that could prompt customers to use ProtecTier for primary data. "It's almost exclusively used for backup today, but we have seen customers say they want to use it for primary storage," Nemechek said. IBM isn't the only company tweaking its backup deduplication. HP plans to make its StoreOnce backup deduplication available on primary storage, although on a less aggressive time frame than IBM. Company executives say they will eventually extend HP StoreOnce technology to the X9000 scale-out NAS product.
But Lee Johns, director of product marketing for HP StorageWorks, listed primary storage as a low priority in HP's dedupe roadmap. "I watch with a skeptical eye whether HP can actually use StoreOnce for primary dedupe because it was developed for backup dedupe," The INI Group's Asaro said. "Those are different technologies and different algorithms."
Is standardization a pipe dream?
The sort of end-to-end capabilities that Dell and HP ultimately envision, with data passing between systems in deduped and/or compressed form, with no rehydration required, will be restricted to their proprietary environments unless there's a movement for interoperability standards.
The different vendors' deduplication and compression technologies are like snowflakes; no two are exactly alike. Some operate inline or real-time; others post-process. Some work on data blocks of fixed size; others operate on blocks of variable size; some get more granular than others. Some use sophisticated hashing algorithms to pinpoint identical data; others use simple algorithms. Some let users define the scope of deduplication; others restrict dedupe to a specific storage volume or file system. "If I were a user, I'd be asking, 'Why do I have to deal with an infinite number of deduplication and compression algorithms when they're all essentially doing the same thing?'" said Larry Freeman, a senior marketing manager for storage efficiency solutions at NetApp. "The problem is, right now, you've got a very competitive environment, and nobody wants to share because then they're giving up their patents and their intellectual property." Freeman said the topic of a common standard for deduplication came up briefly at a Storage Networking Industry Association (SNIA) meeting, but "the discussion quickly stopped when we realized that it can never happen at this point.
"Nobody's going to sit down and even do performance standards at this point because nobody wants to compare their deduplication to somebody else's," Freeman continued. "Nobody wants to be last on the list of who's best at deduplication. So, we're quietly collaborating on things like promoting deduplication and talking about the reasons that you should deduplicate."
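To make the fixed-size versus variable-size distinction mentioned above more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any vendor's algorithm: the function names, the 64-byte block size, the 16-byte window and the use of CRC32 as a boundary test are all arbitrary choices made for the example. It compares how many chunks two nearly identical byte streams still have in common after a single byte is inserted at the front of one of them.

```python
import hashlib
import random
import zlib


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Cut data at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, divisor: int = 64) -> list:
    """Cut data wherever a checksum of the last `window` bytes hits a target.

    Boundaries depend only on nearby content, so they move with the data
    when bytes are inserted earlier in the stream.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.crc32(data[end - window:end]) % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def shared_chunks(a: list, b: list) -> int:
    """Count distinct chunks (by SHA-256 digest) present in both lists."""
    digests_a = {hashlib.sha256(c).digest() for c in a}
    digests_b = {hashlib.sha256(c).digest() for c in b}
    return len(digests_a & digests_b)


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(20000))
shifted = b"X" + data  # same content, pushed one byte to the right

fixed_shared = shared_chunks(fixed_chunks(data), fixed_chunks(shifted))
variable_shared = shared_chunks(content_defined_chunks(data),
                                content_defined_chunks(shifted))
print("fixed-size chunks still shared:   ", fixed_shared)
print("variable-size chunks still shared:", variable_shared)
```

With fixed offsets, the one-byte shift changes the contents of every block, so almost nothing matches. With content-defined boundaries, only the chunk nearest the insertion changes and the rest still match. That gain comes at the cost of extra computation per byte, which is one reason the approaches differ from product to product.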