With data deduplication rapidly becoming a standard feature in backup products, storage administrators are eager to apply the data reduction technology beyond backup. But while some non-backup products already support dedupe, technology hurdles must be overcome before it becomes ubiquitous.
Data deduplication and compression can help tame data growth and allow companies to spend less on disk, but these technologies are in their early days for primary data.
Today, NetApp's FAS arrays can perform post-process, block-level deduplication. EMC Corp.'s Celerra NAS does file-level dedupe. NEC Corp. of America's HydraStor and Permabit Technology Corp.'s Enterprise Archive archiving products have added data reduction for long-term repositories, while Data Domain Inc. has positioned its DD Series for use as a nearline or archival device in addition to backup. Some storage pros have begun to visualize a world with dedupe everywhere, but for now that remains a dream.
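The block-level approach used by products like NetApp's can be illustrated in miniature: split data into fixed-size blocks, fingerprint each one, store each unique block once and keep references for reassembly. The sketch below is a hypothetical illustration of the general technique, not any vendor's implementation; the 8 KB block size is an assumption for the example.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8 KB fixed-size blocks, assumed for illustration

def dedupe(data: bytes):
    """Split data into fixed-size blocks; keep one copy of each
    unique block plus an ordered list of references."""
    store = {}   # fingerprint -> block contents, stored once
    refs = []    # ordered fingerprints needed to rebuild the original
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)  # duplicate blocks are not stored again
        refs.append(fp)
    return store, refs

def rehydrate(store, refs):
    """Reassemble the original data from the reference list."""
    return b"".join(store[fp] for fp in refs)

# A volume with heavy duplication: 100 identical 8 KB blocks
volume = b"x" * BLOCK_SIZE * 100
store, refs = dedupe(volume)
print(len(volume), len(store) * BLOCK_SIZE)  # logical vs. physical bytes
assert rehydrate(store, refs) == volume
```

Note that every block written requires a fingerprint lookup against the store, which is why index performance dominates dedupe throughput, a point that comes up again below.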
"Three of the eight people I work with have come to me looking for ways to reduce the amount of data we're keeping online," Tory Skyers, infrastructure engineer at a leading credit card issuer, told SearchStorage.com. The problem, he said, is that none of the products available seem mature enough to be a good risk in a risk-averse climate. "Right now, about the only thing I can offer them is NetApp's V-Series gateway with data deduplication, but it's still a relatively young product for their liking," Skyers said.
To be useful for primary storage, dedupe must get along better with applications. "To do that you'd have to change your thinking around proprietary things," said David Grant, data center manager at Mitel Networks Corp. in Kanata, Ontario. "Applications would have to be able to rebuild deduplicated data without needing a separate restore."
One storage architect at a large telecom, who asked not to be named because he's not authorized to identify the products his firm uses, said he'll add deduplication wherever he can. He expects to try the single-instancing features in EMC's new Celerra arrays.
"Using pockets of dedupe, you get some benefit, but it's not the same as it would be with a global dedupe domain for the entire enterprise," he said. "You also have to be careful what treadmill you're getting on, and how long you're going to have to stay on it."
Another thorny problem is getting smooth integration between parts of the IT infrastructure performing data reduction. For example, CommVault's Simpana software is often used with NetApp filers, which do primary dedupe. But that deduped data would have to be re-inflated before being handed off to Simpana, and then deduped again before being written to the backup infrastructure, according to Brian Brockway, senior director of product management at CommVault.
Performance is another major hurdle to deduping primary data. Data deduplication is "inherently going to be I/O bound," according to Frank Slootman, CEO at Data Domain. At an 8 KB block size, typical storage volumes can require "up to 25,000 lookups per second," he said. "Going to disk 25,000 times per second is going to be excruciatingly slow."
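Slootman's figure follows from simple arithmetic: one fingerprint lookup per block means throughput divided by block size. The back-of-the-envelope calculation below uses assumed but representative numbers (200 MB/s of primary I/O, a 5 ms random-read latency for a disk-resident index) to show why the lookup rate overwhelms spinning disk.

```python
# Back-of-the-envelope for the dedupe index lookup rate.
# The 200 MB/s throughput and 5 ms seek time are illustrative assumptions.

BLOCK = 8 * 1024                       # 8 KB dedupe block size
throughput = 200 * 1024 * 1024         # 200 MB/s of primary I/O
lookups_per_sec = throughput // BLOCK  # one fingerprint lookup per block
print(f"{lookups_per_sec:,} lookups/sec")  # 25,600 -- Slootman's ~25,000

seek_ms = 5.0                          # typical random read on a disk drive
disk_lookups = 1000 / seek_ms          # lookups one spindle can serve
print(f"one disk: {disk_lookups:.0f} lookups/sec")  # ~200 per spindle
```

At roughly 200 random reads per second per spindle, a disk-resident index would need on the order of a hundred drives just to keep up with the lookups, which is why the conversation turns to solid-state storage.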
But solid-state drives (SSDs), which excel at exactly that kind of high-IOPS workload, could help push dedupe into primary storage. Slootman suggests dedupe might help the solid-state medium, too.
"What makes solid state economical for primary storage is deduplication, just like deduplication brought disk down to the economics of tape for backup," he said. Slootman said Data Domain is running a system with SSD inside, but it's more of an experiment at this stage. "I'm not setting [release] dates around it," he said. "It's been a research inquiry."