While virtual tape library (VTL) vendors have been scrambling to add data deduplication to their products in recent months, the technology is spreading to archiving, replication and even primary storage.
A few examples: NetApp claims thousands of its customers have licensed its dedupe for primary storage, Data Domain is moving to bring dedupe to secondary and archiving applications, and switch vendor Brocade plans to get into the dedupe act with a fabric-based replication device.
NetApp has gone against the grain with dedupe. It's the first storage vendor to offer deduplication for primary data and one of the last to put it in its VTL. According to Chris Cummings, NetApp's senior director of data protection solutions, 2,500 customers had activated NetApp's free dedupe utility in its Data Ontap OS on 10,000 systems as of the end of July, and most are using it for primary storage.
Greg Stazyk, systems coordinator at the Michael Smith Genome Sciences Centre (GSC) in Vancouver, BC, says he's been using NetApp's dedupe for primary data on his NetApp FAS6070 system since last year. He says his biggest fear proved to be unfounded, and he's happy with the results. "I haven't seen performance issues, which was one of the concerns I had," he says.
Stazyk gets the most benefit from deduping data sets such as home directories and virtual machine volumes. "We use deduplication on a couple of different types of data sets. Some of the data sets are very high turnover, and we don't get much benefit," says Stazyk. "But we do have data sets that are static and we get good space savings there. On average, we get about 17% to 20% disk savings when we've applied it to static data sets."
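Why static data sets dedupe well while high-turnover ones don't can be sketched with a toy model of block-level deduplication, where blocks are identified by content hash and only one copy of each unique block is stored. This is a generic illustration, not NetApp's actual Data Ontap implementation; the block contents and sizes are made up for the example.

```python
import hashlib

def dedupe_savings(blocks):
    """Fraction of space saved by keeping only one copy of each
    unique block, where identity is a SHA-256 content hash."""
    seen = set()
    for block in blocks:
        seen.add(hashlib.sha256(block).hexdigest())
    return 1 - len(seen) / len(blocks)

# A static data set with heavy duplication (e.g. home directories
# holding many copies of the same files) dedupes well...
static = [b"report-v1"] * 8 + [b"report-v2"] * 2
# ...while a high-turnover set of mostly unique blocks does not.
churn = [bytes([i]) for i in range(10)]

print(f"static: {dedupe_savings(static):.0%} saved")  # static: 80% saved
print(f"churn:  {dedupe_savings(churn):.0%} saved")   # churn:  0% saved
```

The same mechanics explain Stazyk's numbers: the more a data set repeats itself block for block, the higher the savings, which is why his static volumes yield consistent gains while churning data yields little.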
Others question the value of deduping primary data. Data Domain CEO and president Frank Slootman argues that primary data doesn't live long enough or take up enough space to make it worth deduping. But part of the argument is over how to define primary data. Slootman says some of his customers dedupe home directories, one of the use cases cited by GSC's Stazyk.
Arun Taneja, founder and consulting analyst at Hopkinton, MA-based Taneja Group, maintains that it takes a different technology to deduplicate primary data than secondary data, and argues that compression technologies from startups Ocarina Networks and Storwize are better suited for primary data. He says NetApp's technology is good enough as a freebie in the operating system, but would not be considered sufficient for backup data.
"What NetApp is doing basically is saying, 'For zero cost I'll give you some reduction,'" says Taneja. "I wouldn't say the effect is zero, but it's nominal, not something you'd write home about."
As evidence, he points out that NetApp took longer to add dedupe to its VTL because deduplicating backup data is much more complicated.
Data Domain is moving dedupe beyond pure backup into nearline and archiving, which the firm's Slootman calls "the closest cousin" to backup. "We're going to be gradually adding capabilities to get more transactional-oriented storage, but it's unlikely you will see us running on Oracle," he says.
Data Domain hasn't changed its technology for nearline storage, but is adding features to make it a better fit, such as a RetentionLock file-locking application that rolled out in June.
Dedupe is commonly used in WAN optimization products, but Brocade is developing what it calls the first Fibre Channel-based device to deduplicate and replicate over the WAN, due out in late 2009. Martin Skagen, Brocade's CTO of data center infrastructure, describes it as an extension of Fibre Channel over IP products.
"The data gets reduced while it's in transit," he says. "It's not a data at rest thing, but data in flight. We feel we can reduce the amount of data that goes over the wire by six to 12 times. It's a pretty significant reduction."
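The "data in flight" idea can be sketched as a sender that transmits each chunk in full the first time it appears and a short hash reference on every repeat, with the receiver rebuilding the stream from its own cache. This is a generic illustration of in-flight deduplication, not Brocade's protocol; the tag bytes, SHA-1 chunk identity, and chunk sizes are assumptions for the example.

```python
import hashlib

REF = b"R"  # tag + 20-byte digest, referencing a chunk the peer has seen
RAW = b"D"  # tag + full chunk payload

def send_stream(chunks, sent_cache):
    """Encode chunks for the wire: full data on first sight,
    a 21-byte hash reference on every repeat."""
    wire = []
    for chunk in chunks:
        digest = hashlib.sha1(chunk).digest()
        if digest in sent_cache:
            wire.append(REF + digest)  # 21 bytes instead of the whole chunk
        else:
            sent_cache[digest] = chunk
            wire.append(RAW + chunk)
    return wire

def receive_stream(wire, recv_cache):
    """Rebuild the original chunk stream from wire messages."""
    out = []
    for msg in wire:
        if msg[:1] == REF:
            out.append(recv_cache[msg[1:]])
        else:
            chunk = msg[1:]
            recv_cache[hashlib.sha1(chunk).digest()] = chunk
            out.append(chunk)
    return out

# A replication stream full of repeated blocks shrinks on the wire.
chunks = [b"A" * 1024, b"B" * 1024] + [b"A" * 1024] * 8
wire = send_stream(chunks, {})
raw_bytes = sum(len(c) for c in chunks)
wire_bytes = sum(len(m) for m in wire)
print(f"reduction: {raw_bytes / wire_bytes:.1f}x")  # reduction: 4.6x
```

How much the wire traffic shrinks depends entirely on how repetitive the replicated data is, which is why vendors quote a range like "six to 12 times" rather than a fixed ratio.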