While data deduplication for secondary storage was among the hottest storage technologies for 2008, data dedupe for primary data is still in its infancy. Some storage insiders expect data reduction for primary storage to grow in popularity quickly; others question whether it's ready for prime time.
There was a mad rush for data deduplication targets for backup data this year. EMC, Hewlett-Packard, IBM, Hitachi Data Systems, Sun, Sepaton and Overland all added data dedupe to disk backup products, while Data Domain and Exagrid continued to expand their platforms.
Yet only a handful of vendors offer data dedupe for primary storage. NetApp leads the way with deduplication as a built-in feature in its OnTap GX operating system. NetApp claims customers are running its dedupe on more than 16,000 of its storage systems. NetApp requires a license for data deduplication but does not charge for it.
Storwize has been shipping a primary data reduction appliance since 2005 and claims to have around 100 customers. Ocarina Networks is a newcomer, launching its NAS reduction product in April.
Storwize and Ocarina technically do compression and not data dedupe, although that distinction isn't always made from a marketing standpoint. Ocarina has taken to calling its product primary data deduplication instead of compression or capacity optimization. "Dedupe for primary storage wasn't our first choice for a working description, but that's what everybody's calling it," says Carter George, Ocarina vice president of products.
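The difference between the two techniques can be illustrated with a minimal Python sketch. It is not a description of how any vendor's product works: the 4 KB block size, the function names and the sample data are all assumptions made for illustration. Deduplication is modeled as storing each distinct fixed-size block once across all files, and compression as running zlib on each file separately.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def deduped_size(files, block_size=BLOCK_SIZE):
    """Bytes stored if every distinct block is kept once across all files."""
    seen = set()
    stored = 0
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(block)
    return stored


def compressed_size(files):
    """Bytes stored if each file is compressed on its own with zlib."""
    return sum(len(zlib.compress(data)) for data in files)


# Hypothetical data set 1: ten identical copies of one random-looking file.
rng = random.Random(0)
one_file = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))
identical_copies = [one_file] * 10

# Hypothetical data set 2: ten different files, each internally repetitive.
distinct_files = [bytes([i]) * BLOCK_SIZE for i in range(10)]

for name, files in [("identical copies", identical_copies),
                    ("distinct files", distinct_files)]:
    raw = sum(len(f) for f in files)
    print(name, "raw:", raw,
          "deduped:", deduped_size(files),
          "compressed:", compressed_size(files))
```

In this sketch, dedupe collapses the identical copies to a single block but gains nothing on the distinct files, while per-file compression does the reverse. That is why results depend so heavily on the data being stored.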
Is data deduplication a good fit for primary storage?
Some are calling it unnecessary or not the best use of dedupe technology. "We try to use the term 'primary storage' carefully," says Data Domain CEO Frank Slootman. "If you have data that is really hot, has an extremely high change rate, like transactional data, there's no sense deduplicating it. Where it does seem to matter is in a virtualized environment. It makes a lot of sense when you have multiple virtual machines running that are highly similar."
NetApp chief marketing officer Jay Kidd points to VMware VMDK files, home directories, CAD files and seismic data as good examples of primary storage that benefit from deduplication.
But primary data dedupe products are engineered differently than backup data deduplication products. On the backup side, throughput is the key feature, and backup dedupe products may introduce latency that primary storage can't afford. So dedupe for primary storage is architected for performance at the expense of deduplication ratios.
"Performance is an issue on the primary side," says Taneja Group analyst Eric Burgener. "So, solutions on the primary side will need to be different than on the secondary side. You're not getting the same ratios, but getting a lot more bang for the buck because the disk is more expensive [for primary storage]."
Amit Bar-on, IT manager of unified communications vendor Polycom's Israel office, says Storwize's STN-6000 product saves him from having to add disk. Bar-on was adding about 1 TB a year to his NetApp FAS3020 system, but has held firm at 6 TB since installing the STN-6000 in late 2007.
"I didn't buy any disk this year and will not buy disk next year," he says. "We use it on data, images, everything we run on NetApp."
Bar-on says he may look at NetApp's data dedupe to complement the STN-6000. The biggest benefit he gets from Storwize is reducing VMDK files, the virtual machine image files used by VMware, he says.
NetApp's dedupe can reduce 100 TB of VMDK files to 15 TB to 20 TB, Kidd says. "Customers never believe us until they see it. When we show them, they have a 'Holy [cow]!' moment and say, 'What else can I apply this to?'"
Storwize claims its latest software upgrades provide a 15:1 data reduction ratio; Ocarina, used mainly on image files, claims a 10:1 reduction. All data dedupe and compression vendors remind users that results vary by file type and how often files are changed.
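To compare these figures on the same footing, a reduction ratio can be converted to a fraction of capacity saved, and a before-and-after capacity can be converted to a ratio. The short Python sketch below does only that arithmetic on the numbers quoted above; the function names are illustrative, and the results say nothing about what a given data set will actually achieve.

```python
def reduction_ratio(before_tb, after_tb):
    """Reduction ratio N (as in N:1) from capacity before and after."""
    return before_tb / after_tb


def savings_fraction(ratio):
    """Fraction of capacity saved for a reduction ratio of ratio:1."""
    return 1.0 - 1.0 / ratio


# NetApp's VMDK example: 100 TB reduced to between 20 TB and 15 TB.
low = reduction_ratio(100, 20)
high = reduction_ratio(100, 15)
print(f"NetApp VMDK example: {low:.1f}:1 to {high:.1f}:1")

# Vendor-claimed ratios expressed as a share of capacity saved.
for vendor, ratio in [("Storwize", 15), ("Ocarina", 10)]:
    print(f"{vendor} {ratio}:1 -> {savings_fraction(ratio):.1%} saved")
```

By this arithmetic, the NetApp example works out to between 5:1 and roughly 6.7:1, or 80% to 85% of capacity saved, while a 10:1 ratio corresponds to 90% saved and 15:1 to about 93%.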
More vendors offering data deduplication technology for primary storage
More storage vendors will get into the primary reduction game soon. Ocarina has forged partnerships for its ECO System primary storage compression device with NAS vendors HP, BlueArc and Isilon. Ocarina's Optimizer and Reader software will run directly on HP's ExDS9100 NAS blades and Ocarina has developed custom-built appliances to run with Isilon and BlueArc storage.
Riverbed has previewed its Atlas device, which works with its current WAN optimization product to dedupe primary data. It is due to hit the market in mid-2009.
HiFN has card-level data reduction accelerators that can run in storage systems, and greenBytes sells an appliance that combines the Sun Fire x4540 server and ZFS.