In last year's Storage magazine survey of storage managers, 69% said they plan to implement or evaluate deduplication this year. With data deduplication becoming so popular, many wonder whether it can go beyond backup to primary (or online) data.
But for now, data deduplication remains almost exclusively a backup technology. Nearly all products that use data deduplication are either backup applications or virtual tape libraries (VTLs). However, late last year Data Domain Inc. added support for nearline storage (reference data, online archival and remote disaster recovery), and Network Appliance Inc. (NetApp) offers A-SIS software that works on certain primary, or Tier 1, applications.
Nearly everyone agrees that data deduplication isn't ready for high-transactional data and perhaps never will be. Still, deduplication can do more than back up data as an alternative to tape.
Data Domain CEO Frank Slootman said that customers are using his company's devices for what he calls "general purpose" storage, but that you'll never see deduplication on transactional databases. "We're not running on Oracle yet, not quite," he said. "That's performance-optimized rather than capacity-optimized storage. They're using us for home directories; for example, employees are mapping their directories to our storage."
As for deduplication on those Oracle databases, don't count on it. "It's unlikely that you'll ever see us running on Oracle," Slootman said. "Deduplication is not interesting unless data has some shelf life. If data lives for a nanosecond, why deduplicate it? That kind of data is only a fraction of people's storage requirements. You're not going to dedupe for 1% of your data."
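Slootman's point, that deduplication pays off only when data sticks around and recurs, can be illustrated with a minimal sketch. It assumes fixed-size 4 KiB chunks identified by SHA-256 fingerprints; the function `dedupe_ratio` and the test data are hypothetical, and this is not how Data Domain or NetApp actually implement deduplication.

```python
import hashlib
import os


def dedupe_ratio(streams, chunk_size=4096):
    """Return logical bytes / physical bytes after fixed-size chunk dedupe.

    Each stream is cut into chunk_size pieces, and a chunk is stored
    only the first time its SHA-256 fingerprint is seen.
    """
    store = {}  # fingerprint -> chunk bytes (the physical store)
    logical = 0
    for data in streams:
        logical += len(data)
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    physical = sum(len(c) for c in store.values())
    return logical / physical


# Hypothetical 1 MiB data set of random bytes (256 unique 4 KiB chunks).
base = os.urandom(1 << 20)

# Short-lived data: one copy, nothing repeats, so nothing is saved.
print(dedupe_ratio([base]))  # 1.0

# Long-lived data: 20 nightly full backups, each with one 4 KiB chunk changed.
backups = []
for night in range(20):
    copy = bytearray(base)
    copy[night * 4096:(night + 1) * 4096] = os.urandom(4096)
    backups.append(bytes(copy))
print(round(dedupe_ratio(backups), 1))  # 18.6
```

A single copy of unique data yields 1:1. Twenty full backups that each change one chunk yield about 18.6:1, because 5,120 logical chunks map onto only 276 stored ones (the 256 original chunks plus 20 changed ones).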
NetApp offers A-SIS data deduplication for what it calls "light-duty" applications as a feature of its Data OnTap operating system. By light-duty applications, NetApp means volumes that hold primary data that is not performance-sensitive.
"We think deduplication is not just for backup data, but for primary storage," said Jay Kidd, NetApp's chief marketing officer. "Hundreds of customers have turned on dedupe in OnTap within FlexVol [volume-provisioning software]. We've seen it used across a wide range of applications. We're not encouraging people to do it with anything that's transactional or performance-intensive, but people are doing it with things like home directories, CAD files and virtual machine instances."
NetApp went against the tide with deduplication, starting with primary storage instead of its NearStore VTL product. Kidd said NetApp is in the final testing stage of deduplication for NearStore and will soon offer it as a software add-on.
"It's a different algorithm than in OnTap, and we're taking testing and validation of it very seriously," Kidd said. "With anything related to deduplication, quality has to be a notch higher than anything else. If you lose a piece of disk or a piece of the archive, you lose a lot of data."
Analysts say it's unlikely that data deduplication in its current form will ever be used in mainstream primary storage. They maintain that compression of the kind Storewize Inc. uses in its STN appliances, and that Ocarina Networks says it is developing, is better suited to primary data than deduplication. These vendors apply an optimized version of Lempel-Ziv (LZ) compression between servers and storage arrays.
"Reducing the size of primary storage is going to be the next wave, but on the primary side you need different kinds of technology," said Arun Taneja, founder and consulting analyst at the Taneja Group. "It's not the same as the deduplication that's out there now. Conceptually, I don't know how you use existing dedupe technology for primary storage. Doing dedupe on secondary storage is nontrivial stuff."
Taneja said that advances in LZ compression can reduce data by ratios of up to 5 to 1 or 6 to 1, well short of the 20 to 1 or so that many deduplication vendors claim, but a solid reduction nonetheless.
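Unlike deduplication, LZ-style compression finds redundancy within a single copy of the data, so it does not depend on data being stored repeatedly. The sketch below uses Python's standard `zlib` module as a stand-in: zlib implements DEFLATE (LZ77 plus Huffman coding), not the optimized LZ variants Storewize or Ocarina use, and the records are hypothetical. Actual ratios depend heavily on the data.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by zlib-compressed size.

    zlib implements DEFLATE (LZ77 back-references plus Huffman coding);
    it stands in here for the vendors' optimized LZ variants, whose
    specifics the article does not describe.
    """
    return len(data) / len(zlib.compress(data, level))


# Hypothetical structured records: repetitive text compresses well
# within a single copy, with no second copy required.
records = b"".join(
    b"2008-01-%02d,user%04d,/home/user%04d/report.doc,OK\n" % (day, uid, uid)
    for day in range(1, 29)
    for uid in range(200)
)

# Random bytes (like already-compressed or encrypted files) do not shrink.
noise = os.urandom(64 * 1024)

print(f"records: {compression_ratio(records):.1f}:1")
print(f"noise:   {compression_ratio(noise):.2f}:1")
```

Structured, repetitive records shrink substantially, while random bytes come out no smaller than they went in. On the capacity side, a 5:1 ratio would mean 10 TB of primary data occupying roughly 2 TB of disk.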
"The advantage of compressing primary data is it shrinks the most expensive storage down, and it flows through your secondary storage, as well," Taneja said. "Your advantage flows all the way through. It's something we have to pay attention to."
Analyst Greg Schulz of the StorageIO Group said deduplication is too demanding on processors to be the data reduction scheme of choice for primary data.
"NetApp is using it, but the benefits of deduplication don't offset the penalty for online active primary storage," he said. "Moving forward, sure, I could see it as becoming a feature. Eventually, we'll get to it, but we'll need faster processors and better algorithms to get to it. Now it plays to its strength -- nearline, offline data -- where you don't have to pay a penalty to see results. Today for online primary data, you use online compression. With compression, penalties aren't as severe, but you still get a benefit."
Data Domain's Slootman said, "We're a storage company, we're not just a data protection company," perhaps tipping his hand on where he will go next. He said the vendor will add encryption this year, along with bigger and smaller versions of its current product, while trying to broaden beyond backup.
"There are many shades of gray, that's where we still have huge amounts of work to do," Slootman said. "We're going to be gradually adding capabilities to get more transactional-oriented storage, just falling short of [EMC Symmetrix] DMX-class technology."