This article can also be found in the Premium Editorial Download "Storage magazine: The lowdown on solid-state storage."
Compression and data deduplication
Compression is one of the oldest methods for saving space and data deduplication is one of the newest, but they're related and each has a role to play in holding down storage spending. Understanding how the technologies differ is the key to using each one most effectively.
Compression uses mathematical algorithms to encode large or repetitious parts of a file more compactly, with different compression products aimed at different use cases and various types of files. Some storage shops use the compression capabilities built into popular operating systems such as Unix, or even low-cost utilities such as WinZip on Windows platforms. Later this year, NetApp will release compression features "covering all the platforms we now cover, including primary storage," said Chris Cummings, NetApp's senior director of data protection solutions.
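To see why repetitious data compresses so well while other data doesn't, consider this minimal sketch using Python's standard zlib module (a DEFLATE implementation, chosen here only for illustration; it is not any vendor's product):

```python
import os
import zlib

# Highly repetitious data -- think repeated log lines -- collapses
# dramatically, because the algorithm replaces repeats with short references.
repetitious = b"ERROR: disk quota exceeded\n" * 1000
compressed = zlib.compress(repetitious)
print(len(repetitious), len(compressed))  # compressed size is a tiny fraction

# Random data has no patterns to exploit, so it barely shrinks at all.
random_data = os.urandom(len(repetitious))
print(len(zlib.compress(random_data)))    # roughly the original size
```

The same principle explains why already-compressed formats such as video gain little from a second pass of compression.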
Data deduplication eliminates duplicate patterns within a data store, and in ideal cases -- such as repeated backups of almost identical files -- vendors claim they can reduce data sets by ratios of 15:1 to 20:1. It's no wonder that 95% of respondents to the Symantec survey are at least discussing data deduplication, with 52% either implementing or having implemented it.
However, deduplication works best on data to which only minor changes are made over time (such as backups of lengthy business documents or engineering plans) rather than data of which only one copy exists, such as an archived CAT scan.
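The mechanics behind those ratios can be sketched in a few lines: split incoming data into chunks, key each chunk by its hash, and store each unique chunk only once. The fixed chunk size and in-memory store below are illustrative assumptions, not any vendor's implementation:

```python
import hashlib

CHUNK = 4096
store = {}  # hash -> chunk bytes: the deduplicated chunk store

def dedupe_write(data: bytes) -> list:
    """Store data, returning the list of chunk hashes (a 'recipe')."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a duplicate chunk costs nothing new
        recipe.append(digest)
    return recipe

def dedupe_read(recipe: list) -> bytes:
    """Reassemble the original data from its recipe."""
    return b"".join(store[d] for d in recipe)

# Two near-identical "backups": the second differs by one chunk, so it
# adds only one new chunk to the store.
backup1 = b"A" * CHUNK * 10
backup2 = b"A" * CHUNK * 9 + b"B" * CHUNK
r1, r2 = dedupe_write(backup1), dedupe_write(backup2)
assert dedupe_read(r1) == backup1 and dedupe_read(r2) == backup2
print(len(store))  # 2 unique chunks held for 20 logical chunks written
```

This is why repeated backups dedupe so well -- most chunks recur verbatim -- while a unique medical image contributes nothing but unique chunks.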
In fact, Crump said, deduplication "loses value the closer it gets to primary storage," where duplicate copies are rarer. To prevent dedupe from slowing down disk access on primary storage, the deduplication would have to be done after the data arrives on disk, added Andrew Reichman, a senior analyst at Cambridge, Mass.-based Forrester Research Inc. "This will require swap space to write data un-deduplicated and then deduplicate it to a separate set of disk," he said. This "could eliminate the capacity reduction," he added, which is the whole point of deduplication.
Health Alliance Plan's Trim said he's seeing approximately a 50% savings in storage capacity with Symantec's Veritas NetBackup PureDisk.
Different vendors squabble over just where and how to use dedupe. Symantec, for one, is pushing a "dedupe everywhere" strategy, while NetApp's Cummings said he doesn't recommend it "for your tier 1, highly transactional, high IOPS database environment. But we do see it as being safe and having little or no performance impact" for storing virtual servers, tier 2 databases, file services and archiving.
For Chris Watkis, IT director at Grey Healthcare Group Inc. in New York City, data deduplication was an unexpected benefit from his purchase of a FalconStor Software Inc. Virtual Tape Library in 2007. His main goal was to speed backups and restores as the medical marketing company moved into more markets, created bulkier content such as video and held onto that content for longer periods.
This was first published in September 2009