This article can also be found in the Premium Editorial Download "Storage magazine: Hot storage trends and technology for 2010."
Data deduplication for primary storage
The rate of growth of digitally stored information is putting many storage managers on the defensive as they struggle to address the operational risks and costs associated with unchecked data growth. In 2010, a variety of data-reduction technologies for primary storage, including deduplication, will provide some relief in hard-pressed storage shops.
"Businesses are finding that it's taking a lot less time to reach that second terabyte or petabyte than it did to reach the first," said Tory Skyers, a senior infrastructure engineer at a leading credit issuer. "Primary dedupe will allow any business to increase the density of data on their existing disks by at least twofold."
A fixture in backup environments, dedupe can also be applied to primary storage, thus helping to cut space, power and cooling costs. But primary dedupe won't yield the dramatic results common with backup dedupe.
Performance is another concern. "With backups, as long as the virtual tape loads and the backup works, everything is fine. With primary storage, performance isn't as cut and dried," said TechTarget's Preston. "If a restore [of a backup system] goes slowly, it's not the same as a system where you have thousands of people accessing files that they expect to open immediately."
The key to primary dedupe may come from finding the right balance between benefits and costs. "I'm looking to reduce my cost for storage and it's all about maximizing it with online compression and online dedupe," said Greg Schulz, founder and analyst at Stillwater, Minn.-based StorageIO Group. "Primary dedupe is not good for data that you're frequently working on, but it's good where you can trade time for money savings."
Both inline and post-processing dedupe can be applied to primary storage. For applications that can afford the performance hit, inline dedupe is perfect. If the data in those systems can be held in cache and then deduped before it hits a disk, fewer disks are required on the back end of the system, which ultimately cuts costs. "While inline is currently the slowest performer, I have a feeling with the advent of [solid-state storage] and larger inline caches, it's eventually going to catch up with post-process," Skyers said.
Some major storage vendors, including EMC and NetApp, are now offering primary data-reduction capabilities. NetApp's dedupe is built into its Ontap operating system. It works by storing the cyclic redundancy code (CRC) of every block written to storage, comparing the CRCs, and then eliminating and replacing any matching blocks with a pointer.
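The checksum-and-pointer approach described above can be illustrated with a short sketch. This is not NetApp's implementation; it's a simplified model assuming fixed 4 KB blocks, with a CRC index used to find candidate duplicates and a byte-for-byte comparison to rule out checksum collisions before a block is shared:

```python
import zlib

BLOCK_SIZE = 4096  # fixed 4 KB blocks, an assumption for illustration


def dedupe(data: bytes):
    """Split data into fixed-size blocks, index each block's CRC,
    and replace repeated blocks with a pointer (index) to the
    first occurrence."""
    blocks = []   # unique block contents, in first-seen order
    layout = []   # one index into `blocks` per logical block
    seen = {}     # crc -> index of a candidate block in `blocks`
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        crc = zlib.crc32(block)
        idx = seen.get(crc)
        # A CRC match only flags a candidate; verify byte-for-byte
        # so two different blocks with colliding CRCs aren't merged.
        if idx is not None and blocks[idx] == block:
            layout.append(idx)          # duplicate: store a pointer
        else:
            seen[crc] = len(blocks)     # new (or colliding) block
            layout.append(len(blocks))
            blocks.append(block)
    return blocks, layout


def rehydrate(blocks, layout):
    """Reassemble the original data by following the pointers."""
    return b"".join(blocks[i] for i in layout)
```

Storing three 4 KB blocks where the first and third are identical yields only two unique blocks plus a three-entry pointer list, which is where the space savings come from.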
"NetApp is doing real dedupe and they're doing it essentially without a change in performance," Preston said. "When the actual dedupe process is running there's a change in the performance. But once the data has been deduped and you're just running your database or VMware, there's essentially no change in performance."
Ocarina Networks and Storwize Inc. also had early primary data-reduction entries. Ocarina's ECOsystem is an out-of-band appliance with software that's tuned to the data types associated with specific applications. Storwize's STN appliances work with NAS devices to compress and uncompress the data inline. Both of these startups have garnered a lot of attention that has led to partnerships with a variety of storage vendors.
This was first published in December 2009