Primary storage data reduction -- long overshadowed by its high-ROI cousin, data reduction for the backup set -- is gaining traction in the marketplace, with a growing list of products that handle data deduplication and compression for the most expensive tier of storage. Factor in higher-performance hardware -- multicore, high-speed processors; low-cost DRAM for cache; and solid-state technology -- that promises to help mitigate the performance penalties of primary data-reduction techniques, and it becomes clear that primary storage data reduction is becoming more attractive to IT organizations looking to contain growth in the data center.
Read our six-part Special Report on primary storage data reduction to learn about the techniques involved, how the products on the market approach the problem and where the market is headed.
Primary storage data reduction advancing via data deduplication, compression
While not as hot as data deduplication for backup, primary storage data reduction is getting warmer thanks to a scattering of products that try to shrink the data footprint on tier 1 disk. The companies with offerings in this space take a variety of approaches to the problem. For instance, one primary storage data-reduction approach searches for duplicates at the file level, while others are more granular, comparing data blocks or byte streams of fixed or variable sizes. Some work post-process, storing the data writes before starting the dedupe process. And one compression specialist operates inline, in the data path.
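To make the granularity distinction concrete, here is a minimal sketch of fixed-size block deduplication. The 4 KB block size, the SHA-256 fingerprint, and the function names are illustrative assumptions, not any vendor's actual implementation:

```python
import hashlib

def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split a byte stream into fixed-size blocks and store each
    unique block only once, keyed by its SHA-256 fingerprint.
    (Illustrative sketch, not any shipping product's algorithm.)"""
    store = {}   # fingerprint -> block contents, stored once
    recipe = []  # ordered fingerprints needed to rebuild the stream
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)  # duplicate blocks are not stored again
        recipe.append(fp)
    return store, recipe

def rebuild(store, recipe) -> bytes:
    """Reassemble the original stream from the block store."""
    return b"".join(store[fp] for fp in recipe)
```

A stream containing repeated 4 KB blocks consumes physical space only for the unique blocks; file-level products make the same trade at whole-file granularity, catching fewer duplicates but doing far less fingerprinting work.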
Post-process deduplication limits performance hit in primary storage data reduction
NetApp Inc. offers dedupe as a feature of its Data Ontap operating system with its FAS and V-Series systems. The company cites post-process dedupe as a major reason it's able to limit the deduplication performance penalty to 10% to 20% for average workloads. Writes are stored first to minimize interference with application throughput; deduplication runs later, either on a scheduled basis, typically during off-peak hours, or automatically, based on the growth of the storage volume.
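The post-process pattern can be sketched as a simple trigger: writes are only counted in the hot path, and dedupe kicks off once the volume has grown past a threshold. The class name, the 20% threshold, and the trigger logic are assumptions for illustration; NetApp's actual scheduling internals are not public:

```python
class PostProcessDeduper:
    """Defer dedupe so writes land at full speed; run it later,
    on a schedule or once the volume grows past a threshold.
    (Illustrative sketch, not NetApp's actual trigger logic.)"""

    def __init__(self, growth_threshold: float = 0.20):
        self.growth_threshold = growth_threshold
        self.bytes_at_last_run = 0
        self.bytes_written = 0

    def write(self, n_bytes: int) -> None:
        # Hot path: just count bytes -- no inline fingerprinting,
        # so application throughput is untouched.
        self.bytes_written += n_bytes

    def should_run(self) -> bool:
        # Fire when the volume has grown by growth_threshold
        # (e.g. 20%) since the last dedupe pass.
        base = max(self.bytes_at_last_run, 1)
        growth = (self.bytes_written - self.bytes_at_last_run) / base
        return growth >= self.growth_threshold

    def run_dedupe(self) -> None:
        # ...scan new blocks, compare fingerprints, share duplicates...
        self.bytes_at_last_run = self.bytes_written
```

The cost of deferring the work is temporary extra capacity: duplicates sit on disk until the next pass reclaims them.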
Celerra: Primary storage data reduction through deduplication, compression
Celerra is currently the only primary storage subsystem in the EMC Corp. product family to provide primary storage data reduction. Celerra's deduplication/compression service integrates a number of technologies that EMC acquired, including an extensible policy engine from Avamar and the compression algorithms of RecoverPoint. A free operating system feature, Celerra Data Deduplication works at a file level with CIFS and NFS data, and only on a per-file-system basis (file-level deduplication is also referred to as single-instance storage).
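File-level deduplication, or single-instance storage, can be sketched as a content-addressed store where duplicate files collapse into references to one physical copy. The class and method names here are hypothetical, for illustration only; EMC's implementation details are not described in this article:

```python
import hashlib

class SingleInstanceStore:
    """Keep one physical copy per unique file content; duplicate
    files become references to the same stored object.
    (Illustrative sketch of file-level dedupe, not EMC's design.)"""

    def __init__(self):
        self.objects = {}  # content hash -> file bytes, stored once
        self.files = {}    # logical path -> content hash

    def put(self, path: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        self.objects.setdefault(digest, content)  # new content only
        self.files[path] = digest

    def get(self, path: str) -> bytes:
        return self.objects[self.files[path]]

    def physical_bytes(self) -> int:
        """Space actually consumed, after single-instancing."""
        return sum(len(c) for c in self.objects.values())
```

Two identical files at different paths cost the space of one, but a file that differs by a single byte hashes differently and is stored in full, which is why file-level dedupe saves less than block-level approaches on mostly-similar data.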
ECOsystem deconstructs before compression, deduplication for primary storage data reduction
Ocarina Networks Inc. claims that its ECOsystem appliance, targeted at primary data storage, produces data reduction of up to 85% on Microsoft Office documents, PDFs and virtual machine files, and 40% or more on images. But competitors contend the savings come at a performance price that some users may be unwilling to pay.
Storwize claims good data compression rates, no performance degradation on STN-6000 appliance
Storwize Inc.'s STN-6000 is the only primary storage data-reduction product examined for this story that doesn't have a deduplication component. The inline appliance focuses strictly on real-time compression and installs in front of network-attached storage (NAS) filers from vendors such as EMC Corp. and NetApp Inc. Storwize claims that the product provides compression ratios of 2:1 to 15:1, yet its major distinguishing characteristic isn't its compression algorithms.
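Storwize doesn't disclose its algorithms, but the ratio arithmetic is simple: original size divided by compressed size. As a rough illustration, here is how such a ratio could be measured with a generic lossless compressor (Python's zlib, an assumption for this sketch, not the appliance's actual codec):

```python
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original-size : compressed-size, the figure
    compression appliances typically quote (e.g. 2:1 means
    the data shrinks to half its original size)."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)

# Highly repetitive data compresses far better than, say,
# already-compressed images, which is why vendors quote a range.
sample = b"primary storage data reduction " * 200
```

Quoted ranges such as 2:1 to 15:1 reflect this data dependence: text and databases sit at the high end, while pre-compressed formats yield little.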
Primary storage data deduplication is mature now, says Gartner analyst
Data dedupe is used mainly for data backups today, but Valdis Filks, a research director for storage technologies and strategies at Gartner Inc., predicts that primary storage dedupe will take on much greater prominence in the next few years. Check out this podcast to hear what Filks thinks about the primary storage dedupe landscape during the coming year and for some advice on effectively leveraging dedupe in your infrastructure.
This was first published in December 2009