Primary storage data reduction (long overshadowed by its high-ROI cousin, data reduction for the backup set) is gaining traction in the marketplace, with a growing list of products that apply data deduplication and compression to the most expensive tier of storage. Factor in higher-performance hardware (multicore, high-speed processors, low-cost DRAM for cache, and solid-state technology) that promises to help offset the performance penalties of primary data-reduction techniques, and it becomes clear why primary storage data reduction is increasingly attractive to IT organizations looking to contain data growth in the data center.
Read our six-part Special Report on primary storage data reduction to learn about the techniques involved, how the products on the market approach the problem and where the market is headed.
♦ Primary storage data reduction advancing via data deduplication, compression
While not as hot as data deduplication for backup, primary storage data reduction is getting warmer thanks to a scattering of products that try to shrink the data footprint on tier 1 disk. The companies with offerings in this space take a variety of approaches to the problem. For instance, one primary storage data-reduction approach searches for duplicates at the file level, while others are more granular, comparing data blocks or byte streams of fixed or variable size. Some work post-process, storing incoming writes first and deduplicating them later. And one compression specialist operates inline, in the data path.
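To make the block-level approach concrete, here is a minimal Python sketch of fixed-size block deduplication. It assumes 4 KiB blocks and treats matching SHA-256 digests as identical data; the function names are hypothetical and do not come from any vendor's product. Variable-size (content-defined) chunking is not shown.

```python
import hashlib


def dedupe_fixed_blocks(data: bytes, block_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns (store, recipe): store maps a SHA-256 digest to block bytes, and
    recipe is the ordered list of digests needed to rebuild the original data.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]  # the last block may be shorter
        digest = hashlib.sha256(block).hexdigest()
        # Only the first occurrence is stored; later copies become references.
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original byte stream by following the recipe."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Hypothetical sample: one 4 KiB pattern written three times, plus one unique block.
    data = b"A" * 4096 * 3 + b"B" * 4096
    store, recipe = dedupe_fixed_blocks(data)
    stored = sum(len(block) for block in store.values())
    print(f"logical={len(data)} stored={stored} blocks={len(recipe)} unique={len(store)}")
    assert rebuild(store, recipe) == data
```

Fixed-size blocks are cheap to compute, but inserting a single byte near the start of a file shifts every later block boundary so those blocks no longer match; that weakness is the usual motivation for variable-size chunking.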
♦ NetApp: Post-process deduplication limits performance hit in primary storage data deduplication
NetApp Inc. offers dedupe as a feature of the Data Ontap operating system on its FAS and V-series systems. The company cites post-process dedupe as a major reason it can limit the deduplication performance penalty to 10% to 20% for average workloads: incoming writes are stored first, without deduplication, to minimize interference with application throughput. Deduplication runs later, either on a schedule (typically during off-peak hours) or automatically, based on the growth of the storage volume.
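The post-process model can be sketched as a toy Python volume in which the write path does no deduplication work and a separate pass collapses duplicate blocks later. This is an illustrative assumption, not NetApp's implementation: the class name, the block-count growth trigger and the use of SHA-256 fingerprints are all hypothetical.

```python
import hashlib


class PostProcessVolume:
    """Toy volume: writes land immediately, dedupe runs later as a separate pass.

    Hypothetical sketch; the growth trigger counts blocks written since the last pass.
    """

    def __init__(self, growth_trigger_blocks: int = 1024) -> None:
        self.blocks: list[bytes] = []    # physical blocks, duplicates included until a pass runs
        self.pointers: list[int] = []    # logical block number -> index into self.blocks
        self.growth_trigger_blocks = growth_trigger_blocks
        self._written_since_pass = 0

    def write(self, block: bytes) -> None:
        # The write path does no hashing or lookups, so it adds no dedupe latency.
        self.blocks.append(block)
        self.pointers.append(len(self.blocks) - 1)
        self._written_since_pass += 1
        # Growth-based trigger: start a pass once enough new data has accumulated.
        if self._written_since_pass >= self.growth_trigger_blocks:
            self.dedupe_pass()

    def dedupe_pass(self) -> None:
        """Collapse duplicate blocks; a scheduler could also call this off-peak."""
        first_seen: dict[str, int] = {}  # fingerprint -> index in the new block list
        kept: list[bytes] = []
        remap: list[int] = []            # old block index -> new block index
        for block in self.blocks:
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in first_seen:
                first_seen[fingerprint] = len(kept)
                kept.append(block)
            remap.append(first_seen[fingerprint])
        self.blocks = kept
        self.pointers = [remap[old] for old in self.pointers]
        self._written_since_pass = 0

    def read(self, logical_block: int) -> bytes:
        return self.blocks[self.pointers[logical_block]]


if __name__ == "__main__":
    vol = PostProcessVolume(growth_trigger_blocks=100)
    for _ in range(4):
        vol.write(b"x" * 4096)           # four identical blocks, stored as-is
    print("before pass:", len(vol.blocks), "physical blocks")
    vol.dedupe_pass()                    # e.g. kicked off by an off-peak schedule
    print("after pass:", len(vol.blocks), "physical block(s)")
    assert vol.read(3) == b"x" * 4096
```

The trade-off this illustrates is that capacity savings are deferred: until a pass runs, duplicate blocks occupy space, which is the price paid for keeping hashing out of the write path.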
♦ EMC Celerra: Primary storage data reduction through deduplication, compression
Celerra is currently the only primary storage subsystem in the EMC Corp. product family to provide primary storage data reduction. Its deduplication/compression service integrates several technologies that EMC acquired, including an extensible policy engine from Avamar and the compression algorithms of RecoverPoint. Celerra Data Deduplication, a free operating system feature, works at the file level on CIFS and NFS data, and only on a per-file-system basis. (File-level deduplication is also referred to as single-instance storage.)
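File-level deduplication compares whole files rather than blocks. The Python sketch below groups files within one directory tree by a SHA-256 digest of their full contents; restricting it to a single tree loosely mirrors the per-file-system scope described above. It only reports candidates and reclaimable space and does not replace any files. The function names are hypothetical, not part of any EMC interface.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a whole file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def single_instance_candidates(root: str) -> dict[str, list[Path]]:
    """Group regular files under one tree by content; keep only groups with copies."""
    groups: dict[str, list[Path]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            groups.setdefault(file_digest(path), []).append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(candidates: dict[str, list[Path]]) -> int:
    """Space freed if each group kept one copy and the rest became references."""
    return sum(paths[0].stat().st_size * (len(paths) - 1) for paths in candidates.values())
```

For example, `reclaimable_bytes(single_instance_candidates("/mnt/fs1"))` would estimate savings for one mounted file system (the path is a placeholder). Because files must be byte-for-byte identical to match, a one-character edit makes a document a new, separately stored file, which is why block-level approaches can find savings that file-level ones miss.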
♦ Ocarina ECOsystem deconstructs before compression, deduplication for primary storage data reduction
Ocarina Networks Inc. claims that its ECOsystem appliance, which targets primary storage, achieves data reduction of up to 85% on Microsoft Office documents, PDFs and virtual machine files, and 40% or more on images. But competitors contend the savings come at a performance price that some users may be unwilling to pay.
♦ Storwize claims good data compression rates, no performance degradation on STN-6000 appliance
The only primary storage data-reduction product we examined for this story that lacks a deduplication component, Storwize Inc.'s STN-6000 inline appliance focuses strictly on real-time compression and installs in front of network-attached storage (NAS) filers from vendors such as EMC Corp. and NetApp Inc. Storwize claims that the product delivers compression ratios of 2:1 to 15:1, yet its main distinguishing characteristic isn't its compression algorithms.
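Inline compression sits in the data path: data is compressed before it is written and decompressed as it is read. The Python sketch below models that with the standard-library zlib module standing in for whatever algorithm an appliance actually uses. The class and method names are hypothetical, and real ratios depend heavily on the data, so the sample's ratio says nothing about the range Storwize claims.

```python
import zlib


class InlineCompressor:
    """Toy inline stage in front of a backing store (a dict standing in for a NAS filer).

    Hypothetical sketch: zlib is a stand-in, not any vendor's algorithm.
    """

    def __init__(self, level: int = 6) -> None:
        self.level = level
        self.backing: dict[str, bytes] = {}   # holds only compressed bytes
        self.logical: dict[str, int] = {}     # uncompressed size per object

    def write(self, name: str, data: bytes) -> None:
        # Compression happens in the write path, before data reaches the filer.
        self.backing[name] = zlib.compress(data, self.level)
        self.logical[name] = len(data)

    def read(self, name: str) -> bytes:
        # Decompression happens in the read path; clients see the original bytes.
        return zlib.decompress(self.backing[name])

    def ratio(self) -> float:
        """Logical bytes / physical bytes, i.e. the X in an "X:1" claim."""
        physical = sum(len(blob) for blob in self.backing.values())
        return sum(self.logical.values()) / physical if physical else 1.0


if __name__ == "__main__":
    stage = InlineCompressor()
    text = b"quarterly sales report, region 7\n" * 2000   # highly repetitive sample
    stage.write("report.txt", text)
    assert stage.read("report.txt") == text
    print(f"compression ratio: {stage.ratio():.1f}:1")
```

Because every read and write passes through the compression stage, the design keeps the filer itself unchanged; the cost is that the stage's throughput becomes part of every I/O.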
♦ Primary storage data deduplication is mature now, says Gartner analyst
Data dedupe is used mainly for backups today, but Valdis Filks, a research director for storage technologies and strategies at Gartner Inc., predicts that primary storage dedupe will become much more prominent over the next few years. Check out this podcast to learn what Filks thinks about the primary storage dedupe landscape in the coming year, along with his advice on using dedupe effectively in your infrastructure.
This was first published in December 2009