What you will learn in this tip: Data continues to grow at a fast pace in today’s storage environments, but many shops still haven’t utilized widely available data reduction methods, such as thin
provisioning, data deduplication and compression. This tip outlines what these technologies can offer users, and what steps you can take to get started with data reduction.
Data reduction methods, from thin provisioning to compression and data deduplication, have long been available for enterprise IT use. So why has adoption taken so long? Some would say buyers are waiting for the right technology, but the reason may be that they simply aren't sure of the use case.
There has always been a tension in IT between performance and capacity utilization: improve one and the other tends to suffer. The classic example of this conundrum is compression on primary storage -- it saves some disk capacity, but it can slow systems to a crawl.
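The tradeoff is easy to see even at small scale. The sketch below, using Python's standard zlib module on a synthetic, repetitive buffer (the data and levels are illustrative, not a benchmark), shows how asking for a smaller result costs more CPU time:

```python
import time
import zlib

# A block of moderately redundant data, standing in for primary storage I/O.
data = b"customer_record,2011-09-01,ACTIVE,na\n" * 4096

for level in (1, 6, 9):  # fastest, default, smallest
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level {level}: {ratio:.1f}:1 in {elapsed * 1000:.2f} ms")
```

Higher levels generally squeeze out a better ratio at the cost of more compute per byte -- exactly the capacity-versus-performance conflict, in miniature.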
But times are changing. Data growth is accelerating, and storage utilization hasn't improved. The amount of wasted capacity is rocketing upwards, even as server virtualization has improved the efficiency of the data center as a whole. This is the reason for widespread interest in thin provisioning storage systems: they attack the utilization problem head-on.
Thin provisioning is no panacea for out-of-control data storage growth. Implementing an effective thin provisioning stack requires communication from applications through file systems and operating systems (OSes) to storage arrays, and these communication channels are just beginning to be created.
Symantec Corp. Storage Foundation and VMware Inc. vSphere are the only host-based products that actively communicate with thin provisioning storage arrays today. It's likely that future OS updates will include thin provisioning commands like T10 block zeroing and SCSI UNMAP, just as they have adopted ATA TRIM support for solid-state drives (SSDs).
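The core idea behind thin provisioning -- present a large logical size but allocate physical blocks only when data is actually written -- has a rough file-level analogue in sparse files, which most OSes and file systems support. This sketch (illustrative only; it assumes a file system with sparse-file support, as on typical Linux systems) shows the gap between declared and allocated space:

```python
import os
import tempfile

# Create a 100 MiB "thin" file: declare the size, but write only 1 MiB.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(100 * 1024 * 1024)   # logical size; no blocks allocated yet
    f.write(b"x" * (1024 * 1024))   # physical allocation happens only on write
    path = f.name

st = os.stat(path)
print(f"apparent size: {st.st_size} bytes")
print(f"allocated:     {st.st_blocks * 512} bytes")  # far less than 100 MiB
os.unlink(path)
```

The "communication channel" problem the article describes is the reverse direction: when data is deleted, something (TRIM, UNMAP, or an explicit tool) must tell the lower layer that those blocks can be reclaimed, or the thin file only ever grows.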
Yet, even if all OSes and file systems were “thin capable,” what would the result be? Thin provisioning is only one aspect of capacity optimization, and it doesn't address data growth itself. Every byte of data created must be copied, backed up and archived. This magnification effect is immune to the benefits of thin provisioning and requires a different approach.
Data deduplication and compression
Data deduplication, a special form of data compression, has gained wide acceptance in data protection products. Most backup systems, and the storage devices that support them, now include some type of data deduplication technology. But what about primary storage? Could deduplication and compression be valuable there as well?
Only a few vendors have developed deduplicating primary storage devices, and the results have been mixed. Although Moore's Law has provided sufficient CPU resources to handle dedupe and compression in real-time, these storage arrays stand apart from the flow of data protection and archiving, so the benefits don't accumulate. In short, primary storage optimization only attacks primary storage efficiency, not the entire data lifecycle.
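At its simplest, deduplication splits data into chunks, fingerprints each chunk, and stores only one copy of each unique fingerprint. The toy sketch below uses fixed-size 4 KiB blocks and SHA-256 (real products often use variable-size chunking and their own hash choices; the data here is synthetic):

```python
import hashlib

BLOCK = 4096  # fixed-size chunking; many products use variable-size chunks


def dedupe_ratio(data: bytes) -> float:
    """Logical bytes divided by unique bytes after block-level dedupe."""
    seen = set()
    unique = 0
    for off in range(0, len(data), BLOCK):
        chunk = data[off:off + BLOCK]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            unique += len(chunk)
    return len(data) / unique


# Ten identical "images", each made of 10 distinct 4 KiB blocks,
# dedupe down to roughly one copy.
image = b"".join(i.to_bytes(2, "big") * 2048 for i in range(10))
print(f"{dedupe_ratio(image * 10):.1f}:1")  # prints "10.0:1"
```

The savings depend entirely on how repetitive the data is -- which is why backup streams (full copies of mostly unchanged data) dedupe so well, while a single primary data set may not.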
Even so, deduplication and compression can be very valuable where storage capacity is extremely expensive. The new crop of “all flash” storage devices recently introduced by vendors like Nimbus Data Systems Inc., Pure Storage Inc. and SolidFire leverage data reduction methods to be competitive with disk-based alternatives. Full-time thin provisioning, deduplication and compression, combined with high-performance CPUs and flash chips, make these products possible. And it's likely that mainstream storage vendors are now eyeing the possibilities of capacity-optimized flash.
However, it's more difficult to see the payback of applying data reduction technology to existing storage systems. Data reduction products can extend the useful life of capacity-constrained systems, but not all applications are good candidates. Some special cases, such as stores of office files or digital photographs, can be especially ripe for data reduction technology. Buyers should focus on "out of control" growth areas that respond well to these technologies rather than static data sets.
Data reduction methods are likely to become a major force in future enterprise data storage devices. But today they remain somewhat marginalized in the realms of flash storage and media files. Perhaps one day we'll see a fully integrated “data reduced” lifecycle, but we aren't there yet.
BIO: Stephen Foskett is an independent consultant and author specializing in enterprise storage and cloud computing. He is responsible for Gestalt IT, a community of independent IT thought leaders, and organizes their Tech Field Day events. He can be found online at GestaltIT.com, FoskettS.net and on Twitter at @SFoskett.
This was first published in September 2011