Storage administrators who want to improve allocation and utilization efficiency in their storage infrastructures often reach for deduplication and compression first. However, these two technologies are best regarded as short-term fixes: they provide temporary relief from capacity shortfalls by squeezing more data into the same amount of space. More strategic tools exist, but deduplication and compression are worth examining first, especially in light of all the attention they have received recently.
Though limited in the extent and duration of their impact, tactical capacity maximization tools have gotten a lot of ink in the past few years. Deduplication, for example, is heralded by some vendors as a "capacity expander": by reducing the physical space used to store data, it lets a single drive hold data that once occupied several drives. Let's take a closer look.
Deduplication was originally aimed at backup files stored on disks that were configured as tape caches or virtual tape libraries (VTLs). Deduplicating backup workloads may have made sense, given that most full backups contain a considerable percentage of static data, that is, data that has not changed since the prior backup. In terms of its impact, deduplication mimics a backup strategy often referred to as incremental or change backup, in which only changed data is backed up after a full backup has been conducted. Deduplication simply arrives at that result in a different way: it compares the new full backup to the previous one and "reduces," or eliminates, the bits that are the same.
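To make the comparison concrete, here is a minimal sketch of block-level deduplication applied to two successive full backups. It assumes fixed-size 4 KB blocks and SHA-256 fingerprints; the names `BLOCK_SIZE`, `dedup_store`, and `restore`, and the sample data, are hypothetical and chosen for illustration only. Commercial products typically use their own, often variable-length, chunking schemes.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def dedup_store(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks, keep one copy of each unique
    block in `store`, and return the ordered list of block fingerprints."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # stored only if not seen before
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its list of fingerprints."""
    return b"".join(store[digest] for digest in recipe)


# Hypothetical first full backup: 100 distinct blocks.
first_full = b"".join(bytes([i]) * BLOCK_SIZE for i in range(100))
# Hypothetical second full backup: the first 5 blocks changed, 95 did not.
changed = b"".join(bytes([200 + i]) * BLOCK_SIZE for i in range(5))
second_full = changed + first_full[5 * BLOCK_SIZE:]

store = {}
recipe_1 = dedup_store(first_full, store)
recipe_2 = dedup_store(second_full, store)

logical_bytes = len(first_full) + len(second_full)
physical_bytes = sum(len(block) for block in store.values())
print(f"logical: {logical_bytes} bytes, physical: {physical_bytes} bytes")
print(f"reduction ratio: {logical_bytes / physical_bytes:.2f}:1")
```

In this sketch the second full backup adds only its five changed blocks to the store, which is the same net effect an incremental backup would have had.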
Vendors used deduplication algorithms to charge considerably more money for a stand of comparatively cheap (usually consumer-grade SATA) disk drives. One early deduplicating VTL had an MSRP of $410,000 for approximately $3,000 worth of drives and shelf hardware; the justification for the huge markup was the value-added deduplication software included on the rig. Once it became clear that few companies would realize sufficient capacity and cost savings from deduplication to justify the price of the rig, the appeal of the strategy began to fade. Deduplication functionality is also isolated to a single array: once that array is filled, a second array running an entirely separate deduplication process is required, which makes the value case even less sustainable.
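A rough back-of-the-envelope calculation shows how demanding that price tag is. The sketch below assumes, simplistically, that the only alternative is buying more of the same raw drives and shelf hardware at the same price, and it ignores power, floor space, and administration costs. The function name `break_even_reduction_ratio` is hypothetical.

```python
def break_even_reduction_ratio(rig_price: float, raw_hardware_cost: float) -> float:
    """Data reduction ratio at which the rig stores as much logical data
    per dollar as plain disk bought at the raw hardware price.

    Assumption: storing R times the data on plain disk costs R times the
    raw hardware cost, so break-even occurs when R * raw cost = rig price.
    """
    return rig_price / raw_hardware_cost


# Figures from the example above: $410,000 MSRP, about $3,000 of hardware.
ratio = break_even_reduction_ratio(410_000, 3_000)
print(f"break-even reduction ratio: {ratio:.1f}:1")  # prints 136.7:1
```

Under these assumptions, the rig would need to sustain a reduction ratio of roughly 137:1 on hardware cost alone before it matched plain disk.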
Compression offers another approach, one that is much more straightforward and less subject to the challenges of identifying and deduplicating similar bits of data. IBM's 2010 acquisition of StorWize, an industry leader in the compression space, led to a family of storage arrays featuring in-line compression technology. Compression, too, stretches capacity by squeezing the same amount of data into a smaller amount of disk turf. The sensible question to ask is whether equipment with built-in compression yields sufficient capacity savings to justify its price tag. IBM appears to be migrating some of its StorWize technology to its virtualization engine, the SAN Volume Controller, to scale the compression functionality across multiple stands of disk instead of isolating it to one array controller.
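How much capacity compression actually saves depends heavily on the data. The sketch below uses Python's standard `zlib` module purely as a stand-in; it says nothing about the algorithm inside any vendor's array. The sample inputs (a repeated, made-up log line and random bytes) and the helper name `compression_ratio` are hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by compressed size for `data`."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


# Highly repetitive sample data, such as a log with a recurring message.
repetitive = b"INFO backup job completed successfully\n" * 2000
# Random bytes behave like already-compressed or encrypted data.
random_like = os.urandom(len(repetitive))

print(f"repetitive data: {compression_ratio(repetitive):.1f}:1")
print(f"random data:     {compression_ratio(random_like):.2f}:1")
```

Repetitive data shrinks dramatically, while random or already-compressed data does not shrink at all, which is why measuring the ratio on a representative sample of one's own data is a reasonable step before weighing the price tag.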
At the end of the day, neither deduplication nor compression does much more than buy some time so that more strategic measures can be taken to manage capacity. It is also worth noting that initiatives are afoot within most file-system development projects to build both deduplication and compression directly into the file system for optional use by storage administrators. If these efforts succeed, there may be no need in the future to purchase proprietary technologies that deliver these functions only on specific array controllers. Gone will be the exorbitant price tags and the vendor lock-in that come with such functionality.