Published: 01 Oct 2010
With demonstrated results for backup, data reduction techniques are now being aimed at primary storage. It's a hot market, but there are still plenty of wrinkles to iron out.
By Jeff Byrne
Mergers and acquisitions are alive and well in the storage industry, particularly when it comes to storage capacity optimization (SCO). SCO technologies, such as data deduplication and compression, increase the utilization of primary, secondary and/or archived storage by shrinking the amount of stored data.
Although this is a relatively new market, we've seen a recent surge of SCO consolidation. In a single month, Dell Inc. scooped up compression and dedupe vendor Ocarina Networks Inc. and IBM picked up real-time compression supplier Storwize Inc. And don't forget EMC Corp.'s acquisition of Data Domain Inc. last year.
Dedupe slims backups
SCO technologies have traditionally had their greatest impact on backup storage, with data deduplication playing the leading role. Dedupe vendors can often reduce needed backup capacity by 90% to 95% or, put another way, increase effective backup capacity by 10 times to 20 times. But while traditional deduplication can pay big dividends in shrinking backup capacity requirements, it hasn't been as effective in primary storage environments. So for primary storage, the focus has shifted somewhat to compression-based technologies.
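The link between the two ways of stating dedupe savings is simple arithmetic: if a fraction r of the data is eliminated, the same disk holds 1 / (1 - r) times as much. The short Python sketch below works through the numbers; the function name `capacity_multiplier` is just an illustrative label, not part of any vendor's tooling.

```python
def capacity_multiplier(reduction_pct: float) -> float:
    """Return the effective capacity multiple for a given percent reduction.

    Example: a 90% reduction leaves 10% of the data on disk, so the same
    disk holds 100 / 10 = 10 times as much logical data.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in the range [0, 100)")
    return 100.0 / (100.0 - reduction_pct)


# The two reduction figures quoted for backup deduplication.
for pct in (90, 95):
    print(f"{pct}% reduction -> {capacity_multiplier(pct):.0f}x effective capacity")
```

Note how steeply the multiple climbs near the top of the range: moving from 90% to 95% reduction doubles effective capacity.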
Optimizing primary storage with compression
Primary storage has particular characteristics that make it difficult to shrink. Unlike backup storage, it doesn't consist of a lot of nearly identical data. And many primary storage environments are performance-sensitive and can't be slowed down by optimization processes.
Storwize's technology compresses file-based data in-line with little or no impact on application performance. The Storwize appliance sits between a NAS array (NFS or CIFS) and users of the data, and typically reduces the stored data footprint by 50% to 90%.
Storwize only works with file storage, and its compression algorithms aren't optimized for specific file formats. But what it does, it does well. For IBM, Storwize will work with its N series and SONAS NAS systems; it will also work with non-IBM NAS systems from EMC, Hewlett-Packard (HP) Co., NetApp and others.
In contrast, Ocarina Networks compresses data using an out-of-band, post-process approach, reading and compressing stored data and then writing the smaller files back to storage. Ocarina's technology is content-aware so its optimization is tailored to the particular type of content. Overall, the capacity savings afforded by Ocarina are on a scale similar to those delivered by Storwize.
Dell has offered (or shown interest in) several competing storage capacity optimization technologies and will now need to sort out its product portfolio to see where Ocarina Networks best fits. Ocarina recently introduced a software-based deduplication product that can be embedded in other vendors' storage arrays, but Dell is likely to discontinue that OEM business. That would leave Permabit Technology Corp. as the only vendor that will provide an embeddable SCO solution to other storage system vendors. Looking ahead, we believe Quantum Corp. might also OEM an embeddable SCO solution, combining its StorNext file system with its dedupe capabilities.
The Holy Grail: End-to-end optimization
So what's driving this wave of storage capacity optimization market consolidation? All of the leading data storage companies are in need of effective SCO solutions, and they're scurrying to acquire technology from the rapidly dwindling ranks of independent SCO vendors. Typically, vendors accumulate technologies to shore up or fill gaps in their SCO portfolios, and then try to piece the technologies together into a coherent whole. More often than not, the results are collections of poorly integrated, often incompatible point solutions.
Users are the biggest losers in this game. Consider this scenario. A storage manager selects a deduplication product for backup, and successfully reduces backup storage by 90%. So far, so good; but what comes next? Maybe they now want to move inactive chunks of deduplicated backup data to an archive where it can be searched, used for e-discovery and so forth. But that can't be done without rehydrating the deduped data back to its original file format, thus losing the benefits of the original deduplication effort.
Now suppose our storage manager decides to use a compression app to optimize primary storage. Once again, there are some limitations. In some cases, compressed files will need to be rehydrated before they can be moved among different storage tiers. And the same could happen if you want to dedupe the compressed data during backup -- you may have to rehydrate the data.
That might not seem like such a big deal, but rehydrating consumes networking and CPU resources, as well as the disk capacity to store the data once it's rehydrated. And where data has been optimized using a "lossy" algorithm, the rehydrated copy won't exactly match the original, because the discarded information can't be restored.
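To make the capacity cost of rehydration concrete, here is a minimal Python sketch of hash-based deduplication with fixed-size chunks. It is an illustration only, not any vendor's actual method: the 4 KB chunk size, the in-memory `store` dictionary, and the ten identical "nightly backups" are all simplifying assumptions chosen so the numbers line up with the 90% scenario above.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products vary


def dedupe(data: bytes, store: dict) -> list:
    """Store each unique chunk once; return the list of chunk hashes (the recipe)."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of a chunk
        recipe.append(digest)
    return recipe


def rehydrate(recipe: list, store: dict) -> bytes:
    """Rebuild the original byte stream from its recipe."""
    return b"".join(store[digest] for digest in recipe)


# Build a 40 KB data set made of ten distinct 4 KB chunks.
backup = b"".join(hashlib.sha256(str(i).encode()).digest() * 128 for i in range(10))

# Back up the same data set ten times (an idealized, unchanged data set).
store = {}
recipes = [dedupe(backup, store) for _ in range(10)]

logical_bytes = 10 * len(backup)
stored_bytes = sum(len(chunk) for chunk in store.values())
rehydrated_bytes = sum(len(rehydrate(recipe, store)) for recipe in recipes)

print(f"logical data:     {logical_bytes} bytes")
print(f"stored (deduped): {stored_bytes} bytes")
print(f"after rehydrate:  {rehydrated_bytes} bytes")
```

In this idealized case the deduplicated store holds one-tenth of the logical data, but moving all ten backups to a tier that can't read the recipes means writing out the full logical size again, on top of the CPU and network work needed to rebuild each stream.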
There's currently no easy way for users to "knit together" diverse SCO solutions to enable stored data, once optimized, to retain the benefits of that optimization throughout its lifecycle. Instead, you'll have to be content to choose the best optimization solution for each storage tier (backup, archive and primary) and put up with the inefficiencies caused by data crossing those boundaries. It's not likely to get better in the short term, as vendors focus on building and differentiating their own proprietary stacks and interoperability standards fall by the wayside. All of that makes planning a capacity optimization strategy for the next three to five years pretty tough.
We recommend you try to stick with a single vendor that's in the process of integrating its various storage capacity optimization technologies; it's probably the best opportunity to achieve something close to end-to-end optimization at some point. Several vendors are pursuing end-to-end strategies, but none is delivering on the promise yet.
Over the long term, we think your interests would be better served by standardized, cross-vendor solutions that enable interoperability among different SCO solutions. With a standard way to communicate among optimized systems, it might be possible to migrate or re-tier optimized data without the overhead of rehydration. In addition, vendors would be able to use their optimization technology outside of single storage boxes, or cross-license their technology as part of a richer SCO ecosystem. But right now, standardization seems unlikely.
Still, we can be optimistic that the best days of the capacity optimization market lie ahead. Several major vendors we've spoken with say they understand the ultimate goal even as they're amassing pieces of the puzzle.
BIO: Jeff Byrne is a senior analyst and consultant at Taneja Group. He can be reached at firstname.lastname@example.org.