Vendors tout dollars per gigabyte or per I/O, but figuring out what a data storage system will really cost your company is a much more complicated process.
If your job involves storing data, you already know storage eats up budget dollars faster than just about anything else in IT. But as daily practitioners, we rarely notice just how many bucks storage is truly consuming. The marketing hype for data storage products stays focused on magic bullets like “more capacity per dollar” or “more I/O per dollar,” but the cost of storage is about more than that.
One of the best ways to consider the true cost of storage is to think about what’s required to store a single piece of data over its entire lifetime. This is called the “lifecycle cost of data storage,” and it includes the technologies, processes and workflows across all types of storage. This type of bottom-up look at storing a piece of data over a period of time not only helps us understand what data storage truly costs, but helps us determine what we should be looking for when we’re shopping for data storage products.
The classic challenge for the strategic storage manager is that the vast majority of data created in the data center starts its life on primary storage, so storage purchasers tend to fixate on the cost of that primary storage. But capital costs are more of a distraction than a real representation of what storage costs, and the picture is skewed further when you consider the dissimilar products that are typically part of most storage environments. Even the total costs of primary data storage are often misunderstood. Primary storage is often poorly provisioned and underutilized, a problem that’s compounded as more storage systems are added and more features are licensed (you may need additional replication or copy functionality with multiple systems installed). And you’re bound to see at least a linear increase in the time and effort required to manage the storage, especially when time-consuming tasks like data migrations are required.
But the cost of storing data goes beyond just primary storage. Each piece of stored data is surrounded by an ecosystem of information services. Data protection is the most obvious, and it typically consists of several parts such as the management of multiple disk and tape tiers, and the logistics of getting protected data offsite. A holistic look at the cost of data protection alone is likely to dwarf the capital costs of primary storage systems.
There are plenty of other costs besides data protection. Regulatory compliance and the changing nature of intellectual property encourage ever-longer retention of data. To hold onto more data for longer periods of time, companies must spin up entire infrastructures and establish practices around information archiving, search and retrieval. Disaster recovery is yet another function that can’t be ignored, and its cost can be significant. The replication engines, bandwidth consumption and duplicate storage infrastructure required for disaster recovery can easily induce some serious sticker shock.
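To make the lifecycle view concrete, here is a minimal sketch in Python of how the pieces described above might be added up for one terabyte of data. Everything in it is an assumption for illustration: the class and function names are hypothetical, the cost categories are simply the ones named in this column, and the dollar figures are placeholders rather than measured or benchmarked numbers.

```python
from dataclasses import dataclass


@dataclass
class LifecycleCosts:
    """Hypothetical cost inputs for one terabyte of stored data."""
    primary_capital: float      # one-time purchase price per purchased TB
    primary_utilization: float  # fraction of purchased capacity in use (0 < u <= 1)
    management: float           # administration effort, per TB per year
    data_protection: float      # backup tiers and offsite logistics, per TB per year
    archiving: float            # retention, search and retrieval, per TB per year
    disaster_recovery: float    # replication, bandwidth, duplicate storage, per TB per year


def lifecycle_cost_per_tb(costs: LifecycleCosts, years: int) -> float:
    """Sum capital and recurring costs for one TB of data kept for `years` years."""
    if not 0 < costs.primary_utilization <= 1:
        raise ValueError("primary_utilization must be in the range (0, 1]")
    # Underutilized arrays raise the effective price of every TB actually stored.
    effective_capital = costs.primary_capital / costs.primary_utilization
    recurring_per_year = (
        costs.management
        + costs.data_protection
        + costs.archiving
        + costs.disaster_recovery
    )
    return effective_capital + recurring_per_year * years


# Placeholder figures only; substitute numbers from your own environment.
example = LifecycleCosts(
    primary_capital=1000.0,
    primary_utilization=0.5,
    management=300.0,
    data_protection=500.0,
    archiving=200.0,
    disaster_recovery=400.0,
)
total = lifecycle_cost_per_tb(example, years=5)
capital_share = (example.primary_capital / example.primary_utilization) / total
print(f"Lifecycle cost per TB: ${total:,.0f}")
print(f"Share attributable to primary capital: {capital_share:.0%}")
```

With these made-up inputs, the purchase price is a minority of the total, which is the point of the bottom-up approach. The split between categories in a real environment could look very different.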
Taneja Group has done extensive benchmark testing of hypervisors for what we refer to as virtual machine (VM) density; it’s based on the idea that different hypervisors are more or less efficient, so that given matching hardware, some will be able to run more VMs or deliver better performance. Once you account for all the hardware, licensing, management effort and consumption of data center resources, VM density can have an enormous bottom-line impact. Recently, we’ve also focused on “storage density,” which is a similar concept. It considers how much a product or technology can help you shrink your data footprint across storage capacity, storage systems, bandwidth consumption and human interaction.
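One way to picture a footprint comparison across those four dimensions is the short Python sketch below. It is an illustration under stated assumptions, not Taneja Group's methodology: the function name, the dimension labels and the before-and-after figures are all hypothetical. It reports a separate reduction ratio for each dimension instead of collapsing them into a single score.

```python
def footprint_ratios(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Return the before/after ratio for each dimension (above 1.0 means a smaller footprint)."""
    if before.keys() != after.keys():
        raise ValueError("before and after must describe the same dimensions")
    ratios = {}
    for dimension, old_value in before.items():
        new_value = after[dimension]
        if new_value <= 0:
            raise ValueError(f"{dimension}: 'after' value must be positive")
        ratios[dimension] = old_value / new_value
    return ratios


# Hypothetical environment, measured before and after a consolidation project.
before = {
    "capacity_tb": 100.0,
    "storage_systems": 4.0,
    "bandwidth_mbps": 200.0,
    "admin_hours_per_week": 20.0,
}
after = {
    "capacity_tb": 25.0,
    "storage_systems": 2.0,
    "bandwidth_mbps": 200.0,
    "admin_hours_per_week": 10.0,
}

for dimension, ratio in footprint_ratios(before, after).items():
    print(f"{dimension}: {ratio:.1f}x")
```

Keeping the ratios separate makes it obvious when a product improves one dimension, such as capacity, while leaving another, such as bandwidth, untouched.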
It’s not really a new idea; rather, we’re just now starting to see solutions that attempt to bring together capabilities that used to come from many separate storage products. In part, it’s also a reaction to changing IT practices in the enterprise; with virtualization becoming mainstream, data protection practices have had to change, and it’s not uncommon to see primary storage products that incorporate some aspect of data protection. Similarly, data storage products are featuring better optimization, and secondary storage systems are raising performance and becoming more general purpose.
A number of vendors try to communicate this message in terms of storage efficiency. But storage efficiency often implies a sort of soft, qualitative comparison, and it can fool users into thinking that improving storage might just be about tweaking one or two of the most obvious dimensions of efficiency, such as capacity optimization. Whether we call it storage density or storage efficiency, it’s more than a single dimension.
We’ve explored various examples of this functional convergence creating greater storage density in recent lab exercises. I see those exercises as representing the market as a whole; each vendor in this consolidating, scaling-storage landscape is striving to enhance the value and competitiveness of their offerings. But the real winners are users, who are finally getting storage technologies that can help enterprises scale their storage infrastructures through better storage density rather than sprawl.
BIO: Jeff Boles is a senior analyst at Taneja Group.