Give or take a few million years, the Mayans say we’re doomed; but our data storage systems may be living on borrowed time right now.
One reading of the stelae discovered in the ancient ruins in and around the Yucatan Peninsula holds that the world is kaput as of December 21, 2012. So you can understand why I wanted to get this column published now.
While the consensus of the scientific community regarding the Mayan Apocalypse is that somebody did their math wrong by omitting the exponent that properly places the end of everything at a somewhat later date (41 octillion, or 4.124105 x 10^28, years after this December), you just never know. Exponents, or “powers of 10” as my first math teacher called them, are shorthand expressions after all. As such, they’re simplifications intended to limit the number of integers required to express large numeric values so we can do math with our fingers or fit big numbers onto the screens of our smartphone calculator applications.
While useful, the incorrect use of exponents can lead to error and misapprehension. Instead of our sun going supernova in 50 million years, a misplaced exponent could put this extinction-level event a mere five years away.
Consider the exponents IDC and others have begun to use to describe data storage capacity growth. An analyst’s chart presented at a trade show last year showed storage capacity growth worldwide topping 21 exabytes in 2011. That’s 21 x 10^18 bytes.
Referring to the storage growth chart, this analyst went on to argue that transactional data had been declining as a share of total data being stored, while file data was growing. But that was old news; analysts have been saying that file storage capacity exceeds block storage capacity since the mid-aughts. More interesting to me was the assertion that the capacity allocated to replicating data had grown to approximately half of the total capacity deployed, suggesting that most companies were using their most expensive disk to make copies of the stuff they already stored on their most expensive disk. If true, this statistic makes me sick to my stomach for three reasons.
First, given an annual disk failure rate of 7% to 14%, based on the experience of Google and others, somewhere between 1.4 exabytes and 2.9 exabytes of data will be compromised by simple disk failures in 2012. It’s a scary thought, and one array makers use to encourage us to purchase spare drives and unused capacity to replace failing platters.
Second, given current estimates of data growth in companies deploying server virtualization -- from 300% over the next three years according to IDC, to more than 600% over the same period per Gartner -- the total capacity demand for storing production data will end up between 300 exabytes and 650 exabytes by 2015. If you double that number to include disk-based replication schemes, you’re looking at a total data storage capacity requirement that exceeds a zettabyte (1.3 zettabytes, or 1.3 x 10^21 bytes, by Gartner’s estimate). Factor in the additional capacity we’ll need to purchase to keep up with drive failure rates, and you’ll need to add another 91 exabytes to 182 exabytes of replacement disks.
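For readers who want to check the back-of-the-envelope arithmetic, here’s a minimal sketch in Python using only the figures quoted above (the IDC and Gartner estimates and the 7% to 14% failure rate); the variable names are mine, not the analysts’:

```python
# Capacity projection using the figures cited in the column.
# All quantities are in exabytes (EB); 1 zettabyte (ZB) = 1,000 EB.

production_low = 300    # EB of production data by 2015, per IDC-based estimate
production_high = 650   # EB of production data by 2015, per Gartner-based estimate

# Double the production figure to account for disk-based replication schemes.
total_low = production_low * 2    # 600 EB
total_high = production_high * 2  # 1,300 EB, i.e. 1.3 ZB

# Replacement capacity for annual drive failures: 7% to 14% of deployed disk.
spares_low = round(total_high * 0.07)   # ~91 EB
spares_high = round(total_high * 0.14)  # ~182 EB

print(total_high)                # 1300 (EB) -- just over a zettabyte
print(spares_low, spares_high)   # 91 182 (EB of replacement disk)
```

The 91 to 182 exabyte spares figure falls straight out of applying the failure-rate range to the high-end 1.3 zettabyte total.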
Third, if you consider the energy required both to power and to cool that much disk, you’re looking at a significantly greater energy demand and cost than we confront today. Hard disk power consumption ranges from approximately 3 watts to 10 watts per drive. Calculate how many disk drives are required to deliver 1.3 zettabytes of capacity, plus another 100 exabytes of powered spare drives, and we’re looking at some serious power consumption. Moreover, the heat dissipation requirements for a storage plant in excess of a zettabyte of capacity will push well above the current estimate of about 2 kilowatts per square foot of data center floor space to somewhere in the neighborhood of 10 kilowatts per square foot. Mix the energy required to power the disk and the energy required to dissipate the heat with the increasing cost of utility power (up 23.2% on average in the U.S. over the past two years, according to USA Today), and you’ve created a real witches’ brew.
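To put a rough number on “serious power consumption,” here’s an illustrative sketch. The 3-watt to 10-watt per-drive range is from the text; the 3 TB drive capacity is my assumption for illustration (the column doesn’t specify a drive size), so treat the output as an order-of-magnitude estimate, not a forecast:

```python
# Rough power estimate for a zettabyte-scale disk plant.
# ASSUMPTION: 3 TB per drive (hypothetical; not stated in the column).
# The 3 W to 10 W per-drive draw is quoted in the text.

ZB = 10**21  # bytes in a zettabyte
TB = 10**12  # bytes in a terabyte

capacity_bytes = 1.3 * ZB             # projected 2015 capacity with replication
drive_size_bytes = 3 * TB             # assumed per-drive capacity
drives = capacity_bytes / drive_size_bytes  # number of spinning drives

power_low_gw = drives * 3 / 10**9     # gigawatts at 3 W per drive
power_high_gw = drives * 10 / 10**9   # gigawatts at 10 W per drive

print(round(drives / 10**6))                            # ~433 million drives
print(round(power_low_gw, 1), round(power_high_gw, 1))  # ~1.3 to ~4.3 GW
```

Even under this generous drive-size assumption, the drives alone draw on the order of a gigawatt or more before a single watt is spent on cooling.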
All this paints a pretty apocalyptic picture of data storage and its costs going forward. Unlike the Mayan Apocalypse, however, our movement along this path is not pre-ordained or inevitable.
Compression and data deduplication (preferably done as a function of the file system) will have an impact along the way. And magnetic media manufacturers are working on reducing power demands and improving energy efficiency at the component level. But altering this dismal picture significantly will require a more holistic or systemic rethinking of our data storage strategies.
We’ll need to get much more particular about what we store and where we store it. We’ll need to challenge the disk industry’s mantra about the inefficacy of tape-based storage and bring it back online sooner for hosting, archiving and protecting the 40% to 70% of data that doesn’t need to be stored on spinning disk. And we might just have to eschew any server virtualization software approach that requires an unwieldy reconfiguration and replication of our storage infrastructure to obtain anything like acceptable I/O performance from applications.
In short, we need to get strategic with our storage planning or else the apocalypse we’ll really be confronting in the next couple of years -- perhaps as soon as December 21 for some firms -- will be of our own making and not the result of a galactic reset predicted by some crazy Mayan text.
BIO: Jon William Toigo is a 30-year IT veteran, CEO and managing principal of Toigo Partners International, and chairman of the Data Management Institute.
Correction: In my January column (“IOPS per what?”), I mistakenly asserted that HP/3PAR’s 450,000 IOPS record on the Storage Performance Council’s SPC Benchmark was achieved by short-stroking disk. I was informed this wasn’t the case: the workload was spread across 1,900 drives that weren’t being short-stroked. While the rig does support short stroking, the technique wasn’t used in this test.
This was first published in May 2012