(Ed. Note: This guest blog comes from Siemens Medical Solutions storage administrator Jim Hood in response to the editorial in the July Storage magazine, “Dedupe and virtualization don’t solve the real problem.”)
I was happy to see that someone finally acknowledged the root of some of the evils in the storage business. Your editorial, “Dedupe and virtualization don’t solve the real problem,” spoke to the heart of the matter: “The math is easy: More servers mean more apps, and more apps mean more data.” It cannot be stated any more clearly than that. I have been involved with storage for all of my 27 years in IT, from the early ’80s until now, spanning mainframe and open systems, and have seen the amount of data expand exponentially. I wish my retirement fund had the same growth curve.
In our business, we continue to satisfy our hosted mainframe customers’ needs with relatively small amounts of data (our bread-and-butter apps in z/OS use customized VSAM [Virtual Storage Access Method] files hardly over the “4-gig limit” to provide databases for hospital clinical applications), while the data behind similar applications on Windows stretches the imagination, mine at least. As someone who has lived through this transformation and now has to support the backup processes for our open systems business, I find that the amount of data we handle makes my head spin.
It isn’t unusual for us to process 25 TB of backup data every day (because we use Tivoli Storage Manager, this consists of only new or changed files). We have accumulated over 2 PB of capacity in our backup inventory. I don’t see it shrinking, even though we have an active relationship with users and encourage them to look at what they back up and how long they retain the backup data. The volume just keeps growing.
With all the technology at our disposal, the industry does not seem to want to address your basic math problem. I believe we live in an age where both technology and its pricing have brought us to a point where “creating data is cheap” — so cheap that there is no turning back. We seem to have lost the thought processes associated with data management: how many files, file size, other data spawned from these files, where does the data reside, what data should be backed up, etc.
I’m not sure, going forward, how to make it appear as though storage costs are kept relatively level while at the same time incurring new costs for hardware, software and people to manage this growth. In our environment we pass on expenses by using a chargeback system, but pressure from the user base (application development) to reduce their costs from one fiscal year to the next usually translates to lower chargeback pricing while the real problem, too much data, persists. We can try to dedupe and virtualize our way out of this, but somebody will have to pay for it.
To really address this problem will require, as you stated, “an awful lot of manual work,” but it will be difficult for many organizations to cough up the resource costs to do so. Let’s face it, that grunt work doesn’t generate any new revenue through new products. So again, it becomes a storage management issue rather than a data management solution.
My view is this: Twenty years ago we had a modest home with a one-car garage (mainframe) to keep all our stuff in. In the last decade we decided we needed more stuff — newer stuff — and moved to a larger house with a two-, heck, three-car garage (Windows). The reality of the economy and housing market is reshaping the world of real estate. I’m not sure what kind of “housing crunch” will be necessary to have us take a different look at how we create data. Getting people to do that would be a good first step in the right direction.
Finally, on a more humorous note, I think one of the problems is in how we refer to amounts of data. One TB is no big deal, right? How do I sell my problem to those who write the checks when I speak in terms of one or two of something? “So, Jim, you say you can’t manage your 2 PB easily!” or “What is so hard about managing your growth from 1 PB to 2 PB? Come on, you only grew by one!” It is all about perception these days, and by truncating real capacities, we diminish the true state of affairs. Sometimes I try to communicate the reality by simply changing the language: 2,000 TB makes a larger impact than 2 PB. Maybe we all need to begin speaking in larger quantities than single digits.
EHS Storage Management
Siemens Medical Solutions