Consolidation tools have helped some storage shops keep a lid on storage growth, but a consolidation solution could very well become the problem.
Back in the '90s and throughout most of this decade, a lot of IT shops must have had revolving doors leading into their data centers. Expansion, upgrades and technical refreshes kept the doors spinning as a constant flow of bigger-better-faster gear made its way in and the older, suddenly obsolete equipment was shoved aside. It's a legacy of sprawl that most IT managers are grappling with now, and storage systems are squarely in the sights of consolidation efforts. Unfortunately, consolidation can end up having the opposite effect.
Figuring out just how we got to this point may be instructive for future planning, if not particularly useful information for dealing with the issues now at hand. It's true that a good part of the problem may seem unstoppable -- we live in a data-driven world where protecting intellectual property is approached with a kind of religious zeal, causing every jot and tittle that makes it into digital form to be safeguarded as if the future of the company hinged on its very existence. Sure, some stuff is important, but a lot of it was barely considered when conceived and will never, ever be read again. Compliance has also added a dose of paranoia to the mix, causing companies to hoard data as if it were some kind of get-out-of-jail-free card.
But in a twist that may seem to defy logic, I think the very technologies that are ostensibly designed to help untangle this mess and allow consolidation are either exacerbating it or poised to do so down the road. The thing is, these technologies -- and how they're applied in data centers -- tend to treat the symptoms without actually providing a cure.
Take a look at server virtualization, the current poster child for data center consolidation. On paper, the idea of bunching up a lot of virtual machines (VMs) on a few massive servers so that you can eliminate a slew of other physical servers makes a lot of sense. But in the real world (according to many of the IT managers I've heard from), that's the way these consolidation efforts started out, only to be undermined when systems admins and users realized just how easy it was to spawn a new VM, a few VMs or a few dozen VMs. In some cases, shops ended up with more virtual servers than the number of physical servers they had before consolidation began. And as shops reduced the amount of hardware on the floor, new problems arose, with bottlenecks appearing where they never existed before.
Data deduplication -- that paragon of storage consolidation -- looms as another potential consolidation paradox. At some point, it seems likely that its advantages will be outweighed by the infrastructure required to keep it doing its job with some measure of efficiency. As your backup volume grows and performance or capacity is threatened, you'll need to add more dedupe devices, which will increase your infrastructure and administrative burdens. This situation compounds itself as the dedupe setup expands in other directions, such as adding remote sites to the mix or replicating data among dedupe boxes. Maybe just adding more cheap disk to your backup environment wasn't such a bad idea after all.
Data archivers may also turn out to be detrimental despite their obvious benefits. When old or unused data is archived off online systems, it has to go somewhere, and that somewhere typically needs to be nearline storage (not tape), which can grow pretty fast.
My point isn't that there's anything inherently wrong with these technologies; it's just that they don't address the root of the problem. No matter how well these tools work and how impressive their results are, it's a good bet they're moving around and storing a lot of garbage.
Effective consolidation has to start with data reduction that accurately assesses stored data and separates the one-off and not-very-useful stuff from true intellectual property. That means you need to know the data you're storing, and the only way that will happen is if you're able to accurately classify it based on the data itself, not on scanty peripheral information about the data.
A few years ago, a handful of startups emerged with products that could classify data with varying degrees of effectiveness. They've since been bought out, gone out of business or morphed into e-discovery tools. It's not that they weren't useful or necessary; it's just that they came on the scene at a time when the need wasn't as apparent as it is today.
Hopefully, some storage vendors out there have recognized this opportunity and are busily cobbling together data classification apps that will actually help manage the information stored on disks and tapes instead of just shuffling around all those bits and bytes.
BIO: Rich Castagna (email@example.com) is editorial director of the Storage Media Group.