“Big data” conspiracy theories abound

Could the latest and greatest buzzword in the storage biz be killing off some of the most useful storage technologies around?

I’ve gone off about this “big data” thing on more than one occasion, about how it’s mostly marketing hype that vendors hope to turn into sales. But the whole deal is starting to give me the creeps, and it’s not just because the phrase “big data” is being burned into our collective cerebrum with astonishing efficiency.

I’m still struggling with the idea that a single solution could be appropriate for processing both big files, like high-def digital movies, and lots of small files, like tweets. Given the number of vendors already piled onto the big data bandwagon, there seem to be dozens of these so-called solutions floating around. So I have trouble getting over that big data hump, because any term that refers to completely opposite things simultaneously is questionable in my book.

I’m not just on a semantical jag here. The big data specter goes deeper than that. If you can look past its dual personality, a pretty clever play for corporate egos is at work here: “Of course our data is big; we wouldn’t want it if it weren’t!” It’s an emotional ploy that feeds into the kind of self-importance that says everything we do/say/create is important and should be kept/mined/analyzed. Big is beautiful, and our data is bigger than yours.

Here comes the conspiracy theory part. Why is it that so many companies today are awash in big data (if they really are)? Sure, there are a lot more ways of creating stuff than before, and everybody seems to be walking around with a data-creating machine in each pocket, but all of a sudden we have to figure out what to do with the stuff. Not too long ago, a few startups appeared with products -- rudimentary maybe, but real products -- that could help us pore through all the stuff we stored to determine what’s worth keeping and what needs to be deep-sixed. Those data classification products showed a lot of promise; they came from companies such as Abrevity, Arkivio, FileTek, StoredIQ and Kazeon, and it looked like they would be the cornerstones of storage management operations from tiering to archiving. The premise was simple: You have to know what you’ve got before you decide what you need to do with it.

Maybe it’s too simple. If users get a better grip on what they’re storing and what they shouldn’t be storing, they might -- gasp! -- buy less storage. So it wasn’t a big surprise that most of those classification vendors disappeared, some simply into the ether and others into the portfolios of (guess who?) storage vendors, where, for the most part, they’ve just withered away.

Apparently, knowing what to keep and what to chuck is a little threatening if your company sells the storage to stash all that stuff on. With those pesky data classification apps out of the way, users could get back to doing what they always do, amassing untold heaps of data and buying more disk to store it.

So now you have big data and it’s a big problem, and vendors are all too happy to help. They have big data storage systems and can supply state-of-the-art big data processing tools so that firms like yours can crunch through the piles of data you’ve been encouraged to keep. And as you plow through the knee-high drifts of data, you’ll probably discover that your employees spend too much time tweeting, updating their Facebook pages and using the SAN to save photos of their kids. And you’re left to wonder why you’re keeping it all.

But data classification isn’t the only technology casualty of the big data juggernaut. A couple of years ago there was buzz around the arrival of a few products that promised to cut our primary data storage down to size using deduplication and compression. But the two leading startups were scooped up by megavendors, and now primary dedupe and compression sit like a couple of old boxcars on an abandoned railroad siding.

And while you’re still reeling from the big data drubbing you’re getting from vendors, watch out for the cloud haymaker they’re about to land. Because cloud storage is perfect for big data, right? And buzzwords tend to get lonely if they’re left on their own.

BIO: Rich Castagna is editorial director of the Storage Media Group.