All data isn’t big data, and dealing with it requires a variety of data storage technologies and disciplines.
For a long time, I felt like I was the only person who was confused about this “big data” thing. I thought it meant dealing with large files, but the term also seemed to be tossed around in equal doses to refer to lots and lots of pieces of unstructured data. I think I get it now. It means both things, but it doesn’t refer to everything in between, which is probably 90% of the data stored in most data centers. But, to be fair, the 10% that’s some form of big data can be pretty important stuff.
If you manage to parse these things out, they start to make some sense. Not the kind of sense storage marketers are trying to hypnotize us with by relentlessly pairing big data with “the cloud,” “virtualization,” “solid-state storage” and whatever buzzword du jour they think (or hope) describes their product lines. That’s nonsense. But if you put the 90% of data aside for a moment and look at the two big data constituencies, there’s something of substance to talk about.
It might seem like big data has been reverberating in our ears forever, but it's only been around for a few years. When it first emerged, "big data" was a fairly straightforward and succinct description of data that came in the form of large files, like video and medical images and some scientific data. The groups that had to deal with those files -- video post-production facilities, all kinds of health care organizations and research labs -- needed special tools on the storage side to use their jumbo-sized data effectively.
Isilon Systems Inc. (now owned by EMC Corp.) and a number of other storage vendors answered the call for storage system architectures that were more adept than traditional arrays at handling these large files. Snagging a few celebrity customers like Sports Illustrated, which used Isilon’s systems for its Beijing Olympics coverage, didn’t hurt and certainly helped “big data” find a permanent place in our storage technology lingo. Over the years, a number of storage vendors -- like Active Storage Inc., Dot Hill Systems Corp., Omneon (part of Harmonic Inc.), Pivot3 Inc. and Sonnet Technologies Inc. -- have built a solid niche catering to these specific needs with purpose-built storage systems.
The other "big" -- the one that deals with massive amounts of small bits of data -- has a completely different genesis. In fact, everything about this "big" is different from the other "big." Even the word "big," borrowed from the other use case and liberally redefined, doesn't seem right. It's not big at all, but rather a helluva lot of something -- in this case, vast numbers of discrete pieces of information in the form of disconnected files. So systems built for the original big data probably wouldn't be all that useful for this second kind.
For the sake of clarity, let's not even call the second group of stuff big data anymore; let's call it "helluva lot of data." Helluva lot of data means working with lots of files that may (or may not) be related, trying to turn seemingly disparate tidbits of data into something that might be useful. That doesn't strike me as a storage issue, even though you do need a place to stash those zillions of morsels that has the horsepower to serve 'em up fast enough when they're needed.
So it seems that scale-out network-attached storage (NAS) or object storage systems should suit helluva lot of data applications just fine. For helluva lot of data, it’s really a software story, but not a storage management software story. It’s based on the premise that if we can put all those dissimilar fragments together in just the right way, we’ll unearth (or maybe even create) valuable new information. And we need specialized software to do that kind of thing, which really doesn’t have much to do with the underlying storage.
In essence, it turns into an exercise in building a value proposition for going out and buying technology that will help you find value in all the stuff you've been collecting. Interestingly, it's often not a given that there's any value buried in the bits and bytes. I guess the "big" question is, "How far do you go in pursuit of that hidden intelligence?" And how much do you spend trying to determine whether there's some real intelligence to be sorted out or if it's all, well, a helluva lot of junk?
Of course, in most companies the answer to that is “We wouldn’t be collecting all this data if it wasn’t valuable, right?” Um . . . maybe. Now, if you could only dump the junk before you start trying to fit the pieces together to reveal that inner truth . . .
You'll also probably need some kind of specialized storage hardware that works effectively with software that's smart enough to discard the pieces that don't fit before trying to complete the puzzle. And most storage managers would be only too happy to get their hands on some kind of superintelligent archiver/cataloger that might provide some relief for overextended file systems. Now that would be a helluva solution.
BIO: Rich Castagna is editorial director of the Storage Media Group.