Classification of data can solve your data storage problems

Having a data classification process in place can make your data smart enough to know what to do with itself, allowing you to attain the holy grail of enterprise data storage.

What's the holy grail of enterprise storage? The single "thing" that affects every bit of hardware and software that hooks into data center storage? The discovery of the end-all and be-all of not just "storagedom" but the entire IT realm and all the business processes continually gobbling up and spitting data out?

Hyper-converged infrastructure, you say? Interesting, but that could be considered mostly a reshuffling of the data center deck chairs. How about software-defined storage (SDS)? No, SDS looks more like a shift in focus that de-emphasizes hardware than anything weightier (and it's still mostly sold by hardware vendors -- go figure). Cloud storage? Nah, that's just another place to put your data. What about object storage, the current darling of the array set?

You're getting warmer, because one of the coolest things about object storage is its ability to support extended metadata, and metadata is the underpinning of the classification of data, which is -- indeed -- the elusive holy grail of storage.

Data classification: Ignored for years

Yeah, yeah, I know I've taken up the banner of data classification on more than a few occasions, but convincing people (and vendors) that there's more to storage than zippy flash performance and high-capacity drives ain't easy. Latency, throughput, IOPS -- none of that matters all that much if you don't know anything about the data that's being written and read.

And while there is consensus that classification of data is important, it has languished almost as an afterthought in most shops for decades. And it keeps on getting a bad rap. Remember ILM? Information lifecycle management endeavored to bring order to data disarray, but it didn't take long for ILM to become the kiss of death in the storage world. And back in the Stone Age, when mainframes roamed the earth, HSM, or hierarchical storage management, was the methodology for data classification and management. But all that seems to have been tossed onto the IT trash heap with "new" storage architectures and infrastructure and the ongoing struggle to keep up with capacity demands, processing requirements, data protection and so on.

But those struggles are actually the best argument for data classification, because it makes all of those things easier -- and cheaper -- to do. It helps get them done better, too.

You've gotta know what it is to know what to do with it

Still not convinced? Let's try a little analogy. We'll make believe you're doing some spring cleaning on that catch-all hall closet packed with lots of stuff that didn't seem to belong anywhere else. Digging through the disarray, you find something way, way in the back behind a tennis racquet with broken strings. What do you do with it? At this point, you have no idea what to do with it, of course, because I haven't told you what that "thing" is.

It could be a button that fell from a coat hanging above or maybe an attendee badge from the 2009 VMworld conference or maybe it's a long-lost lottery ticket. You might sew the button back on the coat, deep-six the badge or check to see if you're a lucky winner and should be shopping for real estate in the south of France rather than reading this column.

The thing is, if you know what the thing is, you'll know what to do with it. The same goes for data.

Classification isn't a cure-all, but it comes close

Classifying data so you know some basic facts about it -- like what's inside the file, why it was created, who created it and who should be able to look at it or not -- creates a wealth of information that determines how that piece of data should be handled and cared for. If it's the corporate crown jewels, you may need to back it up multiple times, encrypt it and give limited access. If it's plans for the company Christmas party, less stringent measures are likely in order. But you wouldn't know that without knowing more about the file than most current file systems reveal.
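
To make that concrete, here's a minimal Python sketch of the idea that classification, not the file system, decides how a file gets cared for. The sensitivity levels, group names and backup counts are all hypothetical stand-ins; every organization would define its own.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical classification levels; real schemes vary by organization."""
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3  # e.g., the corporate crown jewels


@dataclass(frozen=True)
class HandlingPolicy:
    backup_copies: int        # how many backup copies to keep
    encrypt: bool             # whether the data must be encrypted
    allowed_groups: frozenset  # who should be able to look at it


# Hypothetical mapping from classification level to handling rules.
POLICIES = {
    Sensitivity.PUBLIC: HandlingPolicy(1, False, frozenset({"everyone"})),
    Sensitivity.INTERNAL: HandlingPolicy(2, False, frozenset({"employees"})),
    Sensitivity.RESTRICTED: HandlingPolicy(3, True, frozenset({"finance", "legal"})),
}


@dataclass(frozen=True)
class FileClassification:
    """Basic facts about a file: who created it, why, and how sensitive it is."""
    owner: str
    purpose: str
    sensitivity: Sensitivity


def handling_for(meta: FileClassification) -> HandlingPolicy:
    """Look up how a file should be handled, based only on its classification."""
    return POLICIES[meta.sensitivity]


if __name__ == "__main__":
    party = FileClassification("hr_team", "holiday party plans", Sensitivity.INTERNAL)
    jewels = FileClassification("cfo", "merger plans", Sensitivity.RESTRICTED)
    print(handling_for(party))
    print(handling_for(jewels))
```

The point of the lookup table is that the storage system never has to guess: once a file carries its classification, the backup, encryption and access decisions follow mechanically.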

ILM cratered because it was an extra step -- a lot of extra steps, actually -- requiring a lot of manual intervention and attention. Leaving something like classification of data up to the whims of humans is a pretty effective way of setting it up to fail. But if the process can be automated based on the application creating the file, the person using the application, the group that person belongs to, the security clearance of the file originator and so on, the files themselves will be packed with critical info about their disposition.
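
Here's a rough Python sketch of what that automation might look like: classification metadata derived from facts the system already knows at creation time, with no human in the loop. The application names, group names, clearance scale and labels are hypothetical; in practice the rules would come from policy rather than being hardcoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreationContext:
    """Facts a system can capture automatically when a file is created."""
    application: str
    user: str
    group: str
    clearance: int  # hypothetical scale: 0 (none) to 3 (highest)


# Hypothetical rule inputs; real ones would be maintained as policy.
SENSITIVE_APPS = {"payroll", "crm"}
SENSITIVE_GROUPS = {"finance", "legal", "executive"}


def classify(ctx: CreationContext) -> dict:
    """Derive classification metadata without asking the user anything."""
    if ctx.clearance >= 3 or ctx.application in SENSITIVE_APPS:
        level = "restricted"
    elif ctx.group in SENSITIVE_GROUPS:
        level = "confidential"
    else:
        level = "internal"
    # Everything returned here can travel with the file as metadata.
    return {
        "classification": level,
        "created_by": ctx.user,
        "owning_group": ctx.group,
        "source_application": ctx.application,
    }


if __name__ == "__main__":
    print(classify(CreationContext("payroll", "alice", "hr", 1)))
    print(classify(CreationContext("wordproc", "carol", "marketing", 0)))
```

Because every input is captured at file creation, nobody has to remember to tag anything, which is exactly the extra step that sank ILM.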

In a data-centric world, data should do the talking. "Sorry, you can't copy me to that cloud. ... Hey, it's time to archive me. ... No, don't attach me to an email."
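
A toy Python sketch of data "doing the talking" might have the file's own metadata answer each proposed action. The metadata keys, action names and one-year archive threshold are hypothetical assumptions; there's no standard vocabulary for any of them today.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical threshold: data untouched for a year is an archive candidate.
ARCHIVE_AFTER = timedelta(days=365)


def check_action(metadata: dict, action: str) -> str:
    """Answer a proposed action using only the file's own metadata."""
    # Assumption: unclassified data is treated conservatively, as restricted.
    level = metadata.get("classification", "restricted")
    if action == "copy_to_cloud" and level == "restricted":
        return "Sorry, you can't copy me to that cloud."
    if action == "attach_to_email" and level in {"confidential", "restricted"}:
        return "No, don't attach me to an email."
    return "Allowed."


def lifecycle_hint(metadata: dict, today: date) -> Optional[str]:
    """Speak up when the file's own access history says it should move."""
    if today - metadata["last_accessed"] > ARCHIVE_AFTER:
        return "Hey, it's time to archive me."
    return None


if __name__ == "__main__":
    report = {"classification": "restricted", "last_accessed": date(2020, 1, 1)}
    print(check_action(report, "copy_to_cloud"))
    print(lifecycle_hint(report, date(2022, 1, 1)))
```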

Most storage still not smart enough

When you consider how many ways solid classification of data can be leveraged, it's a wonder that every storage shop isn't doing it today. But maybe it isn't so surprising, as so few storage vendors actually build these capabilities into their products. The technology is, however, available in other forms and formats from compliance, security and other product category vendors.

For example, I came across a very useful document called The Definitive Guide to Data Classification, published by Digital Guardian, a data security vendor. Yes, classification is key to effectively securing data, too.

While alternatives are available, it's likely there's some resistance to the inevitable vendor lock-in of putting all your data classification eggs in a single vendor's basket. But maybe as interest in object storage grows and the technology is more widely implemented, it will encourage vendors to develop some level of metadata standardization as well. That way, applications, OSes and file systems will need just a single vocabulary to act on classified data appropriately.
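
To illustrate why a single vocabulary matters, here's a small Python sketch that translates two vendors' tags into one shared set of keys and values. Both vendors' tag names and the shared vocabulary itself are hypothetical; no such standard exists yet, which is the whole point.

```python
# A hypothetical shared vocabulary: allowed keys and their permitted values.
SHARED_VOCABULARY = {
    "classification": {"public", "internal", "confidential", "restricted"},
    "retention": {"30d", "1y", "7y", "permanent"},
}

# Hypothetical vendor-specific tag names that mean the same things.
VENDOR_ALIASES = {
    "vendor_a": {"sec-level": "classification", "keep-for": "retention"},
    "vendor_b": {"x-data-class": "classification", "x-retention": "retention"},
}


def normalize(vendor: str, tags: dict) -> dict:
    """Translate one vendor's tags into the shared vocabulary and validate them."""
    aliases = VENDOR_ALIASES[vendor]
    result = {}
    for key, value in tags.items():
        shared_key = aliases.get(key)
        if shared_key is None:
            continue  # tag has no shared meaning, so other tools can't act on it
        if value not in SHARED_VOCABULARY[shared_key]:
            raise ValueError(f"{value!r} is not a valid {shared_key}")
        result[shared_key] = value
    return result


if __name__ == "__main__":
    print(normalize("vendor_a", {"sec-level": "restricted", "keep-for": "7y"}))
    print(normalize("vendor_b", {"x-data-class": "restricted", "x-retention": "7y"}))
```

With a real standard, the alias tables would disappear entirely: every application, OS and file system would read and write the shared keys directly.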

