Data classification isn't just for document management systems anymore; it's the key to storage efficiency.
A few years from now, when you look back on 2009, what will you see? Will it be just a blur of slashed budgets and urgent demands for new capacity, leaving you wondering how your storage shop survived the turmoil? Or will you see it as the turning point that set your shop on the path to greater storage efficiency?
How you reflect on these tough times will depend on your ability to turn misfortune into opportunity.
It's pretty clear that this year's theme is "efficiency." Months ago, I said 2009 was going to be a year of rethinking how we manage our storage. While you've consistently been hearing that you have to do "more with less," the truth is that you have to do more with the same stuff -- just better than ever before.
And there are plenty of tools to help you use your storage gear more efficiently. Data deduplication -- the poster child for the storage industry for the past two years -- is a great way to avoid adding more disk to your backup operation while protecting growing data stores. More and more storage managers are implementing dedupe in their primary storage systems, too. Archivers do a neat job of moving less useful data off primary disk and onto cheaper storage systems. And if you bought or plan to buy new storage arrays this year, there's a good chance you're looking at low-cost, high-capacity systems that can add a tier to your setup and ease the strain on your more expensive primary storage.
These are all very effective ways to make better use of installed storage systems, but they are, in essence, point solutions that tend not to play well with each other. For each one to work well, you need to know the nature of your data and be able to classify that data so you can determine if it should be saved and, if so, where it should go.
That means, of course, data classification, something that should be on your agenda in lean times and anytime. In some of these products, the classification capabilities are fairly sophisticated, with policy creation scenarios and integration with directory services such as Microsoft's Active Directory or other LDAP directories. But classification is often limited to basics like file extensions or time/date stamps. The differences among products' classification features can make it hard (or even impossible) to create consistent data classification policies across multiple services like data archiving and data migration.
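To make "the basics" concrete, here is a minimal Python sketch of classification driven only by file extension and last-modified date. It does not reflect any particular product; the category names, the extension list and the one-year threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePath

# Hypothetical rules, chosen only for illustration.
MEDIA_EXTENSIONS = {".mp3", ".avi", ".mov"}
STALE_AFTER = timedelta(days=365)


@dataclass(frozen=True)
class FileInfo:
    path: str
    modified: datetime


def classify_basic(info: FileInfo, now: datetime) -> str:
    """Classify a file using nothing but its extension and timestamp."""
    extension = PurePath(info.path).suffix.lower()
    if extension in MEDIA_EXTENSIONS:
        return "media"
    if now - info.modified > STALE_AFTER:
        return "stale"
    return "active"
```

Rules like these say nothing about who owns a file or what it contains, which is why they are hard to line up with the richer, directory-aware policies some products offer.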
So what's missing from this picture? A unified, or federated, data classification method where you can define your corporate, regulatory and maintenance policies once and apply them across the board to all products that move data around your storage environment.
A few years back, several companies brought out data classification products; they managed to form a few alliances with other vendors, but the idea of having a separate box or application to handle classifying data for other applications never caught on. But that was a different time, for sure. Budgets were bulging back then and escalating capacity demands were dealt with by just tossing more iron into the data center.
If the theme is indeed efficiency, wouldn't it be the ultimate in efficiency to create a single set of data disposition policies and apply them across all apps? A consistent approach like that would work for 90% or more of the data you're storing right now -- if it's important enough to save in one system, it'll likely be just as important in another.
So what has to happen to reach this kind of data classification nirvana? First, vendors that provide only rudimentary classification criteria in their products need to beef up those capabilities. And, second, vendors have to either create a standard set of APIs that let systems share each other's classification information or they have to create a completely transparent, common classification system that plugs into their apps.
Cynics would say that these things will never happen because the storage industry, while paying lip service to standards, never actually standardizes much at all. But the only way that can change is if you put pressure on your vendors. Data classification is important right now, and it will get even more important as storage systems become more intelligent, so let your vendors know now that it matters to you.
In the meantime, you can create your own internal standard classification system. Make sure the classification and data disposition policies you create for one application are used consistently across all others, too. And don't settle for products that don't provide the level of classification sophistication that your environment requires.
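One way to picture such an internal standard is a single policy table that every data-moving job consults. The Python sketch below assumes that approach; the class names, tiers, retention periods and function names are hypothetical, not taken from any product or regulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Disposition:
    keep: bool
    tier: Optional[str]   # None when the data should not be kept
    retention_years: int


# Hypothetical classes and dispositions, defined once for every application.
POLICIES = {
    "regulated": Disposition(keep=True, tier="archive", retention_years=7),
    "active": Disposition(keep=True, tier="primary", retention_years=1),
    "stale": Disposition(keep=True, tier="nearline", retention_years=3),
    "personal-media": Disposition(keep=False, tier=None, retention_years=0),
}


def disposition_for(data_class: str) -> Disposition:
    """Look up the one shared policy; fail loudly on an unknown class."""
    try:
        return POLICIES[data_class]
    except KeyError:
        raise ValueError(f"no policy defined for class {data_class!r}") from None


def archiver_should_keep(data_class: str) -> bool:
    # The archiving job asks the shared table whether to save the data.
    return disposition_for(data_class).keep


def migration_target(data_class: str) -> Optional[str]:
    # The migration job asks the same table where the data should go.
    return disposition_for(data_class).tier
```

Because both jobs read the same table, changing a policy in one place changes it everywhere, and an unclassified data type raises an error instead of silently getting a different default in each application.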
BIO: Rich Castagna (firstname.lastname@example.org) is editorial director of the Storage Media Group.