The data classification process allows companies to organize their information in a way that corresponds to business needs. The process can be laborious, especially in large organizations with significant content to evaluate and categorize. Management must often spearhead the classification effort with input from every department -- data classification is not solely an IT function.
In addition, the software tools that support data classification are evolving, growing beyond the niche markets that they primarily serve now. But an enterprise that implements data classification properly can understand what data it has, recognize the importance of that data and make informed decisions about how it should be managed and stored. This allows companies to realize a range of benefits that can save time and money, and reduce legal vulnerability.
Data classification serves the need for compliance and risk management -- mission-critical data can be identified and protected to meet compliance audits or legal discovery tasks. But other benefits are often overlooked. Data classification offers cost savings by allowing less important data to be migrated from expensive
Some companies try to enhance the security of their corporate information using encryption. The problem with encryption is that it demands processing overhead, which slows performance for the user. Without data classification, an encryption process would encrypt everything, impacting users far more than necessary. But through a data classification initiative, a company can identify and encrypt only the relevant data.
Indexing is a popular method for improving retrieval times for end users. Data classification can help companies identify data that would benefit most from indexing, then move that data to the location that provides the best storage performance for indexing tasks.
Backups can also benefit from data classification. Traditional backup processes will typically save everything to disk or tape, but companies rarely need every fragment of data to run their business. Data classification can help a business identify the mission-critical information needed for continuous operation, then focus backups on that essential data. This allows for faster backups and restores, which makes it easier to meet a recovery time objective (RTO).
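The backup idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class labels ("mission-critical", "routine", "non-business"), the data set names and the sizes are all hypothetical, and a real backup product would apply its own policy mechanism. The sketch simply shows how a classification label lets a job select a subset of data and how much smaller that subset can be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    classification: str  # hypothetical label assigned during classification
    size_gb: float


def select_for_backup(datasets, required=("mission-critical",)):
    """Return only the data sets whose classification is in `required`."""
    return [d for d in datasets if d.classification in required]


def total_size_gb(datasets):
    """Sum the sizes of the given data sets, in gigabytes."""
    return sum(d.size_gb for d in datasets)


# Hypothetical inventory, invented purely for illustration.
INVENTORY = [
    DataSet("orders-db", "mission-critical", 400.0),
    DataSet("hr-records", "mission-critical", 50.0),
    DataSet("marketing-drafts", "routine", 300.0),
    DataSet("personal-media", "non-business", 750.0),
]

if __name__ == "__main__":
    subset = select_for_backup(INVENTORY)
    print("Full backup (GB):   ", total_size_gb(INVENTORY))
    print("Focused backup (GB):", total_size_gb(subset))
```

In this made-up inventory the focused job covers less than a third of the total capacity. The actual ratio depends entirely on how an organization classifies its own data.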
Seeing through the data classification hype
The first thing a storage administrator needs to do is see beyond data classification misconceptions. To begin with, data classification is not a difficult process. "We make it appear like it's really expensive, and it's really hard and you'll never be able to do it -- and it's just not true," Duplessie says. "You don't have to buy a thing to start this process."
The key is to communicate with others in the organization in order to establish a set of common parameters. "You're going to have to have candid discussions, in a room, for an hour and come up with some macro points that everyone can agree to," says Duplessie. "From there, you can get as detailed as the organization wants to go." As evaluation of data assets becomes more detailed, additional time (and more careful communication) will be needed to keep a classification project on track.
The second misconception is that there is one right way to accomplish a data classification project. The needs, and the results, are as varied as the enterprise itself. John Merryman, senior consultant at GlassHouse Technologies Inc., says that most people agree on the definition of data classification at a high level, but their execution can be radically different. The misconception, he says, is that "data classification is all things to all people."
Nor is data classification essential for everyone. "Deep, sophisticated data classification (and involved data classification initiatives) aren't necessarily for everybody," says Greg Schulz, senior analyst at Evaluator Group. "Everyone needs to have an understanding of what data they have and how it's being used -- a basic awareness of their environment. But not everybody needs a very deep and involved knowledge."
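The "basic awareness" Schulz describes can start with something as simple as an inventory of what sits on a file share. The following Python sketch uses only the standard library to count files and bytes per file extension under a directory. It is a first-pass inventory, not a classification: the function name is hypothetical, and grouping by extension is an assumption chosen for simplicity, since business value cannot be read from a file suffix.

```python
import os
from collections import defaultdict


def summarize_by_extension(root):
    """Walk `root` and return {extension: (file_count, total_bytes)}.

    Extensions are lowercased; files without one are grouped as "(none)".
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full_path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}


if __name__ == "__main__":
    # Scan the current directory and list the largest consumers first.
    summary = summarize_by_extension(".")
    for ext, (count, size) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
        print(f"{ext:10s} {count:8d} files {size:14d} bytes")
```

A report like this gives the people in the room a shared set of facts to discuss before anyone decides how detailed the classification needs to be.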
Another frequent misconception is that data classification is basically the task of moving old data off of primary storage. The value of corporate data does not necessarily correspond to its age -- old data is not inherently "less valuable" to a business. Sales spreadsheets from the previous quarter are far more valuable to the organization than jokes sent by e-mail yesterday, or MP3 files downloaded last week. There's no such thing as an "old data" group. Instead, "age" is a filter that is applied to established groups.
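One way to picture age as a filter rather than a group is the Python sketch below. Everything in it is an assumption made for illustration: the group names, the per-group thresholds and the sample files are hypothetical. Each file is first assigned to a business group, and only then is an age threshold applied, with a different threshold for each group.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FileRecord:
    path: str
    group: str           # business group assigned during classification
    last_modified: date


# Hypothetical policy: days a file may sit on primary storage, per group.
# A threshold of 0 means "migrate regardless of age".
MIGRATE_AFTER_DAYS = {
    "sales": 730,
    "non-business-email": 0,
    "personal-media": 0,
}


def migration_candidates(records, today, thresholds=MIGRATE_AFTER_DAYS):
    """Return records whose age meets or exceeds their own group's threshold.

    Records in a group with no threshold are never selected on age alone.
    """
    selected = []
    for record in records:
        limit = thresholds.get(record.group)
        if limit is None:
            continue
        if today - record.last_modified >= timedelta(days=limit):
            selected.append(record)
    return selected


# Hypothetical sample data, invented for illustration.
TODAY = date(2024, 6, 1)
RECORDS = [
    FileRecord("finance/q1-sales.xlsx", "sales", date(2024, 3, 15)),
    FileRecord("mail/joke.eml", "non-business-email", date(2024, 5, 31)),
    FileRecord("media/song.mp3", "personal-media", date(2024, 5, 25)),
    FileRecord("finance/2019-sales.xlsx", "sales", date(2019, 1, 10)),
]

if __name__ == "__main__":
    for record in migration_candidates(RECORDS, TODAY):
        print(record.path)
```

Under this sample policy the day-old joke e-mail and the week-old MP3 file are selected for migration, while last quarter's sales spreadsheet stays on primary storage. A sales file is selected only once it passes the threshold set for the sales group.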
No clear point of inflection
When should an enterprise implement a data classification initiative? There is no clear correlation between the volume of corporate data (or the sheer number of files) and the need for data classification. Business needs and risk tolerance should be the driving forces behind data classification. The consensus among analysts suggests that data classification offers more benefit to larger companies with the administrative and technical expertise to manage such an undertaking.
However, even relatively small companies can benefit from a data classification initiative if they deal with compliance needs and government regulatory requirements. "The answer is not based on size, but based on the value of the information," says Michael Peterson, program director of the Storage Networking Industry Association's Data Management Forum. He points out that a small medical shop with a Health Insurance Portability and Accountability Act (HIPAA) requirement can sustain just as much risk as a larger organization with more files or larger data volumes. The real issue is whether the risk or needs can justify the work involved.
"The bigger shops tend to think about these things and act on them a little bit more quickly," Duplessie says. "But I don't think there's any more or less relevance to doing it in a big shop versus a small shop." Schulz agrees. "Any size business could benefit from some level of basic data classification," he says. An understanding of what data is present, where it's located, how it's being used and how it's growing can have a significant impact on storage planning in any organization. But larger and deeper initiatives are often best justified with significant data volumes. "That is where we see multiple terabytes -- it could be tens of TB, it could be hundreds of TB," Schulz says. He believes that some smaller organizations may find that adding storage is more cost-effective than undertaking a data classification initiative.