Problem solve Get help with specific problems with your technologies, process and projects.

Data classification: Getting started

Classifying data and knowing how its value changes over time will improve service levels, create a better working relationship with business units and reduce costs. (This tip is part of our Storage 101 tip series.)

This story originally appeared in the July 2005 issue of "Storage" magazine.

What you will learn from this tip: How data classification can lead to higher service levels, a better working relationship with the business units that create and own the data, and the ability to reduce costs by storing data on an appropriate class of storage. (This tip is part of our Storage 101 tip series.)

A data classification project doesn't have to be complex or difficult to accomplish, but it can easily escalate in complexity depending on how granular the classification effort becomes. Like it or not, data classification will be the cornerstone for a much larger information lifecycle management (ILM) project.

The best way to begin is to use a minimal methodology and a high-level approach to classifying data. This way, there's a clear balance between your level of effort and the ROI. Merely classifying your data is an interesting exercise, but unless you take action, no benefit will be derived. The way data is stored will need to be changed; to do this without creating havoc, the organization has to agree that the current method for allocating storage isn't as effective as it could be. Selling this idea is easier said than done, however.

Ask your business units what they need. The answers may illuminate inadequacies in your storage department, which will lead to infrastructure changes that reflect real business needs. With everyone's buy-in, there will be funds to pay for these changes.

From requirements to classification

Data classification simply means mapping business requirements to your infrastructure. Data classification begins with a structured interview with the user, typically an application or project owner. Having a structured storage management organization, standard procedures and a standardized infrastructure are essential prerequisites to the long-term success of data classification. Don't worry if "firefight" or "chaos" are two words that best describe day-to-day operations. Go ahead and use data classification as a way to reach out to the user community and to get a handle on the business requirements behind the service requests.

Related information

ILM: Get it right the first time

Managing tiered storage a pain in the neck

Data classification efforts typically lack structure and rely on informal meetings between the storage staff and business units, interaction during application or server rollout processes, or just e-mail correspondence to obtain user requirements for storage services. Sometimes it boils down to hallway conversations or phone calls to your pals in operations to get the right service setup.

Often, a knowledge gap exists between the user and infrastructure team, so requirements end up mapping to the "high end" of the scale. It boils down to "What can you do for me?" instead of a "What are your requirements?" conversation. When this is the case, everyone's data is the most important and requires the most expensive, high-performance storage.

Approaching users with a structured set of questions (such as "How would you rate the performance of this application?" or "How mission-critical do you consider this application?") with specific ranges for answers provides consistency and allows business requirements to be mapped to various aspects of storage.

Business requirements map to a storage service through key performance metrics. For instance, production recovery time metrics for an enterprise storage environment might range from two hours to a day. A complete inventory of business requirements facilitates the delivery of multiple storage service types. You should obtain or create a logical mapping of the user environment to your infrastructure. This usually means mapping projects or apps to your infrastructure. Once you have a complete collection of requirements and infrastructure meta data, the requirements gathering phase is done.

Next, align requirements to service offerings by developing a service catalogue, which is the menu of storage services. This living document describes the service provided and offers technical details for each standard offering within the storage service type. The service catalogue provides a reference point for users, and is referenced in subsequent service level agreements (SLAs). As technology infrastructures change, so does the service catalogue.

Unique business requirements map to specific types of storage services. To provide a manageable and flexible storage service, the service must accept changes in business requirements and infrastructure. To accomplish this, storage services must be segregated into discreet storage service domains. Typical storage service domains include primary storage, disaster recovery, backup/recovery and archive. Within each storage service type, tiers of service are often developed.

Service catalogue development requires an iterative approach. Business requirements must be aligned to service offerings, and that alignment takes vision, work and refinement. The first pass will get you off the ground, but subsequent iterations and improvements to the service offerings will be required before a service catalogue is enterprise ready.

Closing the deal

A mature data classification model includes the SLA and a cost model; however, they're not core to the discussion of data classification. The SLA is the key point of interaction between IT and the user. It represents a contract, outlining and quantifying how and when the service will be provided. A cost model facilitates service offering development and the occasional "realignment" of user requirements should a service cost exceed the organization's ability to pay the bill. In reality, chargeback isn't a realistic goal for all organizations; however, user accountability can still be achieved through a combination of cost modeling, measurement and reporting.

Data classification creates a dialogue and a process between users and IT. Activity is likely to span IT groups, business units and disparate user groups, risking political mayhem. Often, data classification creates a first-time dialogue or crosses burned bridges where relationships went wrong long ago. Even if you have a high-level executive sponsorship, internal politics are likely to occur during the process. Identify the type of executive sponsor with enough clout to demand accountability and cooperation within the organization.

Chicken or egg?

The industry often views the building of tiered services as a "chicken or egg" process. It's difficult to standardize on a finite number of tiers when starting with disparate business requirements. Conversely, it's also difficult to present finite tiers without first gathering data. To sidestep this potential problem, work iteratively to define service tiers. Build an initial set of service offerings that reflect the requirements and then classify sets of user data against these offerings. The initial classification can then be used to conduct follow-up dialogues with the business units to confirm the mapping of data requirements to storage offerings.

Introducing vast amounts of procedural change creates more pain than benefit. For data classification, start with a pilot effort that includes a representative sampling of your environment from a business and technology point of view. Don't try to tackle all storage services at one time. Pick one domain, such as primary storage, and focus the service offering development there. Trying to accomplish everything at once will open a Pandora's box of unforeseen problems.


The linchpins of a successful data classification project are detailed planning and meaningful dialogue with users about business requirements. The idea is to match different levels of storage with users' requirements. Make no mistake: Defining policies to map requirements to service tiers is arduous and time-consuming work, but it can be achieved through a sound methodology, an iterative development approach and a rapidly evolving set of tools in the marketplace. The long-term benefits of data classification include cost reduction, risk mitigation and QoS improvements.

To read more of this article, visit the "Storage" magazine Web site.

For more information:

Bye, bye RAID?


Dig Deeper on Long-term archiving