Classifying data and understanding how its value changes over time pays off in several ways: higher service levels, a better working relationship with the business units that create and own the data, and lower costs from storing data on an appropriate class of storage.
A data classification project doesn't have to be complex or difficult to accomplish, but it can easily escalate in complexity depending on how granular the classification effort becomes. Like it or not, data classification will be the cornerstone for a much larger information lifecycle management project.
Data classification provides the following main benefits:
- Reduced storage costs through lower consumption and a lower cost per unit of storage.
- Higher service levels for storage consumers.
- Reduced risk of unprotected or underprotected data.
- Shared accountability between the service provider and user.
The best way to begin is to use a minimal methodology and a high-level approach to classifying data. This way, there's a clear balance between your level of effort and the return on investment. Merely classifying your data is an interesting exercise, but unless you take action, no benefit will be derived. The way data is stored will need to be changed; to do this without creating havoc, the organization has to agree that the current
Ask your business units what they need. The answers may illuminate inadequacies in your storage department, which will lead to infrastructure changes that reflect real business needs. With everyone's buy-in, there will be funds to pay for these changes. The following describes the data classification process, key elements, common pitfalls and new products that promise to make the effort less manual and more granular.
From requirements to classification
Data classification simply means mapping business requirements to your infrastructure. It begins with a structured interview with the user, typically an application or project owner. A structured storage management organization, standard procedures and a standardized infrastructure are essential prerequisites to the long-term success of data classification. Don't worry if "firefight" and "chaos" are the two words that best describe your day-to-day operations. Go ahead and use data classification as a way to reach out to the user community and to get a handle on the business requirements behind the service requests.
Data classification efforts typically lack structure and rely on informal meetings between the storage staff and business units, interaction during application or server rollout processes, or just e-mail correspondence to obtain user requirements for storage services. Sometimes it boils down to hallway conversations or phone calls to your pals in operations to get the right service setup.
Often, a knowledge gap exists between the user and the infrastructure team, so requirements end up mapping to the "high end" of the scale. The conversation becomes "What can you do for me?" instead of "What are your requirements?" When this is the case, everyone's data is the most important and requires the most expensive, high-performance storage.
Approaching users with a structured set of questions (such as "How would you rate the performance of this application?" or "How mission-critical do you consider this application?") with specific ranges for answers provides consistency and allows business requirements to be mapped to various aspects of storage (see Mapping business requirements to storage).
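A structured interview of this kind could be captured roughly as follows. This is a minimal sketch, assuming a 1-to-5 rating scale and two hypothetical question keys (`performance` and `criticality`); neither the scale nor the names come from any particular product or methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    key: str    # hypothetical identifier for the question
    text: str   # wording put to the application or project owner
    low: int    # lowest allowed rating
    high: int   # highest allowed rating


# Assumed 1-5 scale; the article only says answers should have specific ranges.
QUESTIONS = [
    Question("performance", "How would you rate the performance of this application?", 1, 5),
    Question("criticality", "How mission-critical do you consider this application?", 1, 5),
]


def record_interview(answers: dict[str, int]) -> dict[str, int]:
    """Validate one owner's answers against the allowed ranges."""
    validated = {}
    for q in QUESTIONS:
        if q.key not in answers:
            raise ValueError(f"missing answer for {q.key!r}")
        value = answers[q.key]
        if not q.low <= value <= q.high:
            raise ValueError(f"{q.key!r} must be between {q.low} and {q.high}, got {value}")
        validated[q.key] = value
    return validated
```

Rejecting out-of-range or missing answers at collection time is what keeps responses comparable from one business unit to the next.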
Business requirements map to a storage service through key performance metrics. For instance, production recovery time metrics for an enterprise storage environment might range from two hours to a day. A complete inventory of business requirements facilitates the delivery of multiple storage service types. You should also obtain or create a logical mapping of the user environment to your infrastructure. This usually means mapping projects or applications to your infrastructure (hosts, arrays, file servers, etc.). Once you have a complete collection of requirements and infrastructure metadata, the requirements gathering phase is done.
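Mapping a requirement to a service through a metric can be sketched as a simple lookup. The tier names and the intermediate eight-hour threshold below are illustrative assumptions; only the two-hour-to-one-day range comes from the text. The sketch also assumes that slower tiers cost less, so it picks the slowest tier that still meets the stated recovery time.

```python
# Hypothetical recovery tiers, ordered from fastest to slowest recovery.
# Each entry: (tier name, recovery time the tier can deliver, in hours).
RECOVERY_TIERS = [
    ("tier-1", 2),
    ("tier-2", 8),   # assumed intermediate tier
    ("tier-3", 24),
]


def select_recovery_tier(required_hours: float) -> str:
    """Return the slowest tier that still recovers within the required time."""
    chosen = None
    for name, delivered_hours in RECOVERY_TIERS:
        if delivered_hours <= required_hours:
            chosen = name  # later tiers are slower, so keep the last one that fits
    if chosen is None:
        raise ValueError(f"no standard tier recovers within {required_hours} hours")
    return chosen
```

An application whose owner needs recovery within four hours would land on `tier-1` here, while one that can tolerate a full day would land on `tier-3`.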
Next, align requirements to service offerings by developing a service catalogue, which is the menu of storage services. This living document describes the service provided and offers technical details for each standard offering within the storage service type. A catalogue item might be twice-weekly data replication to a remote site 200 miles away with the data stored on a tape library. The service catalogue provides a reference point for users, and is referenced in subsequent service level agreements (SLAs). As technology infrastructures change, so does the service catalogue.
Unique business requirements map to specific types of storage services. To provide a manageable and flexible storage service, the service must accept changes in business requirements and infrastructure. To accomplish this, storage services must be segregated into discrete storage service domains. Typical storage service domains include primary storage, disaster recovery, backup/recovery and archive. Within each storage service domain, tiers of service are often developed once a representative sampling of business requirements is available from the data classification interview process.
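One way to picture a service catalogue organized by domain is as a small data structure. This is only a sketch: the class and field names are hypothetical, the single tier label is invented, and filing the replication example under disaster recovery is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceDomain(Enum):
    PRIMARY_STORAGE = "primary storage"
    DISASTER_RECOVERY = "disaster recovery"
    BACKUP_RECOVERY = "backup/recovery"
    ARCHIVE = "archive"


@dataclass(frozen=True)
class CatalogueItem:
    domain: ServiceDomain
    tier: str               # hypothetical tier label within the domain
    description: str        # the service as the user sees it
    technical_details: str  # how the service is delivered


CATALOGUE = [
    CatalogueItem(
        domain=ServiceDomain.DISASTER_RECOVERY,  # assumed domain for this example
        tier="standard",
        description="Twice-weekly data replication to a remote site",
        technical_details="Remote site 200 miles away; data stored on a tape library",
    ),
]


def offerings_for(domain: ServiceDomain) -> list[CatalogueItem]:
    """List the catalogue items offered within one storage service domain."""
    return [item for item in CATALOGUE if item.domain == domain]
```

Keeping the domain as an explicit field means an offering can be added, retired or re-tiered in one domain without touching the others.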
Service catalogue development requires an iterative approach. Business requirements must be aligned to service offerings, and that alignment takes vision, work and refinement. The first pass will get you off the ground, but subsequent iterations and improvements to the service offerings will be required before a service catalogue is enterprise ready.
This was first published in July 2005