A step-by-step approach to data classification


This article can also be found in the Premium Editorial Download "Storage magazine: Five companies on their storage virtualization projects."


Sample data classification metrics by data type


What's data classification?
Any data classification project should begin with a comprehensive inventory of your company's applications and their associated data sets, followed by classification into groups with common requirements. These requirements may include traditional IT metrics such as recovery time objectives, recovery point objectives, backup schedules and maintenance windows. Successful data classification will also involve more business-centric metrics such as business criticality, revenue and productivity impact over time, business-continuity objectives, application performance, application criticality, data retention periods and security requirements.
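As a rough illustration of the inventory-then-group step, the sketch below records a few metrics per data set and groups data sets whose requirements match. The `DataSet` fields, the sample values and the `group_by_requirements` helper are all hypothetical; a real inventory would carry many more metrics and would likely group on ranges rather than exact matches.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """One inventory entry. Field names and units are illustrative assumptions."""
    application: str
    name: str
    rto_hours: float           # recovery time objective
    rpo_hours: float           # recovery point objective
    business_criticality: str  # e.g. "high", "medium", "low"
    retention_years: int


def group_by_requirements(inventory):
    """Group data set names by their (identical) requirement profile."""
    groups = defaultdict(list)
    for ds in inventory:
        profile = (ds.rto_hours, ds.rpo_hours,
                   ds.business_criticality, ds.retention_years)
        groups[profile].append(ds.name)
    return dict(groups)


# Hypothetical sample inventory.
inventory = [
    DataSet("ERP", "erp_prod", 1, 0.25, "high", 7),
    DataSet("ERP", "erp_test", 72, 24, "low", 1),
    DataSet("CRM", "crm_prod", 1, 0.25, "high", 7),
]

groups = group_by_requirements(inventory)
for profile, names in groups.items():
    print(profile, names)
```

Here the production ERP and CRM data sets fall into one group because their requirements are the same, while the ERP test data forms a group of its own.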

While this may seem like an academic exercise or an overly theoretical approach, this process is critical for successful storage projects. Most organizations have some type of classification scheme in place, and there are often multiple, conflicting ones--application "tiers," disaster recovery (DR) levels, business-continuity tiers, etc.--each with its own purpose. Typically, these schemes were defined some time ago and haven't been kept up to date. These "slices" of classification are rarely sufficient for making storage decisions, and rarely complete enough to set true service-level objectives.

Different data sets have different business requirements that, taken together, define a service level. For example, data residing in enterprise resource planning (ERP) systems will usually require the highest level of service to ensure that it can be accessed quickly, restored in the event of disaster, protected from theft and available in more than one location. Common sense tells us that ERP test data doesn't require the same level of protection and recoverability. Why protect and manage these two different types of data at the same level when their needs are different? Data classification is the process of creating formally defined service levels for different applications, and sorting the application data sets into these defined service levels.
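To make "formally defined service levels" concrete, here is a minimal sketch that defines three levels and assigns each data set the least demanding level that still meets its requirements. The level names ("bronze", "silver", "gold"), the thresholds and the `assign_service_level` function are assumptions chosen for illustration, not values from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    """What a service level guarantees. All values are illustrative."""
    name: str
    rto_hours: float         # recovery is guaranteed within this time
    rpo_hours: float         # at most this much recent data may be lost
    replicated_offsite: bool


# Ordered from least to most demanding, so the first match is the cheapest fit.
SERVICE_LEVELS = [
    ServiceLevel("bronze", 72, 24, False),
    ServiceLevel("silver", 8, 4, False),
    ServiceLevel("gold", 1, 0.25, True),
]


def assign_service_level(required_rto_hours, required_rpo_hours, needs_offsite_copy):
    """Return the name of the least demanding level that meets the requirements."""
    for level in SERVICE_LEVELS:
        meets_recovery = (level.rto_hours <= required_rto_hours
                          and level.rpo_hours <= required_rpo_hours)
        meets_location = level.replicated_offsite or not needs_offsite_copy
        if meets_recovery and meets_location:
            return level.name
    raise ValueError("no defined service level meets these requirements")


erp_prod_level = assign_service_level(1, 0.25, True)    # production ERP data
erp_test_level = assign_service_level(72, 24, False)    # ERP test data
print(erp_prod_level, erp_test_level)
```

With these assumed thresholds, production ERP data lands in the top level and ERP test data in the lowest, which mirrors the distinction drawn above.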

We must also consider the time dimension. Over time, business requirements for data can change, with data assigned to different service levels and migrated to different storage tiers to reduce costs. But the first and most important step is to classify the active data sets into the right service levels, and to place them on appropriate storage tiers when they're created. For most organizations, this initial data classification--along with correct placement on tiered storage--will deliver the biggest and fastest returns on investment and effort (see "Will my storage environment become more complicated?"). Once these basic elements are in place, an information lifecycle management (ILM) strategy can provide significant incremental gains over the longer term.
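The time dimension can be sketched as a simple rule that demotes data to a cheaper tier once it has been idle long enough. The tier names, the 90-day and 365-day thresholds and the `current_tier` function are hypothetical; an actual policy would be driven by the business requirements for each service level rather than idle time alone.

```python
from datetime import date

# Tiers ordered from most to least expensive (hypothetical names).
TIERS = ["tier1", "tier2", "tier3"]

# Hypothetical demotion thresholds: (minimum idle days, tier index).
# Checked from the longest idle period down.
DEMOTION_RULES = [(365, 2), (90, 1)]


def current_tier(initial_tier, last_accessed, today):
    """Return the tier a data set belongs on, given how long it has been idle.

    Data is only ever moved to a cheaper tier here, never promoted.
    """
    idle_days = (today - last_accessed).days
    aged_index = 0
    for min_idle_days, tier_index in DEMOTION_RULES:
        if idle_days >= min_idle_days:
            aged_index = tier_index
            break
    return TIERS[max(TIERS.index(initial_tier), aged_index)]


print(current_tier("tier1", date(2006, 1, 1), date(2006, 2, 1)))
print(current_tier("tier1", date(2006, 1, 1), date(2006, 6, 1)))
print(current_tier("tier1", date(2006, 1, 1), date(2007, 6, 1)))
```

Passing `today` explicitly keeps the rule easy to test; the initial placement still comes from the classification step, and this rule only adjusts it afterward.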

This was first published in August 2006
