A step-by-step approach to data classification


This article can also be found in the Premium Editorial Download "Storage magazine: Five companies on their storage virtualization projects."


Create a manageable process


Ironically, most data classification projects fail because they're too large or overly complicated. While it's always a good idea to keep a holistic "big picture" in mind, trying to create a complete data classification scheme across the enterprise--and getting buy-in from a large number of business units--often turns out to be overly ambitious. If this is your first project, or if you're new to this type of exercise, it may be appropriate to classify a subset first: a subset of the applications in a large data center (those of a single business unit, for example); just the applications in a single, small data center; or a single filer or e-mail server. A smaller set of data like this is usually easier to classify. Once that first project is complete, you'll have a "cookie cutter" process that can be applied in pieces across the rest of the organization. This is as true of a project aimed at storage tiering as it is of one focused on file-level classification for archiving or compliance retention for e-mail.

Inappropriate protection levels
By taking a comprehensive approach across the enterprise at the application level, a data classification project can gather the information necessary to make informed decisions about the service levels needed for data kept on spinning disk. Why is this important? While the cost of physical disk per gigabyte may be decreasing, the incremental costs associated with providing the highest level of service continue to rise. Increasing regulation, litigation discovery requirements and user expectations are stretching budgets and staff capabilities.

As your application catalog and data have grown--or been acquired or migrated between different types of systems--much of your data has most likely ended up at an inappropriate service level. Typically, 10% to 20% of a company's data is underprotected, meaning it isn't managed at the service level the business requires. More often than not, underprotected data can't be recovered quickly enough in the event of a partial or complete disaster. This represents a real risk to the business and its customers.

Perhaps a bigger problem, at least from an operating cost perspective, is overprotection; typically, 40% to 60% of an organization's data is overprotected. In most cases, this data is overreplicated remotely and locally, or backed up in a costly manner. While users aren't likely to complain about too much protection, it represents considerable overspending as data continues its explosive growth.
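One way to make the under- and overprotection comparison concrete is to record, for each application, the service level the business requires and the level the data actually receives, then measure how much capacity falls on each side. The Python sketch below is only an illustration: the tier names, the `AppData` fields and the sample inventory are all hypothetical, and a real project would draw them from its own application catalog and agreed service levels.

```python
from dataclasses import dataclass

# Hypothetical service tiers, ranked from least (1) to most (3) protected.
TIERS = {"bronze": 1, "silver": 2, "gold": 3}


@dataclass
class AppData:
    name: str
    size_gb: float
    required_tier: str  # service level the business says it needs
    current_tier: str   # service level the data actually receives today


def protection_status(app: AppData) -> str:
    """Compare the current tier with the required tier for one application."""
    gap = TIERS[app.current_tier] - TIERS[app.required_tier]
    if gap < 0:
        return "underprotected"
    if gap > 0:
        return "overprotected"
    return "appropriate"


def summarize(apps: list[AppData]) -> dict[str, float]:
    """Return the percentage of total capacity that falls in each status."""
    shares = {"underprotected": 0.0, "overprotected": 0.0, "appropriate": 0.0}
    total = sum(app.size_gb for app in apps)
    if total == 0:
        return shares
    for app in apps:
        shares[protection_status(app)] += app.size_gb
    return {status: round(100 * gb / total, 1) for status, gb in shares.items()}


# Illustrative inventory only; the names, sizes and tiers are made up.
SAMPLE_INVENTORY = [
    AppData("billing", 200, required_tier="gold", current_tier="silver"),
    AppData("file-shares", 500, required_tier="bronze", current_tier="gold"),
    AppData("e-mail", 300, required_tier="silver", current_tier="silver"),
]

if __name__ == "__main__":
    for status, pct in summarize(SAMPLE_INVENTORY).items():
        print(f"{status}: {pct}% of capacity")
```

Weighting by capacity rather than by application count is a design choice here: it keeps the summary in the same terms as the percentages quoted above, which describe shares of data rather than shares of applications.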

Without proper data classification, these expenditures will continue to grow, crowding out other IT initiatives. Classifying data, agreeing on service-level agreements (SLAs) for that data and changing the storage strategy to match will result in lower storage costs in the near and long term.

This was first published in August 2006
