Data classification: Getting started

Mapping business requirements to storage


Closing the deal
A mature data classification model includes a service-level agreement (SLA) and a cost model; however, neither is core to the discussion of data classification itself. The SLA is the key point of interaction between IT and the user. It represents a contract, outlining and quantifying how and when the service will be provided. A cost model facilitates service offering development and the occasional "realignment" of user requirements should a service's cost exceed what the organization can afford to pay. Chargeback isn't a realistic goal for every organization; however, user accountability can still be achieved through a combination of cost modeling, measurement and reporting.
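Even without formal chargeback, measurement and reporting can be combined with a cost model in a simple "showback" report. The following is a minimal Python sketch, assuming hypothetical tier names and per-gigabyte monthly rates (none of these figures come from the article). It multiplies each measured capacity by its tier's rate and totals the result per business unit.

```python
from collections import defaultdict

# Hypothetical monthly cost per GB for each service tier (illustrative only).
TIER_COST_PER_GB = {"tier1": 0.50, "tier2": 0.20, "tier3": 0.05}


def showback(usage):
    """Total the monthly storage cost for each business unit.

    usage: iterable of (business_unit, tier, gigabytes) measurements.
    Returns a dict mapping business unit -> monthly cost.
    """
    totals = defaultdict(float)
    for unit, tier, gb in usage:
        totals[unit] += gb * TIER_COST_PER_GB[tier]
    return dict(totals)


if __name__ == "__main__":
    # Sample measurements; real figures would come from capacity reporting.
    measured = [
        ("finance", "tier1", 200),
        ("finance", "tier3", 1000),
        ("marketing", "tier2", 500),
    ]
    for unit, cost in sorted(showback(measured).items()):
        print(f"{unit}: ${cost:,.2f}/month")
```

Reporting cost without billing it keeps users aware of what their tier choices cost, which is the accountability the paragraph above describes.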

Data classification creates a dialogue and a process between users and IT. The activity is likely to span IT groups, business units and disparate user groups, which risks political mayhem. Often, data classification opens a dialogue for the first time or forces parties to revisit relationships that went wrong long ago. Even with high-level executive sponsorship, internal politics are likely to surface during the process. Identify an executive sponsor with enough clout to demand accountability and cooperation across the organization.

Chicken or egg?
The industry often views the building of tiered services as a "chicken or egg" process. It's difficult to standardize on a finite number of tiers when starting with disparate business requirements. Conversely, it's also difficult to present finite tiers without first gathering data. To sidestep this potential problem, work iteratively to define service tiers. Build an initial set of service offerings that reflect the requirements and then classify sets of user data against these offerings. The initial classification can then be used to conduct follow-up dialogues with the business units to confirm the mapping of data requirements to storage offerings.
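One iteration of that loop can be sketched in Python. This is a minimal illustration, assuming each tier is described by just two hypothetical attributes (designed availability and recovery time objective) plus a relative cost rank; real offerings would carry many more. Each data set maps to the cheapest tier that satisfies its requirements, and anything no tier can satisfy is set aside for the follow-up dialogue with the business unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    availability: float  # percent uptime the tier is designed for
    rto_hours: float     # recovery time objective the tier can meet
    cost_rank: int       # lower = cheaper


@dataclass(frozen=True)
class Requirement:
    dataset: str
    availability: float  # minimum percent uptime the business needs
    rto_hours: float     # maximum acceptable recovery time


# Hypothetical initial offerings; refine them after each review cycle.
TIERS = [
    Tier("gold", 99.99, 1, 3),
    Tier("silver", 99.9, 8, 2),
    Tier("bronze", 99.0, 48, 1),
]


def classify(req, tiers=TIERS):
    """Return the cheapest tier that meets the requirement, or None."""
    candidates = [
        t for t in tiers
        if t.availability >= req.availability and t.rto_hours <= req.rto_hours
    ]
    return min(candidates, key=lambda t: t.cost_rank, default=None)


def map_requirements(reqs, tiers=TIERS):
    """Split data sets into a tier mapping and a list needing follow-up."""
    mapping, unmatched = {}, []
    for req in reqs:
        tier = classify(req, tiers)
        if tier is None:
            unmatched.append(req.dataset)
        else:
            mapping[req.dataset] = tier.name
    return mapping, unmatched
```

With these sample tiers, a payroll data set requiring 99.99% availability and a one-hour recovery maps to gold, while one needing 99.999% lands in the unmatched list. That result prompts either a conversation about the requirement or a new tier in the next iteration.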

Introducing sweeping procedural change creates more pain than benefit. For data classification, start with a pilot effort that includes a representative sample of your environment from both a business and a technology point of view. Don't try to tackle all storage services at one time. Pick one domain, such as primary storage, and focus the service offering development there. Trying to accomplish everything at once will open a Pandora's box of unforeseen problems.

In the early stages of a data classification project, don't collect a lot of meta data about each file. There's only so much meta data an organization can manage, and the benefits of going to deeper levels drop off sharply after a certain point. However, for those with a vision of more granular meta data management and a business plan to prove its value, several new products automate parts of the classification and policy process.

These new products are designed to help storage admins who struggle to apply highly granular policy requirements to the Wild West of unstructured data. This is especially true for large file server farms supporting businesses with rigorous compliance requirements. Data classification vendors are taking file-level storage resource management (SRM) concepts to another level by adding context-based meta data, which drives policy-based actions that determine how each file is stored. For example, if a file contains a Social Security number, it can be moved automatically to a highly secure storage device.
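The Social Security number example can be illustrated with a minimal Python sketch. It assumes a simplified NNN-NN-NNNN pattern and hypothetical destination names ("secure-storage", "primary-storage"). Commercial products use far more robust detection, such as validation rules and surrounding context, so a bare regular expression like this will produce both false positives and false negatives.

```python
import re

# Simplified U.S. Social Security number pattern (NNN-NN-NNNN).
# Illustrative only: no validation of number ranges or surrounding context.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical storage destinations for each classification.
DESTINATIONS = {
    "restricted": "secure-storage",
    "general": "primary-storage",
}


def classify_content(text):
    """Tag content as 'restricted' if it appears to contain an SSN."""
    return "restricted" if SSN_PATTERN.search(text) else "general"


def destination_for_file(path, encoding="utf-8"):
    """Read a text file and return the storage destination its content implies."""
    # errors="ignore" lets the scan proceed past undecodable bytes.
    with open(path, encoding=encoding, errors="ignore") as f:
        return DESTINATIONS[classify_content(f.read())]
```

Separating classification (`classify_content`) from the storage decision (`DESTINATIONS`) mirrors the split between context-based meta data and the policy that acts on it.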

As these products develop new, more sophisticated ways to determine and classify a file's contents, more rule-based actions, such as copying, movement and security changes, can be applied to enterprise unstructured data. Security changes, such as narrowing the permissions list for a file, can be made natively through most file systems, while encryption requires integrating third-party tools. Data copies can be made through APIs to various products, ranging from backup applications to write once, read many (WORM) disk storage devices.
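A rule-based policy of this kind can be sketched as a table that maps each classification to a list of actions. The minimal Python example below is hypothetical throughout (the policy names, action names and archive directory are illustrative). It narrows permissions natively with `os.chmod` and copies files with `shutil.copy2` into a plain directory standing in for a backup or WORM target. Encryption is left out because it requires third-party tools. On POSIX systems the `chmod` call leaves only owner read/write; on Windows, `os.chmod` can only toggle the read-only flag, so a real deployment there would manage ACLs instead.

```python
import os
import shutil
import stat

# Hypothetical policy table: classification -> ordered list of actions.
POLICIES = {
    "restricted": ["narrow_permissions", "copy_to_archive"],
    "general": [],
}


def narrow_permissions(path, archive_dir):
    """Limit access to the file owner (read/write) via native permissions.

    archive_dir is unused; it keeps all actions on one signature.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def copy_to_archive(path, archive_dir):
    """Copy the file, with its metadata, into an archive directory."""
    os.makedirs(archive_dir, exist_ok=True)
    return shutil.copy2(path, archive_dir)


ACTIONS = {
    "narrow_permissions": narrow_permissions,
    "copy_to_archive": copy_to_archive,
}


def apply_policy(path, classification, archive_dir, dry_run=True):
    """Return the actions for a file's classification; run them unless dry_run."""
    planned = list(POLICIES.get(classification, []))
    if not dry_run:
        for name in planned:
            ACTIONS[name](path, archive_dir)
    return planned
```

Defaulting to `dry_run=True` returns the planned actions without touching the file, which lets an administrator review a policy's effect before enforcing it.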

Data classification products
A variety of tools now tout data classification features. Many vendors offer point solutions for e-mail archiving, compliance and file system management. Most products are focused on unstructured data. The tools typically take a bottom-up approach to collecting vast amounts of meta data and address individual issues like compliance searches of data or hierarchical storage management (HSM)-style file movement. Here are some of the companies focused on data classification:

  • Abrevity Inc., San Jose, CA, provides point solutions for compliance and service-level policy enforcement by generating bottom-up meta data. Its FileBase server and client software produce meta data similar to that of an SRM tool, but in more depth. The company claims FileBase is compatible with any data mover technology and uses tagging techniques to track classified and migrated data.
  • Kazeon Systems Inc., Mountain View, CA, will shortly release a full beta of its file-based searching and reporting software. It claims its tools make storage more "content aware." The software catalogs assets and tags them via a meta data repository that allows basic policies to be set and run on an ad hoc or scheduled basis. Kazeon plans to extend this rudimentary data movement capability and build on its content- and pattern-based searching in later releases throughout the year.
  • Scentric Inc., a startup in Duluth, GA, plans to address the three major categories of data (files, messages and databases) with equal aplomb. At the heart of its strategy is making applications "self-describing" and abstracting low-level complexity into meaningful information. Scentric believes its toolset will be usable by both policy makers and storage engineers.
  • StoredIQ Corp., an Austin, TX-based startup formerly known as Deepfile, has retooled its suite of HSM and SRM software to help users address risk issues through data classification and policy automation. The company, which focuses mainly on compliance and security for storage, is still in stealth mode. StoredIQ claims to use a complex searching capability to illuminate file-based meta data through interactive dialogs with the user, as well as a sophisticated "lexicon" that captures multiple parameters for more robust classification of information.
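The bottom-up approach these tools take starts with file-level meta data of the kind an SRM tool gathers. A minimal Python sketch, assuming only what the operating system reports (size, modification time and file extension) and a hypothetical 180-day inactivity threshold, might collect that meta data and flag HSM-style migration candidates. The products listed above go much further by adding content-based tags.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    modified: float   # last modification time, seconds since the epoch
    extension: str    # lowercased, e.g. ".doc"


def scan(root):
    """Walk a directory tree and collect basic file-level meta data."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            records.append(FileRecord(
                path=full,
                size_bytes=st.st_size,
                modified=st.st_mtime,
                extension=os.path.splitext(name)[1].lower(),
            ))
    return records


def migration_candidates(records, max_age_days=180, now=None):
    """Return files not modified within max_age_days (HSM-style candidates)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return [r for r in records if r.modified < cutoff]
```

Passing `now` explicitly keeps the age calculation deterministic, which helps when comparing reports across runs.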

The linchpins of a successful data classification project are detailed planning and meaningful dialogue with users about business requirements. The idea is to match different levels of storage with users' requirements. Make no mistake: Defining policies to map requirements to service tiers is arduous and time-consuming work, but it can be achieved through a sound methodology, an iterative development approach and a rapidly evolving set of tools in the marketplace. The long-term benefits of data classification include cost reduction, risk mitigation and quality of service (QoS) improvements.

This was first published in July 2005
