How do data classification and file indexing manage unstructured data?

Data classification itself is not really a technology, but rather a policy-driven practice of categorizing data based on an understanding of the data on hand -- particularly unstructured data. Once you've learned what data you have and decided what data you need, you can then employ technologies (e.g., data classification tools) to assist in the actual movement, storage and disposal of that data according to the policies that you set.
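
As a rough illustration of what "policy-driven" can mean in practice, the sketch below maps a file record to a movement, storage or disposal action. The `FileRecord` type, the `classify` function, the department names and the one-year and seven-year thresholds are all hypothetical and chosen only for the example; real policies would come from your own business and legal requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileRecord:
    """Minimal, hypothetical description of a file for policy purposes."""
    path: str
    last_modified: date
    department: str


def classify(record: FileRecord, today: date) -> str:
    """Return an action for a file based on illustrative policy rules."""
    age = today - record.last_modified
    # Hypothetical policy: legal files are always retained, regardless of age.
    if record.department == "legal":
        return "retain"
    # Hypothetical policy: anything untouched for seven years is disposed of.
    if age > timedelta(days=7 * 365):
        return "dispose"
    # Hypothetical policy: anything untouched for one year moves to archive storage.
    if age > timedelta(days=365):
        return "archive"
    return "keep-on-primary"


today = date(2007, 3, 1)
records = [
    FileRecord("sales/forecast.xls", date(2006, 12, 1), "sales"),
    FileRecord("sales/old-pricing.doc", date(2005, 1, 1), "sales"),
    FileRecord("legal/contract-1999.pdf", date(1999, 1, 1), "legal"),
]
for record in records:
    print(record.path, "->", classify(record, today))
```

The point of the sketch is the ordering: the policy is written down first, and the tooling only applies it.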

By comparison, indexing helps with searches and legal discovery, but it has limitations. Indexing relies on a text string of some sort; the string could be within the file or, in the case of audio or images, it could be the file name. This implies a naming convention that is consistent with your data and the way that you work with it. That is something most of us are not used to doing, and file names are typically left to the preferences of the individual user. Policies can help establish consistent naming conventions, which in turn can improve indexing capabilities. Of course, indexing doesn't address data growth, but it can help you understand what you have.
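
To make the file name point concrete, here is a minimal sketch of a file name index for content that has no searchable text inside it. The `tokenize` and `build_index` helpers, the sample file names and the date_client_subject naming pattern are assumptions made up for this illustration, not a description of any particular indexing product.

```python
import re
from collections import defaultdict


def tokenize(filename: str) -> list[str]:
    """Split a file name (without its extension) into lowercase search terms."""
    stem = filename.rsplit(".", 1)[0]
    return [t for t in re.split(r"[^a-z0-9]+", stem.lower()) if t]


def build_index(filenames):
    """Map each search term to the set of file names that contain it."""
    index = defaultdict(set)
    for name in filenames:
        for token in tokenize(name):
            index[token].add(name)
    return index


files = [
    "2007-03_acme_site-survey_roof.jpg",  # follows a hypothetical naming convention
    "2007-03_acme_kickoff-call.mp3",      # follows the same convention
    "IMG_0042.jpg",                       # camera default name, nothing useful to index
]
index = build_index(files)

print(sorted(index["acme"]))         # both files named by convention
print(tokenize("IMG_0042.jpg"))      # only "img" and "0042" are searchable
```

A search for the client name finds the two files that follow the convention, while the camera-default name offers nothing to match against, which is the limitation described above.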

Email searches are also hindered by the lack of policies around the "Subject" line. People tend to reply to a previous message as a way of starting a totally different thread -- you can have 15 different conversations all with the same subject simply because it was convenient to reply. This can really muddy subject-based searches. Again, policies and a little discipline in working with your data can have an important effect on data management and searchability.
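
The effect on subject-based searches can be shown with a small sketch. The messages, the `normalize_subject` helper and its handling of "Re:" and "Fwd:" prefixes are assumptions for illustration only; real mail systems vary in how they thread messages.

```python
import re
from collections import defaultdict


def normalize_subject(subject: str) -> str:
    """Strip any leading run of Re:/Fw:/Fwd: prefixes and lowercase the rest."""
    stripped = re.sub(r"^(?:\s*(?:re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


# Hypothetical mailbox: the third message is a new topic sent as a reply.
messages = [
    ("Q1 budget", "Draft numbers attached."),
    ("RE: Q1 budget", "Looks fine to me."),
    ("RE: RE: Q1 budget", "Unrelated: who is booking the offsite venue?"),
]

threads = defaultdict(list)
for subject, body in messages:
    threads[normalize_subject(subject)].append(body)

for subject, bodies in threads.items():
    print(subject, "->", len(bodies), "messages")
```

All three messages land under the same subject, so a subject search for the budget discussion also returns the venue question, and a search for the venue question by subject finds nothing.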

Metadata can help to identify and categorize unstructured data, so you should absolutely pay attention to the structure and content of a file's metadata. This also relates to content-addressed storage (CAS), which relies on metadata to organize and understand what files you have. Metadata can identify dates, size, permissions, title and other details. But again, searches rely on text strings, so metadata must be complete and consistent to be of any use.
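
As a minimal sketch of the kind of metadata a file system already holds, the example below reads the size, modification date and permissions of a file using Python's standard library. The `file_metadata` helper and the sample file are hypothetical; details such as a document title usually live inside the file format itself and are not covered here.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def file_metadata(path: str) -> dict:
    """Collect basic file system metadata for one file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Modification time as an ISO 8601 string in UTC.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        # Permission string in the style of "ls -l", e.g. "-rw-r--r--".
        "permissions": stat.filemode(info.st_mode),
    }


# Create a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "report.txt")
    with open(sample, "w") as fh:
        fh.write("hello")
    record = file_metadata(sample)

print(record)
```

Records like this only help searches if every file has them filled in the same way, which is the completeness and consistency requirement noted above.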

This was first published in March 2007
