How do data classification and file indexing manage unstructured data?
Data classification is not really a technology in itself. It is a policy-driven practice of categorizing data based on an understanding of the data on hand, particularly unstructured data. Once you've learned what data you have and decided what data you need, you can employ technologies (e.g., data classification tools) to assist in the actual movement, storage and disposal of that data according to the policies that you set.
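To make the "policy first, tool second" idea concrete, here is a minimal sketch of how a set of classification policies might be expressed so that a tool could act on them. The `FileInfo` and `Policy` types, the rule thresholds and the action names are all hypothetical, chosen only for illustration; a real classification product would have its own policy format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FileInfo:
    """The facts about a file that the (hypothetical) policies look at."""
    name: str
    age_days: int
    size_bytes: int


@dataclass
class Policy:
    label: str    # category the policy assigns
    action: str   # what a tool would then do: "keep", "archive" or "dispose"
    matches: Callable[[FileInfo], bool]


# Assumed example rules, evaluated in order; the first match wins.
POLICIES: List[Policy] = [
    Policy("temporary", "dispose",
           lambda f: f.name.endswith(".tmp") and f.age_days > 30),
    Policy("stale", "archive", lambda f: f.age_days > 365),
    Policy("active", "keep", lambda f: True),  # catch-all
]


def classify(info: FileInfo, policies: List[Policy] = POLICIES) -> Policy:
    """Return the first policy whose rule matches the file."""
    for policy in policies:
        if policy.matches(info):
            return policy
    raise ValueError("no policy matched")
```

The point of the sketch is that the rules are plain data decided by people; the code that moves, stores or disposes of files only follows them.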
By comparison, indexing helps with searches and legal discovery, but it has limitations. Indexing relies on a text string of some sort; the string could be within the file or, in the case of audio or images, in the file name. This implies a naming convention that is consistent with your data and the way that you work with it. That is something most of us are not used to doing, and file names are typically left to the preferences of the individual user. Policies can help establish consistent naming conventions, which in turn can improve indexing capabilities. Of course, indexing doesn't address data growth, but it can help you to understand what you have.
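As a rough illustration of why file names matter for audio and image files, the sketch below builds a simple token index from file names alone and checks names against a naming convention. The convention shown (`<project>_<doctype>_<YYYYMMDD>.<ext>`) and the sample file names are assumptions made up for this example, not a recommended standard.

```python
import re
from collections import defaultdict

# Hypothetical convention: <project>_<doctype>_<YYYYMMDD>.<ext>, all lowercase.
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d{8}\.[a-z0-9]+$")


def follows_convention(filename: str) -> bool:
    """True if the file name matches the assumed naming convention."""
    return NAME_PATTERN.match(filename) is not None


def build_index(filenames):
    """Map each lowercase token in a file name to the files containing it."""
    index = defaultdict(set)
    for name in filenames:
        for token in re.split(r"[^a-z0-9]+", name.lower()):
            if token:
                index[token].add(name)
    return index


def search(index, term):
    """Return the sorted list of file names that contain the search term."""
    return sorted(index.get(term.lower(), set()))


files = [
    "apollo_interview_20070312.mp3",
    "apollo_diagram_20070315.png",
    "IMG_0042.jpg",  # camera default name: nothing useful to index
]
index = build_index(files)
```

With these sample names, a search for `apollo` finds the two consistently named files, while `IMG_0042.jpg` can only be found by someone who already knows its number.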
Email searches are also hindered by the lack of policies around the "Subject" line. People tend to reply to a previous message as a way of starting a totally different thread, so you can end up with 15 different conversations under the same title simply because it was convenient to reply. This can really muddy subject-based searches. Again, policies and a little discipline in working with your data can have an important effect on data management and searchability.
Metadata can help to identify and categorize unstructured data, so you should absolutely pay attention to the structure and content of each file's metadata. This also relates to content-addressed storage (CAS), which relies on metadata to organize and understand what files you have. Metadata can identify dates, size, permissions, title and other details. But again, searches rely on text strings, so metadata must be complete and consistent to be of any use.
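The following sketch shows the two kinds of metadata involved: details the file system already records (size, modification date, permissions) and descriptive fields that people have to supply. It uses Python's standard `os.stat` and `stat.filemode` calls; the `REQUIRED_FIELDS` list and the `missing_fields` helper are hypothetical and stand in for whatever fields your own policies require.

```python
import os
import stat
from datetime import datetime, timezone


def file_metadata(path: str) -> dict:
    """Collect the metadata the file system already keeps for a file."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "permissions": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
    }


# Hypothetical descriptive fields that a policy might require people to fill in.
REQUIRED_FIELDS = ("title", "owner")


def missing_fields(record: dict, required=REQUIRED_FIELDS) -> list:
    """List the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

A completeness check like `missing_fields` is one simple way to spot records that a text search would never find.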
27 Mar 2007