This article can also be found in the Premium Editorial Download "Storage magazine: Big 3 backup apps adapt to disk."
A more promising alternative to manual keywording or tagging is to use some type of data classification system, especially when dealing with unstructured data such as word processing and spreadsheet files. Products from companies such as Index Engines, Kazeon Systems Inc. and Njini Inc. allow users to create custom policies that are applied to help categorize files, generally at the time of file creation or during the backup process.
Data classification can make searches more comprehensive in ways that go beyond adding custom attributes to the data. For example, out of the box, some apps will apply standard classifications to files to tag specific elements like Social Security numbers. Rather than having to devise search criteria that look for a specific pattern or numerical sequence, users work with data that is, in effect, pre-screened for the attribute of containing a Social Security number. iLumin, for example, includes this capability in its Assentor Discovery product. iLumin calls this classification technique "smart indexing," as it allows the application to segregate files that include the Social Security numerical pattern so that subsequent searches only have to plow through a subset of the data. Other patterns that may be specific to a particular business, such as part numbers, can also be included.
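To make the pre-screening idea concrete, here is a minimal Python sketch of pattern-based tagging followed by a search restricted to the tagged subset. It is an illustration only, not how Assentor Discovery or any other product named here works: the `POLICIES` table, the function names, the sample files and the `PN-` part-number format are all assumptions made for the example, and the Social Security pattern covers only the common NNN-NN-NNNN layout.

```python
import re
from collections import defaultdict

# Hypothetical classification policies (assumptions for illustration).
# "ssn" matches only the common NNN-NN-NNNN layout; "part_number" is an
# invented business-specific format.
POLICIES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "part_number": re.compile(r"\bPN-\d{5}\b"),
}


def classify(text):
    """Return the set of tags whose pattern occurs anywhere in the text."""
    return {tag for tag, pattern in POLICIES.items() if pattern.search(text)}


def build_index(files):
    """Tag every file once, up front, and map each tag to its file names."""
    index = defaultdict(set)
    for name, text in files.items():
        for tag in classify(text):
            index[tag].add(name)
    return index


def search(files, index, tag, keyword):
    """Look for a keyword only in files already tagged with `tag`."""
    candidates = index.get(tag, ())
    return sorted(
        name for name in candidates if keyword.lower() in files[name].lower()
    )


# Made-up sample data standing in for a file store.
files = {
    "hr_memo.txt": "Employee Jane Doe, SSN 123-45-6789, starts Monday.",
    "order.txt": "Ship 40 units of PN-00412 to the Denver warehouse.",
    "notes.txt": "Lunch meeting moved to Thursday.",
}
index = build_index(files)
```

With the index built once, a call such as `search(files, index, "ssn", "jane")` scans only the files already flagged as containing a Social Security number rather than the whole store, which is the saving the pre-screening approach is after.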
Positioned as an information lifecycle management tool, CommVault Inc.'s Data Classification Enabler module, part of its QiNetix
Beyond the basics
Metadata and index-based searches may suffice for many organizations, but litigation issues are likely to require more advanced search capabilities. Not surprisingly, the push for more sophisticated search functionality is being spearheaded by companies that have considerable experience with the discovery process, such as "highly litigious corporations," notes Michael Clark, managing director at EDDix LLC, a Washington, DC-based electronic data discovery research firm. He cites the tobacco, financial services, insurance, energy and telecommunications industries as examples.
For users, the goal is simple: Ensure that all material relevant to a litigation or regulatory case is found quickly. "Ultimately, you need to get beyond keyword search and Boolean operators," says Clark. Some of the newer, more advanced search tools address this issue, and "reduce the overall cost of a project," he adds.
This was first published in April 2006