This article can also be found in the Premium Editorial Download "Storage magazine: Big 3 backup apps adapt to disk."
A keyword search will turn up all occurrences of a word or phrase, but more advanced search engines work more like the human mind. Some use a technique called latent semantic indexing (LSI), a statistical method that reveals associations among words or phrases within files. For example, an LSI-enabled engine might discover during a search on the word "contract" that the phrase "binding agreement" appears alongside it consistently enough that a logical association can be assumed. The "contract" search may then return files that don't contain that word at all but are logically linked to it.
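To make the "contract" and "binding agreement" example concrete, here is a minimal sketch of association-based retrieval. It is not LSI itself: real LSI factors a term-document matrix with singular value decomposition, whereas this stand-in uses a much simpler co-occurrence score (Jaccard overlap of the files in which two terms appear). The file names, the toy corpus, the 0.6 threshold and the function names are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy corpus: each "file" is reduced to the terms it contains.
docs = {
    "a.txt": {"contract", "binding agreement", "signature"},
    "b.txt": {"contract", "binding agreement", "vendor"},
    "c.txt": {"binding agreement", "lease"},
    "d.txt": {"invoice", "vendor"},
}

def associations(docs, min_score=0.6):
    """Link terms whose sets of containing files overlap strongly.

    Simplified stand-in for LSI: Jaccard overlap instead of SVD.
    """
    files_with = {}
    for name, terms in docs.items():
        for term in terms:
            files_with.setdefault(term, set()).add(name)

    related = {}
    for a, b in combinations(sorted(files_with), 2):
        fa, fb = files_with[a], files_with[b]
        score = len(fa & fb) / len(fa | fb)
        if score >= min_score:
            related.setdefault(a, set()).add(b)
            related.setdefault(b, set()).add(a)
    return related

def conceptual_search(query, docs, related):
    """Return files containing the query term or any term associated with it."""
    terms = {query} | related.get(query, set())
    return sorted(name for name, doc_terms in docs.items() if doc_terms & terms)

related = associations(docs)
results = conceptual_search("contract", docs, related)
```

In this toy corpus, `results` includes `c.txt` even though that file never mentions "contract", because "binding agreement" co-occurs with it often enough to be treated as associated.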
Some search providers have already incorporated LSI into their applications. For example, San Francisco-based Recommind Inc. provides conceptual search capabilities in its MindServer Retrieval products. iLumin doesn't do conceptual searches per se, but includes a number of advanced search techniques such as natural language processing, which can recognize the usage differences that distinguish words with the same spelling, such as the name "Sue" and the verb "sue." Zantaz Inc.'s EAS Search currently provides proximity searches, which return results when two or more words or phrases appear near each other within a document. The firm says it will soon include conceptual searching as well as relevancy scoring of found data objects.
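A proximity search of the kind described above can be sketched in a few lines. This is a generic illustration, not how EAS Search or any other product named here works: the tokenizer, the `within_proximity` name and the default window of five words are assumptions, and the sketch handles single words only, not phrases.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within_proximity(text, term_a, term_b, max_distance=5):
    """True if term_a and term_b occur within max_distance words of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

sentence = "The merger agreement was signed before the audit began"
near = within_proximity(sentence, "merger", "audit", max_distance=6)
too_far = within_proximity(sentence, "merger", "audit", max_distance=5)
```

Here "merger" and "audit" sit six words apart, so the match succeeds with a window of six and fails with a window of five.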
Other search techniques appearing in document management, archiving and search applications include fuzzy, phonic and stemming searches (see "Advanced search concepts"). Many of these have been used for some time by Internet search sites.
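The three techniques named above can each be illustrated with a short sketch. These are deliberately crude and are assumptions for illustration, not any vendor's implementation: the fuzzy match relies on Python's standard `difflib`, the phonic match uses the classic Soundex code, and the stemmer just strips a few common English suffixes (production engines typically use something more careful, such as the Porter stemmer). The function names and the 0.8 cutoff are hypothetical.

```python
import difflib

def fuzzy_match(term, vocabulary, cutoff=0.8):
    """Fuzzy search: vocabulary words that closely resemble a (possibly misspelled) term."""
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)

def soundex(word):
    """Phonic search key: American Soundex code for a non-empty alphabetic word."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    word = word.lower()
    encoded = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            encoded += digit
        if ch not in "hw":          # h and w do not separate repeated codes
            prev = digit
    return (encoded + "000")[:4]    # pad or truncate to four characters

def naive_stem(word):
    """Stemming search key: strip one common suffix (very rough)."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

close = fuzzy_match("agreemnt", ["agreement", "contract", "argument"])
same_sound = soundex("Smith") == soundex("Smyth")
stems = {naive_stem(w) for w in ("contracts", "contracted", "contracting")}
```

A fuzzy search tolerates the misspelling "agreemnt", a phonic search treats "Smith" and "Smyth" as the same name, and a stemming search collapses the three inflected forms to the single root "contract".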
The key to enhancing search capabilities with these complex, compute-intensive algorithms is incorporating them without sacrificing the performance of the search process. To this end, companies like AXS-One suggest using more general search techniques on a dataset first to create a more manageable subset that can be used with the advanced search functions.
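The two-stage approach described above can be sketched as follows. This is an illustration of the general idea only, not AXS-One's implementation: a cheap keyword test first narrows the dataset, and a costlier matcher (here a fuzzy comparison from Python's standard `difflib`, standing in for any compute-intensive technique) runs only on the survivors. The document names, sample text and function names are hypothetical.

```python
import difflib

def expensive_fuzzy_match(text, target, cutoff=0.8):
    """Stand-in for a costly advanced search: fuzzy-compare every word to the target."""
    words = text.lower().split()
    return bool(difflib.get_close_matches(target, words, n=1, cutoff=cutoff))

def two_stage_search(docs, keyword, advanced_match):
    """Filter with a cheap keyword test, then apply the advanced matcher to the subset."""
    # Stage 1: inexpensive substring check over the whole dataset.
    candidates = {name: text for name, text in docs.items()
                  if keyword.lower() in text.lower()}
    # Stage 2: the expensive matcher sees only the reduced subset.
    hits = sorted(name for name, text in candidates.items()
                  if advanced_match(text))
    return hits, len(candidates)

docs = {
    "memo1": "quarterly audit of vendor contarct terms",
    "memo2": "lunch menu for friday",
    "memo3": "vendor onboarding checklist",
}
hits, scanned = two_stage_search(
    docs, "vendor", lambda text: expensive_fuzzy_match(text, "contract"))
```

Only the two documents that mention "vendor" reach the fuzzy stage, which then catches the misspelled "contarct" in `memo1`. The tradeoff is that the first pass must be broad enough not to discard documents the advanced search would have found.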
Regardless of the specific search functionality employed by each vendor, it's clear that the state of the art in searching is steadily advancing. "The tools have reached the point where they're as or more reliable than human beings," says Andy Cohen, senior counsel and director of global solutions practice lead for compliance at EMC Corp. Cohen is also a member of the Sedona Conference, a group of lawyers, jurists and other experts that offers publications on electronic document retention and management, among other topics.
This was first published in April 2006