This article can also be found in the Premium Editorial Download "Storage magazine: Big 3 backup apps adapt to disk."

A keyword search will turn up all occurrences of a word or phrase, but more advanced search engines try to work more like the human mind. Some use a technique called latent semantic indexing (LSI), a statistical method that reveals associations among words or phrases within files. For example, an LSI-enabled engine might discover during a search on the word "contract" that the phrase "binding agreement" appears alongside it consistently enough that a logical association can be assumed. The "contract" search may then return files that don't contain that word at all but are logically linked to it.
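
The association idea can be illustrated with a much simpler stand-in than real LSI. Production LSI typically factors a term-document matrix with singular value decomposition; the sketch below only measures how often two words turn up in the same files and expands the query with the strongly associated ones. The sample files, stop-word list, threshold and function names are all assumptions made for illustration.

```python
import re

# Hypothetical sample files; a real archive would hold far more.
FILES = [
    "The contract is a binding agreement between the parties",
    "This binding agreement serves as the contract",
    "Both sides signed the binding agreement yesterday",
    "Backup tapes are stored offsite",
]
STOP_WORDS = {"the", "is", "a", "this", "as", "both", "are"}


def tokenize(text):
    """Lowercase a file's text and return its set of non-stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def keyword_search(term, files):
    """Return indexes of files that literally contain the term."""
    return [i for i, text in enumerate(files) if term in tokenize(text)]


def associated_terms(term, files, threshold=0.6):
    """Find words whose set of files overlaps heavily with the term's.

    Overlap is the Jaccard ratio: shared files / files containing either word.
    This is a co-occurrence shortcut, not SVD-based LSI.
    """
    postings = {}
    for i, text in enumerate(files):
        for word in tokenize(text):
            postings.setdefault(word, set()).add(i)
    target = postings.get(term, set())
    related = set()
    for word, docs in postings.items():
        if word != term and target:
            overlap = len(target & docs) / len(target | docs)
            if overlap >= threshold:
                related.add(word)
    return related


def conceptual_search(term, files):
    """Search on the term plus every word statistically associated with it."""
    terms = {term} | associated_terms(term, files)
    return [i for i, text in enumerate(files) if terms & tokenize(text)]
```

With these sample files, a keyword search on "contract" finds only the first two, while the expanded search also returns the third, which never uses the word but does mention a "binding agreement."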

Shopping for search

Here are some tips to keep in mind when evaluating the capabilities of archive and search products.
  • Ask the vendor how much space its full-text index requires; this is usually expressed as a percentage of the size of the source data. Also find out whether the indexing process will noticeably slow the application's performance.
  • Know your search needs--consult with legal, compliance and human resources departments to determine what types of searches they're likely to require.
  • Ask the vendor about its roadmap for product development. For archive vendors, ask about new or more sophisticated search features that they plan to add. For search application vendors, find out what additional archive applications they'll support.
  • Test the user interface to determine whether it's intuitive enough that users in your company's business units will be comfortable using it. A Web-based interface is the easiest to implement and provides universal access.
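
The index-size tip above comes down to simple arithmetic. The following is a minimal sketch; the function names are hypothetical and any figures passed in would come from the vendor or from your own pilot, not from this article.

```python
def index_overhead_percent(index_bytes, source_bytes):
    """Express the full-text index size as a percentage of the source data."""
    if source_bytes <= 0:
        raise ValueError("source_bytes must be positive")
    return 100.0 * index_bytes / source_bytes


def projected_index_bytes(source_bytes, overhead_percent):
    """Estimate index space for an archive from a quoted overhead percentage."""
    return source_bytes * overhead_percent / 100.0
```

Measuring the overhead on a small pilot archive and then projecting it onto the full dataset gives a rough capacity figure to check against the vendor's claim.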

Some search providers have already incorporated LSI into their applications. For example, San Francisco-based Recommind Inc. provides conceptual search capabilities in its MindServer Retrieval products. iLumin doesn't do conceptual searches per se, but includes a number of advanced search techniques such as natural language processing, which can recognize the usage differences that distinguish words with the same spelling--such as the name "Sue" and the verb "sue." Zantaz Inc.'s EAS Search currently provides proximity searches that return results for two or more words or phrases that appear near each other within a document. The firm says it will soon include conceptual searching as well as relevancy scoring of found data objects.
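
A proximity search of the kind described above can be sketched in a few lines. This is an illustration only, not how any vendor's product works; it handles single words rather than phrases, and the function name and the `max_gap` parameter are assumptions.

```python
import re


def proximity_match(text, first, second, max_gap=5):
    """Return True if the two words occur within max_gap words of each other."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every occurrence of one word with every occurrence of the other.
    return any(
        abs(a - b) <= max_gap
        for a in first_positions
        for b in second_positions
    )
```

For instance, in the sentence "The merger was discussed before the contract was signed," the words "contract" and "signed" fall within three words of each other, but "merger" and "signed" do not.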

Other search techniques appearing in document management, archiving and search applications include fuzzy, phonetic and stemming searches (see "Advanced search concepts"). Many of these have been used for some time by Internet search sites.
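
As a rough illustration of these three techniques: fuzzy matching tolerates misspellings, sound-alike (phonetic) matching groups words that are pronounced similarly, and stemming reduces words to a common root. The sketch below uses edit distance for fuzziness, a simplified Soundex code for phonetics and naive suffix stripping for stemming. These are textbook stand-ins chosen for brevity, not the algorithms any particular product uses, and the function names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: the fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Soundex digit groups: letters that sound alike share a digit.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word):
    """Simplified Soundex: first letter plus up to three consonant digits."""
    word = word.lower()
    if not word:
        return ""
    encoded = []
    prev = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (word[0].upper() + "".join(encoded) + "000")[:4]


def naive_stem(word):
    """Strip a few common English suffixes; far cruder than a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

A fuzzy search might accept any indexed word within a small edit distance of the query, a phonetic search any word with the same Soundex code, and a stemming search any word with the same stem.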

The key to enhancing search capabilities with these complex, compute-intensive algorithms is incorporating them without sacrificing the performance of the search process. To this end, companies like AXS-One suggest using more general search techniques on a dataset first to create a more manageable subset that can be used with the advanced search functions.
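
The two-pass approach described above might look like the following sketch. It is an assumption-laden illustration rather than AXS-One's implementation: `two_stage_search` and `advanced_match` are hypothetical names, and the "advanced" function stands in for any costly technique.

```python
def two_stage_search(files, keyword, advanced_match):
    """Narrow the dataset with a cheap keyword test, then run the costly check.

    Returns the final hits and the number of files the costly check examined.
    """
    # Stage 1: inexpensive substring filter over the whole dataset.
    subset = [text for text in files if keyword.lower() in text.lower()]
    # Stage 2: the compute-intensive function sees only the smaller subset.
    hits = [text for text in subset if advanced_match(text)]
    return hits, len(subset)
```

The trade-off is that the first pass must be broad enough not to discard files the advanced function would have matched.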

Regardless of the specific search functionality employed by each vendor, it's clear that the state of the art in searching is steadily advancing. "The tools have reached the point where they're as or more reliable than human beings," says Andy Cohen, senior counsel and director of global solutions practice lead for compliance at EMC Corp. Cohen is also a member of the Sedona Conference, a group of lawyers, jurists and other experts that offers publications on electronic document retention and management, among other topics.

This was first published in April 2006
