Guide to index and search software purchase considerations

Index and search products have emerged as essential enterprise tools. Indexing creates catalogs of file content based on the metadata applied to content as it is stored. Search combs through indexes, compares criteria against metadata and presents results to the user. Learn what to look for when purchasing index and search software.

IT departments are challenged to store corporate data, but storing information is just the beginning. Documents, email messages, attachments and other files contained within an archive may need to be accessed years or decades into the future -- long after the files' creators have left the company or forgotten filenames, message dates or other attributes. And even though archived data may have been unused for years, it must be located quickly when needed to address critical business issues like discovery requests or regulatory compliance audits. Consequently, index and search products have emerged as essential enterprise tools. Indexing creates catalogs of file content based on the metadata applied to content as it is stored. Search combs through indexes, compares criteria against metadata and presents results to the user.

But index/search tools are not all the same, and they can vary dramatically in metadata support, object and performance scalability, search presentation, archive system integration and other key characteristics. Understand the unique storage needs and objectives of your business, and test prospective tools thoroughly before making a purchase decision. Now that you've reviewed the essential issues involved in any archive product purchase, you can turn to the considerations that are most specific to index/search tool purchases, which this segment covers. After that, you'll find a series of product specifications that will help you make on-the-spot comparisons between vendors like Abrevity, CommVault, Index Engines, Quest Software and others.

Consider the support for metadata. Metadata describes the attributes of each file or email message. It can include basic attributes such as creator, creation date, file size and keywords. It can also include more detailed attributes such as attending physician, patient name, diagnosis keywords and prescribed medications, or entirely different sets of details that support specific industries or business types. When selecting an index/search tool, identify the basic (default) metadata created for each record, and understand the advanced metadata that can be created or customized.

This is perhaps the single most important issue in product selection, because it is the metadata that a search engine typically operates against. Metadata must be relevant to your organization and must distinguish files from one another well enough to make future searches practical. If metadata is too general, search results will be too broad, returning a huge number of potential hits and making it difficult, if not impossible, to locate the exact files you need. For example, email archives may include a "sender" name in the metadata, but searching against a sender alone may return thousands of results. Understand how the metadata fields of a potential index/search tool can be combined in a single search to yield more manageable results.
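
As a rough illustration of why combined criteria matter, the following Python sketch filters a tiny in-memory index first by sender alone and then by sender, date range and keyword together. It does not represent any vendor's API; the `IndexEntry` fields, the `search` function and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexEntry:
    """One hypothetical index record: a file location plus its metadata."""
    path: str
    sender: str
    sent: date
    keywords: frozenset


def search(index, sender=None, sent_from=None, sent_to=None, keywords=()):
    """Return entries that satisfy every supplied criterion (logical AND)."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for entry in index:
        if sender is not None and entry.sender != sender:
            continue
        if sent_from is not None and entry.sent < sent_from:
            continue
        if sent_to is not None and entry.sent > sent_to:
            continue
        if not wanted <= entry.keywords:  # every wanted keyword must be present
            continue
        hits.append(entry)
    return hits


# Invented sample data, for illustration only.
INDEX = [
    IndexEntry("mail/0001.eml", "jsmith", date(2006, 3, 1), frozenset({"budget", "q1"})),
    IndexEntry("mail/0002.eml", "jsmith", date(2006, 9, 14), frozenset({"merger", "legal"})),
    IndexEntry("mail/0003.eml", "jsmith", date(2007, 1, 22), frozenset({"merger", "schedule"})),
    IndexEntry("mail/0004.eml", "adoe", date(2006, 9, 15), frozenset({"merger"})),
]

# Sender alone matches every message from that person.
broad = search(INDEX, sender="jsmith")

# Sender plus a date range plus a keyword narrows the result set.
narrow = search(
    INDEX,
    sender="jsmith",
    sent_from=date(2006, 7, 1),
    sent_to=date(2006, 12, 31),
    keywords=["merger"],
)

print(len(broad), len(narrow))
```

In a real archive the gap between the broad and narrow result counts would be far larger, which is why it is worth confirming during evaluation that a tool lets you combine fields this way.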

Evaluate the product's scalability. Archives can become huge over time, swelling the size of the metadata index. When evaluating an index/search product, understand the maximum number of objects or files that the product can support on a per server or per platform basis. For example, Abrevity claims to support 20 TB to 30 TB per FileData Classifier server, while Index Engines claims to index up to 100 million files or email messages on each appliance. Archives are always growing, so be sure to compare product scalability against your anticipated future archive size, rather than the size of your current archive. Also consider the role of clustering in capacity scaling. Clustering allows multiple index/search servers, or dedicated appliances, to work cooperatively in larger environments.
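
The comparison against future archive size can be sketched as simple arithmetic. The per-appliance limit below reuses the 100 million object figure that Index Engines claims; the current object count, the 40% annual growth rate and the five-year horizon are assumptions chosen purely for illustration, and both function names are hypothetical.

```python
import math


def projected_objects(current, annual_growth, years):
    """Compound the current object count forward by a fixed annual growth rate."""
    return math.ceil(current * (1 + annual_growth) ** years)


def servers_needed(objects, per_server_limit):
    """Smallest whole number of servers or appliances that can hold the objects."""
    return math.ceil(objects / per_server_limit)


PER_APPLIANCE_LIMIT = 100_000_000  # vendor-claimed limit cited in the text
current = 60_000_000               # assumed archive size today
future = projected_objects(current, annual_growth=0.40, years=5)  # assumed growth

print("appliances today:", servers_needed(current, PER_APPLIANCE_LIMIT))
print("appliances in five years:", servers_needed(future, PER_APPLIANCE_LIMIT))
```

Under these assumed numbers, an archive that fits on one appliance today would need several within five years, which is the point at which clustering support becomes relevant.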

Evaluate index sizes and storage requirements. Remember that indexes can contain a significant amount of metadata and will themselves demand storage space. Understand the storage requirements for your index and be sure that an adequate amount of storage space is available for anticipated growth. For many organizations, this additional storage is not a significant issue, but smaller organizations or businesses operating at high storage utilization levels may get blindsided by unforeseen storage needs.
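
A back-of-the-envelope estimate can flag this kind of surprise early. The sketch below assumes an average index entry size of 2 KB per object and a 20% free-space reserve; both figures are placeholders, not vendor data, so substitute the numbers a vendor quotes or that you measure in a pilot. The function names are hypothetical.

```python
def index_size_bytes(objects, avg_entry_bytes, overhead=1.0):
    """Estimated index size: object count x average entry size x overhead factor."""
    return int(objects * avg_entry_bytes * overhead)


def fits(index_bytes, free_bytes, headroom=0.2):
    """True if the index fits while leaving `headroom` (a fraction) of free space unused."""
    return index_bytes <= free_bytes * (1 - headroom)


objects = 100_000_000    # assumed number of indexed files and messages
avg_entry_bytes = 2048   # assumed average metadata per object (placeholder)

size = index_size_bytes(objects, avg_entry_bytes)
print(f"estimated index size: {size / 1e9:.1f} GB")
print("fits in 250 GB free:", fits(size, 250_000_000_000))
print("fits in 300 GB free:", fits(size, 300_000_000_000))
```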

Evaluate search processing speed and performance. Search tools must often process millions or even billions of index entries. In many cases, response times are just a few seconds because only the index is being searched. Contextual searches that look inside documents and email for keywords can take considerably longer. Any prepurchase evaluation should include an examination of search speed to ensure that even large archives can yield meaningful results within an acceptable timeframe. Also consider the role of clustering in performance scaling: it allows multiple index/search servers, or dedicated appliances, to work cooperatively in larger environments.
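
The difference between the two kinds of search can be sketched in a few lines of Python. This is a simplified model, not how any particular product works: `build_index` creates a word-to-document lookup once, `index_search` answers a query from that lookup alone, and `content_scan` reads every document on every query. The document names and text are invented.

```python
from collections import defaultdict

# Invented sample documents, for illustration only.
DOCS = {
    "memo1.doc": "Quarterly budget review for the Atlas project",
    "memo2.doc": "Lunch menu for the cafeteria",
    "mail17.eml": "Atlas project schedule slipped two weeks",
}


def build_index(docs):
    """Build a word -> set-of-document-names lookup (done once, up front)."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def index_search(index, term):
    """Answer a single-word query from the index alone; no document is opened."""
    return sorted(index.get(term.lower(), set()))


def content_scan(docs, phrase):
    """Answer a phrase query by reading the full text of every document."""
    phrase = phrase.lower()
    return sorted(name for name, text in docs.items() if phrase in text.lower())


index = build_index(DOCS)
print(index_search(index, "Atlas"))            # one dictionary lookup
print(content_scan(DOCS, "project schedule"))  # work grows with total archive size
```

The index lookup costs roughly the same regardless of how many documents exist, while the scan grows with the volume of content, so a timing test on a realistically sized archive is the only reliable way to judge a product.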

Test the search features thoroughly. Not all search tools are created equal, and not all tools will perform the same way in your organization. Put the prospective search tool through its paces and perform a comprehensive series of search tests -- see that the tool will actually find mail, documents or other files based on your queries. For example, try locating all Word memos related to a recent company project or initiative. The search tests should return useful and relevant results based on common criteria, such as keywords, sender and file dates, and new metadata attributes that you have developed.

Consider integration with archive systems. Index/search tools should support and integrate with the archive storage platform, and possibly with more than one storage platform, depending on your data center. Some tools look for specific storage devices, such as an IBM DR550, an EMC Centera or a Bridgehead HT Filestore. Other tools provide more general support, interoperating with any storage device that presents an NTFS partition or any Windows storage device. Virtualized storage can also be problematic for some index/search tools, so organizations with virtualized storage environments should pay particular attention to index/search tool compatibility. Actual testing can help to ensure compatibility with your current storage infrastructure.

Consider integration with policy manager platforms. Some index/search tools may include policy manager features to help manage archive data retention and deletion. Otherwise, the index/search product should provide some level of interaction with external policy managers. Interoperability will help to synchronize the two management activities: whenever a policy manager moves or deletes archival data, the corresponding file location references in the index should be updated. If they are not, a search may return indexed references to files that can no longer be found.
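
A minimal sketch of that synchronization, assuming the policy manager can report each action as a simple event, might look like the following. The event format (`action`, `path`, `new_path`), the function names and the sample paths are all hypothetical and do not reflect any real product's interface.

```python
def apply_policy_event(index, event):
    """Update a path -> metadata index to reflect one policy manager action."""
    action, path = event["action"], event["path"]
    if action == "delete":
        index.pop(path, None)              # drop the reference entirely
    elif action == "move":
        entry = index.pop(path, None)
        if entry is not None:
            index[event["new_path"]] = entry  # keep metadata, fix the location
    else:
        raise ValueError(f"unknown action: {action!r}")


def stale_references(index, existing_paths):
    """Indexed paths that no longer exist in the archive (would be dead search hits)."""
    return sorted(p for p in index if p not in existing_paths)


# Invented sample index and archive contents.
index = {
    "/archive/2001/a.doc": {"creator": "jsmith"},
    "/archive/2001/b.doc": {"creator": "adoe"},
    "/archive/2002/c.doc": {"creator": "jsmith"},
}

# The archive after the policy manager deleted a.doc and moved b.doc.
archive_now = {"/archive/cold/b.doc", "/archive/2002/c.doc"}

# Without synchronization, two indexed references point at nothing.
print(stale_references(index, archive_now))

# With synchronization, each policy event is applied to the index.
apply_policy_event(index, {"action": "delete", "path": "/archive/2001/a.doc"})
apply_policy_event(index, {"action": "move", "path": "/archive/2001/b.doc",
                           "new_path": "/archive/cold/b.doc"})
print(stale_references(index, archive_now))
```

A check like `stale_references` is also a useful evaluation test: run a retention policy in a pilot environment, then confirm that searches no longer return the affected files at their old locations.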

Consider hardware requirements and the role of server virtualization. Deployment of the index/search product can vary depending on the choice of a standalone appliance or a software product. Standalone index/search appliances may require only minimal setup in the data center. Index/search software will typically require a server that meets or exceeds the vendor's minimum requirements. If server virtualization is deployed in the environment, verify that the software can operate on a virtual server, and allocate enough virtual server resources to meet peak processing requirements.

The index/search software specifications page in this chapter covers the following products.

  • Abrevity; FileData Classifier
  • Autonomy Corp.; Intelligent Data Operating Layer (IDOL) Server
  • CommVault; Simpana
  • EMC Corp.; Infoscape
  • Hewlett-Packard Co.; Integrated Archive Platform (formerly RISS)
  • Index Engines Inc.; Tape Engine
  • Kazeon Systems Inc.; Version 3 of Information Access Platform and Information Server
  • Lucid8; Digiscope
  • Quest Software Inc.; Archive Manager
  • MetaLINCS; MetaLINCS Enterprise E-Discovery Software V4.0
