But index/search tools are not the same, and can vary dramatically in metadata support, object and performance scalability, search presentation, archive system integration, and other key characteristics. Understand the unique storage needs and objectives of your business, and test prospective tools thoroughly before making a purchase decision. Now that you've reviewed the essential issues involved in any archive product purchase, this segment will cover the considerations that are most specific to index/search tool purchases. After that, you'll find a series of product specifications that will help make on-the-spot comparisons between vendors like Abrevity, CommVault, Index Engines, Quest Software and others.
Evaluate metadata support. This is perhaps the single most important issue in product selection, because it is the metadata that a search engine typically operates against. Metadata must be relevant to your organization and provide enough unique distinction between files to make future searches practical. If metadata is too general, search results will be too broad, returning a huge number of potential hits and making it difficult, if not impossible, to locate the exact files. For example, email archives may include a "sender" name in the metadata, but searching against a sender alone may return thousands of results. Understand how the metadata fields supported by a prospective index/search tool can be combined in a single search to yield more manageable results.
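To illustrate why combined metadata searches matter, here is a minimal Python sketch. The records, the field names and the `search` helper are all hypothetical and are not taken from any product named in this guide; the sketch only shows how each added criterion narrows the result set.

```python
from datetime import date

# Hypothetical metadata index: one dict per archived email (illustrative data only).
INDEX = [
    {"sender": "jsmith", "sent": date(2007, 11, 2), "subject": "Q4 budget draft"},
    {"sender": "jsmith", "sent": date(2008, 1, 15), "subject": "Project Atlas kickoff"},
    {"sender": "jsmith", "sent": date(2008, 2, 3), "subject": "Lunch on Friday"},
    {"sender": "mlee", "sent": date(2008, 1, 20), "subject": "Project Atlas schedule"},
]


def search(index, sender=None, sent_after=None, subject_contains=None):
    """Return the entries that match every criterion that was supplied."""
    results = []
    for entry in index:
        if sender is not None and entry["sender"] != sender:
            continue
        if sent_after is not None and entry["sent"] <= sent_after:
            continue
        if (subject_contains is not None
                and subject_contains.lower() not in entry["subject"].lower()):
            continue
        results.append(entry)
    return results


# Searching on the sender alone is broad; adding a date and a keyword narrows it.
broad = search(INDEX, sender="jsmith")
narrow = search(INDEX, sender="jsmith", sent_after=date(2008, 1, 1),
                subject_contains="atlas")
print(len(broad), len(narrow))
```

In a real evaluation, the same comparison can be run against the vendor's own query interface using your organization's metadata attributes.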
Evaluate the product's scalability. Archives can become huge over time, swelling the size of the metadata index. When evaluating an index/search product, understand the maximum number of objects or files that the product can support on a per server or per platform basis. For example, Abrevity claims to support 20 TB to 30 TB per FileData Classifier server, while Index Engines claims to index up to 100 million files or email messages on each appliance. Archives are always growing, so be sure to compare product scalability against your anticipated future archive size, rather than the size of your current archive. Also consider the role of clustering in capacity scaling. Clustering allows multiple index/search servers, or dedicated appliances, to work cooperatively in larger environments.
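The comparison against future archive size can be sketched as a short calculation. The starting object count, the growth rate and the planning horizon below are assumptions chosen purely for illustration. The per-appliance limit reuses the 100 million figure quoted above as a vendor claim, and should be replaced with whatever limit your own testing confirms.

```python
import math


def appliances_needed(current_objects, annual_growth_rate, years, objects_per_appliance):
    """Project the object count with compound growth and size the deployment.

    Returns a tuple of (projected object count, number of appliances required).
    """
    projected = current_objects * (1 + annual_growth_rate) ** years
    return math.ceil(projected), math.ceil(projected / objects_per_appliance)


# Hypothetical archive: 60 million objects today, growing 40% per year for 3 years.
projected, count = appliances_needed(
    current_objects=60_000_000,
    annual_growth_rate=0.40,
    years=3,
    objects_per_appliance=100_000_000,  # vendor-claimed limit; verify before relying on it
)
print(projected, count)
```

Under these assumptions, today's archive fits on a single appliance, but the three-year projection of roughly 165 million objects would call for two. This is why the sizing should be based on the anticipated archive rather than the current one.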
Evaluate index sizes and storage requirements. Remember that indexes can contain a significant amount of metadata and will demand some amount of space for storage. Understand the storage requirements for your index and be sure that an adequate amount of storage space is available for anticipated growth. For many organizations, this additional storage is not a significant issue, but smaller organizations or businesses operating at high storage utilization levels may get blindsided by unforeseen storage needs.
Evaluate search processing speed and performance. Search tools must often process millions and even billions of index entries. In many cases, response times are just a few seconds because only the index is being searched. Contextual searches that look inside documents and email for keywords can take considerably longer. Any prepurchase evaluation should include an examination of the search speed, ensuring that even large archives can yield meaningful results within an acceptable timeframe. Also consider the role of clustering in performance scaling, which allows multiple index/search servers, or dedicated appliances, to work cooperatively in larger environments.
Test the search features thoroughly. Not all search tools are created equal, and not all tools will perform the same way in your organization. Put the prospective search tool through its paces and perform a comprehensive series of search tests -- see that the tool will actually find mail, documents or other files based on your queries. For example, try locating all Word memos related to a recent company project or initiative. The search tests should return useful and relevant results based on common criteria, such as keywords, sender and file dates, and new metadata attributes that you have developed.
Consider integration with archive systems. Index/search tools should support and integrate with the archive storage platform, possibly more than one storage platform depending on your data center. Tools may look for specific storage devices, such as an IBM DR550, an EMC Centera or a Bridgehead HT Filestore. Other tools may provide more general support, interoperating with any storage device that presents an NTFS partition or any Windows storage device. Virtualized storage can also be problematic for some index/search tools, so organizations with virtualized storage environments should pay particular attention to index/search tool compatibility. Actual testing can help to ensure compatibility with your current storage infrastructure.
Consider integration with policy manager platforms. Some index/search tools may include policy manager features to help manage archive data retention and deletion. Otherwise, the index/search product should provide some level of interaction with external policy managers. Interoperability will help to synchronize the two management activities, and any archival data that is moved or deleted by a policy manager should update any file location references in the index. If not, a search may return indexed references to files that can no longer be found.
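The synchronization described above can be pictured with a small Python sketch. The event format, the object identifiers and the `apply_policy_events` function are hypothetical and do not represent any vendor's interface. The sketch simply shows an index being kept consistent when a policy manager moves or deletes archived objects.

```python
def apply_policy_events(index, events):
    """Update an index of {object_id: location} from policy manager events.

    A "move" event rewrites the stored location; a "delete" event removes the
    entry so that later searches cannot return a reference to a missing file.
    """
    for event in events:
        object_id = event["object_id"]
        if event["action"] == "delete":
            index.pop(object_id, None)
        elif event["action"] == "move" and object_id in index:
            index[object_id] = event["new_location"]
    return index


# Hypothetical index contents and policy manager activity.
index = {
    "doc-1": "/archive/tier1/doc-1",
    "doc-2": "/archive/tier1/doc-2",
    "doc-3": "/archive/tier1/doc-3",
}
events = [
    {"action": "move", "object_id": "doc-1", "new_location": "/archive/tier2/doc-1"},
    {"action": "delete", "object_id": "doc-2"},
]
apply_policy_events(index, events)
print(index)
```

When testing a product, a comparable check is to move or delete a few archived files through the policy manager and confirm that subsequent searches reflect the change.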
Consider hardware requirements and the role of server virtualization. Deployment of the index/search product can vary depending on the choice of a standalone appliance or a software product. Standalone index/search appliances may require only minimal setup in the data center. Index/search software will typically require a server that meets or exceeds the product's minimum requirements. If server virtualization is deployed in the environment, verify that the software can operate on a virtual server and allocate enough virtual server resources to meet peak processing requirements.
The index/search software specifications page in this chapter covers the following products.
This was first published in April 2008