Index and search software: Product snapshots and specifications

The product snapshots in this chapter highlight key specifications for a cross section of index and search software products.

Indexing creates catalogs of file content based on the metadata applied to content as it is stored. Search combs through those indexes, comparing criteria against metadata and presenting results to the user. But index/search tools are not all alike; they can differ substantially in metadata support, object and performance scalability, search presentation, archive system integration and other key characteristics. It's important to have a clear understanding of the unique storage needs and objectives of your business before selecting an index/search tool. The following products were selected based on input from industry analysts and SearchStorage.com editors, and specifications are current as of March 2008.
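
To make the index/search distinction concrete, here is a minimal Python sketch of a metadata index and a search over it. It is not how any product in this chapter is implemented; the file paths, the metadata fields and the function names `build_index` and `search` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def build_index(files):
    """Map each (field, value) metadata pair to the set of file paths carrying it."""
    index = defaultdict(set)
    for path, metadata in files.items():
        for field, value in metadata.items():
            index[(field, str(value).lower())].add(path)
    return index

def search(index, **criteria):
    """Return the sorted paths whose metadata matches every criterion (logical AND)."""
    result = None
    for field, value in criteria.items():
        matches = index.get((field, str(value).lower()), set())
        result = matches if result is None else result & matches
    return sorted(result or [])

# Hypothetical catalog: file path -> metadata recorded when the file was stored.
FILES = {
    "/share/q1_report.doc": {"owner": "alice", "type": "doc", "year": 2008},
    "/share/q1_budget.xls": {"owner": "bob", "type": "xls", "year": 2008},
    "/share/old_memo.doc": {"owner": "alice", "type": "doc", "year": 2006},
}

index = build_index(FILES)
print(search(index, owner="alice", type="doc"))
print(search(index, owner="alice", year=2008))
```

The index is built once, when content is stored; each search then only intersects small sets of paths rather than rescanning the files. The products below layer content indexing, fuzzy matching and policy actions on top of this basic idea.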

The following specifications have been provided by vendors and are periodically updated. Vendors are welcome to submit their updates and new product specifications to SearchStorage.com editors.

Go to the first product snapshot, or select the desired product below:

  • Abrevity; FileData Classifier
  • Autonomy Corp.; Intelligent Data Operating Layer (IDOL) Server
  • CommVault; Simpana
  • EMC Corp.; Infoscape
  • Hewlett-Packard Co.; Integrated Archive Platform (formerly RISS)
  • Index Engines Inc.; Tape Engine
  • Kazeon Systems Inc.; Version 3 of Information Access Platform and Information Server
  • Lucid8; DigiScope
  • Quest Software Inc.; Archive Manager
  • MetaLINCS; MetaLINCS Enterprise E-Discovery Software V4.0

    Product Snapshot #1

    Abrevity; FileData Classifier

    Data Types: Supports unstructured (files), semi-structured (email) and structured (databases) data types.
    Search/Index Speed: Not provided
    Search Criteria: Searches on complete metadata, with keyword and pattern search capabilities.
    Retention/Deletion Features: Set and update retention date/time based on an event or policy parameters; Delete or shred after retention expiration.
    Storage Reduction Features: Duplicate file analysis and deletion of duplicates, contraband file analysis and deletion.
    Reporting and Logging Features: Comprehensive reporting engine with ability to create custom reports; Verbose logging and auditing.
    Metadata Features: Full metadata indexing, querying, and policy engine via custom distributed database.
    Scalability: 20-30TB per FileData Classifier -- up to 50 FileData Classifiers managed by a single FileData Manager.
    Archiving Platform Integration: Hitachi Content Archive Platform (HCAP), and any archive product supporting CIFS, NFS or HTTP.
    Requirements: Dual-core processor, 2 GB RAM, two internal hard drives and Windows Server 2003.
    Availability: Currently available
    Base Cost: $30,000 for the first FileData Classifier with 5 TB of management. Each additional TB or Classifier node is $3,000.
    Detailed Specs: http://www.abrevity.com/software_fdc/
    Vendor URL: http://www.abrevity.com
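
As a rough sizing aid, the scalability and base cost figures above can be combined into a back-of-the-envelope estimate. The Python sketch below rests on assumptions that are not vendor pricing rules: that each Classifier is filled to the upper end of the quoted 20-30 TB range, that the $3,000 increment applies once per extra TB and once per extra Classifier node, and that capacity is billed in whole TB. The function name `estimate` is hypothetical.

```python
import math

BASE_COST = 30_000          # first FileData Classifier, includes 5 TB of management
INCLUDED_TB = 5
INCREMENT_COST = 3_000      # each additional TB or Classifier node (as quoted above)
MAX_TB_PER_CLASSIFIER = 30  # assumption: upper end of the quoted 20-30 TB range
MAX_CLASSIFIERS = 50        # Classifiers per FileData Manager

def estimate(total_tb):
    """Return (classifier count, list cost in USD) under the assumptions stated above."""
    classifiers = max(1, math.ceil(total_tb / MAX_TB_PER_CLASSIFIER))
    if classifiers > MAX_CLASSIFIERS:
        raise ValueError("exceeds the quoted capacity of one FileData Manager")
    extra_tb = max(0, math.ceil(total_tb) - INCLUDED_TB)
    extra_nodes = classifiers - 1
    return classifiers, BASE_COST + INCREMENT_COST * (extra_tb + extra_nodes)

for tb in (5, 25, 60):
    print(tb, estimate(tb))
```

Actual quotes may differ; treat the result as a starting point for a conversation with the vendor, not as a price.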

    Product Snapshot #2

    Autonomy Corp.; Intelligent Data Operating Layer (IDOL) Server

    Data Types: Can index and search more than 1,000 file formats, including PST, ZIP, PDF, Word documents, HTML, XML, Oracle, Lotus Notes, images, and even voice and video content.
    Search/Index Speed: A single instance of IDOL can accurately index in excess of 60 GB/hour on commodity hardware, and execute over 200 queries per second on a single server. IDOL has been deployed to sustain over 3000 queries per second, with sub-second response times on a single machine with 2 CPUs when used against 30 million pieces of content, while querying the entire index for relevant information.
    Search Criteria: Search by source, keyword (all Boolean operators supported), concept, phrase, custodian, metadata field (e.g. reviewer, priority, date), tag, concept cluster, metadata field and cluster (hybrid), example ("same as" or "similar"), language, encoding, soundex (sounds like), parametric search, fuzzy search, character search, hyphenated search, and many more. Users may restrict by date, relevance, database, alphabetic range, numeric range, etc. Relevancy can be easily modified to bias certain fields, terms or user rating. Users can sort and combine results by arbitrary fields, enhance results by examining capitalization, apply complex post-filtering, or define an explicit importance of terms or fields.
    Retention/Deletion Features: Offers a comprehensive rules-based automatic capture and retention management toolset. Provides granular retention management -- the user is able to specify using keywords, attributes (e.g. date, author), example documents and/or concept to determine the length of retention period. Retention/disposition rules can be set based on meaning as well. Built-in workflow engines allow IDOL to apply both keyword and conceptual policies to documents during the collection/ingestion process. IDOL integrates with established enterprise systems to automate the operations that both "hold" and "delete" records pertinent to litigation processes as well as according to the best practices in the industry.
    Storage Reduction Features: A variety of data reduction features include deduplication, near-deduplication, dupe blocking, topical exclusion, and categorization.
    Reporting and Logging Features: Reporting tools display event logging of a document down to its field altering. Analysis of numerous statistics and data reporting can be drilled down, exported, or viewed for additional examination. Reports can be segmented based on metadata, user and contextual data.
    Metadata Features: Non-Quantum indexing capabilities enable data to be indexed from live systems without changing metadata such as Last Accessed or Last Modified, preserving the complete integrity of data. Re-indexing is very quick (immediate commit), so changed data is automatically searchable. In the case of initial data ingestion, IDOL provides flexible and open indexing APIs that not only import content, but also preserve and intelligently process all metadata and complex metadata relationships. IDOL also automatically identifies and extracts data that lends itself to key fields.
    Scalability: In general, a single IDOL engine can support up to 30 million documents on 32-bit platforms and over 250 million on 64-bit platforms. The system has unlimited ability to scale the crawling and acquisition of content through massive parallel processing. IDOL has been deployed to index ChoicePoint's 10 billion documents.
    Archiving Platform Integration: Integrates with all major archiving platforms (e.g. Symantec, EMC), but in particular, IDOL serves as the information processing platform for Digital Safe (Autonomy ZANTAZ's market-leading SaaS archiving product) and EAS (Autonomy ZANTAZ's market-leading software archiving product), as well as EAS On-Demand.
    Requirements: Cross-platform solution supports all major operating systems, including Windows, Linux and Solaris. As a benchmark, it is recommended that a single instance of IDOL run a dual processor server with at least 4 GB of memory.
    Availability: Currently available
    Base Cost: Minimum price is $100K and the average price is $350K.
    Detailed Specs: Not provided
    Vendor URL: www.autonomy.com

    Product Snapshot #3

    CommVault; Simpana

    Data Types: Content indexing and search are available for more than 377 file types.
    Search/Index Speed: Searches against more than 20 million records return in less than 1 second.
    Search Criteria: Search criteria include content search of the body of files and email; name and ownership for files and email; and To:, From:, CC:, BCC:, Subject:, Date:, etc.
    Retention/Deletion Features: Retention and deletion are policy based.
    Storage Reduction Features: Single Instance Storage (SIS) is a capability within the Simpana Software Suite.
    Reporting and Logging Features: Reporting and logging are standard features within Simpana Software Suite.
    Metadata Features: All items are indexed to the source where the data was originally collected via backup/archive policies -- in addition to all the source metadata in the items folders/dates/users. The items/jobs are also indexed to the storage policies where they are stored, which gives users the retention rules and media sources (potentially across multiple tiers). Content indexing is based on full-text and phrase indexing so users get a much richer search pool vs. restrictive key-words only.
    Scalability: There are unlimited numbers of files that can be archived or searched.
    Archiving Platform Integration: Simpana software is an archiving platform. It also integrates with hardware platforms such as HDS HCAP and EMC Centera.
    Requirements: One Windows server and storage to run the archive starter package -- configurations can grow from there.
    Availability: Currently available
    Base Cost: Price starts at $8500 for 150 users.
    Detailed Specs: http://documentation.commvault.com/commvault/release_7_0_0/books_online_1/english_us/html/da.html
    Vendor URL: https://www.commvault.com

    Product Snapshot #4

    EMC Corp.; Infoscape

    Product details not available at this time.

    Detailed Specs: http://www.emc.com/collateral/software/data-sheet/h2341_infoscape_ds.pdf
    Vendor URL: www.emc.com

    Product Snapshot #5

    Hewlett-Packard Co.; Integrated Archive Platform (formerly RISS)

    Product details not available at this time.

    Detailed Specs: http://h18006.www1.hp.com/products/storageworks/riss/index.html?jumpid=reg_R1002_USEN
    Vendor URL: www.hp.com

    Product Snapshot #6

    Index Engines Inc.; Tape/LAN/SAN Engine

    Data Types: All common unstructured file formats and Microsoft Exchange email (Lotus Notes support is coming soon). The indexing platform indexes the contents of backup formats including TSM, BackupExec, ArcServe, NetBackup and Networker.
    Search/Index Speed: Search speed is sub-second across hundreds of millions of files. Indexing occurs at the speed of tape, typically about 1 GB/minute for the Tape Engine and at wire speed for high speed networks.
    Search Criteria: Comprehensive Boolean search on file and email content as well as full metadata search.
    Retention/Deletion Features: The index has a retention period that can be defined.
    Storage Reduction Features: A unique document signature is generated for every file and email that is indexed. This allows for dynamic deduplication of data upon query. APIs are available for generating policy scripts that leverage the query engine to reduce data.
    Reporting and Logging Features: All indexing and query activity is logged on the appliance and available for export. Standard summary reports detailing data consumption are also available.
    Metadata Features: A full tagging capability is available in order to tag query results for future reference. Data can be searched and retrieved according to these predefined tags.
    Scalability: The entry level 1U appliance can index 100M files and/or email. Clustered and custom configurations are available for larger environments.
    Archiving Platform Integration: None currently
    Requirements: Index Engines is a self-contained appliance. It can plug into a SAN, a LAN or a tape drive/library. Available connections include SCSI, fibre and network.
    Availability: Currently available
    Base Cost: The list price begins at $75,000
    Detailed Specs: Not provided
    Vendor URL: www.indexengines.com

    Product Snapshot #7

    Kazeon Systems Inc.; Version 3 of Information Access Platform and Information Server

    Data Types: Any NFS or CIFS file system can be indexed, grouped and searched. Kazeon supports over 370 document/file types with connectors to live Microsoft Exchange servers, PST, OST, MSG and EML message files, SMTP-based Internet email journals and dedicated archive devices and applications such as Network Appliance NearStore (with SnapLock), EMC Centera, Hitachi Content Archive Platform (HCAP), Plasmon UDO Archive Appliance and Symantec Enterprise Vault.
    Search/Index Speed: Deep crawl performance of 47 MB/sec when scanning for content (168 GB/hour, 4 TB per day); Sub-second response times for searches; Sustaining 1,500 files per second for metadata and 700 files per second with search indexing.
    Search Criteria: Search interface allows users to search for specific phrases, dates, email header information, user groups, locations, comments and more. Advanced search capabilities include keyword, wildcard, fuzzy, proximity, concept and Boolean searches. Metadata and content within documents and emails, including attachments, are searchable. Using the Kazeon search interface, reviewers are able to search for relevant information and open documents and email directly from the search interface in the native application to conduct a more thorough review of a document or email. Tags such as "relevant," "non-relevant" or "privileged" can be quickly applied to the document or email.
    Retention/Deletion Features: Users can architect defensible preservation protocols using either "staging areas" for litigation holds, initiating in-place legal holds or by moving targeted data to a compliant retention system and setting the retention period. The interface allows users to take action on data. Users can easily and safely remove duplicate and unnecessary information from the collected data set directly from the report and search interface.
    Storage Reduction Features: Can de-duplicate during the processing and culling phase by enabling 'Smart Tagging' to eliminate identical, non-responsive documents in bulk without having to individually tag and eliminate them. Users can quickly run summary and detailed reports against the information, quickly identifying duplicates and unnecessary documents and emails based on reports and search including: advanced, fuzzy, proximity, custodian, date range, keyword and concept searches utilizing multiple search terms.
    Reporting and Logging Features: Version 3 includes over 35 pre-built report templates, and it is easy to create custom reports. For example: custodian, duplicate file and date range reports can be leveraged by paralegals and litigation support specialists conducting eDiscovery. Access pattern reports help compliance officers understand which documents and emails contain non-public information (NPI). Other new features include: Pivot-style (matrix) reports/Summary report drill-down/ Selectable report columns/ Actions from duplicate report results.
    Metadata Features: Ability to sustain 1,500 files per second for metadata and 700 files per second with search indexing; creation of metadata as a result of tagging; key metadata is always preserved to prevent spoliation.
    Scalability: A single Information Server appliance or Information Server software instance can index, classify and search 6 TB to 10 TB of ESI on average. Customers can implement clusters to derive near-linear scalability.
    Archiving Platform Integration: Extensible technology also allows companies to leverage their existing investments to secure ESI to compliant storage devices and archive applications such as Symantec Enterprise Vault, Plasmon UDO Archive, NetApp SnapLock and EMC Centera.
    Requirements: The Information Server is a complete system that is delivered either as Linux-based software or a pre-packaged appliance.
    Availability: Currently available
    Base Cost: $40,000 list price
    Detailed Specs: http://www.kazeon.com/products2/index.php
    Vendor URL: www.kazeon.com
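
The crawl rates quoted under Search/Index Speed can be sanity-checked with simple unit arithmetic. The Python sketch below assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB) and a rate that is sustained around the clock; the function names and the 10 TB data set are hypothetical, with 10 TB chosen to match the upper end of the per-appliance scalability figure.

```python
def throughput(mb_per_sec):
    """Convert a sustained MB/sec rate to (GB/hour, TB/day) using decimal units."""
    gb_per_hour = mb_per_sec * 3600 / 1000
    tb_per_day = gb_per_hour * 24 / 1000
    return gb_per_hour, tb_per_day

def hours_to_crawl(total_tb, mb_per_sec):
    """Estimate hours needed to deep-crawl total_tb at the given sustained rate."""
    gb_per_hour, _ = throughput(mb_per_sec)
    return total_tb * 1000 / gb_per_hour

gb_h, tb_d = throughput(47)
print(f"{gb_h:.1f} GB/hour, {tb_d:.2f} TB/day")          # 169.2 GB/hour, 4.06 TB/day
print(f"{hours_to_crawl(10, 47):.1f} hours for 10 TB")   # 59.1 hours for 10 TB
```

Under these assumptions the results land close to the quoted 168 GB/hour and 4 TB per day. Real crawl times depend on file mix, network and storage performance.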

    Product Snapshot #8

    Quest Software Inc.; Archive Manager

    Data Types: Archive Manager can search any file type for which an IFilter can be installed on the Archive Manager server.
    Search/Index Speed: Less than one second per document.
    Search Criteria: Users can search their messages using keywords, by date, by mailbox or even by a specific domain name. Searches can also be saved for future reference. Custom mailboxes enable messages to be shared using different criteria (e.g. relating to a specific customer or domain name) and individual messages can be "stored" in custom mailboxes for research or investigative purposes. Archive Manager's attachment search limits a search to the content of attachments and enables users to select a specific attachment and then identify which messages contain that attachment.
    Retention/Deletion Features: A flexible policy engine provides granular control over what e-mail data remains in the archive. Scheduling provides execution of retention policies based on a schedule. Auditing of retention activities means all retention activity is logged to the Archive Manager UI and to the event log. Retention engine can operate in a "dry run" mode where policies will be executed but no deletions will occur -- this can be used for testing retention policies before they are applied.
    Storage Reduction Features: Provides single-instance storage for messages and attachments, separating messages from attachments. Messages and attachments are deduplicated by treating both as objects, running an MD5 hash algorithm across each, and then maintaining lists of associations in the database.
    Reporting and Logging Features: All system access and querying is logged and auditable with reporting on such things as Searches Performed, Messages Viewed, and Questionable Access (messages viewed by an administrator that "belong" to that individual). All activities are logged, and comprehensive reports are available to permitted users showing all activity.
    Metadata Features: Archive Manager gathers metadata from Exchange the moment it archives an email and stores that in a SQL database with the email.
    Scalability: One Archive Manager server can support 10,000 - 15,000 mailboxes depending on the unique characteristics of the environment. The approach to scaling is to add Archive Manager instances. Some customers report using Archive Manager to manage more than 3 TB of data.
    Archiving Platform Integration: Provides tiered storage support through partnerships with vendors including Bridgehead, EMC and IBM. Archive Manager can support a virtualized storage infrastructure. It supports the following storage devices: IBM DR550, EMC Centera, Bridgehead HT Filestore, or storage devices that present an NTFS partition.
    Requirements: Compatible with Microsoft Windows 2003 and SP1; Microsoft SQL Server 2000 (SP4 or later) or SQL Server 2005 and SP1; Microsoft .NET Framework 1.1 SP1 and Microsoft Internet Information Services 6.0. Archive Manager works with Microsoft Exchange 5.5, 2000, 2003 and 2007; Microsoft Live Communications Server 2005 (SP1); GroupWise 6.5.4 and above; and certain SMTP e-mail servers. On the client side, Archive Manager works with Microsoft Internet Explorer 5.5 or later; Mozilla Firefox 1.5; Microsoft Outlook 2003, XP; Microsoft Entourage 2004; Microsoft Windows Mobile 5; and BlackBerry 4.0.
    Availability: Currently available
    Base Cost: Priced at $40 USD
    Detailed Specs: https://www.quest.com/archive-manager/
    Vendor URL: www.quest.com
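
The hash-based single-instance storage described under Storage Reduction Features can be sketched in a few lines of Python. This is an illustration of the general technique, not Quest's implementation: the class `SingleInstanceStore`, its methods and the in-memory dictionaries standing in for the SQL database are all hypothetical.

```python
import hashlib

class SingleInstanceStore:
    """Toy single-instance store: each unique object is kept once, keyed by MD5 digest."""

    def __init__(self):
        self.objects = {}       # MD5 digest -> stored bytes (one copy per unique object)
        self.associations = []  # (message digest, attachment digest) pairs

    def _put(self, data):
        digest = hashlib.md5(data).hexdigest()
        self.objects.setdefault(digest, data)  # keep only the first copy
        return digest

    def archive(self, body, attachments):
        """Store a message body and its attachments, recording which belong together."""
        msg_digest = self._put(body)
        for blob in attachments:
            pair = (msg_digest, self._put(blob))
            if pair not in self.associations:
                self.associations.append(pair)
        return msg_digest

    def messages_with(self, attachment):
        """Return digests of all archived messages that carry this attachment."""
        target = hashlib.md5(attachment).hexdigest()
        return [m for m, a in self.associations if a == target]

store = SingleInstanceStore()
report = b"quarterly report contents"
m1 = store.archive(b"Please review the attached report.", [report])
m2 = store.archive(b"Forwarding the report again.", [report])
print(len(store.objects))                    # 3: two bodies plus one shared attachment
print(store.messages_with(report) == [m1, m2])  # True
```

The association list is also what makes an attachment-centric search possible: given one attachment, the store can name every message that contains it. MD5 is used here only because the vendor description names it; it is no longer considered collision-resistant, so a new design would more likely use a hash such as SHA-256.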

    Product Snapshot #9

    Lucid8; DigiScope

    Data Types: Supports any message, contact, calendar, task or journal item, including attachments, stored in either Exchange EDBs or PSTs.
    Search/Index Speed: Performance depends on complexity of query and amount of data to be searched.
    Search Criteria: Use either full text or expressions against metadata, Exchange items and attachments.
    Retention/Deletion Features: Allows searching and recovery from the deleted items container.
    Storage Reduction Features: Allows physical de-duplication of search results coming from multiple historical EDB or PST backup copies.
    Reporting and Logging Features: Not provided
    Metadata Features: Does not add metadata, but allows searching all Exchange metadata properties.
    Scalability: Only limited by memory on the machine which runs DigiScope.
    Archiving Platform Integration: DigiScope can export PSTs, which can then be ingested by archives.
    Requirements: Windows only
    Availability: Currently available
    Base Cost: Starting at $759 for a single store license (USD)
    Detailed Specs: http://www.lucid8.com/press/Lucid8-DigiScope-Large.pdf
    Vendor URL: www.lucid8.com

    Product Snapshot #10

    MetaLINCS; Enterprise E-Discovery Software V4.0

    Data Types: Can extract body content from over 250 file types including MS Office, email from MS Exchange, Lotus Notes and Unix systems. MetaLINCS can also search the content of any file type that is converted into an industry standard load file.
    Search/Index Speed: Single step processing phase culls, de-duplicates, extracts content, converts to HTML and builds a sophisticated analytic index at 5+ GB/hour while conducting a search. The precise data rate depends on the hardware and type of data being processed.
    Search Criteria: Over 350 specific metadata fields can be searched. The search operations include: Boolean, near search, fuzzy search, stemming in multiple languages, phrases, negative queries, and range searches. Product derives numerous searchable metadata fields during processing that can be searched such as the parent-child relationship with attachments, custodian and all attorney work applied.
    Retention/Deletion Features: Any case and all of its associated work can be fully archived or deleted on demand by an authorized administrator.
    Storage Reduction Features: Uses single-instance storage to reduce the volume of data stored while retaining metadata for each instance processed. This provides the greatest efficiency without losing any data.
    Reporting and Logging Features: User activity such as logins, searches and data retrieval is logged in the database and in a standard text log file. All of the core components of the system have a dedicated system log used to track system operations. Standard industry reports are provided for analyzing the log data in the database, and custom reports can be created. There are also system performance counters that provide real-time data about the health and status of key operations. A scripting module runs scripts written in multiple languages, including JavaScript, which can create reports based on the information in the search and analysis index.
    Metadata Features: The product adds two types of metadata, document metadata found at processing/ingestion time and work product added by attorneys and investigators during e-discovery or investigations. Both types of metadata are indexed in real time for searching. Some attorney work is also added to the database for reporting purposes.
    Scalability: The system has no inherent limitations other than what is imposed by the hardware and operating system.
    Archiving Platform Integration: Open architecture allows it to do network level integration with any archive platform that has an API. Integration is accomplished by writing an archive specific connector using MetaLINCS SDK. File level integration is possible with any archive platform that can export data as files. MetaLINCS sells a connector that provides integration with Symantec Enterprise Vault.
    Requirements: A common configuration consists of two servers, typically with dual quad-core processors and at least 8 GB of RAM. Configurations can consist of up to five servers in a cluster using network storage. All servers should support 64-bit software.
    Availability: Currently available
    Base Cost: Starts at under $50K for a complete solution
    Detailed Specs: http://www.metalincs.com/products/ediscovery-software-index.html
    Vendor URL: http://www.metalincs.com
