Archiving unstructured data

The problem of indexing and archiving an organization's unstructured data is often swept under the rug. A typical response is to throw more hardware at the problem, but just adding more capacity to house data while ignoring its content no longer suffices. Regulators and legal professionals increasingly need to search and scrutinize unstructured data such as e-mail and file repositories, so companies must find ways to automate and simplify the process of identifying and inspecting archived files and e-mail messages. Add-ons to core enterprise content management (ECM) software, as well as specialized e-mail and file archiving programs, address this large pool of unstructured data, but they differ in how they process, discover, index and archive it. There's no complete solution available, so you'll likely have to make tradeoffs.

At a minimum, unstructured data archiving products must handle large volumes of data and meet compliance requirements in a cost-effective manner. Products from CommVault Systems Inc. and Zantaz Inc. minimize the time it takes to find particular e-mails in the archive pool. But these products generally lack the ability to build meaningful relationships among file contents, provide in-depth content analysis or create workflow processes, all of which are features that ECM software from companies like EMC Corp., Hummingbird Ltd. and Open Text Corp. provides.

Archiving software is generally easier to implement and better tailored for the high-volume, low-cost nature of some unstructured data environments, while ECM applications offer more options to manage, classify and create relationships among data components. To decide which approach best suits your needs, you should understand how these programs manage unstructured data. Issues that should be considered include:

  • How are data discovery, indexing and archiving handled?
  • What type of meta data does the product create?
  • What type of content analysis is done by the product?
  • What default policies or categories are included?
  • Are e-mail and file meta data indexed in the same database?
  • How difficult is the application to install and maintain?
  • Are additional products or modules required to deliver the desired level of functionality?
E-mail/File archiving software & Enterprise content management
Click here for a comparison table about e-mail/file archiving software & Enterprise content management software (PDF).

Discovery and indexing
ZipLip Inc.'s Unified Email Archival Suite, an e-mail and file archiving product, discovers e-mail by tapping the native journaling features in Exchange and Lotus. By using the applications' journals, ZipLip's product can capture the information without using its own agent, and it can intercept outgoing or incoming e-mail without the sender's or recipient's knowledge. ZipLip built the product's server component to run on a grid architecture because analysis and searching can be CPU- and memory-intensive. The architecture also provides a scalable, easy and low-cost way to grow. Stephen Chan, ZipLip's co-founder and vice president of business development, claims this design allows ZipLip to scan and analyze incoming or outgoing e-mail and create the meta data specified by policies with little or no interruption to the e-mail process. To expedite searching, ZipLip puts the index on a file server that's separate from the database and executes queries against the index, not the database.
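The idea of answering searches from an index that lives apart from the database can be illustrated with a small sketch. This is not ZipLip's implementation; the `MessageStore` and `SearchIndex` classes, the tokenizer and the sample messages are all hypothetical stand-ins chosen only to show that a query can be resolved without touching the message store.

```python
from collections import defaultdict
import re


class MessageStore:
    """Hypothetical stand-in for the archive database holding full messages."""

    def __init__(self):
        self._rows = {}
        self.reads = 0  # counts how often a search would have hit the store

    def put(self, msg_id, sender, body):
        self._rows[msg_id] = {"sender": sender, "body": body}

    def get(self, msg_id):
        self.reads += 1
        return self._rows[msg_id]


class SearchIndex:
    """Hypothetical inverted index kept separately from the store."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of messages containing it

    def add(self, msg_id, text):
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[term].add(msg_id)

    def search(self, *terms):
        """Return the sorted ids of messages that contain every term."""
        if not terms:
            return []
        sets = [self._postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*sets))


def archive(store, index, msg_id, sender, body):
    """Write the message to the store and its terms to the index."""
    store.put(msg_id, sender, body)
    index.add(msg_id, f"{sender} {body}")


store = MessageStore()
index = SearchIndex()
archive(store, index, 1, "alice@example.com", "Quarterly audit report attached")
archive(store, index, 2, "bob@example.com", "Lunch on Friday?")
archive(store, index, 3, "carol@example.com", "Audit follow-up: report revisions")

hits = index.search("audit", "report")
print(hits, store.reads)  # prints "[1, 3] 0": the store was never read
```

Only when a reviewer opens a specific hit does the store need to be read, which is the point of keeping query traffic on the index.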

Conversely, ECM apps are better suited to thoroughly analyzing content and creating and storing meta data than to rapidly processing large amounts of unstructured data.

Most ECM products don't use journaling features on the messaging server or integrate with Exchange or Domino; archival and retrieval tasks are executed using the Exchange API (MAPI) and the Notes API over the Domino NRPC protocol. The downside of this approach is that these tasks will only run at scheduled times. Because the ECM software's API calls look like client requests to the mail server, the mail server needs to allocate processing time to handle them. If the ECM software asks for 1,000 e-mail messages, it's probably not a problem; but if it asks for copies of all of the e-mails since the last request, it will likely slow the messaging server's performance.
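One way to picture the difference between asking for 1,000 messages and asking for everything since the last request is a scheduled job that pulls new mail in capped batches. The sketch below is illustrative only: `FakeMailServer`, `fetch_since`, the sequence-number cursor and the batch sizes are hypothetical and do not correspond to real MAPI or Notes API calls.

```python
from dataclasses import dataclass


@dataclass
class Message:
    seq: int       # hypothetical, monotonically increasing sequence number
    subject: str


class FakeMailServer:
    """Hypothetical stand-in for a mail server reached through a client API."""

    def __init__(self, messages):
        self._messages = sorted(messages, key=lambda m: m.seq)
        self.requests = 0          # number of API calls the server handled
        self.largest_request = 0   # most messages returned by a single call

    def fetch_since(self, last_seq, limit):
        """Return up to `limit` messages with seq greater than last_seq."""
        self.requests += 1
        batch = [m for m in self._messages if m.seq > last_seq][:limit]
        self.largest_request = max(self.largest_request, len(batch))
        return batch


def archive_in_batches(server, last_seq, batch_size=1000, max_batches=None):
    """Pull new mail in capped batches; return (archived, new_last_seq)."""
    archived = []
    batches = 0
    while max_batches is None or batches < max_batches:
        batch = server.fetch_since(last_seq, batch_size)
        if not batch:
            break  # nothing newer than the cursor
        archived.extend(batch)
        last_seq = batch[-1].seq  # advance the cursor past this batch
        batches += 1
    return archived, last_seq


server = FakeMailServer([Message(i, f"msg {i}") for i in range(1, 2501)])
archived, cursor = archive_in_batches(server, last_seq=0, batch_size=1000)
print(len(archived), cursor, server.requests, server.largest_request)
# prints "2500 2500 4 1000"
```

Passing `max_batches` lets a scheduled run stop early and resume from the returned cursor on its next run, so no single request grows with the size of the backlog.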

Unlike its competitors, CommVault's QiNetix offers both modules that integrate with applications and modules that make API calls. For instance, the firm's DataArchiver for Exchange typifies the tight integration one normally finds in file archiving software because it places an agent on the Exchange server that copies the server's messages. However, other modules, such as CommVault's DataMigrator for Exchange and DataMigrator for Centera, make API calls. CommVault stores data from all sources in a central database, the common technology engine (CTE), which acts as a global catalog and indexes data across the entire line of QiNetix products.
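The practical benefit of a single catalog is that one query can span e-mail and files. The following sketch uses Python's built-in `sqlite3` module to show the idea; the `catalog` table, its columns and the sample rows are hypothetical and are not CommVault's schema.

```python
import sqlite3

# In-memory database standing in for a central catalog (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE catalog (
        item_id  INTEGER PRIMARY KEY,
        source   TEXT NOT NULL,
        owner    TEXT NOT NULL,
        title    TEXT NOT NULL,
        archived TEXT NOT NULL
    )
    """
)

# `source` records where the item came from; `archived` is an ISO 8601 date,
# so plain string comparison orders the dates correctly.
rows = [
    ("email", "alice", "Q2 contract draft", "2005-06-01"),
    ("file", "alice", "contract_v3.doc", "2005-06-03"),
    ("email", "bob", "Lunch plans", "2005-06-04"),
    ("file", "carol", "Contract summary.pdf", "2005-07-15"),
]
conn.executemany(
    "INSERT INTO catalog (source, owner, title, archived) VALUES (?, ?, ?, ?)",
    rows,
)


def find(conn, keyword, since):
    """Search titles across every source archived on or after `since`."""
    cur = conn.execute(
        "SELECT source, owner, title FROM catalog "
        "WHERE title LIKE ? AND archived >= ? ORDER BY archived",
        (f"%{keyword}%", since),
    )
    return cur.fetchall()


results = find(conn, "contract", "2005-06-01")
for source, owner, title in results:
    print(source, owner, title)
```

If e-mail and file meta data were held in separate databases, the same search would need two queries and a merge step, which is why the question of a shared index appears in the checklist above.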

This was first published in August 2005
