COMPANIES THAT IMPLEMENTED e-mail archiving made a smart move but, for some, the pre-archive days are coming back to haunt them. If they're involved in litigation, they may be called upon to produce e-mails that pre-date their archiving implementation. It's obviously beneficial to have all e-mail in the archive so that it can be easily searched and retrieved, so companies are now looking at ways to retroactively load old e-mails into their online archives.
"If it's a case of compliance capture, especially if there's litigation involved, virtually 100% of customers will do some sort of legacy conversion [of e-mails]," says John Swanteck, director of professional services at AXS-One Inc., Rutherford, NJ, which specializes in records management, including e-mail archiving software.
Some old e-mails can be imported into an archive simply by copying users' current in-boxes and .PST files. But to get at older e-mails, or for a more complete record, you need to go to backup tapes. "It's always a big job," says Swanteck. It took one AXS-One customer with 23,000 mailboxes six months to get e-mail data stored on tape into one 150TB archive.
For large legacy e-mail extraction and conversion jobs, many firms turn to third parties specializing in extracting data from tape. Renew Data Corp., an Austin, TX, company that provides an array of compliance services, last year processed seven petabytes (PB) of e-mail data, up from 4.4PB the year before, and expects to see similar growth this year. To that end, the company has 3.6PB of disk in its data center, and 500 processing CPUs deployed to its ActiveVault Evidence Management platform.
Getting tape-bound e-mails ready for an archive is a complex, multistep process, says Alan Brooks, Renew Data's VP of marketing. "The most important thing to remember is that this isn't just e-mail, it's potential evidence that will be used in court or a government inquiry," he says. As such, extracting e-mail from tape must be done in such a way that you maintain visual fidelity between systems and demonstrate a clear chain of custody. "You need to be able to show what happens to an e-mail as it's being moved from one system to the next," says Brooks.
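To make the chain-of-custody idea concrete, here is a minimal sketch of one way to show what happens to a message as it moves between systems: record a cryptographic fingerprint of the message at each step and compare the fingerprints afterward. This is an illustration only, not a description of how Renew Data or any other vendor does it. The function names, the log layout and the choice of SHA-256 are all assumptions.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(raw_message: bytes) -> str:
    """Return the SHA-256 digest of a message's raw bytes."""
    return hashlib.sha256(raw_message).hexdigest()


def record_step(log: list, message_id: str, raw_message: bytes, system: str) -> dict:
    """Append one custody entry (hypothetical layout) for a message seen on a system."""
    entry = {
        "message_id": message_id,
        "system": system,
        "sha256": fingerprint(raw_message),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry


def unchanged_across_steps(log: list, message_id: str) -> bool:
    """True if every logged copy of the message has the same digest."""
    digests = {e["sha256"] for e in log if e["message_id"] == message_id}
    return len(digests) == 1


custody_log = []
record_step(custody_log, "msg-001", b"Subject: Q3 forecast\r\n\r\nSee attached.", "backup tape")
record_step(custody_log, "msg-001", b"Subject: Q3 forecast\r\n\r\nSee attached.", "archive")
print(unchanged_across_steps(custody_log, "msg-001"))
```

A byte-level digest only matches when the message is copied unchanged. Where a format conversion is part of the job, the bytes will differ by design, so a real process would presumably also have to log the conversion itself.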
The ActiveVault Evidence Management platform also performs other critical functions--sorting data into retention groups, converting between formats, de-duplicating e-mails and attachments, and creating a searchable index. All of that takes a long time. Brooks estimates it would take about a month to process 15TB of raw e-mail data.
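As a rough illustration of the de-duplication step, the sketch below keeps one copy of each distinct message and notes which backups it appeared in. It assumes that two copies count as duplicates only when their raw bytes are identical. That is a simplification chosen for the example, and the function name and data layout are hypothetical rather than taken from any product mentioned here.

```python
import hashlib


def deduplicate(messages):
    """Collapse byte-identical messages.

    messages: iterable of (backup_label, raw_bytes) pairs.
    Returns (unique, seen_in), where unique maps digest -> raw bytes and
    seen_in maps digest -> list of backup labels that contained that message.
    """
    unique = {}
    seen_in = {}
    for label, raw in messages:
        digest = hashlib.sha256(raw).hexdigest()
        unique.setdefault(digest, raw)                # keep the first copy only
        seen_in.setdefault(digest, []).append(label)  # remember every sighting
    return unique, seen_in


nightly_backups = [
    ("monday", b"Subject: lunch?\r\n\r\nNoon works."),
    ("tuesday", b"Subject: lunch?\r\n\r\nNoon works."),  # same message, backed up again
    ("tuesday", b"Subject: budget\r\n\r\nDraft attached."),
]
unique, seen_in = deduplicate(nightly_backups)
print(len(nightly_backups), "copies on tape,", len(unique), "unique messages")
```

The same approach applies to attachments: hash each attachment separately and store one instance per digest.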
On the bright side, once it's been processed, e-mail data that was stored on tape takes up a fraction of the capacity it previously required. "Sixty to 70 percent of raw e-mail data is exactly the same," says AXS-One's Swanteck. "It's just the same messages that didn't get deleted being backed up night after night." Single-instancing of attachments produces further capacity reduction. Ultimately, says Swanteck, the e-mails that end up in his firm's archive consume only between one-third and one-fifth the space they took up on tape media.
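For a sense of scale, Swanteck's one-third to one-fifth range can be turned into a back-of-the-envelope estimate. The sketch below applies that ratio to a raw data volume. The function name is hypothetical, the 15TB input is used only as a round illustrative figure, and actual results will depend on how much duplication a given set of tapes contains.

```python
def estimate_archive_size(raw_tb):
    """Estimate archived size (low, high) in TB from raw tape data in TB,
    using the one-fifth to one-third range quoted above."""
    low = raw_tb / 5   # best case: archive is one-fifth of the raw size
    high = raw_tb / 3  # worst case: archive is one-third of the raw size
    return low, high


low, high = estimate_archive_size(15)
print(f"15TB on tape -> roughly {low:.0f}TB to {high:.0f}TB in the archive")
# prints "15TB on tape -> roughly 3TB to 5TB in the archive"
```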
It should come as no surprise that this is expensive, and that not all companies choose to do legacy conversion into their archives. "What's more common is a partial job," says Swanteck.
Smaller companies probably won't do any legacy conversion at all. Three years ago, Scott Redding, IT manager at Avondale Partners LLC, a small brokerage firm in Nashville, TN, moved from AdvisorMail, a hosted e-mail archiving service from LiveOffice Corp., to an in-house e-mail archive from ZipLip, Santa Clara, CA. With only 85 mailboxes to move, the firm handled the migration by writing the AdvisorMail archive to DVDs. "We thought about [migrating the data to ZipLip], but it really wasn't worth the cost to us," says Redding.