Managing and protecting all enterprise data


E-mail: It's worse than you think

According to a 2003 study by Meta Group, 80% of businesspeople say e-mail is more essential than the telephone. However, storage managers are struggling to keep pace with the growth in message volume and retention requirements.

It's official: Nothing beats e-mail. According to a 2003 study by Meta Group, 80% of businesspeople say e-mail is more valuable to them than the telephone. And just like phone networks, e-mail systems experience enormous strains behind the scenes that may cause problems--but users don't want to hear about it. They just go berserk when their e-mail systems don't work.

No Tampering, Please
Many people assume that Rule 17 (specifically subsection 17a-4) refers to write once, read many (WORM) optical discs. After all, once you burn data onto WORM optical media, nobody can change it short of destroying the disc. But in an SEC Interpretation published on May 7, 2003, the Commission made it clear once and for all that "17a-4 does not require that a particular type of technology or method be used to achieve the non-rewriteable and non-erasable requirement." That is, you can use any "electronic storage system that prevents the overwriting, erasing or otherwise altering of a record during its required retention period through the use of integrated hardware and software control codes."
This ruling instantly legitimized a number of magnetic WORM solutions already on the market. These give you the speed of a near-line magnetic disk array, plus special logic and software to make tampering more or less impossible. The best known is EMC's Centera Content Addressed Storage system. Introduced last year, this WORM magnetic disk array recently got a makeover in a Compliance Edition that claims to meet the letter of the SEC Interpretation. Network Appliance offers similar solutions via its SnapLock software, which can be used in conjunction with several of the company's NearStore arrays.
WORM takes other forms. StorageTek offers a WORM tape solution, as does Sony. And JVC specializes in DVD-R solutions, such as the MC-8000 series. But with rewritable media, the secret is in software that encodes data to prevent tampering--along with policies and procedures to make the process verifiable. Bill Peldzus, a consultant with GlassHouse Technologies, in Framingham, MA, says that you need to have "some type of manual certification that the way you are storing this and the way it's presented is in fact its original form. So there are manual processes along with technological processes that basically verify the integrity of the data."
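
To make the "non-rewriteable and non-erasable" idea concrete, here is a minimal sketch of a store that refuses overwrites at any time and refuses deletes until a retention period has passed. It is an illustration only, not how Centera, SnapLock or any other product named here works; the class and method names are hypothetical, and the three-year period is borrowed from the retention minimum quoted later in this article.

```python
from datetime import date, timedelta


class RetentionError(Exception):
    """Raised when a request would alter or erase a protected record."""


class WormStore:
    """Toy in-memory store: write once, read many, delete only after retention."""

    def __init__(self, retention_days):
        self.retention = timedelta(days=retention_days)
        self._records = {}  # record_id -> (payload, date stored)

    def write(self, record_id, payload, today):
        # A record may be written exactly once; any second write is refused.
        if record_id in self._records:
            raise RetentionError(f"{record_id} already written; overwrite refused")
        self._records[record_id] = (bytes(payload), today)

    def read(self, record_id):
        return self._records[record_id][0]

    def delete(self, record_id, today):
        # Erasing is allowed only once the retention period has fully elapsed.
        _, stored_on = self._records[record_id]
        if today < stored_on + self.retention:
            raise RetentionError(f"{record_id} is still inside its retention period")
        del self._records[record_id]


# Hypothetical usage: a three-year retention period (an assumption for this sketch).
store = WormStore(retention_days=3 * 365)
store.write("msg-0001", b"Trade confirmation", date(2003, 5, 7))
try:
    store.write("msg-0001", b"Edited confirmation", date(2003, 6, 1))
except RetentionError as err:
    print("refused:", err)
```

A software-only check like this proves nothing by itself, which is the point Peldzus makes: the control has to be backed by procedures that certify the stored copy is the original.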

The stresses on these systems are formidable. Corporate misbehavior has resulted in complex new regulations for e-mail retention--and has forced IT managers to take existing regulations seriously. Meanwhile, spam is flooding enterprises, not just home user mailboxes. Like a Malthusian curve, e-mail storage requirements just keep climbing, with each enterprise user now eating somewhere between 5MB and 10MB in e-mail per day, a volume expected to double by 2006, according to IDC.

"Everyone is having trouble managing the growth in e-mail data stores," says Carolyn DiCenzo, a VP of research for Gartner Inc. "And in some industries, e-mails are also important records of transactions. So people don't just read e-mails and delete them anymore; they actually keep them in their mailboxes for much longer." Larger e-mail data stores make backup and recovery more difficult, says DiCenzo--especially since people now expect nothing less than 24/7 availability.

Unlike large enterprise transaction databases, e-mail databases can't grow without incurring major performance penalties. So archiving old e-mail is essential. Archiving is a different process from basic backup protection, which can be complex in itself, depending on how coarse (covering the entire data store) or granular (mailbox by mailbox) the restore needs to be.

Ratcheting requirements even higher, some enterprises now recognize that e-mail stores serve as de facto repositories for intellectual property. As a result, they're beginning to treat messages as just another data type in a document or content management system, making quick search and retrieval a requirement.

To slay the e-mail hydra--and eliminate concerns over backup and recovery, scalability, archiving, spam control and content management--vendors have forged new weapons, and enterprises are establishing new lines of attack. All involve software or hardware solutions used in conjunction with the major e-mail servers: Lotus Domino and Microsoft Exchange. Of course, the servers have also evolved. Both Exchange 2000 and Domino Server Version 6 will store only one copy of an attachment, even if that file is attached to multiple messages. But a complete e-mail management solution, especially for companies with special regulatory needs, requires additional software. And in almost every case, IT must also shoulder the burden of persuading users to change their e-mail habits.
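
The single-copy attachment behavior can be pictured with a small sketch. This is not how Exchange or Domino implement it internally; it simply assumes, for illustration, that attachments are keyed by a SHA-256 digest of their content so that identical files are stored once and referenced many times. The `AttachmentStore` class and its methods are hypothetical names.

```python
import hashlib


class AttachmentStore:
    """Toy single-instance store: identical attachments are kept only once."""

    def __init__(self):
        self._blobs = {}  # content digest -> attachment bytes (one copy each)
        self._refs = {}   # message id -> list of digests attached to it

    def attach(self, message_id, data):
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # store only if not seen before
        self._refs.setdefault(message_id, []).append(digest)
        return digest

    def stored_bytes(self):
        """Bytes actually held on disk in this model."""
        return sum(len(blob) for blob in self._blobs.values())

    def logical_bytes(self):
        """Bytes users believe they hold, counting every attached copy."""
        return sum(
            len(self._blobs[digest])
            for digests in self._refs.values()
            for digest in digests
        )


# Hypothetical usage: the same file attached to three messages.
store = AttachmentStore()
deck = b"quarterly-report" * 1000
for message_id in ("msg-1", "msg-2", "msg-3"):
    store.attach(message_id, deck)
print(store.logical_bytes(), "logical bytes,", store.stored_bytes(), "stored")
```

The saving applies only to byte-identical files in the same store; a copy that has been edited and re-sent counts as a new attachment.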

The monster that ate my server
In countless organizations, IT managers express frustration that something as mundane as e-mail seems to take so much of their time. "I spend at least two hours every day dealing with Exchange in some way," says Mike Wolf, manager of technical services for LifeCare Assurance, in Woodland Hills, CA, a fast-growing insurance administration company that specializes in long-term care services. "Our information store went down a week ago, and I was here until 4:00 in the morning."

Wolf's overriding concern is the same basic bugbear that everyone is dealing with: a ballooning e-mail store. Though the company has 80,000 to 90,000 agents in the field, Wolf has only 350 Exchange 2000 mailboxes for employees of the company's corporate offices to worry about. Yet the 60GB of disk Wolf installed two years ago--figuring it would last a long time--is already hitting the wall. "We're at the 22GB [mark] now," he says. "In Exchange, you need [to] double your disk space, so at 30GB we'll be over our limit."
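
Wolf's arithmetic is easy to check. The sketch below uses only the figures he gives (60GB of disk, a 22GB store, and a rule of thumb that the disk should be double the store). The monthly growth rate in the last call is a made-up illustration, not a number from LifeCare, and the function names are hypothetical.

```python
import math


def store_limit_gb(disk_gb, headroom_factor=2.0):
    """Largest store the disk can hold if it must be headroom_factor times the store."""
    return disk_gb / headroom_factor


def months_until_limit(store_gb, limit_gb, growth_gb_per_month):
    """Whole months before the store reaches the limit at a steady growth rate."""
    remaining = limit_gb - store_gb
    if remaining <= 0:
        return 0
    return math.ceil(remaining / growth_gb_per_month)


limit = store_limit_gb(60)  # 60GB disk, doubled headroom: 30GB limit
remaining = limit - 22      # 22GB store today: 8GB left
print(limit, remaining)

# Hypothetical growth of 1GB per month (an assumption, not from the article).
print(months_until_limit(22, limit, growth_gb_per_month=1.0))
```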

Although 30GB may sound like a pittance, the issue isn't simply space, but also backup and recovery overhead. "Our current recovery strategy is a nightly full tape backup on the data store and an individual mailbox backup for individual message restores," says Wolf. "The reason I do it that way is because this is an insurance administration company, and one e-mail can save or ruin the day." If one of LifeCare's VPs wants an e-mail from two years ago, says Wolf, trying to dig that out of a full data store backup is a non-starter, so he restores from the "bricklayer" backup, as the mailbox-level backup is called. But even that is time consuming.

Laying down the law with size limits on end-user mailboxes is one obvious way to reduce such overhead. But here's where the Platonic ideals of best practices wither in the cold light of day. "We look at a lot of case studies on successful Exchange implementations and the policies and procedures for e-mail," says Wolf. "But we haven't found anything that fits LifeCare yet because people work directly out of their inbox."

Wolf ruefully admits that a culture of using the e-mail system for what amounts to document management stems from a lack of foresight on IT's part, but he feels that it's too late for draconian cutbacks on mailbox sizes. "If we were creating a brand-new environment, we would make it that way from the start, and the user would never know the difference," he says. "The problem I've got now is several hundred users who do know the difference. We let it be a free-for-all, and it's going to continue to be that way because they're used to it being that way. So our only choice is to add storage to our Exchange server and rethink our Exchange environment."

That rethinking has coalesced into a plan for next year to move to Exchange 2003 and to set up a local active/active cluster sharing the IP SAN device for the information store and the data for Exchange, says Wolf, who eventually hopes to have a three-node cluster with one remote in a redundant hot site facility. "By doing it that way, you get the redundancy you're looking for and you get the storage capacity you're looking for, because with IP, SAN storage is almost unlimited. You can resize those partitions almost on the fly."

The solution Wolf is considering is a storage concentrator from IP storage vendor StoneFly Networks, in San Diego, CA, which circumvents the usual objection to e-mail SANs--too much expense for an application that's less than mission-critical--by using Gigabit Ethernet and iSCSI-based storage (see "Is an e-mail SAN right for you?"). But there's another effort Wolf realizes he needs to make: Soon, he plans to roll out an archiving program that requires users to retain old messages in archive folders or lose those messages for good.

Software for Message Management
Low-level software that supports write once, read many (WORM) verifiability is one thing; software that helps you manage, archive, and/or sample electronic communications is quite another. Here's a sampling of packages to consider:

Legato EmailXtender. This is actually a family of products that includes EmailXtender itself for centralized administration; EmailArchive for policy-based archiving with Exchange; and EmailXaminer for compliance management and sampling. EMC purchased Legato in July.

Veritas Edition for Microsoft Exchange 2000. Basically, this package helps you manage Exchange storage so the mail server is easier to administer and configure across a variety of storage systems. It aids in recovery and performance optimization as well.

Tumbleweed Messaging Management System. This is a platform, rather than a product, with several different components that sit outside the mailstream and examine message content--and perform specific actions based on regulations. Antispam and antivirus components are part of the suite.

iLumin Assentor Enterprise. This software suite provides industrial-strength message management and archiving, with special emphasis on the needs of the financial services industry. Assentor Compliance inspects messages and manages mail; Assentor Discovery is for retrieving and reviewing archived messages.

KVS Enterprise Vault. Originating in the U.K., this highly regarded suite of e-mail archiving, retention and retrieval software integrates with the journaling features of Exchange to store and retrieve e-mails for compliance purposes.

FaceTime Communications IM Director. Yes, instant messages are considered electronic communications, too. IM Director adds logging, archiving, and security to instant messaging, essentially turning instant messages into e-mail messages that can be archived and retrieved.

Where the regulations roam
While some may be reluctant to impose rules, others have rules thrust upon them. The small Manhattan brokerage firm Abel/Noser must deal with the modern version of Securities and Exchange Commission rules first adopted in 1934. According to Abel/Noser compliance officer Ravi Jethmal, the SEC and NASD "require broker dealers to monitor a certain percentage of all ingoing, outgoing and interoffice e-mails and require the broker dealer to maintain and store all e-mails on non-corruptible electronic media (such as WORM) for a minimum of three years."

Ryan Farley saw his first write once, read many (WORM) drive when he came to work at Abel/Noser as a senior systems engineer over a year ago. "Right now, from where we started with the system, we have just about every e-mail that has come into the company," says Farley. About six months before Farley started at Abel/Noser, the company deployed Legato's EmailXtender and EmailXaminer. The former is a policy-based system that automatically collects, organizes, stores and retrieves e-mail messages and attachments. The latter periodically dips into the mail stream and samples e-mails according to their content, in accordance with SEC and NASD rules.

One of the challenges presented by these regulations, says Gartner's DiCenzo, is the stipulation that organizations have "the information in an active archive. You must be able to recover it in hours, not days or weeks. This is why the push to an intermediate disk archive has become popular. It's not just a secondary storage media; it's a program that allows for access of data directly to the archive in a query kind of way."

That's exactly what Abel/Noser has implemented. "Basically, an additional mailbox is created on the Exchange server, and for every single message that comes in, a copy of it is also forwarded to that mailbox," says Farley. "From there, it goes into a Windows message queue on the Xtender server. And from there, it's put into container files, put into a SQL database, random samples are made for the compliance side of it, and then when the container file gets large enough, it's burnt onto the optical disk." In addition, says Farley, "We run complete backups every day onto LTO tape off our file server, which holds all the e-mail files, all the PSTs."
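
The flow Farley describes (journal copy, queue, container file, database entry, random sample, burn when full) can be sketched in a few lines. This is a simplified model of the stages he lists, not Legato's actual design or API; every name below is hypothetical, and the size limit and sample rate are placeholder values.

```python
import random
from collections import deque


class ArchivePipeline:
    """Toy model: journaled copies flow through a queue into sealed containers."""

    def __init__(self, container_limit_bytes, sample_rate, seed=None):
        self.queue = deque()      # stands in for the message queue
        self.open_container = []  # messages not yet written to write-once media
        self.open_bytes = 0
        self.sealed = []          # containers already "burnt" (immutable tuples)
        self.index = {}           # message id -> container number (database role)
        self.samples = []         # message ids picked for compliance review
        self.limit = container_limit_bytes
        self.sample_rate = sample_rate
        self._rng = random.Random(seed)

    def journal(self, message_id, body):
        """Receive the copy that the mail server forwards for every message."""
        self.queue.append((message_id, body))

    def drain(self):
        """Move queued messages into containers, indexing and sampling as we go."""
        while self.queue:
            message_id, body = self.queue.popleft()
            self.open_container.append((message_id, body))
            self.open_bytes += len(body)
            # The open container becomes number len(self.sealed) once sealed.
            self.index[message_id] = len(self.sealed)
            if self._rng.random() < self.sample_rate:
                self.samples.append(message_id)
            if self.open_bytes >= self.limit:
                self._seal()

    def _seal(self):
        """Freeze the open container, as when it is burnt to optical disk."""
        self.sealed.append(tuple(self.open_container))
        self.open_container = []
        self.open_bytes = 0


# Hypothetical usage with tiny placeholder values.
pipeline = ArchivePipeline(container_limit_bytes=10, sample_rate=0.5, seed=1)
for n in range(5):
    pipeline.journal(f"msg-{n}", b"hello!")
pipeline.drain()
print(len(pipeline.sealed), "sealed;", len(pipeline.open_container), "still on disk")
```

Note that messages in the open container have not yet reached write-once media, which is why the window before the burn needs its own safeguards.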

Legato's software has several de facto safeguards against tampering during the few weeks that e-mail sits on magnetic disk before it is burnt to optical disk. "The SQL logs will show what stuff was sampled," says Farley. "And if anyone looked through our samples, they would be able to see chunks of dates missing if anything was tampered with. The program goes in between the SQL database and the mail server, and there's no way to interface with it directly, at least no way that would be conceivable with Legato's format."
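
The "chunks of dates missing" check Farley mentions amounts to scanning for gaps in a date sequence. The sketch below shows one way to do that; it is an assumption about how such a review might be automated, not a feature of Legato's software, and `missing_date_ranges` is a hypothetical helper.

```python
from datetime import date, timedelta


def missing_date_ranges(archived_dates, start, end):
    """Return (first_missing, last_missing) runs of days with no archived mail."""
    present = set(archived_dates)
    gaps = []
    run_start = None
    day = start
    while day <= end:
        if day not in present:
            if run_start is None:
                run_start = day  # a new gap begins here
        elif run_start is not None:
            gaps.append((run_start, day - timedelta(days=1)))
            run_start = None
        day += timedelta(days=1)
    if run_start is not None:
        gaps.append((run_start, end))  # gap runs to the end of the window
    return gaps


# Hypothetical usage: mail archived on three days of a six-day window.
seen = [date(2003, 1, 1), date(2003, 1, 2), date(2003, 1, 5)]
for first, last in missing_date_ranges(seen, date(2003, 1, 1), date(2003, 1, 6)):
    print("no archived mail from", first, "to", last)
```

A gap is a prompt to investigate rather than proof of tampering, since weekends and holidays can legitimately produce quiet days.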

Farley finds that retrieving messages using Xtender is easy--both a Web-based query tool and an Outlook plug-in are provided--and that he can retrieve an e-mail from the WORM archive in less than a minute. On the other hand, Abel/Noser has only a year's worth of e-mails on the system, and he's not sure how it will scale over time. "It's possible we might have to start cycling WORM disks out once we maintain [them] three years," he says. He admits that cycling disks would make the restore procedure "ridiculous," but he also thinks it's pretty unlikely that he would need to restore an e-mail more than a year old.

Although Farley has been happy overall with the system, his experience hasn't been problem free. Recently, the system experienced a hard drive problem that required a complete restore. "The only thing that I don't like about Xtender is, when problems have come up, the support hasn't been what I've wanted it to be," says Farley. "There are basically three components to a good product: how easy it is to use, how well it takes care of itself and what happens when it breaks. This product is easy to use and fairly stable, but the times I've had problems with it, I've found myself disappointed in some of the support."

The scourge of spam
Not even the most solicitous vendor support can stop the vexing nuisance that is spam. An accurate estimate of the percentage of spam among business e-mails is hard to come by--some say it's as high as 50%--but everyone agrees that the scourge of junk e-mail continues to rise, with no end in sight. In an era of scarce IT resources, it's infuriating to think that as much as half of the cost of an e-mail infrastructure is being eaten by parasites.

At a San Francisco public relations firm that uses Lotus Domino, an IT director who prefers to remain anonymous (I'll call him "George") confirms that, by the end of 2002, spam had begun to hover at the 50% level for many of his 100 or so users. In a PR company where reps bill by the hour, time spent sifting through junk e-mail is just "cash out the window," says George. And not only were local personnel and system resources being squandered, but so were backup time and space at an off-site co-lo site, where the company nightly uploads a snapshot of the entire network.

Is an E-mail SAN Right for You?
Most people who want high availability will implement a storage area network (SAN) of some sort. But according to Bill Huber, CTO of SAN hardware company StoneFly, e-mail seldom makes the grade. "Fibre Channel costs too much. You'd use it for SQL, but for e-mail? It's kind of mission-critical--but do I really want to spend $150,000 to get started?"

Instead, Huber and others advocate deploying an IP-based SAN. If you want to keep e-mail online and available, ultimately you don't have that many attractive alternatives. You can simply add storage to the e-mail server as messages grow, but at a certain point, you'll need to upgrade the server--or add an external storage array. Eventually, you'll end up with multiple servers and multiple arrays, which is not only tough to manage but also wasteful, because the storage isn't shared.

Deploying an IP-based e-mail SAN gives e-mail what it craves: more-or-less infinite storage space. And it can be done relatively cheaply.

Huber suggests that you might not even need fancy iSCSI acceleration hardware to enjoy good performance. "In my experience, Exchange is really good at cache management. Its disk I/O load on the back end is cached and buffered and you can run quite a lot of users--I've run benchmarks with 2,000 simulated users--and the I/O load is really pretty light."

Huber says that most people just set up a separate switch to create a modest, dedicated e-mail SAN, rather than tapping into an existing Ethernet network. But even if you were to do that, security wouldn't be much of an issue. "Ethernet is point to point and addressed, so it's not like hubs, where the traffic is just leaking out into the ether," he says.

Like everything else, the cost of hardware related to SANs is headed south. E-mail is worth putting on its own SAN, but only if you take advantage of the plummeting prices to put one together on the cheap.

Revulsion reached a new level when George discovered that spammers were simply locking onto his domain and blasting away, on several occasions nailing his server with thousands of e-mails per day from one spammer. "The tipping point," he says, "came one morning when I was deleting maybe 600 or 700 dead e-mails from the server that had arrived since last night." On further investigation, he says, he discovered that people who had been on the staff for a while and had older e-mail addresses were getting approximately 75 spam e-mails per day.

Yet a quandary presented itself: How do you deploy an effective spam filter for a business that relies so heavily on open communication? Thanks to a tip from a contractor who helps maintain George's servers, he discovered Purepath, an antispam product for Lotus mail servers from Canadian startup Team Technologies. George installed the product on his server just last March.

"Purepath has been great for us because it doesn't use just one technology. It's primarily blacklist-based, but allows you to select any number of blacklists you want," he says. You can even block entire countries: "I turned off North and South Korea," he says, "just because we do no business with those countries, and there's a huge percentage of spam that comes out of them." But it's the various rules and heuristics that you can add on top of blacklists that make Purepath "incredibly smart," says George.

The problem, of course, is the risk of it being too protective and blocking e-mails that should get through. "I've been cautious about implementing things as we go along," says George, who says Purepath's options are "tricked out" to the degree that along with merely bouncing obvious spam, you can review suspect e-mails and sort them into various folders according to specific criteria. But that's exactly what he doesn't want to do--spin cycles reviewing spam. So George has dialed back the rules. As the system is configured now, about five spam e-mails per day get through to each user and, as far as he can determine, there are no false positives.
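
The layered approach George describes (blacklists first, country blocks next, then rules that either bounce a message or set it aside for review) can be sketched as follows. This is not Purepath's logic or API; the `Message` fields, the thresholds and the sample rules are all hypothetical, and the sender's country is assumed to have been resolved by some upstream lookup.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender_ip: str
    sender_country: str  # two-letter code, assumed resolved upstream
    subject: str


def classify(msg, blacklists, blocked_countries, rules,
             review_threshold=1, bounce_threshold=2):
    """Return 'bounce', 'review' or 'deliver' for one message."""
    # Hard blocks: any blacklist hit or a blocked country bounces outright.
    if any(msg.sender_ip in blacklist for blacklist in blacklists):
        return "bounce"
    if msg.sender_country in blocked_countries:
        return "bounce"
    # Soft rules: each rule that fires adds one point to the score.
    score = sum(1 for rule in rules if rule(msg))
    if score >= bounce_threshold:
        return "bounce"
    if score >= review_threshold:
        return "review"
    return "deliver"


# Hypothetical rules and lists, for illustration only.
rules = [
    lambda m: m.subject.isupper(),
    lambda m: "free" in m.subject.lower(),
]
blacklists = [{"192.0.2.10"}, {"198.51.100.7"}]
msg = Message("203.0.113.5", "US", "Free lunch on Friday")
print(classify(msg, blacklists, {"KP", "KR"}, rules))
```

Dialing back the rules, as George did, corresponds here to passing fewer rules or raising `review_threshold`, which trades a few more delivered spam messages for less time spent in review folders.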

George is first to admit that no solution is perfect, including his own. "Spam may escalate to the point where the way we use e-mail might be completely different," he says, "because of all the different technologies that are out there to combat spam." Not long from now, George speculates, something like a return receipt may have to become part of standard e-mail procedure. "People just assume today that when they press 'send,' their e-mail will miraculously appear at its destination. We may not be able to take it for granted anymore." Reports George has generated since deploying Purepath bear witness to a truly alarming spam crisis: a fourfold increase in the number of junk e-mails trying to penetrate his defenses in only three months.

The content is in the mail
There's the junk--and then there's the stuff with hidden value. Studies show that much of the average enterprise's intellectual capital is bound up in e-mail messages, though attempts to quantify e-mail's importance amount to pure speculation. The point is this: Everybody hangs on to e-mails containing unique information that is critical to their job and almost never shared with others.

"It's unstructured data that really hasn't been tapped into," says Joe Fisher, director of product management for Tumbleweed Communications, a Redwood City, CA, e-mail management software company. "If you think about it, a lot of the communications are happening between the organization and its customer base. I mean, here you have communication and conversations with your most valuable asset, your customers. And there might be some really good nuggets in there, whether it's for figuring out how to tailor marketing programs or whether it's how to improve support programs or add new products. You can let other systems around e-mail query the archive and pull information out, whether it's for business process automation or for streamlining marketing programs."

For companies struggling to make nightly e-mail backups more efficient, meta-tagging e-mails and hooking them into a content or document management system sounds pretty far-fetched. It requires the creation of a reasonably high-performance archival repository where old e-mails never die, and it requires you to meta-tag messages based on their content, using software such as Tumbleweed's Messaging Management System or TrueArc, which was acquired late last year by content management software vendor Documentum. Another example: Lotus Domino e-mails archived with IBM's CommonStore can be accessed via IBM's Content Manager software.

Now that e-mail has scared everyone with its potential liability, it's only a matter of time before added investment in e-mail management builds business-side pressure for IT to squeeze more value out of messages. Mike Gundling, VP of product management for e-mail management software vendor iLumin, thinks that giving e-mail that kind of status is a way of recognizing what's already happening. Says Gundling: "Everything is flying around in e-mail."

