Published: 13 Oct 2002
Hyperbole-loving analysts are fond of talking about how we're generating more data now than ever before. They say there will be more data generated in the next two years than in all of human history. And while exact numbers are hazy, anybody dealing with storage knows that corporate e-mail is fueling the data explosion.
Once a novelty, e-mail has become the backbone of modern business. Research firm IDC estimates corporate e-mail volumes have increased 29% annually, from 9.7 billion per day in 2000 to 16.2 billion this year and 20.9 billion daily messages in 2003. That's 7.6 trillion e-mails next year.
Do you know where you're going to store your share? More importantly, do you know how to keep them all available so users can still access them, auditors and lawyers can pore over them and executives can intercept them if necessary?
Of course, the problem of e-mail management is much different from simply backing up files and storing a tape on a shelf. Today's enterprise depends on that information being available, and as users depend more and more on their e-mail, the volume of information that needs to be available is exploding.
A recent survey by Osterman Research, Black Diamond, WA, found the average Microsoft Exchange user's mailbox consumed 72MB of disk space. The survey found one in six users have a mailbox larger than 100MB, and the median number of e-mails sent and received is 53. Accommodating this growth, which will only get worse, has placed a significant burden on storage administrators.
E-mail management programs
Finally recognizing the beast that e-mail has become, a number of storage vendors have recently released products specifically designed to reduce e-mail storage requirements while automating the archiving, aging and retrieval of e-mails.
StorageTek, Louisville, CO, cast its lot in March with the release of Email Xcelerator, which integrates with Microsoft Exchange and Lotus Notes servers to provide policy-based e-mail management. The suite incorporates Email Archive Manager, which intercepts all sent and received e-mails and sends copies onto near-line backup; Email Content Manager, which automatically indexes those e-mails and lets users later search through the message store when they need to; and Application Storage Manager, which moves archived e-mails between different storage types - for example, from slow IDE near-line storage to tape - and deletes them according to user-defined rules.
|Total e-mail messages sent and received per user, per day|
Bill Tolson, a senior industry solutions consultant with StorageTek, blames users' packrat mentality for the need to better manage e-mail requirements. "The vast majority of users are using their e-mail systems as filing systems," he says. "Most e-mail systems are already overloaded, and in most companies storage administrators will fix this by imposing a limit on users' e-mail boxes. But by imposing a mailbox limit, you're causing end users to make decisions as to what they delete."
One alternative to imposing limits, though, has been to give users larger size limits - or no limits at all - on their e-mail inboxes. But lack of size limits causes its own problems, as Ontario, Canada-based Dofasco found out. The company, Canada's second largest steel manufacturer, found that its 7,000 e-mail users quickly consumed most of the approximately 400GB of space spread across the four post office servers supporting its Exchange 5.5 environment.
"We were out of space on the Exchange servers and either had to buy more servers or ask people to delete any old mail," says Terry Chisholm, Dofasco's divisional account representative for information systems. "We didn't get a lot of response from that, and most people claimed they needed to keep their mail. They were using Exchange as a file store, but we were over 90% full on our servers." Dofasco eventually implemented Ixos-eCONserver for Microsoft Exchange from Ixos Software AG, of Munich, Germany. eCONserver helped Dofasco set up an e-mail management policy in which messages were kept online for six months, then moved onto a Hewlett-Packard write once read many (WORM) jukebox.
The jukebox holds 56 WORM disks, and each disk stores around 9GB of data in a format suitable for legal discovery. The solution reduced Dofasco's storage requirements by 40%, and the jukebox now stores more than three years' worth of messages. New employees are given a 50MB space limit, and existing users are given around 100MB and the offer to build .PST files containing any messages over that limit.
Once the jukebox fills up - in around two more years, Chisholm estimates - WORM disks are cycled through the jukebox. If messages are needed that are more than five years old, old disks can be loaded into the jukebox.
The dangers of failing to maintain out-of-date information become clear when lawsuits require extensive discovery of archived e-mails. One such case, Linnen v. A.H. Robins Co., Inc., cost the Massachusetts-based company more than $1.1 million to search 823 backup tapes for e-mails related to 15 employees in question. The increased scrutiny on corporate accountability - particularly due to the requirements of the SEC's new Rule 17a-4 - has heightened the chance of having to manage a mass e-mail recovery.
Using automated e-mail management systems, recovery can be effected by using a Web interface to search through indexed archives and deliver results - and messages - instantly. Such indexes are maintained regardless of where messages end up, allowing storage administrators to move old messages onto cheaper, slower storage.
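To see why an index can outlive a move between storage tiers, consider a minimal sketch. It is not taken from any product mentioned here: the class name, method names and tier labels are all hypothetical. The point is simply that the keyword index and the record of where a message lives are kept separately, so relocating a message never touches the index.

```python
from collections import defaultdict


class ArchiveIndex:
    """Toy keyword index whose entries survive storage-tier moves (hypothetical)."""

    def __init__(self):
        self._terms = defaultdict(set)   # word -> set of message ids
        self._location = {}              # message id -> current storage tier

    def add(self, msg_id, text, tier):
        """Index every word of the message and record where it is stored."""
        for word in text.lower().split():
            self._terms[word].add(msg_id)
        self._location[msg_id] = tier

    def move(self, msg_id, new_tier):
        """Relocate a message; only the location record changes."""
        self._location[msg_id] = new_tier

    def search(self, word):
        """Return (message id, current tier) pairs for messages containing word."""
        ids = self._terms.get(word.lower(), set())
        return sorted((msg_id, self._location[msg_id]) for msg_id in ids)


index = ArchiveIndex()
index.add("m1", "Quarterly steel forecast attached", tier="disk")
index.add("m2", "Lunch on Friday", tier="disk")
index.move("m1", "tape")          # the message ages off to cheaper storage
print(index.search("forecast"))   # still found, now reported as living on tape
```

A real archive would use a full-text engine rather than whitespace splitting, but the separation of index and location is the same idea.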
The vast majority of e-mails are never needed after 30 days. Using this as a guideline, you can reduce e-mail systems' use of online storage. Captured e-mails, for example, might be initially stored on fast disk arrays for seamless access, and then moved to slower IDE disks after 30 days. After 90 days, they could be moved onto tape, where they're still accessible to users. After the statutory seven years, the data can then be scrubbed using automatic e-mail aging features.
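That schedule can be expressed as a simple age-based rule. The sketch below assumes the 30-day, 90-day and seven-year thresholds from the example above, treats seven years as 7 x 365 days, and moves a message on the day it reaches each threshold. The function and tier names are illustrative only and do not belong to any vendor's product.

```python
from datetime import date

# Thresholds taken from the example schedule in the text.
FAST_DISK_DAYS = 30
IDE_DISK_DAYS = 90
RETENTION_DAYS = 7 * 365   # assumption: seven years, ignoring leap days


def storage_tier(received, today):
    """Return where a message should live, based on its age in days."""
    age_days = (today - received).days
    if age_days >= RETENTION_DAYS:
        return "delete"
    if age_days >= IDE_DISK_DAYS:
        return "tape"
    if age_days >= FAST_DISK_DAYS:
        return "ide_disk"
    return "fast_disk"


today = date(2002, 10, 13)
for received in (date(2002, 10, 1), date(2002, 8, 20),
                 date(2002, 1, 5), date(1995, 6, 1)):
    print(received, "->", storage_tier(received, today))
```

Running a rule like this nightly against the archive would be enough to drive both migration and aging. The thresholds themselves are a policy decision, not a technical one.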
|Reducing storage demand in a customer's e-mail system|
Recognizing the importance of better e-mail management, vendors have been pushing out products specifically targeted at this area. Veritas, Mountain View, CA, offers hierarchical storage management (HSM)-like e-mail archiving through its NetBackup Storage Migrator (NSM) application. NSM, which only works with Exchange, lets storage administrators create policies that manage the retention of e-mails. Messages past a certain age, having certain keywords, containing attachments of a certain size or meeting other criteria are automatically moved off the Exchange server onto nearline or offline storage. On the client end, archived messages are indicated with a special icon, and a user retrieving one is notified that it's being pulled out of the archive.
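A policy of this kind boils down to a set of criteria, any one of which qualifies a message for archiving. The following sketch shows the general shape. It is not Veritas' API: the class names, field names and threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    age_days: int
    subject: str
    attachment_bytes: list = field(default_factory=list)  # one size per attachment


@dataclass
class ArchivePolicy:
    max_age_days: int
    keywords: tuple
    max_attachment_bytes: int

    def should_archive(self, msg):
        """True if the message meets any one of the policy's criteria."""
        subject = msg.subject.lower()
        return (
            msg.age_days > self.max_age_days
            or any(word in subject for word in self.keywords)
            or any(size > self.max_attachment_bytes
                   for size in msg.attachment_bytes)
        )


# Hypothetical thresholds: six months, one keyword, 5MB attachments.
policy = ArchivePolicy(max_age_days=180, keywords=("invoice",),
                       max_attachment_bytes=5_000_000)

recent_note = Message(age_days=3, subject="Lunch on Friday")
big_attachment = Message(age_days=3, subject="Plant photos",
                         attachment_bytes=[8_000_000])
print(policy.should_archive(recent_note), policy.should_archive(big_attachment))
```

Here the recent note stays on the mail server, while the message with the large attachment qualifies for archiving despite being only three days old.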
Veritas' NSM doesn't yet offer features for eliminating the space consumed by redundant attachments, a capability that offers enough space savings that storage administrators should look for it when choosing an e-mail management platform.
Legato Systems, also of Mountain View, CA, spent $403 million to buy OTG Software, Rockville, MD, whose DiskXtender, ApplicationXtender, and EmailXtender products help improve the archiving and management of data between storage media.
DiskXtender monitors usage of files and moves less-used files to nearline or offline storage. ApplicationXtender takes a similar approach for enterprise content, although this method doesn't work for e-mail, since multimegabyte e-mail files are constantly changing. This makes DiskXtender think they're frequently used and therefore leaves them alone.
EmailXtender, however, works at the individual e-mail level to intercept, archive, index and retrieve the messages one by one. More importantly, its Single Instance Storage (SIS) algorithm breaks e-mails into their individual elements, spotting cases where more than one message incorporates the same attachment. One instance of the attachment is stored in the system's database, and the e-mail's reference to that attachment is changed into a pointer that directs users to that single copy.
"The bulk of the storage impact is in e-mail messages and attachments," says Amena Ali, senior vice president of marketing for Legato's Management Solutions Group. "Our customers confirm the 80/20 rule: Generally speaking, 80% of data can be archived because most people don't use it. So we offer the ability to do more effective storage management, with storage driven by policies that are relevant to your applications. Think of it as HSM on steroids."
The effect can be dramatic. When Bob in marketing forwards his 400KB Excel spreadsheet to 15 of his co-workers, those 16 copies of the data would normally consume about 6.4MB of disk space. Using SIS, the same messages are archived with just the one instance of the spreadsheet and pointers to it in the 15 copies. Users don't notice the difference, and total disk consumption is just 400KB. Extrapolate these savings to a real network with thousands of users, and it's clear just how much such systems can help ease the e-mail flood.
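The Bob example can be reproduced in a few lines. The sketch below is one plausible way to implement single-instance storage, keying each attachment by a hash of its contents. That is an assumption: nothing here describes how EmailXtender actually detects duplicates, and the class and method names are hypothetical.

```python
import hashlib


class SingleInstanceStore:
    """Stores each distinct attachment once, keyed by its content hash."""

    def __init__(self):
        self._blobs = {}      # digest -> attachment bytes
        self._messages = {}   # message id -> digest (the "pointer")

    def archive(self, msg_id, attachment):
        digest = hashlib.sha256(attachment).hexdigest()
        self._blobs.setdefault(digest, attachment)   # keep the first copy only
        self._messages[msg_id] = digest

    def attachment_for(self, msg_id):
        """Follow the message's pointer back to the single stored copy."""
        return self._blobs[self._messages[msg_id]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


spreadsheet = b"x" * 400_000          # stand-in for Bob's 400KB spreadsheet
store = SingleInstanceStore()
for n in range(16):                   # Bob's copy plus 15 recipients
    store.archive(f"msg-{n}", spreadsheet)

naive_bytes = 16 * len(spreadsheet)   # what 16 full copies would occupy
print(naive_bytes, store.stored_bytes())
```

Every one of the 16 messages can still retrieve the spreadsheet, but only one copy is held on disk.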
Breaking old habits
Changing the way e-mail is managed has taken time, particularly since e-mail vendors have built their systems on the assumption that the best way to keep pace with e-mail volumes is to add more disks. That approach was becoming a real problem for Peel Children's Aid (PCA), in Ontario, Canada, a statutory body charged with protecting the welfare of children in Canada's million-strong Peel region.
|Average number of e-mails stored|
The organization found its e-mail requirements were exploding after installing a Lotus Notes-based workflow and e-mail system in 1998. While the capabilities of Notes made it invaluable for the organization, Chris Harbour, manager of information services with PCA, says the system's demands on storage space quickly became an issue.
"Because of the service that we're in, we're accountable for keeping track of information," he says. "That was the reason for installing Notes: There were documents all over the place and managing them was difficult. We were looking at making better use of the servers and offloading a lot of that information."
PCA installed OTG's EmailXtender running against the Notes server, with 320GB of online Maxtor disk array available to the system. The setup archives the organization's e-mail, with the indexed database growing at around a gigabyte per month.
Having the e-mails so comprehensively archived, and so immediately accessible, has made life much easier for PCA. "In the past, a funding realignment meant we were going through workers very quickly and generating, deleting and regenerating e-mail databases," says Harbour. "We were losing messages that way. But cases are audited frequently, and the archiving has been a lifesaver a couple of times when we were trying to do an audit and the Notes pointers had lost track of where a thread had gone. The only way we could have found them was by going through the e-mails one by one, but by centralizing we have been able to manage them."
While proprietary products staked out the early market for intelligent e-mail management, mainstream e-mail vendors are coming to the party too. Just as many corporate manufacturing interests are being charged with cleaning up toxic byproducts of their business, e-mail vendors are building better e-mail storage management into coming system upgrades.
IBM subsidiary Lotus Software, of Cambridge, MA, has reworked its message structure, replication protocol and compression techniques to allow records to be managed with a much higher degree of granularity. Called the Streaming Replication and Attachments model, the new technique will be embodied in the upcoming Notes R6, according to Will Raabe, IBM's director of product introduction for Lotus products.
Notes R6 will include Single Copy Template (SCT), an SIS-like feature in which a single copy of each document template will be stored in the Notes database. Documents built using that template will be stored with a pointer to the template, rather than storing the entire template in the document itself. When applied to the existing e-mail databases of IBM test users, says Raabe, SCT immediately trimmed the size of those databases from an average 9MB to around 1.5MB.
Microsoft, which is readying a major upgrade of Exchange for release next year, has refrained from delving too far into e-mail storage management. Although it includes single instance storage features for reducing e-mail overhead and deleted items retention features for automatically deleting messages after a certain time, Microsoft has left more sophisticated HSM-like features to third parties, says Microsoft program manager Ken Ewert.
"I'm seeing that these solutions that [third party] vendors have are being used in very creative ways," he says. "Storage in general is undergoing a great radical change, since it's the one thing that people are finally trying to get a hold of. Single instance storage is very complex ... we haven't seen a lot of pressure until this year in this space."
Far more eager to make changes is SendMail Inc., Emeryville, CA, which offers an inline mail filtering API that lets IT staffers screen messages as they pass through the mail server. SendMail has previously partnered with UK firm ArchiveIt, which offers automated e-mail archiving using SendMail's hooks. But as the need for integrated e-mail management grows, Jeff Morris, product line manager with SendMail, concedes it is likely to become part of the e-mail server itself.
"For now we're doing it as a third party thing, but we are investigating becoming that third party," he says. "We're getting more and more granular about the types of messages we want to pass to storage, and doing things now to enhance control over mail as it's in transit. I'm quite interested in packaging our message server as a device with a single instance message store with digital signatures to ensure the identity of the e-mail's owner."
|Gartner's four steps to e-mail sanity|
Outsourcing e-mail storage
Storage administrators keen to make a total break may want to outsource their archiving to a company such as Iron Mountain, which has expanded its core business in physical records management to include archiving of e-mail and other digital content.
Using secure VPN connections, enterprises keep a connection open to Iron Mountain's massive data center, which includes 34TB of managed online storage and continues to grow every day.
Content sent to Iron Mountain is digitally signed to verify its source, then indexed and locked down within the company's virtual vaults, which are the latest addition to its existing stores of more than 200 million cubic feet of paper records. To access archived records, employees use a standard Web interface to enter their search.
Iron Mountain's largest customer currently sends the company 20,000 e-mails a day, but the service can receive any type of digital document from applications written to an Iron Mountain-provided API.
Peter Delle Donne, president of Iron Mountain's Digital Archives Division, believes digital archiving will become a $200 million business for the company within five years. Part of the appeal is the fact that Iron Mountain's systems are built around policies that ensure records are managed according to established best practices and the requirements of SEC Rule 17a-4. This includes enforcement of deletion policies that track down and destroy every instance of an archived document or e-mail when it's reached a certain age.
That's a big time saver for storage managers, who can redirect their e-mail archiving to Iron Mountain and more or less never have to think about the messages again.
Sensing the potential to be part of a lucrative market, some vendors have tried to simplify storage managers' jobs by offering e-mail archiving appliances designed to be plugged in, installed and left alone.
So far, customers have given such a concept a cool reception. Michael Schutte, chief technology officer and founder of archiving appliance vendor Rising Edge Technologies, Herndon, VA, concedes the company has "not been very successful in selling e-mail archive appliances. Although many of the customers we meet like the idea of keeping e-mail accessible, they don't necessarily want such a system for their business."
Therein lies the challenge facing vendors: Own up to the e-mail monster they've created, offer a workable solution and help storage administrators use the technology to make their own storage requirements far more predictable than they have been in the past. As e-mail continues along its preordained mission to bury corporate networks, it's clear this is the best way to meet archival requirements while keeping your storage sanity.