This article can also be found in the Premium Editorial Download "Storage magazine: Email storage lessons learned from Citigroup."
Best practices for e-mail storage
Tuning e-mail storage
E-mail is one of the fastest growing applications in terms of storage capacity. With Sarbanes-Oxley and other regulations, the problems of managing e-mail have increased. (See "Regulations squeeze storage.") Because of the requirements for data retention, e-mail storage will have two components: primary (online) storage and secondary (archival) storage.
E-mail storage should be tuned for simple capacity scaling and long-term data retention. I/O performance and availability are secondary requirements. Most e-mail messages are accessed within the first few days after they are received and infrequently, if ever, after that. Users may search their old messages for specific information, such as another person's e-mail address, but most old messages are never opened again after several days.
This is partly because e-mail replies and discussion threads often copy the text of a previous message into subsequent messages. E-mail storage can be described as a write-infrequently, read-rarely facility. It's lightly used, although that might not be obvious given the capacity problems it causes. The characteristics of archived e-mails are different: the archive must be designed to last for many years and, in some cases, to prove that e-mails haven't changed.
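One common way to show that an archived e-mail hasn't changed is to record a content digest when the message is archived and recompute it later. The following is a minimal sketch, not a description of any particular archive product. It assumes SHA-256 is an acceptable fingerprint and that the digest is kept separately from the message. The function names `fingerprint` and `is_unchanged` and the sample message are hypothetical.

```python
import hashlib
import hmac


def fingerprint(message_bytes: bytes) -> str:
    """Return a hex SHA-256 digest of the raw message, recorded at archive time."""
    return hashlib.sha256(message_bytes).hexdigest()


def is_unchanged(message_bytes: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one recorded at archive time."""
    return hmac.compare_digest(fingerprint(message_bytes), recorded_digest)


# Hypothetical sample message, used only for illustration.
original = b"From: alice@example.com\r\nSubject: Q3 numbers\r\n\r\nSee attached."
digest = fingerprint(original)

print(is_unchanged(original, digest))               # True
print(is_unchanged(original + b" edited", digest))  # False
```

A digest only detects changes. It does not prevent them, so a real archive would likely pair it with write-once media or a similar control.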
First and foremost, e-mail primary storage should be cheap but scalable. Disk drives running in server cabinets are not the least bit scalable, although this is probably the most common type of e-mail storage used today. Virtualization in storage area networks (SANs) is a much better option. Virtualization's ability to change storage capacity on demand would dramatically lower the cost of primary e-mail storage.
E-mail iSCSI SAN
Another way to lower the cost of e-mail storage is to use cheaper SAN connection technology for e-mail servers. This sounds like a job for iSCSI, and it is. Of course, this may require creating a new iSCSI SAN or using iSCSI routers to provide iSCSI servers access to Fibre Channel (FC) storage subsystems.
Implement your iSCSI SAN independently of other Ethernet networks--leveraging an existing Ethernet network only invites problems, and there's not much synergy to be had in uniting legacy LANs and iSCSI SANs. Do not bother with TCP offload engines (TOEs) because e-mail servers don't generate enough I/O traffic to justify the additional TOE cost.
If you plan to leverage an existing FC SAN, you will need an iSCSI router. Cisco Systems Inc., Crossroads Systems Inc., FalconStor Software Inc., McData Corp. and Sanrad Ltd. all offer iSCSI storage routers. Some of these products also provide the virtualization function that you need for optimal e-mail storage. Alternatively, you can place virtualization on the FC side of the router using any number of SAN virtualization products. I recommend the FC side so that other servers in the SAN can use the virtualization system as well. If implemented successfully, virtualization will increase your storage capacity levels more than any other storage technology on the market today.
Additionally, you should oversubscribe, or multiplex, server connections for e-mail servers. A single 2Gb storage subsystem port should be able to accommodate between 15 and 20 e-mail servers, and you might find that this number can go even higher. A fast LAN connection is necessary to migrate data transparently between storage subsystems.
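The arithmetic behind the 15-to-20 figure can be sketched as follows. This is a rough illustration only. It assumes a 2Gb FC port delivers a nominal 200 MB/sec of payload, a commonly quoted figure that the article does not state, and it considers the worst case in which every server is busy at once. The names `PORT_MB_PER_S` and `per_server_share` are hypothetical.

```python
# Assumed nominal payload rate of a single 2Gb FC port, in MB/sec.
PORT_MB_PER_S = 200.0


def per_server_share(port_mb_per_s: float, servers: int) -> float:
    """Bandwidth each server gets if all servers are busy at the same moment."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return port_mb_per_s / servers


for n in (15, 20):
    share = per_server_share(PORT_MB_PER_S, n)
    print(f"{n} servers -> {share:.1f} MB/sec each in the worst case")
# 15 servers -> 13.3 MB/sec each in the worst case
# 20 servers -> 10.0 MB/sec each in the worst case
```

Because e-mail servers are lightly used and rarely drive peak I/O simultaneously, the worst-case share understates what each server would typically see, which is why a higher ratio may be workable.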
This was first published in July 2004