Home > Storage Magazine > Features > Creating a large e-mail system
EMAIL THIS LICENSING & REPRINTS
Storage Magazine

  CURRENT ISSUE  

  FEATURES  

  TOOLS, TRENDS & ANALYSIS  

  COLUMNS  

  ARCHIVES  

  SUBSCRIBE/RENEW  
 

Creating a large e-mail system
by Jim Booth
Issue: Jul 2003
printer-friendly
licensing & reprints
< PREV PAGE   |   1  |   2  |   NEXT PAGE  >
A few years ago, I helped an ISP redesign its e-mail storage to strengthen the application's availability, scalability and recovery. Many of the steps we took were to transform a monolithic storage design into a more flexible and scalable system, one that provided higher levels of availability and performance that may be transferable to other large e-mail storage environments.

At the time, the ISP was growing rapidly, adding nearly 10,000 new accounts a day. To keep the system running, the staff was constantly adding more hardware and servers to handle the increase in database traffic and more storage. It was obvious the current architecture would soon hit a wall as to the number of users it could service. The redesign needed to address these three critical areas:

Lessons learned
Although most people aren't creating an ISP-size mail system, the e-mail project's goals and design considerations may overlap with many companies' plans for their next storage system.
Plan for explosive growth: The old monolithic e-mail quickly became a nightmare to manage on a day-to-day basis. When designing a new storage system, plan for more growth than you expect.
Divide the implementation into manageable chunks: By breaking the rollout of the e-mail system into manageable phases, we were able to keep the user community and our client happy.

Availability. This problem was the most significant of the three. E-mail couldn't go down for any reason. Nor could an e-mail message be corrupted without doing harm to the ISP's reputation. The new design needed database failover, dynamic multipathing and multiple mirror sets.

Scalability. Scalability was the greatest challenge, because the system was servicing thousands of users. The servers, storage and software had to be staged, cut over and reclaimed, and this process couldn't affect the users.

Recovery. Management also wanted to add new features that would allow fast recovery of a corrupted database and off-site archival of low-usage accounts. This required multiple mirrors, scripts to trigger synchronization points and a large common repository that could be mirrored remotely for the highest level of recovery.

Monolithic approach
The e-mail system stores all security and account information for the individual e-mail users in a database, and all messages in a file systems. Originally, database and file system data were on the same physical drives, channels and subsystems without regard for the distinct performance characteristics of each data type. The storage subsystem was trashing between random, small block and sequential, large block data.

The server layout wasn't optimized for performance, either. Large servers handled the massive amount of database traffic while also streaming message objects to users. Each Sun 6000 server ran the e-mail application and an Oracle database. While these machines have decent I/O and expandability, the database was using a lot of system resources. Message streaming lowered the CPU usage, but consumed the majority of I/O bandwidth. At first glance, this doesn't seem like a bad solution. The two applications are using different resources within the same server. Oracle was using mainly CPU and memory, and the e-mail application was using only the I/O subsystem.

A database server is typically much more complex and finely tuned than a file server. Add the fact that both are tuned differently, and you begin to have problems. The combination of these two applications had availability implications. A slight problem that causes a sensitive database to hang wouldn't have affected a simple file server. In addition, the scaling of a large server required either buying more large servers or consolidating on an even larger server. The solution: Decouple the database and file system servers. This was also done at the data level, which I'll discuss next.

As mentioned earlier, the e-mail system was composed of both message files and database objects. The original design treated these two distinct data types as equals. This caused all sorts of I/O problems. The channel utilization for the storage subsystem was low because of the mix of block sizes. The mixture of random and sequential data caused trashing within the cache with minimal reuse. Physical recovery was difficult because it was necessary to restore the full volume to another disk and then pull off the needed data. The only logical step was to isolate the database and message file data so they could be tuned and matched with the resources needed to provide an optimum solution. This sounds good in theory, but how could we redesign the entire e-mail system while still servicing thousands of users with no downtime?

Modular approach
As we moved from a monolithic to modular architecture, the e-mail system redesign took several phases to complete. In the first phase, the database and file system servers were decoupled to make better use of the available resources. The 6000-class Suns were replaced with 4000-class systems and 200-class systems were used for file servers. (The 6000s were redeployed for another project within the company.) In the second phase, we built new storage modules. The next phase migrated users off the hardware to the new modules, which were built from the decommissioned hardware; so the only new hardware needed was for the initial module. Users were migrated when there was the smallest amount of e-mail traffic. It's important to note the features that were added to the redesigned system to increase the e-mail system's availability: failover and dynamic multipathing.

The failover strategy was an elegant solution consisting of a dual server architecture and implementation of an Oracle standby database. In the event of a problem with the active server, the failover software would cut over to the stable server and use the standby database. The standby database was kept synchronized by updating the standby server's logs on a periodic basis. This made maintenance and upgrades non-disruptive.

Dynamic multipathing also improved channel utilization, thus enhancing performance. A total of eight paths were used in this implementation with four dedicated to each server (see "Final e-mail system" on this page). The multipathing allowed for host bus adapter (HBA) failures without affecting the database environment. Dynamic multipathing was also implemented on the file servers, primarily for performance.
< PREV PAGE   |   1  |   2  |   NEXT PAGE  >





TechTarget Storage Media
Storage Magazine View this month\\'s issue and subscribe today.
Storage Decisions Apply online for free conference admission.
SearchStorage.com
HomeNewsMagazineTopicsLearningMultimediaWhite PapersBlogsEventsAbout Us

About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of technology-specific Web sites, events and magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Reprints  |  Site Map




All Rights Reserved, Copyright 2000 - 2008, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts