A few years ago, I helped an ISP redesign its e-mail storage to improve the application's availability, scalability and recovery. Most of the steps we took transformed a monolithic storage design into a more flexible, scalable system with higher levels of availability and performance, and the approach may be transferable to other large e-mail storage environments.
At the time, the ISP was growing rapidly, adding nearly 10,000 new accounts a day. To keep the system running, the staff was constantly adding servers and storage to handle the increase in database traffic and stored mail. It was obvious the current architecture would soon hit a wall in the number of users it could service. The redesign needed to address these three critical areas:
Availability. This problem was the most significant of the three. E-mail couldn't go down for any reason. Nor could an e-mail message be corrupted without doing harm to the ISP's reputation. The new design needed database failover, dynamic multipathing and multiple mirror sets.
Scalability. Scalability was the greatest challenge, because the system was servicing thousands of users. The servers, storage and software had to be staged, cut over and reclaimed, and this process couldn't affect the users.
Recovery. Management also wanted to add new features that would allow fast recovery of a corrupted database and off-site archival of low-usage accounts. This required multiple mirrors, scripts to trigger synchronization points and a large common repository that could be mirrored remotely for the highest level of recovery.
The e-mail system stores all security and account information for the individual e-mail users in a database, and all messages in a file system. Originally, database and file system data were on the same physical drives, channels and subsystems without regard for the distinct performance characteristics of each data type. The storage subsystem was thrashing between random, small-block and sequential, large-block data.
The server layout wasn't optimized for performance, either. Large servers handled the massive amount of database traffic while also streaming message objects to users. Each Sun 6000 server ran the e-mail application and an Oracle database. While these machines have decent I/O and expandability, the database was using a lot of system resources. Message streaming used little CPU, but consumed the majority of the I/O bandwidth. At first glance, this doesn't seem like a bad solution: the two applications were using different resources within the same server. Oracle was using mainly CPU and memory, and the e-mail application was using mainly the I/O subsystem.
A database server is typically much more complex and finely tuned than a file server. Add the fact that both are tuned differently, and you begin to have problems. The combination of these two applications had availability implications. A slight problem that caused a sensitive database to hang wouldn't have affected a simple file server, but with both on one machine it could interrupt message delivery as well. In addition, scaling a large server required either buying more large servers or consolidating on an even larger server. The solution: Decouple the database and file system servers. This was also done at the data level, which I'll discuss next.
As mentioned earlier, the e-mail system was composed of both message files and database objects. The original design treated these two distinct data types as equals. This caused all sorts of I/O problems. The channel utilization for the storage subsystem was low because of the mix of block sizes. The mixture of random and sequential data caused thrashing within the cache, with minimal reuse of cached data. Physical recovery was difficult because it was necessary to restore the full volume to another disk and then pull off the needed data. The only logical step was to isolate the database and message file data so they could be tuned and matched with the resources needed to provide an optimum solution. This sounds good in theory, but how could we redesign the entire e-mail system while still servicing thousands of users with no downtime?
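To make the cache problem concrete, here is a minimal Python sketch of a least-recently-used (LRU) cache serving a small set of frequently reread database blocks, with and without a stream of never-reread message blocks passing through the same cache. The cache size, block counts and access ratio are assumptions chosen for illustration, not measurements from the ISP's system, and a real array cache is more sophisticated than plain LRU.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that only tracks which block IDs are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block_id):
        """Touch a block; return True on a cache hit, False on a miss."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return True
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return False


def database_hits(cache, rounds, hot_blocks, messages_per_db_read):
    """Count database cache hits while message blocks stream through.

    Hypothetical workload: each round rereads one of `hot_blocks` database
    blocks, then streams `messages_per_db_read` message blocks that are
    never read again.
    """
    hits = 0
    next_message = 0
    for i in range(rounds):
        if cache.access(("db", i % hot_blocks)):
            hits += 1
        for _ in range(messages_per_db_read):
            cache.access(("msg", next_message))
            next_message += 1
    return hits


# Same cache size and database workload; only the message stream differs.
mixed_hits = database_hits(LRUCache(10), rounds=80, hot_blocks=8,
                           messages_per_db_read=3)
isolated_hits = database_hits(LRUCache(10), rounds=80, hot_blocks=8,
                              messages_per_db_read=0)
print(mixed_hits, isolated_hits)
```

In this toy workload the streamed message blocks push every database block out of the shared cache before it is reread, while the isolated cache misses only on the first pass over the hot blocks.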
As we moved from a monolithic to a modular architecture, the e-mail system redesign took several phases to complete. In the first phase, the database and file system servers were decoupled to make better use of the available resources. The 6000-class Suns were replaced with 4000-class systems and 200-class systems were used for file servers. (The 6000s were redeployed for another project within the company.) In the second phase, we built new storage modules. The next phase migrated users off the old hardware to the new modules, which were built from the decommissioned hardware, so the only new hardware needed was for the initial module. Users were migrated during the periods of lowest e-mail traffic. Two features were added to the redesigned system to increase the e-mail system's availability: failover and dynamic multipathing.
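The rolling migration described above, where each decommissioned unit becomes the next module, can be sketched as a simple plan generator. This is an illustrative Python sketch only: the unit and module names are hypothetical, and it assumes each old unit's users fit on exactly one module, which the article does not state.

```python
def plan_rolling_migration(old_units):
    """Plan a migration in which only the first module is new hardware.

    Returns (steps, purchased, spare): steps is a list of
    (source_unit, target_module) pairs in order, purchased lists the
    modules bought new, and spare is the module left over at the end.
    """
    purchased = ["module-0"]  # the initial module is the only purchase
    free_module = purchased[0]
    steps = []
    for number, old_unit in enumerate(old_units, start=1):
        # Move users off the old unit onto the currently free module.
        steps.append((old_unit, free_module))
        # The emptied unit is rebuilt and becomes the next free module.
        free_module = f"module-{number} (rebuilt from {old_unit})"
    return steps, purchased, free_module


steps, purchased, spare = plan_rolling_migration(["old-a", "old-b", "old-c"])
for source, target in steps:
    print(f"migrate users from {source} to {target}")
```

Under these assumptions the plan always ends with one rebuilt module left free, which could serve as headroom for growth.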
The failover strategy was an elegant solution consisting of a dual server architecture and implementation of an Oracle standby database. In the event of a problem with the active server, the failover software would cut over to the stable server and use the standby database. The standby database was kept synchronized by updating the standby server's logs on a periodic basis. This made maintenance and upgrades non-disruptive.
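The standby arrangement can be modeled in a few lines of Python. This is a simplified sketch, not Oracle's mechanism or the failover product's API: the class and method names are hypothetical, log records are plain key/value pairs, and the model assumes every archived log is still readable at failover time, which a real outage may not guarantee.

```python
class Database:
    """Stand-in for a database: a name plus a dictionary of rows."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def apply(self, record):
        key, value = record
        self.rows[key] = value


class FailoverPair:
    """Active database plus a standby kept current by periodic log apply."""

    def __init__(self):
        self.active = Database("primary")
        self.standby = Database("standby")
        self.archived_logs = []  # every change made on the active side
        self.applied = 0         # how many log records the standby has

    def write(self, key, value):
        record = (key, value)
        self.active.apply(record)
        self.archived_logs.append(record)

    def sync_standby(self):
        """Periodic job: apply log records the standby has not seen yet."""
        for record in self.archived_logs[self.applied:]:
            self.standby.apply(record)
        self.applied = len(self.archived_logs)

    def fail_over(self):
        """Catch the standby up, then swap roles and return the new active."""
        self.sync_standby()
        self.active, self.standby = self.standby, self.active
        return self.active


pair = FailoverPair()
pair.write("alice", "mailbox-17")
pair.sync_standby()               # periodic synchronization point
pair.write("bob", "mailbox-42")   # not yet applied on the standby
lagging = "bob" not in pair.standby.rows
new_active = pair.fail_over()
print(new_active.name, sorted(new_active.rows))
```

The gap between synchronization points is the window the standby must close at cutover, which is why the interval for the periodic log update matters.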
Dynamic multipathing also improved channel utilization, thus enhancing performance. A total of eight paths were used in this implementation with four dedicated to each server (see "Final e-mail system" on this page). The multipathing allowed for host bus adapter (HBA) failures without affecting the database environment. Dynamic multipathing was also implemented on the file servers, primarily for performance.
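The two roles multipathing plays here, spreading I/O across channels and surviving an HBA failure, can be illustrated with a small round-robin path selector. This is a sketch in Python under stated assumptions: the path names are hypothetical, round-robin is only one possible balancing policy, and a real multipathing driver detects failures itself rather than being told about them.

```python
class MultipathDevice:
    """Round-robin selector over several paths to the same storage."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._next = 0

    def fail_path(self, path):
        self.failed.add(path)

    def restore_path(self, path):
        self.failed.discard(path)

    def select_path(self):
        """Return the next healthy path, skipping any that have failed."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise RuntimeError("no healthy paths to storage")


# Four paths for one server, matching the article's four-per-server layout.
device = MultipathDevice(["hba0", "hba1", "hba2", "hba3"])
before = [device.select_path() for _ in range(4)]
device.fail_path("hba1")  # simulate an HBA failure
after = [device.select_path() for _ in range(6)]
print(before, after)
```

After the simulated failure, I/O keeps flowing over the three remaining paths, so the database sees reduced bandwidth rather than an outage.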
This was first published in July 2003