Published: 01 Apr 2011
The latest version of Exchange Server has some significant changes that will impact the storage supporting the mail system.
By Brien M. Posey
With Exchange Server 2010, Microsoft Corp. made some major changes to the database structure that underlies the email application. These architectural changes have a significant impact on planning for Exchange Server's data storage requirements.
The biggest change Microsoft made was eliminating single-instance storage (SIS). Previously, if a message was sent to multiple recipients, only one copy of the message was stored within the mailbox database. User mailboxes received pointers to the message rather than a copy of the entire message.
The elimination of single-instance storage means that when a message is sent to multiple recipients, each recipient receives a full copy of the message. In terms of capacity planning, the overall impact of this change will vary depending on how many messages include attachments.
Text and HTML-based messages are typically small and will have a minimal impact on capacity planning, and Microsoft further reduces the impact by automatically compressing such messages. However, if you have users who routinely send large attachments to multiple recipients, those messages could have a major impact on database growth. Microsoft's primary goal in designing the new database architecture was to decrease database I/O requirements. As such, Microsoft chose not to compress message attachments because of the additional I/O that would have been required to compress/decompress them.
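To see why attachments dominate the capacity math, here is a back-of-the-envelope sketch in Python comparing one message stored under single-instance storage (as in earlier versions) with the same message in Exchange 2010. The function names, the 0.5 body compression ratio and the example sizes are hypothetical placeholders, not Microsoft figures, and pointer and metadata overhead are ignored.

```python
MB = 1024 * 1024

def sis_bytes(body: int, attachments: int, recipients: int) -> int:
    """Earlier versions with SIS: one stored copy per database; recipients
    get pointers (pointer overhead is ignored in this sketch)."""
    # recipients is deliberately unused: with SIS the count doesn't matter.
    return body + attachments

def exchange2010_bytes(body: int, attachments: int, recipients: int,
                       body_ratio: float = 0.5) -> float:
    """Exchange 2010: a full copy per recipient. The body is shrunk by an
    assumed compression ratio; attachments are stored uncompressed."""
    return recipients * (body * body_ratio + attachments)

# Hypothetical message: 10 KB body, 5 MB attachment, 50 recipients,
# all homed in the same mailbox database.
print(sis_bytes(10 * 1024, 5 * MB, 50) / MB)           # about 5 MB
print(exchange2010_bytes(10 * 1024, 5 * MB, 50) / MB)  # about 250 MB
```

The text-only portion barely registers either way; it is the uncompressed attachment multiplied by the recipient count that drives database growth.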
It may seem odd that, at a time when storage managers are looking to reduce duplication in primary storage, Microsoft would remove a data reduction feature from Exchange. But Microsoft scrapped single-instance storage because Exchange mailbox databases perform much more efficiently without it. Microsoft claims database I/O requirements have been reduced by approximately 70% in Exchange 2010 compared with Exchange 2007.
One of the most common methods of keeping Exchange 2010 mailbox databases from growing too large is to use mailbox quotas. Quotas prevent individual mailboxes from exceeding a predetermined size, and the quotas in Exchange 2010 work as they did in previous versions of Exchange with one notable exception. Exchange 2010 introduces the concept of archive mailboxes (discussed later). If a user has been given an archive mailbox, the mailbox quota won't count the archive mailbox's contents when determining how much storage the user is consuming. Exchange does, however, let you manage archive storage through a separate quota.
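The quota rule described above can be modeled in a few lines. This is a simplified sketch with hypothetical names: real Exchange 2010 mailboxes have several thresholds (for example, the IssueWarningQuota, ProhibitSendQuota and ArchiveQuota settings of the Set-Mailbox cmdlet), which this model collapses into one limit per mailbox.

```python
from dataclasses import dataclass

@dataclass
class Mailbox:
    primary_gb: float        # contents of the user's primary mailbox
    archive_gb: float = 0.0  # contents of the archive mailbox, if any

def exceeds_mailbox_quota(mbx: Mailbox, quota_gb: float) -> bool:
    # Archive contents are not counted against the primary mailbox quota.
    return mbx.primary_gb > quota_gb

def exceeds_archive_quota(mbx: Mailbox, archive_quota_gb: float) -> bool:
    # The archive is governed by its own, separate quota.
    return mbx.archive_gb > archive_quota_gb

# Hypothetical user: 1.8 GB primary, 12 GB archived; 2 GB and 10 GB limits.
user = Mailbox(primary_gb=1.8, archive_gb=12.0)
print(exceeds_mailbox_quota(user, 2.0))   # False
print(exceeds_archive_quota(user, 10.0))  # True
```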
The use of mailbox quotas is a tried-and-true method for limiting data storage consumption. But Microsoft has been encouraging organizations to make use of low-cost storage rather than mailbox quotas. The argument is that organizations can accommodate the increased database size without spending a lot on expensive storage solutions.
The low-cost storage recommendation is based on more than just storage cost. Many organizations have had to set stringent mailbox quotas that force users to delete important messages. Ostensibly, cheaper storage will allow for larger mailbox quotas or for the elimination of quotas altogether.
Previously, using lower-end storage subsystems in production Exchange Server environments was unheard of, but Exchange 2010's reduced I/O requirements make storage options such as SATA drives practical. Exchange Server 2010 is also flexible in terms of the types of storage it can use; it will work with direct-attached storage (DAS) or storage-area network (SAN) storage (or with an iSCSI connection to a storage pool). However, Microsoft doesn't support storing Exchange Server data on any storage device that must be accessed over the network through a mapped drive letter, such as a file share. So you won't be able to store a mailbox database on a network-attached storage (NAS) system unless it supports iSCSI connectivity.
Even though low-cost storage might provide adequate performance, it's still important to choose a storage subsystem that also meets your organization's reliability requirements. For instance, if you opt for SATA storage, it's best to create a fault-tolerant SATA array. Microsoft recommends using RAID 1+0 arrays. Some organizations use RAID 5 because it's less costly and still provides fault tolerance, but RAID 1+0 arrays generally offer better performance, especially for writes.
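The performance gap between RAID 1+0 and RAID 5 is usually explained by the write penalty: each logical write costs two physical writes on RAID 1+0 (one per mirror) but roughly four physical I/Os on RAID 5 (read data, read parity, write data, write parity). The sketch below applies that rule of thumb; the workload figures and the 75 IOPS per SATA disk are assumed values for illustration, not numbers from Microsoft.

```python
import math

# Physical I/Os per logical write (common rule of thumb).
WRITE_PENALTY = {"RAID 1+0": 2, "RAID 5": 4}

def backend_iops(frontend_iops: float, read_fraction: float, raid: str) -> float:
    """Physical IOPS the disks must deliver for a given host workload."""
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1 - read_fraction)
    return reads + writes * WRITE_PENALTY[raid]

def disks_needed(iops: float, per_disk_iops: float) -> int:
    # Performance-only estimate; capacity is ignored, and RAID 1+0
    # would round up to an even disk count in practice.
    return math.ceil(iops / per_disk_iops)

# Assumed workload: 1,000 host IOPS, 60% reads, ~75 IOPS per 7,200 rpm SATA disk.
for level in WRITE_PENALTY:
    total = backend_iops(1000, 0.6, level)
    print(level, round(total), disks_needed(total, 75))
```

Under these assumptions RAID 5 needs about 2,200 back-end IOPS versus 1,400 for RAID 1+0, so the cheaper array can end up needing more spindles for the same load.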
It's worth noting that database size can have a direct impact on performance. As a general rule, mailbox databases on standalone mailbox servers should be limited to 200 GB or less. If a mailbox database grows larger than 200 GB, you may benefit from dividing the database into multiple, smaller databases. For mailbox databases that are part of a Database Availability Group, the recommended maximum database size is 2 TB.
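Applying those size guidelines is simple arithmetic. The sketch below assumes the 200 GB and 2 TB figures from the text and treats 2 TB as 2,048 GB; the total mailbox size is a hypothetical input, and a real design would also leave headroom rather than fill each database to the limit.

```python
import math

STANDALONE_MAX_GB = 200   # standalone mailbox servers (guideline in the text)
DAG_MAX_GB = 2 * 1024     # databases in a Database Availability Group (2 TB)

def databases_needed(total_mailbox_gb: float, in_dag: bool) -> int:
    """Minimum number of databases that keeps each one under the guideline."""
    limit = DAG_MAX_GB if in_dag else STANDALONE_MAX_GB
    return max(1, math.ceil(total_mailbox_gb / limit))

# Hypothetical: 1,500 GB of mailbox data.
print(databases_needed(1500, in_dag=False))  # 8
print(databases_needed(1500, in_dag=True))   # 1
```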
Determining storage requirements
Determining the storage requirements for an Exchange 2010 deployment can be a big job, but Microsoft offers a free tool that can help. The Exchange 2010 Mailbox Server Role Requirements Calculator is an Excel spreadsheet that calculates your Exchange storage requirements based on your organization's Exchange usage.
To use the Exchange 2010 Mailbox Server Role Requirements Calculator, fill in a series of cells by answering questions related to the intended Exchange Server configuration and usage. For instance, the spreadsheet asks questions about the average size of an email message and the number of messages users send and receive each day. Formulas built into the spreadsheet will use the information you provide to determine the required storage architecture.
Keep in mind, however, that while the Exchange 2010 Mailbox Server Role Requirements Calculator may be the best tool available for estimating Exchange mailbox server storage requirements, the recommendations it offers are only as accurate as the data you provide. To compensate, Microsoft recommends you provision enough disk space to accommodate at least 120% of the calculated maximum database size.
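The calculator's formulas are far more detailed than anything shown here (they account for factors such as deleted item retention and transaction log space), but the core relationship between its usage inputs and database size, plus the 120% headroom rule, can be sketched as follows. The inputs, including the number of days of mail retained, are hypothetical, and this is not the calculator's actual formula.

```python
def estimated_db_gb(users: int, messages_per_day: int,
                    avg_message_kb: float, days_retained: int) -> float:
    """Rough mail volume: users x daily messages (sent + received) x size x days."""
    total_kb = users * messages_per_day * avg_message_kb * days_retained
    return total_kb / (1024 * 1024)  # KB -> GB

def provisioned_gb(calculated_max_gb: float, headroom: float = 1.2) -> float:
    """Provision at least 120% of the calculated maximum database size."""
    return calculated_max_gb * headroom

# Hypothetical organization: 1,024 users, 100 messages/day at 64 KB, 160 days kept.
size = estimated_db_gb(1024, 100, 64, 160)
print(round(size), round(provisioned_gb(size)))  # 1000 1200
```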
Exchange archive mailboxes
There are other factors to consider that may impact your Exchange Server storage planning, such as whether you plan to implement user archive mailboxes, a new and optional feature. User archive mailboxes are secondary mailboxes that can be used for long-term retention of messages. What makes archive mailboxes different from other Exchange archiving methods is that unlike a more traditional archive (such as a journal mailbox), the user retains ownership of the items in the archive mailbox. As such, each user's archives are readily accessible.
Archive mailboxes are designed to take the place of PST files. But unlike PST files, archive mailboxes are stored within a mailbox database on the Exchange Server where they can be managed and regulated by the Exchange administrator.
In the original RTM release of Exchange 2010, a user's archive mailbox had to reside in the same mailbox database as the user's primary mailbox. SP1 added the option of placing archive mailboxes in a separate mailbox database, which allows the archives to be offloaded so they don't impact primary mailbox storage.
Microsoft generally recommends placing the archive mailboxes on a low-end mailbox server that uses inexpensive direct-attached storage (such as a SATA array). Remember, if a mailbox database contains only archive mailboxes then it won't be subject to the same I/O load as a mailbox database that stores users' primary mailboxes. Another advantage to using low-cost storage for user archive mailboxes is that doing so makes it practical to set generous quotas on the archive mailboxes. (See "Can Exchange Server's archiving and e-discovery replace third-party products?" below.)
|Can Exchange Server's archiving and e-discovery replace third-party products?|
Prior to the release of Exchange Server 2010, an entire industry emerged around creating archival and e-discovery products for Exchange Server. Now that Exchange 2010 offers native support for user archives and has built in e-discovery capabilities, it seems only natural to consider whether these new features can replace third-party products.
Exchange 2010's e-discovery and archiving features may be sufficient for some smaller organizations, but they're not enterprise-ready. The archiving and e-discovery features both have limitations you won't encounter with most third-party tools.
For example, Exchange 2010's archive mailboxes aren't a true archiving solution. Archive mailboxes let users offload important messages to a secondary mailbox that isn't subject to the strict retention policies or storage quotas that typically govern primary mailboxes. But if you want to do true archiving at the organizational level you still must use Exchange's journaling feature. The journal works, but third-party archivers provide much better control over message archival, retention and disposal.
The situation's the same for Exchange 2010's multi-mailbox e-discovery search feature. Multi-mailbox search has some major limitations. For example, it can only be used with Exchange 2010 mailboxes, so you'll still need a third-party product to search legacy Exchange mailboxes or PSTs.
Multi-mailbox search also lacks some of the rich reporting options and export capabilities commonly found in specialized e-discovery products.
Another consideration to take into account is the journal mailbox. If you use journaling to archive messages at the hub transport level then all the archived messages are placed into the journal mailbox.
I've never come across any Microsoft best practices for the placement of journal mailboxes, but I like to put the journal mailbox in its own mailbox database. This is because the journaling process tends to be very I/O intensive, and placing the journal mailbox in a dedicated mailbox database ensures its I/O doesn't degrade the performance of the other mailbox databases. If all messages are journaled, locating the journal mailbox within the same database as the user mailboxes will double the I/O requirements because Exchange 2010 doesn't use single-instance storage. In other words, journaling causes an extra copy of each message to be created within the mailbox database.
If you were to create the journal mailbox in the same database as the user mailboxes, it would also have a major impact on the replication process (assuming that Database Availability Groups are being used; see "Protecting Exchange data," below), because every journaled copy would have to be replicated to each of that database's other copies.
|Protecting Exchange data|
Exchange Server has always been somewhat difficult to protect. If you do a traditional nightly backup of your Exchange servers, a failure could potentially result in the loss of a full day's worth of messages. For most companies, such a loss is unacceptable.
Exchange administrators have taken a number of different steps to prevent substantial data loss. In Exchange 2007, for example, it was a common practice to use continuous replication to replicate mailbox data to another mailbox server. A continuous replication solution provides fault tolerance and acts as a mechanism for protecting data between backups. (Of course, using a continuous data protection solution such as System Center Data Protection Manager is also a good option.)
Some observers feel Microsoft is working toward making Exchange Server backups completely unnecessary. The idea is that Database Availability Groups will eventually make Exchange resilient enough that you won't need backups.
Database Availability Groups are an Exchange 2010 feature that lets you maintain up to 16 copies of a mailbox database, one on each member server. These copies reside on separate mailbox servers, and it's even possible to place database copies in alternate data centers. Despite the degree to which Database Availability Groups can protect mailbox data, you shouldn't abandon your backups just yet.
Having multiple replicas of each database makes it easier to protect Exchange Server, but if a mailbox database becomes corrupted or gets infected with a virus, the corruption or viral code is copied to the replica databases.
But Microsoft does offer a delayed playback feature: a lagged database copy waits for a configurable period before replaying transaction logs, so transactions aren't instantly committed to that replica. If a problem occurs, you'll have enough time to prevent the bad data from being committed to the lagged copy. Once you've stopped the bad data from spreading, you can revert all your mailbox databases to match the state of the uncorrupted replica.
While this approach sounds great in theory, Microsoft still has a lot of work to do to make it practical. Right now the procedure requires you to take an educated guess as to which transaction log contains the first bit of corruption and then work through a complicated manual procedure to prune the log files. So while Exchange 2010's storage architecture makes it easier to protect your data by way of Database Availability Groups, you shouldn't rely on them as the only mechanism for protecting Exchange data.
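The lagged-copy idea and the log-pruning step can be modeled conceptually. In Exchange 2010 the delay is configured per database copy (the ReplayLagTime setting); everything else below, including the LogFile type and the helper names, is hypothetical and only illustrates the logic, not how Exchange implements replay. Note that the first bad generation is still an input you have to guess.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class LogFile:
    generation: int      # transaction log sequence number
    closed_at: datetime  # when the log file was closed

def replayable(logs: list[LogFile], now: datetime, replay_lag: timedelta) -> list[int]:
    """Generations old enough to be replayed into the lagged copy."""
    cutoff = now - replay_lag
    return [log.generation for log in logs if log.closed_at <= cutoff]

def prune_from(logs: list[LogFile], first_bad_generation: int) -> list[LogFile]:
    """Discard the suspected-corrupt log and everything after it."""
    return [log for log in logs if log.generation < first_bad_generation]

# Hypothetical: six hourly logs, a three-hour lag, corruption suspected in log 5.
start = datetime(2011, 4, 1)
logs = [LogFile(g, start + timedelta(hours=g)) for g in range(1, 7)]
now = start + timedelta(hours=6)
print(replayable(logs, now, timedelta(hours=3)))         # [1, 2, 3]
print([log.generation for log in prune_from(logs, 5)])  # [1, 2, 3, 4]
```

In this model, logs 4 and later haven't been replayed yet, so pruning from log 5 leaves the lagged copy clean. The lag only helps if it is longer than the time it takes to notice the problem.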
Returning to the journal mailbox, another advantage of locating it in a separate mailbox database is that it makes it easy to manage storage quotas and message retention based on mailbox function. You can create one set of policies for user mailboxes and another set for the journal mailbox.
The last type of mailbox you should consider when planning for Exchange 2010 storage is the discovery mailbox. The discovery mailbox is only used when a multi-mailbox search (e-discovery) is performed. The search results are stored in the discovery mailbox.
By default, the discovery mailbox is assigned a 50 GB quota. This sounds large, but it may be too small for performing e-discovery in a large organization.
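Whether a search will fit is easy to estimate before running it. The sketch assumes the 50 GB default stated above; the hit count and average item size are hypothetical, and the estimate ignores any storage overhead beyond the items themselves.

```python
DISCOVERY_QUOTA_GB = 50  # default discovery mailbox quota

def results_gb(hits: int, avg_item_kb: float) -> float:
    """Approximate size of the copied search results, in GB."""
    return hits * avg_item_kb / (1024 * 1024)

def fits_in_discovery_mailbox(hits: int, avg_item_kb: float,
                              quota_gb: float = DISCOVERY_QUOTA_GB) -> bool:
    return results_gb(hits, avg_item_kb) <= quota_gb

# Hypothetical searches with 150 KB average items.
print(fits_in_discovery_mailbox(100_000, 150))  # True  (about 14 GB)
print(fits_in_discovery_mailbox(500_000, 150))  # False (about 72 GB)
```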
When it comes to choosing a storage location for a discovery mailbox, capacity is generally more important than performance. While the e-discovery process is I/O intensive, the I/O load is split between the database containing the user mailboxes and the database holding the discovery mailbox.
If e-discovery isn't a priority, you might not bother creating a discovery mailbox until you need it. If that's not an option, your best bet is to place the mailbox in a dedicated mailbox database that lives on a low-cost storage system with plenty of free disk space.
More planning required
Clearly, there are a number of considerations that must be taken into account when planning an Exchange Server storage architecture. Even though Exchange 2010 isn't as I/O intensive as its predecessors, I/O performance should still be a major consideration in the design process. Other important considerations include capacity and fault tolerance.
BIO: Brien M. Posey is a seven-time Microsoft MVP for his work with Exchange Server, Windows Server, Internet Information Server (IIS) and File Systems/Storage. He has served as CIO for a nationwide chain of hospitals and was once a network administrator for the Department of Defense at Fort Knox.