Published: 10 Sep 2006
Microsoft Exchange has some unique requirements and quirks that storage managers will have to grapple with to ensure high availability of the e-mail service.
E-mail has shot up through the hierarchy of critical systems, from business critical to mission critical to absolutely essential. "E-mail today is as important as the telephone for business communications," says Mark Levitt, program vice president for collaborative computing and the enterprise workplace at IDC, Framingham, MA. Becky Swails, network engineer at Citgo Petroleum Corp. in Tulsa, OK, agrees: "When it comes to e-mail, we're like crack addicts."
In theory, high-availability e-mail is straightforward. "The solution requires redundant storage, redundant servers and automatic failover," says Levitt. From that standpoint, any disaster recovery strategy should do the trick. The catch is the speed of the recovery, as e-mail users have zero tolerance for downtime.
When it comes to high-availability e-mail, there are solutions to fit almost every need, budget and IT skill level. Options tend to be either server- or storage-based, according to Donna Scott, vice president and distinguished analyst at Gartner Inc.'s offices in Virginia Beach, VA. These options include managed hosting, specialized e-mail appliances, local and remote replication software, continuous data protection (CDP), local and remote SAN-based replication, and server clustering. Each option has its advantages and disadvantages (see "High-availability options," at right).
Microsoft Corp. promotes active/active or active/passive clustering as the answer for Exchange high availability. Many organizations, however, are discouraged by the cost and complexity of clustering Windows and Exchange. Exchange clustering isn't necessarily complicated when used in a two-node, active/passive configuration with the cluster restricted to one location. But it gets complicated when there are more than two nodes in the cluster, the clustering operates globally and you rely on automated failback.
For example, the continuous allocation and release of various memory blocks within an Exchange process leads to virtual memory fragmentation. According to Microsoft, fragmentation develops over time: virtual memory is still available to a process, but the individual free blocks aren't large enough. Virtual memory issues are more prevalent in a clustered Exchange (2000 and 2003) configuration because these environments are typically used to scale Exchange to host thousands of users with multiple storage groups and messaging databases. In short, Exchange clusters aren't for the inexperienced.
Standard e-mail backup doesn't cut it. "There's a lot of data and it's changing rapidly," says Lee Benjamin, an analyst at Ferris Research in San Francisco. Just do the math: A company with 240 mailboxes and a 250MB quota will end up with a 60GB Exchange data store. Backing it up will take 3.5 hours; restoring it will take another 3.5 hours, calculates Keith McCall, chief technology officer at Azaleos Corp., a high-availability Exchange appliance provider in Redmond, WA.
"How many users will wait 3.5 hours for e-mail access?" he asks.
Further complicating matters is the way Exchange stores data. The information store, as it is called, comprises two separate databases: one manages data in user mailboxes, while the other handles public folders. In addition, the two databases must work with the Exchange database engine, which handles disk storage and memory, and also manages the caching between disk and memory. IT managers complain in online postings that the Exchange information store is prone to corruption. Microsoft postings describe the many conditions under which the Exchange database may become corrupted, such as during virus scanning. Lotus/Domino is architected differently from Exchange and doesn't present the same problems (see "Lotus/Domino availability," this page).
High-availability solutions must take pains not to replicate any database corruption. "One of the things we liked about Teneros [an e-mail appliance from Teneros Inc., Mountain View, CA] is that it replicates object by object," says William Cumming, director of information technology at Strome Investment Management, a small investment firm in Santa Monica, CA. "That avoids corruption." Other approaches operate at the bit level, which can replicate corrupted data, although bit-level recovery can be much faster.
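The difference between the two approaches can be illustrated with a small Python sketch. It is not how Teneros or any other vendor actually works: the per-message checksum, the `StoredMessage` type and both replicate functions are hypothetical, chosen only to show why validating each object before copying keeps damage out of the replica, while a verbatim copy carries it along.

```python
import hashlib
from typing import Dict, List, NamedTuple


class StoredMessage(NamedTuple):
    body: bytes
    checksum: str  # SHA-256 of the body, recorded when the message was written


def make_message(body: bytes) -> StoredMessage:
    return StoredMessage(body, hashlib.sha256(body).hexdigest())


def is_intact(msg: StoredMessage) -> bool:
    """True if the body still matches the checksum recorded at write time."""
    return hashlib.sha256(msg.body).hexdigest() == msg.checksum


def replicate_bitwise(source: Dict[str, StoredMessage]) -> Dict[str, StoredMessage]:
    """Copy everything verbatim; damaged objects are copied along with good ones."""
    return dict(source)


def replicate_by_object(source: Dict[str, StoredMessage],
                        replica: Dict[str, StoredMessage]) -> List[str]:
    """Copy only objects that pass validation; return the IDs that were skipped."""
    skipped = []
    for msg_id, msg in source.items():
        if is_intact(msg):
            replica[msg_id] = msg
        else:
            skipped.append(msg_id)  # the replica keeps its last good copy
    return skipped


source = {
    "msg-1": make_message(b"Quarterly results attached."),
    # Simulated damage: the body changed on disk but the recorded checksum did not.
    "msg-2": StoredMessage(b"\x00\x00garbage", make_message(b"Fund update").checksum),
}
last_good = {"msg-2": make_message(b"Fund update")}

bit_replica = replicate_bitwise(source)
object_replica = dict(last_good)
skipped = replicate_by_object(source, object_replica)

print("bit-level replica intact:", all(is_intact(m) for m in bit_replica.values()))
print("object-level replica intact:", all(is_intact(m) for m in object_replica.values()))
print("skipped:", skipped)
```

The trade-off the article describes shows up here too: the object-level path has to inspect every message, whereas the verbatim copy does no per-object work at all.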
High availability can equal high cost
As with any other complex system, ensuring high availability for Exchange often doesn't come cheap. In general, redundancy forms the core of any high-availability product. As such, high availability essentially doubles the overall cost because there are two servers, two software licenses, two storage arrays and two communications links.
Citgo Petroleum figured it was looking at a minimum investment of $500,000 to completely replicate its Exchange environment for high availability. "Including all the hardware we would need--more disk, a new SAN--and SAN consulting, it would be at least a $500,000 project," estimates the firm's Swails.
Instead, the company opted for CA XOsoft's WANSyncHA (XOsoft was recently acquired by CA Inc.), which provides asynchronous replication and automatic failover for Exchange and other applications running on Windows servers. "XOsoft continuously replicates writes in small chunks," explains Swails. Citgo had to purchase an additional Exchange license and add extra Windows servers, as well as some communications bandwidth (less than a T1 line). But the total cost was "less than 10% of the $500,000 we expected to pay," she reports.
Continuous protection for Exchange
The Sidney Kimmel Cancer Center (SKCC), a leading research organization in San Diego, employs approximately 150 scientists, staff and administrators who use Exchange extensively. For years, the organization relied on traditional block-level tape backups, either full or incremental, to ensure it could recover lost or corrupted Exchange e-mail.
"We never had a disaster, but I wasn't sure we could recover quickly using tape," says Jeff Wood, SKCC's former director of IT. The organization wanted to quickly recover anything from a single lost or corrupted message to the entire e-mail database.
Conventional high-availability solutions, such as those involving frequent replication, didn't appeal to SKCC. "They required redundant systems and involved a lot of steps, particularly if we needed to recover an individual message," recalls Wood. And its conventional backup system didn't really address the problem because any messages created since the last backup wouldn't be recoverable if the Exchange data store became corrupted.
The CDP approach of continuously journaling changes appealed to SKCC. "It was a simple concept. You captured all changes and could roll back to any point in time, almost like TiVo," says Wood, referring to the popular TV recording device. He also liked how CDP fit into SKCC's existing Exchange and backup infrastructure. The organization could have both tape and CDP, and the cost was right. Wood found CDP software products that SKCC could license for less than $10,000.
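The concept Wood describes, capturing every change and rolling back to any point in time, can be sketched in a few lines of Python. This is a toy model under our own assumptions, not the design of TimeSpring or any other CDP product: the `ChangeJournal` class, the message IDs and the timestamps are all hypothetical, and a real product journals at a much lower level than whole messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JournalEntry:
    timestamp: float
    key: str               # hypothetical message ID
    value: Optional[str]   # None records a deletion


@dataclass
class ChangeJournal:
    entries: List[JournalEntry] = field(default_factory=list)

    def record(self, timestamp: float, key: str, value: Optional[str]) -> None:
        """Append a change; the journal is append-only and time-ordered."""
        if self.entries and timestamp < self.entries[-1].timestamp:
            raise ValueError("journal entries must be appended in time order")
        self.entries.append(JournalEntry(timestamp, key, value))

    def state_at(self, point_in_time: float) -> Dict[str, str]:
        """Rebuild the store by replaying every change up to point_in_time."""
        state: Dict[str, str] = {}
        for entry in self.entries:
            if entry.timestamp > point_in_time:
                break
            if entry.value is None:
                state.pop(entry.key, None)
            else:
                state[entry.key] = entry.value
        return state


journal = ChangeJournal()
journal.record(1.0, "msg-1", "grant draft v1")
journal.record(2.0, "msg-1", "grant draft v2")
journal.record(3.0, "msg-1", "\x00\x00corrupted")  # the bad write
journal.record(4.0, "msg-2", "lunch?")

recovered = journal.state_at(2.5)  # any moment just before the bad write
print(recovered)
```

Because nothing is ever overwritten in the journal, the same replay recovers a single message or the whole store, which is the property that a nightly backup lacks for anything written since the last run.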
After evaluating Revivio Inc., TimeSpring Software Corp. and CA XOsoft products, SKCC opted for TimeSpring. "TimeSpring would let us run CDP on the same server we were running Exchange if we needed to," says Wood. He also liked the look and feel of the software. Since deploying the TimeSpring product, SKCC has been able to meet a crucial grant deadline because it could quickly recover critical work in Exchange that had been corrupted.
Host-based replication software has one key advantage: hardware independence. "We chose Neverfail [Group Ltd.] because it lets us use disparate hardware," says Todd Sons, director of information technology at Jackson Walker LLP, a law firm headquartered in Dallas. Neverfail for Microsoft Exchange replicates the law firm's Exchange data to a passive server and keeps the two servers synchronized. In the event the active Exchange server fails, Neverfail immediately fails over to the passive server. The cost was just $10,000 for two Neverfail licenses that Sons can run on any Windows-capable server.
Host-based replication like that provided by Neverfail not only replicates the data, but monitors Exchange and the physical server hardware, network and operating system. It will fail over automatically if it discovers a problem it can't fix. Jackson Walker's Sons, however, disabled the automatic failover.
"We're doing manual cut-over. I want to make the decision," he says. Because users generally work from the Exchange cache, most never realize when they fail over or switch back. Products from CA XOsoft, Double-Take Software, Mimosa Systems Inc. and SteelEye Technology Inc. provide similar high-availability capabilities.
Exchange e-mail appliances
E-mail appliances, which are managed by outside companies, are the latest high-availability Exchange options. The appliances monitor Exchange and capture data going into the data store. "It seemed like voodoo at first," says Strome Investment Management's Cumming, referring to the Teneros-managed appliance his company deployed. He plugged in the box, gave it an IP address and it synchronized itself with Exchange.
"We didn't need to provide any hardware or baby-sit it," reports Cumming. Failover and switchback are automatic. Strome Investment Management uses 70 mailboxes. Teneros charges $15,000 for a 250-user appliance and $25,000 for a 500-user appliance. The annual maintenance fee ranges from $3,000 to $6,000, depending on the capacity of the appliance.
Whereas Teneros focuses on the small business market (up to 500 mailboxes), Azaleos is geared to larger organizations. Like Teneros, Azaleos installs what it describes as a clustered appliance containing redundant servers and storage inside the user's data center. The Azaleos product scales to 2,500 users.
"We look at Azaleos as a way to outsource e-mail availability while keeping our data within our grasp," says Lee Hudson, director of IT at Zumiez Inc., a fashion retailer in Everett, WA. The appliance itself contains an active/passive Exchange cluster and a third server that handles Active Directory. Azaleos costs approximately $30,000 per appliance and $15,000 for onsite storage. Monitoring services are $7 per month per user, plus $4 per month per user for offsite disaster recovery. The total cost for 1,000 users comes to about $200,000, half of which is for services.
Regardless of what approach you take to high-availability e-mail, you will lose e-mail service eventually. But being able to restore e-mail service within an hour or less is, as the credit card ad says, priceless.