10 points to consider before deploying an e-mail archive
Selecting an e-mail archiving application based solely on functionality may result in unexpected administration costs.
As storage shops implement e-mail archiving, many are confronting issues related to their company's vertical industry or to the capabilities and maturity of the IT organization and its e-mail user community. The best practices presented here, which focus on archiving for Microsoft Exchange, will ease some implementation, operational and maintenance concerns.
Your firm's industry will largely determine how you implement e-mail archiving. If financial compliance is a key driver, the emphasis will be on proof points (demonstrable evidence that required messages were captured and retained) and mandated retention policies. For organizations exposed to expensive legal discovery actions, retention and ease of retrieval will be paramount, and such firms may also need sophisticated search criteria.
Other implementation issues relate more to company size than to industry. For example, some firms have crossed a complexity threshold because of the sheer size of their Exchange environments, so their archiving priority may be reducing the size of message stores to improve daily operations. And regardless of their particular requirements, most organizations will also want archiving to be at least somewhat transparent to users.
An organization will typically draft a list of requirements the prospective archiving application must support. I recommend you also consider how the vendor's architecture will affect operations and administration. The product's other advantages might outweigh these considerations, but remember that operational and administrative overhead is forever. At the very least, a prudent IT professional will understand the tradeoffs between an application's features and functions and its operational overhead.
Where does this overhead come from? Our best practices suggest reviewing the following areas, each of which can add operational and administrative costs to your archiving solution.
- Archive database management. Archiving applications write data to a database, and some of its tables require regular housekeeping. For example, the archiver might maintain a table of "questionable use" e-mails. If not serviced regularly, this table can grow until the database is large enough to affect the overall application. Determine how much safety headroom the database needs and whether the suspect table can trigger a threshold alarm. Find out how the alarm works and decide who will respond to it.
- File purging and housekeeping. If data is written to unstructured files, identify the functions that require regular purging. The Exchange journal is an example. Consider the operational impact on the chosen storage tier if archiving isn't possible for a period of time, because the journal keeps growing until it is purged. Set an operational threshold metric that triggers an emergency alert if regular journal purging fails to occur, which can happen if the archiving environment itself experiences server, network or storage failures.
- Archive file structure. The archive file structure is where the actual data is kept and, ideally, it's a single-instance store. Large organizations will have tens if not hundreds of Exchange servers. An archiving application will likely be able to run on multiple servers in some ratio to the Exchange servers, but find out whether the archive and index data are written to a single data store. If not, consider the additional effort required to manage this type of environment, particularly as it scales.
- Application availability. E-mail is increasingly a mission-critical application, and many organizations look to clustering or other high-availability techniques to keep their Exchange environment operational. An archiving application may also require similar availability (if only to maintain journal purging capabilities). The architecture to support this can add significantly to operational overhead. Unless some kind of automated failover capability is included in the archiving product, Windows clustering or a similar technology can double the number of servers to be managed.
- Load balancing. With multiple archiving servers supporting multiple Exchange servers, load balancing can become a key issue. Manual load balancing is time consuming and an inefficient use of resources. Look for some form of automated load-balancing capability, either through traditional middleware or, preferably, within the archiving application.
- Index rebuild. Even in the best-run environments, an index will occasionally become corrupted. When that index points to literally millions of e-mail items, a rebuild can be a nightmare. Questions to ask include: Are rebuilds transparent to users? Does Exchange keep operating normally? What happens to journal purging during an index rebuild? Once the index is rebuilt, does Exchange or the archiving software need to be restarted? Index rebuilds can significantly affect administration overhead, recovery times and end-user service levels.
- Client agents. To achieve high levels of user transparency, some products require an agent to be installed on each Exchange client (Outlook). This might make life easier for users, but it can be a major burden for IT administrators, particularly in companies without an efficient way to push out client agents. Nor is installation a one-time task: agents will need to be redeployed for many revisions and changes to the archiving software or Exchange.
- Reporting and metrics. With e-mail archiving, you'll be dealing with an astonishing amount of information saved daily to an ever-expanding archive. To manage the environment, you need to know the number and size of the e-mails and attachments moving through your Exchange, archiving and tiered storage environments. To manage user retrieval needs and allocate an appropriate class of service, you need an aged analysis of e-mail (volumes broken down by age group) and the ability to determine the last-accessed date for each age group. A comprehensive metrics component will help you administer the archiving environment effectively.
- Backing up the archive. Once data is archived, it doesn't make sense to keep backing it up in full every week. Archived data can be written to two or three backup copies and then left alone until the media refresh threshold is reached. An archiving application must allow archived e-mail to move down through storage tiers based on age, so that only the top tier is backed up regularly. In addition, find out whether the archiving product requires services to be shut down manually before backups run.
- Scalability. The main scalability issue is how much hardware the archiving application requires to support the Exchange environment. This may be expressed as the number of Exchange servers that can be supported by an archiving server, or how many archiving servers are required to support a particular number of mailboxes or e-mails. Whatever the metric chosen, it's important to consider what happens when e-mail growth is projected out three to five years. If the key metric--such as the number of Exchange servers or mailboxes--doubles, will the archiving solution also need to double?
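The file purging and housekeeping item calls for an emergency alert when journal purging stops. The Python sketch below shows one way such a check might be expressed. It is not taken from any archiving product; the function name `journal_alert`, the four-hour purge gap and the 80 percent capacity warning are hypothetical values you would tune to your own journal growth rate and storage headroom.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: tune to your own journal growth and storage tier.
MAX_PURGE_GAP = timedelta(hours=4)   # longest acceptable time between purges
WARN_FRACTION = 0.80                 # warn when the journal fills 80% of its tier


def journal_alert(last_purge: datetime, now: datetime,
                  journal_bytes: int, tier_capacity_bytes: int) -> str:
    """Classify the journal state as 'ok', 'warning' or 'emergency'.

    A stalled purge is an emergency regardless of size, because the journal
    keeps growing until archiving resumes; a journal nearing the capacity
    of its storage tier is a warning.
    """
    if tier_capacity_bytes <= 0:
        raise ValueError("tier capacity must be positive")
    if now - last_purge > MAX_PURGE_GAP:
        return "emergency"
    if journal_bytes / tier_capacity_bytes >= WARN_FRACTION:
        return "warning"
    return "ok"
```

In practice, a monitoring system would run a check like this on a schedule and route an "emergency" result to whoever has been designated to respond.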
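The load-balancing item favors automated placement over manual rebalancing. As an illustration only, the Python sketch below assigns Exchange servers to archiving servers with a simple greedy heuristic: the busiest Exchange server is placed first, each on the least-loaded archiving server. The heuristic, the function name `assign_servers` and the use of daily message counts as the load measure are assumptions; commercial products may balance load quite differently.

```python
import heapq


def assign_servers(exchange_load: dict[str, int],
                   archivers: list[str]) -> dict[str, list[str]]:
    """Map Exchange servers to archiving servers by daily message volume.

    Greedy heuristic: place the busiest Exchange server first, each on
    whichever archiving server currently carries the least volume
    (ties go to the alphabetically first archiver name).
    """
    if not archivers:
        raise ValueError("need at least one archiving server")
    heap = [(0, name) for name in archivers]   # (assigned volume, archiver)
    heapq.heapify(heap)
    plan: dict[str, list[str]] = {name: [] for name in archivers}
    busiest_first = sorted(exchange_load.items(), key=lambda kv: kv[1], reverse=True)
    for server, volume in busiest_first:
        load, archiver = heapq.heappop(heap)
        plan[archiver].append(server)
        heapq.heappush(heap, (load + volume, archiver))
    return plan
```

For example, `assign_servers({"EX1": 50000, "EX2": 30000, "EX3": 20000, "EX4": 10000}, ["AR1", "AR2"])` places EX1 and EX4 on AR1 (60,000 messages a day) and EX2 and EX3 on AR2 (50,000).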
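The reporting and metrics item mentions an aged analysis of e-mail with a last-accessed date for each age group. A minimal Python sketch of that report follows. The `Message` record, the age bands and the choice to report the most recent access in each band are hypothetical; a real archive would pull these fields from its own database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Message:
    """Hypothetical per-message record pulled from the archive database."""
    received: date
    last_accessed: date
    size_bytes: int


# Hypothetical age bands: (label, upper age limit in days; None = no limit).
AGE_BANDS = [("0-90d", 90), ("91-365d", 365), ("1-3y", 3 * 365), ("3y+", None)]


def band_for(age_days: int) -> str:
    """Return the label of the first band whose limit covers this age."""
    for label, limit in AGE_BANDS:
        if limit is not None and age_days <= limit:
            return label
    return AGE_BANDS[-1][0]


def aged_analysis(messages: list[Message], today: date) -> dict[str, dict]:
    """Message count, total bytes and latest last-access date per age band."""
    report = {label: {"count": 0, "bytes": 0, "last_accessed": None}
              for label, _ in AGE_BANDS}
    for m in messages:
        row = report[band_for((today - m.received).days)]
        row["count"] += 1
        row["bytes"] += m.size_bytes
        if row["last_accessed"] is None or m.last_accessed > row["last_accessed"]:
            row["last_accessed"] = m.last_accessed
    return report
```

A band with a recent last-accessed date is still being retrieved by users and may warrant a faster class of service; a band with no recent access is a candidate for a lower tier.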
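The backup item describes moving archived e-mail down storage tiers by age so that only the top tier is backed up regularly. The Python sketch below captures that policy; the tier names, the 180-day and three-year boundaries and the function names are hypothetical.

```python
# Hypothetical policy: (tier name, maximum age in days or None, on regular backup?)
TIERS = [
    ("tier1-disk", 180, True),
    ("tier2-disk", 3 * 365, False),
    ("tier3-tape", None, False),
]


def tier_for(age_days: int) -> tuple[str, bool]:
    """Return (tier name, regularly backed up?) for a message of the given age."""
    for name, max_age, backed_up in TIERS:
        if max_age is not None and age_days <= max_age:
            return name, backed_up
    name, _, backed_up = TIERS[-1]  # oldest data lands on the bottom tier
    return name, backed_up


def regular_backup_set(message_ages: dict[str, int]) -> list[str]:
    """IDs of messages whose current tier is on the regular backup schedule."""
    return sorted(mid for mid, age in message_ages.items() if tier_for(age)[1])
```

Under this policy, data on the lower tiers would still have its two or three copies; it simply drops out of the weekly backup cycle until the media refresh threshold is reached.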
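The scalability item asks what happens when growth is projected three to five years out. The arithmetic is easy to sketch in Python, assuming compound annual growth in mailboxes and a fixed number of mailboxes per archiving server. The function names, the 15 percent growth rate and the 2,500-mailbox ratio used below are illustrative assumptions, not vendor figures.

```python
import math


def archivers_needed(mailboxes: int, mailboxes_per_archiver: int) -> int:
    """Archiving servers required at a fixed mailbox ratio (rounded up)."""
    return math.ceil(mailboxes / mailboxes_per_archiver)


def project(mailboxes: int, annual_growth: float, years: int,
            mailboxes_per_archiver: int):
    """Yield (year, projected mailboxes, archiving servers), assuming compound growth."""
    for year in range(years + 1):
        projected = round(mailboxes * (1 + annual_growth) ** year)
        yield year, projected, archivers_needed(projected, mailboxes_per_archiver)
```

With these inputs, `list(project(10000, 0.15, 5, 2500))` grows from 10,000 mailboxes on 4 archiving servers to 20,114 mailboxes on 9 by year five. Under a fixed ratio, server count tracks the key metric (rounding up can make it slightly worse), so the question for the vendor is whether that ratio improves at larger scale.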
By considering these 10 issues along with your archiving feature and function requirements, you can better reconcile vendor statements with real-world results.