Email archiving implementation: What you need to consider

Selecting an email archiving application based solely on functionality may result in unexpected administration costs. Considering the following 10 points before deploying an email archive can help you hang onto your loot.

As storage shops implement email archiving, many of them are confronting issues related to their company's industry or to the capabilities and maturity of the IT organization and its email user community. If financial compliance is a key driver, the emphasis will be on proof points and mandated retention policies. For organizations susceptible to expensive legal discovery actions, retention and ease of retrieval will be paramount, and these firms might also need sophisticated search criteria.

Other implementation issues are related more to company size than to specific industries. An organization will typically draft a list of requirements the projected archival application must support. I recommend you also take a moment to consider the impact on operations and administration that might occur depending on how the vendor has architected its application.

The best practices presented here -- which are focused on archiving for Microsoft Exchange -- will ease some implementation, operational and maintenance concerns.

1. Archive database management. Archiving applications write data to a database, which means that some of the tables require regular housekeeping. For example, the archiver might maintain a table that contains "questionable use" emails. If not serviced regularly, this table might grow until the size of the database affects the overall application. Issues to consider include what the appropriate safety overhead is and whether the suspect table can trigger a threshold alarm. Find out how the alarm works and determine who will address it.

2. File purging and housekeeping. If data is written to an unstructured file, look for those functions that require regular purging. An example here is the Exchange journal. Consider the operational impact on the chosen storage tier if archiving isn't possible for a period of time. An operational threshold metric should be set to trigger an emergency alert if regular journal purging fails to occur. This may happen if the archiving environment itself experiences server, network or storage failures.
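A threshold check of this kind can be sketched in a few lines. The following Python example is only an illustration and is not taken from any archiving product: the `JournalStatus` fields, the 70% and 90% fill levels and the 24-hour purge window are all assumed values that you would replace with figures from your own storage tier and service levels.

```python
from dataclasses import dataclass


@dataclass
class JournalStatus:
    """Snapshot of a journal store. All fields are hypothetical inputs."""
    backlog_bytes: int        # data waiting to be archived and purged
    capacity_bytes: int       # size of the storage tier holding the journal
    hours_since_purge: float  # time since the last successful purge


def journal_alert(status, warn_fraction=0.70, critical_fraction=0.90,
                  max_hours_without_purge=24.0):
    """Return 'ok', 'warning' or 'critical' for a journal snapshot.

    The thresholds are assumptions chosen for illustration only.
    """
    used = status.backlog_bytes / status.capacity_bytes
    if used >= critical_fraction:
        return "critical"
    # A stalled purge is flagged even when plenty of space remains.
    if used >= warn_fraction or status.hours_since_purge > max_hours_without_purge:
        return "warning"
    return "ok"
```

The point of the second condition is that a missed purge is reported before the storage tier actually fills up, which gives the operations team time to react to a server, network or storage failure in the archiving environment.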

3. Archive file structure. The archiving file structure is where the actual data is kept and, ideally, it's a single-instanced store. In large organizations, there will be tens if not hundreds of Exchange servers. An archiving application will likely be able to run on multiple servers in some ratio to the Exchange servers, but find out if the archive and index data are written to a single data store. If not, consider the additional effort required to manage this type of environment, particularly as it scales.

4. Application availability. Email is increasingly a mission-critical application, and many organizations look to clustering or other high-availability techniques to keep their Exchange environment operational. An archiving application may also require similar availability (if only to maintain journal purging capabilities). The architecture to support this can add significantly to operational overhead. Unless some kind of automated failover capability is included in the archiving product, Windows clustering or a similar technology can double the number of servers to be managed.

5. Load balancing. With multiple archiving servers supporting multiple Exchange servers, load balancing can become a key issue. Manual load balancing is time-consuming and an inefficient use of resources. Look for some form of automated load-balancing capability, either through traditional middleware or, preferably, within the archiving application.

6. Index rebuild. Even in the best-run environments, an index will occasionally become corrupted. When this index points to literally millions of email entities, a rebuild can be a nightmare. Issues to consider include the following: Are rebuilds transparent to users? Is Exchange operation compromised? What's the impact on journal purging during an index rebuild? When the index is finally rebuilt, does Exchange or the archiving software need to be rebooted? Index rebuilding can significantly affect administration overhead, recovery times and end-user service levels.

7. Client agents. To achieve high levels of user transparency, some products require the installation of an agent on each Exchange client (Outlook). This might make life easier for users, but it can be a major burden for IT administrators, particularly in companies that don't have efficient methods for pushing out client agents. Installing client agents is an ongoing process that will need to be repeated for many revisions and changes to the archiving software or Exchange.

8. Reporting and metrics. With email archiving, you'll be dealing with an astonishing amount of information that's saved daily in an endlessly expanding archive. To manage the environment, you need to know the number of emails and attachments (and their size) moving through your Exchange, archiving and tiered storage environments. Managing user-retrieval needs and allocating the appropriate class of service means you need to have an aged analysis of email and be able to determine the last-accessed date for each age group. A comprehensive metrics component will help you effectively administer the archiving environment.
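To make the idea of an aged analysis concrete, here is a minimal Python sketch. It assumes you can export one record per message as a (received date, last-accessed date, size in bytes) tuple. That record layout, the function name `aged_analysis` and the 30/90/365-day buckets are illustrative assumptions, not features of any particular product.

```python
from datetime import date


def aged_analysis(messages, today, bucket_days=(30, 90, 365)):
    """Group messages into age buckets and summarize each bucket.

    messages: iterable of (received, last_accessed, size_bytes) tuples,
              where the first two items are datetime.date values.
    Returns a dict mapping a bucket label to its message count, total
    bytes and the most recent last-accessed date seen in that bucket.
    """
    labels = [f"<= {d} days" for d in bucket_days]
    labels.append(f"> {bucket_days[-1]} days")
    report = {label: {"count": 0, "bytes": 0, "last_accessed": None}
              for label in labels}

    for received, last_accessed, size_bytes in messages:
        age = (today - received).days
        label = labels[-1]  # default: older than the largest limit
        for limit, candidate in zip(bucket_days, labels):
            if age <= limit:
                label = candidate
                break
        row = report[label]
        row["count"] += 1
        row["bytes"] += size_bytes
        if row["last_accessed"] is None or last_accessed > row["last_accessed"]:
            row["last_accessed"] = last_accessed
    return report
```

A bucket whose most recent access date is far in the past is a candidate for a lower class of service, while a bucket that is still read regularly probably is not.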

9. Backing up the archive. Once data is archived, it doesn't make sense to keep backing it up in full every week. Instead, make two or three backup copies of the archived data; those copies then need no further attention until the media refresh threshold is reached. An archiving application must allow archived email to be moved down through storage tiers based on age, where only the top tier is backed up regularly. In addition, find out if the archiving product requires services to be manually shut down before backup takes place.
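The age-based tiering rule described here can be expressed as a small policy function. This Python sketch is a simplification under stated assumptions: the tier boundaries of 90 and 365 days and the function names are hypothetical, and a real product would apply its own policy engine.

```python
def storage_tier(age_days, tier_limits=(90, 365)):
    """Return the storage tier (1 = top tier) for an email of a given age.

    tier_limits holds the maximum age in days for each tier except the
    last; anything older falls into the final tier. The default limits
    are assumptions for illustration.
    """
    for tier, limit in enumerate(tier_limits, start=1):
        if age_days <= limit:
            return tier
    return len(tier_limits) + 1


def needs_regular_backup(age_days, tier_limits=(90, 365)):
    """Only the top tier is included in the regular backup cycle."""
    return storage_tier(age_days, tier_limits) == 1
```

With the default limits, a 10-day-old message sits in tier 1 and stays in the regular backup cycle, while a 200-day-old message sits in tier 2 and relies on the copies made when it was moved there.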

10. Scalability. The main scalability issue is how much hardware the archiving application requires to support the Exchange environment. This may be expressed as the number of Exchange servers that can be supported by an archiving server, or how many archiving servers are required to support a particular number of mailboxes or emails. Whatever the metric chosen, it's important to consider what happens when email growth is projected out three to five years. If the key metric -- such as the number of Exchange servers or mailboxes -- doubles, will the archiving solution also need to double?
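A quick projection can show what a vendor's sizing ratio implies over that period. The Python sketch below assumes the simplest case, in which the number of archiving servers scales linearly with the number of mailboxes. The function name, the growth rate and the mailboxes-per-server ratio are all hypothetical inputs, and the linear model is exactly the assumption you should ask the vendor to confirm or refute.

```python
import math


def project_archive_servers(mailboxes_now, annual_growth,
                            mailboxes_per_server, years):
    """Project mailbox counts and archiving servers year by year.

    annual_growth is a fraction (0.25 means 25% per year).
    Returns a list of (year, mailboxes, servers) tuples for
    year 0 through `years`, assuming linear server scaling.
    """
    rows = []
    for year in range(years + 1):
        # round() avoids off-by-one results from floating-point noise.
        mailboxes = round(mailboxes_now * (1 + annual_growth) ** year)
        servers = math.ceil(mailboxes / mailboxes_per_server)
        rows.append((year, mailboxes, servers))
    return rows
```

If the projected server count grows as fast as the mailbox count, the administration overhead discussed in the earlier points grows with it, which is worth comparing across the products on your shortlist.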

Managing an archiving environment isn't only about making users happy with transparency, risk managers happy with retention, legal people happy with search criteria and IT people happy with a smaller Exchange database. You also need to understand the investment you may need to make in additional administrative overhead to manage your archiving environment.

By considering these 10 issues, along with archiving feature and function requirements, you can align your understanding of the process with vendor statements and real-world results.

About the author
Dick Benton is a principal consultant at GlassHouse Technologies, Framingham, Mass.
