Data retrieval strategies: Retrieving email and database archives overview
Email and database applications have emerged as mission-critical business tools, but managing the spiraling volume of email and database records has proven to be an enormous challenge for storage administrators. The data generated by email and database systems can quickly bloat a storage system, and that data is frequently offloaded to an archival storage system where it can be retrieved as needed. In practice, email and database archives share many of the same concerns as more traditional archives -- the data should be unchanging, readily searchable and managed with a strict retention/deletion policy. However, email and database archives are even more sensitive to search and retention issues than other forms of data, and long-term archival storage demands special considerations.
Litigation and e-discovery support
Email is often at the crux of litigation -- proving or disproving allegations ranging from personal harassment to corporate misconduct and more. The success or failure of a legal action may hinge on the prompt retrieval of a key email or thread in response to a discovery request. Consequently, it's important to capture every new email (a process called "journaling") and then apply comprehensive indexing so that each message can be located later. For example, one email archive tool indexes the keywords and properties (to, from, subject, date, etc.) of each email and its attachments. Search tools can then query the index metadata and even perform deeper content searches within each email message. Industry experts agree that comprehensive indexing at the time the email is captured will streamline searches later.
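The capture-then-index flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular archive product works: the JournalIndex class, its journal and search methods, and the field:value term format are all hypothetical names chosen for the example. It uses the standard library email parser and handles only simple, single-part messages.

```python
import re
from collections import defaultdict
from email import message_from_string


def tokenize(text):
    """Lowercase the text and split it into simple keyword tokens."""
    raw_tokens = re.findall(r"[a-z0-9@._-]+", text.lower())
    # Trim punctuation left on the edges of a token, e.g. "attached." -> "attached".
    return [t.strip("._-") for t in raw_tokens if t.strip("._-")]


class JournalIndex:
    """Hypothetical inverted index that is updated as each message is journaled."""

    def __init__(self):
        self.messages = {}              # message id -> parsed message
        self.index = defaultdict(set)   # "field:token" -> set of message ids

    def journal(self, raw):
        """Capture one raw message and index it immediately."""
        msg = message_from_string(raw)
        msg_id = len(self.messages)
        self.messages[msg_id] = msg
        # Index the message properties named in the article.
        for field in ("to", "from", "subject", "date"):
            for token in tokenize(msg.get(field, "")):
                self.index[f"{field}:{token}"].add(msg_id)
        # Index body keywords for deeper content searches (single-part only).
        body = msg.get_payload()
        if isinstance(body, str):
            for token in tokenize(body):
                self.index[f"body:{token}"].add(msg_id)
        return msg_id

    def search(self, *terms):
        """Return the ids of messages that match every term (logical AND)."""
        hits = [self.index.get(term.lower(), set()) for term in terms]
        return set.intersection(*hits) if hits else set()
```

With an index like this, a discovery request such as "everything from a given sender that mentions a forecast" becomes a call like search("from:alice@example.com", "body:forecast") rather than a scan of every archived message.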
Email retention and data retrieval are also influenced by features like "litigation hold." For example, data that may be subject to legal discovery can be placed into a protected mode where it cannot be deleted even after its retention period expires. This ensures that pertinent data will remain available throughout the course of any legal proceeding.
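One way to picture the interaction between a retention period and a litigation hold is a purge routine that checks both before deleting anything. The following Python sketch is an assumption-laden illustration: the ArchivedItem class, its on_hold flag and the purge function are hypothetical names, and real products will model holds in their own way.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchivedItem:
    """Hypothetical archive record with a retention period and a hold flag."""
    item_id: str
    archived_on: date
    retention_days: int
    on_hold: bool = False  # set while the item may be subject to discovery

    def retention_expired(self, today):
        return today >= self.archived_on + timedelta(days=self.retention_days)

    def may_delete(self, today):
        # A litigation hold overrides retention expiry.
        return self.retention_expired(today) and not self.on_hold


def purge(items, today):
    """Split items into (kept, deleted) according to retention and holds."""
    kept, deleted = [], []
    for item in items:
        (deleted if item.may_delete(today) else kept).append(item)
    return kept, deleted
```

In this sketch an expired item under hold stays in the kept list on every purge run, and it becomes eligible for deletion only after the hold flag is cleared.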
Restoration, storage and security
Email archives also frequently fill a backup role in the enterprise, preserving emails and database records against accidental deletion or corruption. For example, a user may delete an email containing important account information, but that message can be located and restored from an archive -- often by the user themselves without any direct interaction from the storage administrator.
Prior to commercial email archive tools, individual users created their own .pst (personal store) files to archive emails to their PC or corporate data center. However, .pst files can be quite large -- Outlook 2003/2007 limits the size of each file to 20 GB. In an organization with many employees, .pst files can demand an inordinate amount of storage space. Once an email archive system is put into place, most storage administrators actively discourage user .pst files in the enterprise. Eliminating .pst files saves storage resources and makes it easier to enforce corporate-wide email retention and deletion policies. Eliminating .pst files also reduces the burden of email discovery since there would be considerably less email data to examine.
Many email archive platforms include data reduction techniques to help conserve storage space. The most common form of data reduction is often termed "stubbing," where only one copy of an email attachment is actually saved and every subsequent instance is replaced with a pointer to that one copy. This approach, also known as single-instance storage, is a form of data deduplication.
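The one-copy-plus-pointers idea can be illustrated with a small content-addressed store. This Python sketch is not taken from any specific email platform: the AttachmentStore class and its method names are hypothetical, and using a SHA-256 digest as the pointer is an assumption made for the example.

```python
import hashlib


class AttachmentStore:
    """Hypothetical store that keeps one physical copy of each distinct attachment."""

    def __init__(self):
        self.blobs = {}  # content digest -> attachment bytes (the single copy)
        self.refs = {}   # (message id, filename) -> digest (the pointer)

    def save(self, message_id, filename, data):
        """Record an attachment; store its bytes only if they are new."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        self.refs[(message_id, filename)] = digest
        return digest

    def load(self, message_id, filename):
        """Follow the pointer for a message back to the stored bytes."""
        return self.blobs[self.refs[(message_id, filename)]]

    def stored_bytes(self):
        """Total bytes physically held, after deduplication."""
        return sum(len(blob) for blob in self.blobs.values())
```

If the same report is attached to a message sent to many recipients, each save call adds only a small pointer entry, while stored_bytes grows by the size of the report just once.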
Since end users can frequently search and restore their own email messages from the archive, security is also critically important. Many email archive platforms allow users to search their own messages, while supervisors can search email belonging to the groups they oversee. Similarly, a user or group can be granted access to archived messages within a specified folder. Administrators must use care when assigning retrieval rights and take steps to ensure that messages cannot be altered or deleted by those users.
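The retrieval rights described above amount to a read-only view that is filtered by who is asking. The Python sketch below is a simplified illustration under stated assumptions: the ArchivedMessage and Principal classes, the supervised set and the search function are hypothetical, and immutability is modeled with a frozen dataclass rather than with any real platform's controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedMessage:
    """Hypothetical archived message; frozen so callers cannot alter it."""
    message_id: str
    owner: str
    subject: str


@dataclass(frozen=True)
class Principal:
    """Hypothetical user; supervisors list the users whose mail they may search."""
    name: str
    supervised: frozenset = frozenset()


def can_read(principal, message):
    """Users read their own mail; supervisors also read their group's mail."""
    return message.owner == principal.name or message.owner in principal.supervised


def search(principal, archive, keyword):
    """Return only the matching messages this principal is allowed to see."""
    needle = keyword.lower()
    return [
        m for m in archive
        if can_read(principal, m) and needle in m.subject.lower()
    ]
```

Here an ordinary user who searches for "budget" sees only their own messages, a supervisor sees matches from everyone in the supervised set, and neither can modify a returned message because the records are frozen.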
The effect of change
Another aspect of email/database data retrieval is the continued readability of the archival media. Anytime the underlying software or hardware is changed, archive readability can be affected. For example, email archives may be offloaded to a tape library. If that tape library is later upgraded to a newer technology, older archive tapes may no longer be readable. This might happen when moving from DLT to LTO technology. Similarly, upgrading or replacing the email/database archive software can also render previous media unreadable. It's important for storage administrators to consider the effect of changes on existing archives and have a refresh plan available that will migrate existing archives to the new platform.
One means of change mitigation is the use of an archive service provider. This places the burden of archive management and maintenance on a third-party firm and allows your enterprise to manage recurring costs by paying only for the level of service used. You don't need to worry about recovery as the provider's infrastructure evolves. However, you will need to make a commitment to that particular service provider and rely on its integrity and availability into the future. Organizations with significant legal exposure (e.g., banking) generally do not opt for third-party services.
This was first published in May 2007