HOME  |  CURRENT ISSUE
Database archiving done right
http://searchStorage.techtarget.com/magItem/0,291266,sid5_gci1258152,00.html
by: James Damoulakis and Steve Jeffreys
Issue: Mar 2004

As springtime approaches, perennial household chores such as spring cleaning take center stage. Unlike an annual once-over for a messy house, deciding what to clean in a relational database is an ongoing problem.

File-oriented data can be managed using familiar operating system utilities, but from a storage administrator's perspective, a database is a monolithic container. The storage administrator only knows the size of the container and where it's located. Managing its contents is the domain of the database administrator. But make no mistake--all databases need to be archived for three main reasons:

  1. Regulatory compliance. For many organizations, the primary reason for considering an archiving strategy is to address regulatory concerns and risks. These days, almost everyone is aware of the increased focus on the need to retain and retrieve certain kinds of information and the penalties associated with the inability to do so. Many of these regulations dictate long-term retention requirements that place a significant burden on organizations attempting to comply.


  2. Managing data growth. Another factor is the continuing--and often uncontrolled--growth of data. Corporate databases are becoming the primary repositories for critical enterprise data, and are growing at an estimated 60% to 125% annually. Many primary-tier applications are built on relational databases, and the use of advanced data protection techniques such as rolling snapshot volumes and remote replication are most often applied to database volumes. It's surprising how many copies of the data may exist when you add backups to the mix (see "When is a gigabyte not a gigabyte?").


  3. Application Performance. One of the most critical drivers for database archiving is application performance. Quite simply, as databases grow, they slow down. Typically, 50% or more of the data residing in databases is historical or inactive. Yet when database searches or lookups are performed, this inactive data is processed and combed through along with current data, resulting in a significantly slower application response. And of course, the performance of ancillary activities such as backup and recovery is also affected (see "Why upgrade when you can archive?").
Issues and challenges
Archiving data that resides within a database presents some significant challenges. The first challenge is to determine what needs to be archived and when: in other words, data classification and policy development. This can be a complex issue because in most organizations, arriving at an answer is a multidepartmental and multifunctional effort. Cross-functional teams of IT infrastructure and application groups and lines of business, as well as functional areas such as finance and legal, are required to classify data and establish policies for movement and retention. Then, there's the whole range of technical and process issues. These include:

Heterogeneity. Most organizations deploy a wide range of applications that are often built on several different databases. This could include legacy products or older versions of current products. Archiving this range of information could require a variety of tools and processes, thereby increasing complexity.

Application complexity. Even after completing data classification and establishing policies, the rules associated with the business logic of the application and internal relationships and dependencies must be considered. This requires a comprehensive understanding of the application.

Retrieval considerations. Storing years of database backup tapes isn't difficult; retrieving a particular set of information is the hard part. The difficulty of retrieving this information increases in direct correlation to the age of the data. Issues such as media readability and compatibility, system dependencies and application versions must be considered. Above all, there's the problem of locating and identifying the specific information among the many generations of data.

Data destruction. The news is rife with high-profile investigations demonstrating that in many situations, it's just as important to destroy data as it is to retain it. Data retained for longer periods than required for regulatory purposes may become a liability. If an investigation or legal proceeding takes place and some relevant data is discovered, it may need to be produced, even if there was no legal obligation to retain it. In addition to potential liabilities, the cost of retrieving the data can be substantial.

Where is the technology today?
Specialized database archiving technology is relatively young, with dedicated third-party products entering the market in late 1999 and 2000. The enterprise faced with problematic database growth must choose among three options: ad hoc methods, administrative tools native to their database applications or specialized third-party archiving products (see "Comparison of database archiving approaches").

Comparison of database archiving approaches

Ad hoc archiving typically begins as a reaction to intolerable circumstances. The database has grown so large that batch processing windows are being missed, and ever-expanding storage and processing requirements catch the attention of senior management. The quick response is the equivalent of an emergency liposuction: A highly skilled team or individual combs through the database, pulling out as much old or unused data as they safely can. Time, cost and expertise constraints limit the depth and breadth of this exercise, so only the easy targets are found.

The operation continues until the database is lean enough to meet minimal operational requirements. If done with some consideration for the future, the purged data will be kept in a separate instance of the database, so referential integrity will (or may) survive, as will some ability for future access. However, at the next crisis a different team may respond using different techniques, and over time it's a near certainty that the purged data will be either lost or become so disjointed as to be unusable. Building and supporting a customized tool to handle the situation effectively is often beyond the budgets and skill sets available.

Native database administrative tools start to look attractive after the enterprise has dealt with a couple of crisis situations. Tools native to database vendors have the advantage of working well with the application and are supported by expert technologists. Oracle, SAP and other application vendors have their own tools and methodologies. The tools make it easier to extract and manage data; continued access by the application is straightforward. However, it isn't uncommon for a major upgrade of a database product to require production data to be brought forward into a new format incompatible with the previous version. This is no small task.

Third-party database archiving products

Maintaining access to archived data requires that it be migrated with production data, which adds to the support burden. However, many native database management tools weren't designed with archiving in mind, and lack the notion of higher-level policies to drive and automate the archiving process. This can be overcome by building custom scripts and procedures to manage the process, but this brings us back to the long-term support issues involved with the ad hoc approach. In a single-vendor shop, this may be manageable. In a large shop with several different applications or databases, the problem is managing the separate solutions across all of the different platforms. An enterprise with active DB2, Oracle, SAP, Siebel and PeopleSoft applications is faced with developing and managing separate archiving solutions for each flavor, and the costs quickly become untenable.

A small number of third-party products are now available that are specially designed for database archiving, such as products offered by Princeton Softech, in Princeton, NJ, and OuterBay Technologies, Cupertino, CA (see "Third-party database archiving products"). These products feature policy engines to help capture and manage the business rules that drive the archiving process. They have tools that facilitate extracting data and managing archives. Some also have monitoring and reporting facilities to track data growth and make projections on database and archive size. They provide a way for the application to easily access archived data as needed.

Archived data can be kept online in application-native format or in an application-independent format that still preserves referential integrity. The latter is especially useful for moving less-used data to nearline or offline storage for extended periods of time, without the necessity of bringing archived data forward with each new release of the application. Once an archiving policy for the database has been established, the archiving product handles movement of data between application and archive, and performs translation between application versions as needed. Additionally, these specialized third-party tools cover an increasing number of database and enterprise resource planning (ERP) applications, so that one archiving tool can serve the entire enterprise.

Getting started
Developing an archiving strategy is a significant undertaking. As mentioned earlier, a cross-functional team will be required to ensure that the business and technical needs are adequately addressed. The phases for developing and deploying an archiving strategy include:

The greatest challenge to database archiving isn't technology. The greatest obstacle for most organizations is coming to an agreement on what to archive and establishing appropriate policies for archived data. To address this, the first step is to assess your needs. Is the primary factor regulatory compliance and long-term retention, or is it capacity and performance? For some industries, compliance issues may require adopting records-management applications in addition to data archiving techniques.

Application data characteristics and dependencies can greatly impact the feasibility and cost to implement archiving. A major application like PeopleSoft can have thousands of tables, and understanding the business rules and logic to determine how to archive this data isn't trivial. Application complexity is a major driver for the adoption of third-party archiving tools. The classification of data and the development of policies for retention, migration to the archive and capabilities for retrieval are essential.

The phases of product evaluation and piloting should not just focus on the technology. They should also include the development and testing of standard operating procedures and the identification of roles and responsibilities needed to ensure that archiving policy requirements can be met. A wide-range rollout of an archiving solution demands regular monitoring and measurement to ensure policy compliance and evaluate whether performance and data capacity levels are meeting expectations.

The benefits of a successfully deployed database archiving strategy can be far-reaching. Performance improvements, better storage management and improved data retention are significant paybacks. Third-party database archiving products are starting to play a more prominent role in automating the archiving process. Take the necessary time to properly evaluate, design and test these new database archiving applications to achieve success.

 

Storage Magazine is a part of the TechTarget portfolio of enterprise IT-focused media.
Copyright 2001-2004, TechTarget. All Rights Reserved.