As springtime approaches, perennial household chores such as spring cleaning take center stage. Unlike an annual once-over for a messy house, deciding what to clean in a relational database is an ongoing problem.
File-oriented data can be managed using familiar operating system utilities, but from a storage administrator's perspective, a database is a monolithic container. The storage administrator only knows the size of the container and where it's located. Managing its contents is the domain of the database administrator. But make no mistake: all databases need to be archived, and archiving them raises several challenges:
Heterogeneity. Most organizations deploy a wide range of applications that are often built on several different databases. This could include legacy products or older versions of current products. Archiving this range of information could require a variety of tools and processes, thereby increasing complexity.
Application complexity. Even after completing data classification and establishing policies, the rules associated with the business logic of the application and internal relationships and dependencies must be considered. This requires a comprehensive understanding of the application.
Retrieval considerations. Storing years of database backup tapes isn't difficult; retrieving a particular set of information is the hard part. Retrieval becomes more difficult as the data ages. Issues such as media readability and compatibility, system dependencies and application versions must be considered. Above all, there's the problem of locating and identifying the specific information among the many generations of data.
Data destruction. The news is rife with high-profile investigations demonstrating that in many situations, it's just as important to destroy data as it is to retain it. Data retained for longer periods than required for regulatory purposes may become a liability. If an investigation or legal proceeding takes place and some relevant data is discovered, it may need to be produced, even if there was no legal obligation to retain it. In addition to potential liabilities, the cost of retrieving the data can be substantial.
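The retention and destruction points above can be made concrete with a small sketch. It assumes a hypothetical two-threshold policy (archive after one year, destroy after seven); the class name, field names and thresholds are illustrative only, and real retention periods depend on the regulations that apply to the data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy; names and thresholds are illustrative assumptions."""
    archive_after_days: int   # move out of production once a record is this old
    destroy_after_days: int   # destroy once the retention period has passed


def disposition(record_date: date, today: date, policy: RetentionPolicy) -> str:
    """Return 'keep', 'archive' or 'destroy' for a record with the given date."""
    age_days = (today - record_date).days
    if age_days >= policy.destroy_after_days:
        return "destroy"
    if age_days >= policy.archive_after_days:
        return "archive"
    return "keep"


# Assumed thresholds: archive after one year, destroy after seven.
policy = RetentionPolicy(archive_after_days=365, destroy_after_days=7 * 365)
today = date(2005, 3, 1)
for record_date in (date(2004, 12, 1), date(2002, 6, 30), date(1997, 1, 15)):
    print(record_date, disposition(record_date, today, policy))
```

Checking the destruction threshold first means data past its retention period is never left sitting in the archive by default.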
Where is the technology today?
Specialized database archiving technology is relatively young, with dedicated third-party products entering the market in late 1999 and 2000. The enterprise faced with problematic database growth must choose among three options: ad hoc methods, administrative tools native to their database applications or specialized third-party archiving products (see "Comparison of database archiving approaches").
[Figure: Comparison of database archiving approaches]
Ad hoc archiving typically begins as a reaction to intolerable circumstances. The database has grown so large that batch processing windows are being missed, and ever-expanding storage and processing requirements catch the attention of senior management. The quick response is the equivalent of an emergency liposuction: A highly skilled team or individual combs through the database, pulling out as much old or unused data as they safely can. Time, cost and expertise constraints limit the depth and breadth of this exercise, so only the easy targets are found.
The operation continues until the database is lean enough to meet minimal operational requirements. If done with some consideration for the future, the purged data will be kept in a separate instance of the database, so referential integrity will (or may) survive, as will some ability for future access. However, at the next crisis a different team may respond using different techniques, and over time it's a near certainty that the purged data will be either lost or become so disjointed as to be unusable. Building and supporting a customized tool to handle the situation effectively is often beyond the budgets and skill sets available.
Native database administrative tools start to look attractive after the enterprise has dealt with a couple of crisis situations. Tools supplied by the database or application vendor have the advantage of working well with the application and are supported by expert technologists. Oracle, SAP and other application vendors have their own tools and methodologies. The tools make it easier to extract and manage data; continued access by the application is straightforward. However, it isn't uncommon for a major upgrade of a database product to require production data to be brought forward into a new format incompatible with the previous version. This is no small task.
[Figure: Third-party database archiving products]
Maintaining access to archived data requires that it be migrated along with production data, which adds to the support burden. In addition, many native database management tools weren't designed with archiving in mind, and lack the notion of higher-level policies to drive and automate the archiving process. This can be overcome by building custom scripts and procedures to manage the process, but this brings us back to the long-term support issues involved with the ad hoc approach. In a single-vendor shop, this may be manageable. In a large shop with several different applications or databases, the problem is managing the separate solutions across all of the different platforms. An enterprise with active DB2, Oracle, SAP, Siebel and PeopleSoft applications is faced with developing and managing separate archiving solutions for each flavor, and the costs quickly become untenable.
A small number of third-party products designed specifically for database archiving are now available, such as those offered by Princeton Softech, Princeton, NJ, and OuterBay Technologies, Cupertino, CA (see "Third-party database archiving products"). These products feature policy engines to help capture and manage the business rules that drive the archiving process. They have tools that facilitate extracting data and managing archives. Some also have monitoring and reporting facilities to track data growth and make projections on database and archive size. They provide a way for the application to easily access archived data as needed.
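To give a rough sense of what a size projection involves, here is a minimal sketch that fits a straight line to monthly size samples and extrapolates it. It doesn't reflect how any particular product works; the function name, the sample figures and the assumption of linear growth are all illustrative.

```python
def project_size(sizes_gb: list[float], months_ahead: int) -> float:
    """Fit a least-squares line to monthly size samples and extrapolate.

    sizes_gb[i] is the database size in GB at month i (oldest first).
    Linear growth is an assumption; real growth may be seasonal or stepped.
    """
    n = len(sizes_gb)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(sizes_gb) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sizes_gb))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    # The last sample sits at month n - 1, so project from there.
    return intercept + slope * (n - 1 + months_ahead)


# Hypothetical samples: four months of growth at 10 GB per month.
samples = [100.0, 110.0, 120.0, 130.0]
print(f"Projected size in 6 months: {project_size(samples, 6):.1f} GB")
```

Running the same projection before and after an archiving run gives a simple measure of how much headroom the archive has bought.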
Archived data can be kept online in application-native format or in an application-independent format that still preserves referential integrity. The latter is especially useful for moving less-used data to nearline or offline storage for extended periods of time, without the necessity of bringing archived data forward with each new release of the application. Once an archiving policy for the database has been established, the archiving product handles movement of data between application and archive, and performs translation between application versions as needed. Additionally, these specialized third-party tools cover an increasing number of database and enterprise resource planning (ERP) applications, so that one archiving tool can serve the entire enterprise.
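The movement of data between application and archive can be sketched in a few lines. The example below uses SQLite and a hypothetical two-table schema (`orders` and `order_items`, with matching `archive_` tables); the schema, the cutoff date and the function names are assumptions for illustration, and a real application would have far more tables and business rules to account for. The point is that parent and child rows move together in one transaction, so referential integrity holds on both sides.

```python
import sqlite3


def archive_orders(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders dated before `cutoff` (ISO date), with their line items,
    into the archive tables. Returns the number of orders moved."""
    old_ids = "(SELECT id FROM orders WHERE order_date < ?)"
    with conn:  # one transaction: commit on success, roll back on error
        # Copy parents before children so the archive's foreign key holds.
        conn.execute(
            "INSERT INTO archive_orders "
            "SELECT * FROM orders WHERE order_date < ?", (cutoff,))
        conn.execute(
            "INSERT INTO archive_order_items "
            f"SELECT * FROM order_items WHERE order_id IN {old_ids}", (cutoff,))
        # Delete children before parents so production's foreign key holds.
        conn.execute(
            f"DELETE FROM order_items WHERE order_id IN {old_ids}", (cutoff,))
        cur = conn.execute(
            "DELETE FROM orders WHERE order_date < ?", (cutoff,))
        return cur.rowcount


def demo() -> sqlite3.Connection:
    """Build an in-memory database with the hypothetical schema and data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
        CREATE TABLE order_items (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES orders(id),
            sku TEXT NOT NULL);
        CREATE TABLE archive_orders (
            id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
        CREATE TABLE archive_order_items (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES archive_orders(id),
            sku TEXT NOT NULL);
        INSERT INTO orders VALUES (1, '1999-05-01'), (2, '2004-11-15');
        INSERT INTO order_items VALUES
            (1, 1, 'A-100'), (2, 1, 'B-200'), (3, 2, 'A-100');
    """)
    return conn


conn = demo()
print("orders archived:", archive_orders(conn, "2003-01-01"))
```

Because everything happens inside one transaction, a failure partway through leaves both production and archive unchanged, which avoids the disjointed, half-purged state that ad hoc cleanups tend to produce.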
Getting started
Developing an archiving strategy is a significant undertaking. As mentioned earlier, a cross-functional team will be required to ensure that the business and technical needs are adequately addressed. The phases for developing and deploying an archiving strategy include:
Application data characteristics and dependencies can greatly affect the feasibility and cost of implementing archiving. A major application like PeopleSoft can have thousands of tables, and understanding the business rules and logic to determine how to archive this data isn't trivial. Application complexity is a major driver for the adoption of third-party archiving tools. The classification of data and the development of policies for retention, migration to the archive and capabilities for retrieval are essential.
The phases of product evaluation and piloting should not just focus on the technology. They should also include the development and testing of standard operating procedures and the identification of roles and responsibilities needed to ensure that archiving policy requirements can be met. A wide-scale rollout of an archiving solution demands regular monitoring and measurement to ensure policy compliance and evaluate whether performance and data capacity levels are meeting expectations.
The benefits of a successfully deployed database archiving strategy can be far-reaching. Performance improvements, better storage management and improved data retention are significant paybacks. Third-party database archiving products are starting to play a more prominent role in automating the archiving process. Take the necessary time to properly evaluate, design and test these new database archiving applications to achieve success.