This article can also be found in the Premium Editorial Download "Storage magazine: Hot tips for buying storage technology."
|When is a gigabyte not a gigabyte? (continued)|
| is that other copies usually exist and are controlled by different organizations. Each parallel line of development will have its own snapshot of a stable copy of the production data, plus another copy to test against. Both are RAID-protected like the production data. An independent test organization will have similar storage requirements. The stable copy of data is probably backed up to tape for both organizations. In this scenario, a single development and test effort would add up to ten more copies of the data. The line-of-business staff may request additional backups of the production data before any significant change to an application, perhaps twice a month.
Viewed across the entire enterprise, one can see how each gigabyte of production data could require an additional 25 copies of the data to support it. Are you running a tight shop, and using tape-based disaster recovery to save money on storage? It would still be hard to operate with fewer than ten copies of the data.
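To put rough numbers on that multiplier, the arithmetic can be sketched in a few lines of Python. The function name and the 500GB database are illustrative assumptions. The copy counts of 25 and 10 are the estimates given above, and the sketch treats every copy as full-size, which overstates the cost of copies kept on tape.

```python
def total_footprint_gb(production_gb: float, extra_copies: int) -> float:
    """Storage consumed by the production data plus its supporting copies."""
    if production_gb < 0 or extra_copies < 0:
        raise ValueError("sizes and copy counts must be non-negative")
    # One production copy plus every supporting copy, all assumed full-size.
    return production_gb * (1 + extra_copies)


# Hypothetical 500GB production database.
typical = total_footprint_gb(500, 25)  # enterprise-wide estimate from the text
tight = total_footprint_gb(500, 10)    # the "tight shop" lower bound
print(f"typical: {typical:,.0f}GB, tight shop: {tight:,.0f}GB")
# prints "typical: 13,000GB, tight shop: 5,500GB"
```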
Where is the technology today?
Specialized database archiving technology is relatively young, with dedicated third-party products entering the market in late 1999 and 2000. The enterprise faced with problematic database growth must choose among three options: ad hoc methods, administrative tools native to their database applications or specialized third-party archiving products (see "Comparison of database archiving approaches").
|Comparison of database archiving approaches|
Ad hoc archiving typically begins as a reaction to intolerable circumstances. The database has grown so large that batch processing windows are being missed, and ever-expanding storage and processing requirements catch the attention of senior management. The quick response is the equivalent of an emergency liposuction: A highly skilled team or individual combs through the database, pulling out as much old or unused data as they safely can. Time, cost and expertise constraints limit the depth and breadth of this exercise, so only the easy targets are found.
The operation continues until the database is lean enough to meet minimal operational requirements. If done with some consideration for the future, the purged data will be kept in a separate instance of the database, so referential integrity will (or may) survive, as will some ability for future access. However, at the next crisis a different team may respond using different techniques, and over time it's a near certainty that the purged data will either be lost or become so disjointed as to be unusable. Building and supporting a customized tool to handle the situation effectively is often beyond the budgets and skill sets available.
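For readers who want to see what a careful ad hoc purge looks like, here is a minimal sketch using Python's built-in sqlite3 module. Everything about the schema is hypothetical (an `orders` table with dependent `order_items` rows), and an attached in-memory database stands in for the separate archive instance. The point it illustrates is that parent and child rows move together in a single transaction, so neither the production database nor the archive is left holding orphaned rows.

```python
import sqlite3

# Hypothetical schema, created identically in production ("main") and archive.
SCHEMA = """
CREATE TABLE {db}.orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE {db}.order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    sku TEXT NOT NULL
);
"""

def archive_old_orders(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders dated before `cutoff`, with their line items, to the archive."""
    old_ids = "(SELECT id FROM main.orders WHERE order_date < ?)"
    with conn:  # one transaction: commit on success, roll back on any error
        # Copy parents first, then children, so archive foreign keys hold.
        conn.execute(
            "INSERT INTO archive.orders "
            "SELECT * FROM main.orders WHERE order_date < ?", (cutoff,))
        conn.execute(
            "INSERT INTO archive.order_items "
            f"SELECT * FROM main.order_items WHERE order_id IN {old_ids}",
            (cutoff,))
        # Delete children before parents so production keeps no orphans.
        conn.execute(
            f"DELETE FROM main.order_items WHERE order_id IN {old_ids}",
            (cutoff,))
        moved = conn.execute(
            "DELETE FROM main.orders WHERE order_date < ?", (cutoff,)).rowcount
    return moved

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity
conn.execute("ATTACH DATABASE ':memory:' AS archive")
for db in ("main", "archive"):
    conn.executescript(SCHEMA.format(db=db))
conn.executemany("INSERT INTO main.orders VALUES (?, ?)",
                 [(1, "1999-03-01"), (2, "2003-11-15")])
conn.executemany("INSERT INTO main.order_items VALUES (?, ?, ?)",
                 [(1, 1, "A-100"), (2, 1, "B-200"), (3, 2, "A-100")])
conn.commit()

moved = archive_old_orders(conn, "2001-01-01")
print(moved, "order(s) archived")
```

Even in this toy form, the cutoff date and the list of tables are hard-coded into the script, which is exactly the kind of knowledge that tends to get lost between one crisis and the next.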
Native database administrative tools start to look attractive after the enterprise has dealt with a couple of crises. Tools supplied by database vendors have the advantage of working well with the application and are supported by expert technologists. Oracle, SAP and other application vendors have their own tools and methodologies. The tools make it easier to extract and manage data, and continued access by the application is straightforward. However, it isn't uncommon for a major upgrade of a database product to require production data to be brought forward into a new format incompatible with the previous version. This is no small task.
|Third-party database archiving products|
Maintaining access to archived data requires that it be migrated with production data, which adds to the support burden. In addition, many native database management tools weren't designed with archiving in mind, and lack the notion of higher-level policies to drive and automate the archiving process. This can be overcome by building custom scripts and procedures to manage the process, but this brings us back to the long-term support issues involved with the ad hoc approach. In a single-vendor shop, this may be manageable. In a large shop with several different applications or databases, the problem is managing the separate solutions across all of the different platforms. An enterprise with active DB2, Oracle, SAP, Siebel and PeopleSoft applications is faced with developing and managing separate archiving solutions for each flavor, and the costs quickly become untenable.
A small number of third-party products are now available that are specially designed for database archiving, such as products offered by Princeton Softech, in Princeton, NJ, and OuterBay Technologies, Cupertino, CA (see "Third-party database archiving products"). These products feature policy engines to help capture and manage the business rules that drive the archiving process. They have tools that facilitate extracting data and managing archives. Some also have monitoring and reporting facilities to track data growth and make projections on database and archive size. They provide a way for the application to easily access archived data as needed.
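What "capturing a business rule" might mean in practice can be shown with a small Python sketch. This is not how any of the products named above work internally. The `ArchivePolicy` class, its fields and the invoice example are all hypothetical, chosen only to show a rule expressed as data rather than buried in a one-off script.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical rule: rows older than `retain_days` may be archived."""
    table: str
    date_column: str
    retain_days: int
    required_status: Optional[str] = None  # e.g. archive only closed records

    def cutoff(self, today: date) -> date:
        """Rows dated strictly before this day are old enough to archive."""
        return today - timedelta(days=self.retain_days)

    def is_eligible(self, row: dict, today: date) -> bool:
        if (self.required_status is not None
                and row.get("status") != self.required_status):
            return False
        return row[self.date_column] < self.cutoff(today)

# Example rule: keep two years of invoices online, archive only closed ones.
policy = ArchivePolicy("invoices", "posted_on", retain_days=730,
                       required_status="closed")
rows = [
    {"id": 1, "posted_on": date(2001, 6, 30), "status": "closed"},
    {"id": 2, "posted_on": date(2001, 6, 30), "status": "open"},
    {"id": 3, "posted_on": date(2003, 9, 1), "status": "closed"},
]
today = date(2004, 3, 1)
eligible = [r["id"] for r in rows if policy.is_eligible(r, today)]
print(eligible)  # prints [1]
```

Because the rule is a plain data object, it can be reviewed by the business owner, stored alongside the archive and applied the same way on every run.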
Archived data can be kept online in application-native format or in an application-independent format that still preserves referential integrity. The latter is especially useful for moving less-used data to nearline or offline storage for extended periods of time, without the necessity of bringing archived data forward with each new release of the application. Once an archiving policy for the database has been established, the archiving product handles movement of data between application and archive, and performs translation between application versions as needed. Additionally, these specialized third-party tools cover an increasing number of database and enterprise resource planning (ERP) applications, so that one archiving tool can serve the entire enterprise.
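As a rough illustration of an application-independent format, the Python sketch below writes parent rows to JSON with their child rows nested inside them, and refuses to write an archive that contains a child without its parent. The table names, the `format_version` field and the layout are assumptions made for the example, not the format of any actual archiving product.

```python
import json

def export_archive(orders: list, items: list) -> str:
    """Serialize orders with their line items nested, keeping keys together."""
    by_order = {o["id"]: [] for o in orders}
    for item in items:
        if item["order_id"] not in by_order:
            # A child without its parent would break referential integrity.
            raise ValueError(
                f"item {item['id']} references missing order {item['order_id']}")
        by_order[item["order_id"]].append(item)
    document = {
        "format_version": 1,  # hypothetical marker for later format changes
        "orders": [{**o, "items": by_order[o["id"]]} for o in orders],
    }
    return json.dumps(document, indent=2, sort_keys=True)

# Hypothetical archived rows.
orders = [{"id": 1, "order_date": "1999-03-01"}]
items = [
    {"id": 1, "order_id": 1, "sku": "A-100"},
    {"id": 2, "order_id": 1, "sku": "B-200"},
]
text = export_archive(orders, items)
restored = json.loads(text)
print(len(restored["orders"][0]["items"]), "line items restored")
```

A plain-text document like this can sit on nearline or offline media and still be read without the original application, at the cost of a translation step if the data ever has to be loaded back.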
This was first published in March 2004