Published: 12 Apr 2007
Archiving ERP tablespace data requires specialized tools and a close working relationship with the application's DBA.
Enterprise resource planning (ERP) applications are demanding, consume increasing amounts of storage and computing resources, and usually need to be running 24/7. All of that translates into a complicated backup and recovery environment. ERP applications are typically the first to be replicated for disaster recovery and, in some cases, parts of the application databases are the first to be archived.
An ERP app contains a treasure trove of data intricately tied to numerous critical processes that reflect the overall performance and health of a business. As more data is collected, the storage infrastructure can become strained, creating slower end-user response times, longer batch runs, shrinking backup windows and recovery tests that miss their recovery time objectives. To address this issue, a company will typically upgrade its ERP infrastructure, believing that more spindles, CPUs and I/O will fix the problem. Eventually, those enhancements will lose effectiveness and the company will decide to archive less-important ERP data, which leads to a whole new set of challenges.
To archive ERP data, specific data needs to be surgically extracted from various database columns, rows and tables across multiple tablespaces on multiple physical LUNs and, ultimately, from multiple disks. The extraction must be done at the ERP application level so that associated business logic can be referenced to perform the extraction. Extraction tools are typically implemented by a database administrator (DBA) with significant input from the business stakeholder. The goal is to extract older, less-critical transactions to reduce the overall size of the database. A storage administrator must understand the type of data produced by the extraction and how to effectively manage the storage that will host it.
Generally, there are two types of ERP archiving techniques: dynamic archiving and static archiving. Both pull content from a production ERP application to reduce its size, but from a storage perspective they require very different approaches.
|Tools for archiving ERP data|
This table lists some of the primary providers of enterprise resource planning (ERP) archiving software. In general, "Dynamic archive tables" are database tables that are accessed less often than their production counterparts. "Static archive files" contain static ERP content. "Document management" refers to the ability to manage and view the archive files, while "Primary" indicates the capability the application is best known for today.
Applimation Inc.'s Informia, Hewlett-Packard Co.'s StorageWorks Reference Information Manager (RIM) for Databases (formerly OuterBay's Application Data Management suite) and Solix Technologies Inc.'s Archivejinni can surgically extract data from a production Oracle database and place it in secondary archive tablespaces (see "Tools for archiving ERP data," this page). These tools understand the database schema and business relevance built into the schema, and then use this information to pull all components of the business transaction from the ERP application. Because data is stored in a structured dynamic format, the archives are sometimes referred to as "live," "active" or "dynamic."
Dynamic archiving can significantly shrink the size of an ERP production app and reduce the storage required for the production tablespace and all secondary copies of the tablespace: DEV, TEST, MAINT, TRAIN and QA. For example, one large manufacturing company recovered approximately 30TB of high-end storage after deploying archiving. The major advantage to archiving data within the same database instance is that the data is still available within a tablespace structure, so the archived data can be queried in the same manner as the production data. When searching the ERP application, data that's more than three years old may come from archive tables, while data that's three minutes old will come from production tables--but it's all transparent to the end user.
Structured dynamic data is stored in archive database tables. In some cases, the archive database tables reside in the same instance as the production database. Because queries against current data no longer have to read through the older rows, there's less need to access entire tables and tablespaces, which improves application response time. Because the archived data is older and has less immediate business value, fewer people need access and there's less business impact if the data isn't as readily available.
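The separation of older and newer rows described above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module, not how any of the products named here work: the table names, columns, view and cutoff date are all hypothetical, and real tools follow the ERP application's business logic across many related tables rather than filtering one table by date.

```python
import sqlite3

# Hypothetical schema: table, column and view names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_prod    (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
    CREATE TABLE orders_archive (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
    -- One view so a query sees production and archive rows together.
    CREATE VIEW orders_all AS
        SELECT order_id, order_date, amount, 'production' AS source FROM orders_prod
        UNION ALL
        SELECT order_id, order_date, amount, 'archive' AS source FROM orders_archive;
""")
conn.executemany(
    "INSERT INTO orders_prod VALUES (?, ?, ?)",
    [(1, "2002-03-15", 120.0), (2, "2003-11-02", 75.5), (3, "2007-04-01", 310.0)],
)


def archive_older_than(conn, cutoff_date):
    """Move rows dated before cutoff_date into the archive table in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders_prod WHERE order_date < ?",
            (cutoff_date,),
        )
        moved = conn.execute(
            "DELETE FROM orders_prod WHERE order_date < ?", (cutoff_date,)
        ).rowcount
    return moved


# ISO-formatted date strings compare correctly as text.
moved = archive_older_than(conn, "2004-04-12")
rows = conn.execute("SELECT order_id, source FROM orders_all ORDER BY order_id").fetchall()
print(moved, rows)
```

The view is what keeps the move transparent: a query against orders_all returns the same orders before and after archiving, while the production table itself shrinks.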
It's important for the DBA and storage administrator to work together on the design of the archive tablespace. While the DBA will concentrate on separating older and newer data, the storage admin provides physical separation at the LUN level to ensure archive tables can be managed appropriately. For example, if all production ERP data is deployed on Tier-1 disks, the archive ERP data can be on Tier-2 storage. Similarly, if the ERP production data is backed up using a full clone with splits to increase performance, then the archive tablespace might be backed up using a snapshot.
From a storage perspective, the biggest advantage is that a storage administrator can move these less-critical tablespaces to secondary storage. The biggest disadvantage is that dynamic archives are still databases that need to be backed up. While structured dynamic archive tables are very effective at addressing end-user response time issues, they may not fix a shrinking backup window because the archive tablespace must still be backed up on a regular basis. In addition, dynamic archives aren't typically stored on media that's certified by the government for long-term storage. If that certification is required, you must go one step further and extract your ERP content to a static archive.
|SAP's archiving tools|
SAP, unlike most enterprise resource planning (ERP) application vendors, provides a built-in mechanism to archive data. Based on the SAP Archive Development Kit (ADK), SAP offers more than 600 archiving objects. SAP also allows third-party tools to extract business transactions from other database tables. The resulting data is written into SAP archive files containing meta data that, through SAP ArchiveLink, can be managed by a document management system.
Because SAP can handle some of the most complicated tasks of extracting business data, the method for archiving with SAP applications isn't as complicated as with some of the other ERP applications. Third-party products for SAP archiving include Open Text Corp.'s Livelink ECM (formerly IXOS), which leverages SAP's ADK to extract a complete business transaction from production tablespaces to a static file with meta data. The static file is then managed with a document management tool.
Some ERP archiving projects don't archive data into a structured dynamic tablespace, but instead archive the data into a static, file-based structure. Although the goal remains the same--to reduce the size of a production ERP database--the format of the removed content is quite different. EMC Corp.'s Documentum Archive Services for SAP, Livelink ECM from Open Text Corp. (formerly IXOS) and Princeton Softech Inc.'s Optim all extract content into unstructured static files or objects. These tools understand the business logic associated with each transaction. When the content is extracted, business-related meta data is created and associated with the file content.
The meta data enables the file to be cataloged by its business content. Livelink ECM, one of the most frequently used tools in SAP environments, leverages the SAP Archive Development Kit (ADK) to appropriately extract transactions from the SAP tablespace. Only SAP provides an API to extract transactions (see "SAP's archiving tools," this page). The output from Livelink ECM isn't another database table, but rather a file containing appropriate meta data and static content about the archived business transaction. Once the file is created, the corresponding data is removed from the production tablespace to reduce the size of the ERP production database.
Most software companies that develop dynamic archives offer the ability to create static archives. In some cases, these files have a proprietary format (Open Text's Livelink ECM, Princeton Softech's Optim) while other products use XML (HP's RIM for Databases, Solix Technologies' Archivejinni). When stored in a proprietary format, the extraction tool provides methods to review and query the extracted files. XML provides a method to describe the content independently from any specific hardware or software. Either way, the file objects contain the business meta data that makes it possible for the business to retrieve the information and understand its content.
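To make the XML option concrete, here is a minimal sketch of a static archive file that carries its own business meta data. It uses Python's standard xml.etree.ElementTree module; the element names, attributes and sample transaction are invented for illustration and don't reflect the actual output format of any product mentioned here.

```python
import xml.etree.ElementTree as ET


def build_archive_xml(transaction, metadata):
    """Return an XML string holding business meta data plus the transaction's line items."""
    root = ET.Element("archivedTransaction", id=str(transaction["id"]))
    meta = ET.SubElement(root, "metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, key).text = str(value)
    items = ET.SubElement(root, "lineItems")
    for item in transaction["items"]:
        ET.SubElement(items, "item", sku=item["sku"], quantity=str(item["quantity"]))
    return ET.tostring(root, encoding="unicode")


# Hypothetical transaction and meta data, for illustration only.
transaction = {
    "id": 1001,
    "items": [{"sku": "A-100", "quantity": 2}, {"sku": "B-200", "quantity": 1}],
}
metadata = {"customer": "ExampleCo", "fiscalYear": 2003, "documentType": "salesOrder"}
xml_text = build_archive_xml(transaction, metadata)

# Any XML parser can read the file back without the tool that wrote it.
parsed = ET.fromstring(xml_text)
fiscal_year = parsed.findtext("metadata/fiscalYear")
skus = [item.get("sku") for item in parsed.iter("item")]
print(fiscal_year, skus)
```

Because the meta data travels inside the file, a document management system can catalog the transaction by customer or fiscal year without going back to the ERP database.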
Once the ERP content is extracted to a static file, other factors come into play to create an effective archive. The authenticity of the static data must be maintained over time and the media the data is stored on may have to meet government compliance regulations. Historically, static files were stored on optical WORM or tape, but today companies are replacing those technologies with content-addressable storage (CAS) arrays to reduce costs and provide quicker response times. In most cases, the disk arrays also support deduplication or single-instance storage to eliminate storing duplicate data. Data stored in a disk array can also be replicated to a secondary array in another location, which may eliminate the need to back up the archive data. Dynamic archives can be replicated as well, but because of the cost and complexity of database replication, replication is generally reserved for production data only. Whatever media or storage array is used to store the extracted files, a document management system is essential to effectively manage the files and their content.
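The link between content addressing, single-instance storage and authenticity can be shown with a toy example. The sketch below is an in-memory illustration in Python, assuming SHA-256 as the addressing hash; the class and its methods are hypothetical, and commercial CAS arrays differ in their hash algorithms, retention controls and on-disk design.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store keyed by the SHA-256 digest of each object's bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address; identical content is kept only once."""
        address = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        """Return the stored bytes after re-checking them against the address."""
        data = self._objects[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content no longer matches its address")
        return data

    def __len__(self):
        return len(self._objects)


store = ContentAddressedStore()
a1 = store.put(b"<archivedTransaction id='1001'/>")
a2 = store.put(b"<archivedTransaction id='1001'/>")  # duplicate content
a3 = store.put(b"<archivedTransaction id='1002'/>")
print(a1 == a2, len(store))
```

Because the address is derived from the content, storing the same file twice yields the same address and a single stored copy, and any later change to the stored bytes is detected when the object is read back.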
|Backup vs. archive|
The terms backup and archive are often used interchangeably, but they refer to distinctly different technical and business applications. Backing up data typically means creating a copy on media that allows it to be restored quickly in the event of data/network corruption or loss of the original. Backup data generally isn't searchable because it's assumed that if you need to restore, you'll likely restore an entire file or dataset. In addition, backups aren't usually kept for long periods of time or in a manner that guarantees authenticity of the information.
The biggest difference between backups and archives is versioning. Backups can contain multiple versions of a file or dataset to restore data to a specific point in time. An archive is a single version or "state" of data at a particular point in time that must be guaranteed to be authentic with the same integrity it had when it was in production. Another defining attribute of an archive is that once it's created, the information within it is usually removed from online storage.
Archival policies are typically driven by regulatory compliance or by an information lifecycle management initiative. In either case, it's crucial that archived data is searchable and readily available--failing to produce archived data in a timely manner can result in financial and, possibly, legal penalties.
Once ERP data has been extracted to a file, document management becomes critical. In many cases, document management capabilities are provided by the same tool that does the data extraction. For example, Livelink ECM moves data from SAP to files, and provides the means to manage and view the files. Princeton Softech's Optim can extract complete business transactions from JD Edwards, Oracle E-Business Suite and other ERP apps, and includes management and viewing features. It offers similar capabilities for other ERP applications running on Oracle or DB2 databases.
Some archiving applications extract the data and hand off management chores to a separate document management tool. In addition to generating dynamic archives, HP's RIM for Databases and Solix Technologies' Archivejinni can create static archive files, but they don't offer document management capabilities. For example, once RIM has created an XML file or object with appropriate business-related meta data, the meta data must be loaded into a Documentum, FileNet or Mobius-type tool for long-term data management. Documentum Archive Services for SAP typically manages data processed by SAP in the form of "print lists" or reports. In this case, SAP extracts business data from the SAP tables and creates an appropriate file or object with associated meta data. Documentum Archive Services for SAP then uses SAP ArchiveLink to import the file and its meta data into the archive for long-term management.
Securing the archive
With all the movement of content, it's important to ensure that only authorized personnel have access to it. The application managing the content enforces rules that describe who can access which pieces of information. When the data is in a dynamic structured format, the database application maintains the access rules. Once the data is extracted to a file-based archive, the document management application enforces access control.
Database archiving is a crucial step in the long-term management of an ERP production application. Archiving moves data from one level of business criticality to another, with each level having different availability and performance requirements. Some companies stop with the creation of dynamic archive tables, while others may skip this step to develop a static archive. By understanding what data is archived and how it will be accessed, a storage administrator can ensure that archived ERP content is stored and protected appropriately.