This article can also be found in the Premium Editorial Download "Storage magazine: Hot tips for buying storage technology."
When is a gigabyte not a gigabyte?
Like a soldier in the front lines, a gigabyte of production data is supported by many others behind it. To keep the production data available 24x7, behind the scenes are local mirrored copies, remote mirrored copies for disaster recovery and copies for each parallel development and test effort, plus multiple copies of all of these.
Consider the following best practices advocated by various vendors for mission-critical production data:
- Each production volume is protected by mirrored storage.
- To provide fast recovery from errors, two additional rolling snapshots of the data are kept online.
- To enable disaster recovery and business continuance, two copies of the data are kept on a remote storage system. One or both of those copies will be mirrored to protect the storage on the remote side in case production operations must be transferred to the recovery site.
- To facilitate non-disruptive backups of the production data, another independent copy of the data may exist (a rolling snapshot typically does not live long enough to be used to back up a large database).
- Another copy is added to do backups at the recovery site, although it may be possible to double-dip and reuse the space from one of the remote copies used for replicating the data prior to the disaster.
Now, consider the tape backup of production data. There will be at least one full copy of the data, followed by multiple versions of incremental changes to enable the business to recover data back to a given point in time. Assuming 25% of the database contents are modified in one form or another in the course of a week, about four weeks of incremental changes add up to the equivalent of another full copy of the data to protect a month of data. Best practice is to keep one copy of the tapes on site and another in off-site storage, adding another two full copies of data. The total so far is 13 copies of production data, all under the full view and control of the IT department.
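The sidebar does not itemize how it arrives at 13, so the tally below is only one possible reading, not the author's own accounting. It assumes that just one of the two remote copies is mirrored, that the recovery-site backup copy is separate (no double-dipping) and that a month means four weeks of incrementals at 25% each. The labels and the `raw_capacity_gb` helper are illustrative names.

```python
# One possible itemization of the sidebar's copy count, in full-copy equivalents.
WEEKLY_CHANGE_RATE = 0.25  # from the sidebar: 25% of the database modified per week
WEEKS_RETAINED = 4         # assumption: "a month" is treated as four weeks

copies = {
    "production volume": 1,
    "local mirror": 1,
    "rolling snapshots": 2,
    "remote copies": 2,
    "mirror of one remote copy": 1,           # assumption: only one is mirrored
    "backup source copy (production)": 1,
    "backup source copy (recovery site)": 1,  # assumption: no reuse of a remote copy
    "tape: full backup": 1,
    "tape: incrementals": WEEKLY_CHANGE_RATE * WEEKS_RETAINED,
}

# A second set of tapes kept off site duplicates the full backup and the incrementals.
copies["tape: second set"] = copies["tape: full backup"] + copies["tape: incrementals"]

total_copies = sum(copies.values())


def raw_capacity_gb(production_gb: float) -> float:
    """Capacity consumed across all copies for a given amount of production data."""
    return production_gb * total_copies


print(f"Total full-copy equivalents: {total_copies:g}")
print(f"100 GB of production data occupies about {raw_capacity_gb(100):g} GB")
```

Under these assumptions the tally matches the sidebar's 13 copies; mirroring both remote copies, or reusing a remote copy for recovery-site backups, shifts the total by one in either direction.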
File-oriented data can be managed using familiar operating system utilities, but from a storage administrator's perspective, a database is a monolithic container. The storage administrator only knows the size of the container and where it's located. Managing its contents is the domain of the database administrator. But make no mistake--all databases need to be archived for three main reasons:
- Regulatory compliance. For many organizations, the primary reason for considering an archiving strategy is to address regulatory concerns and risks. These days, almost everyone is aware of the increased focus on the need to retain and retrieve certain kinds of information and the penalties associated with the inability to do so. Many of these regulations dictate long-term retention requirements that place a significant burden on organizations attempting to comply.
- Managing data growth. Another factor is the continuing--and often uncontrolled--growth of data. Corporate databases are becoming the primary repositories for critical enterprise data, and are growing at an estimated 60% to 125% annually. Many primary-tier applications are built on relational databases, and advanced data protection techniques such as rolling snapshot volumes and remote replication are most often applied to database volumes. It's surprising how many copies of the data may exist when you add backups to the mix (see "When is a gigabyte not a gigabyte?").
- Application performance. One of the most critical drivers for database archiving is application performance. Quite simply, as databases grow, they slow down. Typically, 50% or more of the data residing in databases is historical or inactive. Yet when database searches or lookups are performed, this inactive data is processed and combed through along with current data, resulting in a significantly slower application response. And of course, the performance of ancillary activities such as backup and recovery is also affected (see "Why upgrade when you can archive?").
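To put the growth and inactive-data figures from the list above in concrete terms, the sketch below compounds a starting database size at the quoted 60% and 125% annual rates and applies the 50% inactive share. The 1 TB starting size, the three-year horizon and the function name are illustrative assumptions, not figures from the article.

```python
def projected_size(start_tb: float, annual_growth: float, years: int) -> float:
    """Size after compounding annual_growth (0.60 means 60%) for the given years."""
    return start_tb * (1 + annual_growth) ** years


START_TB = 1.0         # hypothetical starting size
YEARS = 3              # hypothetical planning horizon
INACTIVE_SHARE = 0.5   # article: typically 50% or more is historical or inactive

for rate in (0.60, 1.25):
    size = projected_size(START_TB, rate, YEARS)
    archivable = size * INACTIVE_SHARE
    print(f"{rate:.0%} growth: {size:.2f} TB after {YEARS} years, "
          f"about {archivable:.2f} TB potentially archivable")
```

With these inputs, a 1 TB database reaches roughly 4.1 TB at the low end of the range and roughly 11.4 TB at the high end, before any of the additional copies are counted.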
Archiving data that resides within a database presents some significant challenges. The first challenge is to determine what needs to be archived and when: in other words, data classification and policy development. This can be a complex issue because in most organizations, arriving at an answer is a multidepartmental and multifunctional effort. Cross-functional teams of IT infrastructure and application groups and lines of business, as well as functional areas such as finance and legal, are required to classify data and establish policies for movement and retention. Then, there's the whole range of technical and process issues. These include:
- Heterogeneity. Most organizations deploy a wide range of applications that are often built on several different databases. This could include legacy products or older versions of current products. Archiving this range of information could require a variety of tools and processes, thereby increasing complexity.
- Application complexity. Even after completing data classification and establishing policies, the rules associated with the business logic of the application and internal relationships and dependencies must be considered. This requires a comprehensive understanding of the application.
- Retrieval considerations. Storing years of database backup tapes isn't difficult; retrieving a particular set of information is the hard part. The difficulty of retrieving this information increases in direct correlation to the age of the data. Issues such as media readability and compatibility, system dependencies and application versions must be considered. Above all, there's the problem of locating and identifying the specific information among the many generations of data.
- Data destruction. The news is rife with high-profile investigations demonstrating that in many situations, it's just as important to destroy data as it is to retain it. Data retained for longer periods than required for regulatory purposes may become a liability. If an investigation or legal proceeding takes place and some relevant data is discovered, it may need to be produced, even if there was no legal obligation to retain it. In addition to potential liabilities, the cost of retrieving the data can be substantial.
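The retention and destruction concerns above can be expressed as a simple policy check. The sketch below is a minimal illustration, not a compliance tool: the record classes and retention periods in `RETENTION_YEARS` are hypothetical placeholders, and real values would come out of the classification and policy work described earlier.

```python
from datetime import date

# Hypothetical retention schedule in years; not taken from any regulation.
RETENTION_YEARS = {"invoice": 7, "web_log": 1}


def retention_expiry(created: date, years: int) -> date:
    """Return the date on which the retention period ends."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # created is Feb 29 and the target year is not a leap year
        return created.replace(year=created.year + years, day=28)


def disposition(record_class: str, created: date, today: date) -> str:
    """Classify a record as still under retention or eligible for destruction."""
    expiry = retention_expiry(created, RETENTION_YEARS[record_class])
    return "retain" if today < expiry else "eligible for destruction"


today = date(2004, 3, 1)
for record_class in RETENTION_YEARS:
    print(record_class, disposition(record_class, date(2003, 1, 15), today))
```

Keeping the schedule in one table, separate from the logic, lets the business and legal owners review the periods without reading code.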
This was first published in March 2004