Editor's Note: This is an excerpt of a three-part article written by Jim Booth for Storage magazine's February 2003 issue. To view the complete text and graphics for this article, click on this link. (Or, if you have not yet signed up for your free subscription to access Storage magazine content, you can do so from the following, online subscription link.)
The movie "Father of the Bride" contains the song "Give Me the Simple Life," an old adage that can be aptly applied to database storage. Sometimes it's the simple things that offer the greatest impact. This article explores data placement issues regarding different kinds of database objects and their I/O characteristics.
Once we know the type of objects being used and the access methods for each object, we can begin to optimize the database storage environment. This will require examining the different types of storage devices available and how each storage component can affect the I/O performance on our database. Storage class, channel connectivity, levels of availability and overall performance are all considerations.
To help you better understand how all the components work together, we'll build a high-performance online transaction processing (OLTP) database storage environment for our hypothetical 34GB database. Hopefully, the techniques you learn here will be transferable to your own database environment.
If you want to enhance database performance, it's important to know the type of data stored in your database. The objects that make up our database have unique considerations, so it's wise to first analyze the type of objects being used and understand how they'll interact as data is placed on the disk drives.
The next step is to characterize database access patterns. The type of datafiles that make up the tablespaces being accessed is critical. A randomly accessed lookup table, for example, won't be treated the same as large binary objects which have a sequential access pattern.
The objects of most interest are the tables, indexes, redo logs and the archive log files. Each of these objects and their subtype have different performance attributes and storage needs. By understanding these needs, we can begin to determine the best data placement for each object. The objects are listed below in order of placement importance.
Redo logs. These must be written immediately to disk and all of the redo logs must be written for an update to be committed within the database. For this reason, the redo logs can have a significant effect on the overall performance of the database, although they're small. In our example, they only consume one out of 34GBs. Careful placement of the redo logs and mirrors is essential.
Tables. They're usually the largest component of the database--10GB random access, 20GB sequential, in our example--and the tables that make up the tablespaces within the database hog the most storage. As a result, table placement can have a significant effect on database performance. Tables are indexed or non-indexed and table access can be sequential or random. It's important to determine the type of tables you have within your database and group them according to their access patterns. For manageability, it's also good practice to minimize the number of tables within the tablespace to allow easier analysis and movement of a badly placed table.
Indexes. They're smaller than tables and should occupy their own tablespace. Because indexes are accessed randomly, they benefit from a smaller block size. Additionally, index tablespaces should be separated from the data tablespaces to avoid conflicting I/O patterns which reduce performance.
Archived redo logs. These require a high level of availability, but don't have high-performance requirements. For this reason, archives can be used to balance access density across physical drives. The only caveat is insuring availability by not placing an archive on a physical drive that contains data tablespace information.
In our example [later in this article], we'll use all these objects with a variety of access patterns. Redo logs will be mirrored, tablespaces striped, indexes isolated and archives moved to slower drives.
Database Characterization. By looking at the components making up the database, you can begin to characterize your database and based on component sizes begin planning the placement of the actual objects on the storage subsystem. Another concern is the type of data access pattern.
Random Access. The tablespaces within our database example will be randomly accessed. Random I/O benefits by placing data on several disks with a large number of channel paths at a low block size so that just the data structure of interest is passed from the disk to the host memory. Random access is mostly found in OLTP applications.
Sequential Access. The redo logs are accessed sequentially. Large block size, contiguous data and pre-fetch algorithms allow the disk subsystems to drive high throughput with a minimum number of I/Os. Data warehousing applications and specialized applications such as streaming make use of the sequential access method.
Applying what we learned about placement of the database objects and the access patterns just discussed, our example database lets us look at the available storage subsystems, weigh the performance and availability considerations and lay down the data.
Logically dividing database storage
Integrated cache disk array
For optimal performance and efficiency, database components should be placed on both high-end (top) and midrange storage.
[...This article is continued on Storage magazine...]
About the author: Jim Booth is a ten-year storage industry veteran, freelance writer and a principle analyst at Hard Problem, Inc., an independent storage research company.