The movie "Father of the Bride" contains the song "Give Me the Simple Life," an old adage that can be aptly applied...
to database storage. Sometimes it's the simple things that offer the greatest impact. This article explores data placement issues regarding different kinds of database objects and their I/O characteristics.
Once we know the type of objects being used and the access methods for each object, we can begin to optimize the database storage environment. This will require examining the different types of storage devices available and how each storage component can affect the I/O performance on our database.
Storage class, channel connectivity, levels of availability and overall performance are all considerations.
To help you better understand how all the components work together, we'll build a high-performance online transaction processing (OLTP) database storage environment for our hypothetical 34GB database. Hopefully, the techniques you learn here will be transferable to your own database environment.
|Logically dividing database storage|
Integrated cache disk array
For optimum performance and efficiency, database components should be placed on both high-end (top) and midrange storage.
If you want to enhance database performance, it's important to know the type of data stored in your database. The objects that make up our database have unique considerations, so it's wise to first analyze the type of objects being used and understand how they'll interact as data is placed on the disk drives.
The next step is to characterize database access patterns. The type of datafiles that make up the tablespaces being accessed is critical. A randomly accessed lookup table, for example, won't be treated the same as large binary objects which have a sequential access pattern.
The objects of most interest are the tables, indexes, redo logs and the archive log files. Each of these objects and their subtype have different performance attributes and storage needs. By understanding these needs, we can begin to determine the best data placement for each object. The objects are listed below in order of placement importance.
Redo logs. These must be written immediately to disk and all of the redo logs must be written for an update to be committed within the database. For this reason, the redo logs can have a significant effect on the overall performance of the database, although they're small. In our example, they only consume one out of 34GBs. Careful placement of the redo logs and mirrors is essential.
Tables. They're usually the largest component of the database--10GB random access, 20GB sequential, in our example--and the tables that make up the tablespaces within the database hog the most storage. As a result, table placement can have a significant effect on database performance. Tables are indexed or non-indexed and table access can be sequential or random. It's important to determine the type of tables you have within your database and group them according to their access patterns. For manageability, it's also good practice to minimize the number of tables within the tablespace to allow easier analysis and movement of a badly placed table.
Indexes. They're smaller than tables and should occupy their own tablespace. Because indexes are accessed randomly, they benefit from a smaller block size. Additionally, index tablespaces should be separated from the data tablespaces to avoid conflicting I/O patterns which reduce performance.
Archived redo logs. These require a high level of availability, but don't have high-performance requirements. For this reason, archives can be used to balance access density across physical drives. The only caveat is insuring availability by not placing an archive on a physical drive that contains data tablespace information.
In our example, (see "Databases need more than one kind of storage") we'll use all these objects with a variety of access patterns. Redo logs will be mirrored, tablespaces striped, indexes isolated and archives moved to slower drives.
Database Characterization. By looking at the components making up the database, you can begin to characterize your database and based on component sizes begin planning the placement of the actual objects on the storage subsystem. Another concern is the type of data access pattern.
Random Access. The tablespaces within our database example will be randomly accessed. Random I/O benefits by placing data on several disks with a large number of channel paths at a low block size so that just the data structure of interest is passed from the disk to the host memory. Random access is mostly found in OLTP applications.
Sequential Access. The redo logs are accessed sequentially. Large block size, contiguous data and pre-fetch algorithms allow the disk subsystems to drive high throughput with a minimum number of I/Os. Data warehousing applications and specialized applications such as streaming make use of the sequential access method.
Applying what we learned about placement of the database objects and the access patterns just discussed, our example database lets us look at the available storage subsystems, weigh the performance and availability considerations and lay down the data.
Each day there seems to be another choice of storage, ranging from small appliance-type devices with a few drives to large unified storage systems designed to handle both block and file data. The type of storage class where your database is stored is the first storage consideration for data placement. The three types of storage classes we're going to consider for data placement are JBOD, Intelligent Controller and Integrated Cached Disk Arrays.
JBOD. Provides the least amount of data protection and performance, but at low cost. To achieve any type of performance with a database placed on this type of storage, one must consider the physical placement on the drives and utilize the operating system or volume manager software to realize performance gains. JBOD could be suitable as a low-cost solution for infrequently referenced data or certain data elements such as archives or backup copies.
Intelligent Controller. While a step above JBOD in performance and availability, a subsystem with an integrated controller is going to do more to offload host resources than to actually improve the overall database performance. Mirroring and striping can be performed at the controller level with this type of subsystem.
Integrated Cached Disk Array (ICDA). Being the most sophisticated of the storage class, it would be easy to just place all the database objects on the ICDA. And while this is a completely acceptable solution, it is also the most expensive. One way to mitigate the cost associated with using the ICDA is to mix physical drive sizes (larger, slower drives for archives) and a mixture of protection schemes (hardware-based mirroring on redo logs).
There are pros and cons for each class of storage subsystem. The goal in selecting a storage class is to mix and match the system types to gain the best price/performance ratio while giving greatest number of options with regard to data placement.
Controllers and channels
At minimum, we'll need two controllers within our storage subsystem--the more the better. Today, some high-end systems have as many as 16 independent front-end controllers. A high number of controllers allows for a high level of parallel I/O through the storage subsystem. Additionally, when we scale, we would like to scale as near linear as possible. By adding additional channels and controllers in a parallel fashion, this can be accomplished.
Another consideration is the debate over Internet Protocol (IP) and Fibre Channel (FC) connectivity. While iSCSI and other IP-based protocols are making great strides, only a small number of latency-tolerant database applications are currently making use of IP-based channel devices. iSCSI is being used primarily for storage network connectivity, which will certainly change as 2Gb and 10Gb IP infrastructures evolve.
Performance and availability
Performance gains come primarily by spreading the datafiles across as many physical drives as possible. This means concatenating or striping across all the drives on separate controllers to gain as much parallel access as possible, while avoiding striping back on the same drive and creating hot spots of high access density.
A logical volume manager residing on the host server not only eases management, but is also used to increase performance. Striping at the host allows parallel access from the host bus adapter (HBA) all the way down to physical drives--or cache slots--being striped across. By starting the parallel access at this level, greater performance can be achieved. In our example, I've chosen Veritas Volume Manager, but there are several other software programs for various platforms that accomplish the same task.
Availability should always be a primary concern when deciding where to place your data. DBAs will tell you that they can't afford to lose any data. While this may be true, there are ways to mitigate the risk of loosing data while balancing the high price of ICDAs. The first step is to use hardware or software to apply RAID protection. While RAID software is fine, it's preferable to make use of hardware RAID to reduce channel overhead, especially when mirroring.
|Key points to remember:|
Mixing and matching
The last storage consideration is making good use of the storage systems you already own. While it would be nice to buy the latest large-cached, high-available subsystem every time you launch a new database application, that's usually not reality.
It's more likely that you'll have to make the most of existing storage by freeing space on more expensive systems by moving less performance and availability critical data components to less costly storage gear.
In our example we use one Integrated Cached Disk Array (EMC/HDS-class) and one Intelligent Controller Array (LSI/Clariion-class). Although we're using EMC and LSI in the example, the important distinction is the class of storage. The high-end class includes EMC Symmetrix, HDS Lightning, and IBM Shark, while controller-based includes LSI, HDS Thunder, NetApp and others. Indexes, redo logs and high-performance tables are most appropriate for the cached array. We'll place the archives and low-performance tablespaces on the controller-based subsystem (see "Logically dividing database storage") which shows the subsystem placement for the database objects.
|How we place data on our arrays|
EMC 8830, 4 channels, 12 73GB drives free, mirrored
LSI 4400, 2 channels, 4 36GB drives free, RAID-5
In our example, we've physically divided the database components between the EMC and LSI arrays. The arrays are connected to a cluster consisting of two Sun 420 servers, Solaris 8, Oracle 91, Veritas Manager and Cluster server, with a block size of 8Kb.
Pulling it all together
After knowing the data objects and available storage platforms, we're ready to determine how the data should be placed on the physical drives. To maximize parallelism, data should be spread across as many physical spindles as possible by striping. Striping is most effective at the host level because you can not only stripe across the disk drives, but also stripe across controllers.
To complete our environment, I added a cluster of Sun servers with a volume manager consisting of two Sun 420 servers Solaris 8, Oracle 9i, Veritas Volume Manager and Cluster Server, with a block size of 8Kb.
The first step is to isolate the redo logs. One log each will go on separate physical 73GB drives on our high-end array (EMC in our case). Because we'll only be using about 1GB of space, use the smallest hyper available on the drive to avoid wasted space. Additionally, because the redo logs are on the EMC, they are mirrored using hardware.
Secondly, the indexes are placed on separate physical drives to isolate the random I/O from other database elements. Ideally, we want to isolate these further by placing the indexes on separate back-end buses, but in this example we would have to reduce the number of redo logs or add more back-end adapters (which in most cases would mean a larger array).
The next performance-sensitive data are the data tablespaces. One of these tablespaces requires high performance, so we'll deal with that first. On the host, the tablespace will be striped across all four channels and reside on eight physical drives using a chuck size of 64Kb. The 64Kb chunk will map nicely to the cache slots on the EMC. The idea is two keep two I/O blocks in the queue for each channel, but not saturate the channels by moving large blocks since this is an OLTP application. We are also taking advantage of dedicated back-end directors by striping across drives that sit behind unique disk adapters. The remaining tablespace will be striped across the two channels and four drives on the LSI system.
Our sequential table could go on either the midrange on the LSI or EMC, but because I'm going to assume mostly read, RAID-5 will work just fine. We'll stripe these tables with a chuck size of 256MB to keep our channels full. Lastly, we'll fill the empty space on the LSI drives with the archive logs.
All our database components now have a home. We've analyzed the database objects and placed them appropriately based on their unique characteristics. The objects are grouped according to I/O access pattern and on the appropriate storage systems for performance and availability.
It's important to note that this example was in a clean environment. That isn't always going to be the case. It's equally important to know data that already exists as well as the new data being added. A comprehensive analysis should be done of the existing environment before placing additional data.