Published: 02 Jul 2012
For many companies, business depends on the performance of their database applications. Here’s how to ensure your storage systems are providing all the help the database apps need.
With so much recent emphasis on unstructured data, it can be easy to overlook the more formally constructed information held in databases, which are often at the heart of corporate operations. Databases process the information a company needs to do business and, in some cases, like the databases behind a Web storefront, they’re directly involved in the creation of the company’s revenue. The faster a database performs, the more transactions it can process, and the quicker it can deliver answers or contribute to the bottom line.
When a database application isn’t performing well, the source of the problem is often the supporting storage system, which could indicate an upgrade is needed to remedy the situation. Of course, an upgrade can be expensive and may not always be as effective as desired. The reality is that improving database performance may not require any additional hardware purchases at all, and won’t always require the amount or type of hardware vendors may suggest.
What to accelerate
Each database management system has its own naming conventions for its constituent components. In general, however, databases have a transaction log file that receives new data. This relatively small file increases database performance by sending an acknowledgement back to the client that the new or modified information has been accepted. This process is much faster than if the information was simply written directly to the main file. Later, in the background, the log file is flushed into the main data file.
There are two performance concerns with the transaction log. First, it needs to be able to accept multiple writes from hundreds or thousands of users simultaneously. Second, it must empty itself fast enough so as not to create a “write-through” situation where update performance is slowed to the point that the log file is ineffective. These two situations typically occur periodically throughout the day when demand is high, leading to unpredictable performance. Write performance is critical for these files.
Another fundamental set of database components are the indexes, the smaller files used to improve search performance. Searches can be done against these indexes, which point the query directly to the row or group of rows that contain the rest of the information related to the searched-for item. This is much faster than searching the entire database, top to bottom. Typically, the database administrator will select a few key fields within a row to be indexed -- “company name” or “account number,” for example. The index needs to be updated any time a new row is added or a key field in an existing row is modified. For index files, write performance is important and read performance is critical.
Finally, there’s the database itself. In most cases, indexes and log files are the primary bottlenecks since the database is either updated or queried as a result of activities related to those files. But there are situations when an index search isn’t possible, there are too many fields to be searched or indexing won’t provide a performance boost. There are also certain databases where the granular control over where these files are placed isn’t a user function, and is instead hard set by the application.
Is storage really the problem?
The first step is to make sure the performance problem is indeed storage. The simplest way to do that is to review CPU utilization on the server(s) executing the database application. In general, if CPU utilization is low but performance is a problem, there’s a good chance it’s a storage performance issue. But there are also more detailed approaches that can and should be used. Almost every database application includes its own performance analysis tool, plus there are third-party applications and consultants who specialize in database apps. In general, it’s amazing how accurate the instincts of an experienced database administrator can be. If the DBA is complaining about performance, storage managers should listen.
How storage can be the problem
Storage can slow down a database application in two ways. First, the storage can be overwhelmed by the number of inbound requests. These requests are stacked into a queue, the number of which is referred to as “queue depth,” and emptied on a first-in first-out basis. When performance is good, this queue depth is typically in the low single digits. As the queue depth grows, the application has to wait for its requests to be processed. This statistic is relatively easy to monitor as most operating systems will report the queue depth of a given drive or volume. Any operation that involves hundreds or thousands of simultaneous requests will typically add to the disk queue depth.
Storage can also limit database application performance based on its ability to service individual requests. In a traditional hard disk drive array, this “organic” performance is essentially the speed at which a hard drive can locate and retrieve the required data blocks, since the majority of this time is spent waiting for disk platters to rotate to a specific spot. This response time can also be measured with operating system tools. While the number of users may have some impact on response time, it’s often an issue for single-thread operations, like when new data is added or existing data is modified via the log file.
Storage acceleration tune-up
Dealing with indexes. When it comes to deciding which of the three components of the database to pay attention to, indexes are typically the most problematic across a wide variety of use cases. In short, almost all databases are searched and in most cases indexes are used. Generally speaking, the more indexes a database has the slower the update process becomes. Making these indexes update faster and return faster results will almost always improve performance. These indexes are impacted by the number of simultaneous search requests, as well as the time it takes to service those requests. Indexes are mostly read-intensive files, but they do experience significant write activity as indexed values are added or updated.
The correct approach to solving an index performance issue largely depends on the environment. A single database server with data being stored locally can see performance improved by installing a mirrored set of drives on that server, dedicated to storing the index files. Even a single server that’s storing its data on a storage-area network (SAN) can benefit from this approach, especially if the storage network is slow (1 Gb Ethernet iSCSI or 2 Gb Fibre Channel, for example).
If the database is shared across multiple application servers, local disk logging may not be an option. Again, it depends on the database, but usually an improvement to the shared storage environment is needed. This could be as simple as dedicating a set of drives on the SAN to the specific task of storing indexes, once again mirroring them instead of running them in a RAID group. It could even mean short stroking those drives so that only the outer cylinders of each platter are used. Indexes are relatively small and typically don’t need the full capacity of a drive.
In all of the above cases, flash-based solid-state drives (SSDs) offer an alternative to solving the index performance problem. Because index files tend to be small in size, an investment in a couple of enterprise-class SSDs will deliver a very affordable performance boost for a database server using directly attached storage. When shared storage is used, a few SSD drives installed inside the storage system to hold the indexes may alleviate the performance problem, although at a somewhat higher cost. It’s important to note that in both cases a complete storage refresh isn’t required.
Dealing with log files. Log files are similar to indexes except they typically handle write requests exclusively and configurations similar to the ones described above should also improve their performance. There’s a new consideration when flash SSD is used. The write intensity of log files can be very high and flash-based SSDs have limitations in the number of writes they can sustain over their lifetimes. While that write cycle number has improved over the last few years, log files are the one type of workload that can still prematurely wear out a flash-based SSD.
In log file environments where the write intensity will be very high, it’s advisable to use a hot-swappable multi-level cell (MLC) drive technology, to use single-level cell (SLC)-based SSDs or to even consider DRAM-based SSD. DRAM systems now provide very reliable recovery from power failure, offer significantly better write performance than even SLC flash and don’t suffer the endurance issues that flash-based technologies do. But DRAM is considerably more expensive than NAND flash storage.
Dealing with the database. The final database performance condition involves situations where the whole database needs to be accelerated, not just its index and log files. This can emerge as an issue because the bulk of the queries or the transaction rate exceeds the ability of the storage system to keep pace with the environment. There are two options in these situations: moving the entire database environment to a faster storage system or using a flash-based caching technology.
Opting for a new, faster storage system doesn’t necessarily mean a move to flash-based storage. Contemporary mechanical storage can still turn out ample IOPS when properly tuned. The legacy practice of short-stroking drives can also be avoided. As described above, short stroking means drives are formatted at one-half or one-third of their normal capacity by using only the outer tracks or cylinders of each disk platter. This improves response time because the linear speed of the disk track moving past the drive head is faster on the outer cylinders. Using less than each drive’s full capacity increases overall drive count, which can also improve performance by allowing more queued storage requests to be simultaneously addressed. Unfortunately, this practice is also expensive, wasteful and a significant draw on power resources when applied to the entire database.
Most current enterprise storage systems can provide wide striping, which allows all the drives in a storage system to be used for a particular volume or logical unit number (LUN). Other less performance-sensitive volumes can also use these drives without interfering with the database. Some storage systems and storage networks will allow you to set priorities for certain types of data so they have first access to the storage bandwidth. Finally, some storage systems will let you specifically place certain types of data on certain portions of the disk drive. For example, database information can be prioritized to the outer portion of the hard drive for better performance and file systems can be placed on the slower, inside portions of the hard drive.
Another option is to use a flash-based caching technology. These products can be installed in the server, on the network or on the storage system itself. Each location has its advantages and disadvantages. The server typically provides the best performance as the cache is often a PCI Express (PCIe) SSD and has direct access to the server’s CPU. However, those caches typically only cache reads, especially in an environment where the database is shared, due to concerns about maintaining “coherency” between the cache and the primary storage it’s supporting. Network caches can typically do both read and writes, and work across a variety of storage systems. They’re available for NFS or Fibre Channel (FC)-attached environments. The challenge is the added network latency. Finally, storage system-based caching is also available but it only works on the storage system in which it’s installed and isn’t universally available across a variety of vendor storage systems.
Start your engines…
Improving database storage performance is a process. In most environments, there’ll always be an area that requires some tuning. The key is to tune it to the point where today’s performance demand is met and deal with tomorrow’s performance issues when they arise. Technology will continue to evolve and the storage manager’s toolkit will become more diverse and effective over time. A step-by-step approach also allows a minimal amount of your budget to be spent. A database storage performance problem doesn’t always mean ripping and replacing the storage system.
BIO: George Crump is president of Storage Switzerland, an IT analyst firm focused on storage and virtualization.