Published: 12 Apr 2007
Not all RAID types are created equal; the RAID you use can have a major impact on application performance.
Vendors won't hesitate to tell you how beautifully parity-based RAID (RAID-5, RAID-6, RAID-4, etc.) works in their storage subsystems, so well that striped/mirrored RAID protection is almost unnecessary. Users may see parity RAID as a way of getting a lot more usable storage out of the subsystem but, as the old adage warns, "Nothing in life is free." The same principle applies to storage: What you get back in usable storage, you pay for with processing power. One of the most common problems I encounter during storage assessments is a lack of attention to matching the workload profile of the application to how storage is provisioned in the array. This is often driven by sheer economics but, even when it isn't, bad storage practices can result in a poorly balanced system that shows signs of stress. In this context, stress means an uneven load on the array's components, such as cache, front-end ports, back-end disk directors and the backplane.
RAID and writes
RAID is still one of the most revolutionary storage technologies, providing the ability to take a bunch of disks and virtualize them so they look like a single entity. How that entity functions is transparent to the host operating system; whether the entity is RAID-5 (distributed parity) or RAID-10 (mirrored and striped with no parity) makes no difference to how the host reads from and writes to this LUN.
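To make "distributed parity" concrete, here is a minimal Python sketch of what a RAID-5 set does internally for a single stripe. It assumes equal-length byte-string blocks and a simple rotating parity placement; the function names (xor_blocks, parity_disk, build_stripe, rebuild) are hypothetical, and real arrays do this in controller firmware or hardware with their own layouts.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_index, num_disks):
    """Pick the disk holding parity for a stripe (assumed rotation scheme)."""
    return (num_disks - 1 - stripe_index) % num_disks


def build_stripe(data_blocks, stripe_index):
    """Return one stripe row: the data blocks with their parity block inserted."""
    num_disks = len(data_blocks) + 1
    parity = xor_blocks(data_blocks)
    row = list(data_blocks)
    row.insert(parity_disk(stripe_index, num_disks), parity)
    return row


def rebuild(row, failed_disk):
    """Recover a lost block by XOR-ing every surviving block in the stripe."""
    survivors = [blk for i, blk in enumerate(row) if i != failed_disk]
    return xor_blocks(survivors)


row = build_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"], stripe_index=0)
print(row[3])           # parity block for stripe 0, stored on disk 3
print(rebuild(row, 1))  # recovers the block that lived on disk 1
```

With four disks, stripe 0 keeps its parity on disk 3 and stripe 1 on disk 2, so parity work is spread across all spindles rather than concentrated on one dedicated parity disk as in RAID-4. The host never sees any of this; it just sees one LUN.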
However, the mix of reads and writes on the host makes all the difference in how the array is affected. Multiply that by a few hundred concurrent reads and writes, and you'll see the effects on the array. To a large extent, an array can absorb such heavy read and write activity in its cache, because it's tuned to maximize reads and writes from hosts. But if data arriving on the front end (from the host) can't be passed to the back end (the disks) at the same or a higher pace, a backlog builds up in cache. For writes, this backlog is known as "write pending": writes buffered in cache that are still waiting to be written to disk. A high write-pending ratio is unhealthy and often leads to other problems, such as high I/O wait times on the host or, in extreme cases, an array that stops accepting writes.
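The front-end/back-end mismatch can be illustrated with a toy model. This sketch assumes a fixed destage rate, a single write cache of known size and per-interval arrival figures in MB; the function name and parameters are hypothetical, and real arrays partition cache, coalesce writes and throttle hosts in vendor-specific ways.

```python
def write_pending_backlog(incoming_mb, destage_mb_per_tick, cache_mb):
    """Simulate write-pending data in cache, one interval (tick) at a time.

    incoming_mb: host writes arriving in each tick, in MB
    destage_mb_per_tick: MB the back end can write to disk per tick
    cache_mb: write cache capacity in MB
    Returns (backlog after each tick, whether the cache ever filled).
    """
    backlog = 0.0
    history = []
    saturated = False
    for arriving in incoming_mb:
        # The front end accepts the writes; the back end drains what it can.
        backlog = max(0.0, backlog + arriving - destage_mb_per_tick)
        if backlog >= cache_mb:
            # Cache full: in practice the array would start holding off host writes.
            backlog = float(cache_mb)
            saturated = True
        history.append(backlog)
    return history, saturated


# Back end drains 80 MB per tick while hosts write 100 MB per tick.
history, saturated = write_pending_backlog([100, 100, 100], 80, 1000)
print(history, saturated)
```

Dividing each backlog value by cache_mb gives a rough write-pending ratio. The pattern to watch for is the steady climb shown here: whenever arrivals outpace destaging, the backlog only grows, no matter how large the cache is.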
Under normal operating conditions, every array will have some write-pending backlog, because even the fastest disk is slower than solid-state memory. But just as a pipe lets water flow without obstruction, pending writes in cache need to be written to disk at a steady pace, ideally at the rate they come in. Meanwhile, the host is happily chugging along, assuming it has completed its writes. It's the array's job to commit those writes to disk as quickly as possible. If the array loses them before they reach disk for whatever reason (power loss, failed controller, etc.), they are gone, and the data on disk is left in an inconsistent state. Vendors will tell you that this is rare, but it's not unheard of.
One common mistake is thinking that throwing more cache at a problem fixes it. To remedy the situation, you need to go back and figure out the ratio of reads and writes generated at the host and then try to match the RAID type. Essentially, it means that the workload profile of whatever is generating these reads and writes needs to match how the disks are arranged in the array (their RAID type).
Parity-based RAID types aren't well suited to applications with heavy random writes, and one of the most common causes of high write-pending ratios is a high percentage of random writes on a RAID-5 or RAID-6 volume. Each small random write on RAID-5 typically costs four back-end I/Os (read old data, read old parity, write new data, write new parity); RAID-6 adds a second parity read and write for a total of six, compared with two on a mirrored volume. Such applications are better suited to RAID-10 or RAID-1 (with host-based striping). Sequential read and write workloads, on the other hand, work well on parity-based RAID because head movement is minimized in sequential reads and writes. A write-intensive workload on parity-based RAID volumes can also overload the array's processors, because parity must be calculated for every write.
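A quick way to see why the host's read/write ratio matters is to translate host IOPS into back-end disk IOPS using the commonly cited small-random-write penalties (2 for RAID-1/RAID-10, 4 for RAID-5, 6 for RAID-6). This is a rule-of-thumb sketch: it assumes every write is a small random write and ignores cache hits and full-stripe writes, which is exactly where sequential workloads escape the penalty. The function and table names are hypothetical.

```python
# Back-end disk I/Os per small random host write (widely used rule of thumb).
WRITE_PENALTY = {"RAID-1": 2, "RAID-10": 2, "RAID-5": 4, "RAID-6": 6}


def backend_iops(host_iops, write_fraction, raid_type):
    """Estimate the disk I/Os per second the back end must deliver.

    host_iops: total I/Os per second issued by the host
    write_fraction: share of host I/Os that are writes, from 0.0 to 1.0
    raid_type: a key of WRITE_PENALTY
    """
    reads = host_iops * (1 - write_fraction)
    writes = host_iops * write_fraction
    # Each read costs one disk I/O; each write costs the RAID type's penalty.
    return reads + writes * WRITE_PENALTY[raid_type]


for raid in ("RAID-10", "RAID-5", "RAID-6"):
    print(raid, backend_iops(1000, 0.25, raid))
```

For 1,000 host IOPS with 25% writes, this estimates 1,250 back-end IOPS on RAID-10 but 1,750 on RAID-5 and 2,250 on RAID-6. The gap widens as the write share grows, which is why matching RAID type to the measured read/write mix matters more than adding cache.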
Let the host share the load
Another thing to keep in mind is stripe size. It's common knowledge that stripes (not concatenations) are generally good for performance, because they put a higher number of spindles to work in your favor. However, there's a sweet spot beyond which adding spindles stops helping. The right number of spindles depends on the size and pattern of the host's read and write requests, as well as how they align with the stripe column width and stripe unit size. Even a slight misalignment can cause problems such as disk crossings, where a single host I/O straddles a stripe-unit boundary and the array must service it as two back-end I/Os. This results in suboptimal performance. Host-based volume managers can nicely complement striping in the array with striping on the host, so you don't have to build one very large stripe entirely on the array side.
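Disk crossings are easy to check for once you know the stripe unit size and the byte offsets your I/Os land on. This sketch assumes offsets are measured from the start of the striped LUN and uses a 64 KiB stripe unit purely as an example; the function names are hypothetical. The 63-sector (32,256-byte) offset in the usage example is the classic misaligned partition start left by older MBR partitioning tools.

```python
def crosses_stripe_unit(offset_bytes, size_bytes, stripe_unit_bytes):
    """True if one host I/O spans two or more stripe units (a disk crossing)."""
    first_unit = offset_bytes // stripe_unit_bytes
    last_unit = (offset_bytes + size_bytes - 1) // stripe_unit_bytes
    return first_unit != last_unit


def crossing_rate(ios, stripe_unit_bytes):
    """Fraction of (offset, size) I/Os in a trace that cross a stripe-unit boundary."""
    if not ios:
        return 0.0
    crossings = sum(
        crosses_stripe_unit(off, size, stripe_unit_bytes) for off, size in ios
    )
    return crossings / len(ios)


STRIPE_UNIT = 64 * 1024  # example value; use your array's actual setting

print(crosses_stripe_unit(0, STRIPE_UNIT, STRIPE_UNIT))         # aligned
print(crosses_stripe_unit(63 * 512, STRIPE_UNIT, STRIPE_UNIT))  # sector-63 start
```

A 64 KiB I/O starting at sector 63 always straddles two stripe units, so every such request turns into two back-end I/Os. Shifting the partition start to a multiple of the stripe unit removes the crossings.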
So how do you make sure all of this translates into pairing the workload profile with the correct RAID type? Divide and conquer. By ensuring that no single system or component bears the onus of meeting the I/O requirements of the application, you minimize the impact the application workload has on any single component in the I/O chain. In other words, the more horizontally you spread the I/O (within reason), the better the performance. A well-balanced system consists of host-based components bearing equal responsibility with those in the storage array. Host-based components include multipath software, volume management and file systems. Array-based components include front-end (host) and back-end (disk) controllers. A practical target: if you stress the environment, no component should exceed 25% utilization. That way, if a component fails and its redundant partner takes over its load, the partner still runs at no more than 50%.
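The 25%/50% guideline can be turned into a simple check on measured utilization. This sketch assumes components come in redundant pairs (two front-end ports, two back-end directors, two HBAs and so on) and that a survivor inherits its failed partner's full load; the pair names, function name and input format are hypothetical, and the utilization figures would come from your own host and array monitoring tools.

```python
def failover_report(pairs, target=0.25, ceiling=0.50):
    """Check redundant component pairs against a per-component target.

    pairs: dict mapping a pair name to (util_a, util_b), each a fraction 0..1
    target: utilization each component should stay under during stress tests
    ceiling: maximum acceptable load on a survivor after its partner fails
    """
    report = {}
    for name, (util_a, util_b) in pairs.items():
        # Assumption: the surviving component absorbs all of its partner's load.
        survivor_load = util_a + util_b
        report[name] = {
            "within_target": util_a <= target and util_b <= target,
            "survivor_load": survivor_load,
            "survivor_ok": survivor_load <= ceiling,
        }
    return report


measured = {
    "front-end ports": (0.25, 0.25),
    "back-end directors": (0.40, 0.30),
}
for pair, result in failover_report(measured).items():
    print(pair, result)
```

In this made-up example the front-end ports meet the target, while the back-end directors would leave a survivor at about 70% if one failed, which points to where I/O needs to be spread more horizontally.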
Analyze the app
You should also examine the application environment itself. Do all components need to reside on the same file system or volume? Can they be segregated based on the I/O profile of each component? Determining an application workload profile isn't an easy task. Database administrators will tell you that not all components of a database have a random write profile; only the data components do, while transaction logs, for example, are written sequentially. So mixing all kinds of profiles on the same file system isn't the best approach. Instead, create separate file systems on separate spindles (or RAID groups) so that one file system doesn't affect performance on another.
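Once each component's I/O profile is known, segregating them can be as simple as grouping components by the RAID type their profile calls for. This sketch encodes the rule of thumb from this article (random-write-heavy data on RAID-10, sequential or read-mostly data on parity RAID); the thresholds, class and function names are hypothetical assumptions, and the example percentages are illustrative rather than measured.

```python
from dataclasses import dataclass


@dataclass
class IOProfile:
    name: str
    write_fraction: float   # share of I/Os that are writes, 0..1
    random_fraction: float  # share of I/Os that are random, 0..1


def suggest_raid(profile, write_heavy=0.3, mostly_random=0.5):
    """Heavy random writes go to mirrored RAID; everything else to parity RAID.

    The threshold values are illustrative assumptions, not vendor guidance.
    """
    if profile.write_fraction >= write_heavy and profile.random_fraction >= mostly_random:
        return "RAID-10"
    return "RAID-5/6"


def group_by_raid(profiles):
    """Group component names by suggested RAID type, one file system per group."""
    groups = {}
    for p in profiles:
        groups.setdefault(suggest_raid(p), []).append(p.name)
    return groups


components = [
    IOProfile("data files", write_fraction=0.4, random_fraction=0.9),
    IOProfile("transaction logs", write_fraction=0.95, random_fraction=0.05),
    IOProfile("archive", write_fraction=0.1, random_fraction=0.2),
]
print(group_by_raid(components))
```

Each resulting group would then get its own file system on its own spindles or RAID groups, so the random writes against the data files don't compete with the sequential log stream.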
All this is fine and dandy when you're planning from scratch, but that's often not the case. You may have performance problems in your existing environment. First, try to gather as much data on the symptoms as possible, and then find out where the problem truly lies or what's introducing it. For example, if you see high write-pending utilization in cache, find out if it's being caused by an imbalance on the host or array. If the host has volume management and all volumes are striped, you're probably not spreading the stripe across all the back-end resources or you have an improperly matched RAID type. Most storage vendors provide tools that allow you to move data from one LUN to another transparently, so you can use them to move from RAID-5 to RAID-10 or vice versa.
If the host lacks volume management, and you have a single large file system that's pounding away on a single large LUN, you may have to convince your systems folks to implement volume management and spread your I/O. The other option is to use array-based "striping" features that allow you to take that single large LUN and put it on a stripe consisting of multiple smaller LUNs that go to different RAID groups. The key thing is to explore all of your options. And one final word that always works in your favor: Patience.
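Array-based striping of one large LUN across smaller LUNs works by round-robin address mapping: consecutive stripe units of the large LUN land on successive member LUNs, each in a different RAID group. This sketch shows that mapping under the assumption of a fixed stripe unit and equally sized members; the function and LUN names are hypothetical, and each vendor's striping feature has its own implementation and limits.

```python
def map_offset(logical_offset, stripe_unit, members):
    """Map a byte offset on the striped LUN to (member LUN, byte offset on that member).

    stripe_unit: bytes written to one member before moving to the next
    members: member LUNs, ideally each in a different RAID group
    """
    unit_index = logical_offset // stripe_unit
    member = members[unit_index % len(members)]  # round-robin across members
    row = unit_index // len(members)             # full passes over all members
    member_offset = row * stripe_unit + logical_offset % stripe_unit
    return member, member_offset


MIB = 1024 * 1024
luns = ["lun_rg1", "lun_rg2", "lun_rg3"]
for offset in (0, 1 * MIB, 2 * MIB, 3 * MIB + 10):
    print(offset, map_offset(offset, MIB, luns))
```

Because adjacent chunks of the single large file system now hit different RAID groups, the load that used to pound one LUN is spread across several sets of spindles, the same effect a host volume manager's striping would give you.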