Best Practices: Balance workloads with RAID types

12 Apr 2007

Not all RAID types are created equal; the RAID you use can have a major impact on application performance.

Vendors won't hesitate to tell you how beautifully parity-based RAID (RAID-5, RAID-6, RAID-4, etc.) works in their storage subsystems, making it almost unnecessary to use any type of striped/mirrored RAID protection. Users may see it as a way of getting a lot more usable storage out of the subsystem but, as the old adage warns, "Nothing in life is free." The same principle applies to storage: What you get back in usable storage, you pay for with processing power. One of the most common problems I encounter during storage assessments is a lack of attention to matching the workload profile of the application to how storage is provisioned in the array. This is often driven by sheer economics but, even when it isn't, bad storage practices can simply result in a poorly balanced system that shows signs of stress. Stress is the asymmetrical impact on various components in the array, such as cache, front-end ports, back-end disk directors and the backplane.

RAID and writes
RAID is still one of the most revolutionary storage technologies, providing the ability to take a group of disks and present them as a single entity. How that entity functions is transparent to the host operating system; whether the entity is RAID-5 (distributed parity) or RAID-10 (mirrored and striped with no parity) makes no difference to how the host reads from and writes to this LUN.

However, the mix of reads and writes coming from the host makes all the difference in how the array is affected. Multiply that mix across a few hundred hosts or LUNs, and the effects on the array become visible. To a large extent, an array can absorb such massive reads and writes in its cache because it's tuned to maximize reads and writes from hosts. But if the data arriving on the front end (from the host) can't be passed to the back end (the disks) at the same or a higher pace, you begin to see a backlog in cache. For writes, this backlog is known as "write pending": the buffered writes waiting to be written to disk. A high write-pending ratio is unhealthy and can often lead to other problems, such as high I/O wait times on the host and, in extreme cases, an array that stops accepting writes.

Under normal operating conditions, every array will have a write-pending backlog. That's because cache memory is faster than even the fastest disk. But like a pipe that allows water to flow without any obstruction, pending writes in cache need to be written to disk at a steady pace, ideally at the rate they come in. Meanwhile, the host is happily chugging along, assuming it has completed its writes. It's the array that has to ensure that the writes are committed to disk in the fastest time possible. If it loses the pending writes for whatever reason (power loss, failed controller, etc.), those writes are gone, and what remains on disk is in an inconsistent state. Vendors will tell you that this rarely happens, but it's not unheard of.
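The pipe analogy can be made concrete with a toy model. The sketch below is only an illustration, not a description of any vendor's cache algorithm: the function name, the per-second time step and all the figures (MB/s rates, cache limit) are assumptions chosen for the example.

```python
def simulate_write_pending(incoming_mb_per_s, destage_mb_per_s, cache_limit_mb, seconds):
    """Return the write-pending backlog (in MB) at the end of each second.

    Hypothetical model: writes land in cache at a fixed rate, the back end
    drains them at a fixed rate, and the backlog can never exceed the cache
    limit (at that point a real array would throttle or refuse writes).
    """
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog += incoming_mb_per_s               # writes acknowledged to the host
        backlog -= min(backlog, destage_mb_per_s)  # writes flushed from cache to disk
        backlog = min(backlog, cache_limit_mb)     # cache is full: no room for more
        history.append(backlog)
    return history


# Back end keeps pace with the front end: the backlog stays flat.
balanced = simulate_write_pending(200, 200, 1000, 10)

# Back end is 50 MB/s too slow: the backlog grows until the cache limit is hit.
overloaded = simulate_write_pending(200, 150, 1000, 30)

print("balanced backlog after 10 s:", balanced[-1])
print("overloaded backlog after 30 s:", overloaded[-1])
```

In this model the cache limit only decides how long it takes to hit the ceiling. As long as the back end drains more slowly than the front end fills, the backlog keeps growing.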

One common mistake is thinking that throwing more cache at the problem fixes it. If the back end can't keep up with the incoming writes, a bigger cache only postpones the backlog. To remedy the situation, you need to go back and figure out the ratio of reads to writes generated at the host and then match the RAID type to it. Essentially, the workload profile of whatever is generating these reads and writes needs to match how the disks are arranged in the array (their RAID type).

Parity-based RAID types aren't well suited for applications with heavy random writes. In fact, one of the most common reasons for high write-pending ratios is a high percentage of random writes on a RAID-5 or RAID-6 volume, because each small random write forces the array to read the old data and parity before it can write the new data and parity. Such applications are better suited for RAID-10 or RAID-1 (with host-based striping). Sequential read and write workloads, on the other hand, work well on parity-based RAID because head movement is minimized in sequential reads and writes. A write-intensive workload on parity-based RAID volumes can also overload the array's processors because of the overhead of parity calculation.
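A rough way to see the effect of the read/write mix is to estimate how many disk I/Os the back end must perform for a given host load. The sketch below uses the commonly cited rule-of-thumb write penalties for small random writes (2 for RAID-10, 4 for RAID-5, 6 for RAID-6). These figures, the function name and the sample workload are assumptions for illustration, not numbers from any particular array. The sketch also ignores cache hits and full-stripe writes, both of which reduce the real penalty.

```python
# Rule-of-thumb back-end disk I/Os generated by one small random host write.
# These are assumed, commonly quoted values, not vendor specifications.
WRITE_PENALTY = {
    "RAID-10": 2,  # write the data to both mirrors
    "RAID-5": 4,   # read data + read parity, write data + write parity
    "RAID-6": 6,   # as RAID-5, but with two parity blocks
}


def backend_iops(host_iops, write_fraction, raid_type):
    """Estimate the back-end disk IOPS needed to service a host workload."""
    reads = host_iops * (1 - write_fraction)
    writes = host_iops * write_fraction
    return reads + writes * WRITE_PENALTY[raid_type]


# Hypothetical workload: 10,000 host IOPS, half of them random writes.
for raid_type in WRITE_PENALTY:
    print(raid_type, backend_iops(10_000, 0.5, raid_type))
```

Under these assumptions, the same host workload requires 15,000 back-end IOPS on RAID-10, 25,000 on RAID-5 and 35,000 on RAID-6. That shows how a write-heavy random workload can outrun the disks behind a parity-based volume and pile up as pending writes in cache.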