Do RAID levels still matter?
This article can also be found in the Premium Editorial Download "Storage magazine: RAID turns 20: Do you still need it?."
Storage administrators have always been confronted with the task of matching the application storage workload to the storage configuration. For many enterprise applications, this goes beyond the simple "logs on 0/1, tables on 5" to complex volume management involving both hardware and software RAID. Some array vendors actually recommend such techniques as software RAID 0 striping across their array controllers. But the storage workload profile of an application can change over time, requiring constant re-tuning to maintain performance.
Many storage administrators, managing hundreds of terabytes of data, are beginning to realize that twiddling knobs to achieve better performance isn't as much fun as it used to be when 2TB was considered a very big array. Today, the hottest trend in storage arrays is data stripe abstraction. Blocks of data are written to many disks in the array according to a pattern determined by the intelligence in the array; the RAID level, as it were, is fixed and not configurable. In the most advanced arrays, the data block layout pattern can change dynamically as the application workload profile changes.
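For contrast with array-managed layouts, the fixed mapping used by classic striping can be sketched in a few lines. This is only an illustration under simple assumptions (round-robin placement, a fixed stripe unit measured in blocks); the function and parameter names are hypothetical, and real arrays use their own, often proprietary, layout logic.

```python
def locate_block(logical_block: int, num_disks: int, stripe_unit_blocks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes plain round-robin striping with a fixed stripe unit; this is a
    hypothetical model, not any vendor's actual layout.
    """
    if logical_block < 0 or num_disks < 1 or stripe_unit_blocks < 1:
        raise ValueError("arguments must be non-negative / positive")
    unit_index = logical_block // stripe_unit_blocks      # which stripe unit overall
    offset_in_unit = logical_block % stripe_unit_blocks   # position inside that unit
    disk = unit_index % num_disks                         # units rotate across disks
    row = unit_index // num_disks                         # full stripes preceding this unit
    return disk, row * stripe_unit_blocks + offset_in_unit


if __name__ == "__main__":
    # Show how consecutive stripe units land on different disks.
    for block in (0, 8, 16, 24, 32):
        print(block, locate_block(block, num_disks=4, stripe_unit_blocks=8))
```

Because the mapping is fixed at configuration time, a change in the workload means re-tuning the stripe unit or disk count by hand, which is the chore that array-managed layouts aim to remove.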
Balance cost, data protection and performance
If you still want to manually select RAID levels, remember that performance increases with the number of spindles across which your data is distributed, but large RAID groups increase the risk of data loss if a disk fails. Using small RAID groups, or increasing the number of parity disks, decreases the available capacity for the same investment in cost, power and rack space. Balancing availability, risk and cost is certainly a business decision. Here are some guidelines, with the caveat that "your mileage may vary," especially among array vendors.
RAID 0/1, with a small stripe size and meta volumes spread out over a large number of spindles, offers the highest read and write performance, all other things being equal. The primary reason for this is that there are no parity calculations involved when writing or rebuilding. This is also the most expensive approach; plan on a usable capacity of half the raw storage amount. If you have a "thrashy" application (lots of random reads and writes), such as a journaling file system, email server, OLTP database or CAD application, this may still be your best bet.
RAID 5 offers read performance almost on par with RAID 0/1, but write performance isn't as good, because each small write must also read and update parity. Avoid the temptation to make RAID groups too large. Expect usable capacities in the range of 67% to 94% of the raw storage amount. For most applications, RAID 5 is the best approach.
If you're using big drives for the lowest possible cost per gigabyte, consider using RAID 6 (also known as RAID with double parity). This will help protect against the greater risk of two drives failing in a RAID group, while allowing a usable capacity up to 88% of the raw storage amount. This is a good RAID level for data warehousing, archive, backup to disk or any application where large capacity is paramount.
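The usable-capacity figures in these guidelines follow from simple arithmetic on RAID group size. The sketch below is a rough illustration, not a sizing tool: the function name is hypothetical, and the group sizes (3 and 16 disks) are assumptions chosen because they reproduce the percentages quoted above. It ignores hot spares, formatting overhead and vendor-specific reservations.

```python
def usable_fraction(level: str, disks_per_group: int) -> float:
    """Fraction of raw capacity left for data in one RAID group.

    Simplified model: mirroring keeps half; RAID 5 gives up one disk's worth
    of capacity per group to parity, RAID 6 gives up two.
    """
    if level == "raid0/1":
        if disks_per_group < 2:
            raise ValueError("mirroring needs at least 2 disks")
        return 0.5
    parity_disks = {"raid5": 1, "raid6": 2}
    if level not in parity_disks:
        raise ValueError(f"unknown RAID level: {level}")
    parity = parity_disks[level]
    # At least two data disks plus parity (3 for RAID 5, 4 for RAID 6).
    if disks_per_group < parity + 2:
        raise ValueError("RAID group too small for this level")
    return (disks_per_group - parity) / disks_per_group


if __name__ == "__main__":
    # Group sizes here are assumed for illustration only.
    for level, size in [("raid0/1", 2), ("raid5", 3), ("raid5", 16), ("raid6", 16)]:
        print(f"{level:8s} {size:2d} disks: {usable_fraction(level, size):.1%} usable")
```

The same arithmetic shows the trade-off described earlier: shrinking a RAID 5 group from 16 disks to 3 lowers the exposure to a second failure during a rebuild, but it also drops usable capacity from roughly 94% to 67% of raw.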
This was first published in November 2007