RAID originally stood for redundant array of inexpensive disks. Today, the acronym has been updated to redundant array of independent disks. But RAID's purpose has not changed.
RAID is a common method for protecting application data on hard disk drives and solid-state storage, with different types of RAID balancing the level of protection against price. The greater the protection, the higher the cost. As storage has evolved, the number of RAID levels has increased.
RAID is a way of grouping individual physical drives together to form a RAID set. The RAID set presents all the physical drives as one logical disk on your server. The logical disk is identified by a logical unit number, or LUN.
Improvements to RAID performance and availability have kept it in use even as newer technologies have become available.
To fully understand RAID and its benefits, it's important to break down the different RAID levels and what they each do best.
Benefits of RAID
The primary benefit of using RAID is preserving data stored on failed drives. RAID levels use data mirroring, striping and parity, or a combination of those techniques. In most cases, increases in performance or reliability raise the cost of protecting data on the drives.
Mirroring is when data is written to more than one drive simultaneously, while striping means data is spread across drives in chunks. Parity is redundant information calculated from the data that allows the contents of a failed drive to be rebuilt.
Parity is basically a checksum of the data that was written to the disks, which gets written along with the original data. When a drive in a hardware-based RAID set goes bad, the controller recreates the lost data from the surviving data and the parity information stored on the remaining disks. The server accessing the data never knows that one of the drives in the RAID set went bad.
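As a rough illustration of how parity allows lost data to be recreated, the sketch below uses XOR, the operation commonly used for single parity. The helper name xor_blocks and the sample blocks are made up for this example, and a real controller works on fixed-size disk blocks rather than short byte strings.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, as if each one lived on its own drive (sample values).
data = [b"RAID", b"disk", b"sets"]

# The parity block would be stored on a fourth drive.
parity = xor_blocks(data)

# Simulate losing the second drive and rebuilding it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

XOR-ing the surviving data blocks with the parity block cancels out everything except the missing block, which is why one lost drive can be rebuilt but two cannot.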
Standard vs. nonstandard RAID levels
The expansive number of RAID levels can be broken down into three categories: standard, nonstandard and nested. Standard levels of RAID are made up of the basic types of RAID numbered 0 through 6.
A nonstandard RAID level is set to the standards of a particular company or open source project. Nonstandard RAID includes RAID 7, adaptive RAID, RAID S and Linux md RAID 10.
Nested RAID refers to combinations of RAID levels, such as RAID 01 -- RAID 0+1, RAID 03 -- RAID 0+3, and RAID 50 -- RAID 5+0.
RAID levels explained
The RAID level you use should depend on the type of application you are running on your server. RAID 0 is the fastest, RAID 1 is the most reliable and RAID 5 is a good combination of both. The best RAID for your organization may depend on the level of redundancy you're looking for, the length of your retention period, the number of disks you're working with and the importance you place on data protection versus performance optimization.
Below is a description of the different RAID levels that are most commonly used in storage arrays. Not all storage array vendors support every RAID type, so be sure to check with your vendors for the types of RAID that are available with their storage.
RAID 0: RAID 0 is simple disk striping. All the data is spread out in chunks across all the disks in the RAID set. RAID 0 offers great performance because you spread the load of storing data onto more physical drives. It also has the lowest cost of all the RAID types because it uses disk space only to store data. Because there is no parity generated for RAID 0, there is no overhead to write data to RAID 0 disks.
However, RAID 0 has the worst data protection of all the RAID levels. Because it stores no mirror copy or parity, a single disk failure makes the data in the whole RAID set unavailable until the failed disk is replaced and the data is restored from a backup.
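The round-robin placement behind striping can be sketched in a few lines. This is a simplified model, not any vendor's implementation: the function names stripe and unstripe, the chunk size and the sample data are all assumptions chosen for illustration.

```python
def stripe(data, num_disks, chunk_size):
    """Split data into chunks and deal them out to the disks in turn."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks

def unstripe(disks):
    """Read the chunks back row by row to reassemble the original data."""
    chunks = []
    for row in range(max(len(disk) for disk in disks)):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)

disks = stripe(b"ABCDEFGHIJKL", num_disks=3, chunk_size=2)
for number, chunks in enumerate(disks):
    print(f"disk {number}: {chunks}")

print(unstripe(disks))
```

Every disk holds only a slice of each file, so reads and writes are shared across the drives, but losing any one disk leaves gaps that nothing else in the set can fill.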
RAID 1: RAID 1 is disk mirroring, which means all the data is written to two separate physical disks. The disks are essentially mirror images of each other. If one disk fails, the other can be used to retrieve the data.
Disk mirroring is good for fast read operations, but write speeds are slower because data must be written twice to disks. Another downside of RAID 1 is that it doubles the amount of disk space required because all the data is stored twice.
RAID 1+0: RAID 1+0, which is also called RAID 10, uses a combination of disk mirroring and striping. The data is normally mirrored first and then striped. Mirroring striped sets, known as RAID 0+1, accomplishes the same task, but it is less fault-tolerant than striping mirror sets.
With mirrored stripe sets, if you lose a drive in a stripe set, you must access data from the other stripe set because stripe sets have no parity. RAID 1+0 requires a minimum of four physical disks.
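To see why striping mirror sets tolerates more failures than mirroring stripe sets, the sketch below counts which two-disk failures a four-disk set survives under each arrangement. It assumes a simplified model in which a stripe set goes completely offline as soon as any of its members fails; the disk numbering and function names are hypothetical.

```python
from itertools import combinations

# Four disks split into two groups of two (disk numbers are arbitrary).
GROUPS = [(0, 1), (2, 3)]

def raid10_survives(failed):
    """Striping over mirrors: every mirror pair needs one working disk."""
    return all(any(d not in failed for d in pair) for pair in GROUPS)

def raid01_survives(failed):
    """Mirroring over stripes: one stripe set must be fully intact."""
    return any(all(d not in failed for d in stripe_set) for stripe_set in GROUPS)

two_disk_failures = [set(pair) for pair in combinations(range(4), 2)]
raid10_ok = sum(raid10_survives(f) for f in two_disk_failures)
raid01_ok = sum(raid01_survives(f) for f in two_disk_failures)

print(f"RAID 1+0 survives {raid10_ok} of {len(two_disk_failures)} two-disk failures")
print(f"RAID 0+1 survives {raid01_ok} of {len(two_disk_failures)} two-disk failures")
```

Under these assumptions, the striped mirrors survive four of the six possible two-disk failures, while the mirrored stripes survive only two.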
RAID 3: RAID 3 stores the parity information generated by the RAID controller on a dedicated parity disk, separate from the data disks, rather than striping it along with the data as RAID 5 does.
This RAID type performs poorly when there are a lot of requests for data, as with an application such as a database. RAID 3 performs well with applications that require one long, sequential data transfer, such as video servers. RAID 3 requires a minimum of three physical disks.
RAID 4: RAID 4 uses a dedicated parity disk along with block-level striping across disks. While it is good for sequential data access, the use of a dedicated parity disk can cause performance bottlenecks for write operations. With alternatives such as RAID 5 available, RAID 4 is not used much.
RAID 5: RAID 5 uses disk striping with parity. The data is striped across all the disks in the RAID set, along with the parity information needed to reconstruct the data in case of disk failure.
RAID 5 is the most common RAID method because it achieves a good balance between performance and availability. RAID 5 requires at least three physical disks.
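One way to picture how RAID 5 spreads parity across all the disks is to print a layout table. The rotation rule below is a simple illustrative choice; real controllers and software RAID implementations use several different layouts, and the function name raid5_layout is an assumption for this sketch.

```python
def raid5_layout(num_disks, num_stripes):
    """Return one row per stripe, marking which disk holds parity ("P")."""
    layout = []
    block = 0
    for stripe in range(num_stripes):
        # Move the parity block one disk to the left on each new stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        layout.append(row)
    return layout

layout = raid5_layout(num_disks=4, num_stripes=4)
for row in layout:
    print("  ".join(f"{cell:>3}" for cell in row))
```

Because the parity block sits on a different disk in each stripe, no single drive handles every parity write, which is the bottleneck described for the dedicated parity disk in RAID 4.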
RAID 6: RAID 6 increases reliability by utilizing two parity stripes, which allow for two disk failures within the RAID set before data is lost. RAID 6 is often used for large capacity drives deployed for archiving or disk-based backup. RAID 6 allows for data recovery during simultaneous disk failures, which are more common with larger capacity drives that have longer rebuild times. RAID 6 requires at least four drives.
Adaptive RAID: Adaptive RAID lets the RAID controller figure out how to store parity on the disks. It chooses between RAID 3 and RAID 5 depending on which RAID set type will perform better with the type of data being written to the disks.
RAID 7: RAID 7 is a nonstandard RAID level -- based on RAID 3 and RAID 4 -- that requires proprietary hardware. This RAID level is owned and trademarked by the now-defunct Storage Computer Corp.
Minimum drives and rebuilds for RAID levels
RAID requires multiple drives, and the minimum number of required disks varies by RAID level. But once you meet the minimum requirement, is there a benefit to exceeding that number?
If you use more than the minimum number of drives, you get more available storage and more actuators or spindles for the OS to use. However, this does not mean you should use as many drives as possible at all times. Most RAID arrays use a maximum of 16 drives within a RAID set due to higher overhead and diminishing returns in performance when exceeding that many drives. Up to eight drives seems to be a good rule of thumb for RAID 5 and RAID 10. If you need more disk space, you can create another RAID set with the other disks.
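The trade-off between drive count and available storage can be estimated with some simple arithmetic. The sketch below assumes equal-sized drives and ignores formatting overhead and hot spares. The function name usable_capacity is hypothetical, and the two-drive minimum for RAID 0 and RAID 1 is an assumption not stated above.

```python
# Minimum drive counts per RAID level (RAID 0 and RAID 1 values are assumed).
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}

def usable_capacity(level, num_drives, drive_size_gb):
    """Estimate usable space in GB for a RAID set of equal-sized drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "0":
        return num_drives * drive_size_gb        # no redundancy at all
    if level == "1":
        return drive_size_gb                     # every drive holds a full copy
    if level == "5":
        return (num_drives - 1) * drive_size_gb  # one drive's worth of parity
    if level == "6":
        return (num_drives - 2) * drive_size_gb  # two drives' worth of parity
    if num_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return (num_drives // 2) * drive_size_gb     # half the drives are mirrors

for level in ("0", "5", "6", "10"):
    print(f"RAID {level}: {usable_capacity(level, 8, 146)} GB usable")
```

With eight 146 GB drives, this estimate gives 1168 GB for RAID 0, 1022 GB for RAID 5, 876 GB for RAID 6 and 584 GB for RAID 10.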
As another rule of thumb, try to keep different workload data types on separate RAID sets. You can use RAID 10 for best performance everywhere, but most budgets dictate the use of RAID 5 for database data volumes, with RAID 1 or RAID 10 used on database log volumes. The database volumes can be highly random I/O, and the logs tend to be sequential in nature.
Rebuild times depend on the kind of RAID you choose. If you are using software-based RAID, then more spindles within the group mean longer rebuild times. If you use hardware-based RAID, rebuild times are usually dictated by the size of the drives themselves, as the hardware usually does the sparing in and out of the set. A 146 GB drive takes longer to rebuild than a 73 GB drive.
How RAID is used today
Many experts say the need for RAID technology has diminished. Erasure coding and solid-state drives have presented reliable -- if more expensive -- alternatives, and as storage capacity increases, the chance of RAID array errors increases, as well. Still, large storage vendors continue to support RAID levels in their storage arrays.