RAID used to stand for "redundant array of inexpensive disks". Today the term has been updated to "redundant array of independent disks". RAID is a way of grouping individual physical drives together to form one bigger drive called a "RAID set". The RAID set presents all the smaller physical drives to your server as one logical disk. The logical disk is called a LUN, or "logical unit number". Using RAID has two main advantages: better performance and higher availability, which means it goes faster and breaks down less often.
RAID benefits explained
Performance is increased because the server has more "spindles" to read from or write to when data is accessed from a drive.
Availability is increased because the RAID controller can recreate lost data from parity information or, in a mirrored set, read it from the surviving copy. What is parity? Parity is redundant information calculated from the data that was written to the disks (typically by XORing the data blocks together), somewhat like a checksum, and it gets written along with the original data. RAID can be done in software on a host, such as Windows "FTDISK" volumes, or in hardware on the storage controllers. The server accessing the data on a hardware-based RAID set never knows that one of the drives in the RAID set went bad. The controller recreates the data that was lost when the drive went bad by combining the surviving data with the parity information stored on the remaining disks in the RAID set.
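To make the parity idea concrete, here is a minimal sketch that assumes simple XOR parity. The block contents and the `xor_blocks` helper are made up for illustration; a real controller works on much larger chunks and in hardware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if each lived on a separate drive (made-up contents).
data_blocks = [b"RAID", b"SETS", b"LUNS"]

# The parity block is the XOR of all the data blocks.
parity = xor_blocks(data_blocks)

# Simulate losing the drive that held the second block.
survivors = [data_blocks[0], data_blocks[2]]

# XORing the surviving data blocks with the parity gives back the lost block.
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data_blocks[1])  # prints True
```

The same calculation works no matter which single block is lost, which is why one block of parity per stripe is enough to survive one drive failure.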
There are a number of different ways drives can be grouped together to form RAID sets. The different methods used to group drives are called "RAID types". RAID types are numbered from 0 to 6. The numbers represent the "level" of RAID being used. RAID levels 0, 1 and 5 are the most common. Combinations of RAID types may be used together. For example, you can create two RAID-0 sets, and then combine the RAID-0 sets into a RAID-1 set. This will essentially give you the performance benefits of RAID-0, with the availability benefits of RAID-1.
In order to survive multiple drive failures in a "RAID-10" setup, you should create multiple RAID-1 mirrors, and then stripe across the mirrors using RAID-0. As long as multiple drive failures occur in separate mirror sets, the RAID set is still available. If you instead create two RAID-0 stripe sets and mirror those together, losing a single disk within one stripe set forces all access to occur from the other stripe set, and any further disk failure in that surviving stripe set takes the whole RAID set down.
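The difference between the two layouts can be checked by brute force. The sketch below assumes a four-disk set with disks numbered 0 to 3 and two-way mirrors; the disk numbering and function names are hypothetical and only serve the illustration.

```python
from itertools import combinations

MIRRORS = [(0, 1), (2, 3)]  # RAID-10: two mirrored pairs, striped together
STRIPES = [(0, 1), (2, 3)]  # RAID-0+1: two stripe sets, mirrored together


def raid10_survives(failed):
    """Every mirrored pair needs at least one working disk."""
    return all(any(d not in failed for d in pair) for pair in MIRRORS)


def raid01_survives(failed):
    """At least one stripe set must have all of its disks working."""
    return any(all(d not in failed for d in stripe) for stripe in STRIPES)


# Try every possible combination of two failed disks out of four.
two_disk_failures = [set(c) for c in combinations(range(4), 2)]
raid10_ok = sum(raid10_survives(f) for f in two_disk_failures)
raid01_ok = sum(raid01_survives(f) for f in two_disk_failures)

print(f"RAID-10 survives {raid10_ok} of {len(two_disk_failures)} two-disk failures")
print(f"RAID-0+1 survives {raid01_ok} of {len(two_disk_failures)} two-disk failures")
```

With four disks, this counts 4 of 6 survivable two-disk failures for RAID-10 and 2 of 6 for RAID-0+1.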
The RAID type you should use depends on the type of application you are running on your server. RAID-0 is the fastest. RAID-1 is the most reliable and RAID-5 is a good combination of both.
RAID types explained
Below is a description of the different types of RAID that are most commonly used in SAN storage arrays. Not all storage array vendors support all the various RAID types. Check with your vendor for the types of RAID that are available with their storage.
RAID-0: RAID-0 is called disk "striping". All the data is spread out in chunks across all the disks in the RAID set. RAID-0 has great performance, because you spread out the load of storing data onto more physical drives. There is no parity generated for RAID-0, so there is no overhead to write data to RAID-0 disks. RAID-0 is only good for better performance, and not for high availability: since no parity or mirror copy exists, losing any one disk loses the data on the whole RAID set. RAID-0 requires at least two physical disks.
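As a rough illustration of striping, the sketch below assumes the simplest possible layout, where chunks are handed out to the disks in round-robin order. The `locate_chunk` function is hypothetical; real arrays also let you choose the chunk size.

```python
def locate_chunk(logical_chunk, num_disks):
    """Map a logical chunk number to (disk index, chunk position on that disk)."""
    return logical_chunk % num_disks, logical_chunk // num_disks


# Show where the first eight chunks land on a four-disk RAID-0 set.
for chunk in range(8):
    disk, position = locate_chunk(chunk, 4)
    print(f"chunk {chunk} -> disk {disk}, position {position}")
```

Consecutive chunks land on different disks, so a large transfer keeps all the drives busy at once.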
RAID-1: RAID-1 is called disk mirroring. All the data is written to at least two separate physical disks. The disks are essentially mirror images of each other. If one of the disks fails, the other can be used to retrieve data. Disk mirroring is good for very fast read operations. It's slower when writing to the disks, since the data needs to be written twice. RAID-1 requires at least two physical disks.
RAID 1+0: RAID 1+0, which is also called RAID-10, uses a combination of disk mirroring and disk striping. The data is normally mirrored first and then striped. Mirroring striped sets (RAID 0+1) accomplishes the same task, but is less fault tolerant than striping mirror sets. If you lose a drive in a stripe set, all access to data must be from the other stripe set, because stripe sets have no parity. RAID 1+0 requires a minimum of four physical disks.
RAID-2: RAID-2 is no longer used.
RAID-3: RAID-3 uses something called a "parity disk" to store the parity information generated by the RAID controller on a separate disk from the actual data disks, instead of striping it with the data as in RAID-5. This RAID type is not currently used very often, because it performs poorly when there are a lot of little requests for data, as in a database. This type performs well under applications that just want one long sequential data transfer. Applications like video servers work well with this RAID type. RAID-3 requires a minimum of three physical disks.
RAID-4: RAID-4 is good for sequential data access, but is not used much.
RAID-5: RAID-5 uses disk striping with parity. The data is striped across all the disks in the RAID set, along with the parity information needed to reconstruct the data in case of disk failure. RAID-5 is the most common method used, since it achieves a good balance between performance and availability. RAID-5 requires at least three physical disks.
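To picture how the parity is striped along with the data, here is a small sketch that assumes one possible rotation scheme, where the parity chunk moves one disk to the left on each stripe. The `raid5_stripe` function and the D/P labels are made up for illustration; actual controllers may rotate parity differently.

```python
def raid5_stripe(stripe, num_disks):
    """Return the labels of the chunks held by each disk for one stripe."""
    # Assumed rotation: parity starts on the last disk and shifts left.
    parity_disk = (num_disks - 1 - stripe) % num_disks
    next_data = stripe * (num_disks - 1)  # each stripe holds num_disks - 1 data chunks
    row = []
    for disk in range(num_disks):
        if disk == parity_disk:
            row.append("P")
        else:
            row.append(f"D{next_data}")
            next_data += 1
    return row


# Print the first four stripes of a four-disk RAID-5 set.
for stripe in range(4):
    print(" ".join(f"{label:>4}" for label in raid5_stripe(stripe, 4)))
```

Because the parity chunk lands on a different disk for each stripe, no single drive has to absorb all the parity writes, unlike the dedicated parity disk in RAID-3.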
Adaptive RAID: Adaptive RAID lets the RAID controller figure out how to store the parity on the disks. It will choose between RAID-3 and RAID-5, depending on which RAID set type will perform better with the type of data being written to the disks.
RAID-6: RAID-6 increases reliability by keeping two independent sets of parity information, which allows the RAID set to survive two disk failures before data is lost. RAID-6 is seen in SATA environments, and in solutions that require long data retention periods, such as data archiving or disk-based backup. RAID-6 requires at least four physical disks.
Regarding your question about the benefit of using more disks in a RAID set than the minimum, the answer is you get more available storage and more "actuators" or "spindles" for the OS to use. Most RAID arrays use a maximum of 16 drives within a RAID set due to higher overhead and diminishing returns in performance when exceeding that many drives. Up to 8 seems to be a good rule of thumb for RAID-5 and RAID-10. If you need more space, you can just create another RAID set with the other disks. As another rule of thumb, try to keep different workload data types on separate RAID sets. You can use RAID-10 for best performance everywhere, but most budgets dictate the use of RAID-5 for database data volumes, with RAID-1 or RAID-10 used on database log volumes. (The database volumes can be highly random I/O, and the logs tend to be sequential in nature.)
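The space side of that trade-off is simple arithmetic. The sketch below assumes equal-sized drives, two-way mirrors for RAID-10, and ignores any space the array reserves for itself; the `usable_capacity_gb` function is hypothetical.

```python
def usable_capacity_gb(level, num_disks, disk_gb):
    """Estimate usable space for a RAID set built from equal-sized drives."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < minimum[level]:
        raise ValueError(f"RAID-{level} needs at least {minimum[level]} disks")
    if level == "0":
        return num_disks * disk_gb         # no redundancy, all space is usable
    if level == "1":
        return disk_gb                     # every disk holds the same copy
    if level == "5":
        return (num_disks - 1) * disk_gb   # one disk's worth goes to parity
    if level == "6":
        return (num_disks - 2) * disk_gb   # two disks' worth go to parity
    return (num_disks // 2) * disk_gb      # RAID-10: half the disks are mirror copies


# Eight 146 GB drives under different RAID levels.
for level in ("0", "5", "6", "10"):
    print(f"RAID-{level}: {usable_capacity_gb(level, 8, 146)} GB usable")
```

For eight 146 GB drives this works out to 1168 GB for RAID-0, 1022 GB for RAID-5, 876 GB for RAID-6 and 584 GB for RAID-10, which is the budget argument for RAID-5 in a nutshell.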
Rebuild times depend on the kind of RAID. If you are using software-based RAID, then more spindles within the group means longer rebuild times. If it's hardware-based RAID, then rebuild times are usually dictated by the size of the drives themselves, since the hardware usually does the sparing in and out of the set. A 146 GB drive takes longer to rebuild than a 73 GB drive.
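For a back-of-the-envelope feel for the hardware case, the sketch below simply divides drive size by a sustained rebuild rate. The 50 MB/s rate is an assumed figure chosen only for illustration, not a number from any vendor, and real rebuilds slow down when the array is also serving host I/O.

```python
def rebuild_minutes(drive_gb, rebuild_mb_per_sec):
    """Estimate the minutes needed to rewrite a whole drive at a steady rate."""
    drive_mb = drive_gb * 1000  # decimal units, the way drive sizes are quoted
    return drive_mb / rebuild_mb_per_sec / 60


ASSUMED_RATE_MB_PER_SEC = 50  # hypothetical sustained rebuild rate

for size_gb in (73, 146):
    minutes = rebuild_minutes(size_gb, ASSUMED_RATE_MB_PER_SEC)
    print(f"{size_gb} GB drive: about {minutes:.0f} minutes")
```

Whatever rate you plug in, doubling the drive size doubles the estimate.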
This was first published in December 2005