The essential RAID primer, part 1

Evan Marcus
The concept of RAID (originally Redundant Arrays of Inexpensive Disks, now more often expanded as Redundant Arrays of Independent Disks) dates back to a famous paper written by David Patterson, Garth Gibson and Randy H. Katz in 1988.

In its most basic form, RAID allows disks to be arranged to protect the data that they contain by adding redundancy. In a properly configured RAID model, the loss of any single disk will not interfere with users' ability to access the data stored on the array.

"A Case for RAID" introduced five levels of RAID, each of which adds redundancy by combining disks in a different way. Since this landmark paper was written, several other levels of RAID have been defined by the RAID Advisory Board and by various storage technology vendors, including at least one level that does not actually add any redundancy at all.

A brief description of different RAID levels follows:

RAID-0: Striping. The easy way to remember this level is that the zero stands for zero redundancy. In fact, if you stripe disks to improve performance, you will have worse availability than if you just use disks without RAID. That's because the loss of any one disk in the stripe will cause the entire stripe to fail. (Remember that; we'll come back to it.)
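The availability penalty of striping can be made concrete with a little arithmetic. The sketch below assumes that disks fail independently and that every disk has the same chance of surviving a given period; the 0.97 figure and the function name `stripe_survival` are invented for illustration and do not come from any vendor data.

```python
def stripe_survival(disk_survival: float, disks: int) -> float:
    """Probability that a RAID-0 stripe survives, assuming independent disks.

    The stripe is lost as soon as any one member fails, so every disk
    must survive: multiply the per-disk survival probabilities together.
    """
    return disk_survival ** disks


# Hypothetical per-disk survival probability over some period.
p = 0.97
for n in (1, 2, 4, 8):
    print(f"{n} disk(s): {stripe_survival(p, n):.3f}")
```

Under these assumptions, each disk added to the stripe lowers the odds that the whole stripe is still readable at the end of the period.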

RAID-1: Mirroring. All of the data on one disk is copied exactly onto a second disk. Neither disk is the master or primary; the disks are peers. For writes to be deemed complete, they must make it to both disks. If one disk fails, its partner keeps right on running, without interruption. The good news with RAID-1 is that it's very easy to manage and it does not require significant levels of CPU for normal operations or for recovery. The downside to RAID-1 is the expense: For every gigabyte of disk you wish to protect, you need a second, matching gigabyte. In other words, RAID-1 requires twice as much disk space as unprotected disks.
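A toy model can illustrate the mirroring behavior described above. This is only a sketch: the `Mirror` class and its methods are hypothetical, the "disks" are in-memory dictionaries, and real implementations also have to deal with resynchronizing a replaced disk, which is not modeled here.

```python
class Mirror:
    """Toy RAID-1 pair: two peer 'disks' modeled as dictionaries."""

    def __init__(self):
        self.disks = [{}, {}]          # block number -> data
        self.alive = [True, True]

    def write(self, block: int, data: bytes) -> None:
        # A write is complete only once every surviving disk has it.
        if not any(self.alive):
            raise IOError("both disks have failed")
        for disk, ok in zip(self.disks, self.alive):
            if ok:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Either peer can serve the read; there is no primary.
        for disk, ok in zip(self.disks, self.alive):
            if ok:
                return disk[block]
        raise IOError("both disks have failed")

    def fail(self, index: int) -> None:
        # Simulate the loss of one disk.
        self.alive[index] = False


m = Mirror()
m.write(0, b"payroll")
m.fail(0)                 # lose one disk
print(m.read(0))          # the partner still serves the data
```

Note that the model stores every block twice, which is the 100% space overhead mentioned above.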

RAID-2: Hamming code error correction. RAID-2 uses the same Hamming encoding method for checking the correctness of disk data that error-correcting code (ECC) memory uses. I have never seen or heard of a single commercial implementation of RAID-2. I include it here only for completeness and because if I left it out, someone would write to ask me about it.

RAID-3, RAID-4, and RAID-5 are all variations on a theme. The theme is parity-based RAID. Instead of keeping a full copy of the data as in RAID-1, these levels spread the data over several disks with an additional disk added. The data on the additional disk is calculated (using Boolean XORs) based on the data on the other disks. If any disk in the set is lost, its data can be recovered through calculations on the data on the remaining disks. These implementations are less expensive than RAID-1 because they do not require the 100% disk overhead that RAID-1 requires. However, because the data on the disks is calculated, there are performance implications associated with any writing, and with recovering after a disk is lost. Most commercial implementations of parity RAID use cache memory to alleviate the performance issues.
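The XOR calculation is easy to demonstrate. The sketch below is illustrative only: the helper `xor_blocks` and the three tiny 4-byte "disks" are made up for the example, and a real array works on much larger blocks. It computes the parity block, then rebuilds a lost data block from the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)            # contents of the additional disk

# Lose disk 1, then rebuild it from the surviving data plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same function serves for both creating parity and recovering data, because XORing a value in twice cancels it out. It also shows where the write cost comes from: changing any data block means the parity block has to be recalculated as well.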

RAID-3: Virtual disk blocks. In RAID-3, every write is split (striped) across all of the disks (usually four or more) in the RAID array. Since every write touches every disk, the array can only write one block of data at a time, which can cause poor performance. RAID-3 performance varies based on the nature of the writes: Small writes scattered all over the disks will show very poor performance. Larger sequential writes will result in better performance.

RAID-4: Dedicated parity disk. In a RAID-4 array, there is a set of data disks, usually four or five (although there could be more, at a significant performance penalty), plus one extra disk that is dedicated to holding the parity for the data on the other disks. Since every write must also update the parity disk, that disk becomes a performance bottleneck, slowing down all write activity to the entire array.

RAID-5: Striped parity. RAID-5 is virtually identical to RAID-4 except that instead of all of the parity being concentrated on a single disk, the parity is divided up, with a share being given to each disk in the array. This sharing will balance and reduce the performance impact that is evident in RAID-4 implementations. In software implementations of RAID-5, which are fairly common, performance will often become unacceptably slow if writes make up any more than about 15% of disk activity.
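The difference between the two parity placements can be sketched in a few lines. This is a simplified model, not a description of any particular product: the function `parity_disk` is hypothetical, and the RAID-5 rotation shown (parity moves back one disk per stripe) is just one possible layout among several.

```python
def parity_disk(stripe: int, disks: int, level: int) -> int:
    """Return the index of the disk holding parity for a given stripe.

    RAID-4 always uses the last disk. For RAID-5 this sketch rotates the
    parity one disk per stripe, which is one possible layout among several.
    """
    if level == 4:
        return disks - 1
    if level == 5:
        return (disks - 1 - stripe) % disks
    raise ValueError("only RAID-4 and RAID-5 are modeled here")


disks = 5
for level in (4, 5):
    counts = [0] * disks
    for stripe in range(100):          # one parity update per stripe
        counts[parity_disk(stripe, disks, level)] += 1
    print(f"RAID-{level} parity writes per disk: {counts}")
```

In the RAID-4 case every parity update lands on the same disk, while in the RAID-5 case the updates are shared evenly across the array.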

Take a look at "The essential RAID primer: Part 2," where Evan Marcus discusses the other levels of RAID, including the one that does not add redundancy.


About the author: Evan Marcus is a Principal Engineer and Data Availability maven with VERITAS Software Corporation, with more than 15 years of experience in Unix systems. After spending five years at Sun Microsystems, Evan joined Fusion Systems and, later, OpenVision Software, where he worked to bring the first high availability software applications for SunOS and Solaris to market. Evan is the author of several articles and talks on the design of high availability systems. He is the author (along with Hal Stern of iPlanet) of "Blueprints for High Availability".