Preventing common RAID errors

By Rick Cook

SearchStorage.com

What you will learn from this tip: How to prevent common malfunctions in RAID drives as well as common RAID errors.

One of the main reasons to use RAID instead of JBOD is to improve reliability. With mirroring (such as RAID 1 or RAID 10), or striping with parity (such as RAID 5), the system can recover from a single hard disk failure with no loss of data.
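To see why a parity array can survive one failed disk, here is a minimal sketch of the XOR arithmetic behind RAID 5-style parity. The block contents and the helper name xor_blocks are made up for illustration; a real controller works on much larger blocks and rotates parity across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (hypothetical contents) plus their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]. XOR-ing the surviving data
# blocks with the parity block reproduces the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

The rebuild only works because every surviving block in the stripe can still be read, which is the assumption the rest of this tip is about.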

This works in theory. But there's a reason we still make backups of data stored in RAID arrays. Things can still go wrong that can cost you all or most of the data stored on the array. Fortunately, you can prevent most of them.

Skipped consistency checks

RAID arrays should be checked for consistency on a regular basis. A consistency check reads the array and verifies that the redundant data (the mirror copy or the parity) still agrees with the data it protects, which helps spot blocks that have gone bad. Many consistency-checking utilities also check any hot spares on the system to make sure they're actually capable of taking over in the event of a disk failure. Of course, if you never run a consistency check, you'll never know what's happening -- until it's too late.
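As a rough illustration of what a parity consistency check does, the sketch below recomputes the parity of each stripe and compares it with the stored parity. The data layout (a list of data blocks paired with a stored parity block) and the function names are assumptions for the example, not how any particular controller or utility represents an array.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_consistency(stripes):
    """Return the indexes of stripes whose stored parity does not match the data."""
    inconsistent = []
    for index, (data_blocks, stored_parity) in enumerate(stripes):
        if xor_blocks(data_blocks) != stored_parity:
            inconsistent.append(index)
    return inconsistent


# Hypothetical array contents: (data blocks, stored parity) per stripe.
stripes = [
    ([b"\x01\x02", b"\x03\x04"], b"\x02\x06"),  # parity matches the data
    ([b"\x10\x20", b"\x01\x02"], b"\x00\x00"),  # parity no longer matches
]
inconsistent = check_consistency(stripes)
```

A mismatch tells you that something in the stripe has gone wrong while the array still has enough redundancy to repair it.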

The worst-case scenario happens when a hard disk fails in a RAID 5 array. Rebuilding the array means reading every block on every surviving disk, so if bad blocks have been building up on one or more of the other disks, the system may not be able to rebuild the array, even after the failed disk is replaced. RAID 6, which stores two independent sets of parity instead of one, can protect against this problem at the expense of an additional disk's worth of capacity and slower writes. A better strategy is to run consistency checks regularly and replace drives that need it.
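The rebuild problem can be sketched in a few lines. In this toy model, an unreadable block on a surviving disk is represented as None, and the failed disk's block is simply missing from each stripe. The names and the data are hypothetical, and the model assumes single parity as in RAID 5.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(surviving_blocks):
    """Rebuild the failed disk's block for one stripe.

    Returns None when any surviving block is unreadable, because single
    parity cannot make up for two missing blocks in the same stripe.
    """
    if any(block is None for block in surviving_blocks):
        return None
    return xor_blocks(surviving_blocks)


# Blocks read from the surviving disks (data plus parity) for two stripes.
stripes = [
    [b"\x01", b"\x03"],  # every surviving block is readable
    [b"\x05", None],     # a latent bad block on a surviving disk
]
results = [rebuild_stripe(stripe) for stripe in stripes]
lost_stripes = [index for index, block in enumerate(results) if block is None]
```

The second stripe cannot be rebuilt, even though only one disk has actually failed. A consistency check run before the failure would have had the chance to find that bad block while it could still be repaired.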

Some software runs consistency checks automatically. Hewlett-Packard's NetRAID Assistant does it once a week. If yours does not, run a check at least once a month, and promptly replace any disk that shows problems.
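If your software does not schedule checks for you, even a trivial reminder script helps. The sketch below flags an array whose last check is more than a month old. It treats "a month" as 30 days, which is an assumption, and it leaves open how you record the date of the last check, since that depends on your RAID utility.

```python
from datetime import date, timedelta

# Assumption: "at least once a month" is treated as every 30 days.
MAX_INTERVAL = timedelta(days=30)


def check_overdue(last_check, today):
    """Return True if more than MAX_INTERVAL has passed since the last check."""
    return today - last_check > MAX_INTERVAL


# Hypothetical dates for illustration.
overdue = check_overdue(date(2004, 10, 1), date(2004, 11, 8))
recent = check_overdue(date(2004, 11, 1), date(2004, 11, 8))
```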

Write cache with no battery backup

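RAID controllers come with caches that can be set to provide various amounts of memory for read or write caching. With write caching, the controller reports a write as complete while the data is still sitting in cache memory, before it has reached the disks. If you intend to use write caching, make sure the controller has a battery backup for the cache. Otherwise, a power problem can wipe out writes the system believes are safely stored, losing data and corrupting the system.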

If your RAID controller does not have battery backup, the safest course is to set the write cache size to zero.
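The toy model below shows why. It is not based on any real controller's firmware or API; the class and method names are invented for the example. A write that goes into an unprotected cache is acknowledged, then disappears when the power fails.

```python
class RaidController:
    """Toy model of a controller with an optional write cache (hypothetical)."""

    def __init__(self, write_cache_enabled, battery_backed):
        self.write_cache_enabled = write_cache_enabled
        self.battery_backed = battery_backed
        self.cache = {}  # block number -> data acknowledged but not yet on disk
        self.disk = {}   # block number -> data safely on disk

    def write(self, block, data):
        """Acknowledge a write, holding it in cache if write caching is on."""
        if self.write_cache_enabled:
            self.cache[block] = data
        else:
            self.disk[block] = data
        return "ack"

    def power_failure(self):
        """Without a battery, the contents of the cache are lost."""
        if not self.battery_backed:
            self.cache.clear()

    def power_restored(self):
        """Flush whatever survived in the cache to disk."""
        self.disk.update(self.cache)
        self.cache.clear()


def surviving_data(write_cache_enabled, battery_backed):
    """Write one block, cut the power, and report what is on disk afterward."""
    controller = RaidController(write_cache_enabled, battery_backed)
    controller.write(7, b"payroll")
    controller.power_failure()
    controller.power_restored()
    return controller.disk.get(7)
```

In this model, surviving_data(True, False) returns None: the write was acknowledged and then lost. With a battery, or with write caching turned off, the block is still there after the power comes back.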

About the author: Rick Cook has been writing about mass storage since the days when the term meant an 80 K floppy disk. The computers he learned on used ferrite cores and magnetic drums. For the last 20 years, he has been a freelance writer specializing in storage and other computer issues.

08 Nov 2004
