This depends on whose storage array you are using to set up the RAID5 volume. Check with your storage vendor and ask for their "best practice" for RAID5 on their storage. With standard RAID5, as you add spindles to the set, the controller still has to perform its "read before write" operation on the stripe of data that was changed.
The steps for calculating RAID5 parity during a write are as follows:
1) Read the old data
2) Read the old parity, then calculate the new parity: XOR the old data with the new data to find what changed, and apply that change to the old parity (new parity = old parity XOR old data XOR new data)
3) Write the new data
4) Write the new parity
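The four steps above can be sketched in Python to show why a small write costs four I/Os and why the XOR shortcut produces the same parity as recomputing it from the whole stripe. This is a minimal in-memory sketch, assuming disks are modeled as a Python list of equal-size byte blocks; the function names are hypothetical and not any vendor's controller code.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(data_blocks: list[bytes]) -> bytes:
    """Parity computed from scratch by XORing every data block in the stripe."""
    parity = bytes(len(data_blocks[0]))
    for block in data_blocks:
        parity = xor_bytes(parity, block)
    return parity


def raid5_small_write(stripe: list[bytes], parity: bytes, index: int, new_data: bytes):
    """Read-modify-write of one data block; returns (new_parity, io_ops)."""
    io_ops = 0
    old_data = stripe[index]                     # 1) read the old data
    io_ops += 1
    old_parity = parity                          # 2) read the old parity
    io_ops += 1
    delta = xor_bytes(old_data, new_data)        # bits that changed in the data
    new_parity = xor_bytes(old_parity, delta)    # apply the same change to parity
    stripe[index] = new_data                     # 3) write the new data
    io_ops += 1
    io_ops += 1                                  # 4) write the new parity (returned to caller)
    return new_parity, io_ops


# Usage: a stripe of four data blocks plus parity; overwrite block 2.
stripe = [bytes([i] * 8) for i in (1, 2, 3, 4)]
parity = full_stripe_parity(stripe)
parity, ops = raid5_small_write(stripe, parity, 2, bytes([0xAA] * 8))
print(ops)                                    # 4
print(parity == full_stripe_parity(stripe))   # True
```

Only the changed block and the parity block are touched, which is why the read-modify-write shortcut is used for small writes instead of rereading the whole stripe.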
As you can see, it usually takes four I/O operations for every RAID5 write. Adding more drives to the RAID5 set increases the amount of data in each stripe that has to be read whenever parity for the whole stripe is calculated. The good news is that using more drives in a RAID5 set reduces the parity "penalty" on capacity: parity always consumes one drive's worth of space, so the more drives in the set, the larger the share of raw capacity left for user data.
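The capacity side of this tradeoff is simple arithmetic: with N equal-size drives, one drive's worth of space holds parity, so the usable share is (N - 1)/N. A short sketch, assuming all drives in the set are the same size (the 300 GB drive size in the test is purely illustrative):

```python
from fractions import Fraction


def raid5_usable_fraction(drives: int) -> Fraction:
    """Share of raw capacity left for user data: one drive's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return Fraction(drives - 1, drives)


def raid5_usable_capacity(drives: int, drive_size_gb: int) -> int:
    """Usable capacity in GB when every drive in the set is the same size."""
    if drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drives - 1) * drive_size_gb


for n in (3, 5, 7, 14):
    print(f"{n:2d} drives: {float(raid5_usable_fraction(n)):.1%} usable")
# prints 66.7%, 80.0%, 85.7% and 92.9% usable respectively
```

The gain flattens quickly: going from three to seven drives adds about 19 points of usable capacity, while going from seven to 14 adds only about 7.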
So there is a tradeoff between performance and usable capacity when configuring RAID5. The minimum number of spindles in RAID5 is three, and the maximum on most implementations is between 14 and 17 (more than 14 drives in a single RAID5 volume is usually not recommended). I find six or seven spindles to be a good design. You may also want to configure your RAID5 set so that it has no single point of failure. Some modular storage vendors use drive "shelves", where each shelf is connected to a controller port behind the "dual redundant" controllers. Configure your RAID5 set so that no two drives in the set are on the same shelf. A shelf failure could take out both drives, which would not be good for your data, because a RAID5 set can survive only a single drive failure.
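The shelf rule and the drive-count guidance above can be turned into a quick layout check before the set is built. This is a sketch under assumptions: drives are described as hypothetical (shelf_id, slot) tuples, the 14-drive ceiling is taken from the recommendation above, and neither function corresponds to any vendor's management tool.

```python
from collections import Counter

# Hypothetical representation: each drive is a (shelf_id, slot) tuple.
Drive = tuple[str, int]


def check_raid5_layout(drives: list[Drive], max_recommended: int = 14) -> list[str]:
    """Return a list of problems with a proposed RAID5 set (empty list means OK)."""
    problems = []
    if len(drives) < 3:
        problems.append("RAID5 needs at least three drives")
    elif len(drives) > max_recommended:
        problems.append(f"{len(drives)} drives exceeds the recommended {max_recommended}")
    per_shelf = Counter(shelf for shelf, _slot in drives)
    for shelf, count in sorted(per_shelf.items()):
        if count > 1:
            problems.append(f"shelf {shelf} holds {count} drives from this set")
    return problems


def plan_raid5_set(free_slots: dict[str, list[int]], set_size: int) -> list[Drive]:
    """Pick one free drive from each of `set_size` different shelves."""
    chosen = []
    for shelf in sorted(free_slots):
        if free_slots[shelf]:
            chosen.append((shelf, free_slots[shelf][0]))
        if len(chosen) == set_size:
            return chosen
    raise ValueError("not enough shelves with free drives to avoid sharing a shelf")


inventory = {"A": [0, 1], "B": [4], "C": [2], "D": [], "E": [7], "F": [3], "G": [5]}
raid_set = plan_raid5_set(inventory, 6)
print(raid_set)   # [('A', 0), ('B', 4), ('C', 2), ('E', 7), ('F', 3), ('G', 5)]
print(check_raid5_layout(raid_set))                        # []
print(check_raid5_layout([("A", 0), ("A", 1), ("B", 0)]))  # ['shelf A holds 2 drives from this set']
```

The planner simply takes the first free slot on each shelf in order; a real plan would also weigh controller ports and drive sizes.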
Some vendors have incorporated improvements to the original RAID5 spec that reduce the "RAID5 write penalty" during most normal operations. One approach is to use dedicated hardware to calculate parity "in parallel" with the I/O to the underlying disks. Another is to gather writes in a cache buffer of, say, 64 KB and then calculate parity only once for all the writes in that buffer.
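The write-gathering idea can be sketched as a buffer that collects host writes until it holds a full stripe, then computes parity once and writes every data chunk plus parity with no reads. This is a hypothetical sketch, assuming writes arrive sequentially and that the 64 KB buffer is split evenly across four data disks; real controllers differ, and they also flush partial buffers (for example on a timer) and protect the cache against power loss, none of which is modeled here.

```python
class WriteGatherBuffer:
    """Hypothetical write-gathering cache: collect writes, then do full-stripe writes."""

    def __init__(self, data_disks: int = 4, buffer_size: int = 64 * 1024):
        if buffer_size % data_disks:
            raise ValueError("buffer_size must divide evenly across the data disks")
        self.data_disks = data_disks
        self.buffer_size = buffer_size
        self.chunk_size = buffer_size // data_disks
        self.pending = bytearray()
        self.flushed = []   # (data_chunks, parity) tuples "written" to disk
        self.io_ops = 0

    def write(self, data: bytes) -> None:
        """Accept a host write; flush each time a full buffer has accumulated."""
        self.pending += data
        while len(self.pending) >= self.buffer_size:
            stripe = bytes(self.pending[: self.buffer_size])
            del self.pending[: self.buffer_size]
            self._flush_full_stripe(stripe)

    def _flush_full_stripe(self, stripe: bytes) -> None:
        size = self.chunk_size
        chunks = [stripe[i * size:(i + 1) * size] for i in range(self.data_disks)]
        # Parity is calculated once for the whole buffer: XOR all chunks together.
        parity_int = 0
        for chunk in chunks:
            parity_int ^= int.from_bytes(chunk, "big")
        parity = parity_int.to_bytes(size, "big")
        self.flushed.append((chunks, parity))
        # Full-stripe write: every data chunk plus parity, and no reads of old data.
        self.io_ops += self.data_disks + 1


buf = WriteGatherBuffer()
for i in range(16):
    buf.write(bytes([i]) * 4096)   # sixteen 4 KB host writes fill one 64 KB buffer
print(buf.io_ops)                  # 5
```

Handled one at a time as read-modify-write updates, those sixteen 4 KB writes would cost roughly 16 × 4 = 64 disk I/Os; gathered into one full-stripe write they cost five.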
Other enhancements, such as RAID ADG (Advanced Data Guarding), write two parity stripes across more disks, which allows larger volumes and lets the set survive two simultaneous drive failures. Because it is slower than RAID5, it is well suited to workloads such as data archiving.
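How two parity stripes can rebuild two lost drives is easier to see in code. The answer does not describe ADG's internals, so this sketch swaps in the commonly described RAID 6 "P+Q" construction: P is ordinary XOR parity as in RAID5, and Q weights each data block by a power of 2 in the Galois field GF(2^8) (polynomial 0x11D). Treat it as an illustration of dual parity in general, not as HP's ADG algorithm; it also covers only the case where two data blocks are lost, and assumes x != y.

```python
import random


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) using the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^254, because a^255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def compute_pq(data: list[bytes]) -> tuple[bytes, bytes]:
    """P is plain XOR parity (as in RAID5); Q weights block i by g^i with g = 2."""
    p, q = bytearray(len(data[0])), bytearray(len(data[0]))
    for i, block in enumerate(data):
        coef = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_two(data: list, p: bytes, q: bytes, x: int, y: int) -> tuple[bytes, bytes]:
    """Rebuild lost data blocks x and y (marked None) from the survivors plus P and Q."""
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(data):
        if i in (x, y):
            continue
        coef = gf_pow(2, i)
        for j, byte in enumerate(block):
            pxy[j] ^= byte                  # leaves D_x ^ D_y
            qxy[j] ^= gf_mul(coef, byte)    # leaves g^x*D_x ^ g^y*D_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        dx[j] = gf_mul(qxy[j] ^ gf_mul(gy, pxy[j]), denom_inv)
        dy[j] = pxy[j] ^ dx[j]
    return bytes(dx), bytes(dy)


rng = random.Random(42)
data = [bytes(rng.randrange(256) for _ in range(16)) for _ in range(5)]
p, q = compute_pq(data)
damaged = list(data)
damaged[1] = damaged[3] = None     # two drives fail at once
d1, d3 = recover_two(damaged, p, q, 1, 3)
print(d1 == data[1] and d3 == data[3])   # True
```

The extra Galois-field math, plus a small write that now has to read and rewrite both P and Q (six I/Os instead of four), is one reason dual-parity schemes are slower than RAID5.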
Editor's note: Do you agree with this expert's response? If you have more to share, post it in one of our discussion forums.
Related Q&A from Christopher Poelker
SAN expert Chris Poelker discusses how to change the size of a LUN in a Microsoft cluster server environment.
SAN expert Chris Poelker compares connecting a SAN with wavelength cabling and dark fiber and discusses the pros and cons of each.
Storage expert Chris Poelker outlines WWN basics in order to answer the question: "Why do HBAs in a SAN have same base?"