Have you come across some hard numbers and performance figures for the use of RAID (at any level) in the database environment?
Any information is much appreciated.
Database application performance should not be looked at only from an I/O standpoint, but first and foremost from a design perspective. Let me give you an example. At one customer site, SQL Server was used in a three-tier Web application where the Web pages were built dynamically from the database. The design used Active Server Page scripts in the application tier to query the database and then send the results to the Web server to build the dynamic pages.
By moving to SQL stored procedures at the database level, tuning the indexing methods and making a slight schema redesign, we were able to improve performance by over 800%, all without even touching the back-end I/O. Never throw hardware at a poor design; the return on investment is never as good as taking the time to rethink the original design.
OK, that said, here's what to do for good I/O performance:
1) Use a LOT of spindles. If you go to http://www.tpc.org and look at the specs of the designs that achieve maximum database performance, you'll notice they use hundreds of drives. By a lot of spindles I do not mean large RAID sets: concatenate many smaller RAID sets into a single LUN, either with host software or with LUN virtualization within the storage hardware, if it has that capability.
2) Use low-latency paths to storage. Fibre Channel is a packet-based network, which allows for a full-duplex back end; on a SCSI bus, only one thing can happen at a time. With dual-pathed Fibre Channel disks, a read command can be queued on one side of the disk while a write command is queued on the other at the same time.
3) Spread the I/O load as much as possible. Use multiple HBAs in the server and load-balance your I/O across all the paths. Tools like Veritas Volume Manager or Solstice DiskSuite can help by creating a single LUN from many spindles on each path. Using 2 Gbit paths and, say, 4 HBAs can get you near 800 MBps to the LUN!
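As a quick sanity check, here's the arithmetic behind that 800 MBps figure, assuming each 2 Gbit/s Fibre Channel link delivers roughly 200 MB/s of usable bandwidth (the exact usable figure varies with encoding and protocol overhead):

```python
# Rough usable data rate per 2 Gbit/s Fibre Channel path, in MB/s.
# (An assumed round number; measure your own links for real planning.)
USABLE_MBPS_PER_2GBIT_PATH = 200

def aggregate_bandwidth_mbps(num_hbas, mbps_per_path=USABLE_MBPS_PER_2GBIT_PATH):
    """Total bandwidth to a LUN when I/O is balanced across all paths."""
    return num_hbas * mbps_per_path

print(aggregate_bandwidth_mbps(4))  # 4 HBAs x ~200 MB/s -> 800 MB/s
```

This only holds if the volume manager actually stripes the LUN across spindles reachable on every path; a LUN confined to one path caps out at that single path's bandwidth.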
4) Use the proper RAID level for the log and the database. Best practice is to put log files on RAID 0+1 (RAID-10) and databases on RAID-5 (RAID 0+1 can be used for the database too, for better performance, if your budget allows for it).
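One way to see why write-heavy log files belong on RAID-10 is the classic write-penalty model (not stated in the article, but standard reasoning): a RAID-10 write costs 2 back-end I/Os (two mirrored writes), while a RAID-5 small write costs 4 (read data, read parity, write data, write parity). A minimal sketch, using hypothetical spindle counts and per-spindle IOPS:

```python
# Back-end I/Os consumed per host write, by RAID level.
WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4}

def effective_write_iops(spindles, iops_per_spindle, raid_level):
    """Host-visible random-write IOPS after the RAID write penalty."""
    raw_iops = spindles * iops_per_spindle
    return raw_iops // WRITE_PENALTY[raid_level]

# Hypothetical 8-spindle set at 150 IOPS per spindle:
print(effective_write_iops(8, 150, "RAID-10"))  # 1200 raw / 2 -> 600
print(effective_write_iops(8, 150, "RAID-5"))   # 1200 raw / 4 -> 300
```

Reads carry no such penalty, which is why RAID-5 remains a reasonable fit for the mostly-read database files themselves.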
OK, now for performance. Today's Fibre Channel hard drives can achieve a sustained throughput of between 50 and 80 MB per second. You can estimate the maximum I/O rate per spindle by dividing the maximum sustained transfer rate by the block size of your I/O request, minus any latency; I generally use 140 to 160 IOPS (I/Os per second) per spindle. The actual figure depends on the drive electronics, onboard cache size, rotation speed (15,000 rpm is better than 7,200 rpm, of course) and maximum seek latency. If you partition your RAID set into small partitions and locate your data on the outer cylinders, your performance will go up substantially, because doing so reduces the seek time for a particular partition. Now just add up the number of spindles to figure out your maximum IOPS: figure 1680 IOPS using 14 spindles on one HBA, or 3360 IOPS using 2 HBAs to 28 disks, doing 64K sequential I/O (random I/O will be less). Changing the block size of the transfer trades one metric for the other: smaller blocks make IOPS go up, larger blocks make throughput (MB per second) go up.
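The arithmetic above can be sketched as follows. Note that the article's 1680 and 3360 figures work out to 120 IOPS per spindle at 64K sequential I/O; the per-spindle rate is an assumption you would measure on your own hardware:

```python
def aggregate_iops(spindles, iops_per_spindle):
    """Maximum IOPS is simply the per-spindle rate summed across spindles."""
    return spindles * iops_per_spindle

def throughput_mbps(iops, block_kb):
    """The same I/O rate expressed as MB/s: bigger blocks trade IOPS for MB/s."""
    return iops * block_kb / 1024

# The article's figures imply 120 IOPS/spindle for 64K sequential I/O:
print(aggregate_iops(14, 120))    # 14 spindles, one HBA  -> 1680 IOPS
print(aggregate_iops(28, 120))    # 28 spindles, two HBAs -> 3360 IOPS
print(throughput_mbps(1680, 64))  # 1680 IOPS at 64K      -> 105.0 MB/s
```

The last line shows the block-size trade-off directly: the same 1680 I/Os per second deliver only 105 MB/s at 64K blocks, but would deliver twice the MB/s at 128K blocks (and fewer IOPS at the drive level).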
These numbers are for "direct" I/O, meaning no cache in front of the drives. Adding cache can increase performance dramatically; I have seen 33,000 IOPS on a single LUN at a 100% cache hit rate. So, if you want the best darn I/O performance available, use a solid-state disk. (Which can be REAL expensive!)
This was first published in April 2002