This article can also be found in the Premium Editorial Download "Storage magazine: Low-cost storage pieces fall into place."
|Apple's first storage product: highlights|
Lots of throughput
To truly appreciate the power of the Xserve RAID, you have to understand the difficult target that Apple is aiming at: the high-end, real-time video processing market. These people demand a lot of throughput (well in excess of 100MB/s). When you're talking to someone who edits video, don't mention the maximum transfer rates or seek rates of a disk drive or storage subsystem. Talk to them about minimums. If you drop below 130MB/s sustained throughput, you're going to drop frames, audio or both, and that's simply unacceptable.
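To see why minimums matter more than averages, here is a minimal Python sketch that compares the mean and the worst case of a series of per-second throughput samples against a required floor. Only the 130MB/s floor comes from the text above; the sample values and the function name `throughput_summary` are illustrative assumptions.

```python
REQUIRED_MB_S = 130  # sustained floor quoted in the article


def throughput_summary(samples_mb_s, floor=REQUIRED_MB_S):
    """Return (average, minimum, seconds_below_floor) for per-second samples."""
    if not samples_mb_s:
        raise ValueError("need at least one sample")
    average = sum(samples_mb_s) / len(samples_mb_s)
    minimum = min(samples_mb_s)
    below = sum(1 for s in samples_mb_s if s < floor)
    return average, minimum, below


# Hypothetical measurements: a healthy average hides one slow second.
samples = [180, 175, 190, 120, 185]
average, minimum, below = throughput_summary(samples)
print(f"average={average:.0f}MB/s minimum={minimum}MB/s below floor={below}s")
```

In this made-up run the average is comfortably above the floor, yet one second falls below it, which is exactly the case a video editor cares about.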
The RAID arrays aimed at this target market advertise "up to 130MB/s sustainable throughput," or "over 100MB/s sustainable throughput." With the Xserve RAID, Apple has demonstrated that it can provide a sustained throughput of 209MB/s in a RAID 5 configuration. That number alone sounds impressive, and Apple says that it engineered the system so that it can support enough throughput for high-definition video editing even after losing a disk in a RAID array. Apple says that this applies to both reads and writes, claiming that its RAID 3 and RAID 5 write performance is approximately 90% of its read performance. Pretty impressive. (Be advised that this article is not a review, but rather a first-look report. Most statements in this article are based on claims made by Apple.)
The Xserve RAID algorithm also minimizes the performance hit caused by so-called soft errors. Soft errors occur when not all of the data is retrieved from the disk in the first pass, which forces the disk to spin around again for a second, third, or fourth attempt to reread the missing data. Such retries significantly degrade the actual throughput of a RAID set. When a soft error happens on an Xserve RAID array, the RAID controllers actually rebuild the missing data from parity in the cache before the disk can make a quarter turn. This secret sauce also allows the array to maintain a nearly constant sustained throughput over the entire platter of a RAID set's hard drives. This means that the read and write performance of the outer tracks is nearly identical to that of the inner tracks.
As mentioned earlier, the RAID array supports RAID 0, 1, 3, 5 and 0+1. RAID 0 stripes multiple disks into a single virtual disk. RAID 1 mirrors two disks. Mirroring two RAID 0 virtual disks is called RAID 0+1. RAID 3 and 5 sets can rebuild any single disk within the RAID set using parity. RAID 3 uses a dedicated parity disk and RAID 5 distributes the parity among all disks. RAID 3 and 5 both protect you against the loss of a single disk with far fewer disks than RAID 1. However, RAID 3 and 5 must calculate and store parity, resulting in a performance penalty during writes. Although RAID 3 is not used in most data centers, it is used where a constant throughput level is required, such as high-definition video editing or real-time recording of data streams in seismic or telemetry applications.
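The parity rebuild used by RAID 3 and 5 can be illustrated with a short Python sketch. Parity is the XOR of the data blocks in a stripe, so any one missing block can be recomputed from the survivors plus parity. This is a generic illustration, not Apple's controller code: the function names are hypothetical, and the RAID 5 rotation shown in `parity_disk` is just one common layout chosen as an assumption.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


def parity_disk(stripe_number, disk_count, distributed):
    """Index of the disk holding parity for a stripe.

    distributed=False models a dedicated parity disk (RAID 3 style);
    distributed=True rotates parity across all disks (RAID 5 style,
    using one assumed rotation order).
    """
    if distributed:
        return (disk_count - 1 - stripe_number) % disk_count
    return disk_count - 1


# One stripe across three hypothetical data disks.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Pretend the second disk failed (or returned a soft error) and rebuild it.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
print(recovered == stripe[1])
```

The same XOR that rebuilds a failed disk also covers the soft-error case described earlier: the missing block is recomputed from data already in hand instead of waiting for another rotation.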
By design, each Xserve RAID controller creates independent RAID 0, 1, 3 or 5 arrays. These arrays can then be used individually, striped together for performance or mirrored for additional availability. Both of these operations would need to be done using Apple's volume manager or Windows dynamic disks. Apple says that it is also quite common for customers to mirror across Xserve RAID arrays.
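As a rough sketch of what host-side striping across the two controller arrays involves, the following Python function maps a logical block number to an array and a block within that array. The chunk size of 128 blocks and the function name `locate` are assumptions for illustration; a real volume manager chooses its own layout.

```python
def locate(logical_block, array_count=2, chunk_blocks=128):
    """Map a logical block to (array_index, block_within_array) for RAID 0.

    Consecutive chunks of `chunk_blocks` blocks alternate between arrays,
    so large sequential transfers keep every array busy at once.
    """
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    chunk, offset = divmod(logical_block, chunk_blocks)
    array_index = chunk % array_count
    local_chunk = chunk // array_count
    return array_index, local_chunk * chunk_blocks + offset


for block in (0, 128, 256, 300):
    print(block, locate(block))
```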
Two types of write caching
Most RAID arrays--including the Xserve RAID--use caching to help mitigate the performance penalties associated with RAID levels 3 and 5. There are two main types of write caching: write-back and write-through. In write-back cache, data is considered committed, or successfully received, as soon as the RAID controller has written the data to cache. Then the RAID controller tells the host that the write has been completed, even though it is only written to RAM. While this provides fast performance, the volatile nature of RAM means that the data will be lost upon a power loss or other server outage. Those concerned more with data integrity than performance should use write-through cache, which ensures that the data is written directly to disk before it acknowledges that a write request has been completed.
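The difference between the two policies can be shown with a toy model in Python. It is a sketch under simplifying assumptions: the class `CachedController` and its methods are hypothetical, the cache is a plain dictionary standing in for RAM, and the model deliberately ignores any battery protection a real controller might have.

```python
class CachedController:
    """Toy RAID controller with a selectable write-cache policy."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.cache = {}  # volatile RAM: block number -> data
        self.disk = {}   # persistent storage: block number -> data

    def write(self, block, data):
        """Accept a write and return once it would be acknowledged to the host."""
        if self.write_back:
            self.cache[block] = data  # acknowledged while only in RAM
        else:
            self.disk[block] = data   # acknowledged only after reaching disk
        return "ack"

    def flush(self):
        """Copy everything still in cache to disk, then empty the cache."""
        self.disk.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        """Drop unflushed cache contents; return the block numbers that were lost."""
        lost = sorted(self.cache)
        self.cache.clear()
        return lost


write_back = CachedController(write_back=True)
write_back.write(7, b"frame")
print("write-back lost blocks:", write_back.power_loss())

write_through = CachedController(write_back=False)
write_through.write(7, b"frame")
print("write-through lost blocks:", write_through.power_loss())
```

In the write-back case the host has already received its acknowledgment when the power fails, so it has no way of knowing that block 7 never reached the disk.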
As you can see, write-back cache is designed for high performance and write-through cache is designed for increased data integrity. The mistake that many hardware RAID vendors make is deciding for their customers which type of write cache is best, often offering only one type. Apple's Xserve RAID allows you to choose which is more important to you: performance or integrity. If you choose write-back cache for performance reasons, the Xserve RAID will switch to write-through cache if it detects that it is running on UPS power. Switching to write-through cache assures that nothing will remain in cache in the event of complete battery loss.
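The UPS fallback described above amounts to a small policy decision, sketched here in Python. The function name, the mode strings and the `on_ups_power` flag are hypothetical; how the Xserve RAID actually detects UPS operation is not covered in this article.

```python
VALID_MODES = ("write-back", "write-through")


def effective_cache_mode(configured_mode, on_ups_power):
    """Return the cache mode in force, given the admin's choice and power state.

    While the array runs on UPS power, write-through is forced so that no
    acknowledged data is sitting in RAM if the battery runs out.
    """
    if configured_mode not in VALID_MODES:
        raise ValueError(f"unknown cache mode: {configured_mode!r}")
    if on_ups_power:
        return "write-through"
    return configured_mode


print(effective_cache_mode("write-back", on_ups_power=False))
print(effective_cache_mode("write-back", on_ups_power=True))
```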
Using an independent ATA bus for every disk is also a good idea. As mentioned earlier, many ATA arrays that the Xserve RAID competes with use two disk drives on each ATA bus. Anyone who has ever tried to burn a CD from another CD on the same ATA bus knows what kind of impact bus contention can have.
Another important design decision is that of the passive midplane. A lot of server and storage vendors use a back-plane with integrated circuits that are included in the data path. That is, a disk drive is connected to the RAID controller by plugging into the sockets that are connected to the back-plane. The data then travels through the socket, across the back-plane and back out through another socket to the RAID controller. Apple designed its midplane to be passive. In other words, all data passes from each disk to the RAID controller via an independent drive channel. In terms of availability, the loss of one of these channels is no different from the loss of a drive. This removes the single point of failure that back-planes create.
Apple has also removed single points of failure with dual, hot-swappable power supplies, cooling systems and coprocessors. A single power supply can power the entire unit while the other is being repaired. A single cooling system can be instructed to double fan speed if the other cooling system is lost. And the coprocessors monitor and track all of this, automatically notifying administrators via e-mail or pager of any failure. These coprocessors also track Self-Monitoring, Analysis and Reporting Technology (SMART) information from each disk drive, and can warn administrators of prefailure conditions. Such automated notification is essential when using RAID 3 or 5, as the loss of more than one drive results in a complete loss of all data in that RAID set. The Xserve RAID array also supports a global hot spare on each RAID controller, which can step in for any lost drive and mitigates this risk even further.
It's obvious that Apple has put a lot of thought into its first RAID offering and wishes to be the RAID vendor of choice for Apple servers requiring lots of storage. Hopefully in the future, Apple will resolve the incompatibility issues and make this product more easily available to Windows, Linux and Unix users.
This was first published in October 2003