Road testing fast HBAs

We put four PCI-X adapters to the test and the results are instructive.

There are many reasons for storage managers to get excited about the improvements the PCI-X bus can bring to storage area networks (SANs) by enabling faster host bus adapters (HBAs). We tested three dual-channel, 2Gb/s PCI-X HBAs (plus a fourth, single-channel board) to gauge their performance and management characteristics and to see how they behave when bus conditions aren't optimal. Our results show that your choice of board--and the way you configure the server it goes into--will both have a significant impact on your results.

Lab Notes
The Linux version of Iometer works well, but we were unable to get normalized results, and can only speculate as to why. The current open-source version of Iometer is not yet at a release stage. We could verify only that the drives could be mounted under Linux and that they would support the ext3 and ReiserFS file systems.

Windows 2000's Update service didn't contain drivers for any of the HBAs, and the Windows 2000 source disk contains drivers as much as three years old, so Windows 2000 Server installation inevitably required manual driver updates at installation time, whether for a new installation or a retrofit. While those in the Linux, Solaris and BSD operating system worlds are used to downloading the latest drivers, questions of revision synchronization between firmware and drivers--and of driver validation--come into play. We'd like to see Microsoft adopt a better system for driver management.

The PCI-X bus provides speed, and SAN speed is critical. The data delivery chain starts at an application; data is then delivered through the operating system to system hardware. From there, the highest-speed dedicated SANs move their data over Fibre Channel (FC) to drives in various configurations. A traditional gating factor has been the speed of the host bus and its transfer characteristics.

The PCI-X bus widens the PCI bus from 32 to 64 bits. It was also designed to deliver 133MHz clock speeds while remaining backward- and plug-compatible with the older PCI bus speeds of 33MHz and 66MHz. The current maximum throughput is therefore roughly 1GB/s--but there's a gotcha: the PCI-X bus is backward compatible with PCI adapters, but not at maximum performance levels.
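The raw-bandwidth arithmetic works out as follows (a back-of-the-envelope sketch: bus width in bits times clock rate, divided by eight for bytes):

```python
# Peak bus bandwidth in MB/s: width (bits) x clock (MHz) / 8 bits-per-byte.
def peak_mb_per_s(width_bits, clock_mhz):
    return width_bits * clock_mhz / 8

print(peak_mb_per_s(32, 33))    # classic 32-bit/33MHz PCI: 132 MB/s
print(peak_mb_per_s(64, 66))    # 64-bit/66MHz PCI: 528 MB/s
print(peak_mb_per_s(64, 133))   # PCI-X at 133MHz: 1064 MB/s, i.e. ~1GB/s
```

These are theoretical peaks; real transfers lose some bandwidth to protocol overhead.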

We tested each board to see how far the bus would fall back in speed when a conventional PCI HBA was introduced into the test bed. In all cases, the message is clear: you must use only PCI-X cards on a PCI-X bus--unless your server provides entirely separate buses--or any gains associated with a PCI-X HBA will be thwarted.

The PCI-X bus must slow down to match the slowest adapter connected to it. On a good day, that means a PCI-X adapter drops to 66MHz to match a 66MHz PCI card on the same bus; an older 33MHz PCI card slows the bus further still. This behavior was the same for all three cards tested. And keep in mind that a non-HBA PCI card--such as a network interface card--has the same effect on the PCI-X HBA.
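The downshift behavior described above can be modeled in one line (an illustrative sketch, not vendor code: the shared bus clocks down to the slowest card present):

```python
# Illustrative model: a shared PCI/PCI-X bus runs at the clock rate of its
# slowest card, so one legacy card drags down every slot.
def effective_bus_mhz(card_clocks_mhz):
    return min(card_clocks_mhz)

print(effective_bus_mhz([133]))       # lone PCI-X HBA: 133
print(effective_bus_mhz([133, 66]))   # add a 66MHz PCI NIC: whole bus at 66
print(effective_bus_mhz([133, 33]))   # an older 33MHz PCI card is worse: 33
```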

Hardware in our test environment ran from a Compaq DL580 (see "Lab Notes") through three vendors' PCI-X 2Gb/s, two-port HBAs (plus an additional single-channel adapter) to a JMR Flora drive array, all tested in several configurations. We chose the HBA vendors--Emulex, Costa Mesa, CA; LSI Logic, Milpitas, CA; and JNI, San Diego, CA--based on the breadth of their inclusion in compatibility matrices from the major switch and disk subsystem vendors. We had intended to include QLogic HBAs for the same reason, but the company declined to participate. We did, however, test a new single-channel Emulex board as our fourth option.

We tested all three vendors' boards on Windows 2000 and SuSE Linux 8.1. Each HBA vendor also supports a matrix of other operating systems, with Solaris the most popular "other."

For testing we used Intel's Iometer, which is available on both Windows and Linux platforms. However, the open-source Linux version of Iometer isn't yet a 1.0 release. We achieved high repeatability with the Windows version, but the Linux version didn't give us results within our desired 5% range on two of the three boards tested. We therefore performed most tests using Windows 2000 Advanced Server as our reference platform, and our comments are based on Windows 2000 use.

Common to all of the HBAs tested was the ability to perform many automated sensing chores. Each HBA also supports IP over the adapter, but we tested only SCSI, not the IP features. We also weren't brave enough to test the HBAs' PCI-X hot-plug features.

LSI Logic LSI7202XP-LC
The LSI Logic LSI7202XP-LC is a dual-channel, 2Gb/s adapter. On our test platform, the hardware detection portion of the Windows 2000 Advanced Server installation CD didn't contain the correct driver required to install the LSI HBA automatically. LSI provides instructions for adding the drivers during installation or updating them when the HBA is retrofitted--both sets of instructions worked perfectly.

The LSI7202XP-LC HBA has an onboard BIOS routine that allows configuration at power-on, letting you change which of the two controllers boots first, as well as interrupt and port choices for the adapter. Changing the parameters was simple. However, although an F1 help key is offered, the help screens are blank. Despite this oddity, we liked that the LSI HBA readily interrogated and autosensed whatever was connected to it from the moment we powered up its host server.

In Windows 2000, the LSI HBA appeared as two driver and board instances--as did the other HBAs tested--within Device Manager.

Of the three dual-channel HBAs tested, the LSI7202XP-LC used the least CPU during Iometer testing but was, oddly, the slowest performer. The CD sent with the HBA has driver files and documentation for several operating systems, but no special management utilities or other functionality.

We obtained management utilities directly from LSI, but as of our deadline couldn't find them on the company's Web site or on the CD. We're not sure how customers are meant to get them; perhaps the Web site will be updated by the time you read this.

How fast is fast

                        Emulex      LSI           JNI        Emulex
                        LP9402-F2   LSI7202XP-LC  FCX2-6562  LP9802-F2
Installation (20%)      4.3         4             3.6        4.3
   Ease                 Very Good   Very Good     Very Good  Very Good
   Documentation        Very Good   Very Good     Good       Very Good
   Management apps      Excellent   Very Good     Very Good  Excellent
Compatibility (20%)     4.3         4             3.6        4.3
   Windows 2000         Excellent   Excellent     Excellent  Excellent
   Linux 2.4.17+        Very Good   Very Good     Good       Very Good
   Other OS drivers     Excellent   Very Good     Very Good  Excellent
Performance (40%)       4           4             5          4
                        Very Good   Very Good     Excellent  Excellent
Value (20%)             4           4             5          5
                        Very Good   Very Good     Excellent  Excellent
TOTAL                   4.12        4.06          4.44       4.32
PCI-X alone (MB/s)      99.4        88.3          123.1      109.4
PCI-X with PCI (MB/s)   26.3        25.5          45.6       28.3
Upsides: Emulex LP9402--good management apps; LSI--easy POST BIOS configuration; JNI--strong performer; Emulex LP9802--dark horse, great value.
Downsides: Emulex LP9402--CD lacked drivers; LSI--management utilities difficult to find; JNI--maturing; Emulex LP9802--CD lacked drivers.
We tested the PCI-X HBAs using two different workload models. We also tracked the best throughput per adapter where no I/Os were pending. To put the performance of our test bed in context, remember that we used some of the fastest, highest-spindle-speed drives available (Seagate Cheetah 336752FC) in a dual-controller, 2Gb/s JBOD array; figures are expressed as the best read speed of the drive after write.

To check the effects of mixing standard PCI cards with PCI-X cards, we repeated the tests with a Compaq NetIntelligent PCI 10/100 Ethernet NIC in the bus. Results were dramatically lower. As a point of comparison, a PCI-based server (Compaq 760) connected to an external JBOD normally performs this test on a good day at 21.2MB/s using the Compaq 2.X SmartArray.

JNI FCX2-6562
JNI's FCX2-6562 installed easily under Windows 2000 Server. JNI's management application had to be downloaded from the company's Web site, as no CD was included with the adapters. We'd prepared for this by downloading the drivers and management application prior to installation.

Like the other HBAs tested, installation under Windows 2000 simply required having the drivers ready. The JNI EZ Fibre 2.2 application installed as a service under Windows 2000 Server and uses its own copy of the Java Runtime Environment. We found the graphical application quite easy to understand and to use for monitoring the JNI adapter.

The JNI HBA performed at the highest levels of the HBAs tested, although it also consumed more CPU cycles (at peak, still only 4.1%) in the test server. We found that if we ran tests while the EZ Fibre 2.2 application was open, CPU utilization could shoot as high as 10%--significant, but not alarming.

Because of its low price, the JNI FCX2-6562 presented the best overall value, and when combined with its speed and management app, received the highest score of the adapters that we tested.

Emulex LP9402-F2 and LP9802-F2
The Emulex LightPulse LP9402 (and LP9802) installs on Windows 2000 Server in a similar way to the other HBAs tested: either through a driver diskette at initial install or via driver discovery during a retrofit.

The Emulex HBA's power-on (POST) BIOS uses a menu tree for its options, and we found the interface easy to maneuver when making changes.

After Windows 2000 boots and the Emulex drivers are installed, a utility called lputilnt.exe lets you view the internals of the board from Windows 2000. There's even a place within the utility to change Windows Registry values--temporarily or permanently--to optimize performance. Indeed, Emulex suggested a few registry entries that might help; these certainly helped with small I/O, but are not reflected in our test results.

Emulex also sent us a single-channel HBA, its LP9802-F2. The company was confident that the board was faster than its older, dual-channel 2Gb/s HBA, and it was right--the LP9802 was speedier in overall I/O output than the older LP9402 in our tests. We've included the board as an example of how next-generation 2Gb/s FC boards--with faster CPUs and better multithreading--perform.

The final results
Each of the boards was flexible, but the JNI HBA was the overall winner in our impressions and scoring (see "How fast is fast," this page).

Our performance tests used Intel's Iometer with two test sequences: one representing high file-I/O activity with small loads, the other high file-I/O activity with very high loads. We've found that Iometer can be tweaked to heighten response, but we subjected each HBA to the same untweaked workload on the same system, targeting freshly reformatted drives in our JMR JBOD array.

The first workload is a simulation of typical file server responses: 67% reads and 33% writes of small I/Os (4K, 64K and 256K chunks). The second load represents fat I/O: 90% reads and 10% writes of large I/Os (1MB, 2MB and 16MB chunks).
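The two workload mixes can be summarized as follows (an illustrative encoding of the parameters described above, not Iometer's native access-specification format):

```python
# Illustrative summary of the two Iometer-style workloads described in the
# text (not an actual Iometer .icf file). Sizes in bytes; mixes in percent.
KB, MB = 1024, 1024 * 1024

workloads = {
    "file_server_small_io": {           # typical file-server simulation
        "read_pct": 67, "write_pct": 33,
        "io_sizes": [4 * KB, 64 * KB, 256 * KB],
    },
    "fat_large_io": {                   # large, read-heavy transfers
        "read_pct": 90, "write_pct": 10,
        "io_sizes": [1 * MB, 2 * MB, 16 * MB],
    },
}

for name, spec in workloads.items():
    assert spec["read_pct"] + spec["write_pct"] == 100  # mix must sum to 100%
    print(name, spec["io_sizes"])
```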

In terms of speed, Emulex's new single-channel HBA performed admirably--faster than Emulex's dual-channel adapter. However, JNI's HBA was both the dual-port and the overall speed champion.

We also tested the impact of using PCI adapters in the test platform's PCI-X bus. Results for each adapter were halved or worse by the introduction of a single PCI network card into the bus. The effect was most pronounced with the Emulex LP9802-F2, but the percentage decline from normal was broadly similar--roughly 63% to 74%--across the HBAs tested.
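Working from the throughput figures in "How fast is fast," the per-board decline can be computed directly (the tuples below are the MB/s figures from the scorecard: PCI-X alone vs. with one PCI NIC in the bus):

```python
# Throughput from our scorecard: (PCI-X bus alone, PCI-X bus with a PCI NIC).
results = {
    "Emulex LP9402": (99.4, 26.3),
    "LSI LSI7202XP-LC": (88.3, 25.5),
    "JNI FCX2-6562": (123.1, 45.6),
    "Emulex LP9802": (109.4, 28.3),
}

for board, (alone, mixed) in results.items():
    decline = (1 - mixed / alone) * 100   # percent slowdown from the PCI card
    print(f"{board}: {decline:.0f}% slower")
```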

We therefore strongly recommend relying on motherboard-based network adapters--and motherboard-based versions of other PCI peripherals, for that matter--or uniformly deploying only PCI-X cards in any server connected to a SAN where optimal performance is desired.

Test configuration
The base test platform was a Compaq DL580-G2 server with four 1.6GHz Xeon CPUs, connected through the HBA under test to a JMR Flora drive array using two controllers. Inside the JMR were eight 15,000rpm, 36GB Seagate Cheetah 336752FC drives. The Iometer tests used the dual JBOD controllers inside the JMR drive array, with four targets per HBA channel.

We used Windows 2000 Advanced Server (SP3, with recommended patches through November 11, 2002). The server was configured with most services--such as Active Directory, IIS and Message Queuing--disabled, to normalize quiescent CPU characteristics and memory usage. Drives were reformatted between Iometer tests.
We used the Windows version of Intel's Iometer 2001 benchmark in its default file-server mode to gather performance results. We also used a Perl script that wrote and timed file I/O as a sanity check under Windows 2000 Advanced Server, providing alternative validation of our results. All test sequences were repeated up to five times to achieve normalization within 5%.
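The sanity check simply timed a large sequential write followed by a read-back. An equivalent sketch in Python (illustrative only--our actual check was a Perl script, and the file name here is hypothetical) looks like this:

```python
import os
import time

def time_file_io(path, size_mb=64, chunk=1024 * 1024):
    """Write then read size_mb of data; return (write_MBps, read_MBps)."""
    buf = os.urandom(chunk)                 # 1MB of incompressible data
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())                # push data to the array, not cache
    write_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):                # sequential read-back
            pass
    read_s = time.perf_counter() - t0
    os.remove(path)
    return size_mb / write_s, size_mb / read_s

# Small run for illustration; a real check would use a much larger file
# to defeat the operating system's read cache.
w, r = time_file_io("sanity_test.bin", size_mb=8)
print(f"write {w:.1f} MB/s, read {r:.1f} MB/s")
```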
