But along the way, the hospital found an iSCSI switch from Sanrad Inc. to be a more viable tool for pulling all the pieces together.
The hospital wanted to send each newly provisioned volume of storage to the secondary site automatically and asynchronously, keeping the two sites reasonably in sync. The goal was not only to balance workloads between the primary and hot disaster recovery sites using VMware Inc.'s VMotion, but also to be able to lose the entire primary data center and fail over to the secondary site automatically, almost instantaneously, with no impact on users.
This complicated the replication plans, however. Nelson knew putting a second HP system at the secondary site for homogeneous replication would let him transfer data fast enough between the sites, but he didn't want to lose his investment in the EMC array. That's when the hospital's local VAR, Total Tec, suggested Sanrad's V-Switch product as a way to get asynchronous replication between the heterogeneous arrays.
"I was skeptical about iSCSI," Nelson said. "I had used Fibre Channel for years and years and had trouble envisioning putting things like my high-transaction databases on iSCSI." But using the Sanrad switch would save him $300,000 over its Fibre Channel alternatives. He tried it and was surprised by its performance. "It supports our high-transaction SQL cluster just fine," he said, pegging SQL's transaction rate at 100 to 140 per second. "I have noticed no bottlenecks."
The hospital's storage team first presents Fibre Channel LUNs from the EVA at the primary site to the Sanrad switch, then presents iSCSI LUNs to most of the 140 physical and virtual servers running in the production environment, both at the primary and secondary sites. The Sanrad switch then automatically mirrors every volume of data to the EMC array at the secondary site.
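In effect, the V-Switch sits in the data path and re-exports Fibre Channel storage as iSCSI targets. As a rough illustration of the same idea (using the open-source Linux LIO target and `targetcli`, not the Sanrad product's own interface), a gateway host could re-export a Fibre Channel-attached block device over iSCSI; the device path and IQNs below are hypothetical:

```shell
# Illustrative sketch only: re-export a Fibre Channel-attached disk over
# iSCSI using the Linux LIO target stack (requires root).

# /dev/sdb is assumed to be the FC LUN presented by the array.
targetcli /backstores/block create name=eva_lun0 dev=/dev/sdb

# Create an iSCSI target (the IQN is a made-up example).
targetcli /iscsi create iqn.2008-01.org.example:eva-lun0

# Export the backstore as a LUN on that target's default portal group.
targetcli /iscsi/iqn.2008-01.org.example:eva-lun0/tpg1/luns create /backstores/block/eva_lun0

# Allow a specific server-side initiator (IQN also hypothetical).
targetcli /iscsi/iqn.2008-01.org.example:eva-lun0/tpg1/acls create iqn.2008-01.org.example:host01
```

The Sanrad switch adds the mirroring logic on top of this gateway role, writing each volume to both arrays.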
The hospital's servers are divided among five physical hosts at each site, and VMotion uses the hospital's own fiber connection between the sites to load-balance among them, drawing data from either pool of mirrored storage interchangeably. There are exceptions to this, such as a Microsoft Exchange 2007 cluster that does its own replication with geographically separated storage, and some noncritical file shares stored in the FATA partition on the EVA and not replicated for disaster recovery. Otherwise, about half of the hospital's 26 TB of data is flowing between the sites.
Nelson admitted that a few attributes of his environment are what allow adequate performance for this architecture. One is that the hospital owns the pipe from one site to the other. Another is the hospital's dual Cisco Catalyst 4510 backbone, which has enough bandwidth to run three separate VLANs to support iSCSI, each of which also has its own blade in the Cisco chassis.
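Dedicated VLANs like these keep storage traffic off the general-purpose network. On the server side, a comparable setup tags iSCSI traffic onto its own VLAN with standard Linux tooling; the interface name, VLAN ID, and address below are hypothetical:

```shell
# Illustrative sketch only: put iSCSI traffic on a dedicated tagged VLAN
# (ID 100) on a Linux host (requires root; names are examples).
ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 10.100.0.10/24 dev eth0.100
ip link set dev eth0.100 up
# iSCSI sessions bound to 10.100.0.10 then stay on the storage VLAN,
# isolated from production traffic.
```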
The ultimate test of this infrastructure has yet to be carried out. "We can just about get all our critical servers running simultaneously on the hosts at each physical site, but it's pretty tight," Nelson said. The hospital has run some test failovers, and while it has verified that the mirroring is working on a day-to-day basis, it has not yet performed a full live shutdown of the primary site.
In the meantime, Nelson said he's discovered another advantage to iSCSI over Fibre Channel – the ability to directly attach LUNs to virtual hosts. "Trying to attach a Fibre Channel LUN to a virtual host is like trying to plug in a USB device – you need an HBA," he said. "With iSCSI, the target and initiator can both be software-based." In fact, he said, if he'd known at the outset what he now knows about iSCSI, "I would've had no problem at all basing all of this on an iSCSI SAN and saving even more money."
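Nelson's point about software initiators can be seen with the open-iscsi tools on Linux, where no HBA is needed to attach a LUN to a host. The portal address and target IQN below are hypothetical:

```shell
# Illustrative sketch only: attach an iSCSI LUN with a pure software
# initiator (open-iscsi's iscsiadm; requires root; values are examples).

# Discover the targets advertised by the portal.
iscsiadm -m discovery -t sendtargets -p 10.100.0.50

# Log in to a discovered target; the kernel then surfaces the LUN
# as an ordinary block device (e.g., /dev/sdX).
iscsiadm -m node -T iqn.2008-01.org.example:eva-lun0 -p 10.100.0.50 --login
```

Because both ends can be software, a virtual machine can mount such a LUN directly over its ordinary network interface, which is the flexibility Nelson is describing.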