Would you please comment on single vs. dual SAN fabrics in a large SAN environment? One with 20+ TBs of disk and hundreds of mixed O/S servers.
Assume all servers are dual pathed, redundant (clustered) and attached to HA directors or switches. What are some of the issues that would push you to design a SAN that large in one fabric vs. two distinct fabrics? What would you recommend and why? I believe two distinct fabrics with server connectivity to both is the best way to design a large SAN. Am I wrong?
Thanks for the advice!
You answered your own question. If you ask any of the switch vendors for their opinion, they will all agree (as do I) that dual fabrics are the way to go for large SANs. Here are some of the reasons why:
- Traffic isolation
- Over-subscription prevention
- Online maintenance
- SAN backup
Let's look at each of these more closely.
In large SANs, it's always a good idea to isolate traffic as much as possible through good zoning practices. You should create small zones that include only the path from each server to the storage port it needs to access. In a large zone, if one of the devices misbehaves, every other zone member will see that traffic. Limiting the membership of each zone to a single server path reduces this risk. Using dual paths through two fabrics will also allow I/O traffic to continue through the surviving path if your server loses an HBA or someone trips over a cable.
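As a rough illustration of the "one zone per server path" idea, here is a minimal Python sketch that turns a list of server paths into per-fabric zone definitions. The fabric names, host name, port labels and WWPNs are all made up for the example, and the output is a plain dictionary rather than any particular switch vendor's zoning syntax.

```python
from collections import defaultdict
from typing import NamedTuple


class ServerPath(NamedTuple):
    fabric: str        # which fabric this path runs through (hypothetical names)
    server: str        # host name (hypothetical)
    hba_wwpn: str      # WWPN of the server HBA port (made-up value)
    storage_port: str  # label for the array port (hypothetical)
    storage_wwpn: str  # WWPN of the array port (made-up value)


def build_zones(paths):
    """Return {fabric: {zone_name: [members]}} with one zone per server path."""
    zones = defaultdict(dict)
    for p in paths:
        # Each zone holds exactly one HBA port and the one storage port it needs.
        zone_name = f"{p.server}_{p.storage_port}"
        zones[p.fabric][zone_name] = [p.hba_wwpn, p.storage_wwpn]
    return dict(zones)


paths = [
    ServerPath("fabric_a", "db01", "10:00:00:00:c9:00:00:01",
               "array1_ctl_a", "50:06:01:60:00:00:00:01"),
    ServerPath("fabric_b", "db01", "10:00:00:00:c9:00:00:02",
               "array1_ctl_b", "50:06:01:68:00:00:00:01"),
]

zones = build_zones(paths)
for fabric, fabric_zones in sorted(zones.items()):
    for name, members in sorted(fabric_zones.items()):
        print(fabric, name, members)
```

Because every zone has only two members, a misbehaving device can disturb at most the one path it is zoned with, and each server still ends up with one zone in each fabric.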
Using two fabrics will help load balance I/O traffic across two distinct paths. If one of the switches in any path gets overloaded, the path failover software on the host can redirect traffic to the faster path. Path software can use either queue depth or round robin to balance I/O traffic across the HBAs, and you want to make sure yours uses queue depth as its rule for load balancing. Because queue depth reflects how busy each path actually is, it allows the software to make more efficient decisions about which path to send each I/O down. If one of the ISLs in a fabric becomes over-utilized (over-subscribed), you can move traffic through the other fabric to balance out the load.
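To make the difference between the two balancing rules concrete, here is a small Python sketch. It is a toy model, not how any particular multipathing product is implemented: the path names and starting queue depths are invented, and completions are ignored so that only the selection rule matters.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class HbaPath:
    name: str
    queue_depth: int = 0  # I/Os issued on this path but not yet completed


def least_queue_depth(paths):
    """Pick the path with the fewest outstanding I/Os."""
    return min(paths, key=lambda p: p.queue_depth)


def dispatch(paths, count, policy):
    """Send `count` I/Os using the given policy; return the path chosen for each."""
    chosen = []
    rotation = cycle(paths)
    for _ in range(count):
        if policy == "round_robin":
            path = next(rotation)           # alternate blindly between paths
        else:
            path = least_queue_depth(paths)  # prefer the less busy path
        path.queue_depth += 1
        chosen.append(path.name)
    return chosen


def demo_paths():
    # Assumed situation: fabric A is congested (12 I/Os queued), fabric B is not.
    return [HbaPath("hba0_fabric_a", 12), HbaPath("hba1_fabric_b", 3)]


rr_result = dispatch(demo_paths(), 4, "round_robin")
qd_result = dispatch(demo_paths(), 4, "queue_depth")
print(rr_result)  # alternates between the two paths
print(qd_result)  # all four I/Os go to hba1_fabric_b
```

Round robin keeps feeding the congested path half of the new I/Os, while the queue depth rule steers them all to the lightly loaded fabric until the two paths even out.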
Online maintenance is probably the most important reason for using two fabrics. If you need to upgrade the firmware on your storage arrays, you can do it online by upgrading the controllers on the array one at a time. As you upgrade the first controller, all traffic will fail over to the paths through the other fabric. You then bring the first controller back online and do the other side. The same holds true for switch maintenance. Some switches require a reboot for new firmware images to take effect. If you had one big fabric, you would affect every device connected to that switch. Using dual fabrics will allow all traffic to continue to flow while switch maintenance is performed.
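Online maintenance only works if every host really does have a healthy path through the fabric that stays up. A pre-maintenance check along these lines can be sketched in Python; the host names, fabric names and the shape of the path inventory are assumptions for illustration, and in practice the data would come from your own path management or SAN management tooling.

```python
def hosts_at_risk(host_paths, fabric_under_maintenance):
    """Return hosts with no online path outside the fabric being serviced."""
    at_risk = []
    for host, paths in host_paths.items():
        survivors = [
            p for p in paths
            if p["fabric"] != fabric_under_maintenance and p["online"]
        ]
        if not survivors:
            at_risk.append(host)
    return sorted(at_risk)


# Hypothetical inventory: web07 has already lost its fabric B path.
host_paths = {
    "db01": [
        {"fabric": "fabric_a", "online": True},
        {"fabric": "fabric_b", "online": True},
    ],
    "web07": [
        {"fabric": "fabric_a", "online": True},
        {"fabric": "fabric_b", "online": False},
    ],
}

print(hosts_at_risk(host_paths, "fabric_a"))  # ['web07']
```

Any host the check reports would lose storage access during the maintenance window, so its failed path should be repaired before work on the other fabric begins.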
Although I usually recommend using a third HBA in your servers dedicated to online SAN backup, some folks do not have either the slots in the servers or the budget to allocate to a dedicated backup path in the SAN. If you use dual fabrics, you can share a tape library among different backup environments by implementing zone configuration changes in only one fabric. This will allow access to a shared library through a single path, while disk I/O traffic continues unaffected through the other path. It's not a good idea to push backup data streams alongside normal production disk traffic in the same fabric. Using one fabric as the backup path lets you avoid this without having a third adapter.
Editor's note: Do you agree with this expert's response? If you have more to share, post it in one of our discussion forums.
This was first published in September 2002