Bits & Bytes: We all know that if things were done right in the first place, problems wouldn't creep up on us, right?
According to expert Chris Poelker, it's certainly true when designing a SAN environment. In this Ask the Expert answer, Chris discusses the problems one can encounter with a poorly designed SAN, unimplemented change control policies and the importance of flexibility.
A reader recently asked Chris the following question:
In a SAN environment where you have a significant number of servers sharing a few switches to get to large storage devices, it is no small task to coordinate outages to implement switch firmware upgrades. Do switch vendors take this issue into account when determining how often upgrades come out and how long existing versions are supported?
Here's what Chris had to say in response:
Not as far as I have seen. The vendors come out with new firmware versions on a regular basis to either fix bugs or add new functionality. Your experience with outages could mean that your SAN environment was not properly designed to begin with, or that you have been adding to the SAN with no change control policies in place.
What you are experiencing is the reason why proper SAN design MUST take into account scheduled and unscheduled maintenance procedures and be flexible enough to withstand multiple component failures. If you have a "significant number of servers sharing a few switches to get to large storage devices", then you need to plan for congestion. Make sure your fan-in ratio (the number of host ports sharing each storage port) is in agreement with current standards for the bandwidth you are using, and that the inter-switch links (ISLs) are properly balanced to take the load. Use trunking if possible.
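As a rough illustration of the congestion planning described above, the sketch below computes a bandwidth ratio for both the host-to-storage fan-in and the load on the ISLs. The function name, port counts and link speeds are hypothetical and chosen only for the example; the acceptable ratio for a given environment is something to take from your own switch and storage vendors, not from this sketch.

```python
def bandwidth_ratio(source_ports: int, source_gbps: float,
                    sink_ports: int, sink_gbps: float) -> float:
    """Aggregate source bandwidth divided by aggregate sink bandwidth.

    With host ports as the source and storage ports as the sink, this is a
    bandwidth-weighted fan-in ratio. With host ports as the source and ISLs
    as the sink, it is the oversubscription of the inter-switch links.
    """
    if source_ports < 0 or source_gbps < 0:
        raise ValueError("source port count and speed must not be negative")
    if sink_ports <= 0 or sink_gbps <= 0:
        raise ValueError("sink port count and speed must be positive")
    return (source_ports * source_gbps) / (sink_ports * sink_gbps)


# Hypothetical edge switch: 24 host ports at 2 Gbps each.
HOST_PORTS, HOST_GBPS = 24, 2.0

# Those hosts share 4 storage ports at 2 Gbps each.
fan_in = bandwidth_ratio(HOST_PORTS, HOST_GBPS, 4, 2.0)

# The same hosts reach the core over 2 ISLs, then over 4 ISLs
# (for example, after adding links to a trunk).
isl_before = bandwidth_ratio(HOST_PORTS, HOST_GBPS, 2, 2.0)
isl_after = bandwidth_ratio(HOST_PORTS, HOST_GBPS, 4, 2.0)

print(f"fan-in ratio: {fan_in:.1f}:1")
print(f"ISL oversubscription with 2 links: {isl_before:.1f}:1")
print(f"ISL oversubscription with 4 links: {isl_after:.1f}:1")
```

Doubling the number of ISLs halves the oversubscription in this arithmetic, but it says nothing about how evenly traffic is spread across the links, which is the part trunking is meant to address.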
For those servers where planning for downtime is painful, make sure you have at least TWO host bus adapters in each server with path failover software, connect each path to storage across TWO separate fabrics and assign volumes to the servers from TWO separate storage ports. Doing so will give you a resilient design that enables the path failover driver to fail over automatically as you take down each fabric, one at a time, to perform maintenance. Also, make sure your storage provider offers "online microcode loads", so you have zero planned downtime.
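The "two of everything" rule above can be checked mechanically against an inventory of paths. The following is a minimal sketch under assumed names: the `Path` record, the fabric labels and the port names are all hypothetical, and a real check would read this information from your own multipath or switch tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One server-to-storage path (all field values are illustrative)."""
    hba: str
    fabric: str
    storage_port: str


def design_gaps(paths: list[Path]) -> list[str]:
    """List which of the three 'at least TWO' rules a server breaks."""
    gaps = []
    if len({p.hba for p in paths}) < 2:
        gaps.append("fewer than two HBAs")
    if len({p.fabric for p in paths}) < 2:
        gaps.append("fewer than two fabrics")
    if len({p.storage_port for p in paths}) < 2:
        gaps.append("fewer than two storage ports")
    return gaps


def fabrics_safe_to_take_down(paths: list[Path]) -> list[str]:
    """Fabrics that can go offline while at least one path stays up."""
    fabrics = {p.fabric for p in paths}
    return sorted(f for f in fabrics if any(p.fabric != f for p in paths))


# A server cabled as described: two HBAs, two fabrics, two storage ports.
resilient = [
    Path(hba="hba0", fabric="A", storage_port="ctl0-p1"),
    Path(hba="hba1", fabric="B", storage_port="ctl1-p1"),
]

# A server with two HBAs that both land on the same fabric and storage port.
fragile = [
    Path(hba="hba0", fabric="A", storage_port="ctl0-p1"),
    Path(hba="hba1", fabric="A", storage_port="ctl0-p1"),
]

print(design_gaps(resilient), fabrics_safe_to_take_down(resilient))
print(design_gaps(fragile), fabrics_safe_to_take_down(fragile))
```

The second server looks redundant at the HBA level, yet no fabric can be taken down without an outage, which is the situation the reader's question describes.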
Downtime adds up and increases the operating costs of your storage solutions. Always take a close look at this when making future purchase decisions. Customers need to push vendors for zero downtime, and the vendors get the message when your storage dollars are going to their competitor!
Editor's note: Do you agree with this expert's response? If you have more to share, post it in one of our discussion forums.
This was first published in May 2003