Multipathing for reliability

Having two separate data paths from the server to the storage helps avoid single points of failure, but it can also be a confusing process. Here's multipathing made simple.

One driver I have always seen when working with customers on their SAN strategies is a desire to avoid single points of failure, and in particular to have two separate data paths from the server through to the storage. In general, this is a good idea, and with storage consolidation, where you are putting all your proverbial eggs in one basket, it is an important objective. It is also an area where I have often seen confusion, and indeed have sometimes been confused myself.

Just to clarify, I am talking about more than one port on a server being able to see more than one port of the storage. I am not talking about the routing mechanisms that happen within the SAN Fabric.

Do we need multipathing at all?

There are occasions -- possibly too many occasions -- where I have seen a solution that is somewhat over-engineered. I would typically argue that even a non-redundant SAN has a number of availability upsides, which can mean that even a non-resilient solution gives good availability -- possibly even better than the SCSI equivalent.

At the lowest level, fibre channel cabling is physically simpler than SCSI cabling, and fibre channel also avoids the issue of bus termination. As you move up the stack, aspects of fibre channel such as the 8b/10b encoding system also give high levels of data integrity. Then, at the level of complete servers, having the data outside the server and on a SAN means that in many failure scenarios it is easy to recable and remount the data onto a different server -- the wonders of networking.

However, I would usually advise some level of redundant cabling -- at least on the storage side. Wherever possible, I would also recommend redundant cabling on the server side, particularly as the SAN grows beyond a small workgroup with a single-digit number of servers.

Multipathing at the physical level

At its most basic, we want two paths from server to disk, but even here there are some decisions to make. To have two paths, we need two fibre channel ports on our server. For the best availability these should be on two separate fibre channel host bus adapters (HBAs), although even a single dual-ported HBA is a step up from a single connection.

To have two connections onto the SAN, we need to decide whether these will be two connections to two separate fabrics, or two connections to the same fabric. In the past, making two connections to two separate fabrics meant you had to have two connections to two different physical devices. Fortunately, these days you can have two connections to the same physical box -- either to the two logical switches in a Brocade SilkWorm 12000, or to two VSANs in a single Cisco MDS Director.

If we have two connections to the same physical box -- whether the box is acting in separate fabrics or not -- there is the question of whether to make those two connections to different blades in that box.

We then have to make two separate connections from the SAN to our storage array, and even there we may face a decision: connect to two ports on the same interface blade of the array, or to ports on different interface blades.

Clearly there are a number of choices to be made even at the physical level. When making these decisions, however, it's important to think about the exposure. If we lose a connection to the storage array, we are potentially impacting a large number of servers -- so it's very important to make the best choices. On the other hand, if we lose a connection at the server end, just the one server (and possibly only one application) is impacted. This is why it's reasonable to have a dual connection setup from the disk to the SAN, but only a single connection from the SAN to the server.
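
One way to reason about these physical choices is to list the components each path passes through and see which ones appear in every path. The Python sketch below does just that. It is only an illustration: the function name and the component labels (hba0, fabricA-switch and so on) are made up for the example and do not come from any vendor tool.

```python
from typing import Iterable, List, Set


def shared_components(paths: Iterable[List[str]]) -> Set[str]:
    """Return the components that every path passes through.

    Anything in the result is a single point of failure for this
    server-to-storage connection (hypothetical model, labels are free text).
    """
    path_sets = [set(p) for p in paths]
    if not path_sets:
        return set()
    return set.intersection(*path_sets)


# Two paths through one dual-ported HBA (hypothetical labels).
dual_port_single_hba = [
    ["hba0", "hba0-port0", "fabricA-switch", "array-blade0", "array-blade0-port0"],
    ["hba0", "hba0-port1", "fabricB-switch", "array-blade1", "array-blade1-port0"],
]

# Two paths through two separate HBAs.
two_hbas = [
    ["hba0", "hba0-port0", "fabricA-switch", "array-blade0", "array-blade0-port0"],
    ["hba1", "hba1-port0", "fabricB-switch", "array-blade1", "array-blade1-port0"],
]

print(shared_components(dual_port_single_hba))  # the HBA itself is shared
print(shared_components(two_hbas))              # nothing is shared
```

With the dual-ported HBA the adapter itself shows up as the shared component, which is the reason two separate HBAs give the better availability.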

How do we know we have more than one path?

Of course, physical cables don't make the solution, so we also need to consider multipathing in terms of the operating system being able to see the data. There are a number of ways to do this.

At the file system level, the operating system or application can use disk signatures or other file system metadata to identify that it can see the same data through more than one route. Alternatively, there may be some lower-level mechanism whereby part of the FC/SCSI driver detects that it has more than one path and hides this fact from the operating system. This is either code from the HBA vendor or third-party code.

For this low-level identification of multiple routes, there might be some vendor-specific mechanism -- where some code on the server asks the storage device for some information. Alternatively, you can use the fibre channel mechanism of Port World Wide Names and Node World Wide Names. The WWNs are unique IDs, created in a similar way to Ethernet MAC addresses: part of the WWN identifies the manufacturer of the device, and the other part is uniquely programmed by the vendor as they manufacture their many HBAs or disk arrays. A port WWN is unique to an individual port, whereas the node WWN is unique to the node. (A node in network terminology is a device -- a server or storage device.) Therefore, regardless of which of the many ports of the server or disk you look at, you will see the same node WWN, but you will see a different port WWN.
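
To make the "manufacturer part" of a WWN concrete, here is a small Python sketch that pulls the 24-bit IEEE company ID (OUI) out of a 64-bit WWN. It assumes the WWN uses one of the common IEEE-based layouts, identified by the first hex digit (the NAA field): 1 and 2, where the last 48 bits are a MAC-style address, and 5, where the OUI directly follows the first digit. The function name and the sample WWNs are invented for the illustration.

```python
def wwn_manufacturer_id(wwn: str) -> str:
    """Return the 24-bit IEEE company ID (OUI) of a 64-bit WWN as 6 hex digits.

    Assumes NAA format 1, 2 or 5; other formats raise ValueError.
    """
    hex_digits = wwn.replace(":", "").replace("-", "")
    if len(hex_digits) != 16:
        raise ValueError(f"expected a 64-bit WWN, got {wwn!r}")
    value = int(hex_digits, 16)  # raises ValueError on non-hex characters

    naa = value >> 60  # the first hex digit selects the layout
    if naa in (1, 2):
        # Low 48 bits are a MAC-style address; the OUI is its top 24 bits.
        oui = (value >> 24) & 0xFFFFFF
    elif naa == 5:
        # The OUI follows the NAA digit; the low 36 bits are vendor-assigned.
        oui = (value >> 36) & 0xFFFFFF
    else:
        raise ValueError(f"NAA format {naa} is not handled in this sketch")
    return f"{oui:06x}"


# Made-up example WWNs in the two most common layouts.
print(wwn_manufacturer_id("10:00:00:00:c9:12:34:56"))  # 0000c9
print(wwn_manufacturer_id("50:06:01:60:3b:a0:12:34"))  # 006016
```

The remaining bits are whatever the vendor programmed at manufacture, which is what makes each port and node name unique.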

By using world wide names with fibre channel, you can always identify whether the server has multiple routes to a device. In fact, you can easily tell which kind of multipathing you have: multiple routes to the same device with only one route to each individual port, or multiple routes to each individual port.
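
As a rough illustration of that distinction, the Python sketch below groups discovered paths by the target's node WWN and port WWN and labels the result. It assumes, as described above, that every port of a storage device reports the same node WWN. The `Path` record, the function names and the WWN values are all hypothetical; a real multipathing driver would obtain this information from the HBA or the fabric.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Path(NamedTuple):
    initiator_port_wwn: str  # server HBA port
    target_node_wwn: str     # storage device (same for all of its ports)
    target_port_wwn: str     # individual storage port


def routes_per_port(paths: Iterable[Path]) -> Dict[str, Dict[str, int]]:
    """Map each target node WWN to {target port WWN: number of routes}."""
    counts = defaultdict(lambda: defaultdict(int))
    for p in paths:
        counts[p.target_node_wwn][p.target_port_wwn] += 1
    return {node: dict(ports) for node, ports in counts.items()}


def describe(ports: Dict[str, int]) -> str:
    """Label the multipathing situation for one storage device."""
    if sum(ports.values()) == 1:
        return "single route"
    if len(ports) > 1 and max(ports.values()) == 1:
        return "multiple routes to device, one route per port"
    return "multiple routes to at least one port"


# Two HBA ports, each seeing a different port of the same array.
paths = [
    Path("10:00:00:00:00:00:00:0a", "50:00:00:00:00:00:00:00", "50:00:00:00:00:00:00:01"),
    Path("10:00:00:00:00:00:00:0b", "50:00:00:00:00:00:00:00", "50:00:00:00:00:00:00:02"),
]
for node, ports in routes_per_port(paths).items():
    print(node, "->", describe(ports))
```

If both HBA ports could also see both array ports (for example on a single shared fabric), each port WWN would be counted twice and the label would change accordingly.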

The only catch here is that different vendors use different approaches, so when you start to use different servers, different storage, different operating systems and software, and different switches, you need to plan ahead.

One final thought: Your planning stage should include a discussion on why you need to use multipathing. Do you need to send your data over multiple paths for performance or for redundancy? In the latter case, there may even be some clever OEM-specific end-to-end communication from server to storage so that the multipathing software tells the array that it is changing from accessing the data via one port to accessing the data via another port.
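
The difference between the two goals can be sketched in a few lines of Python. This is a toy model, not any vendor's multipathing software: the class, the policy names "failover" and "round_robin", and the `on_switch` callback (standing in for the OEM-specific "tell the array we are changing port" step) are all assumptions made for the illustration.

```python
from typing import Callable, List, Optional


class PathSelector:
    """Pick a path per I/O: 'failover' for redundancy, 'round_robin' for performance."""

    def __init__(self, paths: List[str], policy: str = "failover",
                 on_switch: Optional[Callable[[str, str], None]] = None):
        if policy not in ("failover", "round_robin"):
            raise ValueError(f"unknown policy {policy!r}")
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)
        self.policy = policy
        self.on_switch = on_switch  # hypothetical hook: notify the array
        self.failed = set()
        self._counter = 0
        self._active: Optional[str] = None

    def mark_failed(self, path: str) -> None:
        self.failed.add(path)

    def mark_restored(self, path: str) -> None:
        self.failed.discard(path)

    def select(self) -> str:
        healthy = [p for p in self.paths if p not in self.failed]
        if not healthy:
            raise RuntimeError("no healthy path to storage")
        if self.policy == "failover":
            # Always use the first healthy path; others are standby only.
            chosen = healthy[0]
            if self._active is not None and chosen != self._active and self.on_switch:
                self.on_switch(self._active, chosen)
        else:
            # Spread I/O across every healthy path in turn.
            chosen = healthy[self._counter % len(healthy)]
            self._counter += 1
        self._active = chosen
        return chosen


switches = []
redundant = PathSelector(["path0", "path1"], "failover",
                         on_switch=lambda old, new: switches.append((old, new)))
first = redundant.select()       # uses path0
redundant.mark_failed("path0")
second = redundant.select()      # moves to path1 and fires the hook
```

In the failover case the second path carries no traffic until the first one fails, so it adds availability but no bandwidth; the round-robin case uses both paths all the time and still survives the loss of one.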

About the Author:

Simon Gordon is a senior solution architect for McDATA based in the UK. Simon has been working as a European expert in storage networking technology for more than 5 years. He specializes in distance solutions and business continuity. Simon has been working in the IT industry for more than 20 years in a variety of technologies and business sectors including software development, systems integration, Unix and open systems, Microsoft infrastructure design as well as storage networking. He is also a contributor to and presenter for the SNIA IP-Storage Forum in Europe.

Join the conversation

1 comment

I see this error in the event viewer when one of the nodes in the Hyper-V cluster lost connectivity to the storage for a while.

Event ID: 3
Source: bfadi
Log Name: System
"Remote port (WWN = XX:XX:XX:XX....) connectivity lost for logical port (WWN = XX:XX:XX:.....)."

Appreciate if someone could provide any pointers...