SAN availability and reliability

This chapter outlines some of the techniques and tactics used to achieve high availability and redundancy in a SAN (FC or iSCSI). These can include port aggregation, trunking, and failover.

A storage area network (SAN) is an interconnected fabric of storage devices and host servers. For example, a storage array may connect to a switch through a Fibre Channel (FC) cable or, in the case of iSCSI, an Ethernet cable. Each host server also connects to a switch, and switches are typically connected to one another, so that all of the devices form a fabric. But even though the interconnections may seem complex, each connection is an individual link. If a problem occurs with the host bus adapter (HBA), cabling or switch port, the server or storage device may become unavailable. This in turn can disable entire applications and cause significant downtime. Today's SANs use a combination of aggregation and failover techniques to improve their availability and reliability.

Aggregation (sometimes called "trunking", "link aggregation" or "port aggregation") combines ports to form faster logical communication links between devices. For example, rather than replacing a single 4-port 1 Gbps FC HBA with a 4 Gbps HBA, it may be possible to connect all four ports to the switch, "aggregating" the four existing 1 Gbps ports into a single logical 4 Gbps path. Of course, this type of aggregation demands more cabling and switch ports, but it offers the benefits of faster performance, load balancing and redundancy. It is often possible to aggregate links between a host server and a switch, between a storage system and a switch, or even between switches on inter-switch links (ISLs).

It's easy to see how multiple physical connections can be combined to improve speed, but redundancy/failover and load balancing are closely related ideas and are best explained together. With multiple physical connections in the same "link," a failure in one HBA port or cable won't cut off the link entirely, and communication can continue at a reduced speed until the failure is repaired. For example, if three 2 Gbps links are aggregated into a single 6 Gbps link, a failure in one of the ports will allow data to continue on the remaining two 2 Gbps links (yielding 4 Gbps). The remaining connections keep the host server or storage array connected to the SAN, which is a key premise behind SAN redundancy and high availability.
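The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of nominal bandwidth: the `Link` class, the port names and the `aggregate_gbps` function are hypothetical, and real throughput depends on the switch, the HBA and the protocol overhead.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One physical member of an aggregated link (hypothetical model)."""
    name: str
    gbps: float
    up: bool = True


def aggregate_gbps(links):
    """Sum the nominal speed of every member link that is still up."""
    return sum(link.gbps for link in links if link.up)


# Three 2 Gbps links aggregated into one logical link.
trunk = [Link("port0", 2.0), Link("port1", 2.0), Link("port2", 2.0)]
healthy = aggregate_gbps(trunk)

# One port fails; the logical link stays up at a reduced speed.
trunk[1].up = False
degraded = aggregate_gbps(trunk)

print(healthy, degraded)  # 6.0 4.0
```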

Another benefit of multiple physical connections is load balancing. Normally, unrelated physical links transfer data at independent (and often unpredictable) rates, so one or more of the physical connections can become a bottleneck that hurts the overall performance of the SAN. Once multiple physical connections are aggregated into a logical data path, data can be distributed evenly across the member links to balance the load and ease bottlenecks.
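As a rough illustration of balancing load across member links, the sketch below assigns each transfer to whichever link currently carries the least data. The least-loaded policy, the `distribute` function and the transfer sizes are assumptions made for this example; actual switches and multipathing drivers use their own algorithms, which may work per frame or per exchange.

```python
def distribute(transfers_mb, link_names):
    """Assign each transfer to the member link carrying the least data so far.

    Returns a dict mapping link name to total megabytes assigned.
    Ties go to the link listed first.
    """
    load = {name: 0 for name in link_names}
    for size in transfers_mb:
        target = min(load, key=load.get)  # least-loaded link
        load[target] += size
    return load


# Six transfers of uneven size spread over two aggregated links.
load = distribute([400, 100, 300, 200, 100, 100], ["port0", "port1"])
print(load)  # {'port0': 600, 'port1': 600}
```

With a single link, all 1,200 MB would queue on one connection; here each member link ends up carrying half of it.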

Keep in mind that including multiple connections in the same link does not automatically mean that all of the connections carry data simultaneously. For example, a storage array may be connected using two 4 Gbps links, but only one of them might be active while the other is connected but kept inactive. If trouble occurs with the first link, communication switches over to the second link, which takes over at the same speed until the original connection is repaired. This behavior is called failover, and it is often seamless to the SAN user or application.
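The active/standby behavior described above can be modeled as a small state machine. This is a minimal sketch, not a real multipathing driver: the `ActiveStandbyPair` class and the link names are hypothetical, and the choice to switch back to the original link as soon as it is repaired is an assumption (some implementations stay on the standby link until told otherwise).

```python
class ActiveStandbyPair:
    """Two links of equal speed; only one carries traffic at a time."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.healthy = {primary: True, standby: True}
        self.active = primary  # the standby is connected but idle

    def report_failure(self, name):
        """Mark a link as failed and fail over if it was the active one."""
        self.healthy[name] = False
        if name == self.active:
            other = self.standby if name == self.primary else self.primary
            if not self.healthy[other]:
                raise RuntimeError("no healthy link left")
            self.active = other

    def report_repair(self, name):
        """Mark a link as repaired; fail back to the primary (assumed policy)."""
        self.healthy[name] = True
        if name == self.primary:
            self.active = self.primary


pair = ActiveStandbyPair("linkA", "linkB")
pair.report_failure("linkA")
print(pair.active)  # linkB
pair.report_repair("linkA")
print(pair.active)  # linkA
```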
