Scaling SANs

Horizontal and vertical scaling are two methods of improving a SAN's capacity and performance. We discuss how to choose the appropriate approach for your environment.

Vertical and horizontal scaling

Vertical scaling is accomplished by replacing components with greater resources, such as switches with higher port counts.

Horizontal scaling builds out a storage network with additional switches, inter-switch links (ISLs) and so forth.

Storage area networks (SANs) must be able to scale as they grow to meet burgeoning storage demands. There are various techniques for scaling SANs from a switch and fabric perspective. The best technique is one that satisfies current storage requirements and accommodates future growth.

The basic building block of a switched Fibre Channel (FC) storage network is the FC switch, which ranges in size from a few ports to hundreds of ports. Similar to Ethernet-based networks, an FC storage network (fabric) can be implemented using a single switch or as a fabric network with multiple switches interconnected using inter-switch links (ISLs). We'll focus on scaling individual FC SAN fabrics to add ports to support more servers and storage, and to improve performance.

While there are various techniques for scaling FC SANs, the basic tools available to storage managers are vertical and horizontal scaling (see Options for scaling Fibre Channel storage networks). Deciding on which approach to use depends on the performance and availability needs of the servers you are attaching to and, to a lesser extent, on the physical distribution of your infrastructure and your ability to manage complexity.

In general, if you have a group of servers and storage located relatively close to each other, they could be attached to a high port-count switch (vertical scaling). For example, using high port-count switches deployed in pairs for redundancy, you could support from 128 to 256 server, storage and ISL ports. Horizontal scaling will likely be more appropriate when dealing with groups of servers at different locations, or when you need more ports than the number a single switch can support. For example, if you have multiple servers and storage located throughout a building, in a campus or across a metropolitan environment, horizontal scaling is more effective.

Vertical Scaling
Vertical scaling--sometimes called "scaling up"--involves increasing resources, such as ports, storage capacity and bandwidth. In its simplest terms, vertical scaling means making a resource larger or more powerful, such as deploying a switch with greater bandwidth or more ports (see Vertical and horizontal scaling).

Vertical scaling is used to consolidate resources to reduce per-unit costs, simplify management and support, and better utilize resources. Another characteristic of vertically scaled devices is the ability to logically and physically subdivide resources into logical partitions or domains. For example, current generation FC switches from Brocade Communications Systems Inc., Cisco Systems Inc., Computer Network Technology Corp. (recently acquired by McData) and McData Corp. allow for the creation of separate or logical domains.

One benefit of using a vertically scaled large switch is the ability to consolidate ports from many smaller switches into a single, larger device to simplify management, maintenance and configuration, and to reduce the number of ISLs. The downside is the potential lack of resiliency. Consequently, vertically scaled switches should be deployed in pairs in separate fabrics.

Horizontal Scaling
Horizontal scaling generally involves building out a fabric and networking switches together on a local or wide-area basis to increase the number of ports beyond what a single switch offers. Horizontal scaling may also be used to meet the requirements of distributed or geographically dispersed SANs. Horizontally scaled SANs use ISLs to establish links among the various switches.

With vertical scaling, high port-count switches provide a high degree of locality; with so many ports adjacent to each other, the need to make hops across an ISL is eliminated. This reduces latency and can improve performance while reducing SAN complexity. But congestion can still occur within a switch due to head-of-line blocking, which slows traffic by blocking access to switch ports (see Head-of-line blocking). Congestion might also occur with oversubscribed ports.

Horizontal scaling based on cascade, ring and core/edge topologies uses ISLs to provide redundancy, and to scale out to large numbers of ports and higher bandwidth (see Switch topologies). These topologies can be implemented to meet different needs for SAN scaling. They can also be used to connect SAN islands built around individual switches into larger, single fabrics, or routers can be used to physically connect the switches while logically isolating the fabrics.
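As a rough illustration of the port trade-off in a core/edge topology, the sketch below counts the device ports left on the edge after each edge switch gives up some ports for ISLs to the core. The function name, switch sizes and ISL counts are assumptions for illustration, not figures from any vendor.

```python
def core_edge_usable_ports(edge_switches, ports_per_edge, isls_per_edge):
    # Each edge switch dedicates isls_per_edge of its ports to core uplinks,
    # leaving the remainder for server and storage connections.
    return edge_switches * (ports_per_edge - isls_per_edge)

# Eight 32-port edge switches, each using two ports as ISLs to the core:
print(core_edge_usable_ports(8, 32, 2))  # 240 usable device ports
```

Adding ISLs per edge switch improves redundancy and bandwidth but directly reduces the ports available for devices, which is the central sizing trade-off in this topology.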

Options for scaling Fibre Channel storage networks
Non-blocked and oversubscribed ports
Storage interfaces are typically configured to be undersubscribed to avoid congestion that could hamper performance. Non-blocking architectures provide full bandwidth between port pairs to reduce the possibility of congestion. But not all servers have the same performance needs, so several lower-demand workloads may be aggregated onto a single, faster storage port. Likewise, access to the switch's bandwidth may be shared by two or more adjacent ports instead of every port operating at full line or interface speed.

Oversubscribed ports lower costs while dispelling the myth that all servers should be treated with the same interface. With its MDS 9500 switch, Cisco has implemented oversubscribed host-optimized ports; the box's standard 16-port 2Gb FC blades operate at full speed, but its 32-port blades are oversubscribed by design to share internal bandwidth across multiple ports. Cisco has also implemented host-optimized ports on its 9000 series of edge switches.

The caveat with oversubscribed ports is that the placement and locality of servers and storage become important to prevent blockage and congestion. For example, if four 2Gb/sec oversubscribed ports fan in to a 2.5Gb/sec interface in a switch core, the oversubscription ratio is 3.2:1. This means that, assuming 100% utilization, up to 3.2Gb/sec of server and storage I/O workload could compete for every 1Gb/sec of available switching bandwidth. In reality, many servers have relatively low I/O and workload requirements that, with planning, can be accommodated without performance delays on an oversubscribed port. But caution is required: keep track of which ports serve low-performance servers and which serve the high performers that may need more bandwidth.
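The arithmetic behind that ratio can be sketched in a few lines (a minimal illustration; the helper function is hypothetical):

```python
def oversubscription_ratio(port_count, port_speed_gbps, core_bandwidth_gbps):
    # Ratio of the maximum offered load to the switching bandwidth available.
    return (port_count * port_speed_gbps) / core_bandwidth_gbps

# Four 2Gb/sec host ports fanning in to a 2.5Gb/sec core interface:
ratio = oversubscription_ratio(4, 2.0, 2.5)
print(f"{ratio:.1f}:1")  # 3.2:1
```

A ratio of 1:1 or lower means the configuration is non-blocking; anything higher only performs well if the attached servers rarely drive their ports at full speed simultaneously.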

A configuration where many servers converge to a single storage port is called fan-in (or fan-out if viewed from the storage out to the servers). In a fan-in topology, servers and storage devices could be on full-speed or oversubscribed ports. Congestion may occur with a fan-in architecture at the port because of insufficient bandwidth or head-of-line blocking. Head-of-line blocking takes place when traffic at the beginning of the queue blocks other traffic.

Virtual output queues can help to solve the head-of-line blocking problem. A virtual output queue essentially creates a logical port for traffic from various servers while sharing access to the physical port's resources. Virtual output queue functionality has become a standard capability on current-generation switching products and is sometimes referred to as a quality of service feature. It eliminates head-of-line blocking by providing fair access and preventing delays caused by slower devices accessing the port. In this manner, virtual output queues can help to scale performance and connectivity.
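An idealized model shows why per-output queues help: in a single FIFO, a frame bound for a fast output waits behind one bound for a slow output, while with virtual output queues it does not. The frame IDs, destinations and service times below are invented purely for illustration.

```python
# Frames arriving on one input port: (frame_id, output_port, service_time)
frames = [("f1", "slow", 4), ("f2", "fast", 1), ("f3", "fast", 1)]

def fifo_finish_times(frames):
    # Single input FIFO: every frame waits for all frames ahead of it,
    # even those bound for a different output (head-of-line blocking).
    t, finish = 0, {}
    for fid, _, svc in frames:
        t += svc
        finish[fid] = t
    return finish

def voq_finish_times(frames):
    # One virtual output queue per destination: frames only queue
    # behind traffic bound for the same output port.
    t_per_output, finish = {}, {}
    for fid, out, svc in frames:
        t_per_output[out] = t_per_output.get(out, 0) + svc
        finish[fid] = t_per_output[out]
    return finish

print(fifo_finish_times(frames))  # f2 finishes at t=5, stuck behind f1
print(voq_finish_times(frames))   # f2 finishes at t=1
```

Real switches schedule among the virtual queues with fairness or priority policies; this sketch only captures the core idea that traffic to an uncongested output is no longer delayed by a slow device.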

ISLs and horizontal scaling
ISLs are a key component for implementing horizontal scaling on a local or wide-area basis. One way to increase bandwidth and availability among switches is to add more ISLs aggregated using trunking or configured as individual ISLs. Another method is a hybrid approach using a combination of faster ISLs and additional ISLs to meet specific availability and performance needs. The number of ISLs needed between switches is a function of the amount of bandwidth that applications require and the port speed of the ISL. For example, 7Gb/sec of bandwidth required between two switches could be accommodated with seven 1Gb ISLs, four 2Gb ISLs, two 4Gb ISLs or one 10Gb ISL. Additional ISLs may be configured for load balancing and redundancy. The level of performance needed in a failure situation (such as the loss of an ISL, switch or communication circuit) will determine how to distribute the ISLs among the switches.
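The ISL count in that example is just the required bandwidth divided by the link speed, rounded up (a minimal sketch; the function name is hypothetical):

```python
import math

def isls_needed(required_gbps, isl_speed_gbps):
    # Round up: a fraction of an ISL's worth of bandwidth still
    # requires a whole physical link.
    return math.ceil(required_gbps / isl_speed_gbps)

# 7Gb/sec of inter-switch bandwidth at various link speeds:
for speed in (1, 2, 4, 10):
    print(f"{speed}Gb ISLs needed: {isls_needed(7, speed)}")  # 7, 4, 2, 1
```

Any ISLs added beyond this minimum buy redundancy and load-balancing headroom rather than raw capacity.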

FC switches can typically be physically stacked on top of each other and interconnected with ISLs. Some switch vendors, such as QLogic Corp., provide dedicated 10Gb ISL ports. A stackable switch is a good scaling option in a rack or cabinet where additional ports may be added over time, rather than using a larger, frame-based director switch or connecting traditional switches with ISLs.

A virtual ISL is a new scaling technique that allows ports in different logical domains in the same physical switch to communicate. Virtual ISLs are also known as zero-cost or zero-overhead ISLs, and are available on newer switching products that use an internal backplane rather than an external physical ISL (ports and cables).

Head-of-line blocking
Head-of-line blocking is a form of congestion that can occur when multiple servers use the same port and are forced to wait for access to storage. Virtual output queues (shown at bottom) can alleviate some of the congestion by creating logical ports to handle traffic from the various servers.

Generally speaking, servers with high I/O and bandwidth requirements should be placed on full-performance ports, while servers with lower I/O requirements can be placed on oversubscribed ports. Therefore, if you can readily identify the I/O and bandwidth requirements of different servers you can match the appropriate interface to their needs.
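That placement rule can be expressed as a simple lookup. The 1Gb/sec threshold and the server names below are assumptions chosen for illustration, not recommendations.

```python
def assign_port(server_peak_gbps, threshold_gbps=1.0):
    # Hypothetical rule: servers whose peak I/O exceeds the threshold get
    # full-performance ports; lighter workloads share oversubscribed ports.
    if server_peak_gbps >= threshold_gbps:
        return "full-performance"
    return "oversubscribed"

# Invented server peak-bandwidth figures (Gb/sec):
servers = {"db01": 1.8, "backup01": 1.5, "web01": 0.2, "dev01": 0.1}
for name, peak_gbps in servers.items():
    print(f"{name}: {assign_port(peak_gbps)} port")
```

In practice the inputs would come from measured I/O profiles rather than guesses, which is why the text stresses identifying each server's bandwidth requirements first.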

Switch topologies
FC domains
A Fibre Channel domain is a collection of switch ports that function as a single entity. A single FC switch or director is an example of a domain with a unique domain identifier (domain ID). FC switches can have up to 256 ports in a single domain, with fabric loop ports (FL_Ports) supporting up to 256 additional sub-addresses. Multiple domains (up to 239) can be interconnected using one or more ISLs to create a fabric. Check with your switch and storage vendors to verify how many domains and which domain numbers they support in a single fabric, as well as the maximum number of switches and ISLs supported.
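The per-domain limits above follow from Fibre Channel's 24-bit address, which splits into 8-bit domain, area and port fields. A minimal sketch of decoding such an address (the sample FCID value is invented for illustration):

```python
def decode_fcid(fcid):
    # Split a 24-bit FC address into its three 8-bit fields.
    domain = (fcid >> 16) & 0xFF  # which switch (domain ID)
    area = (fcid >> 8) & 0xFF     # up to 256 areas (ports) per domain
    port = fcid & 0xFF            # e.g., a loop device behind an FL_Port
    return domain, area, port

print(decode_fcid(0x0A1B2C))  # (10, 27, 44)
```

The 8-bit area field is why a single domain tops out at 256 ports, and the 8-bit port field is what gives FL_Ports their 256 sub-addresses.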

Switches with more than 256 ports overcome the 256 ports per-domain addressing constraint by using logical domains to partition a physical switch into smaller virtual switches and virtual SANs. Dividing a larger switch into multiple logical switches can help to simplify management. But there are still management issues related to supporting the various logical domains and switches even when multiple physical switches are consolidated into logical domains on a single large switch.

With some vendors' switches, it's possible to run a different version of firmware in each partition. This can be helpful for backward compatibility with older devices when consolidating physical switches. But be aware that physically consolidating switches doesn't necessarily result in consolidated management.

Tipping the scales
Vertical scaling by adding port count, and networking switches together into fabrics to scale horizontally, aren't mutually exclusive. A combination of vertical and horizontal scaling can be used to meet your specific application and environment needs. As the cost of FC adapters and ports continues to drop, more environments will be able to improve redundancy by adding secondary paths with extra adapters and switches. Scaling also enables tiered storage access, which is an important part of a flexible data infrastructure for organizations moving toward an information lifecycle management environment. Ultimately, the scaling approach and topology that's best for your needs is one that adapts to your environment with a minimal amount of maintenance and support. As a best practice for high availability and accessibility, deploy storage networks using redundant paths and separate fabrics for servers that need high performance or uninterrupted access to storage resources.
