When creating a SAN, you need to strike the right balance among performance, cost, scalability, high availability and ease of management.
SANs are growing in two ways: They're getting bigger and companies are adding more of them. SANs are no longer limited to large organizations and mission-critical applications; they're popping up in small- to medium-sized businesses (SMBs) and are increasingly used to serve all storage tiers. Fueling the growth of new SANs is rapid storage growth, the integration of geographically dispersed data, compliance, more stringent application requirements and the need for a higher level of redundancy.
With some SANs holding hundreds of terabytes of corporate data, it's imperative for a SAN to be reliable, scalable and have few, if any, performance issues. Growing a SAN involves balancing performance and high-availability (HA) requirements with your cost objectives in the following areas:
- Overall SAN architecture
- Switches and directors
- Storage arrays
- Computing platforms
Besides redundancy, the determining factors for your SAN architecture are performance, cost and scalability. From an acquisition standpoint, the least-expensive SAN design is accomplished by combining multiple, interconnected, smaller port-count switches. Unfortunately, the Google model of using a large number of low-cost components doesn't work well for SANs for two reasons. The links connecting the switches, also known as inter-switch links (ISLs), are prone to congestion and, as more switches are interconnected, performance becomes less predictable with a greater likelihood of bottlenecks. And a higher number of switches means more complex storage management, resulting in higher ongoing maintenance costs for the SAN.
|SAN architecture choices|
Consequently, one SAN design goal is to minimize the number of switches to eliminate ISL performance choke points. "A small to midsized SAN with more than four FC [Fibre Channel] switches in a single data center is a clear sign of a wrong SAN architecture," says James Opfer, research vice president at Gartner Inc.'s storage research group. With switch/director port counts ranging from eight to 512 FC ports in a single device, most small to midsized SANs can be based on a single dual-core architecture that eliminates ISL bottlenecks.
An important aspect of growing a SAN and SAN performance is the concept of locality. As a rule of thumb, the closer a server is to the storage array, the better performance will be. For instance, connecting a server and storage array to the same 16-port group on a 4Gb/sec blade on a Brocade Communications Systems Inc. SilkWorm 48000 director will result in optimal data throughput as the traffic will be locally switched to the destination port without having to leave the blade. If the server and array are on different blades on the same switch, all data needs to traverse the backplane. To make matters worse, if the server and storage array are on different switches, the data will also have to traverse an ISL.
In small to midsized SANs, storage optimization by harnessing locality is manageable, but it's impossible to control in very large SANs with tens of switches and thousands of ports. If the example above described a SAN with five daisy-chained switches, with the server and array connected at opposite ends of the chain, the worst-case scenario could mean traffic between the server and array traversing four ISLs, one between each adjacent pair of switches.
Hence, the design strategy for large and very large SANs is to confine the switch distance by tiering the SAN and introducing dedicated server switches (server tier) connecting back to core switches (core tier). While storage arrays and tape drives are directly connected to core switches in a two-tier architecture, three-tier architectures introduce a dedicated switch tier for storage arrays and tape drives. The benefits of a tiered SAN architecture are scalability, simplicity and predictable performance (see "SAN architecture choices," at right).
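The effect of tiering on switch distance can be checked with a small hop-count calculation. The following is a minimal sketch, not a vendor tool: the switch names and the `isl_hops` helper are hypothetical, and each tuple simply represents one ISL between two switches.

```python
from collections import deque

def isl_hops(links, src, dst):
    """Return the number of ISLs on the shortest path between two switches."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:  # breadth-first search finds the shortest path first
        switch, hops = queue.popleft()
        if switch == dst:
            return hops
        for nxt in neighbors.get(switch, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # the two switches are not connected

# Hypothetical daisy chain of five switches: sw1 - sw2 - sw3 - sw4 - sw5
chain = [(f"sw{i}", f"sw{i + 1}") for i in range(1, 5)]

# Hypothetical two-tier design: four server (edge) switches on one core switch
two_tier = [(f"edge{i}", "core") for i in range(1, 5)]

print(isl_hops(chain, "sw1", "sw5"))         # server and array at opposite ends
print(isl_hops(two_tier, "edge1", "core"))   # server on edge, array on core
print(isl_hops(two_tier, "edge1", "edge4"))  # worst case between two edge switches
```

In the chain, the hop count grows with every switch added, while in the two-tier layout it stays fixed no matter how many server switches are attached to the core.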
|High-availability SAN design|
The University of Minnesota SAN connects three locations and ensures that storage data will remain available in case one of the three sites is unavailable.
Beyond the main data center
SANs increasingly need to reach beyond a single data center, posing new challenges for storage architects. Disaster recovery and business-continuity requirements, as well as the need for organizations to operate in more than one geographic location, are spawning SANs that spread across multiple sites. Geographically dispersed SANs typically deploy an edge-to-core architecture, with edge SANs in smaller locations connecting back to core switches at larger sites. Segmentation and isolation of SANs are even more important in multisite SANs than they are in a single-location SAN: A problem or change in one location should never impact other locations. Depending on the size of the branch office, available bandwidth, latency tolerance of applications in use and the amount of data to be accessed, servers in remote locations are either directly connected back to storage arrays in the central location or they're attached to storage arrays in the branch office. While leveraging central storage arrays is the more cost-effective approach because it eliminates the need to purchase and maintain arrays in the remote office, the cost benefit may be offset by lower performance.
Carl Follstad, manager, university data management services at the University of Minnesota in Minneapolis, went through a similar thought process when architecting the university's SAN. Follstad was faced with three locations of similar size and comparable storage needs that required him to deploy local storage arrays in each location (see "High-availability SAN design," this page). Follstad deployed a multisite SAN using three pairs of MDS 9509 directors from Cisco Systems Inc. that constitute the university's SAN backbone, spreading across the three locations. The core network forms a triangle, connecting each site with the two other sites, ensuring SAN availability of the remaining two sites if one of the three sites is unavailable. Along with three storage administrators, Follstad manages a total of 280TB of data using a combination of EMC Corp. Symmetrix DMX and Clariion CX arrays.
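The resilience of a triangle core can be verified with a simple connectivity check. This is an illustrative sketch only: the site names and helper functions are hypothetical, and the model ignores the paired directors within each site, looking only at inter-site links.

```python
def reachable(links, start):
    """Return the set of sites reachable from start over the given links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def survives_any_single_site_failure(sites, links):
    """True if the remaining sites stay connected whichever one site fails."""
    for down in sites:
        remaining = [s for s in sites if s != down]
        if not remaining:
            continue
        live_links = [(a, b) for a, b in links if down not in (a, b)]
        if reachable(live_links, remaining[0]) != set(remaining):
            return False
    return True

sites = ["site_a", "site_b", "site_c"]  # hypothetical site names
triangle = [("site_a", "site_b"), ("site_b", "site_c"), ("site_c", "site_a")]
chain = [("site_a", "site_b"), ("site_b", "site_c")]  # no direct a-c link

print(survives_any_single_site_failure(sites, triangle))
print(survives_any_single_site_failure(sites, chain))
```

With the third link missing, losing the middle site would cut the other two off from each other, which is the failure the triangle is meant to prevent.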
The University of Minnesota storage network illustrates the importance of HA in a SAN design. HA through redundancy is a prime objective of any SAN design. HA design involves reducing single points of failure at the device, SAN and site level. As the severity of an outage increases from the edge to the core, components closer to the core demand a higher level of redundancy. HA design doesn't stop at the switch and array; it goes all the way to the server. Multipath software for load balancing and automatic path failover, like EMC's PowerPath and Sun Microsystems Inc.'s MPxIO, enables servers to be dual-attached to redundant switches. As with performance, HA doesn't come for free and the level of redundancy needs to be balanced with cost.
Switches and directors
FC switches are a key component of SANs, connecting initiators like servers with SAN targets (arrays and tape libraries). With FC switches available from a range of vendors, including Brocade, Cisco, Emulex Corp., McData Corp. and QLogic Corp., the selection of an appropriately sized switch should be driven by your current needs, growth expectations, and redundancy and performance requirements.
|Tiered SAN design|
In a tiered SAN design, servers are connected to one tier of the SAN fabric while storage is connected to another tier.
First, determine the number of currently needed FC ports and then project the number of ports that will be needed within the next couple of years. "Typically, we see companies size their infrastructure up to a factor of two of today's needs. Sizing infrastructure beyond a factor of two is expensive and, in most cases, uneconomical," says Gartner's Opfer.
Combine the number of required ports with the best-practice guideline of limiting the number of switches to as few as possible and you'll get a rough idea of what type of switch to consider. Generally, if the number of switch ports you need is more than 64, you should seriously consider a director-class switch. While directors are expensive, their passive backplane, redundant hardware components, variable port-count FC blades (that can operate in 1Gb/sec, 2Gb/sec or 4Gb/sec FC modes), and aggregate backplane data throughput capacity of more than 1TB make FC directors the ideal switching workhorse, especially for the SAN core. Besides HA, the chassis form factor of directors provides scalability by simply adding FC blades. For instance, Cisco's MDS 9513 will scale to 528 ports if fully populated with 48-port FC blades.
For port-count requirements of fewer than 64 ports, including some room for growth, director-level switches are overkill and in most cases not affordable. Although most stackable FC switches are available only with up to 32 ports, Brocade's SilkWorm 4900, and QLogic's SANbox 5200 and 5600 scale up to 64 ports. In fact, Opfer sees an increasing demand for high port-count switches in the future. "With both switches and directors deployed in pairs, the high-availability design of directors seems like overkill," comments Opfer. "High port-count switches are especially attractive for small to midsized companies that can't afford expensive directors."
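The sizing guidance in this section can be expressed as a short calculation. The sketch below is a rule-of-thumb helper only: the function name and defaults are assumptions chosen to mirror the figures quoted in the text (a growth factor of up to two and a 64-port threshold), and a plan of exactly 64 ports is treated here as still fitting a fixed-port switch.

```python
import math

def plan_ports(ports_needed_today, growth_factor=2.0, director_threshold=64):
    """Project the port count and suggest a switch class (rule of thumb only)."""
    planned = math.ceil(ports_needed_today * growth_factor)
    if planned > director_threshold:
        switch_class = "director"
    else:
        switch_class = "fixed-port switch"
    return planned, switch_class

print(plan_ports(24))  # small environment, doubled for growth
print(plan_ports(40))  # growth pushes the plan past the 64-port threshold
```

Because switches and directors are typically deployed in redundant pairs, the result applies to each fabric, not to the total number of ports purchased.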
One of the drawbacks of 1U and 2U form factor switches is their fixed port count. The key feature to look for in stackable switches is port scalability within the switch. In other words, does a switch support activating additional ports by procuring additional licenses in the future, or do all ports have to be purchased outright? While both Brocade and McData support ports-on-demand in all their switches, Cisco and QLogic typically don't. Ports-on-demand lets you pay for ports when they're needed rather than paying for idle ports that might be used in the future.
A well-designed SAN will take advantage of port oversubscription as a tool to balance performance requirements with cost. Oversubscription relies on the fact that not all ports within the same port group are fully utilized at the same time. For instance, Brocade's SilkWorm 48000 director will only operate at full line-rate speed on all ports when using 16-port blades, but it's oversubscribed 2:1 if 32-port FC blades are used.
Similarly, the Cisco MDS 9513 director is oversubscribed 4:1 when populated with 48-port 4Gb FC blades. "Compromises like oversubscription make a SAN somewhat more complex, as it becomes more difficult to predict the available bandwidth per port," says Mario Blandini, Brocade's product marketing manager. From a storage design perspective, ISLs and performance-sensitive servers shouldn't be connected to ports that are part of an oversubscribed port group.
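An oversubscription ratio is the total bandwidth the ports on a blade could demand divided by the bandwidth the blade's slot can deliver. The sketch below reproduces the ratios quoted above. The per-slot bandwidth values (64Gb/sec and 48Gb/sec) are assumptions inferred from those ratios, not vendor specifications, and the function name is hypothetical.

```python
def oversubscription(ports, port_speed_gb, slot_bandwidth_gb):
    """Return the ratio of potential port demand to available slot bandwidth."""
    return (ports * port_speed_gb) / slot_bandwidth_gb

# Assumed slot bandwidths, inferred from the ratios in the text
BROCADE_SLOT_GB = 64  # 16 ports x 4Gb/sec at full line rate
CISCO_SLOT_GB = 48    # 48 ports x 4Gb/sec at 4:1

examples = {
    "Brocade 16-port blade at 4Gb/sec": oversubscription(16, 4, BROCADE_SLOT_GB),
    "Brocade 32-port blade at 4Gb/sec": oversubscription(32, 4, BROCADE_SLOT_GB),
    "Cisco 48-port blade at 4Gb/sec": oversubscription(48, 4, CISCO_SLOT_GB),
    "Cisco 48-port blade at 1Gb/sec": oversubscription(48, 1, CISCO_SLOT_GB),
}

for name, ratio in examples.items():
    print(f"{name}: {ratio:g}:1")
```

A ratio of 1:1 means every port can run at line rate simultaneously; anything higher relies on ports in the group not peaking at the same time.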
Another tool for balancing cost with performance requirements is harnessing quality of service (QoS) to achieve defined performance goals. For instance, a port within an oversubscribed port group can be configured using port bandwidth reservation to ensure that it will always get the reserved bandwidth. It's also conceivable to operate a blade at a lower FC port rate. For example, a Cisco 48-port blade in the MDS 9513 director will operate at full line speed at 1Gb/sec FC mode. Furthermore, switch features like traffic classification and zone-based QoS, as well as the ability to assign more credits to certain switch ports, provide storage architects with the ability to assign guaranteed capacity where required.
To grow SANs horizontally, most switches and all directors provide the ability to trunk multiple ISL ports into a single data-aggregation pipe. For instance, Brocade supports trunking of as many as eight ISLs into a single 32Gb/sec trunk group, balancing data loads across trunk groups using a combination of standard route balancing via the fabric shortest path first (FSPF) protocol and optimized vendor-specific techniques such as Brocade's Dynamic Path Selection (DPS) protocol.
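Trunk sizing follows directly from the per-ISL speed. As a rough sketch, assuming 4Gb/sec ISLs and an eight-member limit per trunk group as in the Brocade example (the function names are hypothetical), the aggregate bandwidth and the number of ISLs needed for a given peak load can be estimated like this:

```python
import math

def trunk_bandwidth_gb(isl_count, isl_speed_gb=4):
    """Aggregate bandwidth of a trunk group, in Gb/sec."""
    return isl_count * isl_speed_gb

def isls_needed(peak_load_gb, isl_speed_gb=4, max_trunk_members=8):
    """Smallest number of ISLs whose combined bandwidth covers the peak load."""
    needed = math.ceil(peak_load_gb / isl_speed_gb)
    if needed > max_trunk_members:
        raise ValueError("load exceeds one trunk group; add a second group")
    return needed

print(trunk_bandwidth_gb(8))  # eight 4Gb/sec ISLs in one trunk group
print(isls_needed(10))        # ISLs required for a 10Gb/sec peak load
```

The estimate ignores protocol overhead and assumes traffic is spread evenly across trunk members, so it is a lower bound for planning purposes.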
Growing a SAN cost-efficiently requires storage architects to look beyond FC. While iSCSI is being embraced by smaller companies as their primary SAN, it can be leveraged in FC SAN environments for less critical applications like disk-to-disk backup and departmental applications. With all major storage vendors supporting both FC and iSCSI, consideration of the iSCSI option should be part of any new SAN deployment and SAN expansion project. However, iSCSI's lower cost and complexity need to be weighed against its slightly lower performance and its spotty support in storage management applications today.
"We are seeing companies with existing FC SANs using iSCSI to connect midrange and workgroup servers for which FC can't be justified," says Rajeev Bhardwaj, senior manager of Cisco's storage business unit within the Data Center Business Unit. Multiprotocol support in directors, as well as dedicated iSCSI gateways and routers from companies like Brocade, Emulex and McData, enable enterprise customers to attach servers via iSCSI to their existing FC storage.
Bringing iSCSI into an FC SAN will definitely result in a somewhat more complex SAN that requires mastering the idiosyncrasies of two different SAN technologies that have their own unique requirements. "We have been evaluating iSCSI through an eight-port IP line card in our Cisco MDS 9509 directors," says Follstad at the University of Minnesota. "An ongoing LAN restructuring project and the lack of a Radius server infrastructure for CHAP [challenge handshake authentication protocol] authentication were some of the hurdles we had to overcome."
Storage managers trying to cut costs also need to decide how best to bring NAS into the SAN mix. The benefit of NAS for file access is its inherent ability to serve files in both CIFS and NFS file-system protocols, making files accessible to both Windows and Linux-/Unix-based systems. Most large storage vendors, including EMC, Hewlett-Packard Co., Hitachi Data Systems (HDS) and IBM Corp., have embraced NAS in the enterprise by offering midrange to large-scale systems as well as NAS-to-SAN gateways.
NAS gateways provide NAS head functionality for CIFS and NFS access to SAN storage. In other words, NAS gateways are NAS units that substitute a dedicated disk subsystem with storage from the SAN storage pool, consolidating NAS and SAN storage, and lowering both storage acquisition and maintenance costs. "By using two EMC Celerra NS704G NAS gateways, we are able to offer NFS file access by leveraging our current investment in Clariion and Symmetrix storage," says Follstad.
Not all storage arrays are equal. Therefore, clearly understanding vendor-specific differences and roadmaps is crucial when choosing a RAID vendor. While features like mirroring, synchronous/asynchronous replication and point-in-time copies are standard in today's storage arrays, they're largely proprietary and will work only between arrays of the same vendor or RAID family. For instance, EMC's SRDF software for remote replication won't work with HDS' TrueCopy or IBM's Peer-to-Peer Remote Copy. Therefore, standardizing on a single storage array vendor is prudent. Moreover, all major storage vendors, including EMC, HDS and IBM, resell Brocade, Cisco and McData switches, enabling a single-vendor strategy for all SAN components. But not all CIOs are comfortable with putting all of their eggs in one basket. Therefore, the benefit of a single-vendor relationship and the volume discounts that go along with it need to be weighed against vendor lock-in.
The single most important factor when choosing a storage array is determining the class of storage required. Will a midrange array suffice or is an expensive, high-end storage array required to meet given performance and growth requirements? Midrange systems are typically dual-controller arrays like the EMC Clariion CX array family or Hitachi Thunder 9500 V Series. Tailored to the needs of smaller and midsized environments, they're easier to manage than high-end arrays. High-end arrays like the EMC Symmetrix DMX array family and Hitachi Lightning 9900 V Series are based on architectures that scale to the most demanding performance requirements. For instance, the Direct Matrix Architecture implemented in the Symmetrix DMX array family supports up to 128 processors, enabling it to deal with a range of simultaneous tasks and, most importantly, to maintain performance and application response as the load increases.
Besides storage controllers, disk drives are a major part of the overall price of a storage array. Choosing the appropriate type of drive for a given purpose can result in noticeable savings. While expensive FC drives should be used for mission-critical applications and data, SATA drives suffice for less-critical, tier-two storage.
|Adding SAN intelligence|
SANs have transformed storage from a system-bound resource to an easily scalable shared network resource. The next evolution of SANs will likely include a higher level of virtualization that reduces the vendor dependency seen in contemporary SANs, and an increase in intelligence to simplify and automate many of the complex manual tasks performed by storage administrators today.
However, there's considerable controversy about where to embed this added intelligence in the SAN. There are several solutions:
When growing a SAN in both capacity and performance to meet application requirements, the computing platform can play a crucial role, especially for large and very large SANs. Determining the right computing model to enable an application to meet its performance, availability and scalability objectives is as important as choosing the right SAN architecture.
Contemporary application trends such as Web services, service-oriented architecture (SOA) and software as a service (SaaS) require a platform that scales linearly, easily and as close to infinitely as possible by simply adding computing resources, including storage. Grid computing is touted to be this platform. Grid computing or grid clusters are akin to a utility in which semitrusted nodes perform assigned tasks. Key functions performed by grid computing include scheduling of nodes and resources, data virtualization that makes information available whenever and wherever it's needed, provisioning of available computing resources and resource management.
At its core, grid computing is based on an open set of standards and protocols such as the Open Grid Services Architecture (OGSA) that enables communication across heterogeneous, geographically dispersed environments. Virtualization of computing resources is at the core of grid computing, but unlike traditional virtualization technologies that virtualize a single system, grid computing virtualizes vast and disparate IT resources.
The biggest stumbling block when growing today's SANs is the lack of multivendor product interoperability. Consequently, using products from as few vendors as possible is advantageous. Storage applications from mirroring, snapshots and replication, to virtualization and transparent data movement rarely work outside very specific single-vendor product configurations (see "Adding SAN intelligence," at right).