Scaling storage

"Scalability" is often defined as the ability of a storage system to support more or higher capacity hard drives. But adding capacity is only part of the scalability picture. To address scalability most effectively, you have to consider how additional capacity will affect other elements in the environment, as well as the performance of hosts and their applications.

Scaling storage might seem as easy as tossing a few more disks into the array, but adding just capacity can affect your overall performance.

Support for more or higher capacity hard drives isn't the whole story. For a storage array to be considered truly "scalable," other factors are just as important as disk capacity -- or maybe even more important. Scaling an array's disk capacity may be as simple as buying a few drives, but scaling throughput or performance (such as changing the fan-out ratio or an application I/O profile) can be a challenging task. And it's a task that may be compounded if too little thought was put into the initial design of the storage system, whether in how it was implemented or in its hardware configuration. There are many perils associated with growing a storage environment, as additional I/O may cause an imbalance that degrades the system's overall performance.

When additional capacity is being considered, you should first ensure that the additional disks won't push the system beyond its scalability margins. For example, with some modular arrays that use back-end arbitrated loops, loop contention creeps in after a certain number of disks are added and performance degrades.

Much of the confusion over scalability vs. capacity can be attributed to storage vendors not providing adequate information when pitching the virtues of their systems. More often than not, the vendor sees the sale of an array as a point solution to address a customer's immediate storage needs. So it's important that users articulate not only their present requirements, but their anticipated future growth and scalability needs. Users should also ask to see the vendor's roadmap and determine if the vendor's plan will preserve their investment over the next few years.

Be prepared
The best way to prepare for scaling a storage infrastructure is to buy the best parts in the first place. Here are some questions you should ask vendors when considering SAN and SAN array products:
  • What's the supported fan-out ratio per port?
  • What's the maximum number of logical unit numbers that can be presented per port?
  • How many front-end host directors or adapters can be added?
  • How many back-end disk directors or adapters can be added?
  • How many disks are supported per adapter? What's the recommended average?
  • What's the average IOPS supported by the host port? (Some vendors may not publish this information.)
  • How many disks are supported in a RAID group? Does it vary by RAID level? What's the recommended number of disks in a RAID group for each RAID level?
  • Does replication or snapshots/clones cause a performance penalty on the production devices? If yes, to what degree?
  • Does the array support a mix of different drive types?
  • What sort of multiplatform support is available? Are there any issues with connecting the array to multiple platforms?

Scaling and applications
Capacity and scalability requirements come in all forms, but they're ultimately reduced to a single consideration: Do your applications function in a manner that suits the business? For example, the I/O profile changes resulting from adding more disk space vary considerably from application to application. A database growing in size (and I/O) may need more disk space to generate more logs or to create bigger indices. Once the additional space has been made available, the array's performance may be affected because the application now demands more I/O. This may be very different from an Internet-based application that requires more space only to store more content.

Effective scaling is based on the correct sizing of all the parameters involved. If you don't understand the requirements of your applications, a change made for one of them may degrade the larger environment.

Understanding application requirements for your storage infrastructure is a part of the application lifecycle management process. As applications change, so do their requirements. The application management process starts during the initial design of the storage system. Understanding these parameters--and how they'll likely grow--is the key to avoiding being locked into a disk solution that doesn't scale with the apps it hosts.

Scaling with disk
One basic principle of economics is that want generates demand, and it applies equally well to storage scaling. Consider a well-designed storage system that's still performing well within its optimum range, but whose hosted apps are running out of room. Additional disks appear to be the solution, but you also need to ensure that the additional storage presented to the hosts preserves the I/O balance: controller capacity, fan-out ratios, spindle contention and so forth.
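As a rough illustration of one such balance check, the sketch below counts the hosts zoned to each array front-end port (the fan-out ratio) and flags any port that would exceed a supported limit once new storage is presented to another host. The port names, host names and the limit of four hosts per port are made-up values; the real limit comes from the array vendor's documentation.

```python
# Hypothetical zoning map: array front-end port -> hosts that log in through it.
zoning = {
    "FA-1A": ["db01", "db02", "app01", "app02"],
    "FA-1B": ["db01", "db02", "web01"],
}


def ports_over_limit(zoning, max_fan_out):
    """Return {port: host count} for ports whose fan-out exceeds max_fan_out."""
    return {
        port: len(hosts)
        for port, hosts in zoning.items()
        if len(hosts) > max_fan_out
    }


# Assumed limit of 4 hosts per port; substitute the vendor's supported figure.
before = ports_over_limit(zoning, max_fan_out=4)

# Presenting the new storage to one more host through FA-1A changes the picture.
planned = {port: list(hosts) for port, hosts in zoning.items()}
planned["FA-1A"].append("db03")
after = ports_over_limit(planned, max_fan_out=4)

print("over limit before:", before)
print("over limit after: ", after)
```

The same pattern applies to the other limits a vendor publishes, such as disks per back-end adapter or LUNs per port.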

Most vendors publish best practices on adding more disk capacity without causing a performance imbalance. These guidelines may include the recommended size of a RAID group and volume, stripe element sizes, offsets and so on. Many also provide some basic performance monitoring tools for their storage arrays that can be used to predict a system's growth and anticipated performance degradation. Tools such as EMC Corp.'s Navisphere Analyzer for Clariion can provide a good deal of information.

Scaling beyond disk
Maintaining an array is like taking care of a car--it needs gas, of course, and oil periodically, but you also have to keep an eye on other things to keep it running smoothly. For a storage array, you need to ensure that the load on the array subsystem is monitored and tuned as needed. Some of the variables (and hardware items) that bear watching include front-end host ports (for fan-out ratios and host traffic), the cache subsystem (for memory management), back-end disk processors, back-end I/O paths, and array or RAID processors. For well-balanced growth, all of these subsystems need to operate within healthy limits.

Viewing a storage array as an entire system, rather than as just the disks, has its benefits. First, you can control which aspects of the system need to grow right away and which ones can wait for the next budget cycle. Second, performance issues can be hedged by moving things around. For example, if your write cache is being hammered and your read cache is lightly used, some arrays let you change the ratio to provide more write cache on the fly. In the long term, you may need to add more cache (if the array allows), but a short-term fix can provide some relief while you determine the best way to grow the array.
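To make the "grow now vs. wait" decision a little more concrete, here is a minimal sketch that ranks array subsystems by how far they sit above a utilization ceiling. The subsystem names, the sample figures and the 70% ceiling are all assumptions for illustration; in practice the numbers would come from the array's monitoring tool and the thresholds from the vendor.

```python
# Hypothetical utilization samples (fraction of capacity) per array subsystem.
utilization = {
    "front_end_ports": 0.55,
    "cache": 0.82,
    "back_end_processors": 0.40,
    "back_end_paths": 0.35,
    "raid_processors": 0.71,
}

# Assumed "healthy" ceiling; real thresholds come from the array vendor.
HEALTHY_CEILING = 0.70


def subsystems_needing_attention(utilization, ceiling=HEALTHY_CEILING):
    """Return (name, utilization) pairs above the ceiling, busiest first."""
    over = [(name, u) for name, u in utilization.items() if u > ceiling]
    return sorted(over, key=lambda item: item[1], reverse=True)


for name, u in subsystems_needing_attention(utilization):
    print(f"{name}: {u:.0%} utilized, above the {HEALTHY_CEILING:.0%} ceiling")
```

Subsystems at the top of the list are candidates for immediate tuning or upgrade; anything that doesn't appear can probably wait for the next budget cycle.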

Arrays have architectural limitations, so the initial design and array configuration will determine just how much growth flexibility you'll have. Critical architectural decisions such as modular vs. monolithic, crossbar vs. switched, and the number of host and back-end ports will play a key role in defining the array's scalability.

Performance tuning and scalability
Scaling a storage system also means addressing performance bottlenecks. This list highlights potential spots where performance bottlenecks can occur. Keep in mind that this entire stack is a tightly coupled chain--a change in any of the links will be felt throughout the infrastructure.
  • Hosts and applications
      • Application (layout of application objects across multiple file systems or devices, number of spindles made available)
      • File system (type of file system, journaling impact, file-system parameters, direct vs. asynchronous I/O, special access handling such as Quick I/O or Oracle Disk Manager)
      • Volume manager (layout to avoid or minimize spindle contention, stripe sizes, mirror policies, read/write optimization)
      • Operating system setup (SCSI and IP/network parameters, multipathing software)
      • Host bus adapters (fan-in ratios, number of HBAs per host, vendor-recommended settings, binding)

  • Network
      • Switch ports (speed settings, fabric parameters)
      • Fabric (inter-switch links, fabric parameters, Fabric Shortest Path First, oversubscription)

  • On the array
      • Front-end host directors or cards (fan-out ratios, port settings)
      • Cache (read vs. write cache, cache hit ratios, utilization)
      • Back-end disk directors or cards (I/O balance and spreading, disk geometry mismatch, RAID layout and access)
      • Disks (type, speed and size of disks; SATA vs. Fibre Channel; spindle contention)

Virtualization scales beyond the array
Virtualization provides additional flexibility in a storage infrastructure. It allows you to scale beyond a single storage array with seamless data mobility. Data replication and migrations can be performed across multiple storage systems (including heterogeneous environments) using a single interface. It is, however, an evolving technology and one has to pay careful attention while designing a solution. Virtualization standards (such as the Fabric Application Interface Standard) haven't been widely adopted, so a storage team may have to perform provisioning manually using individual point tools. But virtualization is here to stay, and as it matures some of its current limitations will be overcome, allowing the effortless provisioning of virtualized storage.

An array's architectural limitations are a key limiting factor when attempting to scale that array--hitting that wall essentially means that there are no longer any scalability options remaining. Virtualization allows users to mask the limitations and move data onto other storage subsystems without costly downtime. This enables performance and capacity-hungry applications to be satisfied from the virtual storage pool and, conversely, apps whose I/O and capacity requirements are reduced can be scaled down to slower, cheaper storage.

A growing list of vendors tout some type of virtualization in their arrays or other products. Products such as Hewlett-Packard Co.'s StorageWorks XP series, Hitachi Data Systems' TagmaStore Universal Storage Platform and Sun Microsystems Inc.'s StorEdge 6920 provide in-band virtualization in the array itself. That basically means all virtualization of existing storage is performed by the new array in a manner that's totally transparent to the network or host.

There are two types of block-level virtualization--symmetric and asymmetric. Symmetric virtualization can be either embedded or appliance-based. In symmetric virtualization, control and data traffic share the same I/O path, while in asymmetric virtualization, control and data traffic are segregated into separate channels or paths. In the asymmetric case, most control path processing is performed by an appliance or a metadata server that sits outside the data path.

In embedded virtualization, virtualization is managed by an ASIC or array processor that controls a switch, port or its subset. All control and data traffic flows through the port the hosts are plugged into. Appliance-based virtualization devices are designed to enable all communication to pass through a single appliance plugged into the fabric. This appliance is responsible for all virtualization and data migration/mobility functions, with no dependencies on the existing fabric.

In semi-embedded or hybrid virtualization, the control traffic is segregated and routed via either a separate out-of-band network or in-band through IP over Fibre Channel (FC). Data, however, flows in-band via the fabric.

IBM Corp.'s SAN Volume Controller provides symmetric virtualization, while EMC's Invista employs hybrid virtualization--virtualization that's a combination of asymmetric and symmetric approaches. The latter requires intelligent switches to be installed in the SAN. Switch vendors are also catching up on fabric virtualization where the entire abstraction is performed in the network.

Selecting the type of virtualization that suits your environment is a complex process that requires careful evaluation. Symmetric embedded and appliance- or array-based virtualization seem to be the most prevalent, but this doesn't imply that other methods don't merit evaluation.

Scaling the infrastructure
Adding or reassigning resources may appropriately address storage array scaling, but these enhancements can be undone by a poorly designed or badly scaled infrastructure. Your network infrastructure, whether it's an FC SAN, the corporate IP network for NAS or an IP SAN, must have the bandwidth and scalability to allow hosts to take advantage of the storage system.

For example, as the amount of data accessed by a host grows, the number of I/O paths (host bus adapters [HBAs] or network interface cards) used by the host may need to be increased--a change that will have to be accommodated by the network. And as a network grows more complex, there are more elements that require attention when attempting to scale storage. For example, if you're using a single inter-switch link (ISL) vs. an ISL trunk, you need to give careful consideration to oversubscription ratios and related matters. Trunks may need to be monitored for saturation. In addition, increasing storage capacity means that the amount of data to be backed up also increases; if you use IP for backups, you may need to perform GigE aggregation to allow for greater I/O.
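The oversubscription ratio mentioned above can be estimated with simple arithmetic: the aggregate bandwidth of the edge ports that may send traffic across the link, divided by the aggregate bandwidth of the ISL or trunk. The sketch below assumes fourteen 2 Gbps host ports on an edge switch; those figures are illustrative only.

```python
def isl_oversubscription(edge_port_speeds_gbps, isl_speeds_gbps):
    """Ratio of aggregate edge-port bandwidth to aggregate ISL bandwidth."""
    return sum(edge_port_speeds_gbps) / sum(isl_speeds_gbps)


# Assumed edge switch: fourteen host ports at 2 Gbps each.
host_ports = [2] * 14

single_isl = isl_oversubscription(host_ports, [2])        # one 2 Gbps ISL
four_link_trunk = isl_oversubscription(host_ports, [2] * 4)  # four-link trunk

print(f"single ISL: {single_isl:.1f}:1")
print(f"four-link trunk: {four_link_trunk:.1f}:1")
```

With these assumed numbers the single ISL works out to 14:1 and the four-link trunk to 3.5:1. Whether either ratio is acceptable depends on how busy the hosts actually are, which is why trunk saturation still has to be monitored.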

Scaling storage requires a more holistic approach, where the considerations go beyond the storage array itself. For example, let's assume you have a 3TB database with peaks of 40,000 IOPS. You expect a 40% growth rate for both the disk and I/O over the next three or four years. The amount of I/O dedicated to only this database could outgrow most small arrays, making a forklift upgrade almost unavoidable. But even if you're able to upgrade the storage array to handle the higher capacity and performance requirements, will the rest of your infrastructure be able to scale to handle the increased loads?
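The database example can be turned into a quick projection. The article doesn't say whether the 40% figure is annual or cumulative; the sketch below assumes 40% compound annual growth for both capacity and peak I/O, which is only one possible reading.

```python
def project(value, annual_growth, years):
    """Compound growth: value * (1 + annual_growth) ** years."""
    return value * (1 + annual_growth) ** years


START_TB = 3.0        # database size from the example
START_IOPS = 40_000   # peak IOPS from the example
GROWTH = 0.40         # ASSUMPTION: 40% per year, compounded

for year in range(5):
    tb = project(START_TB, GROWTH, year)
    iops = project(START_IOPS, GROWTH, year)
    print(f"year {year}: {tb:5.1f} TB, {iops:9,.0f} peak IOPS")
```

Under that assumption, the peak load grows to roughly 110,000 IOPS after three years and about 154,000 after four, while capacity grows from 3TB to between 8TB and 12TB. Each of those numbers has to be carried not only by the array, but by the host adapters and the network in between.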

It's important to monitor an array's network utilization at all times. Use network monitoring tools to keep a constant watch on how much network traffic is generated by the array. Key network indicators are input and output errors, retransmissions, line loss and saturation, and latency. Make use of both horizontal and vertical spreads to minimize network imbalance. For example, if it's a NAS or IP SAN solution, make sure your array is connected to multiple switches and that the client load is evenly distributed.
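One simple way to check whether load is evenly distributed is to compare the busiest link with the average across all links. The sketch below is a minimal example of that idea; the link names and throughput figures are hypothetical and would normally come from your network monitoring tool.

```python
from statistics import mean


def imbalance(throughput_by_link):
    """Busiest link's throughput divided by the mean across links (1.0 = even)."""
    values = list(throughput_by_link.values())
    return max(values) / mean(values)


# Hypothetical MB/s observed on the array's uplinks to two switches.
links = {
    "switch_a/port1": 90,
    "switch_a/port2": 85,
    "switch_b/port1": 20,
    "switch_b/port2": 25,
}

print(f"imbalance factor: {imbalance(links):.2f}")
```

A factor close to 1.0 suggests an even spread. In this made-up sample, most of the traffic lands on switch A, which would be a cue to rebalance client connections.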

Don't overlook hosts
In addition to the network, how hosts use available storage will have a direct effect on any storage scaling efforts. The way that hosts access storage will largely dictate how applications running on them perform. Some of the host-related design issues that can have a bearing on storage scaling include:

  • The number of I/O paths (HBAs) to the storage ports and the fan-in ratio per path
  • The type of multipathing software (and load-balancing algorithm) used
  • Volume management (volume layout) and file system
  • SCSI stack tuning

The way in which an application's objects use the storage resources assigned to them can also make a huge difference. For example, putting log and data files on separate file systems (and spindles) for an Oracle database is common practice. All of the above parameters also change based on how powerful the host is in terms of its CPU, memory and backplane. Vendors' engineering documents should be checked to see how much I/O a host is capable of driving through each HBA.
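Once you have a per-HBA figure from the vendor's documentation, sizing the number of I/O paths is straightforward arithmetic. The sketch below is illustrative only: the 30,000 IOPS per HBA, the 50% target utilization and the minimum of two paths are assumptions, not vendor data.

```python
import math


def hbas_needed(peak_iops, iops_per_hba, target_utilization=0.5, minimum=2):
    """HBAs required to carry peak_iops with each HBA kept under target_utilization.

    A floor of two paths is assumed so multipathing has a path to fail over to.
    """
    usable_iops = iops_per_hba * target_utilization
    return max(minimum, math.ceil(peak_iops / usable_iops))


# ASSUMPTION: each HBA can drive 30,000 IOPS on this host.
print(hbas_needed(peak_iops=40_000, iops_per_hba=30_000))
```

Rerunning the calculation with projected peak I/O shows when a host will need additional HBAs, and therefore additional switch ports.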

Planning is essential
Detailed planning and design before you invest in a storage infrastructure will make it more likely that future scaling projects will be successful. It's critical to understand and be able to quantify the scalability limits of your storage infrastructure--storage arrays as well as the network.

Adding capacity to an environment may appear to be a simple, linear task, but the consequences can be considerable. A storage array is complex, and its components--disk controllers, cache, processors and front-end controllers--all need to function within vendor-recommended ranges for the array to perform at its best.

