Don't let capacity concerns or virtualized servers bog down the performance of your storage systems. Here are 10 ways to pump up the performance of your storage arrays and networks.
Given the choice between fine-tuning data storage for capacity or for performance, most data storage managers would choose the latter. Tips and tricks to boost storage speed are common, but they're not all equally effective in every environment. A variety of products and technologies do have great potential for many shops, from optimizing server-side access to improving the storage-area network (SAN). We'll look at some effective, but often overlooked, methods to speed up storage system performance.
Networked storage is incredibly complex, requiring a diverse set of hardware and software elements to interoperate smoothly. Not surprisingly, one of the most common causes of slow storage performance is the misconfiguration or actual failure of one or more of these components. Therefore, the first place to look for better performance is in the existing storage I/O stack.
Check server and storage array logs for signs of physical faults; I/O retries, path movement and timeouts along a functional link are sure signs. Try to isolate the failing element, but start with cable-related components. Flaky transceivers and cables are common, and can seriously impact performance while still letting things run well enough to go unnoticed. These items often fail after being physically disturbed, so be especially vigilant after installation, migration or removal of data center equipment.
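As a rough illustration of that log review, the Python sketch below counts fault-like messages per device. The sample lines, the "dev=" naming convention and the keyword list are all invented for this example; real messages vary by operating system, HBA driver and array, so treat the patterns as placeholders to adapt.

```python
import re
from collections import Counter

# Hypothetical keywords that often accompany link-level trouble.
# Real log wording differs by OS, driver and array vendor.
FAULT_PATTERN = re.compile(
    r"retry|timeout|timed out|path (?:down|failover)", re.IGNORECASE
)
# Assumed convention for this sketch: the device name follows "dev=".
DEVICE_PATTERN = re.compile(r"dev=(\S+)")


def count_faults(lines):
    """Return a Counter of fault-like messages keyed by device name."""
    faults = Counter()
    for line in lines:
        if FAULT_PATTERN.search(line):
            match = DEVICE_PATTERN.search(line)
            device = match.group(1) if match else "unknown"
            faults[device] += 1
    return faults


# Invented sample lines, for illustration only.
sample_log = [
    "10:01:02 dev=hba0 I/O retry on target 3",
    "10:01:05 dev=hba0 command timeout, resetting link",
    "10:02:11 dev=hba1 link up",
    "10:03:40 dev=hba0 path failover to alternate port",
]

# List the noisiest devices first; those are the first candidates
# for a cable or transceiver check.
for device, count in count_faults(sample_log).most_common():
    print(device, count)
```

A device that dominates the tally while its neighbors stay quiet is a reasonable place to start swapping cables and transceivers.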
1. Freshen firmware and drivers
Manufacturers are constantly fixing bugs, and new capabilities can sneak in with software updates. It's wise to stay on top of driver and firmware updates for all components in the storage network, with scheduled and proactive testing, tuning and upgrading. Microsoft Corp. and VMware Inc. have been actively adding new performance features to the storage stacks in Windows and vSphere, often without much fanfare. SMB 2.0 and 2.1, for example, dramatically accelerated Windows file sharing, especially over slower networks. Updates to NTFS and VMFS have also routinely improved performance and scalability. Stay tuned to storage blogs and publications to keep on top of these developments.
But you should note that not all updates are worth the time and effort, and some can be downright perilous. Make sure your configuration is supported by all vendors involved and has been thoroughly tested, and never use beta code in production. As a systems administrator, I tend to be conservative about what I roll out, waiting for reports from others before taking the plunge myself.
2. Question the queries
Most of our tips focus on locating and eliminating bottlenecks in the storage stack, but one should also consider reducing the I/O load before it's created. Working with database administrators (DBAs) to tune their queries for efficiency and performance can pay off big time, since a reduced I/O workload benefits everyone and every application.
3. Break down backup bottlenecks
Traditional backup applications are extremely taxing on storage resources, dumping massive volumes of data according to a daily and weekly schedule. Improving the performance of backups so they can fit within their assigned "window" has become a priority for data protection pros, but the techniques employed can help improve overall data storage performance as well.
One effective method to reduce the backup crunch is to spread it out using continuous data protection (CDP) technology. Built into many products intended for virtual servers, CDP continually copies data from a server rather than collecting it in a single, concentrated operation. This is especially valuable in virtual machine (VM) environments because the nightly backup "kick off" across multiple guests can crush storage responsiveness, from the bus to the host bus adapter (HBA) or network interface card (NIC) to the array. Microsoft and VMware also have technologies to offload backup-related snapshots to storage arrays that are better able to handle data movement.
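A back-of-the-envelope calculation shows why spreading the work helps. The figures below (500 GB of daily change and a four-hour window) are assumptions chosen for illustration, and the sketch only counts changed data; a traditional backup that recopies unchanged files would look even worse.

```python
def average_rate_mb_per_s(changed_gb, hours):
    """Average throughput (MB/s) needed to move changed_gb within the given hours."""
    return changed_gb * 1024 / (hours * 3600)


daily_change_gb = 500  # assumed amount of data changed per day
window_hours = 4       # assumed nightly backup window

nightly = average_rate_mb_per_s(daily_change_gb, window_hours)
continuous = average_rate_mb_per_s(daily_change_gb, 24)

print(f"nightly window:       {nightly:.1f} MB/s")
print(f"spread over 24 hours: {continuous:.1f} MB/s")
print(f"peak load reduced by a factor of {nightly / continuous:.0f}")
```

Under these assumptions the same data moves at one-sixth of the sustained rate, which leaves far more headroom for production I/O.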
4. Offload virtual machine I/O with VAAI
The release of VMware vSphere 4.1 included many new features, but one of the most important was the vStorage API for Array Integration (VAAI). This new interface allows VMware ESX to coordinate certain I/O tasks with supported Fibre Channel (FC) or iSCSI storage systems, integrating the hypervisor and array to work more closely and effectively together.
VAAI includes three "primitives," or integration points:
- Unused storage can be released for thin provisioning using the efficient "write_same" SCSI command, increasing capacity utilization and reducing I/O overhead.
- Snapshot and mirroring operations can be offloaded to the storage array, greatly reducing the network, hypervisor and operating system I/O workload.
- Access locking can take place at a level more granular than the whole LUN, reducing contention and wait time for virtual machines.
Although none of these screams "storage performance," the net effect can be a dramatic reduction in the I/O workload of the hypervisor as well as less traffic over the SAN. Analysts expect further improvements (including NFS support) in future versions of VMware vSphere, and one imagines that Microsoft is working on similar integration features for Hyper-V.
5. Balance virtual machine I/O with SIOC
While not a performance acceleration technology per se, VMware vSphere Storage I/O Control (SIOC) is a "quality of service" mechanism that makes I/O performance more predictable. SIOC monitors the response latency of VMFS datastores and acts to throttle back the I/O of lower-priority machines to maintain the performance of others. In practice, SIOC reduces the impact of "noisy neighbors" on production virtual machines, improving their responsiveness. This helps keep application developers and managers happy, bringing the appearance of better performance even though total throughput remains the same.
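The idea can be pictured with a toy model. The sketch below is not VMware's algorithm; the share values, queue depth and latency threshold are all assumed numbers, and the function simply divides a fixed pool of queue slots in proportion to each machine's shares once observed latency crosses the threshold.

```python
def allocate_queue_slots(shares, total_slots, latency_ms, threshold_ms):
    """Toy model of latency-triggered throttling.

    Below the threshold nobody is throttled. Above it, queue slots are
    divided in proportion to each VM's shares (at least one slot each).
    """
    if latency_ms <= threshold_ms:
        # No congestion: every VM may use the full queue.
        return {vm: total_slots for vm in shares}
    total_shares = sum(shares.values())
    return {
        vm: max(1, total_slots * vm_shares // total_shares)
        for vm, vm_shares in shares.items()
    }


# Hypothetical VMs and share values.
vm_shares = {"prod-db": 2000, "test-vm": 500, "batch": 500}

print(allocate_queue_slots(vm_shares, total_slots=32, latency_ms=12, threshold_ms=30))
print(allocate_queue_slots(vm_shares, total_slots=32, latency_ms=45, threshold_ms=30))
```

In the congested case the high-share machine keeps most of the queue while the "noisy neighbors" are held back, which matches the predictability argument: total throughput does not rise, but the important workload stops waiting behind the others.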
What doesn't work
Besides looking at how to ratchet up storage network performance, we also need to consider some not-so-effective approaches to improving performance. Testing can reveal interesting outcomes: Enabling jumbo frames on Ethernet networks, for example, typically hasn't yielded much of a performance benefit.
One common question relates to the merits of various storage protocols and the common belief that Fibre Channel is inherently faster than iSCSI, NFS or SMB. This isn't the case generally, although implementations and configurations vary. Similar architectures produce similar levels of performance regardless of protocol.
One should also be cautious about employing "bare-metal" technologies in virtual environments, including paravirtualized drivers, direct I/O like VMDirectPath and raw device mapping (RDM). None delivers much performance improvement and all interfere with desirable features like VMotion.
6. Streamline the server side
Today's multicore servers have CPU power to spare, but network interface cards (NICs) and HBAs have traditionally been locked to a single processor core. Receive-side scaling (RSS) allows these interface cards to distribute processing across multiple cores, accelerating performance.
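The principle behind RSS can be sketched in a few lines: hash the fields that identify a flow, then use the result to choose a core, so packets from one connection always land on the same core while different connections spread out. Real adapters do this in hardware with their own hash function and a lookup table; the CRC32 hash, addresses and port numbers below are stand-ins chosen only to keep the example runnable.

```python
import zlib


def pick_core(src_ip, src_port, dst_ip, dst_port, num_cores):
    """Map a flow's 4-tuple to a core index.

    CRC32 is a stand-in for the adapter's hardware hash (an assumption
    made for this sketch, not what any particular NIC uses).
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_cores


# Hypothetical iSCSI sessions from one host to one array port.
for port in range(50000, 50008):
    core = pick_core("10.0.0.5", port, "10.0.0.20", 3260, num_cores=4)
    print(f"source port {port} -> core {core}")
```

Because the mapping is deterministic per flow, packet ordering within a connection is preserved even though the overall load is shared.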
Hypervisors face another task when it comes to sorting I/O and directing it to the correct virtual machine guest, and this is where Intel Corp.'s virtual machine device queues (VMDq) technology steps in. VMDq allows the Ethernet adapter to communicate with hypervisors like Microsoft Hyper-V and VMware ESX, grouping packets according to the guest virtual machine they're destined for.
Technologies like RSS and VMDq help accelerate I/O traffic in demanding server virtualization applications, delivering amazing levels of performance. By leveraging these technologies, Microsoft and VMware have demonstrated the appropriateness of placing demanding production workloads on virtual machines.
7. Get active multipathing
Setting up multiple paths between servers and storage systems is a traditional approach for high availability, but advanced active implementations can improve performance as well.
Basic multipathing software merely provides for failover, bringing up an alternative path in the event of a loss of connectivity. So-called "dual-active" configurations assign different workloads to each link, improving utilization but restricting each connection to a single path. Some storage arrays support trunking multiple connections together or a full active-active configuration, where links are aggregated and the full potential can be realized.
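The difference between failover-only and active multipathing comes down to how the next path is chosen for each I/O. The following sketch contrasts the two policies in simplified form; the class names and path labels are hypothetical, and production multipathing software also handles path health probing, queue depth and array-specific behavior that is omitted here.

```python
class FailoverSelector:
    """Always use the first healthy path; others sit idle until needed."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.healthy = set(self.paths)

    def mark_failed(self, path):
        self.healthy.discard(path)

    def next_path(self):
        for path in self.paths:
            if path in self.healthy:
                return path
        raise RuntimeError("no healthy paths")


class RoundRobinSelector(FailoverSelector):
    """Rotate across all healthy paths so every link carries traffic."""

    def __init__(self, paths):
        super().__init__(paths)
        self._index = 0

    def next_path(self):
        # Try each path at most once, starting from the rotation point.
        for _ in range(len(self.paths)):
            path = self.paths[self._index]
            self._index = (self._index + 1) % len(self.paths)
            if path in self.healthy:
                return path
        raise RuntimeError("no healthy paths")


failover = FailoverSelector(["path-A", "path-B"])
active = RoundRobinSelector(["path-A", "path-B"])

print([failover.next_path() for _ in range(4)])  # one link does all the work
print([active.next_path() for _ in range(4)])    # both links share the work

active.mark_failed("path-A")
print([active.next_path() for _ in range(2)])    # survivor takes over
```

Round-robin is only the simplest active policy; the point is that both links are used during normal operation while failover behavior is retained.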
Modern multipathing frameworks like Microsoft MPIO, Symantec Dynamic Multi-Pathing (DMP) and VMware's Pluggable Storage Architecture (PSA) use storage array-specific plug-ins to enable this sort of active multipathing. Ask your storage vendor if a plug-in is available, but don't be surprised if it costs extra or requires a special enterprise license.
8. Deploy 8 Gbps Fibre Channel
Fibre Channel throughput has continually doubled since the first 1 Gbps FC products appeared, yet backwards compatibility and interoperability have been maintained along the way. Upgrading to 8 Gbps FC is a simple way to accelerate storage I/O, and can be remarkably affordable: today, 8 Gbps FC switches and HBAs are widely available and priced approximately the same as common 4 Gbps parts. As SANs are expanded and new servers and storage arrays are purchased, buying 8 Gbps FC gear instead of 4 Gbps is a no-brainer; and 16 Gbps FC equipment is on the way.
Remember that throughput (usually expressed as megabytes per second) isn't the only metric of data storage performance; latency is just as critical. Often experienced in terms of I/O operations per second (IOPS) or response time (measured in milliseconds or microseconds), latency is the time it takes to process an individual I/O request, and it has become critical in virtualized server environments. Stacking multiple virtual servers together behind a single I/O interface requires quick processing of packets, not just the ability to stream large amounts of sequential data.
Each doubling of Fibre Channel throughput also halves the amount of time an I/O operation spends on the wire. Therefore, 8 Gbps FC isn't just twice as fast in terms of megabytes per second; the link can also carry twice as many I/O requests as 4 Gbps, which is a real boon for server virtualization.
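The arithmetic is easy to check. The sketch below uses the nominal per-direction data rates of 400 MB/s for 4 Gbps FC and 800 MB/s for 8 Gbps FC, and an assumed 8 KB I/O size. It covers transmission time on the link only; time spent in the HBA, switch and array is not modeled and does not shrink with link speed.

```python
def wire_time_us(io_bytes, link_mb_per_s):
    """Microseconds needed to transmit io_bytes at link_mb_per_s.

    With 1 MB taken as 10**6 bytes, bytes / (MB/s) is already in microseconds.
    """
    return io_bytes / link_mb_per_s


io_size = 8192  # assumed 8 KB I/O request

for name, rate in (("4 Gbps FC", 400), ("8 Gbps FC", 800)):
    print(f"{name}: {wire_time_us(io_size, rate):.2f} microseconds on the wire")
```

Since wire time is usually a small slice of total response time, the practical benefit shows up mostly as more I/O requests fitting through a shared link rather than as a halving of end-to-end latency.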
9. Employ 10 Gbps Ethernet (10 GbE)
Fibre Channel isn't alone in cranking up its speed. Ethernet performance has recently jumped by a factor of 10, with 10 Gbps Ethernet becoming increasingly common and affordable, but 10 GbE storage array availability lags somewhat behind NICs and switches. Environments using iSCSI or NAS protocols like SMB and NFS can experience massive performance improvements by moving to 10 Gbps Ethernet, provided such a network can be deployed.
An alternative to end-to-end 10 Gb Ethernet is trunking or bonding 1 Gbps Ethernet links using the Link Aggregation Control Protocol (LACP). In this way, one can create multigigabit Ethernet connections to the host, between switches or to arrays that haven't yet been upgraded to 10 GbE. This helps address the "Goldilocks problem" where Gigabit Ethernet is too slow but 10 Gbps Ethernet isn't yet attainable.
Fibre Channel over Ethernet (FCoE) brings together the Fibre Channel and Ethernet worlds and promises better performance and greater flexibility. Although one would assume that the 10 Gbps Ethernet links used by FCoE would be 25% faster than 8 Gbps FC, the difference in throughput is an impressive 50% or so, thanks to a more efficient encoding method: 10 Gbps Ethernet uses 64b/66b encoding, while 8 Gbps FC uses 8b/10b. FCoE also promises reduced I/O latency, though this is mitigated when a bridge is used to a traditional Fibre Channel SAN or storage array. In the long term, FCoE will improve performance, and some environments are ready for it today.
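The encoding effect can be worked out directly. The sketch below assumes the standard signaling rates (8.5 Gbaud for 8 Gbps FC with 8b/10b encoding, 10.3125 Gbaud for 10 Gbps Ethernet with 64b/66b encoding) and ignores frame headers and other protocol overhead, so the result is an upper bound on the gap rather than a measured figure.

```python
def data_rate_gbps(line_rate_gbaud, data_bits, coded_bits):
    """Usable bit rate after line encoding (data_bits carried per coded_bits sent)."""
    return line_rate_gbaud * data_bits / coded_bits


fc_8g = data_rate_gbps(8.5, 8, 10)          # 8 Gbps FC, 8b/10b encoding
eth_10g = data_rate_gbps(10.3125, 64, 66)   # 10 GbE, 64b/66b encoding

advantage_pct = (eth_10g / fc_8g - 1) * 100

print(f"8 Gbps FC usable rate: {fc_8g:.1f} Gbps")
print(f"10 GbE usable rate:    {eth_10g:.1f} Gbps")
print(f"10 GbE advantage:      {advantage_pct:.0f}%")
```

8b/10b spends 20% of the line rate on encoding while 64b/66b spends about 3%, which is where most of the extra gap comes from.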
10. Add cache
The quickest I/O request is one that's never issued, but as a means of speeding things up, caching is a close second. Caches are appearing throughout the I/O chain, promising improved responsiveness by storing frequently requested information for later use. This is hardly a new technique, but interest has intensified with the advent of affordable NAND flash memory capacity.
There are essentially three types of cache offered today:
- Host-side caches place NVRAM or NAND flash in the server, often on a high-performance PCI Express card. These keep I/O off the network but are only useful on a server-by-server basis.
- Caching appliances sit in the network, reducing the load on the storage array. These serve multiple hosts but introduce concerns about availability and data consistency in the event of an outage.
- Storage array-based caches and tiered storage solutions are also common, including NetApp's Flash Cache cards (formerly called Performance Acceleration Module or PAM), EMC's Fully Automated Storage Tiering (FAST) and Hitachi Data Systems' Dynamic Tiering (DT).
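Whatever its location, a cache pays off in proportion to its hit ratio. The sketch below simulates a small least-recently-used (LRU) cache, one common eviction policy and not necessarily the one any of the products above uses, then estimates average read latency. The access pattern, the three-block capacity and the 0.1 ms and 5 ms service times are assumptions picked for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used block cache that tracks hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            # Hit: mark the block as most recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return
        # Miss: fetch from backing storage, evicting the oldest block if full.
        self.misses += 1
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


# Assumed workload: two hot blocks (1 and 2) mixed with one-off reads.
workload = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2, 6]

cache = LRUCache(capacity=3)
for block in workload:
    cache.read(block)

hit_ratio = cache.hits / len(workload)
cache_ms, disk_ms = 0.1, 5.0  # assumed service times
effective_ms = hit_ratio * cache_ms + (1 - hit_ratio) * disk_ms

print(f"hit ratio: {hit_ratio:.0%}")
print(f"average read latency: {effective_ms:.2f} ms (vs {disk_ms:.1f} ms uncached)")
```

Even a tiny cache helps when a few blocks are read repeatedly; with a workload that has no such locality, the hit ratio and the benefit both fall toward zero.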
Still no silver bullet
There are many options for improving storage performance, but there's still no single silver bullet. Although storage vendors are quick to claim that their latest innovations (from tiered storage to FCoE) will solve data storage performance issues, none is foolish enough to focus on just one area. The most effective performance improvement strategy starts with an analysis of the bottlenecks found in existing systems and ends with a plan to address them.
BIO: Stephen Foskett is an independent consultant and author specializing in enterprise storage and cloud computing. He is responsible for Gestalt IT, a community of independent IT thought leaders, and organizes their Tech Field Day events. He can be found online at GestaltIT.com, FoskettS.net and on Twitter at @SFoskett.