There are many potential locations for storage performance bottlenecks in enterprise data storage, including applications, email, file, and Web or database servers in the I/O
First, make note of errors or availability issues that can manifest themselves as storage I/O performance problems. For example, while a storage system may appear to be slow, a disk drive may have failed or a controller might have taken proactive action to rebuild onto a hot spare disk drive. Another scenario could be that network or I/O path errors are being logged, or that the I/O path has failed over to an alternate adapter and switch path.
The question for storage pros is how to gain insight into where a storage performance bottleneck exists.
Server performance bottlenecks
Server performance bottlenecks include a lack of adequate CPU and memory to initiate or process I/O operations; contention on the PCI, PCI-X or PCIe I/O bus; and adapter configuration. You should also keep application or system software configuration, I/O type and I/O size in perspective.
Workloads with high IOPS typically use small I/O sizes and therefore show relatively low throughput; similarly, high-throughput workloads typically use large I/O sizes and show relatively low IOPS. For example, what might appear to be a performance problem during backup, indicated by a high response time on a backup device, could simply be the result of large sequential transfers. Metrics to consider include bytes or Kbytes read or written, the number of I/O operations and queue depths, along with error counts.
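The trade-off between IOPS and throughput follows from simple arithmetic: throughput is roughly IOPS multiplied by the average I/O size. The sketch below illustrates this; the helper name and the workload numbers are assumptions chosen for illustration, not measurements.

```python
def throughput_mb_per_s(iops: float, io_size_kb: float) -> float:
    """Approximate throughput in MB/s from IOPS and average I/O size in KB."""
    return iops * io_size_kb / 1024.0


# Illustrative workloads (assumed numbers, not measurements).
oltp = throughput_mb_per_s(iops=20000, io_size_kb=4)      # small random I/O
backup = throughput_mb_per_s(iops=400, io_size_kb=1024)   # large sequential I/O

print(f"OLTP-like:   {oltp:.1f} MB/s at 20000 IOPS")
print(f"Backup-like: {backup:.1f} MB/s at 400 IOPS")
```

With these assumed numbers, the backup-like workload moves far more data per second despite issuing a small fraction of the I/O operations, which is why a single metric viewed in isolation can be misleading.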
I/O path and network bottlenecks
Next, consider the number, type, configuration and speed of ports. For example, do ports share a backplane or paths to core switching components, or are they dedicated? Are any errors, retries, dropped packets or frames being reported? Have any ports renegotiated to a lower speed than normal? Have any devices, adapters or ports failed over to an alternate path, or not returned to their normal state after an automated failover?
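The port questions above can be turned into a simple checklist routine. The following is a minimal sketch that assumes port statistics have already been collected from your switches or adapters by some other means; the `PortStatus` record, its field names and the sample values are hypothetical and do not correspond to any vendor's tooling.

```python
from dataclasses import dataclass


@dataclass
class PortStatus:
    """Hypothetical snapshot of one port's state (field names are assumptions)."""
    name: str
    expected_gbps: float
    negotiated_gbps: float
    error_count: int          # errors, retries, dropped packets or frames
    on_alternate_path: bool   # failed over and not yet returned to normal


def flag_suspect_ports(ports):
    """Return a dict mapping port name to the reasons it deserves a closer look."""
    findings = {}
    for p in ports:
        reasons = []
        if p.negotiated_gbps < p.expected_gbps:
            reasons.append("negotiated below expected speed")
        if p.error_count > 0:
            reasons.append("errors or retries reported")
        if p.on_alternate_path:
            reasons.append("running on alternate path")
        if reasons:
            findings[p.name] = reasons
    return findings


# Made-up sample data for illustration only.
sample = [
    PortStatus("port0", 8.0, 8.0, 0, False),
    PortStatus("port1", 8.0, 4.0, 12, False),
    PortStatus("port2", 8.0, 8.0, 0, True),
]
suspects = flag_suspect_ports(sample)
for name, reasons in suspects.items():
    print(name, "->", "; ".join(reasons))
```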
If an I/O or network path is the bottleneck, then upgrade to a faster interface or link. But pay attention to performance bottlenecks that may have moved elsewhere. Likewise, if a network link, path or port is upgraded and no benefit is seen, look elsewhere to see if the bottleneck has moved.
Storage system and device bottlenecks
With storage systems, keep data transfer rates in perspective as part of performance along with cache effectiveness vs. cache hits and cache utilization. Performance metrics include IOPS, throughput, latency or response time with error counts, as well as other incidents noted in logs that could be signs of component failure or pending problems.
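To see why cache hits matter more than raw cache size, it can help to estimate the average response time as a weighted mix of cache and disk latencies. This is a simplified model offered as a sketch; the function names and the latency figures are assumptions, and real systems add queuing and controller overhead that the model ignores.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of I/O requests served from cache."""
    total = hits + misses
    return hits / total if total else 0.0


def effective_latency_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Average latency when hits cost cache_ms and misses cost disk_ms (simplified)."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


# Assumed counters and latencies, for illustration only.
ratio = cache_hit_ratio(hits=9000, misses=1000)
latency = effective_latency_ms(ratio, cache_ms=0.5, disk_ms=8.0)
print(f"hit ratio {ratio:.0%}, estimated average latency {latency:.2f} ms")
```

Under this model, a drop in hit ratio raises average latency sharply even if cache utilization stays high, which is one reason to track hits rather than utilization alone.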
Items that impact performance on storage systems include RAID levels, the number and type of disk drives in a RAID set or volume group, the type of drives and their performance capabilities, as well as host server front-end ports and back-end device ports. Background tasks, including parity and data scrubbing, or disk drive rebuilds, can also impact performance along with controllers that aren't load balanced.
On paper, more disk drives, more controllers, more cache, and more or faster interfaces may seem like the best option. But true performance is determined by how those components work together, as measured by running your actual applications or, failing that, benchmarks or simulated workloads. Consequently, storage with fewer disks, less cache or other perceived inadequacies may actually be faster in terms of IOPS, bandwidth and/or latency.
Just as servers can be upgraded with multiple, faster processor cores and clustered to provide scale-out and scale-up capacity for demanding applications, so too can storage be scaled up and out with clustered and non-clustered solutions.
Non-clustered solutions can be enhanced with newer, faster controllers leveraging optimized processing algorithms and firmware coupled with robust processors and caching functions, as well as through the use of faster front-end (server-side) and back-end (device-side) interfaces.
General tips when dealing with storage performance bottlenecks
- Establish baseline performance indicators during normal time periods
- Compare baseline to performance and other indicators during degraded performance
- Review RAID and storage system configuration for low cost, near-term opportunities
- Faster servers need fast I/O paths, networks and storage systems
- Align tiered storage to meet different performance, availability, capacity and energy needs
- Solid-state drives (SSDs) attached to slow or high latency controllers can introduce bottlenecks
- Look beyond IOPS and bandwidth to keep response time or latency in focus
- Keep availability in perspective as errors or device failures can cause performance issues
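The first two tips, establishing a baseline and comparing against it, can be sketched as follows. The metric names, the sample values and the 20% tolerance are assumptions for illustration; in practice the numbers would come from whatever monitoring tools you already use.

```python
def compare_to_baseline(baseline, current, tolerance=0.20):
    """Return metrics whose relative change from baseline exceeds the tolerance."""
    deviations = {}
    for metric, base in baseline.items():
        if metric not in current or base == 0:
            continue  # skip metrics that cannot be compared
        change = (current[metric] - base) / base
        if abs(change) > tolerance:
            deviations[metric] = round(change, 2)
    return deviations


# Assumed values captured during a normal period and a degraded period.
baseline = {"iops": 5000, "throughput_mb_s": 200, "latency_ms": 4.0}
degraded = {"iops": 4800, "throughput_mb_s": 190, "latency_ms": 9.0}

deviations = compare_to_baseline(baseline, degraded)
for metric, change in deviations.items():
    print(f"{metric}: {change:+.0%} versus baseline")
```

In this made-up case IOPS and throughput look nearly normal while latency has more than doubled, which echoes the tip about looking beyond IOPS and bandwidth.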
This was first published in September 2009