A warning about improving utilization: Avoid introducing instability as a result of your optimization efforts.
This means that improving utilization is a balancing act and requires understanding your environment, applicable technologies and workloads -- including what is normal and what is abnormal. In general, there are five things to consider for improving your SAN utilization:
- Insight and knowledge of your environment (what's normal and abnormal)
- Leverage various tools, topologies and technologies (appropriate configuration)
- Isolate traffic, maximize resource usage, avoid bottlenecks (effective utilization)
- Plan for maintenance and currency of technologies (up-to-date technology)
- Develop and maintain a performance and capacity plan (be proactive)
Gain insight into how your SAN is performing: What's normal? What's not? How is it configured? Who is using what resources and when? Are there seasonal workload spikes? How are backups performing?
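One simple way to put a number on "what's normal" is to keep a baseline of historical measurements, such as per-port throughput, and flag new readings that fall far outside it. The Python sketch below assumes you already collect such samples; the sample values and the three-standard-deviation cutoff are hypothetical. Because a single mean ignores seasonal spikes, you would likely keep separate baselines for periods such as month-end or backup windows.

```python
import statistics


def build_baseline(samples):
    """Return (mean, population standard deviation) of historical samples."""
    return statistics.mean(samples), statistics.pstdev(samples)


def is_abnormal(value, baseline, k=3.0):
    """Flag a reading more than k standard deviations away from the baseline mean."""
    mean, stdev = baseline
    if stdev == 0:
        # A perfectly flat history: any change at all is unusual.
        return value != mean
    return abs(value - mean) > k * stdev


# Hypothetical throughput samples (MB/s) for one storage port during business hours.
history = [180, 195, 210, 190, 205, 200, 185, 215]
baseline = build_baseline(history)
print(is_abnormal(220, baseline))  # False: within normal variation
print(is_abnormal(240, baseline))  # True: worth investigating
```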
Coverage and verification tools that analyze your environment -- reporting on whether items are correctly configured -- can help identify issues before they become problems. Similarly, event correlation and analysis tools are a welcome addition to basic capacity usage reporting. These tools can help identify what really happened during an incident so that you can prevent recurring problems.
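As a small illustration of what a configuration coverage check does, the sketch below tests one common policy: every server should have at least two paths to storage, spread across two separate fabrics. The inventory format, the server and port names, and the policy thresholds are all hypothetical; a real tool would discover this information from the SAN itself and check many more rules.

```python
def verify_multipath(paths_by_server, min_paths=2, min_fabrics=2):
    """Return a list of issues for servers that lack redundant paths to storage.

    paths_by_server maps a server name to a list of (fabric, storage_port) tuples.
    """
    issues = []
    for server, paths in sorted(paths_by_server.items()):
        fabrics = {fabric for fabric, _port in paths}
        if len(paths) < min_paths:
            issues.append(f"{server}: only {len(paths)} path(s), expected {min_paths}")
        if len(fabrics) < min_fabrics:
            issues.append(f"{server}: paths use {len(fabrics)} fabric(s), expected {min_fabrics}")
    return issues


# Hypothetical inventory: two fabrics (A and B) and a dual-controller array.
inventory = {
    "db01": [("A", "ctl0-p1"), ("B", "ctl1-p1")],
    "web01": [("A", "ctl0-p2"), ("A", "ctl1-p2")],  # both paths on fabric A
    "bkp01": [("B", "ctl1-p3")],                    # single path
}
for issue in verify_multipath(inventory):
    print(issue)
```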
It's a good plan to leverage different topologies, including fan-in and fan-out of shared bandwidth ports, as a means of aggregating multiple slower server ports onto a faster storage port. For high-performance, low-latency, time-sensitive applications such as OLTP, you would want to provide a pair (or more) of high-performance ports, either 4 gigabit (Gb) FC or 10 Gb Ethernet for iSCSI or NAS, to meet performance and availability requirements. Note that it's about more than just bandwidth. Maximize the use of high-performance ports by aggregating multiple slower ports onto them and increasing the fan-in ratio -- what some might refer to as oversubscription. The key is to characterize the workload not only in terms of bandwidth, but also in terms of I/Os and response time, and to allocate resources accordingly.
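The fan-in arithmetic can be sketched in a few lines of Python. The nominal ratio compares the servers' combined port speed with the storage port's speed; what matters in practice is whether the servers' measured peak demand fits, and the same check applies to I/O rates as well as bandwidth. The port speeds, demand figures, IOPS limit and the 80 percent headroom target are hypothetical, and nominal link rates ignore protocol encoding overhead.

```python
def fan_in_ratio(server_port_gbps, storage_port_gbps):
    """Nominal fan-in (oversubscription) ratio: total server port speed / storage port speed."""
    return sum(server_port_gbps) / storage_port_gbps


def within_headroom(peak_demands, capacity, headroom=0.8):
    """True if the combined measured peak demand stays under headroom * capacity.

    Works for any additive metric: bandwidth (Gb/s) or I/O rate (IOPS).
    """
    return sum(peak_demands) <= headroom * capacity


# Eight servers with 2 Gb FC ports sharing one 4 Gb FC storage port.
print(fan_in_ratio([2] * 8, 4))  # 4.0, i.e. 4:1

# Measured peak bandwidth per server (Gb/s) is far below each port's line rate.
print(within_headroom([0.4, 0.3, 0.5, 0.2, 0.3, 0.4, 0.2, 0.3], 4))  # True

# The same check for I/O rate against a hypothetical 12,000 IOPS limit for the storage port.
print(within_headroom([3000, 2500, 4000], 12000))  # True
```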
Identify and resolve bottlenecks, including port or ISL congestion and storage device (disk and tape) performance issues. In addition to using faster network interfaces, such as 4 Gb FC or 10 Gb Ethernet, another technology that can be used locally, in a campus setting or in a large building, as well as over distance, is wavelength division multiplexing (WDM). While WDM is normally thought of as a distance technology, WDM -- including dense WDM (DWDM) and coarse WDM (CWDM) -- can also be used over short distances to maximize utilization of fiber optic cabling by carrying multiple channels, each on its own wavelength, over the same fiber.
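Spotting congested ports and ISLs usually starts with utilization samples collected from the switches. The sketch below flags links whose 95th-percentile utilization exceeds a threshold, so a single brief spike does not trigger an alert but sustained load does. The link names, samples, 70 percent threshold and choice of percentile are hypothetical; how you collect the samples depends on your switch vendor's tools.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def congested_links(utilization_by_link, threshold_pct=70.0, pct=95):
    """Return the links whose pct-th percentile utilization exceeds threshold_pct."""
    return sorted(
        link
        for link, samples in utilization_by_link.items()
        if percentile(samples, pct) > threshold_pct
    )


# Hypothetical utilization samples (percent of link capacity).
samples = {
    "isl-1": [20, 35, 40, 30, 25],
    "isl-2": [60, 75, 85, 90, 80],  # sustained heavy load
    "port-7": [10] * 19 + [95],     # one brief spike
}
print(congested_links(samples))  # ['isl-2']
```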
A technique that can be used in some environments to improve SAN performance and utilization involves keeping local traffic local as much as possible. This means attaching servers, when possible, to the same switch as the storage they use. Large-port-count switches and directors can replace multiple smaller switches networked together using inter-switch links (ISLs), improving locality and performance. However, avoid consolidating onto a single large-port-count switch or director, as that would create a single point of failure.
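To gauge how much traffic could benefit from better locality, you can estimate what share of it crosses ISLs: in a fabric built from ISL-connected switches, any flow whose server and storage port attach to different switches must traverse at least one ISL. The sketch below assumes you know which switch each server and storage port is attached to and have per-flow throughput figures; all names and numbers are hypothetical.

```python
def isl_traffic_share(flows, switch_of):
    """Fraction of total traffic between servers and storage ports on different switches.

    flows: iterable of (server, storage_port, mb_per_s) tuples
    switch_of: maps each server and storage port name to its attached switch
    """
    total = remote = 0.0
    for server, storage_port, rate in flows:
        total += rate
        if switch_of[server] != switch_of[storage_port]:
            remote += rate  # this flow must cross at least one ISL
    return remote / total if total else 0.0


switch_of = {"db01": "sw1", "web01": "sw2", "array-p1": "sw1", "array-p2": "sw1"}
flows = [("db01", "array-p1", 300), ("web01", "array-p2", 100)]
print(isl_traffic_share(flows, switch_of))  # 0.25

# Re-cabling web01 to sw1, the switch its storage is on, keeps all of this traffic local.
print(isl_traffic_share(flows, dict(switch_of, web01="sw1")))  # 0.0
```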
SAN segmentation, which uses a SAN router to physically interconnect separate logical fabrics, can also help isolate traffic into SAN sub-networks (also known as logical SANs or virtual SANs). Segmentation works in SAN environments much as it does in LAN networking: it isolates traffic from different parts of a network to reduce or contain the amount of traffic that traverses a large network (local or remote).
Keep your technologies, including hardware, software and networks, up to date with software and firmware revisions per the manufacturer's recommendations. Before deployment, test and simulate under workload conditions as close to your environment's real-world conditions as possible. Also, use change control management techniques along with change and configuration coverage analysis tools to reduce or eliminate errors due to incorrect configuration.
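Checking firmware currency is easy to automate once you have an inventory of installed levels and the manufacturer-recommended level for each model. The sketch below compares dotted numeric versions as tuples of integers, so that "2.10" correctly sorts after "2.9", which a plain string comparison gets wrong. The device names, models and versions are hypothetical, and real vendor version strings often include letters or build tags that this simple parser does not handle.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '7.4.2' into a comparable tuple (7, 4, 2)."""
    return tuple(int(part) for part in text.split("."))


def firmware_report(installed, recommended):
    """Classify each device as 'current', 'outdated' or 'unknown'.

    installed: device name -> (model, installed version)
    recommended: model -> manufacturer-recommended version
    """
    report = {}
    for device, (model, version) in installed.items():
        if model not in recommended:
            report[device] = "unknown"  # no recommendation on file for this model
        elif parse_version(version) < parse_version(recommended[model]):
            report[device] = "outdated"
        else:
            report[device] = "current"
    return report


installed = {
    "sw1": ("switch-x", "7.4.2"),
    "sw2": ("switch-x", "7.3.9"),
    "hba-db01": ("hba-y", "2.10"),
    "tape1": ("lib-z", "1.0"),
}
recommended = {"switch-x": "7.4.2", "hba-y": "2.9"}
print(firmware_report(installed, recommended))
```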
If you do not already have a performance and capacity plan in effect, now is a good time to put one together. If you already have one, does it just address how much storage capacity you are using and will need in the future? Does it also consider storage activity (I/O and bandwidth) along with response time and availability? If you are not sure how to put a performance and capacity plan together, check out some of the expert advice on SearchStorage.com or Chapter 10 ("Storage Capacity Planning") in my book "Resilient Storage Networks" (Elsevier).
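A capacity plan often boils down to a question like "at our current growth rate, when do we hit the limit?" The sketch below answers it under a simple compound-growth assumption, and the same function works for storage capacity, I/O rate or bandwidth. The growth rates, limits and 80 percent planning threshold are hypothetical; real plans should also account for seasonal workload spikes.

```python
import math


def months_until_threshold(current, limit, monthly_growth, threshold=0.8):
    """Months until usage, growing at a compound monthly rate, reaches threshold * limit.

    Returns 0 if usage is already at or past the threshold,
    and None if usage is not growing.
    """
    target = threshold * limit
    if current >= target:
        return 0
    if monthly_growth <= 0:
        return None
    # Solve current * (1 + g) ** n >= target for the smallest whole n.
    return math.ceil(math.log(target / current) / math.log(1 + monthly_growth))


# 40 TB used of 100 TB, growing 3% per month: about two years of runway.
print(months_until_threshold(40, 100, 0.03))  # 24

# 6,000 of a 10,000 IOPS limit, growing 5% per month: much less runway.
print(months_until_threshold(6000, 10000, 0.05))  # 6
```

The second case illustrates why a plan that only tracks capacity can miss the resource that actually runs out first.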
Last, but not least in importance, keep availability in mind, as there is a direct relationship between availability and performance. If you don't have availability, how can you have performance and utilization? Likewise, bottlenecks can result in system instability and downtime. Learn more about I/O performance bottlenecks in the StorageIO group white paper "Data Center I/O Performance Issues and Impacts."
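The effect of configuration on availability can be quantified with standard reliability arithmetic: components in series (all must work) multiply their availabilities, while redundant components in parallel fail only if all of them fail. The sketch below assumes independent failures, and the per-component availability figures are hypothetical; it shows why a single path, or a single large director, caps availability while a second, independent path raises it dramatically.

```python
from math import prod


def series(*availabilities):
    """Availability of a chain where every component must be up (e.g. HBA, switch, array port)."""
    return prod(availabilities)


def parallel(*availabilities):
    """Availability of redundant components where any one being up is enough.

    Assumes failures are independent of each other.
    """
    return 1 - prod(1 - a for a in availabilities)


# One path: HBA -> switch -> array port, with hypothetical availabilities.
single_path = series(0.999, 0.9995, 0.999)
# Two independent, identical paths through separate switches.
dual_path = parallel(single_path, single_path)
print(f"single path: {single_path:.6f}")  # 0.997502
print(f"dual path:   {dual_path:.6f}")    # 0.999994
```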
About the author: Greg Schulz is founder and senior analyst with the IT infrastructure analyst and consulting firm StorageIO. Greg is also the author and illustrator of "Resilient Storage Networks" (Elsevier) and has contributed material to "Storage" magazine and other TechTarget venues.