This is the fourth part of a five-part series on enterprise data storage management tools. In the third part, we looked at how thin provisioning and virtualization aid data storage configuration.
Enterprise data storage performance monitoring and troubleshooting may be as much art as science, yet there are plenty of tools that aim to help administrators fine-tune storage and pinpoint storage bottlenecks before their applications slow to a crawl.
The popularity of virtual server technology has heightened the importance of storage performance tools, especially products that can monitor the oft-changing environment from the host servers to the network and the storage arrays.
"If you're only looking at performance in one area and ignoring the other two, then that's a problem," said Bob Laliberte, an analyst at Enterprise Strategy Group. "For performance, we're not just looking at the storage but how that also is impacted along that whole data path."
Even if storage isn't the cause of a performance bottleneck, accusatory fingers often point in that direction, so it helps to have one or more tools that seek to identify the problem or determine its root cause. A growing number of tools provide alerts when a troublesome level of memory or CPU usage is reached or a policy is violated. Some of the more advanced offerings can even move data.
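As a rough illustration of the threshold-style alerting described above, the sketch below compares one sample of usage metrics against configured limits. The metric names (cpu_pct, mem_pct) and the limits are assumptions chosen for the example and are not taken from any of the products named in this article.

```python
def check_thresholds(sample, thresholds):
    """Return an alert message for every metric at or above its limit.

    `sample` and `thresholds` are plain dicts keyed by hypothetical
    metric names; metrics missing from the sample are skipped.
    """
    alerts = []
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is not None and value >= limit:
            alerts.append(f"{metric} at {value} (threshold {limit})")
    return alerts


if __name__ == "__main__":
    # Illustrative values only.
    sample = {"cpu_pct": 92.0, "mem_pct": 40.0}
    limits = {"cpu_pct": 85.0, "mem_pct": 90.0}
    for message in check_thresholds(sample, limits):
        print("ALERT:", message)
```

A commercial tool would typically layer policies, alert suppression and notification routing on top of a check like this.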
Tools that specialize in performance monitoring include Akorri Network Inc.'s BalancePoint, Tek-Tools Inc.'s Profiler and Virtual Instruments Corp.'s NetWisdom. The general trend toward agent-free software is reducing the installation and maintenance burden on IT. NetWisdom stands out with its hardware-based taps and probes for collecting information.
But Ryan Perkowski, manager of storage operations at a large financial institution that he asked not be named, said NetWisdom's tap- and probe-based system is far less burdensome than agents, since firmware updates are infrequent. Also, the taps and probes operate out of band and don't eat up precious server CPU or RAM, he noted.
Perkowski said he was hesitant to set his NetApp Inc. SANscreen tool to collect information any faster than once every 15 minutes for fear of bogging down a switch with too many questions. Because NetWisdom probes out of band, the storage team can pound the devices with questions without affecting performance, he said.
"The servers are unaware that they're being eavesdropped on," Perkowski said.
His team installed 96 hardware taps, which he likened to cable TV splitters, and 24 appliance-like probes on its most critical databases. Perkowski said he monitors 3,135 server-to-LUN conversations and can get up to 86 metrics, including performance data, on each one on a second-by-second basis.
"I never look at that amount of data, but it's always at your fingertips," he said, adding that he can non-disruptively shuffle a probe from one trouble spot to another, as necessary, and set the system to alert him of any troubling metrics.
More commonly, when an application's performance degrades, an IT organization starts its troubleshooting quest with built-in application tools, such as those in Microsoft Corp.'s Exchange Server and SQL Server or Oracle Corp.'s database servers. An administrator might then move on to operating system utilities such as Perfmon for Windows and iostat or sar in a Unix or Linux environment.
Users of storage resource management (SRM) software consult the tool's performance monitoring features. Prominent products include CA Inc.'s Storage Resource Manager, EMC's Ionix ControlCenter, HP's Storage Essentials, IBM Corp.'s Tivoli Storage Productivity Center (TPC), NetApp Inc.'s SANscreen and Symantec Corp.'s Veritas CommandCentral Storage.
Some SRM tools have special modules devoted to performance. EMC, for instance, sells Performance Manager for full SAN monitoring and Symmetrix Performance Analyzer to zero in on Symmetrix environments.
If the SRM tools don't provide answers, administrators can drill down into a storage device's element manager, such as EMC Corp.'s Symmetrix Management Console or Hewlett-Packard Co.'s EVA Command View. Some IT departments opt for homegrown tools, writing scripts to collect inventory and performance data.
There are also SAN monitoring tools, such as Brocade Communications Systems Inc.'s Data Center Fabric Manager, Dell Inc.'s EqualLogic SAN HQ and NetApp Inc.'s SANscreen.
The key metrics that performance tools track include response time, queue depth, average disk queue length, average I/O size in kilobytes, IOPS (reads and writes; random and sequential; average of overall IOPS), throughput in megabytes per second, write percentage vs. read percentage and capacity (free, used and reserve).
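Several of these metrics can be derived from two readings of cumulative I/O counters taken some interval apart. The sketch below shows the arithmetic under stated assumptions: the CounterSample layout is hypothetical, a megabyte is taken as 1,000,000 bytes and a kilobyte as 1,000 bytes, and no particular vendor's tool is claimed to compute its figures this way.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Cumulative I/O counters at one instant (hypothetical layout)."""
    timestamp_s: float
    reads: int
    writes: int
    bytes_read: int
    bytes_written: int


def derive_metrics(prev: CounterSample, curr: CounterSample) -> dict:
    """Derive rate metrics from the difference between two samples."""
    interval = curr.timestamp_s - prev.timestamp_s
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")

    reads = curr.reads - prev.reads
    writes = curr.writes - prev.writes
    total_ops = reads + writes
    total_bytes = (curr.bytes_read - prev.bytes_read) + (
        curr.bytes_written - prev.bytes_written
    )

    return {
        "iops": total_ops / interval,
        "read_iops": reads / interval,
        "write_iops": writes / interval,
        # Decimal units assumed: 1 MB = 1,000,000 bytes, 1 KB = 1,000 bytes.
        "throughput_mb_s": total_bytes / interval / 1_000_000,
        "avg_io_size_kb": total_bytes / total_ops / 1000 if total_ops else 0.0,
        "read_pct": 100.0 * reads / total_ops if total_ops else 0.0,
        "write_pct": 100.0 * writes / total_ops if total_ops else 0.0,
    }


if __name__ == "__main__":
    # Illustrative counters, 10 seconds apart.
    before = CounterSample(0.0, 0, 0, 0, 0)
    after = CounterSample(10.0, 3000, 1000, 24_000_000, 8_000_000)
    for name, value in derive_metrics(before, after).items():
        print(f"{name}: {value:.1f}")
```

Response time and queue depth need per-request timing or queue sampling, so they are left out of this counter-based sketch.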
"Much of the performance tools have a capacity element to them," said Jeff Boles, a senior analyst at Taneja Group, "so you can see your capacity trending over time as well."
Some IT organizations cobble together a mix of commercial and free or homegrown tools to get a more complete picture. Ed Delgado, the storage architect at RiskMetrics Group Inc., said his primary monitoring tool, Tek-Tools Profiler, collects metrics on both performance and capacity. He also has free tools from Nagios Enterprises LLC and OpenNMS Group for file-system monitoring.
The final part of the series looks at disaster recovery monitoring.
This was first published in October 2009