Published: 14 Oct 2002
With the advent of storage area networks (SANs) and higher bandwidths, data analysis tools are struggling to keep pace with the sheer magnitude of data. Operations are taking longer and hardware costs are increasing. The reason? Huge bandwidth increases and the growing complexities of storage networks.
Just a few years ago, bandwidth was measured in MB/s. Today the norm is 2Gb/s, with higher bandwidths on the way. Another factor is the inherent complexity of storage networks, which emerged when Fibre Channel (FC) SANs arrived. Devices are farther apart and are connected through many ports, making problems harder to find. As a result, storage analysis tool vendors are scrambling to kick their products up a notch or two.
In the storage industry, analysis tools exist for developing and maintaining storage devices and networks. The analyzer plugs into a network and records the traffic. Users can view this traffic - known as a trace - in a decoded, understandable format to monitor and research link activity. Analyzers offer many features for examining traces such as searching, traffic filtering, code violation checking and triggering on a specific activity. They're also slowly emerging as invaluable tools for IT staffers to optimize performance and resolve problems.
With the move to FC and serial communications, analyzers have had to become more robust: onboard high-speed memory, disk capacity, CPU speeds and port counts have all increased dramatically. Five years ago, a typical SCSI bus analyzer contained 5MB of memory and a 20MB hard drive. Today, a 2Gb/s FC analyzer can have up to 4GB of memory and a 60GB hard drive. That is roughly an 800-fold increase in memory alone.
One reason analyzers have been forced to bulk up can be explained as a simple equation: Link data volume increases proportionately to bandwidth over a given period of time. In other words, increasing link speed means more data passes through the network, so there's more data to store and analyze. Another factor is the growing complexity of storage networks. With bus protocols such as parallel SCSI, problems were confined to short distances and a few devices, making traces relatively small and issues easier to locate. Now, with serial storage network protocols such as FC SANs and the future iSCSI SANs, the distances are great and the number of devices enormous. As SANs grow with increasing port counts, link lengths and higher bandwidths, the analyzer will be further challenged.
Trace sizes are exploding
Dramatic increases in bandwidth are contributing to the rising volume of link data. For a 2Gb/s link running at peak utilization, more than 400MB of data crosses the port each second, counting both directions of the full-duplex link. At a 10Gb/s link speed, more than 2,000MB (or 2GB) of data would cross the port each second - a five-fold increase - and would require at least 2GB of memory and disk for every second of traffic stored. This is leading to an explosion in trace sizes.
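The arithmetic above can be sketched as a quick back-of-the-envelope calculation. This assumes 8b/10b line encoding (8 data bits carried in every 10 line bits, as FC uses at these speeds) and counts both directions of a full-duplex link; it is an illustration, not a vendor's sizing formula.

```python
def trace_bytes_per_second(line_rate_gbps, duplex=True):
    """Approximate data volume an analyzer must store per second
    of capture on an 8b/10b-encoded link (assumption: 8 data bits
    per 10 line bits, both directions captured when duplex)."""
    line_bits = int(line_rate_gbps * 1e9)
    data_bytes = line_bits * 8 // 10 // 8   # strip 8b/10b overhead, convert to bytes
    return data_bytes * (2 if duplex else 1)

print(trace_bytes_per_second(2))    # 400000000 -> ~400MB/s at 2Gb/s
print(trace_bytes_per_second(10))   # 2000000000 -> ~2GB/s at 10Gb/s
```

At 10Gb/s, one second of full-rate traffic already fills the 2GB that would have been a generous trace memory a few years earlier, which is why capture windows shrink as links get faster.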
To confront this ongoing concern, some vendors such as Ancot, Menlo Park, CA, have begun offering up to 4GB per port, while others, such as I-Tech in Eden Prairie, MN, offer more than 20GB in a chassis that includes 16 ports. With traces becoming so large, the demand on the analyzer's trace memory is challenging practical limits, and simple everyday operations such as trace file management and post processing are becoming slow and cumbersome. For users, this translates into longer debug cycles and decreased productivity.
To handle large traces, vendors are incorporating a number of techniques. To address trace memory constraints, all analyzers from the major vendors incorporate the well-known pre-capture filter and triggering hardware features. A pre-capture filter lets users keep specific link data from being stored, which extends the capture time. For example, discarding all but the first 16 words of each frame's payload can extend the capture by roughly 20 times.
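The roughly 20-fold figure can be checked with some simple frame arithmetic. The sketch below assumes a standard FC frame layout (1-word SOF, 6-word header, up to a 528-word payload, 1-word CRC, 1-word EOF); the exact ratio depends on the traffic mix, so treat this as an illustration.

```python
# Assumed FC frame framing, in 4-byte words.
SOF, HEADER, CRC, EOF = 1, 6, 1, 1
FULL_PAYLOAD_WORDS = 528   # 2112-byte maximum payload / 4 bytes per word

def frame_words(payload_words):
    """Total words one frame occupies in trace memory."""
    return SOF + HEADER + payload_words + CRC + EOF

full = frame_words(FULL_PAYLOAD_WORDS)   # 537 words for a full frame
filtered = frame_words(16)               # 25 words with payload truncated to 16
print(full // filtered)                  # 21 -> about 20x more frames fit
```

With smaller frames in the mix the savings shrink, which is one reason the benefit is quoted as "can increase" rather than a guarantee.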
The catch is that users must know ahead of time which data they aren't interested in. Triggering lets users discard all incoming data until an event they want occurs on the link, at which point capture begins. Both of these features address the issue to a certain extent, but require some knowledge of the events before the capture begins. And there remain cases where intermittent and unpredictable problems require that all link traffic be captured. To address this need, vendors are attempting to design ways to combine memory from several analyzers into one virtual memory space. For example, four individual analyzers with 1GB of trace memory each would appear as one analyzer with 4GB of memory.
In addition, many vendors are improving their analyzers' searching operations. Some use the analyzer's hardware logic to assist searching, so operations take only seconds to complete. However, hardware-assisted search engines only work on traces stored in trace memory, and are limited to the range of search criteria the hardware supports. The limitations arise from the number of hardware levels used in making the comparisons. For sophisticated searches exceeding the hardware's capability, the software takes over and validates the search, introducing delays. But when hardware-assisted search works, the results are impressive.
Filters and expert systems
Many vendors feature post-capture processing, such as post-filter and expert system tools, to assist in analyzing the trace. A post-filter lets users control what's displayed on the screen after the capture. Some vendors are addressing post-filter performance by using hardware-assisted index tables to produce the data virtually on demand. Expert systems such as Finisar's SANMetrics check traces for a number of known bugs and help users locate problems, though processing time varies with the size of the trace. As traces become larger, post processing and expert system analysis will need to keep pace.
File management is another area of ongoing concern, again due to the size of the traces. Most vendors will tell you not to save or transfer an entire trace - the files are too large. They suggest you save only the relevant section of the trace and transfer that. When you need help examining an entire trace, you can access the analyzer remotely across the Web, either directly, as in the case of Ancot, or via a third-party software package. This clever feature gives a distant user complete control over the analyzer, including the view and hardware setup. Users at several different locations of a company can work together this way to analyze the trace. To do so, however, users must pass through whatever security measures the company has set up. And that's the rub. Many users who need to collaborate on a problem are actually from different companies, and firms often restrict outside users from entering their sites.
As the rapid pace of SAN deployment continues, network complexities are escalating. Longer distances, massive-port-count SAN configurations (exceeding 1,000 ports) and increased switch intelligence are growing trace sizes and placing additional challenges on analyzers.
Distances between hosts and devices on a SAN are growing because of the sheer increase in port counts and through the use of MAN routers. Greater distances are an issue because they tend to elongate the trace: the longer an I/O operation takes to complete, the more the analyzer needs to capture.
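To see why distance stretches a trace, consider propagation delay alone. The sketch below assumes light travels through fiber at roughly 5 microseconds per kilometer and a 200MB/s link rate per direction; both figures are assumptions for illustration, not from any vendor's specification.

```python
FIBER_DELAY_US_PER_KM = 5.0   # assumed propagation delay in optical fiber

def extra_capture_bytes(distance_km, link_bytes_per_s=200_000_000):
    """Extra link data the analyzer must hold while one I/O's
    round-trip propagation delay elapses on a longer link."""
    round_trip_s = 2 * distance_km * FIBER_DELAY_US_PER_KM * 1e-6
    return round_trip_s * link_bytes_per_s

print(extra_capture_bytes(10))   # a 10 km MAN hop: roughly 20,000 extra bytes per I/O
```

A single round trip adds little, but a command-to-completion exchange involves several round trips, and thousands of concurrent I/Os multiply the effect across the whole capture window.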
Larger port counts are increasing the size of the trace and are having a profound effect on the design of analyzers. Most analyzers offer up to 16 ports of simultaneous operation. To monitor and control multiple ports simultaneously, the analyzer needs to operate the ports at the same time or events may be missed. Vendors accomplish this by stringing single-port hardware together as a group interconnected via a cable or backplane.
A distributed clock is necessary so traffic can be recorded in the same order as it was seen on the link; any other connection - such as Ethernet - would not provide enough precision. This design imposes practical limits on the number of ports that can be simultaneously controlled. The interconnect - cable length and backplane - is constrained by the clock cycle of the link capture speed, and as bandwidth increases, the clock cycle shrinks. As users need to monitor more ports simultaneously, vendors will need to confront these challenges.
Vendors have been constantly upgrading their viewing software to keep pace with the industry's changes, and many now offer a variety of formats and displays. Finisar offers a top-down viewer that shows all ports interleaved one after another in the order they were captured, with a detailed view of the highlighted event in another window. I-Tech features a side-by-side display of all ports, also in capture order. Other vendors are making their views more customizable: U.K.-based Xyratex offers a protocol editor that provides greater flexibility in altering the format and display of the data. As analyzers support higher port counts, vendors will need to ensure their displays can handle even more data in an easy-to-view format. Given the amount of space on a standard monitor, this is no easy feat.
SAN complexity is increasing with the advent of emerging protocols, including iSCSI and InfiniBand. These protocols present challenges to the analyzer because they're built on different physical layers than FC, and analyzer vendors based their initial designs on FC. The challenge is to handle more than one protocol with a consistent, unified look and feel. I-Tech offers a seamless approach to analyzing FC and iSCSI in a single chassis; its latest offering lets you view the two protocols side by side in time-synchronized order, so you can easily watch data flow from one to the other. Finisar offers support for InfiniBand as well; its software will control and view all three protocols simultaneously in a top-down, time-synchronized, intertwined view. Other vendors have indicated plans to support additional protocols.
Higher bandwidths and increasingly complex networks are the future. The FC and Gigabit Ethernet communities are preparing 10Gb/s specifications to meet these demands, and the FC community is considering an interim 4Gb/s speed. These changes will necessitate new and more sophisticated analyzers. It's likely analyzer tools will take on the shape of the networks they monitor: more ports, higher speeds and more complexity.