This article can also be found in the Premium Editorial Download "Storage magazine: Are your data storage costs too high?."
Congestion control is yet another responsibility of the FC2 layer. During a Class 2 conversation, acknowledgments (ACKs) flow between the communicating end points for every frame the initiator sends. If any of these ACK frames is dropped or delayed beyond the error detect timeout value (E_D_TOV), the associated sequence and/or exchange will be terminated and re-sent. In Class 3 conversations, ACKs aren't sent. Instead, only a receiver ready (R_RDY) is sent back from the target to the initiator to indicate that the target's receive buffer has been cleared and is ready for another frame. If this R_RDY is lost or corrupted, the initiator won't be allowed to send another frame until either a link credit reset (LCR) frame has been sent to the target, or the ULP aborts and resends the sequence.
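The credit mechanism described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the class name `CreditedLink`, its methods and the credit count of 2 are hypothetical and do not correspond to any real HBA or switch interface. It shows a sender that stops when its credit is used up and resumes when an R_RDY arrives.

```python
class CreditedLink:
    """Toy model of credit-based flow control on one link (illustrative only)."""

    def __init__(self, credits):
        self.max_credits = credits   # credit granted to the sender (assumed value)
        self.available = credits     # frames the sender may still transmit

    def send_frame(self):
        """Consume one credit; return False if the sender has to wait."""
        if self.available == 0:
            return False
        self.available -= 1
        return True

    def receive_r_rdy(self):
        """An R_RDY arrived: the receiver has freed one buffer."""
        if self.available < self.max_credits:
            self.available += 1

    def reset_credit(self):
        """Stand-in for credit recovery after R_RDYs have been lost."""
        self.available = self.max_credits


link = CreditedLink(credits=2)
sent = [link.send_frame() for _ in range(3)]  # the third send finds no credit
link.receive_r_rdy()                          # one receive buffer is free again
resumed = link.send_frame()                   # sending can continue
```

If the R_RDY never arrived, `send_frame()` would keep returning `False` until something like `reset_credit()` restored the credit, which is the stall the paragraph above describes.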
There are quite a few management frames on the link besides commands and data. Although you can probe the connecting ports to view the error counters associated with these management frames, you won't be able to determine which protocol errors are occurring without an FC analyzer. By placing an FC analyzer on the data path between the originator and responder of an exchange, you can capture a finite number of frames (how many depends on the amount of memory in the analyzer) destined for the target FC node. As data flows into the analyzer's memory, real-time performance data can be graphically displayed with counters for such protocol errors as malformed frames and elapsed timers.
Although the analyzer sits between the two endpoints and copies the frames into its buffer, at no time should it retime or modify the captured frames in any way. However, the optical signal will be amplified as it's retransmitted out of the analyzer's transmit port, thereby altering any test data related to distance. Data captured at this layer can be viewed directly from memory or exported to a file to be viewed by your vendor's support team.
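As a rough illustration of a memory-limited capture with error counters, consider the following sketch. The `CaptureBuffer` class, the error labels and the choice to overwrite the oldest frame when memory is full are all assumptions made for the example. Real analyzers differ in how they trigger and stop a capture.

```python
from collections import Counter, deque


class CaptureBuffer:
    """Hypothetical fixed-size frame store with per-error counters."""

    def __init__(self, capacity):
        # Assumption: once memory is full, the oldest frame is dropped.
        self.frames = deque(maxlen=capacity)
        self.errors = Counter()

    def capture(self, frame, error=None):
        # Store an unmodified copy; the original frame is never altered.
        self.frames.append(bytes(frame))
        if error is not None:
            self.errors[error] += 1


buf = CaptureBuffer(capacity=2)
buf.capture(b"\x01")
buf.capture(b"\x02", error="malformed_frame")
buf.capture(b"\x03", error="timer_elapsed")
kept = list(buf.frames)  # only the two most recent frames fit in memory
```

Note that the counters keep running even after older frames have been pushed out of the buffer, so the totals can cover more traffic than the frames still available for inspection.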
The common services at the FC3 layer can best be understood by looking at them as you would a set of daemon processes in Unix. Whether telnetd, named or routed, each of these daemon processes has a specific set of tasks to perform in your IP network. Thus, depending on the problem you're experiencing, you'll focus your efforts on the process responsible for facilitating that service.
The same is true in FC networks. For example, the fabric login server (FFFFFE) is responsible for facilitating the login of a port entering the fabric. During this conversation, the port attempts to log into the fabric, indicating the classes of service it would like to use on the fabric, its line speed and its hardware revision numbers. This process can be likened to the establishment of a line of communication between an IP host and a telnetd process on another host in the network.
Similar comparisons apply to named and the name server (FFFFFC), as well as routed and the fabric controller (FFFFFD).
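The analogy can be written down as a small lookup table, which is handy when reading traces. The addresses and daemon pairings below come from the text. The names `WELL_KNOWN_SERVICES` and `describe_service` are made up for this sketch.

```python
# Well-known fabric addresses paired with the Unix daemon each is compared to.
WELL_KNOWN_SERVICES = {
    0xFFFFFE: ("fabric login server", "telnetd"),
    0xFFFFFC: ("name server", "named"),
    0xFFFFFD: ("fabric controller", "routed"),
}


def describe_service(address):
    """Return a one-line description of a fabric service address."""
    entry = WELL_KNOWN_SERVICES.get(address)
    if entry is None:
        return f"{address:06X}: not one of the services discussed here"
    service, daemon = entry
    return f"{address:06X}: {service} (compare with {daemon})"


summary = describe_service(0xFFFFFC)
```

Here `summary` holds `"FFFFFC: name server (compare with named)"`.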
So your method should be: Formulate a hypothesis, associate the failure with a particular service in the fabric and then follow that lead by uncovering the configuration related to that service. As an example, suppose you're trying to connect a new host to your fabric and it doesn't show up as connected in your status display. After checking the LED status and ensuring that you have a good physical connection, mentally trace through the steps and services that a connecting node must go through to be visible on an FC network. In this example, it could be that the capabilities of the host's HBA aren't in compliance with the capabilities of the fabric when compared by the fabric login server. If so, a successful login won't be possible until the capabilities are matched, probably through a firmware or even hardware upgrade.
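The capability comparison in this example can be sketched as a simple check. This is a deliberately simplified illustration, not the actual fabric login exchange: the dictionary fields (`classes`, `speed_gbps`, `speeds_gbps`), the sample values and the function name `login_mismatches` are all hypothetical.

```python
def login_mismatches(port, fabric):
    """List the reasons a port's requested capabilities don't fit the fabric."""
    problems = []
    # Assumption: at least one class of service must be common to both sides.
    if not set(port["classes"]) & set(fabric["classes"]):
        problems.append("no common class of service")
    # Assumption: the port's line speed must be one the fabric supports.
    if port["speed_gbps"] not in fabric["speeds_gbps"]:
        problems.append("line speed not supported")
    return problems


hba = {"classes": {2}, "speed_gbps": 2}           # hypothetical new host
fabric = {"classes": {3}, "speeds_gbps": {1}}     # hypothetical fabric
problems = login_mismatches(hba, fabric)
```

An empty list would mean the hypothesis is wrong and the next service in the chain, such as the name server, deserves attention instead.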
Upper layer protocols
The FC4 layer is responsible for mapping ULP information units onto the FC transport. Each protocol specification is responsible for defining how its command, data and status blocks will be mapped onto the FC network using information categories with defined formats. These protocol mappings usually appear in the guise of device drivers and firmware on the originating host or communicating target. Therefore, should you find yourself chasing down an FC4 layer problem, your attention will be focused on combining the right mix of device drivers and firmware revisions in your SAN.
However, to arrive at the conclusion that it's indeed the FC4 layer that needs your attention, you must rule out the lower layers as possible problem areas. Because vendors aren't typically willing to open code or post a revision for every FC4 layer problem being experienced in the field by a user, it's important to ensure that the lower layers aren't suspect, and that when you call your vendors for support, you have sound technical reasoning behind your conclusion that the problems you're experiencing are related to the FC4 layer.
Documenting the common errors in your SAN is also a good idea. Not only will this serve as a valuable knowledge base for your support staff and, one day, as input to event monitoring and self-healing software, it should also serve you well when negotiating support contracts with your vendors.
This was first published in December 2002