Troubleshooting link errors

IBM SAN b-type directors and switches use the latest high bandwidth Fibre Channel technology and auto-negotiate to 16 Gbps, 8 Gbps, 4 Gbps, or 2 Gbps based on the link data rate capability of the attached transceiver and the speed supported by the switches and directors. Negotiation to 1 Gbps is not supported unless 4 Gbps FC transceivers are used. Because 8 and 16 Gbps channels are more sensitive to the condition of the existing multimode and single-mode cable plant, it is very important to minimize connector reflections and maintain an acceptable link loss budget.

This section provides link troubleshooting advice on fault isolation and provides guidance in the following areas:
  • Dust and dirt contamination
  • Link loss
  • Attenuation on LWL connections

Fault isolation

Since a job loss issue can be caused by a variety of problems, it is important to employ a systematic fault isolation process to remedy the issue. Note that job losses do not necessarily result from link errors. They may also be due to:
  • Configuration issues
  • Networking overload
  • Failures on storage device, switch, or server

Assume for these procedures that the observed errors originate from link errors and are not the result of configuration issues, network overload or network equipment failures.

Whenever CRC errors are discovered on a particular link, it is easy to jump to the conclusion that the link is causing the network issue. This might not be the case. Since CRC errors are just symptoms of a link issue, we need to trace the propagated error to where it originated.

Figure 1 shows a simplified network involving a server, a switch, and a storage device. In this example, assume that the server experienced an error at port 1. This observable error can potentially originate from links 1, 2, 3 or 4 and/or SFP 1, 2, 3 or 4.

Figure 1. Identifying the origin of failure

To determine the original failing link, the observable CRC error needs to be tracked back to its first occurrence. By following this process, it is discovered in this example that the CRC errors observed on link 4 were propagated from link 3, which in turn received them from link 2, where they originated.

Once the original failing link (link 2) has been determined, the two connecting ports of that link need to be checked for the following errors:
  • Encoder errors
  • Disparity errors
  • Invalid transmission words
The port that displays any of the above errors is the source of the link issue. Possible causes are dust or dirt in the connectors or fiber, an insufficient link loss budget, and/or incompatible SFPs.
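
The fault isolation steps above can be sketched as a small script. This is only an illustration, not a switch tool: the Link and PortCounters types, the port names, and all counter values are hypothetical, and the path is assumed to be listed in the direction the traffic flows, so that the first link showing CRC errors is treated as the origin.

```python
from dataclasses import dataclass


@dataclass
class PortCounters:
    """Hypothetical per-port error counters (names are assumptions)."""
    name: str
    encoder_errors: int = 0
    disparity_errors: int = 0
    invalid_words: int = 0  # invalid transmission words

    def has_link_level_errors(self) -> bool:
        total = self.encoder_errors + self.disparity_errors + self.invalid_words
        return total > 0


@dataclass
class Link:
    name: str
    crc_errors: int
    ports: tuple  # the two PortCounters at either end of the link


def find_origin_link(path):
    """Return the first link, in traffic order, that shows CRC errors."""
    for link in path:
        if link.crc_errors > 0:
            return link
    return None


def suspect_ports(link):
    """Names of the ports on the link that show link-level errors."""
    return [p.name for p in link.ports if p.has_link_level_errors()]


# Made-up counters that mirror the example: errors start on link 2
# and are propagated to links 3 and 4.
path = [
    Link("link 1", 0, (PortCounters("port A"), PortCounters("port B"))),
    Link("link 2", 12, (PortCounters("port C", invalid_words=40),
                        PortCounters("port D"))),
    Link("link 3", 12, (PortCounters("port E"), PortCounters("port F"))),
    Link("link 4", 12, (PortCounters("port G"), PortCounters("port H"))),
]

origin = find_origin_link(path)
if origin is not None:
    print(origin.name, suspect_ports(origin))
```

Links 3 and 4 also report CRC errors in this made-up data, but their ports show no encoder, disparity, or invalid transmission word errors, which is what marks them as carrying propagated errors rather than causing them.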

Dust, dirt, or other contaminants

One of the most common optical link problems is caused by dust, dirt, or oil in the connectors and fiber. 8 and 16 Gbps links are more prone to such issues, while links running at lower data rates, such as 1, 2, or 4 Gbps, may be unaffected.

Once the failing port has been identified by following the above fault isolation process, the receive power of the transceiver sitting in that port needs to be determined. An abnormally low receive power usually means that the physical link is dirty.

The receive power can be checked by querying the SFP diagnostics data via the command line interface. This information gives a rough indication of whether the receive power is below the minimum receive specification of the transceiver. It is also prudent to compare this receive power with that of neighboring transceivers.

For better accuracy, it is advisable to use a power meter to measure the actual receive power of the link. If you are experiencing excessive bit errors and the receive power of the transceiver is abnormally low, it is recommended that you:
  • Re-seat the transceivers for the failing link
  • Clean the connector and optical fiber
Most link issues are solved by completing these steps.
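
The two receive power checks described in this section (against the minimum receive specification, and against neighboring transceivers) could be expressed as follows. This is a minimal sketch: the function name, the 3 dB neighbor threshold, and all readings are assumptions chosen for illustration, and power values are assumed to be in dBm. The minimum receive specification must be taken from the transceiver datasheet.

```python
from statistics import median


def rx_power_flags(rx_dbm, rx_min_spec_dbm, neighbor_rx_dbm, neighbor_gap_db=3.0):
    """Return a list of reasons why a receive power reading looks abnormal.

    neighbor_gap_db is an arbitrary illustrative threshold, not a
    vendor-specified value.
    """
    flags = []
    if rx_dbm < rx_min_spec_dbm:
        flags.append("below minimum receive specification")
    if neighbor_rx_dbm and median(neighbor_rx_dbm) - rx_dbm > neighbor_gap_db:
        flags.append("well below neighboring transceivers")
    return flags


# Hypothetical readings: one suspect port and three neighbors.
suspect = rx_power_flags(-14.2, rx_min_spec_dbm=-11.0,
                         neighbor_rx_dbm=[-4.1, -3.8, -4.5])
healthy = rx_power_flags(-4.0, rx_min_spec_dbm=-11.0,
                         neighbor_rx_dbm=[-4.1, -3.8, -4.5])
print(suspect)
print(healthy)
```

A port that raises either flag is a candidate for re-seating and cleaning. A power meter measurement remains the more accurate check.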

Best practices for minimizing link loss

The "link margin" or the "power budget" of the link is a measure of signal power gain or loss expressed in decibels (dB). Maintaining a healthy link budget is critical to establishing a reliable and stable network.

Follow these best practices for minimizing link loss:
  • Stay well within the maximum cable distance calculated for the link.
  • Apply typical or worst-case values during loss calculations.
  • Use the highest grade cabling components for the application to be supported.
  • Match the cable type with the wavelength, bandwidth, and distance to be supported; do not mix cable types within a link.
  • Inspect loss ratings of all cabling components during the selection process.
  • Record loss measurements for horizontal and vertical cable runs during installation.
  • Become familiar with how to quickly determine the link budget and link loss of selected sections of the cabling.
  • Account for power loss associated with future repairs and expansion.
  • Do not stress the cables.
  • Prototype a link with anticipated maximum cable distance and selected components, and then take measurements to calculate the actual link loss.
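
A quick link loss estimate along the lines of these practices might look like the sketch below. It assumes the common approach of summing fiber attenuation, connector loss, and splice loss, and takes the budget as the difference between minimum transmit power and receiver sensitivity. Every number and name here is a placeholder, not a specification; use worst-case values from the component datasheets.

```python
def link_loss_db(length_km, fiber_db_per_km, connectors, connector_loss_db,
                 splices=0, splice_loss_db=0.0):
    """Estimated end-to-end loss of a link, in dB."""
    return (length_km * fiber_db_per_km
            + connectors * connector_loss_db
            + splices * splice_loss_db)


def link_margin_db(tx_min_dbm, rx_sensitivity_dbm, loss_db, repair_allowance_db=0.0):
    """Power left over after link loss and an allowance for future repairs."""
    budget_db = tx_min_dbm - rx_sensitivity_dbm
    return budget_db - loss_db - repair_allowance_db


# Placeholder values for a short multimode run with four connectors.
loss = link_loss_db(length_km=0.1, fiber_db_per_km=3.0,
                    connectors=4, connector_loss_db=0.5)
margin = link_margin_db(tx_min_dbm=-8.0, rx_sensitivity_dbm=-11.0,
                        loss_db=loss, repair_allowance_db=0.5)

print(f"estimated loss: {loss:.2f} dB, remaining margin: {margin:.2f} dB")
if margin < 0:
    print("link loss exceeds the budget")
```

A small or negative margin suggests shortening the run, reducing the number of connections, or choosing lower-loss components before the link is put into service.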

Attenuation on LWL connections

In the datacenter environment, there may be 8 Gbps or 4 Gbps LWL transceivers that are connected to 2 Gbps LWL transceivers using single-mode fiber over short distances. Such connections need to be optically engineered because there is a possibility that the transmit power of the 8 Gbps / 4 Gbps LWL transceivers may saturate the receiver of 2 Gbps LWL transceivers and cause CRC errors.

This discussion does not apply to 16 Gbps transceivers since connections between 16 Gbps and 2 Gbps transceivers are not supported.

Refer to Table 1 for the typical specifications of maximum transmit and receive power of LWL transceivers. This information is also available in the information technology industry standard "Fibre Channel – Physical Interface-4 (FC-PI-4)" document.
Table 1. Specifications of LWL 10km transceivers
LWL SFP/SFP+          2GFC           4GFC    8GFC
Power, Tx (max) dB    -3             -1      +0.5
Power, Rx (max) dB    0 or -3 (1)    N/A     N/A
(1) The maximum receive power specifications of some 2 Gbps LWL 10km transceivers can vary from 0dB to -3dB. However, most 2 Gbps 10km transceivers specify a maximum receive power of 0dB.

It is important to check the maximum receive power of the 2 Gbps LWL transceiver in the manufacturer's datasheet.

Without taking into account connector and fiber losses, the maximum transmit power of 8 Gbps / 4 Gbps LWL transceivers (+0.5dB and -1dB) exceeds the -3dB maximum receive power of some 2 Gbps LWL transceivers and may overdrive them. If the maximum receive power of the 2 Gbps LWL transceiver is 0dB, a 4 Gbps LWL transceiver with a -1dB transmit power will not overdrive the 2 Gbps transceiver. Although some 8 Gbps LWL transceivers may reduce their transmit power to 4 Gbps levels, this may still overdrive a 2 Gbps LWL transceiver.
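
The overdrive comparison above amounts to simple arithmetic, sketched here. The transmit values are the Table 1 maximums, treated as absolute power levels in dBm (an assumption, since the table labels them dB). The function name and the path loss figure are illustrative only.

```python
# Maximum transmit power per speed, taken from Table 1.
TX_MAX_DBM = {"2GFC": -3.0, "4GFC": -1.0, "8GFC": 0.5}


def required_attenuation_db(tx_max_dbm, rx_max_dbm, path_loss_db=0.0):
    """Extra attenuation needed so the worst-case power arriving at the
    receiver does not exceed its maximum receive power (0.0 if none)."""
    return max(0.0, tx_max_dbm - path_loss_db - rx_max_dbm)


# Compare 4GFC and 8GFC transmitters against both 2 Gbps receiver variants,
# ignoring connector and fiber losses.
for speed in ("4GFC", "8GFC"):
    for rx_max in (0.0, -3.0):
        extra = required_attenuation_db(TX_MAX_DBM[speed], rx_max)
        print(f"{speed} Tx into 2GFC Rx (max {rx_max:+.1f}): "
              f"needs {extra:.1f} dB of attenuation")

# With an assumed 1 dB of existing cable plant loss, an 8GFC transmitter
# no longer overdrives a 0 dBm receiver.
print(required_attenuation_db(TX_MAX_DBM["8GFC"], 0.0, path_loss_db=1.0))
```

The last call illustrates why existing cable plant attenuation is often enough on its own: any measured path loss is subtracted before the comparison against the receiver maximum.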

Best Practice for LWL connections – Optically engineer a long-distance connection

It is always a good practice to optically engineer a long-distance connection. Most link issues caused by SFP incompatibility can be solved either by use of 4 Gbps LWL transceivers or use of 8 Gbps LWL transceivers that employ rate select.

2G LWL SFP maximum receive power

The IBM SAN b-type 8 Gbps and 16 Gbps directors and switches use the latest high bandwidth Fibre Channel technology and auto-negotiate to 16 Gbps, 8 Gbps, 4 Gbps, or 2 Gbps based on the link data rate capability of the attached transceiver. Negotiation to 1 Gbps is not supported. Since 8 and 16 Gbps equipment is more sensitive, the existing cable plant may require additional attention to cable care after upgrading from 2 or 4 Gbps fibre. In a few cases, the Tx power of the switch can be higher than the maximum receive power of the connected equipment. In nearly all of those cases, there is enough attenuation in the existing cable plant that no additional attenuation is required. The common 2 Gbps SFP Rx maximum power levels are listed in Table 2 and can be used as a quick way to relieve concerns about oversaturation.

Table 2. Maximum receive power of 2 Gbps LWL SFPs
Vendor           Part number      Description         Max receive power (dB)
Avago            AFCT-57M5ATPZ    2 Gbps 10 km SFP    -3
Finisar          FTLF1319P1xTL    2 Gbps 10 km SFP    0
Finisar          FTRJ1319P1xTL    2 Gbps 10 km SFP    0
JDSU             JSH-12L1DD1      2 Gbps 10 km SFP    1
Hitachi Cable    HTR6517          2 Gbps 10 km SFP    -3
Optoway          SPS-9110FG       2 Gbps 10 km SFP    -3
Optoway          SPS-9110AFG      2 Gbps 10 km SFP    -3
JDSU             JSH-21L3AR3      2 Gbps 10 km SFP    1
E20              ES212-LP3TA      2 Gbps 10 km SFP    -3