Use this procedure to isolate switch link errors.
Before you begin, ensure that you have eliminated the failures listed in
Table 1.
To fix symptoms such as an unlit blue link LED, or permanent or
intermittent link errors reported in the error logs, complete the
following steps until the original symptom is resolved.
- Reseat the switch cable at the switch-end.
- At the node-end, check that the InfiniBand host channel
adapter is shown as active (some adapters indicate this with LEDs), and then reseat the cable.
- If the node is down, fix that problem. If only this
adapter is down, it might be guarded or broken.
- Fix any problems, and then recheck for the original
symptom.
- For isolation, replace the InfiniBand cable, and
then recheck for the original symptom.
- If possible, identify another functional switch-port on the same
subnet (preferably an unused port) and move this node-port's connection
to it: disconnect the InfiniBand cable at the switch-end and plug the
cable into the other port.
- If the link works with the new port, then the original
switch-port is bad, and the switch leaf must be replaced.
- If the link still does not work, then the port on the InfiniBand adapter
is bad, and the adapter must be replaced.
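The isolation order above can be sketched as a small decision routine. This is a minimal illustration only, assuming a hypothetical callback `link_ok_after` that reports whether the original symptom is gone after each action; the `Action` names are invented for this sketch, and the node-down and adapter-guarded checks are not modelled.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    # Hypothetical labels for the steps in the procedure, in order.
    RESEAT_SWITCH_END = "reseat cable at switch-end"
    RESEAT_NODE_END = "check adapter is active, reseat cable at node-end"
    REPLACE_CABLE = "replace InfiniBand cable"
    MOVE_TO_OTHER_PORT = "move cable to another switch-port on the same subnet"


def isolate(link_ok_after: Callable[[Action], bool]) -> str:
    """Walk the steps in order and stop as soon as the symptom is resolved."""
    # The first three actions can fix the problem outright.
    for action in (Action.RESEAT_SWITCH_END, Action.RESEAT_NODE_END, Action.REPLACE_CABLE):
        if link_ok_after(action):
            return f"resolved by: {action.value}"
    # Moving to a known-good port separates a bad switch-port from a bad adapter port.
    if link_ok_after(Action.MOVE_TO_OTHER_PORT):
        return "original switch-port is bad: replace the switch leaf"
    return "adapter port is bad: replace the InfiniBand adapter"


# Example: only moving the cable to a different switch-port brings the link up.
outcomes = {Action.MOVE_TO_OTHER_PORT: True}
print(isolate(lambda a: outcomes.get(a, False)))
# prints "original switch-port is bad: replace the switch leaf"
```

The routine returns at the first action that clears the symptom, which mirrors the instruction to stop once the original symptom is resolved.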
After the original symptom is fixed, check or monitor for other link errors.
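One way to monitor for further link errors is to compare two snapshots of the port error counters taken some time apart and flag any counter that is still increasing. The sketch below assumes you already have the snapshots as dictionaries; how they are collected is not covered here, and the counter names in the example are modelled on InfiniBand port error counters for illustration only.

```python
from typing import Dict

# Assumed snapshot format: counter name -> current value.
Snapshot = Dict[str, int]


def increasing_counters(before: Snapshot, after: Snapshot) -> Dict[str, int]:
    """Return the counters that grew between two snapshots, with their increase."""
    deltas: Dict[str, int] = {}
    for name, new_value in after.items():
        delta = new_value - before.get(name, 0)
        # A zero or negative delta (for example, after counters were cleared) is ignored.
        if delta > 0:
            deltas[name] = delta
    return deltas


before = {"SymbolErrorCounter": 12, "LinkDownedCounter": 1}
after = {"SymbolErrorCounter": 15, "LinkDownedCounter": 1}
print(increasing_counters(before, after))
# prints {'SymbolErrorCounter': 3}
```

A counter that keeps growing across several snapshots suggests the link still has errors, in which case the isolation steps may need to be repeated.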