Troubleshooting
Resolving The Problem
Source
RETAIN tip: H207057
Symptom
A Mellanox InfiniBand Host Channel Adapter (HCA) may report a physical state of 'LinkUp' but a port state other than 'ACTIVE', as in the following 'ibstatus' example:
[root@n138 ~]# ibstatus mlx4_0:1
InfiniBand device 'mlx4_0' port 1 status:
    default gid: fe80:0000:0000:0000:5cf3:fc30:0005:2490
    base lid: 0x5d
    sm lid: 0x1
    state: 4: INIT
    phys state: 5: LinkUp
    rate: 56 Gb/sec (4X FDR)
    link_layer: InfiniBand
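The check described in the symptom can be sketched in Python. This is a hypothetical helper, not an OFED tool. It assumes ibstatus prints a header line followed by one indented 'key: value' field per line, and it compares the text labels ('LinkUp', 'ACTIVE') rather than the numeric codes.

```python
from typing import Dict

# Sample ibstatus output for the affected port, one field per line.
SAMPLE = """\
InfiniBand device 'mlx4_0' port 1 status:
    default gid: fe80:0000:0000:0000:5cf3:fc30:0005:2490
    base lid: 0x5d
    sm lid: 0x1
    state: 4: INIT
    phys state: 5: LinkUp
    rate: 56 Gb/sec (4X FDR)
    link_layer: InfiniBand
"""


def parse_ibstatus(text: str) -> Dict[str, str]:
    """Parse one port's ibstatus block into a field -> value dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Only indented lines carry fields; skip the "... status:" header.
        if not line.startswith((" ", "\t")):
            continue
        # Split on the first colon only, so values such as GIDs stay intact.
        key, _, value = line.strip().partition(":")
        fields[key.strip()] = value.strip()
    return fields


def state_label(value: str) -> str:
    """Turn a value such as '5: LinkUp' into its text label 'LinkUp'."""
    return value.split(":", 1)[-1].strip()


def linkup_but_not_active(fields: Dict[str, str]) -> bool:
    """True when the physical link is up but the port state is not ACTIVE."""
    return (state_label(fields.get("phys state", "")) == "LinkUp"
            and state_label(fields.get("state", "")) != "ACTIVE")
```

On a live node the text could come from `subprocess.run(["ibstatus", "mlx4_0:1"], capture_output=True, text=True).stdout` instead of the hard-coded sample.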
Affected configurations
The system may be any of the following IBM servers:
- IBM System Cluster 1350, type 0445, any model
- IBM System Cluster 1350, type 0448, any model
- IBM System Cluster 1350, type 1410, any model
- IBM System Cluster 1350, type 4667, any model
- IBM System Cluster 1350, type 4668, any model
- IBM System Cluster 1350, type 4669, any model
- IBM System Cluster 1350, type 4670, any model
The system is configured with at least one of the following:
- Red Hat Enterprise Linux 6, any update
- SUSE Linux Enterprise Server 11, any service pack
The system is configured with one or more of the following IBM Options:
- Mellanox ConnectX-3 Dual Port QDR/FDR10 Mezz Card, Option part number 90Y6338, any Replacement part number (CRU)
Note: This does not imply that the network operating system will work under all combinations of hardware and software.
Please see the compatibility page for more information: http://www.ibm.com/systems/info/x86servers/serverproven/compat/us/
Solution
This behavior will be corrected in a future release of Mellanox OpenFabrics Enterprise Distribution (OFED), targeted for the second quarter of 2013.
The file is or will be available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on IBM Support's Fix Central web page, at the following URL:
Workaround
Rebooting the system allows the link to come up and the port state to change to 'ACTIVE'.
Additional information
The combination of a 'LinkUp' physical state and a port state other than 'ACTIVE' normally indicates that no subnet manager is running on the fabric. In this case, however, the subnet manager is running.
Occasionally the subnet manager does not bring a node into the fabric. Restarting the node toggles the link and reloads the OFED stack, which prompts the subnet manager to add the node to the fabric.
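To find which nodes need the reboot workaround, each node's ports can be checked without parsing ibstatus. The following is a minimal sketch, assuming the standard Linux sysfs layout /sys/class/infiniband/<device>/ports/<port>/ with 'state' and 'phys_state' files holding strings such as '4: ACTIVE' and '5: LinkUp'. The function stuck_ports is a hypothetical helper, not part of OFED.

```python
from pathlib import Path
from typing import List, Tuple

# Standard Linux RDMA sysfs root; passed as a parameter so it can be overridden.
SYSFS_IB = Path("/sys/class/infiniband")


def _label(path: Path) -> str:
    """Read a sysfs state file such as '4: ACTIVE' and return 'ACTIVE'."""
    return path.read_text().split(":", 1)[-1].strip()


def stuck_ports(root: Path = SYSFS_IB) -> List[Tuple[str, str, str]]:
    """Return (device, port, state) for every port whose physical state is
    LinkUp but whose logical port state is not ACTIVE."""
    stuck: List[Tuple[str, str, str]] = []
    if not root.is_dir():
        return stuck  # no InfiniBand devices (or no sysfs) on this host
    for port_dir in sorted(root.glob("*/ports/*")):
        state_file = port_dir / "state"
        phys_file = port_dir / "phys_state"
        if not (state_file.is_file() and phys_file.is_file()):
            continue
        state = _label(state_file)
        if _label(phys_file) == "LinkUp" and state != "ACTIVE":
            device = port_dir.parent.parent.name  # e.g. "mlx4_0"
            stuck.append((device, port_dir.name, state))
    return stuck
```

Running `print(stuck_ports())` on each node (for example through a parallel shell) lists the ports that show this symptom; an empty list means every LinkUp port is already ACTIVE.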
Document Location
Worldwide
Document Information
Modified date:
30 January 2019
UID
ibm1MIGR-5091973