IBM Support

Link up but not 'ACTIVE' - Mellanox HCA - IBM System Cluster 1350

Troubleshooting


Problem

A Mellanox InfiniBand Host Channel Adapter (HCA) may report a physical state of 'LinkUp' while the port state is not 'ACTIVE'. An 'ibstatus' example is shown under Symptom.

Resolving The Problem

Source

RETAIN tip: H207057

Symptom

A Mellanox InfiniBand Host Channel Adapter (HCA) may show 'LinkUp' but not show the state as 'ACTIVE' as seen in the following 'ibstatus' example:

  [root@n138 ~]# ibstatus mlx4_0:1
  InfiniBand device 'mlx4_0' port 1 status:
          default gid:     fe80:0000:0000:0000:5cf3:fc30:0005:2490
          base lid:        0x5d
          sm lid:          0x1
          state:           4: INIT
          phys state:      5: LinkUp
          rate:            56 Gb/sec (4X FDR)
          link_layer:      InfiniBand
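
The symptom can also be checked from a script. The following is a minimal sketch, not part of the original tip, that parses a single-port 'ibstatus' report such as the one above and flags the 'LinkUp' but not 'ACTIVE' combination. The function names parse_ibstatus and link_up_but_not_active are hypothetical, and the sample text is copied from the example above. The check compares the state name rather than its numeric prefix.

```python
# Sample report copied from the example in this tip.
SAMPLE = """\
InfiniBand device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:5cf3:fc30:0005:2490
        base lid:        0x5d
        sm lid:          0x1
        state:           4: INIT
        phys state:      5: LinkUp
        rate:            56 Gb/sec (4X FDR)
        link_layer:      InfiniBand
"""


def parse_ibstatus(text):
    """Return {field: value} for a single-port ibstatus report (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "... port N status:" header line.
        if not line or line.endswith("status:"):
            continue
        # Split on the first colon only, so GIDs and "4: INIT" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def link_up_but_not_active(fields):
    """True when the physical link is up but the port state is not ACTIVE."""
    phys = fields.get("phys state", "")
    state = fields.get("state", "")
    return phys.endswith("LinkUp") and not state.endswith("ACTIVE")


if __name__ == "__main__":
    report = parse_ibstatus(SAMPLE)
    if link_up_but_not_active(report):
        print("Port is LinkUp but state is", report["state"])
```

In practice the text would come from running 'ibstatus' on the node rather than from a hard-coded sample.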

Affected configurations

The system may be any of the following IBM servers:

  • IBM System Cluster 1350, type 0445, any model
  • IBM System Cluster 1350, type 0448, any model
  • IBM System Cluster 1350, type 1410, any model
  • IBM System Cluster 1350, type 4667, any model
  • IBM System Cluster 1350, type 4668, any model
  • IBM System Cluster 1350, type 4669, any model
  • IBM System Cluster 1350, type 4670, any model

The system is configured with at least one of the following:

  • Red Hat Enterprise Linux 6, any update
  • SUSE Linux Enterprise Server 11, any service pack

The system is configured with one or more of the following IBM Options:

  • Mellanox ConnectX-3 Dual Port QDR/FDR10 Mezz Card, Option part number 90Y6338, any Replacement part number (CRU)

Note: This does not imply that the network operating system will work under all combinations of hardware and software.

Please see the compatibility page for more information: http://www.ibm.com/systems/info/x86servers/serverproven/compat/us/

Solution

This behavior will be corrected in a future release of Mellanox OpenFabrics Enterprise Distribution (OFED).

This release is targeted for the second quarter of 2013.

The file is or will be available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on IBM Support's Fix Central web page, at the following URL:

Workaround

Rebooting the system will allow the link to come up and the port state to go active.

Additional information

The combination of 'LinkUp' and not 'ACTIVE' normally indicates that the subnet manager is not running. With this problem, however, the combination appears even though the subnet manager is running.

Occasionally the subnet manager does not bring a node into the fabric. Restarting the node toggles the link and restarts the OFED stack, which prompts the subnet manager to add the node to the fabric.
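
To find which local ports are in this condition before applying the workaround, the port state can be read without 'ibstatus'. The sketch below is illustrative only. It assumes the usual Linux layout in which each port exposes 'state' and 'phys_state' files under /sys/class/infiniband/<device>/ports/<port>/. The function name find_stuck_ports and the sysfs_root parameter are hypothetical.

```python
import os


def find_stuck_ports(sysfs_root="/sys/class/infiniband"):
    """Return [(device, port, state)] for ports that are LinkUp but not ACTIVE.

    Assumes each port directory holds 'state' and 'phys_state' files with
    contents such as '4: ACTIVE' and '5: LinkUp'.
    """
    stuck = []
    if not os.path.isdir(sysfs_root):
        return stuck  # No InfiniBand devices visible on this host.
    for device in sorted(os.listdir(sysfs_root)):
        ports_dir = os.path.join(sysfs_root, device, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            try:
                with open(os.path.join(ports_dir, port, "state")) as f:
                    state = f.read().strip()
                with open(os.path.join(ports_dir, port, "phys_state")) as f:
                    phys = f.read().strip()
            except OSError:
                continue  # Unreadable port entry; skip it.
            if phys.endswith("LinkUp") and not state.endswith("ACTIVE"):
                stuck.append((device, port, state))
    return stuck


if __name__ == "__main__":
    for device, port, state in find_stuck_ports():
        print(f"{device} port {port}: LinkUp but state is '{state}'")
```

A port reported here is a candidate for the reboot workaround only if the subnet manager is confirmed to be running. Otherwise the missing subnet manager is the more likely cause.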

Document Location

Worldwide

Operating System

System x Hardware Options:SUSE Linux Enterprise Server 11

System x Integrated Solutions:SUSE Linux Enterprise Server 11

System x Hardware Options:Red Hat Enterprise Linux 6

System x Integrated Solutions:Red Hat Enterprise Linux 6


Document Information

Modified date:
30 January 2019

UID

ibm1MIGR-5091973