IBM Support

Mellanox mlx5 driver fails to load on Power 9 systems with 240 or more SMT threads

Flashes (Alerts)


Abstract

The Mellanox mlx5 driver can fail to load on Power 9 servers if the logical partition (LPAR) has 240 or more SMT threads (30 CPU cores) assigned. This can also affect netboot installations on Power 9 servers.

Content

Linux Releases Affected
		All 
		
IBM Systems Affected
		9008-22L, 9009-22A, 9009-41A, 9009-42A, 9223-22H, 9223-42H
		9040-MR9, 9225-50H
		9080-M9S, 9222-80H
		
Affected Hardware
		PCIe3 2-port 10 Gb NIC & RoCE SR/Cu adapter (FC EC2R and EC2S; CCIN 58FA) 
		PCIe3 2-port 25/10 Gb NIC & RoCE SFP28 adapter (FC EC2T and EC2U; CCIN 58FB)
		PCIe3 LP 2-port 100 Gb EDR InfiniBand Adapter (FC EC3E and EC3F; CCIN 2CEA)
		PCIe3 2-port 100 GbE NIC & RoCE QSFP28 Adapter (FC EC3L and EC3M; CCIN 2CEC)
		PCIe3 x16 LP 1-port 100 Gb EDR InfiniBand Adapter (FC EC3T and EC3U; CCIN 2CEB)
		PCIe4 x16 1-Port EDR 100 GB IB ConnectX-5 CAPI Capable Adapter (FC EC62 and EC63; CCIN 2CF1)
		PCIe4 x16 2-Port EDR 100 GB IB ConnectX-5 CAPI Capable Adapter (FC EC64 and EC65; CCIN 2CF2)
		PCIe4 2-port 100 GbE RoCE x16 adapter (FC EC66 and EC67; CCIN 2CF3)

Symptoms 
The Mellanox mlx5 driver can fail to load if the LPAR has 240 logical CPUs (SMT threads) or more assigned on Power 9 servers.
The failure is reported with a message similar to the following:

mlx5_cmd_check:731:(pid 413): CREATE_EQ(0x301) op_mod(0x0) failed, status limits exceeded(0x8), syndrome (0x7957c3)
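
To check whether a system has hit this failure, the kernel log can be searched for the signature above. The following Python sketch is illustrative only and is not part of any IBM or Mellanox tooling. The function name find_mlx5_eq_failures is hypothetical, and the pattern assumes the message format matches the sample shown, with the line number, pid, and syndrome allowed to vary.

```python
import re

# Pattern derived from the sample message above (an assumption about the
# general format); the source line number and pid are allowed to vary.
MLX5_EQ_FAILURE = re.compile(
    r"mlx5_cmd_check:\d+:\(pid \d+\): CREATE_EQ\(0x301\).*failed, "
    r"status limits exceeded\(0x8\)"
)


def find_mlx5_eq_failures(log_text):
    """Return the log lines that match the CREATE_EQ 'limits exceeded' signature."""
    return [line for line in log_text.splitlines() if MLX5_EQ_FAILURE.search(line)]
```

One possible use is to save the kernel log to a file, read it into a string, and pass that string to find_mlx5_eq_failures. An empty list means the signature was not found.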

Workaround 
Use MLNX_OFED version 4.4, which does not have this issue. It is available here: http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
If performing a netboot installation, assign fewer than 240 logical CPUs to the server for the installation.
After installation, you can install MLNX_OFED and then assign all the CPUs desired to the server.
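
The 240-thread threshold can be checked before starting a netboot installation. The sketch below is an illustration only: the constant and function names are hypothetical, and it assumes that os.cpu_count() reflects the number of logical CPUs (SMT threads) visible to the partition.

```python
import os

# Threshold taken from this flash: 240 logical CPUs (for example, 30 cores at SMT8).
MLX5_THREAD_LIMIT = 240


def logical_cpus(cores, smt_threads_per_core):
    """Logical CPU count for a given core count and SMT mode."""
    return cores * smt_threads_per_core


def at_risk(cpu_count):
    """True if the count is at or above the threshold described in this flash."""
    return cpu_count >= MLX5_THREAD_LIMIT


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 0 in that case.
    count = os.cpu_count() or 0
    state = "at or above" if at_risk(count) else "below"
    print(f"{count} logical CPUs: {state} the {MLX5_THREAD_LIMIT}-thread threshold")
```

For example, logical_cpus(30, 8) gives 240, which is at the threshold, so a partition of that size would need fewer threads assigned for the installation.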

Fix Outlook 
Mellanox will submit the fix to the upstream kernel community. Once it is accepted, Linux distribution releases can pick it up.


Document Information

Modified date:
26 September 2022

UID

ibm10726069