IBM Support

Memory Leak in Mellanox ConnectX-3 Linux Driver Under Heavy Load - IBM System x

Troubleshooting


Problem

Memory Leak in Mellanox ConnectX-3 Linux Driver Under Heavy Load

Resolving The Problem

Source

RETAIN Tip:H213680

Symptom

The currently available Mellanox ConnectX-3 Linux driver has a memory leak that results in hangs and panics when the system is under heavy network load.

Affected Configurations

The system is configured with one or more of the following IBM Options:

This tip is not system specific.

This tip is not software specific.

The Mellanox device driver for the Mellanox ConnectX-3 EN Dual-port SFP+ 10GbE Adapter is affected.

The Mellanox firmware for the Mellanox ConnectX-3 EN Dual-port SFP+ 10GbE Adapter is affected.

The system has the symptom described above.

Solution

For BigInsights or other solutions that place significant network stress on the Mellanox ConnectX-3 EN Dual-port SFP+ 10GbE Adapter, use driver version 2.1-1.0.6 and firmware version 2.30.8000 to avoid the kernel memory leak.

  1. Download the Mellanox OFED from the driver link listed below and install it using the following command:  ./mlnxofedinstall --all --without-fw-update --force (The --without-fw-update option prevents the Mellanox firmware included in this package from being applied, because a different firmware version is required. The firmware is updated later in the process.)

  2. Install the FW per the instructions below based on adapter part number/PSID.
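Once both steps are complete, the installed versions can be compared against the ones named in this tip. The following is a minimal sketch, not part of the original tip. It assumes that ethtool -i reports the driver on a "version:" line and the firmware on a "firmware-version:" line. The function name check_versions and the interface name eth2 are placeholders chosen for illustration.

```shell
#!/bin/sh
# Expected versions, taken from the Solution text of this tip.
EXPECTED_DRIVER="2.1-1.0.6"
EXPECTED_FW="2.30.8000"

# check_versions (hypothetical helper) reads `ethtool -i` style output on
# stdin, prints the versions it found, and returns 0 only if both the
# driver and the firmware version match the expected values.
check_versions() {
    awk -v drv="$EXPECTED_DRIVER" -v fw="$EXPECTED_FW" '
        $1 == "version:"          { d = $2 }   # driver version (first word only)
        $1 == "firmware-version:" { f = $2 }   # adapter firmware version
        END {
            printf "driver=%s firmware=%s\n", d, f
            exit !(d == drv && f == fw)
        }'
}

# Usage (eth2 is a placeholder; substitute the ConnectX-3 interface name):
#   ethtool -i eth2 | check_versions && echo "versions match this tip"
```

Only the first word after each label is compared, so a driver version string followed by a build date would still match. If the reported format differs on a given system, the field names in the awk script need to be adjusted.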

Driver:

Follow the instructions at the following link to install the firmware for your adapter based on the Part Number.

Intelligent Cluster Firmware (PN 00W0054, PSID IBM1080110023):


System x Firmware (PN 00D9691, PSID IBM1080111023):

 

Additional Information

The driver currently recommended for BigInsights solutions has a kernel memory leak. The latest posted driver also has an issue that makes it unsuitable for solutions in high network stress environments. The driver version recommended in this tip is not available on Fix Central for Intelligent Cluster, but it is the code fix combination recommended by the solution performance team and Mellanox.

 

Document Location

Worldwide

Operating System

System x Hardware Options:Operating system independent / None

System x Integrated Solutions:Operating system independent / None


Document Information

Modified date:
30 January 2019

UID

ibm1MIGR-5096775