Troubleshooting
Problem
Memory Leak in Mellanox ConnectX-3 Linux Driver Under Heavy Load
Resolving The Problem
Source
RETAIN Tip: H213680
Symptom
The currently available Mellanox ConnectX-3 Linux driver has a memory leak that results in hangs and panics when the system is under heavy network load.
Affected Configurations
The system is configured with one or more of the following IBM Options:
- Mellanox ConnectX-3 EN Dual-port SFP+ 10GbE Adapter, Option 00W0053, any model
This tip is not system specific.
This tip is not software specific.
The Mellanox device driver for the Mellanox ConnectX-3 EN Dual-port SFP+ 10GbE Adapter is affected.
The Mellanox firmware for the Mellanox ConnectX-3 EN Dual-port SFP+ 10GbE Adapter is affected.
The system has the symptom described above.
Solution
For BigInsights or other solutions that place significant network stress on the driver for the Mellanox ConnectX-3 EN Dual-port SFP+ 10GbE Adapter, use driver version 2.1-1.0.6 and firmware version 2.30.8000 to avoid the kernel memory leak.
- Download the Mellanox OFED from the driver link listed below and install it using the following command:
./mlnxofedinstall --all --without-fw-update --force
(The --without-fw-update option prevents the Mellanox firmware included in this package from being applied, because a different firmware version is required. The firmware is updated later in the process.)
- Install the FW per the instructions below based on adapter part number/PSID.
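To decide which of the two firmware packages in this tip applies, the adapter's PSID can be read from the card and matched against the PSIDs listed below. The following is a minimal sketch, not an IBM or Mellanox supplied tool: the function name psid_to_firmware is hypothetical, the PCI address is a placeholder, and it assumes the firmware query output contains a line of the form "PSID: <value>", as mstflint normally prints.

```shell
#!/bin/sh
# psid_to_firmware (hypothetical helper): reads firmware query output on
# stdin and prints which firmware package from this tip matches the PSID.
# Assumes the input contains a line whose first field is "PSID:".
psid_to_firmware() {
    psid=$(awk '$1 == "PSID:" { print $2 }')
    case "$psid" in
        IBM1080110023) echo "Intelligent Cluster firmware (PN 00W0054)" ;;
        IBM1080111023) echo "System X firmware (PN 00D9691)" ;;
        *) echo "PSID '$psid' is not covered by this tip" >&2; return 1 ;;
    esac
}

# Usage on a live system (04:00.0 is a placeholder PCI address):
#   mstflint -d 04:00.0 q | psid_to_firmware
```

The function only parses text, so it can be tried against saved query output before running it on a production node.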
Driver:
Follow the instructions at the following link to install the firmware for your adapter based on the Part Number.
Intelligent Cluster Firmware (PN 00W0054, PSID IBM1080110023):
System X Firmware (PN 00D9691, PSID IBM1080111023):
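Once the driver and firmware are installed, it can be useful to confirm that the running versions match the ones recommended in this tip. The sketch below is an illustration only: check_versions is a hypothetical helper, eth0 is a placeholder interface name, and it assumes that ethtool -i reports the driver version as the first token after "version:" and the firmware as the first token after "firmware-version:". The exact strings printed by a given driver build may differ, so adjust the comparison if needed.

```shell
#!/bin/sh
# Expected versions, taken from the Solution text of this tip.
WANT_DRIVER="2.1-1.0.6"
WANT_FW="2.30.8000"

# check_versions (hypothetical helper): reads `ethtool -i` style output on
# stdin, prints "ok" if both versions match, otherwise prints each mismatch
# and returns a non-zero status.
check_versions() {
    awk -v drv="$WANT_DRIVER" -v fw="$WANT_FW" '
        $1 == "version:"          { got_drv = $2 }
        $1 == "firmware-version:" { got_fw  = $2 }
        END {
            ok = 1
            if (got_drv != drv) { print "driver mismatch: " got_drv; ok = 0 }
            if (got_fw  != fw)  { print "firmware mismatch: " got_fw; ok = 0 }
            if (ok) print "ok"
            exit !ok
        }'
}

# Usage on a live system (eth0 is a placeholder interface name):
#   ethtool -i eth0 | check_versions
```

Running the check on every node of a cluster after the update gives a quick way to spot adapters that still carry the firmware bundled with the OFED package.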
Additional Information
The driver currently recommended for BigInsights solutions has a kernel memory leak. The latest posted driver also has an issue that makes it unsuitable for solutions in high network stress environments. The driver version recommended in this tip is not available on Fix Central for Intelligent Cluster, but this driver and firmware combination is the fix recommended by the solution performance team and Mellanox.
Document Location
Worldwide
Document Information
Modified date:
30 January 2019
UID
ibm1MIGR-5096775