IBM Support

IBM Spectrum Scale: Node crashes after upgrading to V5.1.2+ with msgqueue enabled

Troubleshooting


Problem

A node can crash from resource exhaustion after an upgrade to V5.1.2.0 or later while msgqueue is still enabled. All upgraded nodes are affected until the file system version is moved to V5.1.2.0 or later (mmchfs -V) and the migration off msgqueue is completed (mmmsgqueue config --remove-msgqueue).

Symptom

This typically manifests as an out-of-memory node crash. To check whether excessive threads are being created, run the following command:
ps -eT | grep `pidof mmfsd` | awk '{print $5}' | sort | uniq -c
As I/O runs, the number of rdk:broker* threads increases continually.
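The check above can be wrapped in a small script to watch the trend over time. The following is a minimal sketch, not an IBM-provided tool: the function names, the sampling interval, and the sample count are illustrative assumptions, and it assumes a procps-style ps where -T combined with -o comm= prints one thread name per line.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of IBM Spectrum Scale).

# Read thread names on stdin, one per line, and print how many
# start with "rdk:broker". Prints 0 when there are none.
count_broker_threads() {
    grep -c '^rdk:broker' || true
}

# Print the thread names of the given PID, one per line.
# Assumes procps ps: -T lists threads, -o comm= prints names without a header.
list_thread_names() {
    ps -T -p "$1" -o comm=
}

# Sample the rdk:broker thread count of mmfsd.
#   $1 = seconds between samples (default 10)
#   $2 = number of samples (default 5)
watch_broker_threads() {
    pid=$(pidof mmfsd) || { echo "mmfsd is not running" >&2; return 1; }
    i=0
    while [ "$i" -lt "${2:-5}" ]; do
        count=$(list_thread_names "$pid" | count_broker_threads)
        printf '%s rdk:broker threads: %s\n' "$(date '+%H:%M:%S')" "$count"
        sleep "${1:-10}"
        i=$((i + 1))
    done
}
```

For example, watch_broker_threads 10 6 prints six samples ten seconds apart. A count that keeps rising while I/O runs matches the symptom described here.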

Cause

msgqueue has been deprecated since 5.1.0.0 but can still be used if it was previously installed. A migration path is provided by the mmmsgqueue config --remove-msgqueue command, available starting in 5.1.0.0. As of 5.1.2.0, msgqueue is disabled, and customers who want to run Clustered Watch Folder or File Audit Logging must migrate off msgqueue before any event logs are generated.
When an I/O event happens, a check determines whether msgqueue is still in use. If it is, a system health event is generated to indicate this:
auditp_msgq_unsupported
watchfolderp_msgq_unsupported
Once msgqueue is detected as still in use, the audit event is prevented from being created. By that point, however, a librdkafka thread has already been created and passed to the librdkafka library. Because a new thread is created for each I/O event, the number of threads and the memory they consume grow excessively.
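To check whether a node has already raised either event, the node's event log can be searched for the two event names. This is a minimal sketch, assuming the event names appear verbatim in the output of mmhealth node eventlog; the helper name find_msgq_events is hypothetical.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name, not part of IBM Spectrum Scale).

# Read event-log text on stdin and print only the lines that mention
# one of the two msgqueue health events named above.
# Prints nothing (and still succeeds) when neither event is present.
find_msgq_events() {
    grep -E 'auditp_msgq_unsupported|watchfolderp_msgq_unsupported' || true
}

# Assumed usage on an affected node:
#   mmhealth node eventlog | find_msgq_events
```

Any output indicates that msgqueue was still in use when an I/O event was processed on that node.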

Environment

IBM Spectrum Scale V5.1.2.0 through V5.1.2.5
IBM Spectrum Scale V5.1.3.0 through V5.1.4.0

Diagnosing The Problem

This issue affects all versions listed above until msgqueue is disabled across the cluster.

Resolving The Problem

Users running IBM Spectrum Scale V5.1.2.0 to V5.1.2.5 should upgrade to IBM Spectrum Scale V5.1.2.6 or later, available from Fix Central.
Users running IBM Spectrum Scale V5.1.3.0 to V5.1.4.0 should upgrade to IBM Spectrum Scale V5.1.4.1 or later, available from Fix Central.
If you cannot apply one of these code levels, contact IBM service to request an efix:

• For Spectrum Scale V5.1.2.0 to V5.1.2.5: APAR IJ40726
• For Spectrum Scale V5.1.3.0 to V5.1.4.0: APAR IJ40950
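The version ranges above can be encoded in a short check. This is a minimal sketch, assuming GNU sort with the -V (version sort) option; the function names are hypothetical, and the installed level must be supplied by the caller. A level in an affected range is only a concern when msgqueue is still enabled.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of IBM Spectrum Scale).

# Succeed if version $1 lies in the inclusive range [$2, $3].
# Relies on GNU sort -V to order dotted version strings.
version_in_range() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    highest=$(printf '%s\n%s\n' "$1" "$3" | sort -V | tail -n 1)
    [ "$lowest" = "$2" ] && [ "$highest" = "$3" ]
}

# Print the minimum fixed level for an affected version,
# or "not affected" for a version outside both ranges.
recommended_level() {
    if version_in_range "$1" 5.1.2.0 5.1.2.5; then
        echo 5.1.2.6
    elif version_in_range "$1" 5.1.3.0 5.1.4.0; then
        echo 5.1.4.1
    else
        echo "not affected"
    fi
}
```

For example, recommended_level 5.1.2.3 prints 5.1.2.6, and recommended_level 5.1.4.1 prints "not affected".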

Document Location

Worldwide

Document Information

Modified date:
11 October 2022

UID

ibm16824149