APAR status
Closed as program error.
Error description
After a node was updated from 5.0.5 to 5.1.2 without first disabling the message queue, mmhealth showed the event auditp_msgq_unsupported. The number of mmfsd threads keeps increasing until mmfsd crashes because it fails to allocate memory.

# ps -eT | grep `pidof mmfsd` | awk '{print $5}' | sort | uniq -c
    349 mmfsd
     81 rdk:broker-1
    243 rdk:broker10000
     81 rdk:main

A few hours later, the rdk thread counts, rdk:broker10000 in particular, have grown further:

# ps -eT | grep `pidof mmfsd` | awk '{print $5}' | sort | uniq -c
    373 mmfsd
    666 rdk:broker-1
   1998 rdk:broker10000
    666 rdk:main

Observed on nodes updated from 5.0.5 to 5.1.2.4 or 5.1.3.1.
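The per-thread-name count shown above can also be taken directly from /proc, which avoids matching the PID against unrelated columns of the ps output. The following is a minimal sketch and not part of the APAR: the function name count_threads_by_name and its optional proc_root argument are hypothetical, and it assumes a Linux node where /proc/<pid>/task/<tid>/comm holds each thread's name.

```shell
# Hypothetical helper: count the threads of one process, grouped by thread name.
# Reads /proc/<pid>/task/*/comm (one file per thread on Linux).
# The second argument only exists so the function can be pointed at a
# different root directory; it defaults to /proc.
count_threads_by_name() {
    pid="$1"
    proc_root="${2:-/proc}"
    # One thread name per line, then group, count, and list the largest first.
    cat "$proc_root/$pid/task"/*/comm 2>/dev/null | sort | uniq -c | sort -rn
}

# Example use on an affected node (assumes a single mmfsd process):
#   count_threads_by_name "$(pidof mmfsd)"
```

Running it a few times some minutes apart and comparing the rdk:* counts should show whether the librdkafka threads are still accumulating.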
Local fix
Problem summary
A problem was identified when running in a mixed-level cluster where some nodes support msgqueue and others do not. On the 5.1.2+ nodes, additional librdkafka threads are created for each I/O event, resulting in thread exhaustion on the affected node.
Problem conclusion
This problem is fixed in 5.1.2 PTF 6.
To see all Spectrum Scale APARs and their respective fix solutions, refer to page https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: The fix will prevent these librdkafka threads from being created and will notify the user that msgqueue is unsupported. The 5.1.2+ node will be unable to generate audit events until all nodes are upgraded and moved off of the deprecated msgqueue infrastructure. The nodes below the 5.1.2 level will continue to generate auditing events.
Work around: None
Problem trigger: Running a cluster where msgqueue is supported. Upgrading a node to 5.1.2+ where msgqueue is no longer supported. Running IO to the 5.1.2+ node.
Symptom: Hang/Deadlock/Unresponsiveness/Long Waiters
Platforms affected: ALL Linux OS environments supported by Clustered Watch Folder / File Audit Logging
Functional Area affected: Watch Folder / File audit logging
Customer Impact: High Importance
Temporary fix
Comments
APAR Information
APAR number
IJ40726
Reported component name
SPEC SCALE ADV
Reported component ID
5737F35AP
Reported release
512
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-06-17
Closed date
2022-06-30
Last modified date
2022-06-30
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE ADV
Fixed component ID
5737F35AP
Applicable component levels
Document Information
Modified date:
01 July 2022