APAR status
Closed as program error.
Error description
When NFS is used with file audit logging (FAL) enabled, memory leaks and the node eventually runs out of memory (OOM). When NFS opens a file, 64 bytes are allocated for the client IP and inserted into a global table that records the source NFS client IP of NFS operations. Closing the NFS file does not release this memory, so it leaks. In this scenario, the OOM dump in /var/log/messages, or /proc/slabinfo, shows most memory occupied by kmalloc-64.
Reported in: Spectrum Scale 5.1.3.1
Known Impact: Memory leak leading to out of memory
Verification steps: Check /proc/slabinfo, or the OOM dump messages in /var/log/messages, to see whether most memory is occupied by kmalloc-64.
Recovery action: Disable file audit logging.
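The /proc/slabinfo check described in the verification steps can be scripted. The following is a minimal sketch, not an IBM-provided tool: the function names are hypothetical, and the per-cache size is only approximated as num_objs * objsize from the standard slabinfo columns. Reading /proc/slabinfo usually requires root.

```python
def parse_slabinfo(text):
    """Return {cache_name: approx_bytes} from /proc/slabinfo-style text.

    Approximation (assumption of this sketch): bytes = num_objs * objsize.
    """
    usage = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner, and the '# name ...' header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        # Columns: name active_objs num_objs objsize objperslab pagesperslab ...
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


def top_caches(usage, n=5):
    """Return the n largest caches as (name, approx_bytes), largest first."""
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]


def read_slabinfo(path="/proc/slabinfo"):
    """Read the live slab statistics (typically needs root privileges)."""
    with open(path) as f:
        return f.read()
```

For example, `top_caches(parse_slabinfo(read_slabinfo()))` lists the largest slab caches on a CES node. If kmalloc-64 dominates and keeps growing while an NFS workload runs, that matches the symptom described here.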
Local fix
Disable file audit logging.
Problem summary
With File Audit Logging (FAL) enabled, when kxGanesha operation 112 (GET_XSTAT) is handled, the NFS client IP is malloc'ed and inserted into a table by the current Ganesha thread for use by FAL. Freeing the IP is left to the close-file routine. However, that routine is called by a different thread, and not immediately after the kxGanesha op 112 call. As a result, the IP remains in the table and is never freed, leading to memory leaks and eventual memory exhaustion.
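The leak pattern can be illustrated with a toy model. This is only a sketch under stated assumptions and does not reflect the actual Spectrum Scale code: the class, its methods, and in particular the idea that the close routine only releases entries recorded by its own thread are hypothetical, chosen to mirror the description above. Each table entry stands in for one 64-byte allocation.

```python
class ClientIpTable:
    """Toy stand-in for the global client-IP table (hypothetical structure)."""

    def __init__(self, record_on_get_xstat=True):
        # True models the affected behavior; False models the fixed behavior,
        # where GET_XSTAT no longer records the client IP.
        self.record_on_get_xstat = record_on_get_xstat
        self.entries = []  # (thread_id, client_ip); one entry = one allocation

    def get_xstat(self, thread_id, client_ip):
        if self.record_on_get_xstat:
            self.entries.append((thread_id, client_ip))

    def close_file(self, thread_id):
        # Assumption: close only releases entries recorded by the calling thread.
        self.entries = [e for e in self.entries if e[0] != thread_id]

    def leaked_bytes(self, entry_size=64):
        return len(self.entries) * entry_size


def simulate(record_on_get_xstat, operations=1000):
    """Run GET_XSTAT on thread 1 and the close routine on thread 2."""
    table = ClientIpTable(record_on_get_xstat)
    for _ in range(operations):
        table.get_xstat(thread_id=1, client_ip="192.0.2.10")
        table.close_file(thread_id=2)  # different thread: nothing is released
    return table.leaked_bytes()
```

In this model, `simulate(True)` leaves every entry in the table, while `simulate(False)` leaves none, which corresponds to the fix of not recording the client IP during GET_XSTAT.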
Problem conclusion
This problem is fixed in 5.1.7.1.
To see all Spectrum Scale APARs and their respective fix solutions, refer to: https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: The code is fixed so that the NFS client IP is no longer malloc'ed and inserted into the table during the Ganesha operation GET_XSTAT.
Work Around: Disable File Audit Logging for the affected file system (for Scale versions 5.1.3 to 5.1.6). Restart the mmfs daemon on the CES nodes (for Scale version 5.1.7).
Problem trigger: With File Audit Logging and CES NFS enabled, running GET_XSTAT workloads (e.g., stat, nfs4_getfacl) against files or directories in an NFS Ganesha mount for some period of time, until out-of-memory issues appear.
Symptom: Error output/message
Platforms affected: All Linux OS environments
Functional Area affected: NFS and File Audit Logging
Customer impact: High Importance
Temporary fix
Comments
APAR Information
APAR number
IJ45590
Reported component name
SPEC SCALE STD
Reported component ID
5737F33AP
Reported release
512
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2023-02-24
Closed date
2023-04-13
Last modified date
2023-04-13
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE STD
Fixed component ID
5737F33AP
Applicable component levels
Document Information
Modified date:
13 April 2023