APAR status
Closed as program error.
Error description
Under certain conditions, the mmfsd daemon grows in size to the point where it is no longer able to allocate memory and is forced to shut down. The source of this unchecked growth is the lists of saved unacknowledged RPC replies to other nodes in the cluster. One particular case where this has been seen is an application making fsync() calls at high frequency (more than 200,000 fsyncs in 30 seconds). Each fsync() causes GPFS to send RPCs to every node that holds a token for the file being synced, even if it is only a read-only token. Those nodes each send a reply to that RPC, but the sender of the RPC has no seqno to match to the reply, so the reply is never acknowledged.

While there are mechanisms in place to periodically clean up this list of replies, if the list grows too big too fast it is not kept in check and can continue to grow into the hundreds of millions of entries before mmfsd is finally forced to exit. The messages in /var/adm/ras/mmfs.log.latest preceding the shutdown may look like these:

   [W] ReadMap: Cannot open map file /usr/lpp/mmfs/bin/mmfsd, not enough memory
   [E] processStart: fork: err 12
   [W] ReadMap: Cannot open map file /usr/lpp/mmfs/bin/mmfsd, not enough memory
   [N] Restarting mmsdrserv
   [E] processStart: fork: err 12
   [E] Cannot allocate memory
   [X] The mmfs daemon is shutting down abnormally.
   [N] mmfsd is shutting down.
   [N] Reason for shutdown: LOGSHUTDOWN called

The root cause, the list of unacknowledged replies, can be seen with the 'mmfsadm dump tscomm' command. The entry for one or more connected nodes will have a long list of unacknowledged replies:

   <c0n3> 10.1.1.3/0 (gpfsnode3)
     sndbuf 47520 rcvbuf 4194304
     authEnabled 1 securityEnabled 0 sameSubnet 1
     in_conn 0 need_notify 0 reconnEnabled 1 reconnecting 0 reconnected 0
     reconnCheckdup 0 reconnConnecting 0 resending 0 disconnecting 0
     shutting 0 idleCount 0 reconnects 0
     rdmaConnInProgress 0 rdmaConnDone 0 rdmaVsendEnabled 0
     rdmaVsendOkay 0 rdmaCMEnabled 0 n_rw 0
     handlerCount 1 inboundCount 0 connRetryCount 0 sentBytes 0
     thread 0 sendState initial
     Messages being serviced pool 14:
       msg_id 668409134 thread 7601 age 2.120 fileMsgSyncFile ran into a deleted object
     unacknowledged replies:
       msg_id 690078333 seq 45551 resent 0 msg_type 1 'fileMsgSyncFile'
       msg_id 690078357 seq 45552 resent 0 msg_type 1 'fileMsgSyncFile'
       msg_id 690078366 seq 45553 resent 0 msg_type 1 'fileMsgSyncFile'
       msg_id 690078372 seq 45554 resent 0 msg_type 1 'fileMsgSyncFile'
       msg_id 690078386 seq 45555 resent 0 msg_type 1 'fileMsgSyncFile'
       ...potentially millions of these...

Recovery action: If this growth in the lists of unacknowledged replies, and thus in the mmfsd process, is observed, restart GPFS on either node involved: the node with the long list, or the node those replies were sent to. Either restart clears the list.

Reported in: Spectrum Scale 4.2.3.4 on RHEL 7
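To watch whether the saved-reply lists are growing, the per-reply lines in the dump can be counted over time. A minimal sketch in shell, assuming the line format shown above (the grep pattern matches the "seq ... resent ..." reply lines and may need adjusting for your release):

   # Count saved unacknowledged replies once a minute; a steadily
   # climbing count indicates this problem.
   while sleep 60; do
       printf '%s ' "$(date)"
       /usr/lpp/mmfs/bin/mmfsadm dump tscomm | grep -c 'seq [0-9]* resent'
   done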
Local fix
If possible, identify the source of the VFS calls and reduce their frequency. When the message type is 'fileMsgSyncFile', the RPCs result from fsync() calls. Use a tool such as 'lsof' to determine which processes have files open in the GPFS filesystems, then run 'strace -p <pid>' against those running processes to see whether they are making fsync() or other offending VFS calls; a sketch of this diagnosis follows. Reducing the frequency of the calls that cause these RPC messages to be sent and replied to keeps the list of unacknowledged replies from growing so large and prevents mmfsd from having to exit.
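A minimal sketch of that diagnosis, assuming a GPFS filesystem mounted at /gpfs/fs0 (a placeholder; substitute your own mount point):

   # List processes with files open under the GPFS mount point.
   lsof /gpfs/fs0

   # Trace only the sync-related system calls of one candidate process
   # (replace <pid> with a process ID reported by lsof).
   strace -f -p <pid> -e trace=fsync,fdatasync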
Problem summary
RPCs that are sent to multiple nodes do not carry an ack_seqno. The replies to such RPCs therefore accumulate in the saved-reply list until they are acknowledged by later RPCs that do carry an ack_seqno. If these replies are generated fast enough (more than 65535 within about 5 seconds, roughly 13,000 per second), the 16-bit seqno overflows and wraps around. The saved-reply acknowledgement method and the reply cleanup thread then no longer work properly, the saved-reply list grows larger and larger, and the daemon eventually runs out of memory.
Problem conclusion
Enhance the saved-reply acknowledgement method to handle seqno overflow, i.e. wraparound of the 16-bit sequence number.
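For illustration only (this is not GPFS source code), the failure mode and a wraparound-tolerant, serial-number-style comparison of the kind such a fix requires can be sketched with shell arithmetic:

   # 16-bit seqno wraparound: after 65535 the counter returns to 0.
   old=65530; new=4        # 'new' was issued after 'old' but has wrapped

   # A naive comparison concludes 'new' is older than 'old', so replies
   # saved under seqnos between the two are never acknowledged:
   echo $(( new > old ))                        # prints 0 (wrong)

   # Comparing the 16-bit masked difference instead tolerates the wrap:
   echo $(( ((new - old) & 0xFFFF) < 32768 ))   # prints 1 (new is ahead)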
Temporary fix
Comments
APAR Information
APAR number
IJ05921
Reported component name
SPECTRUM SCALE
Reported component ID
5725Q01AP
Reported release
423
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2018-04-20
Closed date
2018-05-07
Last modified date
2018-05-07
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
IJ06242
Fix information
Fixed component name
SPECTRUM SCALE
Fixed component ID
5725Q01AP
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"STXKQY","label":"IBM Spectrum Scale"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"423","Edition":"","Line of Business":{"code":"LOB26","label":"Storage"}},{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSFKCN","label":"General Parallel File System"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"423","Edition":"","Line of Business":{"code":"","label":""}}]
Document Information
Modified date:
07 May 2018