Delays and deadlocks
The first item to check when a file system appears hung is the condition of the networks, including the network used to access the disks.
Look for increasing numbers of dropped packets on all nodes by issuing:
- The netstat -D command on an AIX® node.
- The ifconfig interfacename command, where interfacename is the name of the interface being used by GPFS for communication.
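For example, assuming eth0 is the interface GPFS uses on a Linux node (substitute the actual interface name), the counters can be checked with:
netstat -D        # AIX: watch for growing Idrops/Odrops counts per device
ifconfig eth0     # Linux: check the RX/TX "dropped" counters on the GPFS interface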
If file system processes appear to stop making progress, there may be a system resource problem or an internal deadlock within GPFS.
Note: A deadlock can occur if user exit scripts that are called by the mmaddcallback facility are placed in a GPFS file system. Place the scripts in a local file system instead, so that they remain accessible even when the network fails.
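A minimal sketch of registering such a callback with the script kept on local disk follows; the callback identifier, script path, and event name are illustrative assumptions, not values from this document:
# the script path points at a local file system, not at a GPFS file system
mmaddcallback localDeadlockHandler --command /var/mmfs/etc/collectDeadlockData.sh --event deadlockDetected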
To debug a deadlock, do the following:
- Check how full your file system is by issuing the mmdf command. If the mmdf command does not respond, contact the IBM® Support Center. Otherwise, the system displays information similar to:
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 1.1 TB)
dm2                 140095488        1 yes      yes        136434304 ( 97%)        278232 ( 0%)
dm4                 140095488        1 yes      yes        136318016 ( 97%)        287442 ( 0%)
dm5                 140095488     4000 yes      yes        133382400 ( 95%)        386018 ( 0%)
dm0nsd              140095488     4005 yes      yes        134701696 ( 96%)        456188 ( 0%)
dm1nsd              140095488     4006 yes      yes        133650560 ( 95%)        492698 ( 0%)
dm15                140095488     4006 yes      yes        140093376 (100%)            62 ( 0%)
                -------------                         -------------------- -------------------
(pool total)        840572928                             814580352 ( 97%)       1900640 ( 0%)
                =============                         ==================== ===================
(total)             840572928                             814580352 ( 97%)       1900640 ( 0%)

Inode Information
-----------------
Number of used inodes:            4244
Number of free inodes:          157036
Number of allocated inodes:     161280
Maximum number of inodes:       512000
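A report such as the one above can be produced with an invocation like the following, where fs1 is an assumed file system device name:
/usr/lpp/mmfs/bin/mmdf fs1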
GPFS operations that involve allocation of data and metadata blocks (that is, file creation and writes) slow down significantly if the number of free blocks drops below 5% of the total. Free up some space by deleting files or snapshots (keep in mind that deleting a file does not necessarily free any disk space when snapshots are present). Another possible cause of a performance loss is a lack of free inodes. Issue the mmchfs command to increase the number of inodes for the file system so that at least 5% are free; a hedged example follows the messages below. If the file system is approaching these limits, you may notice the following error messages:
- 6027-533 [W]
  Inode space inodeSpace in file system fileSystem is approaching the limit for the maximum number of inodes.
- operating system error log entry
  Jul 19 12:51:49 node1 mmfs: Error=MMFS_SYSTEM_WARNING, ID=0x4DC797C6, Tag=3690419: File system warning. Volume fs1. Reason: File system fs1 is approaching the limit for the maximum number of inodes/files.
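A hedged example of raising the inode limit with mmchfs; the device name fs1 and the new maximum of 640000 inodes are illustrative values to be replaced with ones appropriate for your file system:
mmchfs fs1 --inode-limit 640000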
- If automated deadlock detection and deadlock data collection are enabled, look in the latest GPFS log file to determine whether the system detected the deadlock and collected the appropriate debug data. Look in /var/adm/ras/mmfs.log.latest for messages similar to the following:
Thu Feb 13 14:58:09.524 2014: [A] Deadlock detected: 2014-02-13 14:52:59: waiting 309.888 seconds on node p7fbn12: SyncHandlerThread 65327: on LkObjCondvar, reason 'waiting for RO lock'
Thu Feb 13 14:58:09.525 2014: [I] Forwarding debug data collection request to cluster manager p7fbn11 of cluster cluster1.gpfs.net
Thu Feb 13 14:58:09.524 2014: [I] Calling User Exit Script gpfsDebugDataCollection: event deadlockDebugData, Async command /usr/lpp/mmfs/bin/mmcommon.
Thu Feb 13 14:58:10.625 2014: [N] sdrServ: Received deadlock notification from 192.168.117.21
Thu Feb 13 14:58:10.626 2014: [N] GPFS will attempt to collect debug data on this node.
mmtrace: move /tmp/mmfs/lxtrace.trc.p7fbn12.recycle.cpu0 /tmp/mmfs/trcfile.140213.14.58.10.deadlock.p7fbn12.recycle.cpu0
mmtrace: formatting /tmp/mmfs/trcfile.140213.14.58.10.deadlock.p7fbn12.recycle to /tmp/mmfs/trcrpt.140213.14.58.10.deadlock.p7fbn12.gz
This example shows that deadlock debug data was automatically collected in /tmp/mmfs. If deadlock debug data was not automatically collected, it would need to be collected manually.
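As a quick check, the current log can be scanned for the deadlock-related entries shown above (the path is the default log location named earlier):
grep -E "Deadlock detected|deadlockDebugData" /var/adm/ras/mmfs.log.latest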
To determine which nodes have the longest waiting threads, issue this command on each node:
/usr/lpp/mmfs/bin/mmdiag --waiters waitTimeInSeconds
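For example, assuming the mmdsh helper shipped with GPFS is available to fan a command out to every node, waiters longer than 60 seconds (an illustrative threshold, following the form shown above) could be gathered cluster-wide with:
/usr/lpp/mmfs/bin/mmdsh -N all "/usr/lpp/mmfs/bin/mmdiag --waiters 60"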
For all nodes that have threads waiting longer than waitTimeInSeconds seconds, issue the following command (a sketch of capturing its output appears after the notes below):
mmfsadm dump all
Notes:
- Each node can potentially dump more than 200 MB of data.
- Run the mmfsadm dump all command only on nodes where you are sure the threads are really hung. An mmfsadm dump all command can follow pointers that are changing and cause the node to crash.
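When a dump is warranted, a minimal sketch of saving it for later review by the IBM Support Center, with an illustrative output file name under /tmp/mmfs, is:
/usr/lpp/mmfs/bin/mmfsadm dump all > /tmp/mmfs/mmfsadm.dump.all.$(hostname)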
- If the deadlock situation cannot be corrected, follow the instructions in Additional information to collect for delays and deadlocks, then contact the IBM Support Center.