Debug data for deadlocks

Debug data for potential deadlocks is automatically collected. System administrators must monitor and manage the file systems where debug data is stored.

Automated deadlock detection and automated deadlock data collection are enabled by default. Automated deadlock breakup is disabled by default.
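To verify or change these defaults, the deadlockDetectionThreshold and deadlockBreakupDelay configuration parameters can be queried with mmlsconfig and changed with mmchconfig. The following sketch assumes that a deadlockBreakupDelay of 0 means automated breakup is disabled and that a positive value (in seconds) enables it; confirm the exact semantics for your release:

# Query the current deadlock amelioration settings (run on any node of the cluster).
mmlsconfig deadlockDetectionThreshold
mmlsconfig deadlockBreakupDelay

# To enable automated deadlock breakup after a delay, for example 300 seconds:
# mmchconfig deadlockBreakupDelay=300

# To disable automated deadlock detection entirely, set the threshold to 0:
# mmchconfig deadlockDetectionThreshold=0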

When the GPFS daemon starts, the mmfs.log file shows entries like the following:
Thu Jul 16 18:50:14.097 2015: [I] Enabled automated deadlock detection.
Thu Jul 16 18:50:14.098 2015: [I] Enabled automated deadlock debug data collection.
Thu Jul 16 18:50:14.099 2015: [I] Enabled automated expel debug data collection.
Thu Jul 16 18:50:14.100 2015: [I] Please see https://ibm.biz/Bd4bNK for more information on deadlock amelioration.

The short URL points to this help topic to make it easier to find the information later.

By default, debug data is written to the /tmp/mmfs directory on each node, or to the directory specified by the dataStructureDump configuration parameter. Sufficient disk space, typically many gigabytes, must be available; debug data is not collected when the directory runs out of disk space.
Important: Before you change the value of dataStructureDump, stop the GPFS trace; otherwise, the GPFS trace data is lost. Restart the GPFS trace after the change. For more information, see Generating GPFS trace reports.
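A minimal command sequence for this procedure might look like the following; the target directory /var/mmfs/debugdata is only an illustration, and mmchconfig can be limited to specific nodes with -N if needed:

# Stop GPFS tracing before changing the debug data directory, so that trace data is not lost.
mmtracectl --stop

# Point dataStructureDump at a file system with enough free space (example path only).
mmchconfig dataStructureDump=/var/mmfs/debugdata

# Restart GPFS tracing.
mmtracectl --start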

After a potential deadlock is detected and the relevant debug data is collected, contact IBM® Service to report the problem and to upload the debug data. Remove outdated debug data to make room for new debug data in case another potential deadlock is detected.
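One common way to package debug data for IBM Service is the gpfs.snap command. The sketch below assumes that your release supports the --deadlock option, which limits collection to deadlock-related data; check the gpfs.snap documentation for your level:

# Collect only the data needed to debug the most recent deadlock (option availability may vary by release).
gpfs.snap --deadlock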

System administrators are responsible for managing the disk space under the /tmp/mmfs directory, or under the directory specified by dataStructureDump, because only they know which sets of debug data are still useful.
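For example, disk usage in the debug data directory can be reviewed and stale files pruned with standard tools; the 30-day threshold and the default /tmp/mmfs path below are only illustrations:

# Check how much space the debug data currently uses and list the newest files.
du -sh /tmp/mmfs
ls -lt /tmp/mmfs | head

# After confirming that older debug data is no longer needed, list and then remove it.
find /tmp/mmfs -type f -mtime +30 -print
# find /tmp/mmfs -type f -mtime +30 -delete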

The "expel debug data" is similar to the "deadlock debug data", but it is collected when a node is expelled from a cluster for no apparent reasons.