Automated deadlock data collection

Automated deadlock data collection gathers crucial debug data when a potential deadlock is detected.

When a potential deadlock is detected, messages similar to the following are written to the mmfs.log file:

Sat Jul 18 09:52:04.626 2015: [A] Unexpected long waiter detected:
2015-07-18 09:36:58: waiting 905.938 seconds on node c33f2in01:
SharedHashTabFetchHandlerThread 8397: on MsgRecordCondvar,
reason 'RPC wait' for tmMsgTellAcquire1
Sat Jul 18 09:52:04.627 2015: [I] Initiate debug data collection from
this node.
Sat Jul 18 09:52:04.628 2015: [I] Calling User Exit Script
gpfsDebugDataCollection: event deadlockDebugData,
Async command /usr/lpp/mmfs/bin/mmcommon.
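If you want to find these long-waiter reports in mmfs.log with a script, the fields can be extracted with a regular expression. The following is a minimal sketch. The pattern is inferred only from the sample message above, and the exact log format may differ between releases. The names LongWaiter and parse_long_waiter are hypothetical and are not part of GPFS.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern inferred from the sample mmfs.log message only (an assumption);
# other releases or waiter types may format the message differently.
WAITER_RE = re.compile(
    r"waiting\s+(?P<seconds>\d+(?:\.\d+)?)\s+seconds\s+on\s+node\s+(?P<node>[^:\s]+):\s*"
    r"(?P<thread>\S+)\s+(?P<tid>\d+):\s+on\s+(?P<condvar>[^,]+),\s*"
    r"reason\s+'(?P<reason>[^']*)'"
)


@dataclass
class LongWaiter:
    seconds: float    # how long the thread has been waiting
    node: str         # node on which the waiter was detected
    thread: str       # thread name
    thread_id: int    # thread ID
    condvar: str      # condition variable the thread waits on
    reason: str       # wait reason, for example 'RPC wait'


def parse_long_waiter(text: str) -> Optional[LongWaiter]:
    """Return the first long-waiter record found in text, or None."""
    flat = " ".join(text.split())  # undo line wrapping in the log excerpt
    m = WAITER_RE.search(flat)
    if m is None:
        return None
    return LongWaiter(
        seconds=float(m["seconds"]),
        node=m["node"],
        thread=m["thread"],
        thread_id=int(m["tid"]),
        condvar=m["condvar"].strip(),
        reason=m["reason"],
    )


sample = """Sat Jul 18 09:52:04.626 2015: [A] Unexpected long waiter detected:
2015-07-18 09:36:58: waiting 905.938 seconds on node c33f2in01:
SharedHashTabFetchHandlerThread 8397: on MsgRecordCondvar,
reason 'RPC wait' for tmMsgTellAcquire1"""

w = parse_long_waiter(sample)
print(w.seconds, w.node, w.reason)  # 905.938 c33f2in01 RPC wait
```

The lines are joined before matching because the message in the sample spans several physical lines.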

What debug data is collected depends on the value of the debugDataControl configuration parameter. With the default value, light, a minimal amount of debug data is collected: the data that is most frequently needed to debug a GPFS issue. The value medium collects more debug data. The value heavy is intended for routine use by internal test teams only. The value verbose is needed only for troubleshooting special cases and can produce very large dumps. When the value is none, no debug data is collected. The debugDataControl parameter can be set to different values on different nodes in the cluster. For more information, see the description of the debugDataControl parameter in the mmchconfig command topic.

Automated deadlock data collection is enabled by default and is controlled by the deadlockDataCollectionDailyLimit configuration parameter. This parameter specifies the maximum number of times that automated deadlock data collection can collect debug data in a 24-hour period.

To view the current value of deadlockDataCollectionDailyLimit, enter the following command:

mmlsconfig deadlockDataCollectionDailyLimit

The system displays output similar to the following:

deadlockDataCollectionDailyLimit 3

To disable automated deadlock data collection, set deadlockDataCollectionDailyLimit to 0 (for example, with the mmchconfig command).

A second configuration parameter, deadlockDataCollectionMinInterval, controls the minimum time between consecutive debug data collections. The default is 3600 seconds (1 hour).
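The combined effect of the two limits can be illustrated with a small model. The following is a hypothetical sketch, not the GPFS implementation. It assumes that the 24-hour period is a sliding window and that a collection runs only when both the daily limit and the minimum interval allow it; the documentation does not confirm either detail. The class DeadlockCollectionThrottle and its method try_collect are illustrative names only.

```python
from collections import deque

DAY_SECONDS = 24 * 3600


class DeadlockCollectionThrottle:
    """Hypothetical model of deadlockDataCollectionDailyLimit and
    deadlockDataCollectionMinInterval.

    Assumptions (not from the documentation): the 24-hour period is a
    sliding window, and both limits must be satisfied for a collection.
    """

    def __init__(self, daily_limit: int = 3, min_interval: float = 3600.0):
        self.daily_limit = daily_limit      # models deadlockDataCollectionDailyLimit
        self.min_interval = min_interval    # models deadlockDataCollectionMinInterval (seconds)
        self._history = deque()             # timestamps (seconds) of past collections

    def try_collect(self, now: float) -> bool:
        """Record and allow a collection at time `now` if both limits permit it."""
        if self.daily_limit <= 0:
            return False  # a daily limit of 0 disables automated collection
        # Forget collections that fell out of the trailing 24-hour window.
        while self._history and now - self._history[0] >= DAY_SECONDS:
            self._history.popleft()
        if len(self._history) >= self.daily_limit:
            return False  # daily limit already reached
        if self._history and now - self._history[-1] < self.min_interval:
            return False  # too soon after the previous collection
        self._history.append(now)
        return True


# Deadlocks detected at 0 s, 10 min, 1 h, 2 h, and 3 h with the default values.
t = DeadlockCollectionThrottle(daily_limit=3, min_interval=3600)
results = [t.try_collect(s) for s in (0, 600, 3600, 7200, 10800)]
print(results)  # [True, False, True, True, False]
```

In this model, the detection at 10 minutes is skipped because of the minimum interval, and the one at 3 hours is skipped because three collections have already run in the current 24-hour window.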