Automated deadlock breakup

Automated deadlock breakup helps resolve a deadlock situation without human intervention. To break up a deadlock, less disruptive actions are tried first, such as causing a file system panic. If necessary, more disruptive actions are then taken, such as shutting down the GPFS™ mmfsd daemon.

If a system administrator prefers to control the deadlock breakup process, the deadlockDetected callback can be used to notify system administrators that a potential deadlock was detected. The information in the mmdiag --deadlock output can then help determine what steps to take to resolve the deadlock.
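A minimal sketch of the notification side follows, assuming you want a syslog entry rather than automated breakup. The script is a hypothetical callback for the deadlockDetected event. Its path, the gpfs-deadlock syslog tag, and the choice of %eventName and %myNode as parameters are assumptions made for this example. Check the mmaddcallback documentation for the variables your release supports.

```bash
#!/usr/bin/env bash
# Hypothetical callback script, for example saved as
# /usr/local/bin/notify_deadlock.sh. It assumes registration with
# --parms "%eventName %myNode", so $1 is the event name and $2 is the
# node on which the potential deadlock was detected.

# Build the notification text. Defaults keep the script usable if the
# parameters are missing.
deadlock_message() {
  local event="${1:-deadlockDetected}" node="${2:-unknown}"
  printf 'GPFS %s on node %s: review "mmdiag --deadlock" output on that node\n' \
    "$event" "$node"
}

msg="$(deadlock_message "$@")"

# Forward the message to syslog if possible. Failures are ignored so that
# logging problems never make the callback exit with an error.
logger -t gpfs-deadlock "$msg" 2>/dev/null || true
echo "$msg"
```

Registration might then look like this, using the same hypothetical names:

mmaddcallback deadlockNotify --command /usr/local/bin/notify_deadlock.sh --event deadlockDetected --parms "%eventName %myNode"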

Automated deadlock breakup is disabled by default and is controlled with the mmchconfig attribute deadlockBreakupDelay. This attribute specifies how long to wait after a deadlock is detected before attempting to break it up. The delay must be long enough for debug data collection to complete. To view the current breakup delay, enter the following command:

mmlsconfig deadlockBreakupDelay

The system displays output similar to the following:

deadlockBreakupDelay 0

The value 0 shows that automated deadlock breakup is disabled. To enable it, specify a positive value for deadlockBreakupDelay. When enabling automated deadlock breakup, a delay of 300 seconds or longer is recommended.
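The enable step can be sketched as a small wrapper that checks a proposed value before passing it to mmchconfig. The helper name check_breakup_delay is hypothetical. The 300-second floor mirrors the recommendation above. The script calls mmchconfig only when that command is installed.

```bash
#!/usr/bin/env bash
# Sketch: validate a deadlockBreakupDelay value (in seconds) before setting it.
# check_breakup_delay is a hypothetical helper name.

RECOMMENDED_MIN_DELAY=300   # recommendation from this topic

# Return 0 if the value is a whole number of seconds, 1 otherwise.
# Warn on stderr for 0 (disables breakup) or values below the recommendation.
check_breakup_delay() {
  local raw="$1"
  if [[ ! "$raw" =~ ^[0-9]+$ ]]; then
    echo "error: '$raw' is not a whole number of seconds" >&2
    return 1
  fi
  local delay=$((10#$raw))   # force base 10 so leading zeros are not octal
  if (( delay == 0 )); then
    echo "note: 0 disables automated deadlock breakup" >&2
  elif (( delay < RECOMMENDED_MIN_DELAY )); then
    echo "warning: ${delay}s is below the recommended ${RECOMMENDED_MIN_DELAY}s" >&2
  fi
  return 0
}

delay="${1:-300}"
if check_breakup_delay "$delay"; then
  if command -v mmchconfig >/dev/null 2>&1; then
    # Only runs on a node with the GPFS administration commands installed.
    mmchconfig deadlockBreakupDelay="$((10#$delay))"
  else
    echo "mmchconfig not found; would set deadlockBreakupDelay=$delay"
  fi
fi
```

After the change, mmlsconfig deadlockBreakupDelay should report the new value instead of 0.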

Automated deadlock breakup is done on a node-by-node basis. If it is enabled, the breakup process starts when a suspected deadlock waiter is detected on a node. The process first waits for deadlockBreakupDelay seconds and then goes through successive phases until the deadlock waiters disappear. There is no central coordination of deadlock breakup, so breakup actions might be taken at different times on different nodes. Breaking up a deadlock waiter on one node can cause deadlock waiters on other nodes to disappear, in which case no breakup actions are needed on those nodes.

If a suspected deadlock waiter disappears during the deadlockBreakupDelay wait, the automated deadlock breakup process stops immediately without taking any further action. To reduce the number of breakup actions taken in response to false-positive deadlocks, increase deadlockBreakupDelay. The trade-off is that a real deadlock can then persist for a longer period before it is broken up.

If your goal is to break up a deadlock as soon as possible, and your workload can afford an interruption at any time, enable automated deadlock breakup from the beginning. Otherwise, keep automated deadlock breakup disabled to avoid unexpected interruptions to your workload. In that case, you can break up the deadlock manually or use the function described in the Deadlock breakup on demand topic.

Because of the complexity of the GPFS code, asserts or segmentation faults might occur during a deadlock breakup action. These can disrupt customer workloads that are still running normally on the cluster. One reason to prefer deadlock breakup on demand is that it avoids disturbing a partially working cluster until it is safe to do so. To avoid unnecessary disruptions, do not break up a suspected deadlock prematurely. If automated deadlock breakup is always enabled, set deadlockBreakupDelay to a large value such as 3600 seconds. If you use mmcommon breakDeadlock, wait until the longest deadlock waiter has been waiting for an hour or longer. Much shorter times can be used if a customer prefers fast deadlock breakup over assurance that the deadlock is real.
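Before running mmcommon breakDeadlock by hand, you can check how long the longest waiter has been waiting. The sketch below is illustrative only. It assumes that waiter lines from mmdiag --waiters contain the text "Waiting <seconds> sec", which you should confirm against your release. The helper names are hypothetical. The script reports only whether the one-hour threshold is met and never breaks the deadlock itself.

```bash
#!/usr/bin/env bash
# Sketch: report whether the longest waiter has reached the one-hour mark
# suggested for manual breakup with "mmcommon breakDeadlock".
# ASSUMPTION: waiter lines contain "Waiting <seconds> sec", as in typical
# "mmdiag --waiters" output; confirm the format on your release.

THRESHOLD_SECONDS=3600   # one hour

# Read waiter text on stdin; print the longest wait in whole seconds (0 if none).
longest_waiter_seconds() {
  awk '
    {
      for (i = 1; i + 2 <= NF; i++)
        if (tolower($i) == "waiting" && $(i + 2) == "sec" && $(i + 1) + 0 > max)
          max = $(i + 1) + 0
    }
    END { printf "%d\n", max }
  '
}

# Print a recommendation and return 0 only if the threshold is met.
# This never breaks the deadlock itself.
should_break_deadlock() {
  local longest
  longest="$(longest_waiter_seconds)"
  if (( longest >= THRESHOLD_SECONDS )); then
    echo "longest waiter ${longest}s: consider mmcommon breakDeadlock"
    return 0
  fi
  echo "longest waiter ${longest}s: below ${THRESHOLD_SECONDS}s, keep waiting"
  return 1
}

if command -v mmdiag >/dev/null 2>&1; then
  mmdiag --waiters | should_break_deadlock
fi
```

Keeping the decision separate from the breakup command leaves the final call to the administrator, in line with the advice against premature breakup.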

The following deadlock breakup messages might appear in the mmfs.log files:

[I] Enabled automated deadlock breakup.

[N] Deadlock breakup: starting in 300 seconds

[N] Deadlock breakup: aborting RPC on 1 pending nodes.

[N] Deadlock breakup: panicking fs fs1

[N] Deadlock breakup: shutting down this node.

[N] Deadlock breakup: the process has ended.
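To see which breakup phases ran on a node, the messages above can be counted in that node's log. This sketch assumes that the current log is /var/adm/ras/mmfs.log.latest and that the message texts match exactly as listed. The function name summarize_breakup is hypothetical. Because breakup is done node by node, run the script on each node of interest.

```bash
#!/usr/bin/env bash
# Sketch: count deadlock breakup actions recorded in a GPFS log.
# ASSUMPTION: /var/adm/ras/mmfs.log.latest is the current log on this node;
# pass another mmfs.log file as the first argument if needed.

# Count RPC aborts, file system panics, node shutdowns, and completed
# breakup processes. Reads the given files, or stdin if none are given.
summarize_breakup() {
  awk '
    /Deadlock breakup: aborting RPC/              { rpc++ }
    /Deadlock breakup: panicking fs/              { panic++ }
    /Deadlock breakup: shutting down this node/   { shutdown++ }
    /Deadlock breakup: the process has ended/     { ended++ }
    END {
      printf "rpc_aborts=%d fs_panics=%d node_shutdowns=%d ended=%d\n",
             rpc, panic, shutdown, ended
    }
  ' "$@"
}

LOG="${1:-/var/adm/ras/mmfs.log.latest}"
if [[ -r "$LOG" ]]; then
  summarize_breakup "$LOG"
fi
```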