db2_hang_detect script - Detect, report on, and resolve Db2 database hangs

Detects possible application hangs on a Db2 database and takes action to report and resolve them. db2_hang_detect is a ksh script that contains a package of hang detection tests. The db2_hang_detect script is available only on Linux® and UNIX operating systems.

The db2_hang_detect script is designed to run periodically for each node and each instance. Ideally, you run the script as a complement to the high availability (HA) capabilities that are provided by an HA cluster manager.

The db2_hang_detect script is in the sqllib/samples/pd/ directory.

The script contains four hang detection tests. Depending on the DETECTLEVEL value that you specify, the tests run individually or in cumulative groups:

instance_level_detect
Detects an instance that is hanging.
latch_level_detect
Detects hangs that are caused by latches.
db_cat_node_fail_monitor
Detects hangs during node recovery.
progress_detect
Detects hangs in agents.

Authorization

Root user authority

Required connection

None

Syntax

db2_hang_detect [DB2INSTANCE] [NN] [DETECTLEVEL] [ACTION] [DBNAME] [NOTIFYADDRESS] [LOOPBACKDBNAME] [VERBOSE]

Parameters

DB2INSTANCE
Specifies the instance to monitor.
NN partition or node number
Specifies the number of the node or partition to monitor.
DETECTLEVEL test level
Specifies the level of the hang detection test. The following values are possible:
0
Runs none of the tests.
1
Runs the instance_level_detect test.
2
Runs the instance_level_detect and latch_level_detect tests.
3
Runs the instance_level_detect, latch_level_detect, and db_cat_node_fail_monitor tests.
4
Runs the instance_level_detect, latch_level_detect, db_cat_node_fail_monitor, and progress_detect tests.
ACTION
Specifies the action that is taken if a hang is detected. The following values are possible:
INFORM
Sends a message that specifies which application is hanging.
FORCE
Stops the application that is hanging by forcing it off the system.
TERMINATE
Stops the application that is hanging by terminating it.
DBNAME
Specifies the name of the database to monitor.
NOTIFYADDRESS
Specifies the email address to which a message is sent if a hang is detected.
LOOPBACKDBNAME
Specifies the name of the database to catalog through a remote connection. If you specify nulldb, no database is cataloged.
VERBOSE
Specifies whether status messages are printed to standard output. The following values are possible:
VERBOSE
Prints the status messages.
NOVERBOSE
Does not print status messages.
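Because the parameters are positional, a caller must supply them in the order that is shown in the syntax diagram. The following is a minimal sketch of a hypothetical pre-flight check, written as POSIX shell functions so that it also runs under ksh. It prints which tests a DETECTLEVEL value runs and validates the DETECTLEVEL, ACTION, and VERBOSE values against the lists above. The names tests_for_level and check_hang_detect_args are illustrative only and are not part of db2_hang_detect. The sketch assumes that all eight arguments are supplied, as in the example, and that keyword values are case-insensitive (the example uses lowercase); the script's own argument handling may differ.

```shell
# Hypothetical helpers for illustration; not part of db2_hang_detect.

# Print the hang detection tests that a DETECTLEVEL value runs.
# Each level adds one test to the tests of the level below it.
tests_for_level() {
    case "$1" in
        0) return 0 ;;   # Level 0 runs no tests, so nothing is printed.
        1) echo "instance_level_detect" ;;
        2) echo "instance_level_detect latch_level_detect" ;;
        3) echo "instance_level_detect latch_level_detect db_cat_node_fail_monitor" ;;
        4) echo "instance_level_detect latch_level_detect db_cat_node_fail_monitor progress_detect" ;;
        *) return 1 ;;   # Not a documented DETECTLEVEL value.
    esac
}

# Validate DETECTLEVEL, ACTION, and VERBOSE (positional arguments 3, 4, and 8).
# Assumption: all eight arguments are given and keywords are case-insensitive.
check_hang_detect_args() {
    if [ "$#" -ne 8 ]; then
        echo "expected 8 arguments, got $#" >&2
        return 1
    fi
    tests_for_level "$3" >/dev/null || { echo "invalid DETECTLEVEL: $3" >&2; return 1; }
    _action=$(printf '%s' "$4" | tr '[:lower:]' '[:upper:]')
    case "$_action" in
        INFORM|FORCE|TERMINATE) ;;
        *) echo "invalid ACTION: $4" >&2; return 1 ;;
    esac
    _verbose=$(printf '%s' "$8" | tr '[:lower:]' '[:upper:]')
    case "$_verbose" in
        VERBOSE|NOVERBOSE) ;;
        *) echo "invalid VERBOSE value: $8" >&2; return 1 ;;
    esac
}
```

For example, a scheduling wrapper could call check_hang_detect_args with the same arguments that it passes to db2_hang_detect and stop if the check returns a nonzero status.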

Example

In the following example, you search for a potential hang on instance DB2INST1. You specify node 0 and run all four tests (DETECTLEVEL 4). If a hang is detected, you are informed by a message (ACTION inform). The database that is being monitored is DB2SAMPL, and the message that an application is hanging is sent to the email address user@domain.com. No database is cataloged (nulldb), and no status messages are printed (noverbose).
db2_hang_detect db2inst1 0 4 inform db2sampl user@domain.com nulldb noverbose

Return codes

The following table lists all possible return codes.

Table 1. db2_hang_detect command return codes

Return code     Explanation
0               Indicates that no hang is detected and that the instance is up.
1               Indicates that the instance is offline.
2               Indicates that an environmental problem, for example a system that is not set up properly, is preventing the script from running successfully.
3 and higher    Indicates a potential hang.
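A monitoring job usually needs to branch on the exit status of db2_hang_detect. The following minimal sketch maps an exit status to the meanings in Table 1. The function name describe_hang_rc and its output labels (OK, OFFLINE, ENV, HANG) are hypothetical choices for illustration; only the numeric meanings come from the table.

```shell
# Hypothetical helper: translate a db2_hang_detect exit status into the
# meanings listed in Table 1.
describe_hang_rc() {
    case "$1" in
        ''|*[!0-9]*)
            # Empty or non-numeric input cannot be an exit status.
            echo "UNKNOWN: not a numeric return code: $1"
            return 1 ;;
        0) echo "OK: no hang detected, instance is up" ;;
        1) echo "OFFLINE: instance is offline" ;;
        2) echo "ENV: environmental problem prevented a successful run" ;;
        *) echo "HANG: potential hang (return code $1)" ;;   # 3 and higher
    esac
}

# Example use (requires root on the Db2 server; the path assumes that the
# instance home directory is /home/db2inst1):
#   /home/db2inst1/sqllib/samples/pd/db2_hang_detect db2inst1 0 4 inform db2sampl user@domain.com nulldb noverbose
#   describe_hang_rc "$?"
```

Matching return codes 3 and higher with a catch-all branch keeps the helper correct even if the script reports distinct codes for different kinds of hangs.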

Usage notes

You must run the script as root. You must also run the script locally on the server where the Db2 instance is located.

Multiple copies of the script cannot be run concurrently.
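When the script is scheduled periodically, for example from cron, a small wrapper can make the root requirement and the no-concurrency rule explicit. The following is a minimal sketch under stated assumptions: the helper names is_root and run_exclusive, the lock path /var/run/db2_hang_detect.lock, and the install path are all hypothetical, and db2_hang_detect may already enforce these rules itself. The lock relies on mkdir being atomic, so only one caller can create the lock directory at a time. As a design choice, a lock conflict returns 2 so that a caller that interprets return codes as in Table 1 treats it as an execution problem rather than as a potential hang.

```shell
# Hypothetical wrapper helpers; not part of db2_hang_detect.

# Succeed only when the effective user is root.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

# Run a command while holding an exclusive lock directory.
# mkdir either creates the directory or fails, so only one run proceeds.
run_exclusive() {
    _lockdir=$1
    shift
    if ! mkdir "$_lockdir" 2>/dev/null; then
        echo "another run holds $_lockdir; skipping" >&2
        return 2   # Reported like an environmental problem, not a hang.
    fi
    "$@"
    _rc=$?
    rmdir "$_lockdir"   # Release the lock whatever the command returned.
    return "$_rc"
}

# Example use (assumed paths; adjust to your instance home directory):
#   is_root || { echo "db2_hang_detect must run as root" >&2; exit 2; }
#   run_exclusive /var/run/db2_hang_detect.lock \
#       /home/db2inst1/sqllib/samples/pd/db2_hang_detect \
#       db2inst1 0 4 inform db2sampl user@domain.com nulldb noverbose
```

If the wrapper is killed while it holds the lock, the lock directory remains and must be removed manually before the next run can proceed.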