db2_hang_detect - Detect, report on, and resolve Db2 database hangs script
Detects possible application hangs on a Db2 database and acts to inform and resolve the situation. db2_hang_detect is a ksh script that contains a package of hang detection tests. The db2_hang_detect script is available only on Linux® and UNIX operating systems.
The db2_hang_detect script is designed to run periodically, once per node and per instance. Ideally, you run the script as a complement to the high availability (HA) capabilities that are provided by an HA cluster manager.
The db2_hang_detect script is in the sqllib/samples/pd/ directory.
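
Because the script runs once per node, a root-run wrapper on a partitioned instance has to know which node numbers belong to the local host. The following is a minimal sketch. It assumes the standard `sqllib/db2nodes.cfg` layout, with the node number in the first column and the host name in the second. It also assumes that the name you pass in matches the file exactly. The function name `local_nodes` and the instance home `/home/db2inst1` are hypothetical.

```shell
# Minimal sketch: print the node (partition) numbers that db2nodes.cfg
# assigns to one host, so db2_hang_detect can be run once per local node.
# Assumptions (not part of the db2_hang_detect documentation):
#   - the file is <instance home>/sqllib/db2nodes.cfg
#   - column 1 is the node number, column 2 is the host name
#   - the host name in the file matches $2 exactly (short vs. fully
#     qualified names are not reconciled here)
local_nodes() {
    cfg=$1    # path to db2nodes.cfg
    host=$2   # host name to match in column 2
    awk -v h="$host" '$2 == h { print $1 }' "$cfg"
}

# Hypothetical usage, run as root on the database server:
#   INSTHOME=/home/db2inst1
#   for nn in $(local_nodes "$INSTHOME/sqllib/db2nodes.cfg" "$(hostname)"); do
#       "$INSTHOME/sqllib/samples/pd/db2_hang_detect" db2inst1 "$nn" 4 inform \
#           db2sampl user@domain.com nulldb noverbose
#   done
```

The loop calls the script for one node at a time. This also respects the usage note that multiple copies of the script cannot run concurrently. For a single-partition instance, node 0 is the only node.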
The script contains four hang detection tests. Depending on the DETECTLEVEL value that you choose, they run singly or in groups:
- instance_level_detect
- Detects an instance that is hanging.
- latch_level_detect
- Detects hangs that are caused by latches.
- db_cat_node_fail_monitor
- Detects hangs during node recovery.
- progress_detect
- Detects hangs in agents.
Authorization
Root user authority
Required connection
None
Syntax
db2_hang_detect [DB2INSTANCE] [NN] [DETECTLEVEL] [ACTION] [DBNAME] [NOTIFYADDRESS] [LOOPBACKDBNAME] [VERBOSE]
Parameters
- DB2INSTANCE
- Specifies the instance to monitor.
- NN partition or node number
- Specifies the number of the node or partition to monitor.
- DETECTLEVEL test level
- Specifies the hang detection level. Each level runs the tests of the level below it plus one additional test. The following values are possible:
- 0
- Runs none of the tests.
- 1
- Runs the instance_level_detect test.
- 2
- Runs the instance_level_detect and latch_level_detect tests.
- 3
- Runs the instance_level_detect, latch_level_detect, and db_cat_node_fail_monitor tests.
- 4
- Runs all four tests: instance_level_detect, latch_level_detect, db_cat_node_fail_monitor, and progress_detect.
- ACTION
- Specifies the action that is taken if a hang is detected. The following values are possible:
- INFORM
- Sends a message that specifies which application is hanging.
- FORCE
- Stops the application that is hanging by forcing it off the system.
- TERMINATE
- Stops the application that is hanging by terminating it.
- DBNAME
- Specifies the name of the database to monitor.
- NOTIFYADDRESS
- Specifies the email address to which a message is sent if a hang is detected.
- LOOPBACKDBNAME
- Specifies the name of the database to catalog through a remote connection. If you specify nulldb, no database is cataloged.
- VERBOSE
- Specifies whether status messages are printed to standard output. The following values are possible:
- VERBOSE
- Prints the status messages.
- NOVERBOSE
- Does not print status messages.
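
Because all parameters are positional, a value in the wrong position is easy to miss. The following minimal sketch checks the documented values before the script is called and prints which tests a DETECTLEVEL value runs. The function names are hypothetical. Accepting lowercase values is an assumption, based only on the lowercase arguments in the example in this topic.

```shell
# Minimal sketch: pre-flight checks for the positional parameters of
# db2_hang_detect. All function names here are hypothetical.

# Print the tests that a DETECTLEVEL value runs; each level adds one test
# to the level below it. Returns 1 for a value outside 0-4.
tests_for_level() {
    case $1 in
        0) echo "" ;;
        1) echo "instance_level_detect" ;;
        2) echo "instance_level_detect latch_level_detect" ;;
        3) echo "instance_level_detect latch_level_detect db_cat_node_fail_monitor" ;;
        4) echo "instance_level_detect latch_level_detect db_cat_node_fail_monitor progress_detect" ;;
        *) return 1 ;;
    esac
}

# Upper-case a value so that "inform" and "INFORM" compare equal
# (assumption: the script accepts either case, as the example suggests).
to_upper() {
    printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}

# NN must be a non-negative integer.
valid_node() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# ACTION must be INFORM, FORCE, or TERMINATE.
valid_action() {
    case $(to_upper "$1") in
        INFORM|FORCE|TERMINATE) return 0 ;;
        *) return 1 ;;
    esac
}

# VERBOSE must be VERBOSE or NOVERBOSE.
valid_verbose() {
    case $(to_upper "$1") in
        VERBOSE|NOVERBOSE) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage with the arguments from the example in this topic:
#   valid_node 0 && valid_action inform && valid_verbose noverbose &&
#       echo "tests: $(tests_for_level 4)"
```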
Example
In the following example, you search for a potential hang on instance DB2INST1. You specify node 0 and use all four tests (DETECTLEVEL 4). If a hang is detected, you are informed by a message. The database that is being monitored is DB2SAMPL. The message that an application is hanging is sent to the email address user@domain.com. No database is cataloged (nulldb) and no status messages are printed (noverbose).
db2_hang_detect db2inst1 0 4 inform db2sampl user@domain.com nulldb noverbose
Return codes
The following table lists all possible return codes.
Return code | Explanation |
---|---|
0 | Indicates that no hang is detected and that the instance is up. |
1 | Indicates that the instance is offline. |
2 | Indicates that an environmental issue, such as a system that is not set up properly, prevented the script from running successfully. |
3 and higher | Indicates a potential hang. |
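
A monitoring wrapper usually needs to branch on these codes. The following minimal sketch maps a return code to its meaning in the table. The function name `describe_rc`, the instance home `/home/db2inst1`, and the use of `logger` to record the result are assumptions for illustration.

```shell
# Minimal sketch: map a db2_hang_detect return code to the meaning given
# in the return-code table. The function name is hypothetical.
describe_rc() {
    case $1 in
        0) echo "no hang detected; instance is up" ;;
        1) echo "instance is offline" ;;
        2) echo "environmental issue prevented a successful run" ;;
        *)
            # Codes 3 and higher indicate a potential hang; anything
            # non-numeric is reported as unknown (test errors are hidden).
            if [ "$1" -ge 3 ] 2>/dev/null; then
                echo "potential hang"
            else
                echo "unknown return code: $1"
                return 1
            fi
            ;;
    esac
}

# Hypothetical usage, as root on the server that hosts the instance:
#   /home/db2inst1/sqllib/samples/pd/db2_hang_detect db2inst1 0 4 inform \
#       db2sampl user@domain.com nulldb noverbose
#   rc=$?
#   logger -t db2_hang_detect "rc=$rc: $(describe_rc "$rc")"
```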
Usage notes
You must run the script as root. You must also run the script locally on the server where the Db2 instance is located.
Multiple copies of the script cannot be run concurrently.
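
Both usage notes can be enforced by a small wrapper. The following minimal sketch checks for root with `id -u` and uses a lock directory to keep a second copy from starting. The lock path `/var/tmp/db2_hang_detect.lock`, the function names, and the wrapper exit code are hypothetical choices, not part of the script.

```shell
# Minimal sketch of a guard for the usage notes: run as root, and never
# run two copies at once. Function names and the lock path are hypothetical.

# Succeeds only when the effective user ID is 0 (root).
is_root() {
    [ "$(id -u)" -eq 0 ]
}

# mkdir creates the directory atomically and fails if it already exists,
# so only one caller can hold the lock at a time.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1"
}

# Hypothetical usage inside a wrapper script (exit code 2 is this sketch's
# own choice for "cannot run"):
#   LOCK=/var/tmp/db2_hang_detect.lock
#   is_root || { echo "must run as root" >&2; exit 2; }
#   acquire_lock "$LOCK" || { echo "another copy is running" >&2; exit 2; }
#   trap 'release_lock "$LOCK"' EXIT
#   /home/db2inst1/sqllib/samples/pd/db2_hang_detect db2inst1 0 4 inform \
#       db2sampl user@domain.com nulldb noverbose
```

The lock only guards runs that go through this wrapper. A lock left behind by a wrapper that was killed before its trap ran must be removed manually.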