db2_hang_analyze - Detect and analyze possible application hangs script
Detects and analyzes possible Db2® database application hangs by using various metrics that are gathered from the db2pd command. The db2_hang_analyze script is available only on Linux® and UNIX operating systems.
The db2_hang_analyze script is a Perl script that runs indefinitely. In each iteration, it gathers metrics on each application and checks whether the application was active over a certain time interval (300 seconds by default). If an application is not active over the time interval, it is flagged as hanging. If a potential hang is detected, a list of the flagged applications is written to a report file. You can terminate the script by pressing Ctrl-C or Ctrl-Z.
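The following shell function is a minimal sketch of the idle-timer logic described above. It is not the implementation of db2_hang_analyze, which is a Perl script. The sketch assumes that a baseline snapshot is taken during pre-loop setup, that one snapshot is taken every -sleeptime seconds, and that an application is flagged once its accumulated time without a metric change reaches -timerlimit. The function name detect_hang and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of idle-timer hang detection (not the real script).
# Usage: detect_hang TIMERLIMIT SLEEPTIME FLAG...
#   Each FLAG is 1 if the application's metrics changed since the previous
#   snapshot, or 0 if they did not. Prints the iteration at which the
#   application would first be flagged as hanging, or "none".
detect_hang() {
    timerlimit=$1
    sleeptime=$2
    shift 2
    idle=0
    iter=0
    for changed in "$@"; do
        iter=$((iter + 1))
        if [ "$changed" -eq 1 ]; then
            idle=0                      # activity seen: reset the idle timer
        else
            idle=$((idle + sleeptime))  # one more interval without change
        fi
        if [ "$idle" -ge "$timerlimit" ]; then
            echo "$iter"
            return 0
        fi
    done
    echo none
}

# Defaults (300 s limit, 60 s sleep), application never changes:
detect_hang 300 60 0 0 0 0 0    # prints 5
# Options from Example 2 (30 s limit, 15 s sleep):
detect_hang 30 15 0 0           # prints 2
# Activity in iteration 3 resets the timer, so no hang within 5 iterations:
detect_hang 300 60 0 0 1 0 0    # prints none
```

Under these assumptions, an application that is idle from the start is flagged at iteration ceil(timerlimit / sleeptime), which agrees with the iteration numbers in Example 1 (iteration 5) and Example 2 (iteration 2).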
The db2_hang_analyze script is in the sqllib/samples/pd/ directory.
Authorization
You require one of the following authorities:
- SYSADM
- SYSCTRL
- SYSMAINT
- SYSMON
Required connection
Active database
Syntax
Parameters
-db dbname
- Specifies the database for which to detect hangs. This parameter is required.
-member member_number
- Specifies the member on which the script is issued. If this option is not specified, the script is issued on the current member. If the instance is in serial mode, this parameter is ignored.
-timerlimit seconds
- Specifies the amount of time, in seconds, that an application can be idle (no change in metrics) before it is considered to be hanging. The default value is 300 seconds.
-sleeptime seconds
- Specifies the amount of time, in seconds, to wait before the script starts the next iteration of hang analysis. The default value is 60 seconds.
-retrylimit attempts
- Specifies the number of times that a db2pd command is rerun after it fails or times out. The default value is three attempts.
-path directory
- Specifies the full path of the directory where the report file and log file are stored. The default value is the directory that is specified by the DIAGPATH database manager configuration parameter.
-cputhreshold percentage
- Specifies the threshold for a change in the percentage of processor time that an application uses. If the change in an application's processor usage exceeds the threshold, the application is considered active. The default value is 0.1 (0.1%).
-exec script_path
- Specifies the path to a script that runs after a hang is detected. Typically, this is a custom script that collects additional diagnostic data about the hang. The output of the script is written to the DIAGPATH directory in a file named db2_hang_analyze.<timestamp>.exec. By default, no script is run after a hang is detected.
-log
- Specify yes to write notifications about possible hanging applications, warnings, and errors to the log file. The default value is no.
-sql
- Specify yes to write the most recent SQL statement that was issued by the hanging application to the report file, if the data exists. The default value is no.
-list
- Lists the db2_hang_analyze scripts that are running on the system. From this list, you can select which db2_hang_analyze scripts to terminate.
-h
- Displays help information.
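The -cputhreshold comparison can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes that the processor-time percentages from two consecutive snapshots are compared and that the absolute value of their difference is the "change". The helper name cpu_active is hypothetical. awk is used because POSIX shell arithmetic handles integers only.

```shell
#!/bin/sh
# Hypothetical helper (not part of db2_hang_analyze).
# Usage: cpu_active PREV_PCT CURR_PCT [THRESHOLD]
# Returns 0 (active) if the change in processor-time percentage between two
# snapshots exceeds THRESHOLD (default 0.1, the documented -cputhreshold
# default); otherwise returns 1.
cpu_active() {
    awk -v prev="$1" -v curr="$2" -v thr="${3:-0.1}" 'BEGIN {
        delta = curr - prev
        if (delta < 0) delta = -delta   # assumption: compare magnitude only
        if (delta > thr) exit 0
        exit 1
    }'
}

if cpu_active 12.0 12.3; then echo active; else echo idle; fi        # active
if cpu_active 12.0 12.05; then echo active; else echo idle; fi       # idle
if cpu_active 12.0 12.05 0.01; then echo active; else echo idle; fi  # active
```

Lowering the threshold, as in the last call, makes smaller changes in processor usage count as activity, so fewer applications are flagged as hanging.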
Example 1
In the following example, the script monitors the SAMPLE database to detect any possible application hangs. Various metrics from the db2pd command are collected for every application every 60 seconds. If an application is determined to be hanging, the script writes a report and then exits.
$HOME/sqllib/samples/pd/db2_hang_analyze -db sample -log
Invoked: /home/hotel32/shenli/sqllib/samples/pd/db2_hang_analyze -db sample -log
APPLICATION HANG DETECTION: Started on Fri Jan 25 14:41:38 EST 2013
Sleeptime : 60 seconds
Timer Limit : 300 seconds
Node Member : default
Retry Limit : 3
Log : yes
SQL : no
Event Metrics Available : yes
Logfile : db2_hang_analyze.20130125.14.41.38.10297.log
Script PID : 10297
CPU Threshold : 0.1%
Post Detection Script : none
Path : /home/hotel32/shenli/sqllib/db2dump
Press CTRL-C or CTRL-Z to terminate script
Pre-loop setup...
Iteration 1: No hang found.
Iteration 2: No hang found.
Iteration 3: No hang found.
Iteration 4: No hang found.
Iteration 5: POSSIBLE HANG DETECTED!
Logfile : /home/hotel32/shenli/sqllib/db2dump/db2_hang_analyze.20130125.14.41.38.10297.log
VIEW REPORT AT : /home/hotel32/shenli/sqllib/db2dump/db2_hang_analyze.20130125.14.41.38.10297.report
APPLICATION HANG DETECTION: Ended on Fri Jan 25 14:46:47 EST 2013
If one or more hangs are detected, a report file that lists the hanging applications is generated:
cat /home/hotel32/shenli/sqllib/db2dump/db2_hang_analyze.20130125.14.41.38.10297.report
APPLICATION HANG DETECTION: Started on Fri Jan 25 14:41:38 EST 2013
Sleeptime : 60 seconds
Timer Limit : 300 seconds
Node Member : default
Retry Limit : 3
Log : yes
SQL : no
Event Metrics Available : yes
Logfile : db2_hang_analyze.20130125.14.41.38.10297.log
Script PID : 10297
CPU Threshold : 0.1%
Post Detection Script : none
Path : /home/hotel32/shenli/sqllib/db2dump/
POTENTIAL APPLICATIONS HANGING: 3 application(s).
Apphdl : 7
Status : CommitActive
AgentEDUID : 16
Apphdl : 9
Status : CommitActive
AgentEDUID : 28
Apphdl : 19
Status : CommitActive
AgentEDUID : 37
APPLICATION HANG DETECTION: Ended on Fri Jan 25 14:46:47 EST 2013
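Because the report is plain text, you can extract the application handles of the flagged applications with standard tools, for example to pass them to further diagnostic commands. The following sketch assumes that each flagged application appears on a line that starts with Apphdl and ends with the handle, as in the report above. The function name list_hung_apphdls is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the application handle of each application
# listed in a db2_hang_analyze report, one per line.
# Assumes lines of the form "Apphdl : <handle>" (spacing may vary).
list_hung_apphdls() {
    awk '$1 == "Apphdl" { print $NF }' "$1"
}

# Demonstration with an excerpt of the report from Example 1:
report=$(mktemp)
cat > "$report" <<'EOF'
POTENTIAL APPLICATIONS HANGING: 3 application(s).
Apphdl : 7
Status : CommitActive
AgentEDUID : 16
Apphdl : 9
Status : CommitActive
AgentEDUID : 28
Apphdl : 19
Status : CommitActive
AgentEDUID : 37
EOF

list_hung_apphdls "$report"    # prints 7, 9, and 19 on separate lines
rm -f "$report"
```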
Example 2
You can control how the script detects possibly hanging applications by changing several of its options:
$HOME/sqllib/samples/pd/db2_hang_analyze -db sample -member 0 -log -timerlimit 30 -sleeptime 15 -sql -path /TMP/
Invoked: /home/hotel32/shenli/sqllib/samples/pd/db2_hang_analyze -db sample -member 0 -log -timerlimit 30 -sleeptime 15 -sql -path /TMP/
APPLICATION HANG DETECTION: Started on Fri Jan 25 15:38:27 EST 2013
Sleeptime : 15 seconds
Timer Limit : 30 seconds
Node Member : 0
Retry Limit : 3
Log : yes
SQL : yes
Event Metrics Available : yes
Logfile : db2_hang_analyze.20130125.15.38.27.10189.log
Script PID : 10189
CPU Threshold : 0.1%
Post Detection Script : none
Path : /TMP
Press CTRL-C or CTRL-Z to terminate script
Pre-loop setup...
Iteration 1: No hang found.
Iteration 2: POSSIBLE HANG DETECTED!
Logfile : /TMP/db2_hang_analyze.20130125.15.38.27.10189.log
VIEW REPORT AT : /TMP/db2_hang_analyze.20130125.15.38.27.10189.report
APPLICATION HANG DETECTION: Ended on Fri Jan 25 15:39:03 EST 2013
In this example, the user wants to check whether there are any applications that are hanging on member 0. A number of db2pd command metrics are gathered every 15 seconds, instead of the default 60 seconds. The application is identified as hanging if its metrics do not change within 30 seconds, instead of the default of 300 seconds. After a hang is detected, the latest SQL statement that relates to the hanging application is printed in the report file. The report and log files are written to the /TMP/ directory.
The following report file displays the results. One application is possibly hanging.
cat /TMP/db2_hang_analyze.20130125.15.38.27.10189.report
APPLICATION HANG DETECTION: Started on Fri Jan 25 15:38:27 EST 2013
Sleeptime : 15 seconds
Timer Limit : 30 seconds
Node Member : 0
Retry Limit : 3
Log : yes
SQL : yes
Event Metrics Available : yes
Logfile : db2_hang_analyze.20130125.15.38.27.10189.log
Script PID : 10189
CPU Threshold : 0.1%
Post Detection Script : none
Path : /TMP
POTENTIAL APPLICATIONS HANGING: 1 application(s).
Apphdl : 51
Status : UOW-Executing
AgentEDUID : 44
Current query : select * from staff
Last query: : none
APPLICATION HANG DETECTION: Ended on Fri Jan 25 15:39:03 EST 2013
Usage notes
The script does not consider applications that are in a lock wait state to be hanging.
The script requires Perl v5.6.0 or higher.