db2_hang_analyze - Detect and analyze possible application hangs script

Detects and analyzes possible Db2® database application hangs by using various metrics that are gathered from the db2pd command. The db2_hang_analyze script is available only on Linux® and UNIX operating systems.

The db2_hang_analyze script is a Perl script that runs continuously until it detects a possible hang or until you terminate it. In each iteration, it gathers metrics for each application and checks whether the application was active during a specified time interval (by default, 300 seconds). An application that is not active during that interval is flagged as possibly hanging. If a potential hang is detected, the script writes a list of the affected applications to a report file. You can terminate the script by pressing Ctrl-C or Ctrl-Z.
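
The detection rule can be modeled in a few lines of Python. This is a hypothetical sketch, not the Perl script's implementation. The IdleDetector class, the metric names cpu_pct and rows_read, and the choice to compare each sample against the previous iteration are all assumptions. An application counts as active when any counter changes, or when its processor usage changes by more than a threshold. That threshold mirrors the -cputhreshold option described under Parameters.

```python
"""Hypothetical sketch of idle-based hang detection; not db2_hang_analyze's actual code."""
from dataclasses import dataclass, field
from typing import Dict, Set

# Assumed metric layout, e.g. {"cpu_pct": 1.2, "rows_read": 10}; names are illustrative.
Metrics = Dict[str, float]


@dataclass
class IdleDetector:
    timer_limit: float = 300.0  # like -timerlimit: idle seconds before flagging
    cpu_threshold: float = 0.1  # like -cputhreshold: CPU change in percentage points
    _prev: Dict[int, Metrics] = field(default_factory=dict, init=False)
    _last_active: Dict[int, float] = field(default_factory=dict, init=False)

    def _is_active(self, prev: Metrics, curr: Metrics) -> bool:
        # Assumption: CPU usage is compared with the previous iteration only.
        if abs(curr.get("cpu_pct", 0.0) - prev.get("cpu_pct", 0.0)) > self.cpu_threshold:
            return True
        # Any other counter that changed means the application made progress.
        return any(curr[k] != prev.get(k) for k in curr if k != "cpu_pct")

    def observe(self, now: float, snapshot: Dict[int, Metrics]) -> Set[int]:
        """Record one iteration keyed by application handle; return handles idle >= timer_limit."""
        flagged: Set[int] = set()
        for apphdl, curr in snapshot.items():
            prev = self._prev.get(apphdl)
            if prev is None or self._is_active(prev, curr):
                self._last_active[apphdl] = now  # new or active: restart its idle timer
            elif now - self._last_active[apphdl] >= self.timer_limit:
                flagged.add(apphdl)
            self._prev[apphdl] = curr
        # Drop state for applications that disappeared (for example, disconnected).
        for gone in set(self._prev) - set(snapshot):
            del self._prev[gone]
            del self._last_active[gone]
        return flagged
```

Suppose timer_limit=300 and a sample is taken every 60 seconds. An application whose metrics never change after the baseline sample is then flagged on the fifth sample (t = 300), which matches the iteration at which Example 1 reports a hang. Each sample is compared only with the previous one. As a result, a slow CPU drift that stays under the threshold in every iteration is treated as idle in this sketch.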

The db2_hang_analyze script is in the sqllib/samples/pd/ directory.

Authorization

You must have one of the following authorities:

  • SYSADM
  • SYSCTRL
  • SYSMAINT
  • SYSMON

Required connection

Active database

Syntax

db2_hang_analyze -db dbname [-member member-number] [-timerlimit seconds] [-sleeptime seconds] [-retrylimit attempts] [-path directory] [-cputhreshold percentage] [-exec script_path] [-log] [-sql] [-list] [-h]

Parameters

-db dbname
Specifies the database for which to detect hangs. This parameter is required.
-member member-number
Specifies the member on which the script is issued. If this option is not specified, the script is issued on the current member. If the instance is in serial mode, this parameter is ignored.
-timerlimit seconds
Specifies the amount of time, in seconds, that an application is idle (no change in metrics) before it is determined to be hanging. The default value is 300 seconds.
-sleeptime seconds
Specifies the amount of time to wait, in seconds, before the script starts the next iteration of hang analysis. The default value is 60 seconds.
-retrylimit attempts
Specifies the number of times a db2pd command is run again after it fails or times out. The default value is three attempts.
-path directory
Specifies the full path of the directory where the report file and log file are stored. The default is the directory that is specified by the DIAGPATH database manager configuration parameter.
-cputhreshold percentage
Specifies the threshold for a change in the percentage of processor time that an application uses. If the change in an application's processor usage exceeds this threshold, the application is considered active. The default value is 0.1 (0.1%).
-exec script_path
Specifies the path to a script that runs after a hang is detected. Typically, this is a custom script that collects further diagnostic data about the hang. The output of the script is written to the DIAGPATH directory in a file named db2_hang_analyze.<timestamp>.exec. By default, this option is off, and no script runs after a hang is detected.
-log
Writes notifications about possible hanging applications, warnings, and errors to the log file. The default value is no.
-sql
Prints the most recent SQL statement that was issued by each hanging application, if that data is available. The default value is no.
-list
Lists the db2_hang_analyze scripts that are running on your system. From this list, you can specify which scripts to terminate.
-h
Displays help information.
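
The -retrylimit behavior can be pictured as a retry loop around each db2pd call. The following Python sketch is hypothetical. The function name run_with_retries, the timeout value, and the reading of -retrylimit as the number of additional runs after the first failed run are assumptions, not documented details of the script.

```python
"""Hypothetical sketch of -retrylimit style retries around a db2pd call."""
import subprocess
from typing import List, Optional


def run_with_retries(cmd: List[str], retry_limit: int = 3,
                     timeout: float = 60.0) -> Optional[str]:
    """Run cmd; if it fails or times out, rerun it up to retry_limit more times.

    Returns the command's stdout on success, or None if every attempt failed.
    A missing executable (FileNotFoundError) is not retried and propagates.
    """
    for _ in range(1 + retry_limit):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout, check=True)
            return result.stdout
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            continue  # failed or timed out: count it and try again
    return None


# Usage (requires a Db2 instance environment):
# output = run_with_retries(["db2pd", "-db", "sample", "-applications"])
```

Treating a missing executable as a hard error, rather than as a retryable failure, keeps a misconfigured environment from silently consuming every attempt.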

Example 1

In the following example, the script monitors the SAMPLE database to detect any possible application hangs. Metrics from the db2pd command are collected for every application every 60 seconds. If an application is determined to be hanging, the script writes a report and then exits.

$HOME/sqllib/samples/pd/db2_hang_analyze -db sample -log
Invoked: /home/hotel32/shenli/sqllib/samples/pd/db2_hang_analyze -db sample -log
APPLICATION HANG DETECTION: Started on Fri Jan 25 14:41:38 EST 2013
Sleeptime                :  60 seconds
Timer Limit              :  300 seconds
Node Member              :  default
Retry Limit              :  3
Log                      :  yes
SQL                      :  no
Event Metrics Available  :  yes
Logfile                  :  db2_hang_analyze.20130125.14.41.38.10297.log
Script PID               :  10297
CPU Threshold            :  0.1%
Post Detection Script    :  none
Path                     :  /home/hotel32/shenli/sqllib/db2dump

Press CTRL-C or CTRL-Z to terminate script
Pre-loop setup...
Iteration 1: No hang found.
Iteration 2: No hang found.
Iteration 3: No hang found.
Iteration 4: No hang found.
Iteration 5: POSSIBLE HANG DETECTED!
Logfile                  :  /home/hotel32/shenli/sqllib/db2dump/db2_hang_analyze.20130125.14.41.38.10297.log
VIEW REPORT AT           :  /home/hotel32/shenli/sqllib/db2dump/db2_hang_analyze.20130125.14.41.38.10297.report
APPLICATION HANG DETECTION: Ended on Fri Jan 25 14:46:47 EST 2013

If one or more hangs are detected, the script generates a report file that lists the possibly hanging applications:

cat /home/hotel32/shenli/sqllib/db2dump/db2_hang_analyze.20130125.14.41.38.10297.report
APPLICATION HANG DETECTION: Started on Fri Jan 25 14:41:38 EST 2013
Sleeptime                :  60 seconds
Timer Limit              :  300 seconds
Node Member              :  default
Retry Limit              :  3
Log                      :  yes
SQL                      :  no
Event Metrics Available  :  yes
Logfile                  :  db2_hang_analyze.20130125.14.41.38.10297.log
Script PID               :  10297
CPU Threshold            :  0.1%
Post Detection Script    :  none
Path                     :  /home/hotel32/shenli/sqllib/db2dump/

POTENTIAL APPLICATIONS HANGING: 3 application(s).
Apphdl              :  7
Status              :  CommitActive
AgentEDUID          :  16

Apphdl              :  9
Status              :  CommitActive
AgentEDUID          :  28

Apphdl              :  19
Status              :  CommitActive
AgentEDUID          :  37


APPLICATION HANG DETECTION: Ended on Fri Jan 25 14:46:47 EST 2013

Example 2

You can control how the script detects possibly hanging applications by changing several options:

$HOME/sqllib/samples/pd/db2_hang_analyze -db sample -member 0 -log -timerlimit 30 -sleeptime 15 -sql -path /TMP/
Invoked: /home/hotel32/shenli/sqllib/samples/pd/db2_hang_analyze -db sample -member 0 -log -timerlimit 30 -sleeptime 15 -sql -path /TMP/
APPLICATION HANG DETECTION: Started on Fri Jan 25 15:38:27 EST 2013
Sleeptime                :  15 seconds
Timer Limit              :  30 seconds
Node Member              :  0
Retry Limit              :  3
Log                      :  yes
SQL                      :  yes
Event Metrics Available  :  yes
Logfile                  :  db2_hang_analyze.20130125.15.38.27.10189.log
Script PID               :  10189
CPU Threshold            :  0.1%
Post Detection Script    :  none
Path                     :  /TMP

Press CTRL-C or CTRL-Z to terminate script
Pre-loop setup...
Iteration 1: No hang found.
Iteration 2: POSSIBLE HANG DETECTED!
Logfile                  :  /TMP/db2_hang_analyze.20130125.15.38.27.10189.log
VIEW REPORT AT           :  /TMP/db2_hang_analyze.20130125.15.38.27.10189.report
APPLICATION HANG DETECTION: Ended on Fri Jan 25 15:39:03 EST 2013

In this example, the script checks for hanging applications on member 0. The db2pd metrics are gathered every 15 seconds instead of the default 60 seconds, and an application is identified as hanging if its metrics do not change within 30 seconds instead of the default 300 seconds. Because the -sql option is specified, the most recent SQL statement of each hanging application is printed in the report file. The report and log files are written to the /TMP/ directory.
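
The iteration at which each example reports a hang is consistent with dividing the timer limit by the sleep time and rounding up: 300 / 60 = 5 in Example 1, and 30 / 15 = 2 here. The following Python helper encodes that estimate. It is an assumption, not documented behavior. It presumes that the application is idle from the pre-loop baseline onward and that iterations run exactly -sleeptime seconds apart, ignoring db2pd run time. The function name iterations_to_detect is hypothetical.

```python
import math


def iterations_to_detect(timer_limit: float, sleep_time: float) -> int:
    """Estimate the first iteration that can report an application idle since the baseline.

    Assumes the baseline is taken in pre-loop setup, iterations run exactly
    sleep_time seconds apart, and db2pd run time is negligible.
    """
    if sleep_time <= 0:
        raise ValueError("sleep_time must be positive")
    return math.ceil(timer_limit / sleep_time)


print(iterations_to_detect(300, 60))  # 5 (Example 1 defaults)
print(iterations_to_detect(30, 15))   # 2 (Example 2 options)
```

If -timerlimit is not a multiple of -sleeptime, this estimate rounds up. For example, a 100-second limit with a 60-second sleep time gives iteration 2.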

The following report file shows the results. One application is possibly hanging.

cat /TMP/db2_hang_analyze.20130125.15.38.27.10189.report
APPLICATION HANG DETECTION: Started on Fri Jan 25 15:38:27 EST 2013
Sleeptime                :  15 seconds
Timer Limit              :  30 seconds
Node Member              :  0
Retry Limit              :  3
Log                      :  yes
SQL                      :  yes
Event Metrics Available  :  yes
Logfile                  :  db2_hang_analyze.20130125.15.38.27.10189.log
Script PID               :  10189
CPU Threshold            :  0.1%
Post Detection Script    :  none
Path                     :  /TMP

POTENTIAL APPLICATIONS HANGING: 1 application(s).
Apphdl              :  51
Status              :  UOW-Executing
AgentEDUID          :  44
Current query       :  select * from staff
Last query:         :  none


APPLICATION HANG DETECTION: Ended on Fri Jan 25 15:39:03 EST 2013

Usage notes

The script does not consider applications that are in a lock wait state to be hanging.

The script requires Perl v5.6.0 or later.