IBM Support

50 DB2 Nuggets #46: Expert Advice - Collecting hang diagnostics manually when db2fodc -hang itself hangs

Technical Blog Post


Body

Hello db2 DBAs

Have you run into a db2 instance hang in which the db2fodc -hang command itself also hung?  Fear not!  I have put together a template that walks through the commands to run manually at each level (OS, db2 instance, database) to collect the hang diagnostics and hand them to the support team.  It makes diagnosing the hang much easier :)

Step 0.
Clean out the sqllib/db2dump directory: move or remove any FODC_* directories from it to keep the db2support.zip file small.
  Also remove any earlier *.bin, trap* or stack* files.
cd $DB2_HOME/db2dump (or the DIAGPATH directory), create a new directory there, and collect the diagnostics in that directory.
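
As a rough illustration of Step 0, the shell sketch below moves old dump files out of the diagnostic directory and creates a fresh directory for the new collection. The function name prepare_collect_dir, the hang_<timestamp> directory name and the separate archive directory are assumptions made for illustration, not part of any db2 tooling; adjust the paths to your own instance.

```shell
# Sketch only: tidy the diagnostic directory before a new collection (Step 0).
# prepare_collect_dir and the hang_<timestamp> name are hypothetical.
prepare_collect_dir() {
    diag_dir=$1      # sqllib/db2dump or the DIAGPATH directory
    archive_dir=$2   # a directory outside diag_dir, to keep old files out of db2support.zip
    collect_dir="$diag_dir/hang_$(date +%Y%m%d_%H%M%S)"
    mkdir -p "$archive_dir" "$collect_dir" || return 1
    # Move earlier FODC_* directories and old *.bin, trap*, stack* files away.
    for item in "$diag_dir"/FODC_* "$diag_dir"/*.bin "$diag_dir"/trap* "$diag_dir"/stack*; do
        if [ -e "$item" ]; then
            mv "$item" "$archive_dir"/ || return 1
        fi
    done
    echo "$collect_dir"   # print the new directory so the caller can use it
}

# Example use (paths are placeholders):
# COLLECT=$(prepare_collect_dir "$HOME/sqllib/db2dump" /tmp/old_db2dump)
```

Moving rather than deleting keeps the older files around in case support asks for them later.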

Step 1.
Execute the vmstat and iostat commands to check for an OS-level bottleneck, which
can indirectly cause db2 hangs.
Sample OS commands (AIX flags shown; they differ on other platforms)
$vmstat -Iwt 30 > /db/db2inst5/db2diag/dbvmstat.out
$iostat -DlTR 30 > /db/db2inst5/db2diag/iostat.out
e.g.
System configuration: lcpu=16 mem=12288MB ent=0.40

kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa    pc    ec
29  0 4514677 229661   0   0   0   0    0   0  11 4785 1749  3 32 65  0  0.22  54.5
15  0 4514782 229374   0   0   0   0    0   0  28 7774 1765  4  5 91  0  0.06  15.0
19  0 4514785 225894   0   0   0   0    0   0 196 5118 2383  4  8 87  1  0.08  20.8
20  0 4514786 225686   0   0   0   0    0   0  16 10154 1812  5  5 90  0  0.07  17.3
18  0 4514786 225686   0   0   0   0    0   0   5 4523 1735  3  3 94  0  0.05  11.4

Pay particular attention to the CPU idle percentage (the id column). If the CPU is 100% utilized, idle will be 0, which is a cause for concern. Talk to your SysAdmin to confirm that no OS bottleneck is present while the db2 hang is occurring.
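
To avoid eyeballing a long vmstat file, something like the following could print only the samples whose idle percentage is at or below a threshold. It is a sketch that assumes the header row starts with "r b" and contains an "id" column, as in the sample above; the name low_idle_samples and the default threshold of 10 are illustrative choices.

```shell
# Sketch only: list vmstat samples with low CPU idle.
# low_idle_samples is a hypothetical helper name.
low_idle_samples() {
    # $1 = saved vmstat output, $2 = idle threshold in percent (default 10)
    awk -v limit="${2:-10}" '
        # Header row: remember which column is labelled "id".
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) if ($i == "id") col = i
            next
        }
        # Data rows start with a number; report those at or below the limit.
        col && $1 ~ /^[0-9]+$/ && $col + 0 <= limit {
            print "low idle: " $col "% -> " $0
        }
    ' "$1"
}

# Example use, with the output file from the vmstat command above:
# low_idle_samples /db/db2inst5/db2diag/dbvmstat.out 10
```

Looking the column up by its header name, rather than by a fixed position, keeps the check working when other vmstat flags add or remove columns.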

Step 2.

Execute from the shell
$db2level
If data is returned, we know the server itself is NOT hung: at least some db2 binaries
can still be executed.  If the command returns its data normally, proceed to the next step.  If this
command also hangs, engage your System Admin to collect a trace at the OS level to find the root cause.
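
Since each of these checks amounts to "run it and see if it hangs", a small timeout wrapper can save you from waiting at a frozen prompt. The sketch below is one possible way to do it in plain shell; run_with_timeout is a hypothetical helper, not a db2 command, and a process stuck inside the kernel may not react to the signal at all.

```shell
# Sketch only: run a command, but give up if it has not returned after N seconds.
# run_with_timeout is a hypothetical helper, not a db2 tool.
run_with_timeout() {
    secs=$1; shift
    status=0
    "$@" &                                   # start the probe in the background
    cmd_pid=$!
    # Watchdog: after $secs seconds, terminate the probe if it is still running.
    ( sleep "$secs"; kill -TERM "$cmd_pid" ) >/dev/null 2>&1 &
    dog_pid=$!
    wait "$cmd_pid" || status=$?
    kill "$dog_pid" 2>/dev/null || true      # probe returned, stop the watchdog
    return "$status"                         # 143 (128 + SIGTERM) means the watchdog fired
}

# Example use: a status of 143 suggests the step hung.
# run_with_timeout 60 db2level || echo "db2level failed or hung (status $?)"
```

The same wrapper could be put in front of the db2 commands in the later steps.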

Step 3.
$db2 get dbm cfg 
$db2 get db cfg for <dbname>

and see if data is returned. This confirms that the configuration files are accessible.

If this is successful, proceed to the next step.

Step 4.
$db2 list applications   (see if it hangs)
If it hangs, proceed to the hang data collection step.

$db2 connect to <dbname>  (see if it hangs)
If successful, it proves that local shared-memory connections are fine.

Step 5.
$db2 connect to <db tcp alias> user <username>   (see if it hangs)
If this is successful, it confirms that the TCP/IP protocol is working for database connections.
If this hangs, it indicates that TCP/IP may be the culprit.

If they are successful, it is time to gather some db2pd command output that can help db2 support find the root cause.

$db2pd -eve |tee $DIAGPATH/eve.out
$db2pd -latches -rep 30 3  |tee $DIAGPATH/latches.out
$db2pd -stack all -rep 30 3
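
If you prefer to run the three db2pd commands unattended, a wrapper along these lines could save each output to its own file and keep a simple timeline of when each command started and ended. It is only a sketch: run_step, collect_db2pd and collect.log are hypothetical names, and the output is redirected to files instead of being shown on screen with tee.

```shell
# Sketch only: run the db2pd commands from the text and keep a small timeline.
# run_step, collect_db2pd and collect.log are hypothetical names.
run_step() {
    name=$1; shift
    rc=0
    echo "$(date '+%Y-%m-%d %H:%M:%S') start $name: $*" >> "$OUT_DIR/collect.log"
    "$@" > "$OUT_DIR/$name.out" 2>&1 || rc=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') end   $name: exit $rc" >> "$OUT_DIR/collect.log"
}

collect_db2pd() {
    OUT_DIR=$1       # an existing directory, e.g. the one created in Step 0
    run_step eve     db2pd -eve
    run_step latches db2pd -latches -rep 30 3
    run_step stack   db2pd -stack all -rep 30 3
}

# Example use:
# collect_db2pd "$DIAGPATH"
```

The timestamps in collect.log make it easier to line up the db2pd output with the vmstat and iostat samples from Step 1.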


Regular hang data collection step: collect all the other data after forcing off as many connections as possible.
$db2 force application all
Wait for it to complete.  This removes any extraneous connections to the database that are not involved in the instance hang.

$db2fodc -hang full
  and wait for it to complete.
  You can monitor the progress with the log file in the FODC_Hang_<timestamp> directory.
  Use Ctrl-C if any of the commands hang. If it still hangs, issue db2_kill to forcibly kill the instance.
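
To find the directory to watch from a second session, a helper like the one below could print the most recently modified FODC_Hang_* directory. This is a sketch; latest_fodc_hang_dir is a hypothetical name, and it assumes the FODC directories are created under the directory you pass in (normally the diagnostic path).

```shell
# Sketch only: print the newest FODC_Hang_* directory under a given path.
# latest_fodc_hang_dir is a hypothetical helper name.
latest_fodc_hang_dir() {
    # $1 = directory that holds the FODC_Hang_* directories
    # ls -t sorts by modification time, newest first; -d lists the directories themselves.
    ls -td "$1"/FODC_Hang_*/ 2>/dev/null | head -n 1
}

# Example use: list the contents of the newest directory to locate the log file.
# ls -l "$(latest_fodc_hang_dir "$DIAGPATH")"
```

The function prints nothing when no FODC_Hang_* directory exists yet, which usually means db2fodc has not started writing.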
 
  Restart the instance and run a db2support command
 $db2support . -d <dbname> -s
 
  and upload the db2support.zip file after opening a PMR with the Tech support team.
 
The results of the above steps help us understand whether the instance hang is due to latching issues, lock issues, an OS bottleneck or something else, and help us arrive at a root cause analysis faster.

What kind of data collection do you use for db2 instance hangs?  Let me know if you have any questions or comments on the data collection method above.

Murali

RESOURCES

1. db2fodc command (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0051934.html?cp=SSEPGG_10.1.0%2F3-6-2-6-53&lang=en)

2. db2pd command (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.1.0/com.ibm.db2.luw.admin.trb.doc/doc/c0054595.html?lang=en)

3. db2support command (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0004503.html?cp=SSEPGG_10.1.0%2F3-6-2-6-115&lang=en)
 

