IBM Support

DB2 & TSA: WHAT PROGRAM DID TSA REPORT TIMING OUT?

Technical Blog Post


Body

In a pureScale environment, TSA is used to monitor the various resources
that need to be running to ensure a healthy pureScale system.

 

Some of these resources may fail. When that happens, you want to find
out which one failed and take appropriate action. It is not always
easy to navigate the various files to find out what happened.

 

This short document uses a real example to show you how to do it.

 

When a problem occurs, it is reported in the system error log, which on AIX
can be viewed using 'errpt -a'. For example:

 

  ---------------------------------------------------------------------------
  LABEL:          GBLRESRM_MONITOR_TI
  IDENTIFIER:     87EB4A70

  Date/Time:       Fri Aug 11 05:33:00 KORST 2017
  Sequence Number: 6932
  Machine Id:      00CD15C74C00
  Node Id:         prodnode1
  Class:           O
  Type:            PERM
  WPAR:            Global
  Resource Name:   GblResRM

 

  Description
  IBM.Application monitor command timed out.

 

  Probable Causes
  The Resource Manager killed the monitor command
  because it did not return within the specified command timeout

 

  Failure Causes
  The Resource Manager killed the monitor command
  because it did not return within the specified command timeout

          Recommended Actions
          Check the time the monitor command needs to complete.
  Adapt the MonitorCommandTimeout attribute if necessary.

 

  Detail Data
  DETECTING MODULE
  RSCT,Application.C,1.92.3.177,6127
  ERROR ID

  REFERENCE CODE

  Resource name
  ca_db2prod_0-rs

 

The message above contains three important pieces of information that
allow us to go a bit further:

 

  Date of event: Fri Aug 11 05:33:00 KORST 2017
  Class and Nature of failure: IBM.Application monitor command timed out
  Resource for which monitoring failed: ca_db2prod_0-rs
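
Picking these three values out by hand becomes tedious when the error log holds many entries. The following is a minimal sketch, not part of TSA or Db2: a hypothetical shell function named 'errpt_summary' that reads 'errpt -a' output on standard input and prints one line per entry with the date, the description and the resource name. It assumes the entry layout shown above ('Description' and 'Resource name' headings, each followed by its value on a later line); entries with a different layout are skipped.

```shell
# Hypothetical helper (the name is ours, not an IBM tool).
# Usage on the affected host:  errpt -a | errpt_summary
errpt_summary() {
    awk '
        # Normalize each line: drop leading and trailing whitespace.
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }

        # A new entry starts at its LABEL line; forget the previous one.
        /^LABEL:/ { date = ""; desc = ""; want = ""; next }

        # Remember the timestamp of the current entry.
        /^Date\/Time:/ { sub(/^Date\/Time:[ \t]*/, ""); date = $0; next }

        # These headings announce that the value follows on a later line.
        $0 == "Description"   { want = "desc"; next }
        $0 == "Resource name" { want = "res";  next }

        # Take the first line with real text after a heading as its value.
        want != "" && $0 ~ /[A-Za-z0-9]/ {
            if (want == "desc")
                desc = $0
            else
                printf "%s | %s | %s\n", date, desc, $0
            want = ""
        }
    '
}
```

Each output line then carries the same three items listed above, separated by ' | ', which makes it easier to spot the resource that keeps timing out.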

 

The next step is to find out which program is used to monitor the resource.
This can be found by listing the resources and their attributes with
the 'lsrsrc' command. To do that, we use the class we found in the
'errpt -a' output above, that is, 'IBM.Application':

 

  lsrsrc IBM.Application

 

The output can be rather long, but we are looking for the resource
named 'ca_db2prod_0-rs', as it appeared in the 'errpt -a' output. We find it
in the 'lsrsrc' output:

 

  resource 29:
        Name                  = "ca_db2prod_0-rs"
        ResourceType          = 0
        AggregateResource     = "0x2028 0xffff 0x187fd6e7 0xb47a34be 0x94760fcd 0xdef74e22"
        StartCommand          = "/db2/db2prod/sqllib/adm/db2rocme 1 CF db2prod 128 START"
        StopCommand           = "/db2/db2prod/sqllib/adm/db2rocme 1 CF db2prod 128 STOP"
        MonitorCommand        = "/db2/db2prod/sqllib/adm/db2rocme 1 CF db2prod 128 MONITOR"
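
When the 'lsrsrc' output is long, the search can be scripted. The sketch below is an illustration only: 'monitor_command_of' is a hypothetical helper name, and it assumes the 'Attribute = "value"' layout shown above, with the 'Name' line appearing before the 'MonitorCommand' line of the same resource.

```shell
# Hypothetical helper (the name is ours, not an IBM tool).
# Usage:  lsrsrc IBM.Application | monitor_command_of ca_db2prod_0-rs
#
# lsrsrc can also filter by itself with a selection string, for example
#   lsrsrc -s "Name == 'ca_db2prod_0-rs'" IBM.Application Name MonitorCommand
# Check the RSCT documentation for your level before relying on that form.
monitor_command_of() {
    awk -v name="$1" '
        # Return the text after the first "=", without spaces or quotes.
        function value(line) {
            sub(/^[^=]*=[ \t]*/, "", line)
            sub(/[ \t\r]+$/, "", line)
            gsub(/^"|"$/, "", line)
            return line
        }

        # Every "resource N:" header starts a new resource block.
        /^[ \t]*resource [0-9]+:/ { in_res = 0 }

        # Mark the block as the wanted one when its Name matches.
        /^[ \t]*Name[ \t]*=/ { in_res = (value($0) == name) }

        # Print the monitor command of the wanted resource only.
        # MonitorCommandTimeout and similar attributes do not match here.
        in_res && /^[ \t]*MonitorCommand[ \t]*=/ { print value($0) }
    '
}
```

If the resource name appears in more than one block of the output, the function prints one line per matching block.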

 

The program we are interested in is the value of the 'MonitorCommand'
attribute. So the command that timed out was:

 

  "/db2/db2prod/sqllib/adm/db2rocme 1 CF db2prod 128 MONITOR"

 

Most of these commands can be run on their own (with the appropriate
environment) to help diagnose the issue further.
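
As a starting point for such a manual run, the sketch below wraps a command, then reports its exit code and how long it took, so the elapsed time can be compared with the 'MonitorCommandTimeout' attribute mentioned in the 'errpt -a' output. 'time_command' is a hypothetical helper name, and the sketch assumes that 'date +%s' (seconds since the epoch) is supported on your system. Run it as the user, and with the environment, that the resource is configured to use.

```shell
# Hypothetical helper (the name is ours, not an IBM tool).
# Usage:
#   time_command /db2/db2prod/sqllib/adm/db2rocme 1 CF db2prod 128 MONITOR
time_command() {
    start=$(date +%s)          # assumption: date supports the %s format
    "$@"                       # run the command exactly as given
    rc=$?
    end=$(date +%s)
    echo "exit code: $rc, elapsed: $((end - start))s"
    return "$rc"               # hand the exit code of the command back
}
```

An elapsed time close to or above the configured timeout suggests the monitor command itself is slow, and the exit code shows what the command would have reported to TSA had it completed in time.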

 


UID

ibm11140370