IBM Support

TSAMP: Monitor Command Timeout

Troubleshooting


Problem

Error messages appear in the syslog and in the error report, showing that IBM.GblResRM is reporting an error condition.

Symptom

Syslogs:
Oct 30 19:25:37 NC106152 daemon:err|error GblResRM[15204516]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: d889c2ac:::Details File:  :::Location: RSCT,Application.C,1.92.3.59,4876             :::GBLRESRM_MONITOR_TIMEOUT IBM.Application monitor command timed out. Resource name res1

Error Report:
---------------------------------------------------------------------------
LABEL:          GBLRESRM_MONITOR_TI
IDENTIFIER:     D889C2AC

Date/Time:       Thu Oct 30 19:27:11 2014
Sequence Number: 7004
Machine Id:      00F6A84B4C00
Node Id:         NC106152
Class:           O
Type:            PERM
WPAR:            Global
Resource Name:   GblResRM

Description
IBM.Application monitor command timed out.
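
To see which resources are affected and how often, the syslog entries can be summarized per resource name. The following is a minimal sketch, not part of TSAMP. It assumes each entry contains the text "Resource name <name>" as in the sample above, and the function name count_monitor_timeouts is hypothetical.

```shell
# Count GBLRESRM_MONITOR_TIMEOUT entries per resource name.
# Assumption: every entry contains "Resource name <name>", as in the
# sample syslog line. $1 is the syslog file to scan.
count_monitor_timeouts() {
    awk '
        /GBLRESRM_MONITOR_TIMEOUT/ {
            # Take the first word that follows the last "Resource name ".
            n = split($0, parts, "Resource name ")
            if (n > 1 && split(parts[n], words, " ") > 0) {
                count[words[1]]++
            }
        }
        END { for (name in count) print name, count[name] }
    ' "$1" | sort
}
```

Call it with the path of your syslog file, for example count_monitor_timeouts /var/log/messages. The path is an assumption and differs between AIX, Linux and Solaris systems.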

Cause

1) What is a monitor command timeout:
A monitor command timeout occurs when a monitor command issued by TSAMP has not exited within a set amount of time. This limit is defined when the resource is created and is stored in the MonitorCommandTimeout persistent attribute of every IBM.Application resource.

2) What happens when a monitor command times out:
When the monitor command runs longer than the time (in seconds) set in the MonitorCommandTimeout attribute, TSAMP kills the monitor command script and issues the above messages. The resource whose monitor command was killed is then marked with the OpState "Unknown", and no further automation action is taken against that resource until a monitor command returns a valid OpState value within the MonitorCommandTimeout timeframe.

To check the value of the MonitorCommandTimeout attribute, issue the following command on either node as root:
lsrsrc -s 'Name like"%" && ResourceType=1' IBM.Application Name MonitorCommandTimeout

3) What causes monitor command timeouts:
A monitor command timeout is typically due to an increased load on the system, which makes normally simple commands take longer to complete. TSAMP uses shell scripts to monitor resources, and those scripts can vary in complexity depending on what the solution needs.
If monitor command timeouts are occurring and there is no extra load on the systems, then the value of the MonitorCommandTimeout attribute has likely been set too aggressively and needs to be relaxed.

Resolving The Problem

Before adjusting the MonitorCommandTimeout attribute for a resource, you need to know its current value. This example uses an IBM.Application resource called "res1".

First, find out how long the timeout needs to be. This is not always easy, because the load during your test window might not match the load at the times when the monitor command timeout messages appear. To test how long the monitor command takes to return, follow the steps below:

Issue the following command on any node:


[root@NC106152 /]# lsrsrc -s 'Name ="res1"' IBM.Application MonitorCommand MonitorCommandTimeout UserName
Resource Persistent Attributes for IBM.Application
resource 1:
  MonitorCommand        = "/usr/sbin/rsct/sapolicies/fileapp/fileapp.sh status res1"
  MonitorCommandTimeout = 5
  UserName              = "root"
resource 2:
  MonitorCommand        = "/usr/sbin/rsct/sapolicies/fileapp/fileapp.sh status res1"
  MonitorCommandTimeout = 5
  UserName              = "root"
resource 3:
  MonitorCommand        = "/usr/sbin/rsct/sapolicies/fileapp/fileapp.sh status res1"
  MonitorCommandTimeout = 5
  UserName              = "root"

Then issue the command as the user shown in "UserName", using the following syntax:

time <MonitorCommand as shown within quotes in above output>

Here is an example:
[root@NC106152 /]# time /usr/sbin/rsct/sapolicies/fileapp/fileapp.sh status res1

real    1m0.006s
user    0m0.002s
sys     0m0.002s

This monitor command took 1 minute and 6 thousandths of a second to complete, far longer than the MonitorCommandTimeout of 5 seconds shown above.
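
A single timing run may not reflect the load present when the timeouts occur, so it can help to repeat the measurement and keep the slowest result. The following is a minimal sketch under stated assumptions. The function name max_runtime_seconds is hypothetical, the command runs as the current user (switch to the user shown in "UserName" first), and date +%s is assumed to be available on your platform. Results are in whole seconds, so each measurement can be off by up to one second.

```shell
# Run a command several times and print the longest run in whole seconds.
# $1 is the number of runs; the remaining arguments are the command to time.
max_runtime_seconds() {
    runs=$1
    shift
    max=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        # Monitor commands report state through their exit code, so a
        # non-zero exit is expected here and is deliberately ignored.
        "$@" >/dev/null 2>&1 || true
        end=$(date +%s)
        elapsed=$((end - start))
        if [ "$elapsed" -gt "$max" ]; then
            max=$elapsed
        fi
        i=$((i + 1))
    done
    echo "$max"
}
```

For the resource in this example, the call would be max_runtime_seconds 5 /usr/sbin/rsct/sapolicies/fileapp/fileapp.sh status res1, ideally run at a time of day when the timeout messages usually appear.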

To adjust the timeout, issue the following command as root on either node, replacing the example value of 10 with a value that exceeds the run time you measured:

chrsrc -s 'Name ="res1"' IBM.Application MonitorCommandTimeout=10

After changing this value, you should monitor your cluster for repeats of the timeout messages and ensure that the problem has been taken care of.
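
One way to confirm that the new value is in place on every resource entry is to extract the MonitorCommandTimeout values from the lsrsrc output. The following is a minimal sketch that assumes the "Attribute = value" layout shown in the lsrsrc output earlier in this document. The function name timeouts_from_lsrsrc is hypothetical.

```shell
# Print each MonitorCommandTimeout value found in lsrsrc output on stdin.
# Assumption: attributes appear as "  MonitorCommandTimeout = <seconds>".
timeouts_from_lsrsrc() {
    awk -F'=' '
        $1 ~ /^[ \t]*MonitorCommandTimeout[ \t]*$/ {
            gsub(/[ \t]/, "", $2)   # strip blanks around the value
            print $2
        }
    '
}
```

For example, lsrsrc -s 'Name ="res1"' IBM.Application MonitorCommandTimeout | timeouts_from_lsrsrc | sort -u should print a single line containing the new value if all entries were updated.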

Note: Do not set this value too high, because it affects how quickly TSAMP detects and reacts to an outage. Set it just high enough for the monitor command to return a valid operational state.

Product: Tivoli System Automation for Multiplatforms
Platform: AIX, Linux, Solaris
Version: 2.1; 2.2; 2.3; 3.1; 3.2; 3.2.1; 3.2.2; 4.1
Edition: All Editions

Document Information

Modified date:
24 June 2019

UID

swg21688755