IBM Support

tacmd executecommand fails on some AIX agents

Technical Blog Post


Abstract

tacmd executecommand fails on some AIX agents

Body


This turned out to be an agent issue rather than a tacmd issue, but it first showed up as the tacmd command not responding.

In this blog I will go through the areas that were checked, as they may help you diagnose similar problems.

The issue was reported as follows: tacmd executecommand worked correctly on some AIX ux agents but failed on others.

The failure message was:

KUIEXC203E: Agent has not returned a response to the Execute Command request.
 
The request was not completed by the agent and a result was not returned.
This can happen for several reasons:
- The command is still "in progress" at the agent and the waiting TEMS times out the request.
- The agent has switched to a different TEMS.
- The RTEMS for that agent has stopped.

The machines were on different AIX versions (6 and 7), and the agent versions ranged from 06.22.04.00 to 06.30.04.00.

The agents were connected through a number of RTEMS, and reconnecting the agents to other RTEMS did not help.
A reinstall of the agent had also been tried, and this did not help either.

The KT1_TEMS_SECURE property was set to YES on all RTEMSes.
Note: This parameter must be set to YES on all TEMSes to allow tacmd getfile, putfile and executecommand requests to run.

So the information provided already covered some of the first things to check.
If the failures had been confined to a single AIX version or agent version, that would have pointed to a version-specific issue, but this could be ruled out.
Since the failing agents were spread across several RTEMS and switching RTEMS made no difference, an RTEMS problem could also be ruled out.
The reinstall meant that the agent had been stopped and started, so it was not simply a hung agent process.

Even though agents at different versions had the problem, the next check was to make sure that the ue level had been upgraded.

Some of the machines had been upgraded to 6.3 fix packs, and on these machines ue should also be upgraded, as per the fix pack README:
 
If you upgrade any OS agent to 6.3.0 Fix Pack 2 or later, you must also separately upgrade tacmd (User Interface Extensions, product code "ue")
on the same machine to 6.3.0 Fix Pack 2 or later. If tacmd is not upgraded, commands run from this machine will fail.
 

However, the ue level turned out to be correct on the agents.
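
As a rough illustration of this check, the sketch below compares a version string in the 06.30.04.00 style against a minimum level. The function name itm_version_ge is made up for this example, and treating 6.3.0 Fix Pack 2 as 06.30.02.00 is an assumption based on the version format quoted earlier. How you obtain the installed ue level (for example from cinfo -i output) is left out.

```shell
# Hypothetical helper: succeeds (exit 0) when version $1 is >= version $2.
# Versions are dot-separated numeric fields, e.g. 06.30.04.00.
itm_version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0          # "+ 0" forces a numeric comparison
            yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                     # all fields equal
    }'
}

ue_level="06.30.04.00"       # sample value, not read from a real machine
min_level="06.30.02.00"      # assumed encoding of 6.3.0 Fix Pack 2

if itm_version_ge "$ue_level" "$min_level"; then
    echo "ue level $ue_level is at or above $min_level"
else
    echo "ue level $ue_level is below $min_level, upgrade ue"
fi
```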

The logs on the hub TEMS and the agent were reviewed, but they showed no errors that gave any further information on the issue.
A quick trace of the tacmd command did not give any more information either.

It was therefore decided to trace the agent while running a simple command, /bin/ls, in the format:


tacmd executecommand -m dbk_dbkpcat02:KUX -c "/bin/ls" -o -e -l -t 30 -r -f LOCAL -d /opt/IBM/ITM/logs/response.out

Note: the command passed to executecommand used a full path name. $PATH is not always available when the command is run, so it is much better to give a full path.

A simple command was used to rule out any issues with running the command itself.
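
To make the full-path advice harder to forget, a small wrapper can refuse anything that does not start with a slash. This is only a sketch: is_absolute_command and run_on_agent are hypothetical names, the tacmd options are copied unchanged from the command above, and it assumes you have already logged in with tacmd login.

```shell
# Succeeds only when the command string starts with an absolute path.
is_absolute_command() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Hypothetical wrapper around the executecommand call shown in this post.
#   $1 = managed system name, $2 = command string, $3 = local output file
run_on_agent() {
    agent="$1"
    cmd="$2"
    out="$3"
    if ! is_absolute_command "$cmd"; then
        echo "refusing to send '$cmd': use a full path such as /bin/ls" >&2
        return 2
    fi
    tacmd executecommand -m "$agent" -c "$cmd" -o -e -l -t 30 -r -f LOCAL -d "$out"
}

# Example call (not executed here):
#   run_on_agent dbk_dbkpcat02:KUX "/bin/ls" /opt/IBM/ITM/logs/response.out
```

The check only looks at the first character, so it will not catch a wrong path, just a missing one.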

Tracing was set on the agent machine to:

ERROR(UNIT:KRA ALL)(UNIT:KGL ALL)
 

This should have shown the KGL_Execute function being run, but it never appeared in the logs.
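
A quick way to confirm whether the request ever reached that function is to search the agent trace logs for its name. A minimal sketch, assuming the logs are plain text files; the function name check_execute_traced and the file pattern in the usage comment are illustrative only.

```shell
# Hypothetical helper: prints "seen" if any of the given log files
# mentions KGL_Execute, otherwise prints "missing".
check_execute_traced() {
    if grep -q "KGL_Execute" "$@" 2>/dev/null; then
        echo "seen"
    else
        echo "missing"
    fi
}

# Example call (the file pattern is an assumption, adjust to your log names):
#   check_execute_traced /opt/IBM/ITM/logs/*ux*.log
```

If the result is "missing" after a traced executecommand attempt, the request most likely never got as far as running the command, which is what was seen in this case.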

After working with Level 3, it was seen that the automation request was, for some reason, getting hung in the agent processing for CPU statistics.


At this time the customer also noticed that some or all of the workspaces for the agents with this problem showed no data.

So if you are having this type of problem, make sure that you check the workspaces as well.

Further investigation on the agent side identified a known issue with the ux agent on AIX which could cause a hang or high CPU on the agent.

The issue is actually due to an AIX problem, but there are APARs for both AIX and the OS agent. The OS agent APAR is:

APAR IV81845

where the ifstat daemon can become unresponsive on AIX.
 

The problem is that AIX returns some invalid data in response to the request for interface data.

The agent does not handle the invalid data well, which results in a hang, crash or high CPU.
The fix is available as:

IBM Tivoli Monitoring: Unix(R) OS Agent 6.3.0.6-TIV-ITM_UNIX-IF0001
IBM Tivoli Monitoring: Unix(R) OS Agent 6.3.0.4-TIV-ITM_UNIX-IF0004
IBM Tivoli Monitoring 6.3.0 Fix Pack 7 (6.3.0-TIV-ITM-FP0007)

This allows the agent to handle the invalid data.
 
The underlying issue, however, is in AIX, and APAR IV82718 corrects the invalid data.

The AIX fix is available at a number of AIX levels under different APARs:

6100-09 - use AIX APAR IV82718
7100-04 - use AIX APAR IV83237
7200-01 - use AIX APAR IV83025
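
If you have many machines to check, the level-to-APAR list above can be turned into a small lookup. A sketch only: aix_apar_for_level is a made-up name, the table contains just the levels listed in this post, and the sample level string merely mimics the format printed by oslevel -s.

```shell
# Hypothetical lookup: maps an AIX level string to the APAR named in this post.
# Prints "unknown" and returns 1 for levels that are not listed here.
aix_apar_for_level() {
    case "$1" in
        6100-09*) echo "IV82718" ;;
        7100-04*) echo "IV83237" ;;
        7200-01*) echo "IV83025" ;;
        *)        echo "unknown"; return 1 ;;
    esac
}

level="7100-04-02-1614"   # sample string; on AIX use: level=$(oslevel -s)

if apar=$(aix_apar_for_level "$level"); then
    echo "AIX level $level: use AIX APAR $apar"
else
    echo "AIX level $level is not in the list from this post"
fi
```

On a real AIX machine, instfix -ik followed by the APAR number should then tell you whether that fix is already installed.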


In this customer's case it was easier to install the OS agent fix, and once that was done tacmd executecommand worked correctly on all the agents.

These are the links to more detail on the fixes:

http://www-01.ibm.com/support/docview.wss?uid=swg1IV81845

http://www-01.ibm.com/support/docview.wss?uid=isg1IV82718


UID

ibm11083849