IBM Support

Detect data provider malfunctioning for the ITM System P agents

Technical Blog Post


Abstract

Detect data provider malfunctioning for the ITM System P agents

Body

Although some ITM agents for System P have been replaced by the UNIX OS agent and the HMC 6.2.2.3 agents, some customers still use the ITM System P agents, for example in a VIOS environment.

How can you detect that an ITM System P agent data provider has stopped running?
The data provider process (aixDataProvider-61, cecDataProvider or hmcDataProvider) can fail for several reasons: SPMI shared memory corruption, a core dump, known ITM or AIX bugs, and so on.
There are several ways to detect such a failure:
- You can create a situation for the "Processes Detail" attribute group (AIX Premium, VIOS Premium or HMC Base agent) that checks for the missing process.

- You can monitor updates to the file khdexp.cfg, but this only works if data can be successfully exported to the warehouse. When the data provider is running fine but the warehouse (or warehouse proxy) is not running or not reachable, khdexp.cfg is not updated either, so an event is raised for khdexp.cfg even though the data provider is running. File monitoring is not possible with the System P agents, so you need the UNIX OS agent to achieve this.

- You can check for "Spmi" or "RSi" errors in the logs, but this would require an additional agent (Unix Logs agent, Log File agent or an Agent Builder agent).

- You can create a situation that checks the Performance Object Status of the agent's attribute groups.
For example, for the AIX Premium agent this would be:
(  #'KPXPOBJST.ERRCODE' != NO ERROR)
This will trigger as soon as one or more attribute groups are in error status and not collecting data, due to a failing data provider.
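
Outside ITM, the first option (checking for a missing process) can be approximated with a small shell function. This is only a minimal sketch: the function name missing_providers is hypothetical, and it assumes that the data provider name appears somewhere in each line of the process listing that you pipe into it.

```shell
# Hypothetical helper: read a process listing on standard input and print
# every expected data provider name (given as arguments) that does not
# appear in it. Returns 1 if at least one name is missing, 0 otherwise.
missing_providers() {
    listing=$(cat)
    rc=0
    for name in "$@"; do
        # -F matches the name as a fixed string, not as a regular expression
        printf '%s\n' "$listing" | grep -F -- "$name" >/dev/null || {
            printf '%s\n' "$name"
            rc=1
        }
    done
    return $rc
}

# Example use on the monitored host (not executed here):
#   ps -ef | missing_providers aixDataProvider-61
```

Pass only the provider names that belong to the agents installed on that system, otherwise a provider that is not expected there will be reported as missing.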

 

As for the possible causes: System P agents make SPMI library calls to retrieve data, so corruption of the shared memory in the underlying operating system affects data collection. It can prevent a data provider from starting, or it can cause bad or missing data to be reported on the portal. To correct the problem, the shared memory needs to be cleared and the System P agents restarted so that correct data starts flowing in again. The steps are as follows:

a) Stop the System P agent (AIX Premium, VIOS or CEC agent)

b) List all processes that are using libSpmi.a:
       # genld -l | grep -p Spmi | grep Proc

c) Kill those processes:
       # genld -l | grep -p Spmi | grep Proc | awk '{print $2}' | xargs kill

d) Repeat step (b). If any processes still show up, kill each one individually with kill -9 <pid of the process>

e) Clean up the stale IPC shared memory segments:
   - Run the command:
        # ipcs -a | grep 0x78 | awk '{print $2}'

   - If the above command lists any IDs, remove them one at a time by running:
        # ipcrm -m <id as returned by the previous command>
   - Rerun "ipcs -a | grep 0x78 | awk '{print $2}'" to confirm that no segments are left.

f) Then run:
        # slibclean

g) Restart System P agents and ensure that data is being retrieved.
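
The selection logic of steps (b) through (e) can be sketched as two filter functions that only print IDs and never kill or remove anything themselves. This is a sketch under stated assumptions: the function names are hypothetical, the genld -l output is assumed to consist of blank-line-separated paragraphs that start with "Proc_pid: <pid>" (which is what the grep -p and awk commands above rely on), and the ipcs -a output is assumed to have the type in column 1, the ID in column 2 and the key in column 3. Restricting the match to type "m" and to keys that begin with 0x78 is a tightening of the plain "grep 0x78" used above, not part of the original procedure.

```shell
# Hypothetical helper: read "genld -l" output on standard input and print
# the PID of every process whose paragraph mentions Spmi.
spmi_pids() {
    # RS="" puts awk in paragraph mode: records are separated by blank lines
    # and newlines also act as field separators.
    awk 'BEGIN { RS = "" }
         /Spmi/ {
             for (i = 1; i < NF; i++)
                 if ($i == "Proc_pid:") print $(i + 1)
         }'
}

# Hypothetical helper: read "ipcs -a" output on standard input and print
# the ID of every shared memory segment whose key starts with 0x78.
spmi_shm_ids() {
    awk '$1 == "m" && $3 ~ /^0x78/ { print $2 }'
}

# Example use on the AIX host, after stopping the agent (not executed here):
#   genld -l | spmi_pids        # review the list, then kill those PIDs
#   ipcs -a  | spmi_shm_ids     # review the list, then ipcrm -m each ID
```

Printing the IDs first and reviewing them before running kill or ipcrm keeps the destructive part of the procedure a deliberate manual step.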

Shared memory corruption can result from a library version mismatch. It can also be caused by memory overflows and memory overwrites due to bugs in the agent or the operating system.
Check this wiki for known issues and fixes for the ITM System P agents:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Monitoring/page/System%20P%20Agents

There are also some underlying AIX problems related to the perfagent.tools fileset that prevent the System P agents from collecting data. Refer to the following link for details:
http://www-01.ibm.com/support/docview.wss?uid=swg21447016

The versions of perfagent.tools mentioned in the above article should also fix problems related to shared memory corruption.

 

Finally, there was recently a case where the /usr/lib/libSpmi.a library was missing although the perfagent.tools fileset was installed. This prevented the data provider from starting.
We solved this problem by reinstalling the perfagent.tools fileset with the smitty command:
-> Install and Update Software
-> Install and Update from ALL Available Software
-> INPUT device / directory for software => enter the location where you downloaded perfagent.tools
-> SOFTWARE to install => F4 and select perfagent.tools
-> OVERWRITE same or newer versions? => F4 and set to 'yes'
Press Enter to start the installation.
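
To confirm the condition before reinstalling, and to verify the result afterwards, a quick check of the library file can help. The following is a minimal sketch: the function name spmi_lib_present is hypothetical, and the default path is simply the one from the case described above.

```shell
# Hypothetical helper: succeed only if the given library file exists and is
# not empty. Defaults to the path from the case described in the text.
spmi_lib_present() {
    lib=${1:-/usr/lib/libSpmi.a}
    if [ -s "$lib" ]; then
        echo "present: $lib"
        return 0
    fi
    echo "missing or empty: $lib"
    return 1
}

# Example use on the AIX host (not executed here):
#   spmi_lib_present || lslpp -l perfagent.tools
```

If the file is missing while lslpp still lists perfagent.tools as installed, you are in the situation described above and the reinstall with overwrite applies.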

 

Hope this helps!


UID

ibm11084257