Detect data provider malfunctioning for the ITM System P agents
bogaerba
Although some ITM agents for System P have been replaced by the Unix OS agent and HMC 188.8.131.52 agents, there are still customers that are using the ITM System P agents, for example in a VIOS environment.
How to detect that an ITM System P agent data provider has stopped running?
- You can monitor updates to the file khdexp.cfg, but this only works if data can be successfully exported to the warehouse. When the data provider is running fine but the warehouse cannot be reached, the file is not updated either, so this check can raise a false alarm.
- You can check for "Spmi" or "RSi" errors in the logs, but this would require an additional agent (Unix Logs agent, Log File agent or an Agent Builder agent).
- You can create a situation that checks the Performance Object Status of its attribute groups.
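The first two checks can be sketched as small shell helpers. This is only an illustration: the function names are hypothetical, the file paths are left to the caller because they differ per installation, and the log check simply counts lines that contain "Spmi" or "RSi" rather than parsing real agent log formats.

```shell
#!/bin/sh
# Hypothetical helpers; pass in the paths that apply to your installation.

# Succeeds (exit 0) when FILE was last modified more than MINUTES minutes ago.
file_is_stale() {
    file=$1
    minutes=$2
    [ -n "$(find "$file" -mmin +"$minutes" 2>/dev/null)" ]
}

# Prints the number of lines in LOGFILE that mention "Spmi" or "RSi".
count_provider_errors() {
    grep -c -E 'Spmi|RSi' "$1"
}
```

For example, `file_is_stale /path/to/khdexp.cfg 60` could drive an alert when the file has not changed for an hour, with the caveat about warehouse availability described above.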
As for possible causes: System P agents make SPMI library calls to retrieve data, so corruption of the shared memory in the underlying operating system affects data collection. It can prevent a data provider from starting, or it can cause bad or missing data to be reported on the portal. To correct the problem, the shared memory needs to be cleared and the System P agents restarted so that correct data starts flowing in again. The steps are as follows:
a) Stop the System P agent (AIX Premium, VIOS or CEC agent)
b) List all processes that are using libSpmi.a:
c) Kill those processes:
d) Repeat step (b). If the processes still show up, kill them individually with kill -9 <pid of the process>
e) Clean up the stale ipcs:
- If there are any listed from the above command, remove them by running:
f) Then run:
g) Restart System P agents and ensure that data is being retrieved.
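Steps (b) through (f) could be scripted roughly as follows. This is a sketch under stated assumptions, not the exact commands for each step: it assumes the standard AIX tools genld, ipcs, ipcrm and slibclean, that `genld -l` prints a "Proc_pid:" line per process, that `ipcs -m` lines start with "m" followed by the ID and key, and that SPMI shared memory keys start with 0x78. Verify each of these on your system before using it. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the genld/ipcs output layouts and the 0x78 key prefix are assumptions.

# Reads `genld -l` output on stdin and prints the PID of each process
# that has libSpmi.a loaded. Assumes each process block starts with "Proc_pid: <pid>".
spmi_pids() {
    awk '$1 == "Proc_pid:" { pid = $2 }
         /libSpmi\.a/ && pid != "" { print pid; pid = "" }'
}

# Reads `ipcs -m` output on stdin and prints the IDs of shared memory
# segments whose key starts with 0x78 (assumed to be the SPMI prefix).
spmi_shm_ids() {
    awk '$1 == "m" && $3 ~ /^0x78/ { print $2 }'
}

# Runs steps (b) to (f): kill, re-check and force kill, remove segments, slibclean.
cleanup_spmi() {
    for pid in $(genld -l | spmi_pids); do kill "$pid"; done
    sleep 5
    for pid in $(genld -l | spmi_pids); do kill -9 "$pid"; done
    for id in $(ipcs -m | spmi_shm_ids); do ipcrm -m "$id"; done
    slibclean
}

# Nothing destructive happens unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    cleanup_spmi
fi
```

The parsing is kept in separate functions that read from stdin, so you can first pipe the real `genld -l` and `ipcs -m` output through them and review the PIDs and segment IDs before anything is killed or removed.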
Shared memory corruption can occur when library versions are mismatched. It can also result from memory overflows and memory overwrites caused by bugs in the agent or the operating system.
There are also some other underlying AIX problems related to the perfagent.tools fileset that cause System P agents to fail to collect data. Refer to the following link for details:
The versions of perfagent.tools mentioned in the above article should also fix problems related to shared memory corruption.
Finally, there was recently a case where the /usr/lib/libSpmi.a library was missing even though the perfagent.tools fileset was installed. This prevented the data provider from starting up.
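A quick check for this last case could look like the sketch below. The function name is hypothetical, and the path defaults to the one mentioned above.

```shell
#!/bin/sh
# Reports whether the SPMI library exists at the given path
# (default: /usr/lib/libSpmi.a) and returns 0 only when it does.
check_spmi_library() {
    lib=${1:-/usr/lib/libSpmi.a}
    if [ -f "$lib" ]; then
        echo "present: $lib"
    else
        echo "missing: $lib"
        return 1
    fi
}
```

On AIX, `lslpp -L perfagent.tools` shows whether the fileset itself is installed, so the two results together cover the situation where the fileset is present but the library is not.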
Hope this helps!