IBM Support

RTEMS reporting high CPU

Technical Blog Post


Abstract

RTEMS reporting high CPU

Body



An RTEMS (remote Tivoli Enterprise Monitoring Server) in an IBM Tivoli Monitoring environment was using high levels of CPU, which eventually caused the RTEMS to crash.

The first step is to review the logs and see whether any messages could explain the issue.

The following line was seen in the <temsname>_ms_nnnnnnnn-0x.log file:


56D699F0.0000-2F0:kdepenq.c,124,"KDEP_Enqueue") (6019:1918) receive limit (8192) reached: 0.0.0.75

This is normally an indication that there are too many agents connected to an RTEMS, or that an agent is trying to send too much data back to a TEMS.
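When several log files need checking, the search for this message can be scripted. The following is a minimal Python sketch, not an IBM-supplied tool: it assumes the message has the same wording as the sample line above, and the function names and the `*_ms_*.log` file pattern are hypothetical. It counts the "receive limit reached" messages per limit value and per address, which can help show whether a single source accounts for most of them.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the text of the KDEP_Enqueue message, for example:
#   ... receive limit (8192) reached: 0.0.0.75
# Assumption: the message wording is the same in every log file.
RX_LIMIT_MSG = re.compile(r"receive limit \((\d+)\) reached: (\S+)")


def count_rxlimit_hits(lines):
    """Count 'receive limit reached' messages per (limit, address) pair."""
    hits = Counter()
    for line in lines:
        match = RX_LIMIT_MSG.search(line)
        if match:
            hits[(int(match.group(1)), match.group(2))] += 1
    return hits


def scan_tems_logs(log_dir, pattern="*_ms_*.log"):
    """Scan every file in log_dir whose name matches pattern (hypothetical default)."""
    hits = Counter()
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log contains odd bytes
        with path.open(encoding="utf-8", errors="replace") as log_file:
            hits.update(count_rxlimit_hits(log_file))
    return hits


if __name__ == "__main__":
    sample = [
        '56D699F0.0000-2F0:kdepenq.c,124,"KDEP_Enqueue") (6019:1918) '
        "receive limit (8192) reached: 0.0.0.75",
    ]
    for (limit, address), count in count_rxlimit_hits(sample).most_common():
        print(f"limit={limit} address={address} count={count}")
```

Calling `scan_tems_logs()` with the directory that holds the TEMS logs returns the same kind of counts across all matching files, most frequent first when read with `most_common()`.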


This limit can be tuned. The default maximum size of data that an agent will attempt to return in one RPC request is 4096 KB.

 
If the size of a request exceeds this limit, the error KDE1_STC_RXLIMITEXCEEDED is expected.

The limit exists to avoid possible memory overruns caused by an unusually large RPC request.

If the limit leads to problems processing requests, consider increasing it by adding or editing the KDCFC_RXLIMIT value in the %CANDLE_HOME%\CMS\KBBENV file (on Windows) or the <InstallDirectory>/config/kbbenv.ini file (on UNIX/Linux).

The maximum value allowed is 65536; the minimum value allowed is 1024.
For example, to change it to 32768 KB (32 MB):
KDCFC_RXLIMIT=32768
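Where the change is made by script rather than by hand, the bounds above can be checked before the file is edited. This is a minimal Python sketch under stated assumptions: `set_rxlimit` is a hypothetical helper, it treats the file contents as simple one-setting-per-line `NAME=value` text like the example above, and it does not locate, read or write the KBBENV or kbbenv.ini file itself.

```python
import re

# Bounds for KDCFC_RXLIMIT stated above, in KB.
RXLIMIT_MIN_KB = 1024
RXLIMIT_MAX_KB = 65536

# A KDCFC_RXLIMIT=... line; [^\r\n]* stops before a Windows or UNIX line ending.
RXLIMIT_LINE = re.compile(r"^KDCFC_RXLIMIT=[^\r\n]*", re.MULTILINE)


def set_rxlimit(env_text: str, limit_kb: int) -> str:
    """Return env_text with KDCFC_RXLIMIT set to limit_kb.

    Hypothetical helper: assumes one NAME=value setting per line.
    Existing KDCFC_RXLIMIT lines are replaced; otherwise one is appended.
    """
    if not RXLIMIT_MIN_KB <= limit_kb <= RXLIMIT_MAX_KB:
        raise ValueError(
            f"KDCFC_RXLIMIT must be between {RXLIMIT_MIN_KB} and "
            f"{RXLIMIT_MAX_KB} KB, got {limit_kb}"
        )
    new_line = f"KDCFC_RXLIMIT={limit_kb}"
    if RXLIMIT_LINE.search(env_text):
        return RXLIMIT_LINE.sub(new_line, env_text)
    # Append, making sure the new setting starts on its own line.
    separator = "" if env_text == "" or env_text.endswith("\n") else "\n"
    return f"{env_text}{separator}{new_line}\n"


if __name__ == "__main__":
    print(set_rxlimit("KDCFC_RXLIMIT=4096\n", 32768), end="")
```

For example, `set_rxlimit(current_text, 32768)` yields text containing `KDCFC_RXLIMIT=32768`, matching the 32 MB example. Values outside 1024 to 65536 raise `ValueError` rather than being silently clamped, since clamping would hide a typo in the requested value.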

However, this is not the only possible cause, and further diagnostic work can be done using the TEMS audit tool published here:

https://www.ibm.com/developerworks/community/blogs/jalvord/entry/sitword_tems_audit_process_and_tool?lang=en

The following trace can be set at the TEMS:
KBB_RAS1='error (unit:kpxrpcrq,Entry="IRA_NCS_Sample" state er)(unit:kdsstc1,Entry="ProcessTable" all er)(unit:kdssqprs in er)(unit:kraafira,Entry="runAutomationCommand" all)(unit:kglhc1c all)'

Let it run for an hour, then take a pdcollect and turn the tracing off.

This is very important if you use this tool: the tracing is verbose, so it should only run for a short time, which is all that is needed to measure what is happening with situations.
 
A report can then be produced from the trace logs using the Perl script provided in the blog above.

This report can identify whether one or more situations are the main consumers of CPU.

In this case, the situation causing the high CPU usage was reviewed, and it was found to use more than one multi-row attribute group.

The situation scanned a file and then checked for a missing process. Both of these checks return more than one row of data.
A situation like this cannot be created in the TEP Situation Editor, but it can be written by hand and inserted into the TEMS with tacmd editSit.
The situation did not fail with an error, but it caused massive amounts of data to be sent to the RTEMS. Once the situation was distributed to a number of agents, this caused the RTEMS to overload and shut down.

Once the situation was deleted, the RTEMS functioned normally.


UID

ibm11084119