IBM Support

Fixing high CPU issue with Agentless Linux

Technical Blog Post


Abstract

Fixing high CPU issue with Agentless Linux

Body

If you have one or more Agentless Linux instances, each monitoring a significant number of remote systems (say 70-80 per instance), you may notice that the Agentless processes constantly consume around 30% of CPU.

This can cause a resource shortage when other processes are started or when other workloads cause temporary CPU peaks.

This happens because, by default, Agentless collects data for all attribute groups every 60 seconds.

Depending on the number of monitored servers, CPU usage can reach 20%-30%.

Typically, the Processes attribute group takes longer to collect than the others, because some remote systems run a large number of processes.

To reduce CPU consumption, fine-tune the Agentless instances by increasing the data collection interval.

This is configurable down to the attribute group level.
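To see why a longer interval helps, here is a minimal sketch of the arithmetic. It assumes one collection cycle per remote system per interval for a given attribute group; the function name is hypothetical, and actual CPU cost is not necessarily proportional to the number of cycles.

```python
def collections_per_hour(remote_systems: int, interval_seconds: int) -> int:
    """Collection cycles started in one hour for one attribute group.

    Assumption: one cycle per remote system per interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return remote_systems * (3600 // interval_seconds)


if __name__ == "__main__":
    # 80 remote systems, as in the scenario described above.
    for interval in (60, 120, 180):
        print(interval, collections_per_hour(80, interval))
    # 60 -> 4800, 120 -> 2400, 180 -> 1600
```

Going from 60 to 180 seconds cuts the number of cycles for that attribute group to one third.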


Here are some tuning options.

These environment variables are set in the `r<n>.ini/env` file:
     
CDP_DP_REFRESH_INTERVAL = 60
This is the overall SNMP polling interval, in seconds, at which the cache is updated. The default is 60.

You can change this to a value such as 120 or 180 seconds.
     
CDP_DP_CACHE_TTL = 60
This is the cache timeout in seconds. The default is 60.
It should be set to at least the highest of the REFRESH_INTERVAL settings.
     
You can also specify collection intervals per attribute group using

    CDP_<attribute group name>_REFRESH_INTERVAL

so that different attribute groups are collected at different intervals.
     
For example:

CDP_DISK_REFRESH_INTERVAL=70    
CDP_PROCESSES_REFRESH_INTERVAL=90    
CDP_NETWORK_REFRESH_INTERVAL=80    
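The rule that CDP_DP_CACHE_TTL should be at least the highest REFRESH_INTERVAL value is easy to miss once per-group intervals are added. The following is a minimal sketch of a check, not an IBM tool: the function names are hypothetical, and it assumes the settings are plain KEY=VALUE lines with integer values and that unset variables keep the default of 60 stated above.

```python
DEFAULT_SECONDS = 60  # default for both settings, per the text above


def parse_env(text: str) -> dict[str, int]:
    """Collect CDP_* settings with integer values from KEY=VALUE lines."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if sep and key.startswith("CDP_") and value.isdigit():
            settings[key] = int(value)
    return settings


def minimum_cache_ttl(settings: dict[str, int]) -> int:
    """Highest REFRESH_INTERVAL value, the floor for CDP_DP_CACHE_TTL."""
    intervals = {k: v for k, v in settings.items()
                 if k.endswith("_REFRESH_INTERVAL")}
    # If the overall interval is not set, its default still applies.
    intervals.setdefault("CDP_DP_REFRESH_INTERVAL", DEFAULT_SECONDS)
    return max(intervals.values())


def cache_ttl_is_sufficient(settings: dict[str, int]) -> bool:
    """True when CDP_DP_CACHE_TTL covers the highest refresh interval."""
    ttl = settings.get("CDP_DP_CACHE_TTL", DEFAULT_SECONDS)
    return ttl >= minimum_cache_ttl(settings)


if __name__ == "__main__":
    example = """
    CDP_DP_CACHE_TTL=60
    CDP_DISK_REFRESH_INTERVAL=70
    CDP_PROCESSES_REFRESH_INTERVAL=90
    CDP_NETWORK_REFRESH_INTERVAL=80
    """
    settings = parse_env(example)
    print(minimum_cache_ttl(settings), cache_ttl_is_sufficient(settings))
```

With the three example intervals above, the highest value is 90, so a CDP_DP_CACHE_TTL left at 60 would be too low and should be raised to at least 90.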

     
     
The biggest benefit will most likely come from increasing the interval for Processes, as that attribute group typically involves the most requests back and forth.

These are the attribute groups available for the Linux Agentless:
     
LNX Performance Object Status    
Performance Object Status    
Managed Systems    
Disk    
hrStorageTable    
Memory    
Network    
Processes    
Processor    
System    
Thread Pool Status    
Total Virtual MB    
Used Virtual MB    
Virtual Memory

As a best practice, check which situations are active on the Agentless instances, note the interval of each situation, find the smallest situation interval for each attribute group, and then use that to evaluate how far the agent data collection interval can be increased.
For example, we know that the Processes attribute group may be CPU intensive, especially on servers with a large number of running processes.
If the situations running against the Processes attribute group have a 5 minute interval, we can set the data collection for this attribute group to 3 or 4 minutes instead of 1 minute.
     
A similar evaluation should be made for all the other attribute groups once you know the intervals of the situations and of the historical collections currently active. Generally speaking, I would not expect any of them to stay at the default 1 minute interval.

For the attribute groups you are not interested in, you can set a very high collection interval (hourly, for example) so that no further CPU is wasted on useless data collection.
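The evaluation described above can be sketched as a small helper. This is an illustration under stated assumptions, not an IBM-provided rule: the function names are hypothetical, the 60 second margin is chosen only so that a 5 minute situation yields the 4 minute collection interval mentioned in the example, and the variable names are built by upper-casing single-word group names, following the CDP_DISK, CDP_PROCESSES and CDP_NETWORK pattern shown earlier.

```python
HOURLY = 3600  # "very high" interval for attribute groups nobody uses
FLOOR = 60     # never suggest less than the default 1 minute


def suggest_interval(consumer_intervals: list[int], margin: int = 60) -> int:
    """Suggest a collection interval in seconds for one attribute group.

    consumer_intervals: intervals, in seconds, of the situations and
    historical collections that use the group. An empty list means the
    group is unused. The margin is an assumption, not a documented value.
    """
    if not consumer_intervals:
        return HOURLY
    return max(FLOOR, min(consumer_intervals) - margin)


def env_lines(usage: dict[str, list[int]]) -> list[str]:
    """Build CDP_<GROUP>_REFRESH_INTERVAL lines for single-word group names."""
    return [
        f"CDP_{group.upper()}_REFRESH_INTERVAL={suggest_interval(intervals)}"
        for group, intervals in usage.items()
    ]


if __name__ == "__main__":
    usage = {
        "Processes": [300],   # one situation with a 5 minute interval
        "Disk": [120, 600],   # smallest consumer interval is 2 minutes
        "Network": [],        # no situations or historical collection
    }
    for line in env_lines(usage):
        print(line)
```

After applying values like these, remember that CDP_DP_CACHE_TTL should be at least the highest interval you set.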

 

Thanks for reading

 

 



UID

ibm11082637