IBM Support

QRadar: How to assess High CPU usage

How To


Summary

CPU load is one of the four key areas of performance.
This article shows how to assess and troubleshoot CPU load by using the 'top' command.

Objective

Provide a guide on how to troubleshoot high CPU load, taking into account the different scenarios that can contribute to this condition.

Environment

7.3.x, 7.4.x, 7.5.x

Steps

To understand CPU load, the most important part of the top output is the load average:
Top Command
These values are the average system load over the last 1 minute, 5 minutes, and 15 minutes:
Load times
The 'uptime' command also displays system load information:
Uptime
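As a minimal sketch, the three load values can also be read without the interactive top display. On Linux, /proc/loadavg holds the same 1-minute, 5-minute, and 15-minute averages as its first three fields. The function name label_loadavg is hypothetical and is used here only for illustration.

```shell
# Hypothetical helper: labels the three load averages.
# Reads one line in /proc/loadavg format from standard input,
# for example: 0.52 0.58 0.59 1/80 11206
label_loadavg() {
  read -r one five fifteen rest
  printf '1 min: %s  5 min: %s  15 min: %s\n' "$one" "$five" "$fifteen"
}

# Usage on a live system:
#   label_loadavg < /proc/loadavg
```

The remaining fields of /proc/loadavg (running and total scheduling entities, last PID) are not needed for this check and are discarded.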
Procedure
How do you determine whether the system load is appropriate or the system is overloaded?

Run the lscpu command to check how many CPU cores are available to the system.

lscpu

CPUs = Threads per core * Cores per socket * Sockets (with one thread per core, this is Cores per socket * Sockets)
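The CPU count can be derived from the lscpu output with a short script. The following is a sketch only: it assumes the English field labels that lscpu prints (Thread(s) per core, Core(s) per socket, Socket(s)), includes threads per core in the product because lscpu counts logical CPUs, and uses the hypothetical function name logical_cpus.

```shell
# Hypothetical helper: multiplies the topology fields of lscpu-style text
# read from standard input and prints the number of logical CPUs.
logical_cpus() {
  awk -F: '
    /^Thread\(s\) per core:/ { threads = $2 + 0 }
    /^Core\(s\) per socket:/ { cores = $2 + 0 }
    /^Socket\(s\):/          { sockets = $2 + 0 }
    END {
      # Treat a missing thread count as one thread per core.
      if (threads == 0) threads = 1
      print threads * cores * sockets
    }'
}

# Usage on a live system (LC_ALL=C keeps the labels in English):
#   LC_ALL=C lscpu | logical_cpus
```

The result should match the CPU(s) value that lscpu reports directly, so the script is mainly useful as a cross-check.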

If the 1-minute load average is equal to or higher than the total number of CPU cores, and the 5-minute and 15-minute values are as well, the system is overloaded.

Example:

Time Values
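The comparison between the load averages and the CPU count can be sketched as a small shell function. This example interprets "overloaded" as all three load averages being at or above the CPU count, which is an assumption about the intended threshold. The function name is_overloaded is hypothetical.

```shell
# Hypothetical helper: succeeds (exit status 0) when the 1-, 5-, and
# 15-minute load averages are all at or above the number of CPUs.
# $1 = number of CPUs, $2 $3 $4 = the three load averages.
is_overloaded() {
  awk -v cpus="$1" -v l1="$2" -v l5="$3" -v l15="$4" \
    'BEGIN { exit !(l1 >= cpus && l5 >= cpus && l15 >= cpus) }'
}

# Usage on a live system:
#   read -r l1 l5 l15 rest < /proc/loadavg
#   is_overloaded "$(nproc)" "$l1" "$l5" "$l15" && echo "sustained overload"
```

A high 1-minute value on its own points to a short spike rather than a sustained overload, which is why the function checks all three periods.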
How can you fix it? 
In QRadar, if you receive CPU Load notifications or Sar Sentinel notifications for CPU usage, you cannot simply renice or ignore the processes that are causing issues. Instead, analyze which service is taking the most CPU time. Run the top command and then press the c key to display the full command line for each PID.
Top detailed
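For a non-interactive view, a similar ranking can be sketched with ps. Note that the pcpu value from ps is averaged over the lifetime of each process, whereas top shows recent usage, so the two can differ. The function name busiest is hypothetical.

```shell
# Hypothetical helper: prints the N busiest processes (default 5) from
# "PID %CPU COMMAND" lines on standard input, highest CPU value first.
busiest() {
  sort -k2,2nr | head -n "${1:-5}"
}

# Usage on a live system (empty headers suppress the header line,
# and args prints the full command line, like the c key in top):
#   ps -e -o pid= -o pcpu= -o args= | busiest 5
```

The full command line matters here because several QRadar services run as Java processes and are easier to tell apart by their arguments than by the process name alone.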

Perform the different tuning tasks on the system configuration depending on the scenarios discussed here:

  1. If the ariel process consumes the most CPU, one or more expensive searches are usually running. Refer to QRadar performance and what causes slow searches.
  2. If the ecs-ec-ingress service takes the most CPU time, one or more log sources might be poorly configured. If you know of recent log source changes and want to monitor them from the command line, refer to QRadar using command line troubleshoot syslog event source. High CPU load at ecs-ec-ingress can also be caused by environmental changes that impact event collection.
  3. If the ecs-ec service is the top consumer, the problem might be in the parsing stage of the event pipeline. High CPU load at ecs-ec is most commonly caused by an expensive custom event property or log source extension. For more details, refer to Identifying DSM and optimized custom property issues.
  4. If ecs-ep consumes the most CPU resources, correlation is causing the issue. The most common cause is expensive rules, so look for expensive rules or recent modifications to the rule set. To troubleshoot the correlation engine, refer to QRadar: Troubleshooting custom rule performance with findExpensiveCustomRules.sh.
  5. If the postgresql database is the top CPU consumer, SQL queries might be overloading the system. In many cases, the cause is queries associated with offenses or reference sets. To read more about improving performance related to reference sets and TTLs, refer to Overview adding editing deleting reference sets.
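The scenarios above amount to a lookup from the top CPU consumer to the area to investigate. The following sketch expresses that lookup as a shell function. It assumes the service name appears somewhere in the command line shown by top, which might not hold on every system. The function name likely_cause and the wording of the messages are illustrative only.

```shell
# Hypothetical helper: maps the command line of the top CPU consumer
# to the area to investigate, following the scenarios in this article.
likely_cause() {
  case "$1" in
    # ecs-ec-ingress must be tested before ecs-ec, which it contains.
    *ecs-ec-ingress*) echo "log source configuration or event collection" ;;
    *ecs-ec*)         echo "parsing: custom event property or log source extension" ;;
    *ecs-ep*)         echo "correlation: expensive rules" ;;
    *ariel*)          echo "expensive searches" ;;
    *postgres*)       echo "SQL queries: offenses or reference sets" ;;
    *)                echo "not covered by these scenarios" ;;
  esac
}

# Usage: pass the command column of the busiest process, for example:
#   likely_cause "java ecs-ep"
```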

Each scenario calls for a different tuning task, depending on which process shows the high CPU load.
Before tuning, first ensure that the EPS load does not exceed the capacity of the allocated hardware. For virtual appliances, refer to System requirements for virtual appliances.

Document Location

Worldwide


Document Information

Modified date:
01 April 2023

UID

ibm16551086