IBM Support

QRadar: How to monitor and check if the CPU is bound or overloaded

How To


Summary

This article explains how to monitor a QRadar® system's CPU load averages to determine whether the CPU is bound or overloaded.

The load average shows the average number of tasks and processes that the CPU is handling at any given time. Every system's load average is different, depending on your deployment and on the tasks and processes that QRadar® or the managed host handles. For example, some systems are consistently busy and others are mostly idle, depending on what the system needs to do.

Objective

This article helps users monitor and check the CPU load average. You can view CPU load average statistics from the QRadar command-line interface (CLI). You can compare historical load averages over a period of time with the current load averages to identify trends that might need to be addressed.

Check the system load average when system performance starts to degrade. For example, you might notice that the event pipeline, searches, event processing, or navigation in the UI is slow or hangs.
 
Important: If the load average is higher than the number of CPUs assigned to the system, you might be facing a performance issue. For example, a load average of 30 is not a concern on a system with 64 CPUs, but a load average of 30 on a system with 16 CPUs might cause performance issues.
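The comparison described above can be sketched as a small shell helper. The function name load_status is hypothetical (it is not a QRadar utility), and the "load greater than CPU count" threshold is taken directly from the note above; treat it as a rule of thumb rather than a hard limit.

```shell
# Hypothetical helper: compare a load average with the CPU count.
# Usage: load_status LOAD CPUS
load_status() {
    # awk performs the floating-point comparison that plain sh cannot do.
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (load + 0 > cpus + 0) print "overloaded"; else print "ok"
    }'
}
```

For example, load_status 30 64 prints ok, while load_status 30 16 prints overloaded. On a live system the arguments could be supplied as load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)".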

Steps


1. To confirm the number of CPUs assigned to the QRadar system, run the command: cat /proc/cpuinfo | grep "model name" | wc -l
[root@console~]# cat /proc/cpuinfo | grep "model name" | wc -l
56
[root@console~]#
Alternatively, run lscpu and check the CPU(s) value:
[root@console ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
Stepping:              1
CPU MHz:               3199.853
CPU max MHz:           3500.0000
CPU min MHz:           1200.0000
BogoMIPS:              5199.73
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K
NUMA node0 CPU(s):     0-13,28-41
NUMA node1 CPU(s):     14-27,42-55
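As a shorter equivalent of the cat/grep/wc pipeline, grep -c can count the logical CPUs directly. This is only a sketch: count_cpus is a hypothetical name, and it assumes the x86 /proc/cpuinfo layout in which each logical CPU has one line that begins with "processor". The optional file argument exists only so that the function can be tried against a saved copy of the file.

```shell
# Hypothetical helper: count logical CPUs listed in a cpuinfo file.
# Usage: count_cpus [FILE]   (FILE defaults to /proc/cpuinfo)
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```

On most Linux systems, nproc reports the number of processing units available to the current process, which normally matches this count.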
2. To see the current load average, run the command: uptime
$ uptime
 17:48:24 up  4:11,  1 user,  load average: 19.25, 21.40, 23.46
The load average is reported as three values: the 1-minute average, the 5-minute average, and the 15-minute average.
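The same three values are also exposed by the Linux kernel in /proc/loadavg, which is easier to parse in scripts than the uptime output. A minimal sketch follows; show_loadavg is a hypothetical name, and the optional file argument is only there for trying it on saved data.

```shell
# Hypothetical helper: label the three load averages.
# Usage: show_loadavg [FILE]   (FILE defaults to /proc/loadavg)
show_loadavg() {
    # The first three fields are the 1-, 5-, and 15-minute averages.
    awk '{ print "1min=" $1, "5min=" $2, "15min=" $3 }' "${1:-/proc/loadavg}"
}
```

As a general reading aid, a 1-minute value well above the 15-minute value suggests that load is currently rising, and a 1-minute value well below it suggests that load is falling.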
3. To see live information on the load average, use the top command. The load average appears in the first line of the output:
top - 12:57:55 up 136 days,  3:30,  2 users,  load average: 98.48, 102.69, 109.14
Tasks: 931 total,  69 running, 862 sleeping,   0 stopped,   0 zombie
%Cpu(s): 76.7 us, 21.6 sy,  1.2 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st

In this example, the system has 16 CPU cores and a load average of 98.48 over the last minute, 102.69 over the last 5 minutes, and 109.14 over the last 15 minutes. For 16 CPUs this is too much load, and the system might show performance issues.
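To make the arithmetic in this example explicit, the sketch below divides each load average in a top (or uptime) header line by the CPU count. The function name per_cpu_load is hypothetical, and it assumes the header uses the "load average: a, b, c" layout shown above with a period as the decimal separator.

```shell
# Hypothetical helper: load average per CPU from a top/uptime header line.
# Usage: <header line on stdin> | per_cpu_load CPUS
per_cpu_load() {
    cpus="$1"
    # Extract the three averages, then divide each by the CPU count.
    sed -n 's/.*load average: \([0-9.]*\), \([0-9.]*\), \([0-9.]*\).*/\1 \2 \3/p' |
        awk -v n="$cpus" '{ printf "%.1f %.1f %.1f\n", $1 / n, $2 / n, $3 / n }'
}
```

Piping the header line from the example into per_cpu_load 16 gives roughly 6.2, 6.4, and 6.8, meaning the load is more than six times the CPU count. On a live system, the header can be captured non-interactively with top -b -n 1 | head -n 1.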
4. Check the historical load average values. The collected performance data is stored under /var/log/sa/
[root@qradar_box]# ls /var/log/sa/sa*
/var/log/sa/sa01  /var/log/sa/sa09  /var/log/sa/sa18  /var/log/sa/sa26   /var/log/sa/sar03  /var/log/sa/sar12  /var/log/sa/sar20  /var/log/sa/sar28
/var/log/sa/sa02  /var/log/sa/sa10  /var/log/sa/sa19  /var/log/sa/sa27   /var/log/sa/sar04  /var/log/sa/sar13  /var/log/sa/sar21  /var/log/sa/sar29
/var/log/sa/sa03  /var/log/sa/sa12  /var/log/sa/sa20  /var/log/sa/sa28   /var/log/sa/sar05  /var/log/sa/sar14  /var/log/sa/sar22  /var/log/sa/sar30
/var/log/sa/sa04  /var/log/sa/sa13  /var/log/sa/sa21  /var/log/sa/sa29   /var/log/sa/sar06  /var/log/sa/sar15  /var/log/sa/sar23  /var/log/sa/sar31
/var/log/sa/sa05  /var/log/sa/sa14  /var/log/sa/sa22  /var/log/sa/sa30   /var/log/sa/sar07  /var/log/sa/sar16  /var/log/sa/sar24
/var/log/sa/sa06  /var/log/sa/sa15  /var/log/sa/sa23  /var/log/sa/sa31   /var/log/sa/sar08  /var/log/sa/sar17  /var/log/sa/sar25
/var/log/sa/sa07  /var/log/sa/sa16  /var/log/sa/sa24  /var/log/sa/sar01  /var/log/sa/sar09  /var/log/sa/sar18  /var/log/sa/sar26
/var/log/sa/sa08  /var/log/sa/sa17  /var/log/sa/sa25  /var/log/sa/sar02  /var/log/sa/sar11  /var/log/sa/sar19  /var/log/sa/sar27
Each file represents a day of the month. For example, sa01 is the first day of the month, sa02 is the second day, and sa30 or sa31 is the last day. Each file covers a full day, with a sample recorded every 10 minutes. When a new month begins, each day's file is overwritten with newly collected data, so only about the last 30 days of history are available.
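Because the file name is derived from the day of the month, the path for a given day can be built in a script. A minimal sketch, where sa_file_for_day is a hypothetical helper and the /var/log/sa/saDD naming is taken from the listing above:

```shell
# Hypothetical helper: build the sa file path for a day of the month.
# Usage: sa_file_for_day DAY
sa_file_for_day() {
    # Strip one leading zero so values such as "08" are not read as octal.
    printf '/var/log/sa/sa%02d\n' "${1#0}"
}
```

For example, sa_file_for_day 3 prints /var/log/sa/sa03. With GNU date, sa_file_for_day "$(date -d yesterday +%d)" gives the file for the previous day.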
To read a file, run:

sar -q -f /var/log/sa/sa[day of the month]
In this output, focus on the following metrics:
  • ldavg-1: Load average for the last minute
  • ldavg-5: Load average for the last 5 minutes
  • ldavg-15: Load average for the last 15 minutes
[root@qradar_box~]# sar -q -f /var/log/sa/sa10
Linux 3.10.0-1062.1.1.el7.x86_64 (qradar_box)      08/10/2020      _x86_64_        (16 CPU)

12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
12:10:01 AM         9      2569      1.06      1.71      1.87         0
12:20:01 AM        10      2543      0.79      1.15      1.51         0
12:30:01 AM         5      2543      1.12      1.60      1.68         0
12:40:01 AM        18      2555      1.15      1.09      1.37         0
12:50:01 AM        17      2555      2.06      1.58      1.46         0
01:00:01 AM        11      2556      0.74      1.39      1.53         0
01:10:01 AM         3      2545      1.98      2.56      2.35         0
01:20:01 AM         5      2548      0.97      1.98      2.18         0
01:30:01 AM         8      2542      2.67      2.14      2.05         0
01:40:01 AM         3      2555      0.95      1.66      1.98         0
01:50:01 AM        16      2565      1.74      1.55      1.76         0
02:00:02 AM        22      2571     20.54      8.94      4.49         0
Average:           12      2555      3.05      3.58      3.66         0
Or, to page through the output:
sar -q -f /var/log/sa/sa10 | less
In this sample, you can compare the load averages across the day, which gives a broader picture of the system's CPU load and performance.
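To find the busiest interval in a day without reading every row, the sar -q output can be filtered with awk. The sketch below is an illustrative example, not an IBM tool: peak_ldavg1 is a hypothetical name, and it locates the ldavg-1 column from the header line because the column position can vary with the sysstat version and the time format.

```shell
# Hypothetical helper: print the sar -q row with the highest ldavg-1.
# Usage: sar -q -f FILE | peak_ldavg1
peak_ldavg1() {
    awk '
        # Header line: remember which column holds ldavg-1.
        /ldavg-1/ {
            for (i = 1; i <= NF; i++) if ($i == "ldavg-1") col = i
            next
        }
        # Data lines: skip the Average summary and any non-numeric rows.
        col && $1 != "Average:" && $col ~ /^[0-9.]+$/ {
            if ($col + 0 > max) { max = $col + 0; line = $0 }
        }
        END { if (line != "") print line }
    '
}
```

Run against the sample above (sar -q -f /var/log/sa/sa10 | peak_ldavg1), this would print the 02:00:02 AM row, where ldavg-1 reaches 20.54.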

Additional Information

Document Location

Worldwide

Document Information

Modified date:
04 September 2020

UID

ibm16257833