Troubleshooting
Problem
A QRadar server is receiving events, but they are not being processed through the system, and the server is logging the real-time clock (rtc) error message "rtc interrupts".
Symptom
The following symptoms might also be present in the environment:
- A QRadar server, in Web Console UI > Admin > System and License Management, might display status "Unknown".
This unknown status symptom can occur if the hypervisor isn't giving the server enough time on the stack to respond to ping requests. It can also occur if the related errors filled the /var/log partition faster than log rotation could keep up, causing the services to be shut down to protect the environment.
- On the problem server, the services ecs-ec, ecs-ep, ecs-ec-ingress, or hostcontext are not in an active state.
The hypervisor not giving QRadar enough time on the stack causes errors to accumulate in critical services, causing the services to fail.
- In an HA environment, the excessive errors might cause the ha_manager service to go offline and unmount the store partition.
When you try to start the failed service, an error stating "cannot access PARTITION: Input/output error" might be received:
ecs-ec[PID]: chown: cannot access ‘/store/jheap’: Input/output error
ecs-ec[PID]: chmod: cannot access ‘/store/jheap’: Input/output error
ecs-ec[PID]: mkdir: cannot create directory ‘/store/jheap/ecs-ec.ecs-ec’: Input/output error
ecs-ec[PID]: chmod: cannot access ‘/store/jheap/ecs-ec.ecs-ec’: Input/output error
systemd[1]: ecs-ec.service: control process exited, code=exited status=1
systemd[1]: Failed to start Event Correlation Services Event Collector.
df -h /store
NOTE: The hypervisor can still be overloaded even when none of these symptoms are present.
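The symptoms above can be checked from a shell on the affected server. The following is a minimal sketch, not an IBM-provided tool: the function names (usage_pct, report_services, check_partition) and the 90% threshold are illustrative assumptions, while the service names and the /var/log and /store paths come from the symptom list.

```shell
#!/bin/bash
# Sketch only: helper names and the 90% default threshold are assumptions.

# Print the use% (digits only) of the filesystem that holds the given path.
usage_pct() {
    df -P "$1" 2>/dev/null | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Print "name: state" for each service named in the symptom list.
report_services() {
    local svc state
    for svc in ecs-ec ecs-ep ecs-ec-ingress hostcontext; do
        # systemctl is-active prints the unit state (active, inactive, failed, ...)
        state=$(systemctl is-active "$svc" 2>/dev/null) || true
        echo "$svc: ${state:-unknown}"
    done
}

# Report partition usage and flag it when at or above the threshold.
check_partition() {
    local path=$1 limit=${2:-90} pct
    pct=$(usage_pct "$path")
    if [ -z "$pct" ]; then
        echo "$path: cannot read usage (partition might be unmounted or failing)"
    elif [ "$pct" -ge "$limit" ]; then
        echo "$path: ${pct}% used, at or above ${limit}%"
    else
        echo "$path: ${pct}% used"
    fi
}
```

Calling report_services, then check_partition /var/log and check_partition /store, prints one line per service and per partition. A partition that cannot be read is reported rather than treated as healthy, which matches the unmounted store partition case described above.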
Cause
RTC interrupt messages are generated when the server's internal clock misses the hypervisor clock. Many rtc interrupt messages usually mean that the hypervisor is overloaded.
Diagnosing The Problem
- Check the dmesg logs or /var/log/messages for rtc errors:
grep -i rtc /var/log/messages
blk_update_request: I/O error, dev sdb, sector N
XFS (sdbN): metadata I/O error in "xlog_iodone" at daddr N len N error 5
XFS (sdbN): xfs_do_force_shutdown(0x2) called from line N of file fs/xfs/xfs_log.c. Return address = N
kernel: hpet1: lost N rtc interrupts
- Check whether the last CPU load average value is greater than the number of cores on the VM:
lscpu | grep 'CPU(s):' | head -n 1 && uptime
- Check that more than 1 GB of RAM is 'available':
free -h
- Verify the disk '%iowait' is not high:
iostat -c
- Verify that the system meets recommended specifications for your event or flow load.
- If your system is getting the rtc interrupts, the operating system is not reporting CPU, RAM, or disk I/O issues, and you meet the required system specifications, then your issue is related to the hypervisor.
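The guest-side checks in this list can be gathered into one script. The following is a hedged sketch rather than a supported diagnostic: every function name is hypothetical, the "lost N rtc interrupts" pattern is taken from the sample log line above, and "last load average" is read here as the 15-minute figure (the last value that uptime prints), which is an assumption. The article gives no numeric limit for a "high" %iowait, so the value is only printed.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, not part of QRadar.

# Sum N over every "lost N rtc interrupts" line in a log file.
count_lost_rtc() {
    awk 'match($0, /lost [0-9]+ rtc interrupts/) {
             split(substr($0, RSTART, RLENGTH), f, " ")
             total += f[2]
         }
         END { print total + 0 }' "${1:-/var/log/messages}"
}

# Succeed (exit 0) when the load value is greater than the core count.
load_exceeds_cores() {
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load + 0 > cores + 0) }'
}

# Print MemAvailable in kB from a meminfo-style file.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}

# Read "iostat -c" output on stdin and print the %iowait column.
iowait_pct() {
    awk '/avg-cpu/ { getline; print $4; exit }'
}

# Run all checks against the live system and print one line per check.
run_checks() {
    local lost load1 load5 load15 rest cores avail
    lost=$(count_lost_rtc /var/log/messages 2>/dev/null) || true
    echo "lost rtc interrupts logged: ${lost:-log not readable}"

    # Assumption: compare the 15-minute load average with the core count.
    read -r load1 load5 load15 rest < /proc/loadavg
    cores=$(nproc)
    if load_exceeds_cores "$load15" "$cores"; then
        echo "load average $load15 exceeds $cores cores"
    else
        echo "load average $load15 is within $cores cores"
    fi

    avail=$(mem_available_kb)
    # 1048576 kB is 1 GiB.
    if [ "${avail:-0}" -gt 1048576 ]; then
        echo "MemAvailable ${avail} kB: more than 1 GB"
    else
        echo "MemAvailable ${avail:-unknown} kB: 1 GB or less"
    fi

    echo "%iowait: $(iostat -c 2>/dev/null | iowait_pct)"
}
```

Calling run_checks on the affected server prints the four results together. If rtc interrupts are being lost while load, memory, and %iowait all look normal, the conclusion in the last step applies and the hypervisor is the likely source.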
Resolving The Problem
Contact your hypervisor administrator to troubleshoot performance, because the hypervisor is overloaded.
NOTE: Most hypervisors have default limits or restrictors on resources so that virtual machines (VMs) don't consume all resources and some are left over for hypervisor services and tasks. For example, some have a limit set to 75% usage for CPU, RAM, and disk. A hypervisor that shows less than 100% usage can still be overloaded.
NOTE: If the /var/log partition filled as a result of a slow hypervisor, it can be cleaned up.
Related Information
Document Location
Worldwide
Document Information
Modified date:
22 August 2022
UID
ibm16607781