(rtc)error message "
The following symptoms might also be present in the environment:
- A QRadar server, in Web Console UI > Admin > System and License Management, might display status "Unknown".
This unknown status symptom can occur if the hypervisor does not give the server enough time on the stack to respond to ping requests. It can also occur if the related errors fill the
/var/log partition faster than log rotation can keep up, which causes the services to shut down to protect the environment.
- On the problem server, services ecs-ec, ecs-ep, ecs-ec-ingress, or hostcontext are not in an active state.
When the hypervisor does not give QRadar enough time on the stack, errors accumulate in critical services until the services fail.
- In an HA environment, the excessive errors might cause the service
ha_manager to go offline and unmount the /store partition.
When you try to start the failed service, an error stating "cannot access PARTITION:
Input/output error" might be received:
ecs-ec[PID]: chown: cannot access ‘/store/jheap’: Input/output error
ecs-ec[PID]: chmod: cannot access ‘/store/jheap’: Input/output error
ecs-ec[PID]: mkdir: cannot create directory ‘/store/jheap/ecs-ec.ecs-ec’: Input/output error
ecs-ec[PID]: chmod: cannot access ‘/store/jheap/ecs-ec.ecs-ec’: Input/output error
systemd: ecs-ec.service: control process exited, code=exited status=1
systemd: Failed to start Event Correlation Services Event Collector.
df -h /store
NOTE: The hypervisor can still be overloaded even if none of these symptoms are present.
rtc interrupt messages usually mean the hypervisor is overloaded.
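To get a rough sense of how many rtc interrupts were lost, the counts in the kernel messages can be added up. The following is a minimal sketch, assuming the messages follow the "lost N rtc interrupts" form; sum_lost_rtc is a hypothetical helper name, not a QRadar utility.

```shell
#!/bin/sh
# Sketch: total the N values from kernel lines such as
#   kernel: hpet1: lost N rtc interrupts
# sum_lost_rtc is a hypothetical helper, not part of QRadar.
sum_lost_rtc() {
    # Reads log lines on stdin and prints the summed count (0 if none match).
    awk '/lost [0-9]+ rtc interrupts/ {
        for (i = 1; i < NF; i++)
            if ($i == "lost" && $(i + 2) == "rtc") total += $(i + 1)
    } END { print total + 0 }'
}

# Example usage on a live system (assumes read access to the log file):
#   sum_lost_rtc < /var/log/messages
```

A total that keeps growing between runs suggests the condition is ongoing rather than a one-time event.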
Diagnosing The Problem
- Check the dmesg logs or messages for rtc errors:
grep -i rtc /var/log/messages
blk_update_request: I/O error, dev sdb, sector N
XFS (sdbN): metadata I/O error in "xlog_iodone" at daddr N len N error 5
XFS (sdbN): xfs_do_force_shutdown(0x2) called from line N of file fs/xfs/xfs_log.c. Return address = N
kernel: hpet1: lost N rtc interrupts
- Check whether the last CPU load average value is greater than the number of cores on the VM:
lscpu | grep 'CPU(s):' | head -n 1 && uptime
- Check for more than 1G RAM 'available':
- Verify the disk '%iowait' is not high:
- Verify that the system meets recommended specifications for your event or flow load.
- If your system is getting the
rtc interrupts, the operating system is not reporting CPU, RAM, or disk I/O issues, and you meet the required system specifications, then your issue is related to the hypervisor.
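The load average and available memory checks above can be scripted. The following is a minimal sketch, assuming a Linux host with /proc/loadavg, /proc/meminfo, and nproc available; load_exceeds_cores and mem_above_1g are hypothetical helper names, and 1G is taken to mean 1048576 kB.

```shell
#!/bin/sh
# Sketch: compare the last (15-minute) load average with the core count,
# and check that more than 1G of RAM is available.
# Both helper names are hypothetical, not QRadar utilities.
load_exceeds_cores() {
    # $1 = load average (for example 5.20), $2 = number of cores
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load > cores) }'
}

mem_above_1g() {
    # $1 = MemAvailable in kB; assumes 1G = 1048576 kB
    [ "$1" -gt 1048576 ]
}

if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load15=$(awk '{ print $3 }' /proc/loadavg)
    cores=$(nproc)
    if load_exceeds_cores "$load15" "$cores"; then
        echo "Load average $load15 is greater than $cores cores"
    else
        echo "Load average $load15 is within $cores cores"
    fi
fi

if [ -r /proc/meminfo ]; then
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    if [ -n "$avail_kb" ] && mem_above_1g "$avail_kb"; then
        echo "More than 1G RAM available"
    else
        echo "1G or less RAM available, or MemAvailable not reported"
    fi
fi
```

The script only reads from /proc and prints results, so it does not change anything on the host.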
Resolving The Problem
If the /var/log partition filled as a result of a slow hypervisor, it can be cleaned up.
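Before any cleanup, it can help to confirm how full the partition is. The following is a minimal read-only sketch, assuming POSIX df -P output; partition_use_pct is a hypothetical helper name, and the script does not delete or rotate any logs.

```shell
#!/bin/sh
# Sketch: report how full a partition is, based on POSIX `df -P` output.
# partition_use_pct is a hypothetical helper, not a QRadar utility.
partition_use_pct() {
    # Reads `df -P` output on stdin and prints the Capacity column of the
    # first data row without the percent sign.
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Read-only check of the /var/log partition.
if [ -d /var/log ]; then
    echo "/var/log is $(df -P /var/log | partition_use_pct)% full"
fi
```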
22 August 2022