QRadar: Overloaded Hypervisor Causes Instability

Troubleshooting

Problem

The QRadar server receives events, but they are not processed through the system, and the server reports the real-time clock (RTC) error message "rtc interrupts".
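One way to gauge how often the error occurs is to count the log lines that contain the string "rtc interrupts". The following is a minimal sketch, not an IBM-provided tool: the function name count_rtc_errors is hypothetical, and the default log path /var/log/messages is an assumption that might not match where kernel messages are written on your system.

```shell
# Count log lines that mention "rtc interrupts".
# ASSUMPTION: /var/log/messages is only a default guess; pass the log
# file that actually holds kernel messages on your server.
count_rtc_errors() {
    log_file="${1:-/var/log/messages}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the function successful.
    grep -c "rtc interrupts" "$log_file" || true
}
```

For example, run count_rtc_errors with no argument to use the assumed default path, or pass a specific file such as a rotated log. A count that keeps growing between runs suggests that the condition is ongoing.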

Symptom

The following symptoms might also be present in the environment:

In the web console, under Admin > System and License Management, a QRadar server might display the status "Unknown".
This status can occur when the hypervisor does not give the server enough processing time to respond to ping requests. It can also occur when the related errors fill the /var/log partition faster than log rotation can keep up, which causes the services to shut down to protect the environment.
On the affected server, one or more of the services ecs-ec, ecs-ep, ecs-ec-ingress, or hostcontext are not in an active state.
When the hypervisor does not give QRadar enough processing time, errors accumulate in critical services and cause those services to fail.
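The state of the services named above can be checked in one pass. The following is a minimal sketch that assumes the services are managed by systemd, as the systemd messages later in this document suggest. The function name check_qradar_services is hypothetical.

```shell
# Print the state of each service from the symptom list and return
# non-zero if any of them is not active.
check_qradar_services() {
    rc=0
    for svc in ecs-ec ecs-ep ecs-ec-ingress hostcontext; do
        # systemctl is-active prints the unit state (active, inactive,
        # failed, ...) and exits non-zero when the unit is not active.
        state=$(systemctl is-active "$svc" 2>/dev/null) || true
        [ -n "$state" ] || state="unknown"
        echo "$svc: $state"
        [ "$state" = "active" ] || rc=1
    done
    return "$rc"
}
```

Run check_qradar_services on the affected server. Any line that does not end in "active" identifies a service that matches this symptom.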

In an HA environment, the excessive errors might cause the ha_manager service to go offline and unmount the /store partition.
An attempt to start the failed service might then return an error stating "cannot access PARTITION: Input/output error":

ecs-ec[PID]: chown: cannot access ‘/store/jheap’: Input/output error
ecs-ec[PID]: chmod: cannot access ‘/store/jheap’: Input/output error
ecs-ec[PID]: mkdir: cannot create directory ‘/store/jheap/ecs-ec.ecs-ec’: Input/output error
ecs-ec[PID]: chmod: cannot access ‘/store/jheap/ecs-ec.ecs-ec’: Input/output error
systemd[1]: ecs-ec.service: control process exited, code=exited status=1
systemd[1]: Failed to start Event Correlation Services Event Collector.

Further investigation shows that the partition is not mounted:

df -h /store
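The df output has to be read by eye. For a scripted check, the mount table can be tested directly. The following is a minimal sketch that reads /proc/mounts, the kernel's list of current mounts. The function names is_mounted and report_mount are hypothetical, and the optional second argument exists only so that the check can be run against a saved copy of the mount table.

```shell
# Succeed when the given path appears as a mount point in a mount table.
# The second argument defaults to /proc/mounts.
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Print a one-line verdict for the given path.
report_mount() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "$1 is mounted"
    else
        echo "$1 is NOT mounted"
    fi
}
```

For example, report_mount /store prints whether /store is currently mounted on the server where it is run.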

NOTE: The hypervisor can still be overloaded even when none of these symptoms are present.

Document Location

Worldwide

