Troubleshooting
Problem
Symptom
Cause
To confirm whether a real issue exists, compare the Event Processor Distribution graph with the EPS graph. If the EPS graph does not show a similar dip at the same time, the gap observed in the Event Processor Distribution graph is due to performance issues at the custom rule engine on that host, which delay correlation and the writing of events to disk.
Gaps in Event Rate EPS Graph
Gaps here are caused by ecs-ec on the Console host.
Events related to System Notifications are collected on each managed host and sent to the Console, which receives them through the ecs-ec-ingress service. This includes the StatFilter events, which report the EPS count. All of these events therefore go through the complete pipeline on the Console. If the Console has performance degradation at the custom rule engine or in device parsing, these events might not be processed every minute, which results in gaps in the EPS graph. The same happens if the Console exceeds its allocated license and drops events, a common scenario because the Console is usually configured with a low license. A gap therefore does not mean that the event processor shown with the gap is dropping events; the Console might simply not be processing the event that reports the EPS count.
How can we validate whether performance degradation is impacting graph accuracy?
If you encounter this scenario, check the allocated license and make sure that it is enough for burst handling. For more information, see License management in IBM Documentation.
Accumulator errors on Console
The EPS graph takes the information from accumulated data. The accumulator service aggregates against all events seen in the previous 60 seconds and has 60 seconds to process that interval. If the accumulator service crashes due to an out of memory exception, the information from a few minutes might not be available, thus resulting in a gap in the graph.
The accumulator can also cause gaps when it falls behind or fails to complete accumulation for all configured global views within 60 seconds. For more information, see Accumulator is falling behind in IBM Documentation.
The search used to accumulate EPS-related data can sometimes cause this behavior when the default view is deleted. Inadequate tuning of global views or a higher than usual event volume can also cause the accumulator to fall behind. For more information about this issue, see IJ31082: 'ACCUMULATOR FALLING BEHIND' NOTIFICATIONS AFTER DEFAULT GLOBAL VIEWS FOR EVENT RATE AND FLOW RATE HAVE BEEN RECREATED.
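Because the accumulator works in 60-second intervals, a crash or a late run shows up as missing per-minute data points. The following is a minimal sketch of how such gaps could be located in exported data. It assumes a hypothetical two-column export ("epoch_seconds eps", one sample per line, sorted by time); this format and the function name find_eps_gaps are illustrative and are not part of QRadar.

```shell
#!/bin/bash
# Print each gap between consecutive samples that exceeds the expected interval.
# Input on stdin: one "epoch_seconds eps" pair per line, sorted by time.
# This input format is a hypothetical export, not a QRadar file format.
find_eps_gaps() {
    local interval="${1:-60}"   # accumulation interval in seconds
    awk -v step="$interval" '
        NR > 1 && ($1 - prev) > step {
            # Number of whole intervals with no sample between prev and current.
            printf "gap: %d missing interval(s) after %d\n", ($1 - prev) / step - 1, prev
        }
        { prev = $1 }
    '
}
```

For example, samples at 0, 60, 120, 300, and 360 seconds would be reported as two missing intervals after 120. Comparing the reported times with accumulator errors from the same period can help show whether the gap comes from the accumulator.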
Errors at ingress preventing events from reaching ecs-ec
Although this cause is seen less often, make sure that the "Stream" threads are loaded by ingress. These threads take events from ecs-ec-ingress and pass them to ecs-ec. If they fail to load, ingress cannot send data to ecs-ec on the same managed host.
To run threadTop.sh:
- SSH to the Console.
- From the Console use SSH to log in to the Managed Host having the issue.
- Type the command:
/opt/qradar/support/threadTop.sh -p 7787 -e 'Stream*'
System Time: 4/11/2022 at 12:14:56.108
-------------- ----- ---------- ------------------------------------------
Server         ID    MSecs      Name
-------------- ----- ---------- ------------------------------------------
7787           499   2          StreamProcessorThread
7787           89    0          StreamListenerThread
-------------- ----- ---------- ------------------------------------------
2 Total (2/2000)
If the "Stream" threads are missing from the output, the issue is related to APAR IJ36277: QRADAR CAN FAIL TO PASS EVENTS FROM ECS-EC-INGRESS COLLECTION PROCCESS TO THE ECS-EC PROCESS
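The threadTop check can be scripted so that the result does not depend on reading the table by eye. This is a minimal sketch that assumes the column layout of the sample output (Server, ID, MSecs, Name) and treats two Stream threads as the expected count, based only on that sample. The function names are hypothetical.

```shell
#!/bin/bash
# Count threadTop rows whose Name column starts with "Stream".
# Reads threadTop.sh output on stdin and assumes the thread name is the
# fourth whitespace-separated field, as in the sample output.
count_stream_threads() {
    awk '$1 ~ /^[0-9]+$/ && $4 ~ /^Stream/ { n++ } END { print n + 0 }'
}

# Report whether the expected number of Stream threads is present.
# The default of 2 matches the sample output; it is an assumption.
check_stream_threads() {
    local expected="${1:-2}"
    local found
    found="$(count_stream_threads)"
    if [ "$found" -ge "$expected" ]; then
        echo "OK: $found Stream thread(s) loaded"
    else
        echo "WARNING: only $found Stream thread(s) found, expected $expected"
        return 1
    fi
}

# Usage on the managed host:
#   /opt/qradar/support/threadTop.sh -p 7787 -e 'Stream*' | check_stream_threads
```

The summary line ("2 Total ...") is ignored because it has no fourth field, so only real thread rows are counted.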
Environment
Resolving The Problem
The workaround for this behavior is:
- SSH to the Console as a root or SUDO user with sufficient privileges.
- From the Console SSH to the Managed Host with issues.
Important: Event Collection might be interrupted while the service restarts. Schedule a maintenance window before doing the next step.
- Restart the ecs-ec-ingress service by typing the command:
systemctl restart ecs-ec-ingress
Results
If you still fail to see the "Stream" threads in threadTop after restarting ecs-ec-ingress, open a case with IBM QRadar Support for further assistance.
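The restart and the follow-up check can be combined in one helper. This is a sketch under stated assumptions: the 60-second default wait is an arbitrary settle time and not an IBM-documented value, and the function names are hypothetical. The two commands it wraps are the ones given in this document. Run it on the affected managed host, and only inside a maintenance window.

```shell
#!/bin/bash
# Wrappers around the two commands from this document. They are separate
# functions so that they can be replaced with stubs for a dry run.
run_restart()   { systemctl restart ecs-ec-ingress; }
run_threadtop() { /opt/qradar/support/threadTop.sh -p 7787 -e 'Stream*'; }

# Restart ecs-ec-ingress, wait, then check that Stream threads are listed.
# $1: seconds to wait after the restart (assumed default: 60).
# Returns 0 if threads are found, 1 if still missing, 2 if the restart failed.
restart_and_verify() {
    local wait_seconds="${1:-60}"
    run_restart || { echo "Restart of ecs-ec-ingress failed"; return 2; }
    sleep "$wait_seconds"
    if run_threadtop | grep -Eq 'Stream[[:alnum:]]*Thread'; then
        echo "Stream threads are loaded"
    else
        echo "Stream threads still missing: open a case with IBM QRadar Support"
        return 1
    fi
}

# Usage on the managed host:
#   restart_and_verify 60
```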
Document Location
Worldwide
Document Information
Modified date:
14 December 2022
UID
ibm16540284