Troubleshooting
Problem
In IBM QRadar, a Windows log source might have the status ERROR with one of the following messages:
- "Too many open files"
- "Connection error"
- "File not found"
- "Login failed"
In addition, the ecs-ec-ingress service can have the status restarting, failed, or running, with a timestamp from hours or days ago.
Symptom
- In some cases, the /var/log partition fills because the SMB errors are written to qradar.log and qradar.error faster than log rotation can cycle the files. In this case, instead of an out-of-memory error occurring, hostcontext shuts down services by design to protect the system.
- The ecs-ec-ingress service produces an out-of-memory error. On a service restart, it might take time to process the dump file and reload the RPMs. As a result, events might not be processed for some time during the restart.
NOTE: To confirm that the ecs-ec-ingress service is hung because of SMB, look for smb in the out-of-memory errors:
grep -i -e ingress -e OutOfMemoryError -e smb /var/log/qradar.error | less +G
[ecs-ec-ingress.ecs-ec-ingress] [Folder Monitor [{HOSTNAME}][smb://X.X.X.X/{FOLDER_PATH}]] java.lang.OutOfMemoryError: Java heap space
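To gauge how many of the out-of-memory errors are tied to SMB folder monitors, the grep in the NOTE can be turned into a simple count. This is a minimal sketch that assumes bash and GNU grep; the function name smb_oom_summary is hypothetical and is not a QRadar tool.

```shell
# Hypothetical helper (not part of QRadar): summarize whether the
# OutOfMemoryError lines in a QRadar error log mention SMB folder monitors.
# Usage: smb_oom_summary /var/log/qradar.error
# Prints "<total OOM lines> <OOM lines that also mention smb>".
smb_oom_summary() {
  local log_file="$1"
  local total smb
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
  # callers that use "set -e" from aborting on a clean log.
  total=$(grep -c 'OutOfMemoryError' "$log_file" || true)
  # Of the OOM lines, count those that reference smb (case-insensitive).
  smb=$(grep 'OutOfMemoryError' "$log_file" | grep -ci 'smb' || true)
  echo "$total $smb"
}
```

If most of the OutOfMemoryError lines also mention smb, the SMB log sources are the likely cause of the hung ingress service.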
Cause
Too many open files and out-of-memory errors result from poorly configured remote polling log sources.
When many Windows and SMB protocol log sources are in error status, they might need to be disabled until they are resolved to reduce the impact on the system. Remote polling protocols, such as SMB, need a polling interval of 900 seconds rather than the default 10 seconds. The longer interval gives the system time to process retry attempts, so that during a connectivity issue it does not fork off many new threads while an existing thread is still processing retries.
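To show the scale of the difference, the arithmetic below compares how many polling attempts failing log sources make per hour at the default 10 seconds and at the recommended 900 seconds. It is an illustrative sketch that assumes exactly one attempt per log source per polling interval; the actual number of retries and threads that QRadar creates might differ, and the function names are hypothetical.

```shell
# Illustrative arithmetic only: assumes each log source makes exactly one
# polling attempt per interval. Function names are hypothetical.
polls_per_hour() {
  local interval_seconds="$1"
  echo $(( 3600 / interval_seconds ))
}

# Attempts per hour for a given number of failing log sources, at the
# default 10-second interval and at the recommended 900-second interval.
compare_intervals() {
  local failing_sources="$1"
  echo "10s: $(( failing_sources * $(polls_per_hour 10) )) attempts/hour"
  echo "900s: $(( failing_sources * $(polls_per_hour 900) )) attempts/hour"
}
```

For a hypothetical 50 failing log sources, compare_intervals 50 reports 18000 attempts per hour at 10 seconds and 200 at 900 seconds, a 90-fold reduction under this assumption.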
Diagnosing The Problem
Before the out-of-memory error occurs or the /var/log partition fills, several SMB log sources produce errors:
grep -i smb /var/log/qradar.log | less +G
[ecs-ec-ingress.ecs-ec-ingress] [Folder Monitor [{HOSTNAME}][smb://{FOLDER_PATH}]] java.lang.OutOfMemoryError: Java heap space
[ecs-ec-ingress.ecs-ec-ingress] [ReceiveThread] com.q1labs.frameworks.core.ThreadExceptionHandler: [INFO] [NOT:0000006000][X.X.X.X/- -] [-/- -]38,Finalizer thread in Native Code, WAITING, blocked-count: 162, blocked-time: N ms, wait-count: N, wait-time: N ms, user cpu: N nanos, sys/user cpu time: N nanos, Folder Monitor [{HOSTNAME}][smb://{FOLDER_PATH}] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@35f5122c
[ecs-ec-ingress.ecs-ec-ingress] [Folder Monitor [{HOSTNAME}][smb://{FOLDER_PATH}]] com.q1labs.semsources.sources.windowsiis.WindowsIisTailProvider: [ERROR] [NOT:0000003000][X.X.X.X/- -] [-/- -]TailingException: null
[ecs-ec-ingress.ecs-ec-ingress] [Folder Monitor [{HOSTNAME}][smb://{FOLDER_PATH}]] com.q1labs.semsources.sources.smbtail.foldermonitor.IFilesystemObject$FilesystemObjectException
Resolving The Problem
- Although all of the Windows and SMB log sources might be in an error state, you can also compile a list of the log sources that are in error:
grep -i smb /var/log/qradar.error | less +G
[ecs-ec-ingress.ecs-ec-ingress] [Folder Monitor [{LOG_SOURCE_IDENTIFIER|IP|HOSTNAME}][smb://X.X.X.X/{FILE_PATH}]] Caused by: com.q1labs.semsources.sources.smbtail.io.jnq.JNQException: Unable to create/open - {FILE_PATH} (0xC0000043)
[ecs-ec-ingress.ecs-ec-ingress] [Folder Monitor [{LOG_SOURCE_IDENTIFIER|IP|HOSTNAME}][smb://X.X.X.X/{FILE_PATH}]] com.q1labs.semsources.sources.smbtail.io.SmbFileWithRetries: [ERROR] [NOT:0000003000][X.X.X.X/- -] [-/- -][smb://X.X.X.X/LogFiles/W3SVC13] exists(): Failed: Access error for file W3SVC13 status = -1073741790 (0xc0000022) (0xC0000022)
[ecs-ec-ingress.ecs-ec-ingress] [Folder Monitor [{LOG_SOURCE_IDENTIFIER|IP|HOSTNAME}][smb://X.X.X.X/{FILE_PATH}]] com.q1labs.semsources.sources.windowsdhcp.WindowsDHCPTailProvider: [ERROR] [NOT:0000003000][X.X.X.X/- -] [-/- -]TailingException: Unable to create/open - j50.log status = -1073741757 (0xc0000043) (0xC0000043)
The LOG_SOURCE_IDENTIFIER, IP, or HOSTNAME value in each line identifies the log source that needs a longer polling interval to process the retry attempts, or that needs to be disabled.
- Disable SMB log sources that are in error state and are not needed.
- Increase the polling interval to 900 seconds (15 minutes) or higher for any log source that is producing errors but needs to stay enabled.
- If ecs-ec-ingress is hung, restart hostcontext and ecs-ec-ingress with the explicit commands so that other monitored services are not affected.
WARNING: Restarting ecs-ec-ingress temporarily stops event collection. Administrators with strict outage policies are advised to complete the next step during a scheduled maintenance window for their organization.
/opt/qradar/init/hostcontext -q restart
systemctl restart ecs-ec-ingress
- If the /var/log partition was full, resolve the log rotation issues.
- After about 2 minutes, hostcontext starts any services that it stopped.
- Any log sources that are disabled or left in error status need to be investigated by your server administrator. Common causes of SMB log sources in error status are that the folder or file no longer exists, permissions changed, the server was decommissioned, or the server is intermittently offline.
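The list of log sources in error can also be compiled automatically from the "Folder Monitor [...][smb://...]" prefix that the ecs-ec-ingress error lines share. This is a sketch that assumes bash with GNU grep and sed, and assumes the log lines follow the format of the sample messages in this document; the function name smb_error_sources is hypothetical.

```shell
# Hypothetical helper (not part of QRadar): list the SMB folder monitors that
# appear in a QRadar error log, most frequent first.
# Assumes lines contain "Folder Monitor [<identifier>][smb://<path>]" as in
# the sample ecs-ec-ingress messages. Requires GNU grep and sed.
# Usage: smb_error_sources /var/log/qradar.error
# Output: "<count> <identifier> <smb:// path>" per log source.
smb_error_sources() {
  local log_file="$1"
  # -o prints only the "Folder Monitor [...][smb://...]" part of each line.
  grep -o 'Folder Monitor \[[^]]*]\[smb://[^]]*]' "$log_file" \
    | sed -E 's/^Folder Monitor \[([^]]*)]\[([^]]*)]$/\1 \2/' \
    | sort | uniq -c | sort -rn \
    | sed 's/^[[:space:]]*//'
}
```

The sources at the top of the output produce the most error lines and are the first candidates for a longer polling interval or for being disabled.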
Document Location
Worldwide
[{"Type":"MASTER","Line of Business":{"code":"LOB24","label":"Security Software"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSBQAC","label":"IBM Security QRadar SIEM"},"ARM Category":[{"code":"a8m0z000000cwt0AAA","label":"Log Source"}],"ARM Case Number":"TS004608033","Platform":[{"code":"PF016","label":"Linux"}],"Version":"7.4.3;7.5.0"}]
Document Information
Modified date: 03 May 2023
UID: ibm16589593