Troubleshooting
Problem
QRadar services might stop processing data when the error "Too many open files" is logged in /var/log/qradar.log.
This article helps administrators identify when the operating system reaches its limit on the number of available file descriptors. The limit covers both open files and socket connections.
Symptom
The following symptom can be seen when the issue occurs:
Errors similar to the following are displayed in /var/log/qradar.log. Each message contains the name of the affected service:
ecs-ep[20911]: WARNING: RMI TCP Accept-7799: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=7799] throws
ecs-ep[20911]: java.net.SocketException: Too many open files (Accept failed)
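To see which services are logging the error and how often, the matching lines can be tallied per service. The following is a minimal sketch, not an IBM-provided tool: the function name count_fd_errors is hypothetical, and it assumes that each message carries a service[PID]: tag as in the sample lines above.

```shell
#!/bin/bash
# Hypothetical helper (not part of QRadar): tally "Too many open files"
# messages per service in a log file.
# Assumes each message carries a "service[PID]:" tag, as in the sample above.
count_fd_errors() {
    grep 'Too many open files' "$1" \
        | grep -oE '[A-Za-z0-9_.-]+\[[0-9]+\]:' \
        | sed -E 's/\[[0-9]+\]:$//' \
        | sort | uniq -c \
        | awk '{print $2, $1}'   # print "service count"
}

# Usage on a QRadar host:
#   count_fd_errors /var/log/qradar.log
```

A service whose count keeps growing between checks is the one to use as <service affected> in the diagnostic commands.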
Diagnosing The Problem
The affected service on the Managed Host stops processing data, but its process is still running.
Confirm the process status with the following command. Replace <service affected> with the process reported in the error:
systemctl status <service affected>
For this example, the process reported in the Symptom section of this article is ecs-ep:
systemctl status ecs-ep
To find the number of files that the service has open at a given moment, run the following command and check whether the count is approaching the open-file limit.
- SSH to the QRadar console.
- Replace <service affected> with the service name from the qradar.log event:
echo -n "Sockets: "; lsof -p $(systemctl status <service affected> | grep "Main PID" | awk '{print $3}') |grep -i " sock " |wc -l; echo -n "Files: "; lsof -p $(systemctl status <service affected> | grep "Main PID" | awk '{print $3}') |wc -l
- To inspect the open files in more detail, create the lsof_with_timestamp.txt file, which records the full list of open files at specific times. The command repeats the listing every 180 seconds, separating each listing with a timestamp marker, until it is interrupted:
lsof -r180m====%T==== -nP -p $(systemctl status <service affected> | grep 'Main PID' |awk '{print $3}') > lsof_with_timestamp.txt
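The counts above are easier to interpret next to the limit that is actually in force for the process. As a rough sketch, the Linux /proc filesystem exposes both values. The helper name fd_usage is hypothetical, and the script is an illustration rather than an IBM-provided tool. Run it as root so that /proc/<pid>/fd is readable.

```shell
#!/bin/bash
# Hypothetical helper (not part of QRadar): print the number of open
# descriptors and the soft "Max open files" limit for one PID.
fd_usage() {
    local pid="$1"
    local open
    local limit
    # Each entry under /proc/<pid>/fd is one open descriptor
    # (regular file, socket, pipe, and so on).
    open=$(ls "/proc/${pid}/fd" | wc -l)
    # In /proc/<pid>/limits, the fourth field of the "Max open files"
    # row is the soft limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/${pid}/limits")
    echo "open=${open} limit=${limit}"
}

# Usage on a QRadar host (ecs-ep is the service from the example):
#   fd_usage "$(systemctl show -p MainPID ecs-ep | cut -d= -f2)"
```

The lsof totals are usually somewhat higher than the open= value, because lsof also prints a header line and entries such as memory-mapped libraries that do not consume a descriptor.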
A service restart or a "Deploy Full Configuration" resolves the issue temporarily because it resets the file descriptors in use.
Warning: A Full Deploy might cause some service interruption. For more information about the impact, see:
QRadar: Impact of Deploy Full Configuration on events, flows, and offenses
Resolving The Problem
If the issue persists and the service keeps stopping periodically, the administrator can raise the default open-file limit that QRadar systems ship with. Follow these steps to increase the limit:
Note: Read all of the steps before you apply them in a production environment. If you have any doubts or questions, contact QRadar Support for assistance.
- SSH to the QRadar console.
- Confirm the current limit with the following command:
grep ULIMIT /store/configservices/staging/globalconfig/nva.conf
ULIMIT_MAX_OPEN_FILES=15360
ULIMIT_MAX_OPEN_FILES_DEFAULT=15360
- Create a backup of the nva.conf file with the following commands. The first command creates the folder for the backup file if it does not already exist, and the second command creates the backup:
mkdir -p /store/IBM_Support/
cp /store/configservices/staging/globalconfig/nva.conf /store/IBM_Support/
- Open the file /store/configservices/staging/globalconfig/nva.conf with a text editor such as vi, search for the line that contains the string ULIMIT_MAX_OPEN_FILES, and change the value on that line.
To avoid compromising the memory of the service due to high usage, increase the value in small increments. For example, change the default value of 15360 to 20000.
- Do a Full Deploy so that all hosts take the new value.
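The manual edit in the steps above could also be scripted. The following is a sketch under stated assumptions, not an IBM-provided procedure: set_open_files_limit is a hypothetical helper, it assumes the setting is stored as a KEY=value line, and it assumes that only ULIMIT_MAX_OPEN_FILES is meant to change while ULIMIT_MAX_OPEN_FILES_DEFAULT stays as it is. If used, run it after the backup step and before the Full Deploy.

```shell
#!/bin/bash
# Hypothetical helper (not part of QRadar): set ULIMIT_MAX_OPEN_FILES in an
# nva.conf-style file. Assumes the setting is a KEY=value line that begins
# at the start of a line, as in the grep output shown earlier.
set_open_files_limit() {
    local conf="$1"
    local value="$2"
    # Accept only a plain non-negative integer.
    case "$value" in
        ''|*[!0-9]*) echo "invalid value: $value" >&2; return 1 ;;
    esac
    # The "=" right after the key keeps ULIMIT_MAX_OPEN_FILES_DEFAULT untouched.
    sed -i "s/^ULIMIT_MAX_OPEN_FILES=.*/ULIMIT_MAX_OPEN_FILES=${value}/" "$conf"
    # Show the resulting lines for verification.
    grep '^ULIMIT_MAX_OPEN_FILES' "$conf"
}

# Usage, after the backup step:
#   set_open_files_limit /store/configservices/staging/globalconfig/nva.conf 20000
```

Printing the matching lines after the change gives the same check as the earlier grep command, so a typo in the value is visible before the deploy.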
Result:
The open-file limit on the Managed Host is increased, which lets the host handle more files and socket connections. If the issue continues, contact QRadar Support for assistance.
Document Location
Worldwide
Document Information
Modified date:
22 May 2024
UID
ibm16600915