|Partition (QRadar 7.4.x)|Critical Services Stop|Article|
|/|Yes, at 95%|Technote #0881470|
|/store|Yes, at 95%|Technote #0882066|
|/transient|Yes, at 95%|Technote #0882064|
|/storetmp|Yes, at 95%|Technote #0882068|
|/opt|Yes, at 95%|Technote #0882070|
If any of these partitions exceeds 90% usage, a warning notification is sent to the UI. A matching line is also logged to /var/log/qradar.log, such as the one below:
Apr 14 18:10:31 ::ffff:18.104.22.168 [hostcontext.hostcontext] [cb4eb5ec-2cae-4075-ab9b-48d2e63dafd5/SequentialEventDispatcher] com.q1labs.hostcontext.ds.DiskSpaceSentinel: [WARN] [NOT:0150064102][22.214.171.124/- -] [-/- -]System disk resources above warning threshold
Important: For the partitions listed in the table as critical for system functionality, system services are stopped to prevent the partition from becoming completely full and causing further issues. A maximum threshold notification is sent to the UI and can also be seen in qradar.log, as shown below:
Apr 14 18:15:31 ::ffff:126.96.36.199 [hostcontext.hostcontext] [cb4eb5ec-2cae-4075-ab9b-48d2e63dafd5/SequentialEventDispatcher] com.q1labs.hostcontext.ds.DiskSpaceSentinel: [ERROR] [NOT:0150064100][188.8.131.52/- -] [-/- -]Disk usage on at least one disk has exceeded the maximum threshold level of 0.95. The following disks have exceeded the maximum threshold level: /transient, . Processes are being shut down to prevent data corruption. To minimize the disruption in service, reduce disk usage on this system.
For the other partitions, denoted as non-critical, the disk sentry check gives a warning when the threshold is met, but system processes are not stopped and no outage occurs.
For reference, when the system recovers back below the threshold, a notification is sent to the UI and the following message is seen in qradar.log:
Apr 14 18:18:31 ::ffff:184.108.40.206 [hostcontext.hostcontext] [cb4eb5ec-2cae-4075-ab9b-48d2e63dafd5/SequentialEventDispatcher] com.q1labs.hostcontext.ds.DiskSpaceSentinel: [INFO] [NOT:0150066100][220.127.116.11/- -] [-/- -]System disk resources back to normal levels
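To review these sentinel messages without scrolling through the whole log, a small shell helper can filter on the logger class that appears in the three examples above. This is a minimal sketch, not a QRadar utility: the function name disk_sentinel_events is hypothetical, and the default path is the /var/log/qradar.log location mentioned earlier.

```shell
# Hypothetical helper: print only the DiskSpaceSentinel entries from a log file.
# The default path is the qradar.log location referenced in this article.
disk_sentinel_events() {
  # -F treats the class name as a fixed string, not a regular expression.
  grep -F 'com.q1labs.hostcontext.ds.DiskSpaceSentinel' "${1:-/var/log/qradar.log}"
}

# Usage (show the five most recent entries):
#   disk_sentinel_events | tail -n 5
```

The [WARN], [ERROR], and [INFO] tags in the filtered lines correspond to the warning, maximum threshold, and recovery notifications described above.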
Diagnosing The Problem
The first step in diagnosing the problem is determining which partition is affected. Run the df -h command to list the size and usage of each partition. An example output is shown below:
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/rootrhel-root          13G  2.9G  9.7G  23% /
devtmpfs                           16G     0   16G   0% /dev
tmpfs                              16G   20K   16G   1% /dev/shm
tmpfs                              16G  1.7G   15G  11% /run
tmpfs                              16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/rootrhel-var          5.0G  208M  4.8G   5% /var
/dev/sda3                          32G  4.1G   28G  13% /recovery
/dev/mapper/rootrhel-home        1014M   33M  982M   4% /home
/dev/sda2                        1014M  224M  791M  23% /boot
/dev/mapper/rootrhel-tmp          3.0G   53M  3.0G   2% /tmp
/dev/mapper/rootrhel-opt           13G  5.1G  7.5G  41% /opt
/dev/mapper/rootrhel-storetmp      15G   34M   15G   1% /storetmp
/dev/mapper/rootrhel-varlog        15G  3.6G   12G  24% /var/log
/dev/mapper/storerhel-transient    40G   40G  236M 100% /transient
/dev/mapper/rootrhel-varlogaudit  3.0G  205M  2.8G   7% /var/log/audit
tmpfs                             3.2G     0  3.2G   0% /run/user/0
/dev/drbd0                        158G   78G   80G  50% /store
From here, you can see that the /transient partition is the one with the issue. Now that you have identified the partition having the issue, go to the Resolving The Problem section to find details about finding large files/directories on the partition. Also, be sure to review the linked article for your partition issue in the Cause section.
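On systems with many mount points, it can help to narrow the df output to only the partitions at or above a chosen usage percentage. The sketch below is an illustration rather than a QRadar tool: flag_full_partitions is a hypothetical helper name, and it assumes the POSIX df -P layout, in which the fifth column is the usage percentage and the sixth is the mount point.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount point
# and usage of every partition at or above the given percentage (default 90,
# the warning level described in this article).
flag_full_partitions() {
  awk -v t="${1:-90}" '
    NR > 1 {                    # skip the header line
      use = $5
      sub(/%/, "", use)         # "100%" -> "100"
      if (use + 0 >= t) print $6, $5
    }'
}

# Usage:
#   df -P | flag_full_partitions 90
```

With the example output above, this would report only the /transient partition.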
Resolving The Problem
General troubleshooting for large files or directories:
Generally speaking, there are a couple of reasons you may have high disk usage on your QRadar partition(s).
Large file(s) on the partition causing it to fill
Lots of smaller files build up over time and cause a certain directory on the partition to grow excessively
For the first situation, the find command can help. Run find /partition -xdev -type f -size +200M | xargs ls -lhSr to list all the files over 200MB on a specific partition, sorted from smallest to largest. An example output can be seen below:
# find /transient -xdev -type f -size +200M | xargs ls -lhSr
-rw-r--r-- 1 root root 39G Apr 14 19:25 /transient/bigfile.img
Note: You may need to modify the size threshold to a higher or lower value based on your output, but 200M is generally a good starting point.
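Two edge cases are worth keeping in mind with the find and xargs pipeline: file names that contain spaces are split into separate arguments, and when find matches nothing, GNU xargs still runs ls once, which lists the current directory instead. A slightly more defensive variant is sketched below. It assumes GNU findutils (the -print0, -0, and -r options), and list_large_files is a hypothetical helper name, not a QRadar command.

```shell
# Hypothetical helper: list files larger than a size threshold on one
# filesystem, smallest first.
#   $1 = directory or mount point to search
#   $2 = size threshold in find syntax (default 200M)
list_large_files() {
  # -print0 / -0 keep file names with spaces intact;
  # -r stops xargs from running ls when find matches nothing.
  find "$1" -xdev -type f -size "+${2:-200M}" -print0 | xargs -0 -r ls -lhSr
}

# Usage:
#   list_large_files /transient 200M
```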
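For the second situation, you can use the du command to get recursive directory sizes for a specific partition or directory. Run either of the following:
du -xch /partition | sort -h
du -chaxd1 | sort -h
The first command returns the size of every directory under the /partition or directory you specify. The second, run from inside the directory you want to inspect, returns the size of each file and directory one level below the current directory. Both outputs are sorted from smallest to largest and end with a total.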
You can use this output to identify which directory is consuming the most disk space on the partition, and then look in that directory to see which files are responsible.
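Because the du output can run to thousands of lines on a busy partition, it is often enough to look at only the last few entries. The following sketch wraps the same du and sort approach and keeps the largest entries; top_dirs is a hypothetical helper name, and sort -h assumes GNU coreutils.

```shell
# Hypothetical helper: show the N largest directories under a path,
# staying on one filesystem (-x), sorted from smallest to largest.
#   $1 = directory or mount point to inspect
#   $2 = number of entries to keep (default 10)
top_dirs() {
  du -xh "$1" 2>/dev/null | sort -h | tail -n "${2:-10}"
}

# Usage:
#   top_dirs /transient 10
```

The last line of the output is the path you passed in, since its size includes everything beneath it.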
For more information on finding large files consuming disk space in QRadar, see Technote 1988496 - QRadar: Finding files that use the most disk space.
12 June 2021