IBM Support

QRadar: Event Processor not sending logs due to disk space issues

Troubleshooting


Problem

In a distributed environment, an Event Processor (EP) cannot send logs to the Console if the ecs-ep process is down. The EP can disable processes if disk usage grows too high.

Cause

When disk usage on at least one disk reaches 95%, QRadar automatically shuts down its processes to prevent data corruption. The affected appliance stops operating normally until disk usage is reduced.

Diagnosing The Problem

 Some basic troubleshooting tips
  1. Complete any search, add a filter by Event Processor, and from the View list, select Real Time (streaming).

    If you do not see any streaming events, this can indicate an issue with the ecs-ep process, which manages real-time event streaming from the Event Processor appliance to the Console.
  2. To verify that the ecs-ep process is running, enter the following command on the command line of the QRadar appliance: service ecs-ep status

    If the service reports that ecs-ep is stopped, the administrator should attempt to restart the process by using the following command:
    service ecs-ep start
  3. The most frequent cause of stopped processes on configured systems in a deployment is insufficient disk space. When this occurs, a system notification is generated to alert the administrator, and an error is written to /var/log/qradar.error:

    [root@qradar119 ~]# grep -i "disk usage" /var/log/qradar.error
    Nov  4 06:52:11 ::ffff:IP Address [hostcontext.hostcontext] [c0ac7072-70e9-40ea-9d87-62ac50d090c3/SequentialEventDispatcher] com.q1labs.hostcontext.ds.DiskSpaceSentinel: [ERROR] [NOT:0150064100][IP Address/- -] [-/- -]Disk usage on at least one disk has exceeded the maximum threshold level of 0.95. The following disks have exceeded the maximum threshold level: /store, Processes are being shut down to prevent data corruption. To minimize the disruption in service, reduce disk usage on this system.

     
  4. Other ways to verify disk space on your appliances.

    To view disk usage:
    In QRadar 7.2.8
    # df -Th
    Filesystem     Type   Size  Used Avail Use% Mounted on
    /dev/sda7      ext4    20G  7.1G   12G  38% /
    tmpfs          tmpfs   12G     0   12G   0% /dev/shm
    /dev/sda1      ext4    97M   39M   53M  43% /boot
    /dev/sda8      xfs     30G   29G  2.0G  95% /store
    /dev/sda6      ext4   9.8G  152M  9.2G   2% /store/tmp
    /dev/sda9      xfs    7.7G  6.3G  1.4G  82% /store/transient
    /dev/sda5      ext4   9.9G  463M  8.9G   5% /var/log


    In QRadar 7.3
    [root@QRadar732 support]# df -h
    Filesystem                        Size  Used Avail Use% Mounted on
    /dev/mapper/rootrhel-root          13G  5.3G  7.3G  43% /
    devtmpfs                           16G     0   16G   0% /dev
    tmpfs                              16G   20K   16G   1% /dev/shm
    tmpfs                              16G   34M   16G   1% /run
    tmpfs                              16G     0   16G   0% /sys/fs/cgroup
    /dev/sda3                          32G  4.1G   28G  13% /recovery
    /dev/sda2                        1014M  163M  852M  17% /boot
    /dev/mapper/rootrhel-opt           13G  2.7G  9.9G  22% /opt
    /dev/mapper/rootrhel-tmp          3.0G   41M  3.0G   2% /tmp
    /dev/mapper/rootrhel-var          5.0G  175M  4.9G   4% /var
    /dev/mapper/rootrhel-home        1014M   33M  982M   4% /home
    /dev/mapper/storerhel-store       142G   33G  109G  24% /store
    /dev/mapper/rootrhel-varlog        15G  387M   15G   3% /var/log
    /dev/mapper/rootrhel-storetmp      15G   43M   15G   1% /storetmp
    /dev/mapper/rootrhel-varlogaudit  3.0G  131M  2.9G   5% /var/log/audit
    /dev/mapper/storerhel-transient    36G   36M   36G   1% /transient
    tmpfs                             3.1G     0  3.1G   0% /run/user/0


    To get the disk usage on all your appliances, use the following command:
    /opt/qradar/support/all_servers.sh -C -k "df -Th"

  5. Lastly, you can run /opt/qradar/support/deployment_info.sh. This script collects information about all systems in the deployment, including disk space used, hardware, appliance type, and serial number, and writes it to a CSV file.
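
The log check in step 3 and the disk usage check in step 4 can be scripted. The following is a minimal sketch, not an IBM-supplied tool. The function names mounts_over_threshold and sentinel_disks are hypothetical, the default of 95 mirrors the 0.95 threshold in the log message, and the parsing assumes POSIX-format df -P output (without the Type column that -T adds) and mount points that contain no spaces.

```shell
#!/usr/bin/env bash
# Sketch only: helper names below are hypothetical, not part of QRadar.

# Reads `df -P` output on stdin and prints each mount point whose Use%
# is at or above the threshold (default 95, mirroring the 0.95 level
# reported by DiskSpaceSentinel). Column 5 is Use%, column 6 the mount.
mounts_over_threshold() {
    local threshold="${1:-95}"
    awk -v t="$threshold" '
        NR > 1 {                  # skip the header row
            pct = $5
            sub(/%$/, "", pct)    # "95%" -> "95"
            if (pct + 0 >= t) print $6, pct "%"
        }'
}

# Reads qradar.error lines on stdin and prints the disk list that the
# DiskSpaceSentinel message names after "maximum threshold level: ".
sentinel_disks() {
    sed -n 's/.*maximum threshold level: \(.*\), Processes are being shut down.*/\1/p'
}

# Example use on an appliance (not executed here):
#   df -P | mounts_over_threshold 95
#   grep -i "disk usage" /var/log/qradar.error | sentinel_disks
```

Passing the threshold as an argument makes it possible to warn earlier, for example at 90, before processes are shut down.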

Resolving The Problem

Reduce storage utilization on the affected mount to below 95%. To do this, either change the Default Retention bucket setting to a shorter period or move data to an external storage device.

After you free up space on the appliance, services restart automatically when the system detects that disk usage is below 92%.
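
The two thresholds form a gap: processes stop at 95% but do not restart until usage is below 92%. The following sketch models that behavior for illustration only. It is not QRadar's implementation, the function name next_state is hypothetical, and it assumes whole-number percentages and treats 95 itself as the stop point.

```shell
#!/usr/bin/env bash
# Illustrative model of the stop/restart thresholds described above.
# Not QRadar code; next_state is a hypothetical helper.
# Usage: next_state PREVIOUS_STATE USAGE_PERCENT
# Prints "running" or "stopped".
next_state() {
    local prev="$1" usage="$2"
    local stop_at=95 resume_below=92
    if [ "$usage" -ge "$stop_at" ]; then
        echo stopped            # threshold reached: processes shut down
    elif [ "$usage" -lt "$resume_below" ]; then
        echo running            # enough space freed: services restart
    else
        echo "$prev"            # 92% to 94%: state does not change
    fi
}
```

In this model, an appliance that stopped at 95% stays stopped at 93%, so freeing just enough space to drop under 95% is not sufficient to bring services back.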




Document Information

Modified date:
02 May 2019

UID

swg21690477