IBM Support

QRadar: Troubleshooting disk usage issues on NFS backup directories

Troubleshooting


Problem

How do I troubleshoot a QRadar host when an NFS mount for /store/backup reports incorrect disk usage?

Symptom

On a QRadar host with an NFS mount, the df -h /store/backup command reports more disk usage than the files in the mounted directory account for. For example, the "Used" column in the df -h /store/backup output shows a higher value than the total reported by the du -hs /store/backup command.
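
To put a number on the discrepancy, both values can be read in bytes and subtracted. The following is a minimal sketch, not an IBM-provided utility: usage_gap_bytes and report_usage_gap are hypothetical helper names, and the sketch assumes GNU coreutils versions of df and du that support -B1 and --output.

```shell
#!/bin/bash
# Sketch: quantify the gap between df and du for one path.
# usage_gap_bytes and report_usage_gap are hypothetical helpers.

# Print the difference (df "Used" minus du total); both arguments are bytes.
usage_gap_bytes() {
    local df_used="$1" du_used="$2"
    echo $(( df_used - du_used ))
}

# Read both values in bytes for a path and print them with the gap.
report_usage_gap() {
    local path="$1" df_used du_used
    # --output=used prints a header line plus the value; keep the value only.
    df_used=$(df -B1 --output=used "$path" | tail -n 1 | tr -d ' ')
    # du -s prints "<bytes><TAB><path>"; keep the first field.
    du_used=$(du -s -B1 "$path" | cut -f1)
    echo "df used: ${df_used} bytes, du used: ${du_used} bytes"
    echo "gap: $(usage_gap_bytes "$df_used" "$du_used") bytes"
}
```

After the functions are loaded in a shell, running report_usage_gap /store/backup prints both values. A large positive gap matches the symptom described here.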

Resolving The Problem

Work through the following procedures in order until the issue is resolved:

File descriptors are still being held open for deleted files in /store/backup

  1. First log in to the CLI of the QRadar host. Run this command to determine whether any processes are holding file descriptors for deleted files in /store/backup:
    lsof /store/backup | egrep '(PID|deleted)'
    Example output with a deleted file that still has an open file descriptor:
    [Image: lsof output listing a file marked "(deleted)"]
    Deleted files are marked with "(deleted)" at the end of the file name in the "NAME" column. The "PID" column shows the process ID of the service that holds the file descriptor. For more details on the PID, use this command syntax:
    ps -ef --forest | grep <PID>
  2. To release the file descriptor, stop or restart the service, or kill the process. Use this command syntax to stop the process:
    kill -SIGTERM <PID>
    The disk space used by the deleted file is then released and counted as free space. Confirm by running the df -h /store/backup command.
    Note: Be sure you understand what process you are killing and verify it is not a core process. If you do kill a core process, restart the process to recover.
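
To estimate how much space the deleted files are holding before stopping any process, the lsof output can be totaled. This is a rough sketch under stated assumptions: sum_deleted_bytes is a hypothetical helper, it assumes the default lsof column layout (field 7 is SIZE/OFF, field 8 is NODE), and it skips rows where SIZE/OFF is an offset rather than a plain byte count.

```shell
#!/bin/bash
# Sketch: total the size of deleted-but-open files listed by lsof.
# sum_deleted_bytes is a hypothetical helper; it reads lsof output on stdin.

sum_deleted_bytes() {
    awk '
        # Keep rows whose last field is "(deleted)" and whose SIZE/OFF is a
        # plain byte count; count each inode (NODE) only once.
        $NF == "(deleted)" && $7 ~ /^[0-9]+$/ && !seen[$8]++ { total += $7 }
        END { printf "%d\n", total }
    '
}
```

Usage: lsof /store/backup | sum_deleted_bytes. If the total is close to the difference between the df and du values, open file descriptors are the likely cause.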

Underlying data counts toward disk usage in the local /store/backup directory

  1. Use this command to unmount the backup volume. Ensure that you are not currently in that directory and that no files from that volume are in use; otherwise, unmounting fails:
    umount /store/backup
  2. Once it is unmounted, check the underlying local /store/backup directory. Run this command to determine whether there are any files there:
    du -sch /store/backup/.[!.]* /store/backup/* | sort -h
    The output from this command shows all files in /store/backup, including hidden files. If there are any files in this directory, copy them to another directory, or delete them if you confirm they are not needed.
  3. Remount the NFS volume:
    mount /store/backup
  4. Check that the disk usage values now match:
    du -hs /store/backup
    df -h /store/backup
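
Before deleting anything in step 2, it can help to confirm that /store/backup is really the local directory and not the NFS volume. The sketch below is an illustration only: is_nfs_fstype and check_backup_mount are hypothetical helpers, and the sketch assumes the findmnt utility from util-linux is available on the host.

```shell
#!/bin/bash
# Sketch: confirm whether a path is still served from NFS before touching files.
# is_nfs_fstype and check_backup_mount are hypothetical helpers.

# Succeed if the given filesystem type string is an NFS type.
is_nfs_fstype() {
    case "$1" in
        nfs|nfs4) return 0 ;;
        *)        return 1 ;;
    esac
}

# Look up the filesystem type that currently backs the path and report it.
check_backup_mount() {
    local path="${1:-/store/backup}" fstype
    fstype=$(findmnt -n -o FSTYPE --target "$path")
    if is_nfs_fstype "$fstype"; then
        echo "$path is on NFS ($fstype): the volume is still mounted"
    else
        echo "$path is on local storage ($fstype): the volume is unmounted"
    fi
}
```

Running check_backup_mount after step 1 is expected to report local storage, and after step 3 it is expected to report NFS again.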

Higher disk usage values come directly from the NFS server

Note: This procedure requires use of Wireshark or a similar application to review a .pcap file. These steps also require packets from the NFS server to be unencrypted.
 
  1. Initiate a packet capture from the command line of the QRadar host. This logs NFS packets from the NFS host to a .pcap file. Use this syntax:
    tcpdump -s262144 -i any -c20 -w /storetmp/nfs_test.pcap host <NFS Host IP>
    This captures 20 packets, which are expected to be enough to capture disk usage details from the NFS server. Adjust the "-c<packet number>" parameter as needed.
  2. Once the tcpdump command completes, the resulting file is here: /storetmp/nfs_test.pcap
    Use an SCP/SFTP client to pull this file from the QRadar host to your local computer.
  3. Use Wireshark™ or a similar application capable of reading .pcap files to review the nfs_test.pcap file.
  4. Find a packet from the NFS server that includes a GETATTR reply. The IP of the NFS server is seen under the "Source" column. That packet has the attributes of the actual disk volume as communicated by the NFS server. Select the packet and expand the "Network File System" section. Then, expand the following: Operations > Opcode: GETATTR > Attr mask
    The "Attr mask" branch shows the disk usage and total size as reported directly by the NFS server.
    Example (Wireshark):
    [Image: Wireshark packet details with the GETATTR "Attr mask" branch expanded]
  5. The "space_avail", "space_free", and "space_total" values show the totals reported by the NFS server, in bytes. Compare them with the "Size" and "Avail" columns in the df -h /store/backup output on the QRadar host, which displays usage in gibibytes. In the example, this volume reports a rounded-up total of 13GiB with 9.5GiB available.
  6. If the NFS server reports higher disk usage than the du -hs /store/backup output shows on the QRadar host, the issue is external: either the problem is on the NFS server itself, or another host that uses the same mount is holding file descriptors that cannot be seen from the QRadar host.
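
Because the GETATTR values are in bytes and df -h prints gibibytes, a small conversion makes the comparison in step 5 easier. This is a minimal sketch: bytes_to_gib is a hypothetical helper, and the byte values in the usage note are illustrative numbers chosen to equal 13GiB and 9.5GiB, not values read from the example capture.

```shell
#!/bin/bash
# Sketch: convert a byte count from the GETATTR reply into GiB for comparison
# with df -h. bytes_to_gib is a hypothetical helper.

bytes_to_gib() {
    # 1 GiB = 1024^3 = 1073741824 bytes; print one decimal place.
    awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / 1073741824 }'
}
```

For example, bytes_to_gib 13958643712 prints 13.0 and bytes_to_gib 10200547328 prints 9.5. Small differences from the df -h output are expected because df -h rounds up. Alternatively, df -B1 /store/backup prints the host-side values in bytes so that no conversion is needed.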

Document Location

Worldwide

Document Information

Modified date:
31 August 2022

UID

ibm16595749