IBM Support

QRadar: General Health checklist

Question & Answer


Question

How can I verify that my deployment is healthy?

Answer

To assess the general health of your deployment, it is helpful to have standard checks to follow in order to verify core functionality on your QRadar Console and Managed Hosts.

Is the QRadar Console UI accessible?

  • If you cannot access the UI and cannot access the system via SSH:
    • Use an IMM or hypervisor session to log in to the console and confirm that the QRadar Console host is running and responsive. If you can connect this way and the console is running, a network infrastructure problem might be blocking your access. Contact your network administrator.
  • If you cannot access the UI but can access the Console system via SSH:
    • Run systemctl status tomcat. If the status is active or activating, the service might still be initializing. In some cases, you need to wait 10 to 15 minutes after tomcat starts for the UI to become accessible.
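
The tomcat check above can be sketched as a small shell helper. This is only an illustration: the function name tomcat_hint is hypothetical and not part of QRadar, and it assumes you pass it the state string printed by systemctl is-active tomcat.

```shell
# Hypothetical helper (not a QRadar utility): turn a systemd state string
# into a short hint that follows the guidance in this checklist.
tomcat_hint() {
  case "$1" in
    active|activating)
      # Per the checklist, the UI can take 10 to 15 minutes to come up.
      echo "wait: tomcat may still be initializing" ;;
    *)
      echo "investigate: tomcat is not running (state: $1)" ;;
  esac
}

# Example use on the Console:
#   tomcat_hint "$(systemctl is-active tomcat)"
```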

Are all Managed Hosts showing the expected Status?

  • If any of the Managed Hosts are not in Active status, check the following:
    • From the Console command-line interface (CLI), try establishing an SSH connection to each non-Active Managed Host (MH). If the connection fails or times out:
    • Does a Full Deploy task complete successfully for all Managed Hosts?
      • After the Deploy task completes and returns status for all hosts, review any Managed Hosts that report Timed Out or otherwise fail to deploy
    • Check that all partitions have sufficient free disk space:
      • From the Console CLI, run /opt/qradar/support/all_servers.sh -C 'df -h' to view the current disk space usage on all Managed Hosts' partitions
        • Connect to the MH via SSH as 'root' and type: df -h. If the disk space usage ('Use%') for any partition (other than /recovery) is above 85%, free up space on that partition and check again. For more information on how to resolve disk space issues, see the Support 101 page on Troubleshooting disk space issues.
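
As a rough sketch of the 85% check above, the following shell function reads df output on standard input and lists the partitions over the limit, skipping /recovery. The function name over_threshold is hypothetical and not part of QRadar, and the sketch assumes that mount points contain no spaces and that the first input line is the df header.

```shell
# Hypothetical helper (not a QRadar utility): print mount points whose Use%
# exceeds a limit (default 85), ignoring /recovery.
# Expects df output on stdin: header line first, Use% in the second-to-last
# column, mount point in the last column.
over_threshold() {
  awk -v limit="${1:-85}" '
    NR > 1 && $NF != "/recovery" {
      use = $(NF - 1)
      sub(/%$/, "", use)          # strip the trailing percent sign
      if (use + 0 > limit) print $NF, use "%"
    }'
}

# Example use on a Managed Host:
#   df -hP | over_threshold
```

Passing a number as the first argument changes the limit, for example df -hP | over_threshold 90.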

Do all Dashboards populate?

Dashboards largely rely on ariel queries against accumulated data (also referred to as Global Views (GV) or Aggregated Data Views (ADV)).
  • If some Dashboards are failing to populate, but others are working, we're likely seeing a problem with the individual Global Views. To troubleshoot corrupted Global Views, please note which Dashboards are affected.
  • If all Dashboards are failing to populate, check the statuses of these services:
    • On all hosts the accumulator service should be Active. From the Console CLI, run
      /opt/qradar/support/all_servers.sh -C 'systemctl is-active accumulator'
      • For any hosts where accumulator is not Active, try manually restarting the service with the command systemctl restart accumulator
    • The Console and all EP, FP, EP/FP, and Data node hosts should also have a running ariel service. On the Console ariel runs as ariel_proxy_server. All other relevant hosts will have ariel as ariel_query_server.
      • On the Console run:
        systemctl is-active ariel_proxy_server
      • To check ariel on the remaining Managed Hosts, run this command from the Console CLI:
        /opt/qradar/support/all_servers.sh -a '16%,17%,18%,21%' 'systemctl is-active ariel_query_server'
      • If the ariel* service is stopped, try starting the relevant service manually with systemctl restart <service> on the affected systems.
        • If ariel fails on any system, you may not be able to retrieve event or flow data from that Managed Host, accumulated or otherwise.
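
When checking several services on one host, it can help to collect the states and list only the ones that need attention. The following is a minimal sketch under that assumption; report_inactive is a hypothetical name, not a QRadar utility, and it expects one "service state" pair per line on standard input.

```shell
# Hypothetical helper (not a QRadar utility): read "<service> <state>" pairs
# on stdin and print only the pairs whose state is not "active".
report_inactive() {
  while read -r svc state; do
    [ "$state" = "active" ] || echo "$svc $state"
  done
}

# Example use on the Console, with the services named in this section:
#   for s in accumulator ariel_proxy_server; do
#     printf '%s %s\n' "$s" "$(systemctl is-active "$s")"
#   done | report_inactive
```

No output means every listed service reported active.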

Are Offenses generating and updating?

  • In the 'Last Event/Flow Seen' for the most recently updated Offenses, is the date/time fairly recent? 
    • Confirm whether ecs-ep is running on all 31xx, 18xx, 17xx, and 16xx hosts:
      • /opt/qradar/support/all_servers.sh -C -a '31%,18%,17%,16%,software' "systemctl is-active ecs-ep"
      • If the service state is 'failed', try restarting the service:
        • /opt/qradar/support/all_servers.sh "systemctl restart ecs-ep"
      • If ecs-ep fails to start on any of the systems, collect the logs for the Console system and any affected Managed Host.
    • If ecs-ep is running on all Managed Hosts (where expected):
      • Rule out the possibility that the system is working correctly and there are simply no events or flows that are triggering Rules for Offense creation or updates by creating a test rule to test Offense generation:
        • In the QRadar UI, click the Offenses tab, then select Rules.
        • Once the Rules display loads select Actions > New Event Rule.
        • Identify a Source IP (or IP range) in your environment that is consistently generating events
        • Configure the new event rule with the following criteria, using the address or range identified above as the value for the source IP test:
          • Apply Dummy Test Rule on events which are detected by the local system and when the source IP is one of the following IP addresses. Then click Next.
          • Set the Responses page with the following options:
            1. Check: Ensure the detected event is part of an offense.
            2. Check: Respond no more than 1 per 1 minute per source IP.
            3. Click Finish.
            4. Make sure the Rule is enabled.
            5. In the Log Activity tab, add a filter for the Source IP or range you configured in the Rule, and watch for incoming events.
        • When events show up for this IP or range, you should see that a new Offense is created and the events for the source IP or range are associated with that Offense. Disable the Rule once testing is complete.

Can you search and view events and flows?

  • Do you see new events and flows (if present in your environment) in Log Activity and Network Activity tabs while the View is set to Real Time?
  • Are your Console, Event Processors, and Event and Flow Processors receiving events?
    • In Log Activity, select Quick Searches and choose Event Processor Distribution.
      • Make sure all of your Console, Event Processors, and Event and Flow Processors are represented on the resulting search.
  • Can you run normal searches?

Is your Assets tab populated as expected?

  • Is the Assets page loading and populating as expected?
  • Is the Last Modified Time for the most recently modified asset relatively recent?

Can all users log in to the Console UI?

Additional checks

  • The checks described above should cover most core functionality. You can also check to make sure all expected QRadar processes are running on any QRadar system:
    • Run /opt/qradar/upgrade/util/setup/upgrades/wait_for_start.sh. Wait for at least 3 iterations (in case the system is still initializing) to see if all listed processes show up as Running. If any show as 'stopped' after a few iterations, break out of the utility (using ctrl-c) and note which processes were not running.
  • If using HA, check the state of your HA pairs.
  • Check Notifications in the UI for any messages about Predictive Disk Failure.
  • If SSH or HTTPS connectivity is lost to your QRadar Console, local console access might be needed to further troubleshoot system health. As a best practice, make sure that this access is available in case of future problems:
    • If running on IBM hardware, is IMM accessible?
      • Can you connect with the IMM Remote Console?
    • If running as a virtual machine, can you reach local console through your hypervisor?
    • If running on non-IBM hardware, do you have provision for local console access?

   

You've experienced an issue with one of the troubleshooting steps. What should you do?
 

  1. Record all troubleshooting steps from any procedure that failed.
  2. If possible, collect logs from the Console and the affected systems. Refer to the article below for instructions on collecting logs.
      Getting logs from a QRadar deployment
  3. Open a case with IBM QRadar Support. Include the logs and all information from the troubleshooting steps in the case.
  4. If your appliances are unavailable or not functional, you can indicate that you have a 'System down' issue.
  5. A QRadar Support representative will contact you using your preferred method of communication.
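
One possible way to record troubleshooting steps as you go is to wrap each command so that the command line, its output, and its exit status are appended to a notes file. This is only a sketch: record_step and the notes file path are hypothetical and not part of QRadar.

```shell
# Hypothetical helper (not a QRadar utility): run a command and append the
# command line, its combined output, and its exit status to a notes file.
# Usage: record_step <notes-file> <command> [args...]
record_step() {
  notes=$1
  shift
  {
    printf '$ %s\n' "$*"
    "$@" 2>&1
    printf '[exit status: %s]\n' "$?"
  } >> "$notes"
}

# Example use (the notes path is an arbitrary choice):
#   record_step /root/health_notes.txt systemctl is-active ecs-ep
#   record_step /root/health_notes.txt df -h
```

The resulting file can be attached to the support case along with the collected logs.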


Document Information

Modified date:
17 June 2024

UID

ibm10876874