IBM Support

QRadar: Hidden token causes High Availability (HA) pairs to fail

Troubleshooting


Problem

After a failed patch, a file named ha_manager_off is left in /etc/. While the file is present, HA manager does not start, so the primary node reports UNKNOWN status and the secondary node reports OFFLINE.

Cause

Patch upgrade scripts create a token file (ha_manager_off) to stop HA manager. If the upgrade fails during one of the pre-checks, the file is not deleted.

Environment

QRadar appliances with High Availability.

Diagnosing The Problem

In the GUI, the High Availability pair shows status UNKNOWN for the primary and OFFLINE for the secondary.
From the CLI, the HA status script returns:
[root@hostname-primary ha]# /opt/qradar/ha/bin/ha cstate
Local: R:PRIMARY S:UNKNOWN/INIT CS:NONE P:1.0 HBC:DOWN RTT:-1 I:65150 SI:
Remote: R:SECONDARY S:OFFLINE/INIT CS:NONE P:0.0 HBC:DOWN RTT:0 I:0 SI:39146
The output of journalctl -xu ha_manager -e or systemctl status ha_manager contains this message:
ha_manager[137531]: Sat Jun 19 21:18:04 -05 2021 [ha_check.sh] ERROR: HA Manager not starting, ha_manager_off file is present in /etc
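
The check for the leftover file can be sketched as a small shell helper. This is an illustrative sketch, not an IBM-provided tool: the function name token_present is hypothetical, and the default path /etc/ha_manager_off is taken from the error message above.

```shell
#!/bin/sh
# Sketch: report whether the leftover token file exists.
# token_present is a hypothetical helper name; the default path is the one
# named in the ha_check.sh error message.
token_present() {
  token="${1:-/etc/ha_manager_off}"
  if [ -e "$token" ]; then
    echo "FOUND: $token is present; ha_manager will not start"
    return 0
  fi
  echo "OK: $token is not present"
  return 1
}

# Check the default location; "|| true" ignores the helper's return code.
token_present || true
```

The helper returns 0 when the file exists, so it can also gate a follow-up action, for example: token_present && echo "clean-up needed".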

Resolving The Problem

  1. To prevent an undesired failover, SSH to the secondary appliance, create the .local_ha_failed file, and restart ha_manager:
    touch /opt/qradar/ha/.local_ha_failed
    systemctl restart ha_manager
  2. On the primary, remove the file ha_manager_off and restart ha_manager:
    rm -fv /etc/ha_manager_off
    systemctl restart ha_manager
  3. Still on the primary, verify the status:
    [root@hostname-primary ha]# /opt/qradar/ha/bin/ha cstate
    Local: R:PRIMARY S:ACTIVE/INIT CS:NONE P:1.0 HBC:DOWN RTT:-1 I:65150 SI:
    Remote: R:SECONDARY S:FAILED/INIT CS:NONE P:0.0 HBC:DOWN RTT:0 I:0 SI:39146
  4. After the primary shows ACTIVE, SSH to the secondary, remove the .local_ha_failed file, and restart ha_manager:
    rm -fv /opt/qradar/ha/.local_ha_failed
    systemctl restart ha_manager
  5. On the primary, verify that the HA pair shows ACTIVE and STANDBY:
    [root@hostname-primary ha]# /opt/qradar/ha/bin/ha cstate
    Local: R:PRIMARY S:ACTIVE/ONLINE CS:NONE P:1.0 HBC:UP RTT:2 I:0 SI:4105589
    Remote: R:SECONDARY S:STANDBY/ONLINE CS:NONE P:1.0 HBC:UP RTT:2 I:11753 SI:1382557
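
The final verification can be scripted by reading the S: field of each line of the cstate output. The following is a minimal sketch under stated assumptions: pair_is_healthy is a hypothetical helper name, the output is assumed to have one Local: line and one Remote: line as in the sample above, and the check is assumed to run on the primary, where the healthy states are ACTIVE/ONLINE (local) and STANDBY/ONLINE (remote).

```shell
#!/bin/sh
# Sketch: succeed only when cstate output read from stdin shows
# S:ACTIVE/ONLINE on the Local: line and S:STANDBY/ONLINE on the Remote: line.
# pair_is_healthy is a hypothetical helper name, not a QRadar command.
pair_is_healthy() {
  awk '
    /^ *Local:/  { for (i = 1; i <= NF; i++) if ($i == "S:ACTIVE/ONLINE")  local_ok = 1 }
    /^ *Remote:/ { for (i = 1; i <= NF; i++) if ($i == "S:STANDBY/ONLINE") remote_ok = 1 }
    END { exit !(local_ok && remote_ok) }
  '
}

# Example use on the primary (commented out because it needs a QRadar host):
# /opt/qradar/ha/bin/ha cstate | pair_is_healthy && echo "HA pair healthy"
```

Matching whole fields rather than substrings avoids confusing the S: field with the CS: and SI: fields on the same line.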

Result:
The High Availability pair shows the primary as ACTIVE and the secondary as STANDBY. If the HA pair still fails, contact QRadar Support for assistance.

Document Location

Worldwide


Document Information

Modified date:
01 June 2022

UID

ibm16589965