Troubleshooting
Problem
After a failed patch, a file named ha_manager_off is left in /etc/. While this file is present, HA manager does not start, which leaves the primary node in UNKNOWN status and the secondary node OFFLINE.
Cause
The patch upgrade scripts create the /etc/ha_manager_off file to keep HA manager stopped during the upgrade. If the upgrade fails during one of the pre-checks, the file is not deleted.
Environment
QRadar appliances with High Availability.
Diagnosing The Problem
On the GUI, the high availability pair shows status UNKNOWN for the primary and OFFLINE for the secondary.
From the CLI, the HA status script returns:
[root@hostname-primary ha]# /opt/qradar/ha/bin/ha cstate
Local: R:PRIMARY S:UNKNOWN/INIT CS:NONE P:1.0 HBC:DOWN RTT:-1 I:65150 SI:
Remote: R:SECONDARY S:OFFLINE/INIT CS:NONE P:0.0 HBC:DOWN RTT:0 I:0 SI:39146
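When the cstate output is captured in a script or pasted into a case, the role and state of each node can be pulled out with a small filter. The following is a minimal sketch and not part of QRadar: the function name ha_states is hypothetical, and it assumes only the field layout shown in the output above (Local: and Remote: markers followed by R: and S: fields).

```shell
# ha_states: read `ha cstate` output on stdin and print one line per node
# in the form "<side> <role> <state>".
# Hypothetical helper; assumes the Local:/Remote: R:... S:... layout shown above.
# Works whether Local and Remote are on one line or on separate lines.
ha_states() {
    tr ' ' '\n' | awk '
        /^(Local|Remote):$/ { side = $0 }             # remember which node we are in
        /^R:/               { role = substr($0, 3) }  # role, e.g. PRIMARY
        /^S:/               { print side, role, substr($0, 3) }  # state, e.g. UNKNOWN/INIT
    '
}
```

For example, piping the output above through the filter (/opt/qradar/ha/bin/ha cstate | ha_states) prints "Local: PRIMARY UNKNOWN/INIT" and "Remote: SECONDARY OFFLINE/INIT".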
In the output of journalctl -xu ha_manager -e or systemctl status ha_manager, the following message is logged:
ha_manager[137531]: Sat Jun 19 21:18:04 -05 2021 [ha_check.sh] ERROR: HA Manager not starting, ha_manager_off file is present in /etc
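The log message points at the flag file, so a quick way to confirm the diagnosis is to test for the file directly. This is a minimal sketch: the function name check_ha_flag is hypothetical, and the default path is the one named in the error message above.

```shell
# check_ha_flag [PATH]: report whether the HA manager flag file exists.
# Hypothetical helper; PATH defaults to /etc/ha_manager_off (from the log message).
# Returns 0 when the file is present, 1 when it is absent.
check_ha_flag() {
    flag="${1:-/etc/ha_manager_off}"
    if [ -e "$flag" ]; then
        echo "PRESENT: $flag"
        return 0
    fi
    echo "ABSENT: $flag"
    return 1
}
```

Running check_ha_flag with no argument on the primary reports PRESENT when the leftover file is what keeps HA manager from starting.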
Resolving The Problem
- On the secondary: to prevent undesired failovers, SSH to the secondary appliance, touch the .local_ha_failed file, and restart ha_manager:
touch /opt/qradar/ha/.local_ha_failed
systemctl restart ha_manager
- On the primary, remove the file ha_manager_off and restart ha_manager:
rm -fv /etc/ha_manager_off
systemctl restart ha_manager
- Still on the primary, verify the status:
[root@hostname-primary ha]# /opt/qradar/ha/bin/ha cstate
Local: R:PRIMARY S:ACTIVE/INIT CS:NONE P:1.0 HBC:DOWN RTT:-1 I:65150 SI:
Remote: R:SECONDARY S:FAILED/INIT CS:NONE P:0.0 HBC:DOWN RTT:0 I:0 SI:39146
- Once the primary shows as Active, SSH to the secondary, remove the .local_ha_failed file, and restart ha_manager:
rm -fv /opt/qradar/ha/.local_ha_failed
systemctl restart ha_manager
- Verify that the HA pair shows Active and Standby:
[root@hostname-primary ha]# /opt/qradar/ha/bin/ha cstate
Local: R:PRIMARY S:ACTIVE/ONLINE CS:NONE P:1:0 HBT:UP RTT:2 1:0 SI:4105589
Remote: R:SECONDARY S:STANDBY/ONLINE CS:NONE P:1.0 HBC:UP RTT:2 I:11753 SI:1382557
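The .local_ha_failed file on the secondary should only be removed after the primary reports ACTIVE, which can take some time after ha_manager restarts. The polling loop below is a rough sketch of one way to wait for that state, assuming the cstate format shown in the steps above. The function name wait_for_active and the TRIES and DELAY variables are hypothetical and not part of QRadar.

```shell
# wait_for_active CMD [ARGS...]: re-run CMD until its output shows the local
# primary in an ACTIVE state, or give up after TRIES attempts.
# Hypothetical helper; TRIES (default 30) and DELAY in seconds (default 10)
# are illustrative settings, not QRadar options.
wait_for_active() {
    tries="${TRIES:-30}"
    delay="${DELAY:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Match e.g. "Local: R:PRIMARY S:ACTIVE/INIT" or "S:ACTIVE/ONLINE".
        if "$@" 2>/dev/null | grep -q 'Local: R:PRIMARY S:ACTIVE/'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

On the primary, this could be called as wait_for_active /opt/qradar/ha/bin/ha cstate, continuing to the secondary only when it returns 0.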
Result:
The High Availability (HA) pair shows as Active and Standby. If the HA pair still fails, contact QRadar Support for assistance.
Document Location
Worldwide
Document Information
Modified date:
01 June 2022
UID
ibm16589965