IBM Support

Duplicated Node entries in crm_mon

Troubleshooting


Problem

Both host servers keep rebooting, and crm_mon shows duplicated Node entries. The crm_mon command currently lists two entries for each node:
Node: NZ12345_01 (cfb280e1-04e9-4a44-a4a5-00972649be28): online
Node: NZ12345_02 (7f9f1489-77d5-4988-9c7a-c4e95cb985bd): online
Node: NZ12345_01 (a2e47890-d676-49b5-8f6c-cd80bfdaee38): OFFLINE
Node: NZ12345_02 (59285ad4-2df7-48f0-b72a-c43db0568810): OFFLINE
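
To confirm the symptom without scanning the output by eye, the duplicated names can be picked out with a small filter. This is only a sketch: the function name find_duplicate_nodes is hypothetical, and it assumes each node line has the form "Node: NAME (UUID): state" as in the output shown in this section.

```shell
#!/bin/bash
# Hypothetical helper: reads crm_mon-style text on standard input and prints
# every node name that appears on more than one "Node:" line.
find_duplicate_nodes() {
    awk '$1 == "Node:" { count[$2]++ }
         END { for (n in count) if (count[n] > 1) print n }' | sort
}

# Example usage on a live host (assumes this crm_mon build supports the
# one-shot option -1):
#   crm_mon -1 | find_duplicate_nodes
```

An empty result would mean that no node name is listed twice.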

Cause

Duplicated Node entries in crm_mon are caused by the /var/lib/heartbeat/hb_uuid file being removed or corrupted. To resolve the issue, reconfigure heartbeat on the system.
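
As a quick first check, the presence of the UUID file can be tested on each host. The sketch below is an illustration only: check_hb_uuid is a hypothetical helper, and it can detect a missing or empty file but not a corrupted one.

```shell
#!/bin/bash
# Hypothetical helper: reports whether the heartbeat UUID file exists and is
# non-empty. It cannot tell whether the file contents are valid.
check_hb_uuid() {
    local path="${1:-/var/lib/heartbeat/hb_uuid}"
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 1
    fi
    if [ ! -s "$path" ]; then
        echo "empty: $path"
        return 2
    fi
    echo "present: $path"
}

# Example usage (run as root on each host):
#   check_hb_uuid
```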

Resolving The Problem


1. Log in as the root user and stop heartbeat on both host servers
#service heartbeat stop
#ssh ha2 service heartbeat stop

2. Back up /var/lib/heartbeat/crm/cib.xml to the /nzscratch folder

3. Turn off heartbeat in init.d
#chkconfig heartbeat off
#ssh ha2 chkconfig heartbeat off

4. Run /nzlocal/scripts/heartbeat_config.sh
# ./heartbeat_config.sh
.
WARNING: THIS SCRIPT WILL DESTROY ANY EXISTING HEARTBEAT CONFIGURATION!
If you have modified it in any way, please backup your CIB first!
Are you SURE that you wish to continue? Type YES to proceed: YES
.
Configuration complete on HA2. Continuing on HA1.

5. Start heartbeat on both host servers
#service heartbeat start
#ssh ha2 service heartbeat start

6. Check whether the duplicated node entries have been removed.
#crm_mon -i5

7. If the issue is resolved, turn on the heartbeat service in init.d
#chkconfig heartbeat on; ssh ha2 chkconfig heartbeat on
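
Step 2 does not give a command for the backup. One way to do it is sketched below. The function name backup_cib and the timestamped file name are assumptions for illustration; only the source and destination paths come from the steps above.

```shell
#!/bin/bash
# Hypothetical helper: copies the CIB file into a backup directory under a
# timestamped name and prints the path of the copy.
backup_cib() {
    local src="${1:-/var/lib/heartbeat/crm/cib.xml}"
    local dest_dir="${2:-/nzscratch}"
    local dest="$dest_dir/cib.xml.$(date +%Y%m%d%H%M%S)"
    # -p preserves the mode and timestamps of the original file
    cp -p "$src" "$dest" && echo "$dest"
}

# Example usage (run as root on the host where heartbeat was stopped):
#   backup_cib
```

The timestamp suffix keeps an earlier backup from being overwritten if the procedure is repeated.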


Document Information

Modified date:
17 October 2019

UID

swg21695413