IBM Support

db2haicu hangs at 'Adding HADR database ... to the domain' on the primary database with DB2 V11.1 and RHEL 7.2-7.4

Technical Blog Post


Abstract

db2haicu hangs at 'Adding HADR database ... to the domain' on the primary database with DB2 V11.1 and RHEL 7.2-7.4

Body

Symptom
After db2haicu completes on the standby database without any issue, db2haicu may hang at 'Adding HADR database ... to the domain' on the primary database. If it does, check whether you are hitting RSCT APAR IJ00283 by looking at db2diag.log and /var/log/messages, as shown below.
 
1. db2haicu output on the primary DB
==================================================================================================
...
Do you want to validate and automate HADR failover for the HADR database 'HADRDB'? [1]
1. Yes
2. No
1
Adding HADR database 'HADRDB' to the domain ...
Cluster node '172.16.xxx.xxx' was not found in the domain. Please re-enter the host name.
norco2
Cluster node '172.16.xxx.yyy' was not found in the domain. Please re-enter the host name.
norco1
Adding HADR database 'HADRDB' to the domain ...
(Hang)
==================================================================================================
 
2. db2diag.log on the primary DB
==================================================================================================
2019-03-28-22.20.15.937285-420 I9590E438             LEVEL: Warning
PID     : 3557                 TID : 139905251481472 PROC : db2havend (db2ha)
INSTANCE: v111hadr             NODE : 000
HOSTNAME: testhost.com
FUNCTION: DB2 UDB, high avail services, db2haAddResource, probe:12346
DATA #1 : <preformatted>
Error adding resource db2_v111hadr_v111hadr_HADRDB-rs to group db2_v111hadr_v111hadr_HADRDB-rg, resource handle is NOT valid
==================================================================================================
 
3. /var/log/messages on the primary DB
==================================================================================================
Mar 28 22:18:54 testhost hatsd[2699]: hadms: Loading watchdog softdog, timeout = 8000 ms.
Mar 28 22:18:54 testhost hatsd[2699]: hadms: Cannot find kernel module.
Mar 28 22:19:05 testhost hatsd[2699]: hadms: Loading watchdog softdog, timeout = 8000 ms.
Mar 28 22:19:05 testhost hatsd[2699]: hadms: Cannot find kernel module.
...
==================================================================================================
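
The two log checks above can be scripted. The following is only a minimal sketch: the function names are made up for illustration and are not part of DB2 or RSCT, and the caller supplies both file paths (the location of db2diag.log depends on the instance's diagnostic path setting).

```shell
#!/bin/sh
# Sketch: look for the two log signatures shown above.
# The function names are hypothetical helpers, not DB2 or RSCT commands.

# Returns 0 if the given db2diag.log contains the db2haAddResource
# "resource handle is NOT valid" message.
has_invalid_handle() {
    grep -q 'resource handle is NOT valid' "$1"
}

# Returns 0 if the given syslog file shows hatsd failing to find
# the watchdog kernel module.
has_missing_watchdog() {
    grep -q 'hadms: Cannot find kernel module' "$1"
}

# Prints a one-line verdict; returns 0 only when both signatures are present.
check_ij00283_symptoms() {
    diag_log=$1
    sys_log=$2
    if has_invalid_handle "$diag_log" && has_missing_watchdog "$sys_log"; then
        echo "both signatures found: possible match for RSCT APAR IJ00283"
        return 0
    fi
    echo "signatures not found together"
    return 1
}
```

For example, after loading these functions, run check_ij00283_symptoms with the path to your db2diag.log as the first argument and /var/log/messages as the second. A match only suggests the APAR; it does not confirm it.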
 
 
Cause
This happens because earlier RSCT levels look for the uncompressed watchdog module (softdog.ko), but in RHEL 7.2-7.4 it is shipped as a compressed module (softdog.ko.xz), so RSCT cannot find it. For more detail, please check the following link.
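
To see which form of the watchdog module a host actually has, a small check like the one below may help. It is a sketch under assumptions: the function name is hypothetical, and it assumes the kernel modules live under /lib/modules/<kernel release>, which is the usual location on RHEL.

```shell
#!/bin/sh
# Sketch: report whether the softdog module under a modules directory
# is compressed (softdog.ko.xz) or uncompressed (softdog.ko).
# softdog_packaging is a hypothetical helper name.
softdog_packaging() {
    mod_root=$1
    # Collect every softdog module file below the given directory.
    found=$(find "$mod_root" -type f -name 'softdog.ko*' 2>/dev/null)
    case "$found" in
        *softdog.ko.xz*) echo "compressed" ;;
        *softdog.ko*)    echo "uncompressed" ;;
        *)               echo "not found" ;;
    esac
}
```

On the affected host this could be called as: softdog_packaging "/lib/modules/$(uname -r)". A result of "compressed" together with the hatsd messages above is consistent with the cause described here.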
 
 
Resolution
To fix this issue, install the efix for RSCT APAR IJ00283.
(To check your RSCT version, run '/usr/sbin/rsct/install/bin/ctversion -b'.)
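
If you want to capture the RSCT version in a script, for example before and after installing the efix, a thin wrapper around the command above is enough. This is only a sketch: rsct_version is a hypothetical name, and the output is whatever ctversion prints, passed through unchanged.

```shell
#!/bin/sh
# Sketch: run 'ctversion -b' if the binary is present, otherwise report
# that it is missing. The optional first argument overrides the path.
rsct_version() {
    ctversion=${1:-/usr/sbin/rsct/install/bin/ctversion}
    if [ -x "$ctversion" ]; then
        "$ctversion" -b
    else
        echo "ctversion not found at $ctversion" >&2
        return 1
    fi
}
```

Calling rsct_version with no argument uses the path given above.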
 
If 'db2haicu -delete' does not clean up the incomplete TSA environment, please check the following.
 
If you still see "hadms: Cannot find kernel module." in /var/log/messages even after installing the efix, please open a ticket with the RSCT support team.

If you are willing, please leave your result and environment information in the comment section, as in the following example, for the benefit of other users.

 
The efix is working
ENV: RHEL 7.2 / "DB2 v11.1.3.3", "s1804271300", "DYN1804271300AMD64" / TSA 4.1.0.3 / RSCT 3.2.1.2


UID

ibm11140076