Reinitializing an HADR configuration to resolve errors in Db2
You can reinitialize a Db2 High Availability Disaster Recovery (HADR) configuration to resolve an error condition that prevents the primary and standby databases from connecting and reaching peer state.
About this task
An HADR configuration can end up in an error state for various reasons. Usually, one copy of the database (primary or standby) is working correctly while the other copy is corrupted.
For example, after a worker node reboots, the old primary can sometimes fail to reintegrate if the peer window expires and the HADR automation (the governor) then issues a takeover by force. In this scenario, the governor log (/var/log/governor/governor.log) on the new primary (the old standby) contains entries similar to the following example:
2020-04-01 18:35:38,832 INFO 8991-47423382027648: child(13084) executing db2 takeover hadr on db BLUDB by force peer window only
2020-04-01 18:35:39,084 INFO 8991-47423382027648: SQL1770N Takeover HADR cannot complete. Reason code = "9".
2020-04-01 18:35:39,085 INFO 8991-47423382027648: we have the mandate to force takeover (window=300)
2020-04-01 18:35:39,086 INFO 8991-47423382027648: Result of DNS resolution of remote endpoint: 10.130.0.39
2020-04-01 18:35:40,096 INFO 8991-47423382027648: child(13151) executing db2 takeover hadr on db BLUDB by force
....
2020-04-01 18:35:54,686 INFO 8991-47423382027648: using cached role(PRIMARY) as of 0.369333982468 seconds ago (threshold 1)
2020-04-01 18:35:54,687 INFO 8991-47423382027648:
db2 role is PRIMARY,
db2 connect status is DISCONNECTED,
db2 state is DISCONNECTED
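To check a governor log for the sequence shown above (a PEER WINDOW ONLY takeover that fails with SQL1770N, followed by a takeover by force), you could scan the log with a small script. The following is a minimal sketch, not an IBM-provided tool: the helper name forced_takeover_after_peer_window is hypothetical, and the line layout it parses is assumed from the excerpt above.

```python
import re
from typing import Iterable, Optional

# Assumed governor log line layout, inferred from the excerpt above:
# "<date> <time>,<ms> <LEVEL> <pid>-<tid>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?P<level>\w+) [\d-]+: ?(?P<msg>.*)$"
)


def forced_takeover_after_peer_window(lines: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the timestamp of a plain 'takeover ... by force'
    that follows a failed (SQL1770N) takeover attempt, or None if not found."""
    peer_window_failed = False
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            # Continuation lines such as "db2 role is PRIMARY," carry no timestamp.
            continue
        msg = m.group("msg")
        if msg.startswith("SQL1770N"):
            # The earlier takeover (e.g. BY FORCE PEER WINDOW ONLY) was rejected.
            peer_window_failed = True
        elif peer_window_failed and "takeover hadr" in msg and msg.endswith("by force"):
            # A takeover BY FORCE without the PEER WINDOW ONLY restriction.
            return m.group("ts")
    return None
```

For example, on the new primary you might call it with open("/var/log/governor/governor.log") as the input; a non-None result suggests the forced-takeover path described above was taken.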
On the old primary (now a disconnected standby), the governor log inside the Db2 database pod contains entries similar to the following example:
2020-04-01 18:36:01,690 INFO 2668-47200027639168: db2 state is LOCAL_CATCHUP
2020-04-01 18:36:01,690 INFO 2668-47200027639168: startup: waiting on db2 to become peer with primary (waited 20 secs)
2020-04-01 18:36:11,701 INFO 2668-47200027639168: Calling db2pd
2020-04-01 18:36:12,260 INFO 2668-47200027639168: db2pd returned
2020-04-01 18:36:12,262 INFO 2668-47200027639168:
Database BLUDB not activated on database member 0 or this database name cannot be found in the local database directory.
Option -hadr requires -db <database> or -alldbs option and active database.
2020-04-01 18:36:12,262 INFO 2668-47200027639168: db2 state is None
2020-04-01 18:36:12,262 INFO 2668-47200027639168: startup: waiting on db2 to become peer with primary (waited 30 secs)
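A script can also flag the old primary as stuck when it keeps waiting to become peer without ever reaching peer state. The sketch below is illustrative only: the helper name standby_stuck and its max_wait_secs threshold are hypothetical (the default of 300 seconds is arbitrary), and the message patterns are copied from the excerpt above, so other governor versions might word them differently.

```python
import re
from typing import Iterable, Optional, Tuple

# Message patterns taken from the old-primary excerpt above (assumption:
# the governor keeps this wording).
WAIT_RE = re.compile(r"waiting on db2 to become peer with primary \(waited (\d+) secs\)")
STATE_RE = re.compile(r"db2 state is (\w+)")


def standby_stuck(
    lines: Iterable[str], max_wait_secs: int = 300
) -> Tuple[bool, Optional[str], int]:
    """Hypothetical helper: return (stuck, last_state, longest_wait).

    last_state is the word after 'db2 state is', for example 'LOCAL_CATCHUP',
    or 'None' when db2pd found no active database."""
    last_state: Optional[str] = None
    longest_wait = 0
    for line in lines:
        state = STATE_RE.search(line)
        if state:
            last_state = state.group(1)
        wait = WAIT_RE.search(line)
        if wait:
            longest_wait = max(longest_wait, int(wait.group(1)))
    # Stuck: waited longer than the threshold and the last reported state is not PEER.
    stuck = longest_wait > max_wait_secs and last_state != "PEER"
    return stuck, last_state, longest_wait
```

A last_state of None (the string) together with a growing wait time corresponds to the situation above, where db2pd reports that BLUDB is not activated.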
The old primary never reintegrates as the new standby after the rebooted host comes back online. In this situation, the only option is to reinitialize the HADR configuration by using the following procedure.