Restarting Db2 Warehouse HADR on the primary or principal standby databases after failover to an auxiliary standby database

With Governor running on the designated primary and principal standby databases, HADR restarts automatically after a failover to another database. However, if a database becomes unavailable unexpectedly (for example, during a site outage), the log streams of the old and new primary databases might diverge, and HADR fails to start.

About this task

You would see the following message in the db2diag.log file (located in the Db2u pod under ${DIAGPATH}/NODE0000):

MESSAGE : ADM12500E  The HADR standby database cannot be made consistent with
          the primary database. The log stream of the standby database is
          incompatible with that of the primary database. To use this database
          as a standby, it must be recreated from a backup image or split
          mirror of the primary database.
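
To confirm that this is the failure that you are hitting, you can search the diagnostic log from inside the affected Db2u pod. A quick check, assuming the instance owner is db2inst1 (as later in this procedure) and ${DIAGPATH} is set in its environment:

oc exec -it c-db2wh-primary-db2u-0 -- su - db2inst1 -c 'grep -A 4 ADM12500E ${DIAGPATH}/NODE0000/db2diag.log'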

If this scenario occurs, take an online backup of the current primary database and restore it to the database that is failing to start: manually back up the current primary, copy the backup files over, and rerun the setup_config_hadr script with --db-role standby.

Procedure

  1. Stop HADR on the database that cannot reintegrate (the former primary, in this example):
    oc exec -it c-db2wh-primary-db2u-0 -- manage_hadr -stop
  2. Determine which database is the current primary by using the manage_hadr tool with the -status option.

    In the following example, db2wh-aux is the current primary database after a forced takeover. Note that HADR_ROLE = PRIMARY in the output.

    oc exec -it c-db2wh-aux-db2u-0 -- manage_hadr -status
    
    # Output:
    #######################################################################
    ###             Db2 Warehouse high availability and                 ###
    ###             disaster recovery (HADR) management                 ###
    #######################################################################
    
    
    Running HADR action -status on the database BLUDB ...
    ################################################################################
    ###                       The HADR status summary                            ###
    ################################################################################
    Database Member 0 -- Database BLUDB -- Active -- Up 0 days 00:00:39 -- Date 2021-05-28-03.47.31.838856
    
    ####### Primary - Standby 1 ######
                                HADR_ROLE = PRIMARY
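
    The same information is also available from db2pd if you prefer to query the database directly. For example, from inside the pod as the instance owner:

    db2pd -db bludb -hadr | grep -i hadr_role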
    
  3. Exec into the current primary database Db2 Warehouse pod and switch to the database instance owner:
    oc exec -it c-db2wh-aux-db2u-0 -- bash
    su - db2inst1
  4. Initiate an online backup of the database to the backup location, ${BACKUPDIR} (/mnt/backup):
    db2 backup db BLUDB online to ${BACKUPDIR}
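
    Before continuing, you can confirm that the backup image was written. For example, still as the instance owner:

    db2 list history backup all for bludb
    ls -lh ${BACKUPDIR}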
  5. Copy the keystore into the backup location:
    tar -cjvf ${BACKUPDIR}/keystore.tar -C ${KEYSTORELOC} .
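
    You can list the archive to confirm that the keystore files (typically the .p12 keystore and its .sth stash file) were captured:

    tar -tjf ${BACKUPDIR}/keystore.tar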
  6. Update permissions on the backup directory so that the Db2 Warehouse instance owner retains full access and the files can be read during the copy and restore:
    sudo chmod -R 755 /mnt/backup
  7. Copy the Db2 Warehouse backup files from the current primary database to the standby database. The following command pulls the files to the local host; the second half of the copy is sketched after this step:
    ## Copy from the current primary database to a directory on the host called /tmp/hadr
    oc rsync c-db2wh-aux-db2u-0:/mnt/backup/ /tmp/hadr
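
    The files must then land on the database that is being reinitialized. A sketch of the second half of the copy, assuming (as in the example in step 8) that the former primary, c-db2wh-primary-db2u-0, is the database being reinitialized:

    ## Copy from the host into the pod of the database that is being reinitialized.
    ## If the databases run on different OpenShift clusters, switch oc contexts first.
    oc rsync /tmp/hadr/ c-db2wh-primary-db2u-0:/mnt/backup

    ## Confirm that the backup image and keystore archive arrived
    oc exec -it c-db2wh-primary-db2u-0 -- ls -l /mnt/backup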
    
  8. Run the setup_config_hadr script again to restore the database.
    • Use standby for the --db-role to ensure that the database is reconfigured as a standby.
    • If the database that is being reinitialized is the former primary database, use the designated principal standby as primary, and use the designated primary as standby, leaving the auxiliary standby databases as auxiliaries:
      oc exec -it c-db2wh-primary-db2u-0 -- setup_config_hadr --db-role standby --primary-name db2wh-standby --standby-name db2wh-primary --primary-port 31384 --standby-port 32457 --aux1-name db2wh-aux --aux1-port 32649 --etcd-host my-etcd-client.my-etcd --etcd-port 2379 --multicluster
    • If the database that is being reinitialized is the former principal standby database, use the same parameters as used in the original setup:
      oc exec -it c-db2wh-standby-db2u-0 -- setup_config_hadr --db-role standby --primary-name db2wh-primary --standby-name db2wh-standby --primary-port 32457 --standby-port 31384 --aux1-name db2wh-aux --aux1-port 32649 --etcd-host my-etcd-client.my-etcd --etcd-port 2379 --multicluster
  9. Exec into the Db2 Warehouse pod of the reinitialized database again and, as the Db2 Warehouse instance owner, check the HADR_LOCAL_SVC configuration parameter:
    db2 get db cfg for bludb | grep -i hadr_local_svc
    
    # Output:
    HADR local service name           (HADR_LOCAL_SVC) = 60007|31384
  10. Verify that the first port number is correct for the database's originally designated role:
    • Primary: 60006
    • Standby: 60007
    • Aux1: 60008
    • Aux2: 60009

    If it is incorrect, update the HADR_LOCAL_SVC configuration parameter so that it uses the correct port. Change only the first port number, and keep the existing value for the second:

    db2 "update db cfg for BLUDB using HADR_LOCAL_SVC 60006|31384"
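
    You can rerun the earlier grep to confirm that the change took effect; the first port number should now match the designated role (60006 in this example):

    db2 get db cfg for bludb | grep -i hadr_local_svc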
  11. Exit the pod, and start HADR on the database as a standby:
    oc exec -it c-db2wh-primary-db2u-0 -- manage_hadr -start_as standby
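
    After the restart, you can confirm that the pair is healthy by rechecking the status from the current primary; in the summary, look for HADR_CONNECT_STATUS = CONNECTED and, eventually, HADR_STATE = PEER:

    oc exec -it c-db2wh-aux-db2u-0 -- manage_hadr -status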