Applying rolling updates to a Pacemaker-managed Db2 high availability disaster recovery (HADR) environment

When you use the integrated High Availability (HA) feature with Pacemaker to automate HADR, extra steps are required to update the operating system or the Db2® database system software, upgrade the hardware, or change the database configuration parameters. Follow this procedure to perform a rolling update in a Pacemaker automated HADR environment.

Before you begin

Note: The following update procedures are for a Pacemaker automated HADR environment. If you want to perform rolling updates on an HADR environment that is not automated, see Applying rolling updates to a Db2 high availability disaster recovery (HADR) environment.
Before running the rolling update procedure, ensure that the following prerequisites are met:
  • You have configured HADR for your Pacemaker-managed Linux cluster, either on two hosts, or as a two-site multiple standby cluster with same-site failover automation on four hosts across two sites.
  • The instances are running Db2 11.5.4 or later.
  • If you are updating to Db2 11.5.5, you have downloaded the associated Pacemaker stack. When updating to Db2 11.5.6 or later, the Pacemaker stack is installed by running the installFixPack command.
  • The HADR pair is in PEER state.

Restrictions

Use this procedure to perform a rolling update on your Db2 database system and update the Db2 database product software to a new fix pack level in a Pacemaker automated HADR environment, for example, applying a fix pack to the Db2 database product software.
  • The Db2 instances must be currently running at Db2 11.5.4 or later.

A rolling update cannot be used to upgrade a Db2 database system from an earlier version to a later version. For example, you cannot use this procedure to upgrade from Db2 Version 10.5 to Db2 11.5. To upgrade a Db2 server in an automated HADR environment, see Upgrading Db2 servers in a TSA automated HADR environment.

You cannot use this procedure to update the Db2 HADR configuration parameters. Updates to the HADR configuration parameters must be made separately. Because HADR requires the parameters on the primary and standby to be the same, both the primary and standby databases might need to be deactivated and updated at the same time.

The following procedure cannot be used to convert an existing Db2 HADR system using Tivoli SA MP (TSA) as a cluster manager to a newer Db2 level using Pacemaker as a cluster manager in a single step. Instead, first update the existing system to the new Db2 level while maintaining TSA as the integrated cluster manager. Once the update is complete, follow the steps outlined in Replacing an existing Tivoli SA MP-managed Db2 instance with a Pacemaker-managed HADR Db2 instance to use Pacemaker as the cluster manager.

The following procedure is only applicable when the existing Db2 HADR cluster is deployed using the Db2-provided Pacemaker cluster software stack. If the cluster to be updated uses Pacemaker provided by another vendor, remove all cluster resources by following the procedures outlined by that Pacemaker supplier, and re-create them with the db2cm utility using the Db2-provided Pacemaker cluster software stack. See Configuring high availability with the Db2 cluster manager utility (db2cm).

Procedure

  1. On each standby host, ensure that all databases have their HADR_ROLE set to STANDBY:
    db2pd -hadr -db <database-name>
    • If a database does not have the correct role, run the following command on the primary host for each such database:
      db2 takeover hadr on db <database-name>
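The role check in step 1 can be scripted. The following sketch parses the HADR_ROLE field from captured db2pd output; the output layout shown and the database name SAMPLE are assumptions for illustration:

```shell
# Placeholder output as it might be captured from "db2pd -hadr -db SAMPLE";
# the exact field layout is an assumption.
sample_output='Database Member 0 -- Database SAMPLE -- Active
HADR_ROLE = PRIMARY
HADR_STATE = PEER'

# Extract the role; if it is not STANDBY, a takeover is needed on the primary.
role=$(printf '%s\n' "$sample_output" | awk -F'= *' '/HADR_ROLE/ {print $2}')
if [ "$role" != "STANDBY" ]; then
    echo "takeover needed"   # on the primary: db2 takeover hadr on db SAMPLE
else
    echo "role is STANDBY"
fi
```

In practice, you would replace the inline sample with the live output of db2pd -hadr -db for each database.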
  2. On each standby host, deactivate all databases to stop HADR while retaining the role:
    db2 deactivate db <database-name>
  3. On each standby host, stop all Db2 processes:
    db2stop force
  4. As the root user on each standby host, stop all Pacemaker and Corosync processes:
    systemctl stop pacemaker
    systemctl stop corosync
    systemctl stop corosync-qdevice
    Note: Only run the systemctl stop corosync-qdevice command if the Qdevice is configured.
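Steps 3 and 4 can be wrapped in a small script per standby host. This sketch only prints the commands it would run, so the sequence can be previewed safely (drop the echo to execute them); detecting a configured Qdevice by grepping corosync.conf is an assumed heuristic:

```shell
# Preview the stop sequence for a standby host. Commands are echoed,
# not executed.
CONF=${COROSYNC_CONF:-/etc/corosync/corosync.conf}

stop_stack() {
    echo "db2stop force"
    echo "systemctl stop pacemaker"
    echo "systemctl stop corosync"
    # Assumed heuristic: only stop corosync-qdevice if the config mentions it.
    if grep -q qdevice "$CONF" 2>/dev/null; then
        echo "systemctl stop corosync-qdevice"
    fi
}

stop_stack
```

The same pattern applies in reverse for the start sequence in step 9.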
  5. Apply the update on each standby host.
    • If you are not updating to a new Db2 fix pack or to a new major version of the operating system (for example, from Red Hat Enterprise Linux (RHEL) 8 to RHEL 9), you can proceed to step 9 after the change has been applied.
    • If you are updating to a new major version of the operating system, for example, from RHEL 8 to RHEL 9, run the following command, then proceed to step 9:
      db2InstallPCMK -i
    • If updating to a new Db2 fix pack, follow the Installing offline fix pack updates to existing Db2 database products (Linux® and UNIX) procedure.
    Important: If updating to Db2 11.5.6 or later, step 6 through step 8 are no longer necessary as the installFixPack command takes care of these tasks.
  6. If updating to Db2 11.5.5, on each standby host, install the new Pacemaker and Corosync packages that are provided by IBM®:
    cd /<tarFilePath>/RPMS/
    • For RHEL systems:
      dnf upgrade noarch/*rpm <architecture>/*rpm
    • For SLES systems:
      zypper in --allow-unsigned-rpm noarch/*rpm <architecture>/*rpm
  7. If updating to Db2 11.5.5, as the root user on each standby host, copy the new db2cm utility from /<tarFilePath>/Db2/db2cm to /home/<inst_user>/sqllib/bin:
    cp /<tarFilePath>/Db2/db2cm /home/<inst_user>/sqllib/bin
    chmod 755 /home/<inst_user>/sqllib/bin/db2cm
  8. If updating to Db2 11.5.5, on each standby host, run the following as root to copy the resource agent scripts (db2hadr, db2inst, db2ethmon) from /<tarFilePath>/Db2agents into /usr/lib/ocf/resource.d/heartbeat/:
    /home/<inst_user>/sqllib/bin/db2cm -copy_resources /<tarFilePath>/Db2agents -host <host1>
    /home/<inst_user>/sqllib/bin/db2cm -copy_resources /<tarFilePath>/Db2agents -host <host2>
  9. As the root user on each standby host, start the Pacemaker and Corosync processes:
    systemctl start pacemaker
    systemctl start corosync
    systemctl start corosync-qdevice
    Note: Only run the systemctl start corosync-qdevice command if the Qdevice is configured.
  10. As the root user on each standby host, check the configuration, either manually or by running the crm_verify tool, if available:
    crm_verify -L -V
    Note: This command prints any errors in the configuration. If there are no errors, nothing is printed.
  11. On each standby host, start all Db2 processes:
    db2start
  12. On each standby host, activate all databases:
    db2 activate db <database-name>
  13. On the principal standby host, run a role switch for all databases:
    db2 takeover hadr on db <database-name>
    • If applying a new Db2 fix pack, after the role switch, the old primary database disconnects because the new primary is running on a higher fix pack level.
  14. On the old primary host, repeat step 2 to step 12 to apply the update on this host.
    Note: Exclude step 8; it is redundant because the resource agent scripts were already copied to both hosts the first time through.
    Important: Step 15 to step 19 are only necessary if updating from Db2 11.5.5 to Db2 11.5.6 or later.
  15. Update the migration-threshold meta attribute for each database by deleting the existing attribute, and setting it with the new value.
    Delete the existing attribute:
    crm resource meta <database resource name> delete migration-threshold 
    Then set the new attribute:
    crm resource meta <database resource name> set migration-threshold 1 
    The following example shows the command syntax for updating the migration threshold for an automated database named CORAL:
    crm resource meta db2_db2inst1_db2inst1_CORAL delete migration-threshold 
    crm resource meta db2_db2inst1_db2inst1_CORAL set migration-threshold 1
  16. Update the failure-timeout attribute for each database:
    crm resource meta <database resource name> set failure-timeout 10
    The following example shows the command syntax for updating the failure-timeout attribute for a database named CORAL:
    crm resource meta db2_db2inst1_db2inst1_CORAL set failure-timeout 10
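When several databases are automated, steps 15 and 16 can be generated in one pass. The resource names below are placeholders; the sketch prints the crm commands for review rather than executing them:

```shell
# Generate the migration-threshold and failure-timeout updates for a
# list of database resources (placeholder names).
resources="db2_db2inst1_db2inst1_CORAL db2_db2inst1_db2inst1_CORAL2"

cmds=$(
    for res in $resources; do
        printf 'crm resource meta %s delete migration-threshold\n' "$res"
        printf 'crm resource meta %s set migration-threshold 1\n' "$res"
        printf 'crm resource meta %s set failure-timeout 10\n' "$res"
    done
)
printf '%s\n' "$cmds"   # review first; pipe to sh only when satisfied
```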
  17. Ensure that the migration-threshold and failure-timeout attributes have been updated for each database:
    crm config show <database resource-clone>
    The following example shows the command syntax for viewing the updated resource configuration for an automated database named CORAL:
    crm config show db2_db2inst1_db2inst1_CORAL-clone 
     ms db2_db2inst1_db2inst1_CORAL-clone db2_db2inst1_db2inst1_CORAL \
        meta resource-stickiness=5000 migration-threshold=1 ordered=true promotable=true is-managed=true failure-timeout=10
  18. Update the cluster configuration to set symmetric-cluster to true:
    crm configure property symmetric-cluster=true
  19. Update the Corosync configuration to use millisecond timestamps. This can be done while the cluster is online.
    Edit the corosync.conf file:
    crm corosync edit
    Update the timestamp setting under the logging directive to hires instead of on. The final directive should look like the following:
     logging {
              to_logfile: yes
              logfile: /var/log/cluster/corosync.log
              to_syslog: yes
              timestamp: hires
              function_name: on
              fileline: on
     }
    Push the change to the remote host:
    crm corosync push <remote hostname>
    Lastly, refresh Corosync so it uses the new configuration:
    crm corosync reload
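The timestamp change in step 19 can also be made non-interactively. This sketch applies the edit with sed to an inline copy of the logging directive; to change the real file, point sed at the path that crm corosync edit opens (commonly /etc/corosync/corosync.conf, an assumption):

```shell
# Inline copy of a logging directive before the change.
conf='logging {
         to_logfile: yes
         to_syslog: yes
         timestamp: on
         fileline: on
}'

# Switch the timestamp setting from "on" to "hires".
updated=$(printf '%s\n' "$conf" | sed 's/timestamp: on/timestamp: hires/')
printf '%s\n' "$updated"
```

After editing the real file, you would still push the change with crm corosync push and run crm corosync reload as in step 19.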
  20. On the original primary host for each database, run a failback operation to set the HADR roles back to their original state:
    db2 takeover hadr on db <database-name>
  21. Verify that all the databases are in the PEER state:
    db2pd -hadr -db <database-name>
    Important: Step 22 through step 24 are only necessary if updating to a new Db2 fix pack and the Qdevice is configured.
  22. If updating to a new Db2 fix pack and the Qdevice is configured, as the root user on the Qdevice host, stop the corosync-qnetd process:
    systemctl stop corosync-qnetd
  23. If updating to a new Db2 fix pack and the Qdevice is configured, as the root user on the Qdevice host, update the corosync-qnetd package provided by IBM, depending on the Db2 version:
    • For RHEL systems on Db2 11.5.5 and older:
      dnf upgrade /<tarFilePath>/RPMS/<architecture>/corosync-qnetd
    • For SLES systems on Db2 11.5.5 and older:
      zypper in --allow-unsigned-rpm /<tarFilePath>/RPMS/<architecture>/corosync-qnetd
    • For RHEL systems on Db2 11.5.6 and newer:
      dnf upgrade <Db2_image>/db2/<platform>/pcmk/Linux/<OS_distribution>/<architecture>/corosync-qnetd
    • For SLES systems on Db2 11.5.6 and newer:
      zypper in --allow-unsigned-rpm <Db2_image>/db2/<platform>/pcmk/Linux/<OS_distribution>/<architecture>/corosync-qnetd
  24. If updating to a new Db2 fix pack and the Qdevice is configured, as the root user on the Qdevice host, start the corosync-qnetd process:
    systemctl start corosync-qnetd
  25. Confirm that the cluster is in a healthy state:
    crm resource show
    Note: Pacemaker might take around a minute to complete this command.

    The following example shows an output from running the crm resource show command:

     db2_db2tea1_eth1 (ocf::heartbeat:db2ethmon):  Started
     db2_kedge1_eth1  (ocf::heartbeat:db2ethmon):  Started
     db2_kedge1_db2inst1_0  (ocf::heartbeat:db2inst):  Started
     db2_db2tea1_db2inst2_0 (ocf::heartbeat:db2inst):  Started
     db2_kedge1_db2inst2_0  (ocf::heartbeat:db2inst):  Started
     Clone Set: db2_db2inst2_db2inst2_CORAL-clone [db2_db2inst2_db2inst2_CORAL] (promotable)
         Masters: [ db2tea1 ]
         Slaves: [ kedge1 ]
     Clone Set: db2_db2inst2_db2inst2_CORAL2-clone [db2_db2inst2_db2inst2_CORAL2] (promotable)
         Masters: [ db2tea1 ]
         Slaves: [ kedge1 ]
     db2_db2tea1_db2inst1_0 (ocf::heartbeat:db2inst):  Started
     Clone Set: db2_db2inst1_db2inst1_CORAL-clone [db2_db2inst1_db2inst1_CORAL] (promotable)
         Masters: [ db2tea1 ]
         Slaves: [ kedge1 ]
     Clone Set: db2_db2inst1_db2inst1_CORAL2-clone [db2_db2inst1_db2inst1_CORAL2] (promotable)
         Masters: [ db2tea1 ]
         Slaves: [ kedge1 ]

    No resources should be in the unmanaged state, and all resources should be started in the expected role on the expected host.
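The health check in step 25 can be partially automated by scanning the crm resource show output for trouble indicators. The sample status text below is illustrative; in practice you would capture the live command output:

```shell
# Scan saved "crm resource show" output for resources that are stopped,
# failed, or unmanaged. The sample below represents a healthy cluster.
status='db2_kedge1_eth1 (ocf::heartbeat:db2ethmon): Started
db2_kedge1_db2inst1_0 (ocf::heartbeat:db2inst): Started'

if printf '%s\n' "$status" | grep -Eqi 'unmanaged|Stopped|FAILED'; then
    result="cluster needs attention"
else
    result="cluster healthy"
fi
echo "$result"
```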