Applying rolling updates to a Pacemaker-managed Db2 high availability disaster recovery (HADR) environment

When you use the integrated High Availability (HA) feature with Pacemaker to automate HADR, extra steps are required to update the operating system or the Db2® database system software, upgrade the hardware, or change the database configuration parameters. Follow this procedure to perform a rolling update in a Pacemaker automated HADR environment.

Before you begin

Note: The following update procedures are for a Pacemaker automated HADR environment. If you want to perform rolling updates on an HADR environment that is not automated, see Applying rolling updates to a Db2 high availability disaster recovery (HADR) environment.
You must have the following prerequisites in place to perform the steps that are described in the following procedure:
  • Two Db2 instances
  • Two Db2 servers
  • The instances are originally running at version 12.1 Mod Pack 4 or later
  • The instances are configured with Pacemaker controlling HADR failover
  • If you are updating to version 11.5.5, download the associated Pacemaker stack separately. When updating to version 11.5.6 or later, the Pacemaker stack is installed by the installFixPack command.
Note: All Db2 fix pack updates, hardware upgrades, and software upgrades must be implemented in a test environment prior to applying them to your production system.

The HADR pair must be in PEER state prior to starting the rolling update.
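For example, assuming an instance owner shell and a hypothetical database named SAMPLE, you can confirm this on either node with a quick check such as the following:
  db2pd -hadr -db SAMPLE | grep -E 'HADR_ROLE|HADR_STATE'
The HADR_STATE field must report PEER before you proceed.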

Restrictions

Use this procedure to perform a rolling update on your Db2 database system and update the Db2 database product software to a new fix pack level in a Pacemaker automated HADR environment, for example, when applying a fix pack to your Db2 database product software.
  • The Db2 instances must be currently running at version 12.1 Mod Pack 4 or later.

A rolling update cannot be used to upgrade a Db2 database system from an earlier version to a later version. For example, you cannot use this procedure to upgrade from Db2 Version 10.5 to Db2 version 12.1. To upgrade a Db2 server in an automated HADR environment, see Upgrading Db2 servers in a TSA automated HADR environment.

You cannot use this procedure to update the Db2 HADR configuration parameters. Updates to the HADR configuration parameters must be made separately. Because HADR requires the parameters on the primary and standby to be the same, both the primary and standby databases might need to be deactivated and updated at the same time.

The following procedure cannot be used to convert, in a single step, an existing Db2 HADR system that uses Tivoli® SA MP (TSA) as the cluster manager to a newer Db2 level that uses Pacemaker as the cluster manager. Instead, first update the existing system to the new Db2 level while keeping TSA as the integrated cluster manager. Once the update is complete, follow the steps outlined in Replacing an existing Tivoli SA MP-managed Db2 instance with a Pacemaker-managed HADR Db2 instance to use Pacemaker as the cluster manager.

The following procedure is only applicable when the existing Db2 HADR cluster is deployed using the Db2-provided Pacemaker cluster software stack. If the cluster to be updated uses Pacemaker provided by another vendor, all cluster resources should be removed by following the procedures outlined by that Pacemaker supplier and then re-created with the db2cm utility using the Db2-provided Pacemaker cluster software stack. See Configuring high availability with the Db2 cluster manager utility (db2cm).

Procedure

  1. Make sure that all databases have their HADR_ROLE set to STANDBY on the standby node by using the following command:
    db2pd -hadr -db <database-name>
    • If any database does not have the correct role, run the following command from the primary node for each such database:
      db2 takeover hadr on db <database-name>
  2. Deactivate all databases for all instances on the standby node. Use the following command to stop HADR, but retain its role:
    db2 deactivate db <database-name>
  3. Stop all Db2 processes for all instances on the standby node using the following command:
    db2stop force
  4. As the root user on the standby node, stop all Pacemaker and Corosync processes using the following commands:
    systemctl stop pacemaker
    systemctl stop corosync
    systemctl stop corosync-qdevice
    Note: Only run the systemctl stop corosync-qdevice command if the Qdevice is configured.
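    To confirm that the cluster services are stopped on the standby node, you can check their state, for example:
      systemctl is-active pacemaker corosync
    Both services should report inactive.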
  5. Apply any fix packs that are needed. For information on how to do that, refer to Installing offline fix pack updates to existing Db2 database products (Linux® and UNIX).
    Important: Starting in version 11.5.6, step 6 through step 8 are no longer necessary as the installFixPack command takes care of these tasks.
  6. As the root user on the standby node, install the new Pacemaker and Corosync packages, provided by IBM®, using the following commands:
    cd /<tarFilePath>/RPMS/
    • For RHEL systems:
      dnf upgrade noarch/*rpm <architecture>/*rpm
    • For SLES systems:
      zypper in --allow-unsigned-rpm noarch/*rpm <architecture>/*rpm
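    For example, assuming the Db2-provided Pacemaker packages were extracted under a hypothetical /tmp/pcmk directory on an x86_64 RHEL system, the commands would be:
      cd /tmp/pcmk/RPMS/
      dnf upgrade noarch/*rpm x86_64/*rpm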
  7. As the root user, copy the new db2cm utility from /<tarFilePath>/Db2/db2cm to /home/<inst_user>/sqllib/bin using the following commands:
    cp /<tarFilePath>/Db2/db2cm /home/<inst_user>/sqllib/bin
    chmod 755 /home/<inst_user>/sqllib/bin/db2cm
  8. As the root user on the standby node, copy the resource agent scripts (db2hadr, db2inst, db2ethmon) from /<tarFilePath>/Db2agents into /usr/lib/ocf/resource.d/heartbeat/ for both hosts using the following commands:
    /home/<inst_user>/sqllib/bin/db2cm -copy_resources /<tarFilePath>/Db2agents -host <host1>
    /home/<inst_user>/sqllib/bin/db2cm -copy_resources /<tarFilePath>/Db2agents -host <host2>
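    For example, assuming an instance owner named db2inst1, a hypothetical extraction directory /tmp/pcmk, and hosts named node1 and node2, the commands would be:
      /home/db2inst1/sqllib/bin/db2cm -copy_resources /tmp/pcmk/Db2agents -host node1
      /home/db2inst1/sqllib/bin/db2cm -copy_resources /tmp/pcmk/Db2agents -host node2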
  9. As the root user on the standby node, start the Pacemaker and Corosync processes using the following commands:
    systemctl start pacemaker
    systemctl start corosync
    systemctl start corosync-qdevice
    Note: Only run the systemctl start corosync-qdevice command if the Qdevice is configured.
  10. As the root user on the standby node, check the configuration manually or by using the crm_verify tool (if available):
    crm_verify -L -V
    Note: This command prints any errors in the configuration. If there are no errors, no output is returned.
  11. Start all Db2 processes on all instances on the standby node using the following command:
    db2start
  12. Activate all databases on all instances on the standby node. Use the following command to resume HADR, but retain the role:
    db2 activate db <database-name>
  13. Perform a role switch.
    • On the standby node, issue the following command for all databases:
      db2 takeover hadr on db <database-name>
    • The old primary database (now the standby) disconnects because the new primary database is at a higher fix pack level; it remains disconnected until it is updated in the next step.
  14. On the old primary node (now the standby node), repeat step 2 to step 12 to apply the fix pack and upgrade Pacemaker.
    Note: Exclude step 8 since this step is redundant if you have already done it the first time through.
    Important: Step 15 to step 19 are only necessary if updating from version 11.5.5 to version 11.5.6 or later.
  15. Update the migration-threshold meta attribute for each database by deleting the existing attribute, and setting it with the new value.
    Delete the existing attribute by running:
    crm resource meta <database resource name> delete migration-threshold 
    Then set the new attribute by running:
    crm resource meta <database resource name> set migration-threshold 1 
    For example, for an automated database named CORAL, the following commands would be run to update the migration threshold:
    crm resource meta db2_db2inst1_db2inst1_CORAL delete migration-threshold 
    crm resource meta db2_db2inst1_db2inst1_CORAL set migration-threshold 1
  16. Update the failure-timeout attribute for each database by running the following command:
    crm resource meta <database resource name> set failure-timeout 10
    For example, for an automated database named CORAL, the following command would be run to update the failure-timeout:
    crm resource meta db2_db2inst1_db2inst1_CORAL set failure-timeout 10
  17. Ensure that the migration-threshold and failure-timeout attributes have been updated for each database by running the following command:
    crm config show <database resource-clone>
    For example, for an automated database named CORAL, the following command would be run to see the updated resource configuration:
    crm config show db2_db2inst1_db2inst1_CORAL-clone 
    ms db2_db2inst1_db2inst1_CORAL-clone db2_db2inst1_db2inst1_CORAL \
        meta resource-stickiness=5000 migration-threshold=1 ordered=true promotable=true is-managed=true failure-timeout=10
  18. Update the cluster configuration to set symmetric-cluster to true by running the following command:
    crm configure property symmetric-cluster=true
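    To confirm that the property is in effect, you can display the cluster configuration and filter for the property, for example:
      crm configure show | grep symmetric-cluster
    The output should include symmetric-cluster=true.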
  19. Update the Corosync configuration to use millisecond timestamps. This can be done while the cluster is online.
    Edit the corosync.conf file by running:
    crm corosync edit
    Update the timestamp setting under the logging directive to hires instead of on. The final directive should look like the following:
    logging {
             to_logfile: yes
             logfile: /var/log/cluster/corosync.log
             to_syslog: yes
             timestamp: hires
             function_name: on
             fileline: on
    }
    Push the change to the remote host by running:
    crm corosync push <remote hostname>
    Lastly, refresh Corosync so it uses the new configuration by running:
    crm corosync reload
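    To confirm that the new logging configuration is active, you can check that new entries written to the Corosync log carry millisecond (hires) timestamps, for example:
      tail -n 5 /var/log/cluster/corosync.log
    The log file path shown here is taken from the logging directive above; adjust it if your configuration differs.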
  20. Perform a failback to set the HADR roles back to their original state.
    • On the new standby (old primary) node, issue the following command for all databases:
      db2 takeover hadr on db <database-name>
  21. Verify that all the databases are in PEER state by using the following command:
    db2pd -hadr -db <database-name>
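    For example, for a hypothetical database named SAMPLE, the relevant fields of the db2pd output on the primary node would look similar to the following abridged excerpt:
      HADR_ROLE = PRIMARY
      HADR_STATE = PEER
      HADR_CONNECT_STATUS = CONNECTED
    On the standby node, HADR_ROLE reports STANDBY instead.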
  22. If the Qdevice is configured, stop the corosync-qnetd process on the Qdevice host as the root user using the following command:
    systemctl stop corosync-qnetd
  23. If the Qdevice is configured, update the corosync-qnetd package provided by IBM on the Qdevice host as the root user using the following command, depending on the version of Db2:
    • For RHEL systems on version 11.5.5 and older:
      dnf upgrade /<tarFilePath>/RPMS/<architecture>/corosync-qnetd*
    • For SLES systems on version 11.5.5 and older:
      zypper in --allow-unsigned-rpm /<tarFilePath>/RPMS/<architecture>/corosync-qnetd*
    • For RHEL systems on version 11.5.6 and newer:
      dnf upgrade <Db2_image>/db2/<platform>/pcmk/Linux/<OS_distribution>/<architecture>/corosync-qnetd*
    • For SLES systems on version 11.5.6 and newer:
      zypper in --allow-unsigned-rpm <Db2_image>/db2/<platform>/pcmk/Linux/<OS_distribution>/<architecture>/corosync-qnetd*
  24. As the root user, if the Qdevice is configured, start the corosync-qnetd process on the Qdevice host using the following command:
    systemctl start corosync-qnetd
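    To confirm that the quorum device is serving the cluster again, you can query its status on the Qdevice host, for example with the corosync-qnetd-tool utility if it is available:
      corosync-qnetd-tool -s
    The status output includes the number of connected clients and clusters.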
  25. Confirm that the cluster is in a healthy state. You can run the following command as the root user to check:
    <db2inst home>/sqllib/bin/db2cm -status
    Note: Pacemaker might take around a minute to complete this command.

    The following is an example of the output when you run the db2cm -status command:

     db2_db2tea1_eth1 (ocf::heartbeat:db2ethmon):  Started
     db2_kedge1_eth1  (ocf::heartbeat:db2ethmon):  Started
     db2_kedge1_db2inst1_0  (ocf::heartbeat:db2inst):  Started
     db2_db2tea1_db2inst2_0 (ocf::heartbeat:db2inst):  Started
     db2_kedge1_db2inst2_0  (ocf::heartbeat:db2inst):  Started
     Clone Set: db2_db2inst2_db2inst2_CORAL-clone [db2_db2inst2_db2inst2_CORAL] (promotable)
         Promoted: [ db2tea1 ]
         Unpromoted: [ kedge1 ]
     Clone Set: db2_db2inst2_db2inst2_CORAL2-clone [db2_db2inst2_db2inst2_CORAL2] (promotable)
         Promoted: [ db2tea1 ]
         Unpromoted: [ kedge1 ]
     db2_db2tea1_db2inst1_0 (ocf::heartbeat:db2inst):  Started
     Clone Set: db2_db2inst1_db2inst1_CORAL-clone [db2_db2inst1_db2inst1_CORAL] (promotable)
         Promoted: [ db2tea1 ]
         Unpromoted: [ kedge1 ]
     Clone Set: db2_db2inst1_db2inst1_CORAL2-clone [db2_db2inst1_db2inst1_CORAL2] (promotable)
         Promoted: [ db2tea1 ]
         Unpromoted: [ kedge1 ]

    No resources should be in an unmanaged state, and all resources should be started with the expected roles.