When you use the integrated High Availability (HA) feature with Pacemaker to automate
HADR, extra steps are required to update the operating system or the Db2® database system
software, upgrade the hardware, or change the database configuration parameters. Follow this
procedure to perform a rolling update in a Pacemaker automated HADR
environment.
Before you begin
You must have the following prerequisites in place before you perform the steps that are described in the Procedure section:
- Two Db2 instances
- Two Db2 servers
- The instances are originally running at version 12.1 Mod Pack 4 or later
- The instances are configured with Pacemaker controlling HADR failover
- If you are updating to version 11.5.5, download the associated Pacemaker stack separately. If you are updating to version 11.5.6 or later, the Pacemaker stack is installed by the installFixPack command.
Note: All Db2 fix pack updates, hardware upgrades, and software upgrades must be implemented in a test environment prior to applying them to your production system.
The HADR pair must be in PEER state prior to starting the rolling update.
Restrictions
Use this procedure to perform a rolling update on your Db2 database system and update the Db2 database product software to a new fix pack level in a Pacemaker automated HADR environment, for example, applying a fix pack to your Db2 database product software.
- The Db2 instances must be currently running at version 12.1 Mod Pack 4 or later.
- A rolling update cannot be used to upgrade a Db2 database system from an earlier version to a later version. For example, you cannot use this procedure to upgrade from Db2 version 10.5 to Db2 version 12.1. To upgrade a Db2 server in an automated HADR environment, see Upgrading Db2 servers in a TSA automated HADR environment.
- You cannot use this procedure to update the Db2 HADR configuration parameters. Updates to the HADR configuration parameters must be made separately. Because HADR requires the parameters on the primary and standby to be the same, both the primary and standby databases might need to be deactivated and updated at the same time.
- The following procedure cannot be used to convert an existing Db2 HADR system that uses Tivoli® SA MP (TSA) as the cluster manager to a newer Db2 level with Pacemaker as the cluster manager in a single step. Instead, first update the existing system to the new Db2 level while maintaining TSA as the integrated cluster manager. Once the update is complete, follow the steps outlined in Replacing an existing Tivoli SA MP-managed Db2 instance with a Pacemaker-managed HADR Db2 instance to use Pacemaker as the cluster manager.
- The following procedure is only applicable when the existing Db2 HADR cluster is deployed using the Db2-provided Pacemaker cluster software stack. If the cluster to be updated uses Pacemaker provided by another vendor, all cluster resources should be removed by following the procedures outlined by the Pacemaker supplier, and then recreated using the Db2-provided Pacemaker cluster software stack and the db2cm utility. See Configuring high availability with the Db2 cluster manager utility (db2cm).
Procedure
- Make sure that all databases have their HADR_ROLE set to STANDBY on the standby node by using the following command:
db2pd -hadr -db <database-name>
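For example, to spot-check a single database named SAMPLE (a hypothetical name), the relevant fields can be filtered from the db2pd output:
db2pd -hadr -db SAMPLE | grep -E 'HADR_ROLE|HADR_STATE'
The HADR_ROLE field should report STANDBY for every database on this node.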
- Deactivate all databases for all instances on the standby node. Use the following command to stop HADR, but retain its role:
db2 deactivate db <database-name>
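If an instance automates several databases, the same command can be wrapped in a small loop run as the instance owner; a sketch with hypothetical database names:
for db in SAMPLE CORAL; do
    db2 deactivate db $db
done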
- Stop all Db2 processes for all instances on the standby node, as outlined below.
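The exact commands depend on your configuration; a minimal sketch, assuming the standard db2stop command run as each instance owner on the standby node (db2stop force can be used if applications are still connected):
db2stop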
- As the root user on the standby node, stop all Pacemaker and Corosync processes using the following commands:
systemctl stop pacemaker
systemctl stop corosync
systemctl stop corosync-qdevice
Note: Only run the systemctl stop corosync-qdevice command if the Qdevice is configured.
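Before moving on, you can optionally confirm that the cluster software is fully stopped; a quick check using systemctl:
systemctl is-active pacemaker corosync corosync-qdevice
Each service should report a non-active state (omit corosync-qdevice from the list if the Qdevice is not configured).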
- Apply any fix packs that are needed. For information on how to do this, refer to Installing offline fix pack updates to existing Db2 database products (Linux® and UNIX).
Important: Starting in version 11.5.6, step 6 through step 8 are no longer necessary as the installFixPack command takes care of these tasks.
- As the root user on the standby node, install the new Pacemaker and Corosync packages, provided by IBM®, as outlined below.
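The exact install commands depend on the distribution and on where the IBM-provided tar file was extracted; a sketch that assumes the same /<tarFilePath>/RPMS layout used for the Qdevice packages later in this procedure:
For RHEL systems:
dnf upgrade /<tarFilePath>/RPMS/<architecture>/*.rpm
For SLES systems:
zypper in --allow-unsigned-rpm /<tarFilePath>/RPMS/<architecture>/*.rpm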
- As the root user, copy the new db2cm utility from /<tarFilePath>/Db2/db2cm to /home/<inst_user>/sqllib/bin using the following commands:
cp /<tarFilePath>/Db2/db2cm /home/<inst_user>/sqllib/bin
chmod 755 /home/<inst_user>/sqllib/bin/db2cm
- As the root user on the standby node, copy the resource agent scripts
(db2hadr, db2inst, db2ethmon) from
/<tarFilePath>/Db2agents into
/usr/lib/ocf/resource.d/heartbeat/ for both hosts using the following
commands:
/home/<inst_user>/sqllib/bin/db2cm -copy_resources /<tarFilePath>/Db2agents -host <host1>
/home/<inst_user>/sqllib/bin/db2cm -copy_resources /<tarFilePath>/Db2agents -host <host2>
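To confirm that the agents were copied on both hosts, you can optionally list them in the OCF resource directory; a quick check, assuming the default path shown above:
ls -l /usr/lib/ocf/resource.d/heartbeat/db2ethmon /usr/lib/ocf/resource.d/heartbeat/db2inst /usr/lib/ocf/resource.d/heartbeat/db2hadr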
- As the root user on the standby node, start the Pacemaker and Corosync
processes using the following commands:
systemctl start pacemaker
systemctl start corosync
systemctl start corosync-qdevice
Note: Only run the systemctl start corosync-qdevice command if the Qdevice is configured.
- As the root user on the standby node, check the configuration manually or by using the
crm_verify tool (if available):
crm_verify -L -V
Note: This command prints any errors in the configuration. If the configuration is correct, nothing is printed.
- Start all Db2 processes for all instances on the standby node, as outlined below.
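A minimal sketch, assuming the standard db2start command run as each instance owner on the standby node:
db2start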
- Activate all databases on all instances on the standby node. Use the
following command to resume HADR, but retain the role:
db2 activate db <database-name>
- Perform a role switch.
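A role switch is typically performed with the HADR takeover command issued from the standby instance for each database; a minimal sketch, assuming the same <database-name> placeholder used in the previous steps:
db2 takeover hadr on db <database-name>
After the takeover completes, the db2pd -hadr output from step 1 can be used to confirm that the roles have switched and that the pair is back in PEER state.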
- On the old primary node (now the standby node), repeat step 2 to step 12 to apply the fix pack and update Pacemaker.
Note: Exclude step 8, since this step is redundant if you already completed it the first time through.
Important: Step 15 to step 19 are only necessary if you are updating from version 11.5.5 to version 11.5.6 or later.
- Update the migration-threshold meta attribute for each database by deleting the existing attribute and setting it with the new value.
Delete the existing attribute by running:
crm resource meta <database resource name> delete migration-threshold
Then set the new attribute by running:
crm resource meta <database resource name> set migration-threshold 1
For example, for an automated database named CORAL, the following commands would be run to update the migration threshold:
crm resource meta db2_db2inst1_db2inst1_CORAL delete migration-threshold
crm resource meta db2_db2inst1_db2inst1_CORAL set migration-threshold 1
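If several databases are automated under the instance, the same pair of commands can be repeated per resource; a sketch with example resource names:
for res in db2_db2inst1_db2inst1_CORAL db2_db2inst1_db2inst1_CORAL2; do
    crm resource meta $res delete migration-threshold
    crm resource meta $res set migration-threshold 1
done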
- Update the failure-timeout attribute for each database by running the following command:
crm resource meta <database resource name> set failure-timeout 10
For example, for an automated database named CORAL, the following command would be run to update the failure-timeout:
crm resource meta db2_db2inst1_db2inst1_CORAL set failure-timeout 10
- Ensure that the migration-threshold and failure-timeout attributes have been updated for each database by running the following command:
crm config show <database resource-clone>
For example, for an automated database named CORAL, the following command would be run to see the updated resource configuration:
crm config show db2_db2inst1_db2inst1_CORAL-clone
ms db2_db2inst1_db2inst1_CORAL-clone db2_db2inst1_db2inst1_CORAL \
meta resource-stickiness=5000 migration-threshold=1 ordered=true promotable=true is-managed=true failure-timeout=10
- Update the cluster configuration to set symmetric-cluster to true by running the following command:
crm configure property symmetric-cluster=true
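To confirm that the property was applied, the cluster properties can be inspected; assuming the crm shell used elsewhere in this procedure:
crm configure show | grep symmetric-cluster
The output should include symmetric-cluster=true.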
- Update the Corosync configuration to use millisecond timestamps. This can be done while the cluster is online.
Edit the corosync.conf file by running:
crm corosync edit
Update the timestamp setting under the logging directive to hires instead of on. The final directive should look like the following:
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    timestamp: hires
    function_name: on
    fileline: on
}
Push the change to the remote host by running:
crm corosync push <remote hostname>
Lastly, refresh Corosync so that it uses the new configuration by running:
crm corosync reload
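To confirm that both hosts picked up the change, the logging directive can be inspected in the Corosync configuration file; a quick check, assuming the default /etc/corosync/corosync.conf location:
grep -A 6 'logging {' /etc/corosync/corosync.conf
The timestamp entry should now read hires on both hosts.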
- Perform a failback to set the HADR roles back to their original state.
- Verify that all the databases are in the PEER state by using the following command:
db2pd -hadr -db <database-name>
- If the Qdevice is configured, stop the corosync-qnetd process on the Qdevice host as the root user by using the following command:
systemctl stop corosync-qnetd
- If the Qdevice is configured, update the corosync-qnetd package provided by IBM on the Qdevice host as the root user by using the following command, depending on the version of Db2:
- For RHEL systems on version 11.5.5 and older:
dnf upgrade /<tarFilePath>/RPMS/<architecture>/corosync-qnetd*
- For SLES systems on version 11.5.5 and older:
zypper in --allow-unsigned-rpm /<tarFilePath>/RPMS/<architecture>/corosync-qnetd*
- For RHEL systems on version 11.5.6 and newer:
dnf upgrade <Db2_image>/db2/<platform>/pcmk/Linux/<OS_distribution>/<architecture>/corosync-qnetd*
- For SLES systems on version 11.5.6 and newer:
zypper in --allow-unsigned-rpm <Db2_image>/db2/<platform>/pcmk/Linux/<OS_distribution>/<architecture>/corosync-qnetd*
- As the root user, if the Qdevice is configured, start the corosync-qnetd process on the Qdevice host using the following command:
systemctl start corosync-qnetd
- Confirm that the cluster is in a healthy state. You can run the following command as the
root user to check:
<db2inst home>/sqllib/bin/db2cm -status
Note: This might take Pacemaker around a minute to complete.
The following is an example of the output when you run the db2cm -status command:
db2_db2tea1_eth1 (ocf::heartbeat:db2ethmon): Started
db2_kedge1_eth1 (ocf::heartbeat:db2ethmon): Started
db2_kedge1_db2inst1_0 (ocf::heartbeat:db2inst): Started
db2_db2tea1_db2inst2_0 (ocf::heartbeat:db2inst): Started
db2_kedge1_db2inst2_0 (ocf::heartbeat:db2inst): Started
Clone Set: db2_db2inst2_db2inst2_CORAL-clone [db2_db2inst2_db2inst2_CORAL] (promotable)
Promoted: [ db2tea1 ]
Unpromoted: [ kedge1 ]
Clone Set: db2_db2inst2_db2inst2_CORAL2-clone [db2_db2inst2_db2inst2_CORAL2] (promotable)
Promoted: [ db2tea1 ]
Unpromoted: [ kedge1 ]
db2_db2tea1_db2inst1_0 (ocf::heartbeat:db2inst): Started
Clone Set: db2_db2inst1_db2inst1_CORAL-clone [db2_db2inst1_db2inst1_CORAL] (promotable)
Promoted: [ db2tea1 ]
Unpromoted: [ kedge1 ]
Clone Set: db2_db2inst1_db2inst1_CORAL2-clone [db2_db2inst1_db2inst1_CORAL2] (promotable)
Promoted: [ db2tea1 ]
Unpromoted: [ kedge1 ]
No resources should be in the unmanaged state, and all resources should be started in the expected role.