IBM Support

Setting up two-node Db2 HADR Pacemaker cluster with fencing on Google Cloud

How To


Summary

This document provides an alternate Db2 HADR configuration with Pacemaker without using a third lightweight host as a quorum device arbitrator.

Objective

This document describes an alternative to the best-practice Pacemaker solution (two-node HADR plus a quorum device) that is detailed in the IBM Documentation here:

Quorum devices support on Pacemaker - IBM Documentation

On Google Cloud, you do not necessarily need to configure a quorum device on a third host.  Instead, you can configure fencing as described in this document. 

The advantage of configuring a two-node HADR Pacemaker cluster with fencing is that it removes the requirement of a third host for the quorum device, thus reducing ongoing cost. The disadvantage is a longer recovery time from a primary host failure, because of the added time it takes to successfully fence the failed host from the cluster. Based on internal tests, recovery from a primary host failure can take up to 6 times longer with fencing than with a quorum device host. To compensate for this effect, the HADR_PEER_WINDOW value of all databases must be set to at least 300 seconds.
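
As a minimal sketch of the HADR_PEER_WINDOW requirement, the following loop prints the Db2 command that sets the value to 300 seconds for each database. The database name GP1 is the example name used in this document, and the helper function name is made up for illustration. The commands are only printed, so you can review them and run them as the Db2 instance owner on both hosts. Check the Db2 documentation for when a changed HADR parameter takes effect on your Db2 level.

```shell
#!/usr/bin/env bash
# Sketch: print the Db2 command that sets HADR_PEER_WINDOW for each database.
# MIN_PEER_WINDOW is the minimum from the text above; peer_window_cmd is a
# hypothetical helper name.
MIN_PEER_WINDOW=300

peer_window_cmd() {
    # $1 = database name
    printf 'db2 update db cfg for %s using HADR_PEER_WINDOW %s\n' "$1" "$MIN_PEER_WINDOW"
}

# List every HADR database of the instance here (GP1 is the example database).
for db in GP1; do
    peer_window_cmd "$db"
done
```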

The choice of configuration must be based on your specific business requirements by taking recovery time and cost of implementation into account. 

Fencing on Google Cloud is done with the fence_gce agent.

This support is available starting with Db2 11.5.9.0 running on Red Hat Enterprise Linux Version 8.6 or higher, or SUSE Linux Enterprise Server Release 15.4 or higher. The supported hardware architecture is x86. The fencing agent for Google Cloud must be the one provided through the IBM Marketing Registration Services website, not a version available elsewhere.

Environment

Refer to the following IBM Documentation for a list of platforms supported by Pacemaker. The same restrictions apply here:

Restrictions on Pacemaker - IBM Documentation

Refer to the “Configuring a clustered environment using the Db2 cluster manager (db2cm) utility” page of the IBM Documentation to deploy the automated HADR solution as usual: 

Creating an HADR Db2 instance on a Pacemaker-managed Linux cluster - IBM Documentation

A prerequisite is the Google Cloud guest environment installed on all nodes in the cluster. This guest environment is automatically deployed with each Google-provided public image and is set up automatically. If you are using a custom image, ensure that the guest environment is set up according to this Google documentation: Guest environment  |  Compute Engine Documentation  |  Google Cloud

In addition, you need to launch Google Cloud Shell and authorize the gcloud utility. Details are described here:

Launch Cloud Shell  |  Google Cloud

gcloud  |  Google Cloud CLI Documentation

To set up the Pacemaker cluster with fencing, it is beneficial to follow a consistent naming scheme for the different components. In our case, the names of the entities are derived from the cluster name and the Db2 instance in the cluster. In this document, we use placeholders for the entities that you need to replace with the entities in your environment.

The fencing agent interacts with the Google Cloud infrastructure to start, stop, or reboot the virtual machines through a service account. It is best practice to create a dedicated service account and a custom role with the minimal set of privileges, and to assign this role to the service account. The fencing agent uses this service account to interact with the Google Cloud backend. For authentication, the fencing agent uses an access key that is stored on each virtual machine in the cluster.

If you need to deviate from this setup, for example to rotate keys or to use a centralized keystore, familiarize yourself with Google Cloud Identity and Access Management. You can use the following links as a starting point:

Identity and Access Management  |  IAM  |  Google Cloud

Identities for workloads  |  IAM Documentation  |  Google Cloud

Create and delete service account keys  |  IAM Documentation  |  Google Cloud

Best practices for managing service account keys  |  IAM Documentation  |  Google Cloud

Steps

1. Install fencing agent

Navigate to this website: https://www-01.ibm.com/marketing/iwm/platform/mrs/assets?source=mrs-db2pcmk&_ga=2.31425788.296289340.1604345966-344484498.1579133947 

Download the latest version of the Google Cloud fencing agent, for example Db2_RHEL8_GCE_fence-agents-4.12.1.tar.gz from the IBM Marketing Registration Services website.

Unpack the archive by using the following command:

tar -zxf Db2_RHEL8_GCE_fence-agents-4.12.1.tar.gz

This command creates the directory Db2_RHEL8_GCE_fence-agents-4.12.1

Install the RPM packages

Switch to the created directory, then to the subdirectory named after your operating system identifier, and issue the following command:

For SLES:

zypper install --allow-unsigned-rpm *.rpm

For RHEL:

dnf install *.rpm

Note: The fencing agents must be installed on both nodes in the cluster.
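
The choice between the two install commands above can be sketched as a small helper that reads the distribution ID from /etc/os-release. This is an illustration only: the function name is hypothetical, and the ID values `sles` and `rhel` are the usual os-release identifiers for these distributions, which you should confirm on your own images. The helper prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: pick the install command from this step based on the distribution ID.
# install_cmd_for is a hypothetical helper; it only prints the command.
install_cmd_for() {
    case "$1" in
        sles*) echo "zypper install --allow-unsigned-rpm *.rpm" ;;
        rhel)  echo "dnf install *.rpm" ;;
        *)     echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}

# Read ID from /etc/os-release when the file is present on this host.
if [ -r /etc/os-release ]; then
    os_id=$(. /etc/os-release && echo "$ID")
    install_cmd_for "$os_id" || true
fi
```

Run the printed command as root in the directory that contains the RPM files, on both nodes.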

2. Decide on namespaces and IP addresses

Compile a list of all host names, including virtual host names, and update your DNS servers to enable proper IP address to host-name resolution. If a DNS server doesn't exist or you can't update and create DNS entries, you need to use the local host files of the individual virtual machines. For an introduction to DNS, refer to: Internal DNS  |  Compute Engine Documentation  |  Google Cloud
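
If you rely on local hosts files, a quick check like the following can show which names are still missing from a given file. It is a sketch under simple assumptions: the function name is made up, and the file is expected in the usual hosts format (IP address first, names after it, `#` starting a comment).

```shell
#!/usr/bin/env bash
# Sketch: print every requested host name that has no entry in a hosts-format file.
# missing_hosts is a hypothetical helper name.
missing_hosts() {
    # $1 = hosts file, remaining arguments = host names to look for
    local file=$1 name
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the IP address.
        awk -v n="$name" '
            { sub(/#.*/, ""); for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || echo "$name"
    done
}
```

For example, `missing_hosts /etc/hosts pcmkdb01 pcmkdb02` prints nothing when both example cluster nodes are listed.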

If you're using hosts file entries, make sure that the entries are applied to all virtual machines in the environment. Also, compile a list of names for the different entities required according to the example shown in the following table.

Entity                                Name
Google Cloud Project                  db2pcmk2023
Google Cloud Region                   europe-west1
Db2 Instance                          db2gp1
Db2 database                          GP1
Hostname/tag of cluster node 0        pcmkdb01
Hostname/tag of cluster node 1        pcmkdb02
Pacemaker Cluster Name                GP1cluster
Google Cloud Service Account          db2gp1-service-account
Google Cloud Role Name for Fencing    db2gp1fencer
Google Cloud Service Account email    db2gp1-service-account@db2pcmk2023.iam.gserviceaccount.com
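
The derived names can also be kept in shell variables so that later commands stay consistent. The following is a minimal sketch, assuming the project ID db2pcmk2023 that the gcloud commands in this document use. The variable names are chosen for illustration; the email follows the standard Google Cloud pattern of service account name, project ID, and the iam.gserviceaccount.com domain.

```shell
#!/usr/bin/env bash
# Sketch: derive the Google Cloud names from the project ID and the Db2 instance name.
# Variable names are illustrative; replace the two input values with your own.
PROJECT="db2pcmk2023"
INSTANCE="db2gp1"

SA_NAME="${INSTANCE}-service-account"
ROLE_NAME="${INSTANCE}fencer"
SA_EMAIL="${SA_NAME}@${PROJECT}.iam.gserviceaccount.com"
ROLE_PATH="projects/${PROJECT}/roles/${ROLE_NAME}"

echo "Service account:       $SA_NAME"
echo "Service account email: $SA_EMAIL"
echo "Custom role:           $ROLE_PATH"
```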

3. Create Service Account

Before setting up the fencing agent, create the required Google Backend Services. This configuration can be done by using the graphical user interface in the Google Cloud console or the Google Cloud Shell. In this document, we use the Google Cloud Shell. For a reference of Google Cloud Shell Commands, refer to: gcloud  |  Google Cloud CLI Documentation

In Cloud Shell, create a Service Account for your Google Cloud project that is used to interact between the Pacemaker software and the Google Cloud Backend components.

gcloud iam service-accounts create db2gp1-service-account --display-name="db2gp1-service-account-for-fencing" --project="db2pcmk2023" 

4. Create a Role with required permissions

In Cloud Shell, create a role and assign the required permissions. The required permissions are:

  • compute.instances.get
  • compute.instances.list
  • compute.instances.reset
  • compute.instances.start
  • compute.instances.stop
  • compute.zoneOperations.get
  • logging.logEntries.create
  • compute.zoneOperations.list 

gcloud iam roles create db2gp1fencer --project=db2pcmk2023 --title=db2gp1fencer --description="Perform Pacemaker fencing actions on db2gp1" --stage=GA --permissions=compute.instances.get,compute.instances.list,compute.instances.reset,compute.instances.start,compute.instances.stop,compute.zoneOperations.get,logging.logEntries.create,compute.zoneOperations.list 
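
Because the permission list is long, it can be easier to maintain as an array and join it when the command is built. The following sketch prints the same gcloud command as above. The array and function names are hypothetical, and nothing is sent to Google Cloud until you run the printed command yourself.

```shell
#!/usr/bin/env bash
# Sketch: keep the required permissions in an array and join them with commas.
# PERMISSIONS and join_by_comma are illustrative names.
PERMISSIONS=(
    compute.instances.get
    compute.instances.list
    compute.instances.reset
    compute.instances.start
    compute.instances.stop
    compute.zoneOperations.get
    logging.logEntries.create
    compute.zoneOperations.list
)

join_by_comma() {
    # Join all arguments with "," by setting IFS locally.
    local IFS=,
    echo "$*"
}

PERMISSION_LIST=$(join_by_comma "${PERMISSIONS[@]}")

# Print the role creation command for review.
printf 'gcloud iam roles create %s --project=%s --title=%s --description="%s" --stage=GA --permissions=%s\n' \
    db2gp1fencer db2pcmk2023 db2gp1fencer \
    "Perform Pacemaker fencing actions on db2gp1" "$PERMISSION_LIST"
```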

5. Assign Role to the Service Account.

In Cloud Shell, assign the previously created custom role to the Google Service Account used for the Fencing Agent.


gcloud projects add-iam-policy-binding db2pcmk2023 --member serviceAccount:db2gp1-service-account@db2pcmk2023.iam.gserviceaccount.com --role=projects/db2pcmk2023/roles/db2gp1fencer --condition=None

Download Access Key File.

The Key file is used to authenticate the service account in the Google Cloud backend. The file must be downloaded to both nodes in the cluster and is used as an optional argument when the fencing agent in the pacemaker cluster is configured. You can use the gcloud command either in the Google Cloud Console or on one of the virtual machines in the cluster directly.

If you use the command in the Google Cloud Console, you can store the Key File locally on your workstation and upload it to the virtual machines afterward.

A more convenient way is to execute the gcloud command on one of the nodes in the cluster, save the key file directly on this node, and copy it to the second node. The key file can be stored, for example, in a directory “fence_auth” in the home directory of the Db2 instance owner. The example in this document uses the directory /etc/db2pcmk_fence instead.

If the setup in your organization requires a different setup for the key management, refer to:

Best practices for managing service account keys  |  IAM Documentation  |  Google Cloud

Secret Manager  |  Google Cloud

The following example shows how to execute the gcloud command on the virtual machine itself.

To store the key file, create a new directory, for example /etc/db2pcmk_fence as user root on both nodes in the cluster.

mkdir /etc/db2pcmk_fence
chmod 700 /etc/db2pcmk_fence
cd /etc/db2pcmk_fence

Using the gcloud command, generate and download the access key file for the service account used for the fencing agent.

Note: This command must be executed on only one of the virtual machines. In our case, we use the primary database server pcmkdb01.

gcloud iam service-accounts keys create db2gp1.json --iam-account=db2gp1-service-account@db2pcmk2023.iam.gserviceaccount.com 


Once this file is downloaded to the virtual machine, copy the file to the second node and change the permissions. 
 

scp db2gp1.json pcmkdb02:/etc/db2pcmk_fence
ssh pcmkdb02 "chmod 700 /etc/db2pcmk_fence/db2gp1.json"
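
After the copy, it can be worth confirming on each node that the key file is readable by its owner only and belongs to the expected service account. The following check is a sketch: the function name is made up, it relies on GNU `stat` as found on RHEL and SLES, and it assumes the usual JSON key layout with a `client_email` field.

```shell
#!/usr/bin/env bash
# Sketch: verify permissions and service account of a downloaded key file.
# check_key_file is a hypothetical helper; stat -c is the GNU coreutils form.
check_key_file() {
    # $1 = key file, $2 = expected service account email
    local file=$1 email=$2 mode
    [ -f "$file" ] || { echo "missing: $file"; return 1; }
    mode=$(stat -c '%a' "$file")
    # Any permission bit for group or others makes the key readable beyond its owner.
    case "$mode" in
        ?00) ;;
        *) echo "too permissive: $file has mode $mode"; return 1 ;;
    esac
    grep -q "\"client_email\": *\"$email\"" "$file" \
        || { echo "unexpected account in $file"; return 1; }
    echo "ok: $file"
}
```

For example, run `check_key_file /etc/db2pcmk_fence/db2gp1.json db2gp1-service-account@db2pcmk2023.iam.gserviceaccount.com` as root on both nodes.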

6. Prepare the Cluster for Fencing

Log in as user root
- Create a systemd drop-in file: systemctl edit corosync.service and add the lines:
        [Service]
        ExecStartPre=/bin/sleep 60
- Reload the systemd daemon: systemctl daemon-reload
- Stop the Pacemaker cluster on both hosts: crm cluster stop
- Check and add the wait_for_all: 1 clause to the quorum section of the /etc/corosync/corosync.conf file on both hosts:

        quorum {
            provider: corosync_votequorum
            two_node: 1
            wait_for_all: 1
        }

- Increase the value for token from 10000 to 20000 (milliseconds) in the totem section of the /etc/corosync/corosync.conf file on both hosts:
 

		totem {
		    version: 2
		    cluster_name: GP1cluster
		    transport: knet
		    token: 20000
		    crypto_cipher: aes256
		    crypto_hash: sha256
		}

- Start the Pacemaker cluster on both hosts: crm cluster start
- Enable the fencing-related properties:

crm configure property stonith-enabled=true
crm configure property no-quorum-policy=stop
crm configure property priority-fencing-delay=60

Note: An error is displayed that can be ignored. Confirm the commit by entering 'y'.
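
To confirm that a host carries the corosync.conf settings from this step, a small check such as the following can be run on each node. It is a sketch only: the function names are made up for illustration, and the parser assumes the simple `section { key: value }` layout shown in this step.

```shell
#!/usr/bin/env bash
# Sketch: read values from corosync.conf and compare them with this step's settings.
# corosync_value and check_corosync_conf are hypothetical helper names.
corosync_value() {
    # $1 = config file, $2 = top-level section, $3 = key; prints the value if found
    awk -v sec="$2" -v key="$3" '
        $NF == "{" { depth++; if (depth == 1) in_sec = ($1 == sec); next }
        $1 == "}"  { depth--; if (depth == 0) in_sec = 0; next }
        in_sec && depth == 1 && $1 == (key ":") { print $2; exit }
    ' "$1"
}

check_corosync_conf() {
    # $1 = config file; prints one line per mismatch and returns non-zero on any
    local file=$1 rc=0
    [ "$(corosync_value "$file" quorum two_node)" = "1" ] \
        || { echo "quorum.two_node is not 1"; rc=1; }
    [ "$(corosync_value "$file" quorum wait_for_all)" = "1" ] \
        || { echo "quorum.wait_for_all is not 1"; rc=1; }
    [ "$(corosync_value "$file" totem token)" = "20000" ] \
        || { echo "totem.token is not 20000"; rc=1; }
    return $rc
}
```

For example, `check_corosync_conf /etc/corosync/corosync.conf` prints nothing and returns 0 when all three settings match.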

7. Configure the fence_gce fence agent resource

With Db2 integrated Pacemaker, the recommended setup is to use only one fencing agent instance. The fencing agent resource starts on one host only.
- Log in as user root
- On one of the hosts, create the fencing agent primitive as follows:
crm configure primitive fence_db2_gcp_db2gp1 stonith:fence_gce \
    op monitor interval=300s timeout=120s \
    op start interval=0 timeout=60s \
    params serviceaccount="/etc/db2pcmk_fence/db2gp1.json" \
    pcmk_host_list="pcmkdb01,pcmkdb02" zone=europe-west1-b project=db2pcmk2023 \
    pcmk_reboot_timeout=300 \
    pcmk_monitor_retries=4 \
    pcmk_delay_max=30 \
    meta is-managed=true

 
Finally, set the resource state to managed:
crm resource manage fence_db2_gcp_db2gp1

Additional Information

Managing, Starting and Stopping fencing resources in the cluster
With the db2cm utility, you can manage the cluster. However, the db2cm utility manages resources created by db2cm only. As the Google Cloud fencing agent is not created with the db2cm utility, you must manage the fencing agents manually. So, before you use db2cm to disable the cluster, you must set the fencing agent to an unmanaged state separately.
To do so, unmanage the resource and check the status first by using the following commands:
 
crm resource unmanage fence_db2_gcp_db2gp1
crm resource status 
After the fencing agent is in status “unmanaged”, you can use the db2cm utility to disable the cluster. The same is true when you enable a cluster with the db2cm utility: the fencing agent needs to be enabled separately. To do so, check the status of the fencing agent after the cluster is enabled, set the resource to the managed state, and start the resource if required with these commands:
crm resource status fence_db2_gcp_db2gp1
crm resource manage fence_db2_gcp_db2gp1
crm resource start fence_db2_gcp_db2gp1
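
The order of operations described above can be sketched as two wrapper functions. This is an illustration under stated assumptions: the function names and the dry-run switch are made up, and the `db2cm -disable` and `db2cm -enable` calls are written from memory of the db2cm utility, so confirm the exact syntax for your Db2 level. With the default DRY_RUN=1, the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: wrap the manual fencing agent handling around db2cm disable and enable.
# DRY_RUN=1 (default) prints the commands; set DRY_RUN=0 to execute them as root.
DRY_RUN=${DRY_RUN:-1}
FENCE_RSC="fence_db2_gcp_db2gp1"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

disable_cluster() {
    # Unmanage the fencing agent first, then disable the cluster.
    run crm resource unmanage "$FENCE_RSC"
    run crm resource status
    run db2cm -disable    # assumed db2cm syntax, verify before use
}

enable_cluster() {
    # Enable the cluster first, then bring the fencing agent back.
    run db2cm -enable     # assumed db2cm syntax, verify before use
    run crm resource status "$FENCE_RSC"
    run crm resource manage "$FENCE_RSC"
    run crm resource start "$FENCE_RSC"
}
```

Calling `disable_cluster` or `enable_cluster` in dry-run mode lists the commands in the order they would be issued.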
Remove fencing resources from the cluster
With the db2cm utility, you can add and remove resources in the cluster or remove the cluster completely. However, the db2cm utility removes resources created by db2cm only. As the fencing agent is not created with the db2cm utility, you must remove the fencing agent manually if you want to operate the cluster without the fencing agent, and also before deleting the cluster entirely.
To do so, perform the following steps after the fencing agent is unmanaged:
 
crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore
crm configure delete fence_db2_gcp_db2gp1 --force
crm resource refresh
If you permanently remove the fencing agent from the cluster, remove the configuration for the service account and the role assignment in the Google Cloud backend.
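
The cleanup in the Google Cloud backend could look like the following sketch, which mirrors the creation steps in reverse. It assumes the example names used in this document and only prints the gcloud commands for review. The function and variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the gcloud commands that undo the role binding, the service
# account, and the custom role. Names are the example values of this document.
PROJECT="db2pcmk2023"
ROLE="db2gp1fencer"
SA_EMAIL="db2gp1-service-account@${PROJECT}.iam.gserviceaccount.com"

cleanup_cmds() {
    echo "gcloud projects remove-iam-policy-binding $PROJECT --member=serviceAccount:$SA_EMAIL --role=projects/$PROJECT/roles/$ROLE"
    echo "gcloud iam service-accounts delete $SA_EMAIL --project=$PROJECT"
    echo "gcloud iam roles delete $ROLE --project=$PROJECT"
}

cleanup_cmds
```

After the service account is deleted, the key file stored on the cluster nodes (in this document, /etc/db2pcmk_fence/db2gp1.json) no longer serves a purpose and can be removed as well.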

Document Location

Worldwide

Operating System

Cross Brand:Linux


Document Information

Modified date:
14 November 2023

UID

ibm17071303