IBM Support

Setting up two-node Db2 HADR Pacemaker cluster with fencing on Microsoft Azure for Db2 V11.5.8.0

How To


Summary

This document provides an alternate Db2 Pacemaker configuration that does not use a third lightweight host as a quorum device (Qdevice) arbitrator. It discusses the pros and cons of the alternate, fencing-only setup so that you can choose between the two configurations based on the trade-off between cost and recovery time.

Objective

The objective of this document is to detail an alternative to the best-practice two-node Db2 Pacemaker solution with a quorum device, which is described in the IBM Documentation here: Quorum devices support on Pacemaker - IBM Documentation
The procedure outlined in this document can be used for both HADR and Mutual Failover cluster configurations on Azure with Db2 V11.5.8.0. For prior Db2 versions, refer to the original document.
On Microsoft Azure, you do not necessarily need to configure a quorum device on a third host. Instead, you can configure fencing as described in this document.
The advantage of configuring a two-node Pacemaker cluster with fencing is that it removes the requirement for a third host for the quorum device, thus reducing ongoing cost.
The disadvantage is a longer recovery time from a primary host failure, due to the added time it takes to successfully fence the failed host from the cluster. Based on our internal test results in a controlled environment, it can take up to 6 times longer to recover from a primary host failure with fencing than when a quorum device host is used. To compensate for this on HADR clusters, the HADR_PEER_WINDOW value of all databases must be set to at least 300 seconds. No additional configuration changes are required to compensate for long fencing times on Mutual Failover clusters; however, failover automation does not occur until fencing is completed.
The choice of configuration must be based on your specific business requirements by taking recovery time and cost of implementation into account.
Fencing on Azure is done by using the fence_azure_arm agent. The fence_azure_arm agent is an open source I/O fencing agent for Azure, which uses the Azure SDK to connect to Azure.
In Db2 V11.5.5.0, the fencing agent was included as part of the Pacemaker cluster software package on the following IBM website. Starting from Db2 V11.5.6.0, the Pacemaker software stack is part of the standard Db2 installation image, and only the fencing agent remains on the website. Irrespective of which Db2 version is used, the fencing agent for Azure must be the one provided through that site, not a version available elsewhere.
Db2 does not include the fencing agent for Azure in the Db2 installation image. You must download the package for the Azure fencing agent from the following website:
To install the fencing agent, perform the following steps:
1. Download the latest version of the Azure fencing agent from the above website.
For RHEL download: Db2_RHEL_Azure_fence_agents_4.11.0-4.tar.gz
For SLES download: Db2_SLES_Azure_fence_agents_4.9.0.tar.gz
2. Unpack the archive by using the following command: tar -zxf Db2_RHEL_Azure_fence_agents_4.11.0-4.tar.gz
This command creates the directory Db2_RHEL_Azure_fence_agents_4.11.0-4.
3. Install the RPM packages.
Change to the directory created above, then to the subdirectory named for your operating system identifier, and issue the following command:
For SLES: zypper install --allow-unsigned-rpm *.rpm
For RHEL: dnf install *.rpm
Note:  
The fencing agents must be installed on both nodes in the cluster.  
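The install step above can be sketched as a small helper that picks the install command from the distribution ID. This is a minimal sketch, not an IBM-provided script: the function name fence_install_cmd is hypothetical, and reading ID from /etc/os-release is an assumption about how the host identifies itself. It only prints the command so that you can review it before running it as root on each node.

```shell
#!/bin/sh
# Hypothetical helper: print the package install command for a distribution ID.
# The two commands are the ones given in step 3 above.
fence_install_cmd() {
    case "$1" in
        sles*) echo 'zypper install --allow-unsigned-rpm *.rpm' ;;
        rhel*) echo 'dnf install *.rpm' ;;
        *)     echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}

# Assumption: /etc/os-release defines ID (for example "sles" or "rhel").
if [ -r /etc/os-release ]; then
    os_id=$(. /etc/os-release && echo "$ID")
    fence_install_cmd "$os_id" || true
fi
```

Printing rather than executing keeps the sketch safe to run on any host; the printed command still has to be issued from the directory that contains the RPM files.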
Once installed, follow the steps below to configure a two-host HADR Pacemaker cluster with fencing on Microsoft Azure.

Environment

Refer to the following IBM Documentation page for a list of platforms supported by Pacemaker. The same restrictions apply here: Restrictions on Pacemaker - IBM Documentation

Steps

Prerequisites
1. Install the Azure CLI, which the db2cm command uses to get certain details required for configuring fencing. Refer to the following document from Microsoft for installing the Azure CLI:
Setup
1. Refer to the “Configuring high availability with the Db2 cluster manager utility (db2cm)” page of the IBM Documentation to deploy either the automated HADR solution or Mutual Failover configuration: Configuring high availability with the Db2 cluster manager utility (db2cm) - IBM Documentation
2. Create a service principal that the Azure fencing agent uses to authorize its access to the Azure API. Follow these steps to create the service principal.
  •     Go to https://portal.azure.com
  •     Open the “Azure Active Directory” on the right menu, or search for it in the search bar.
  •     Click “App registrations”.
  •     Click “New registration”.
  •     Enter a Name, for example “PCMK1”.
  •     Select "Accounts in this organization directory only".
  •     Select Application Type "Web", enter a sign-on URL (for example http://localhost) and click "Add".
  •     The sign-on URL is not used for the Pacemaker setup and can be any valid URL like http://localhost.  
  •     Select “Certificates and secrets”, then click “New client secret”.
  •     Enter a description for a new key, select an appropriate expiration date and click “Add”.
  •     Make a note of the Value. It is used as the password for the service principal.
  •     Select “Overview”. Make a note of the Application ID. It is used as the username (login ID in the steps below) of the service principal.
Create a custom role for the fence agent
The service principal does not have permissions to access your Azure resources by default. You need to give the service principal permissions to start and stop (power-off) all virtual machines of the cluster. If you did not already create the custom role, you can create it using the Azure CLI on one of the machines in the cluster.  
Use the following content for the input file. Adapt the content to your subscriptions: replace <subscriptionid1> and <subscriptionid2> with your subscription IDs. If you have only one subscription, remove the second entry.
1. Save the following contents to a file, such as fence_role.json
{
    "name":"Linux Fence Agent Role",
    "IsCustom": true,
    "description": "Azure ARM fencing role",
    "assignableScopes": [
        "/subscriptions/<subscriptionid1>/resourceGroups/HADR-Pacemaker-1",
        "/subscriptions/<subscriptionid2>/resourceGroups/HADR-Pacemaker-2"
    ],
    "permissions": [
        {
            "actions": [
                "Microsoft.Compute/*/read",
                "Microsoft.Compute/virtualMachines/powerOff/action",
                "Microsoft.Compute/virtualMachines/start/action"
            ],
            "notActions": [],
            "dataActions": [],
            "notDataActions": []
        }
    ]
}
2. Create the role using the Azure CLI
az role definition create --role-definition <path-to-file>/fence_role.json
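To avoid hand-editing the placeholders, the role file for a single subscription can be generated from a template. The following is a minimal sketch under stated assumptions: the function name write_fence_role is hypothetical, and the all-zero subscription ID and the resource group name are placeholders that you must replace with your own values.

```shell
#!/bin/sh
# Hypothetical helper: write the role definition for one subscription.
# $1 = subscription ID, $2 = resource group, $3 = output file
write_fence_role() {
    cat > "$3" <<EOF
{
    "name": "Linux Fence Agent Role",
    "IsCustom": true,
    "description": "Azure ARM fencing role",
    "assignableScopes": [
        "/subscriptions/$1/resourceGroups/$2"
    ],
    "permissions": [
        {
            "actions": [
                "Microsoft.Compute/*/read",
                "Microsoft.Compute/virtualMachines/powerOff/action",
                "Microsoft.Compute/virtualMachines/start/action"
            ],
            "notActions": [],
            "dataActions": [],
            "notDataActions": []
        }
    ]
}
EOF
}

# Placeholder values for illustration only; substitute your own.
write_fence_role "00000000-0000-0000-0000-000000000000" "HADR-Pacemaker-1" ./fence_role.json
# Review the file, then create the role with the command from step 2:
#   az role definition create --role-definition ./fence_role.json
```

If the cluster nodes are in two subscriptions, add the second scope to assignableScopes by hand, as in the file shown in step 1.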
3. Assign the custom role to the service principal from the Azure portal.
Assign the custom role "Linux Fence Agent Role" that was created in the previous step to the service principal on the virtual machines of both nodes in the cluster. The Owner role must no longer be used.
  •     Go to https://portal.azure.com
  •     Open “All resources” in the menu on the left.
  •     Select the virtual machine of the first cluster node
  •     Click “Access control (IAM)”
  •     Click “Add role assignment”
  •     Select the role "Linux Fence Agent Role"
  •     Enter the name of the application that you created above, for example “PCMK1”
  •     Click Save
Repeat the steps above for the second cluster node.
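The portal steps above are the documented path. As an alternative, the same assignment can usually be made with the Azure CLI command az role assignment create. The sketch below only prints one command per node so that it can be reviewed first. The function name print_role_assignment, the VM names, and the bracketed placeholders are assumptions for illustration, and the scope follows the standard Azure resource ID format for a virtual machine.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an az command that assigns the custom
# role to the service principal, scoped to one virtual machine.
# $1 = application ID, $2 = subscription ID, $3 = resource group, $4 = VM name
print_role_assignment() {
    scope="/subscriptions/$2/resourceGroups/$3/providers/Microsoft.Compute/virtualMachines/$4"
    printf 'az role assignment create --assignee "%s" --role "Linux Fence Agent Role" --scope "%s"\n' "$1" "$scope"
}

# Placeholder names for illustration only; one command per cluster node.
for vm in Host-1 Host-2; do
    print_role_assignment "<application-id>" "<subscriptionid1>" "HADR-Pacemaker-1" "$vm"
done
```

Scoping the assignment to each virtual machine mirrors the portal procedure, which assigns the role on the Access control (IAM) page of each node.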
4. Stop the Db2 instance on both hosts:
db2stop force
5. Log in as root, which is required for steps 5 through 9.
6. Stop the Pacemaker cluster on both hosts:
crm cluster stop
7. Edit the /etc/corosync/corosync.conf file on both hosts and increase the token timeout value from 10000 to 30000.
 
totem {
    version: 2
    cluster_name: pa2dom
    transport: knet
    token: 30000
    crypto_cipher: aes256
    crypto_hash: sha256
}
Note: For details about the timeout value for missed tokens, refer to the following documentation: Maintenance and updates - Azure virtual machines | Microsoft Docs
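Because the file must be edited on both hosts, it can help to confirm the value before restarting the cluster. The following is a minimal sketch, assuming the token line has the "token: <value>" form shown above; the function name corosync_token is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the totem token value (in milliseconds) from a
# corosync.conf-style file. Matches only the exact key "token:".
corosync_token() {
    awk '$1 == "token:" { print $2; exit }' "$1"
}

conf=/etc/corosync/corosync.conf
if [ -r "$conf" ]; then
    token=$(corosync_token "$conf")
    if [ "$token" = "30000" ]; then
        echo "token is 30000"
    else
        echo "token is '${token}', expected 30000" >&2
    fi
fi
```

Run the check on each host after editing the file and before step 8.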
8. Start the Pacemaker cluster on both hosts:
crm cluster start
9. Monitor the “crm status” output. Once both hosts report online, start the Db2 instance on both hosts.
10. For HADR automated databases, ensure the HADR_PEER_WINDOW is set to at least 300 seconds for all automated databases. Run the following for each primary database.
db2 update db cfg for <database name> using HADR_PEER_WINDOW 300
Then activate the standby and primary databases.
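When several databases are automated, the update command from step 10 has to be repeated for each one. The following minimal sketch prints one command per database name so that the list can be reviewed before the commands are run as the instance owner. The function name print_peer_window_cmds and the database names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the step 10 update command for each
# database name passed as an argument.
print_peer_window_cmds() {
    for db in "$@"; do
        printf 'db2 update db cfg for %s using HADR_PEER_WINDOW 300\n' "$db"
    done
}

# Placeholder database names for illustration only.
print_peer_window_cmds SAMPLE1 SAMPLE2
```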
11. Set the ‘DB2_AZURE_SP_LOGIN’ and ‘DB2_AZURE_SP_PASSWD’ environment variables. Note that these environment variables need only be set before you run the db2cm command to configure the fence agent.

export DB2_AZURE_SP_LOGIN=<Service Principal Application ID>
export DB2_AZURE_SP_PASSWD=<Service Principal Client Secret Value>
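A short pre-flight check can confirm that both variables are set in the current shell before you run db2cm. This is a minimal sketch; the function name check_sp_env is hypothetical, and it checks only that the values are non-empty, not that the credentials are valid.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if both variables are non-empty.
check_sp_env() {
    missing=0
    for name in DB2_AZURE_SP_LOGIN DB2_AZURE_SP_PASSWD; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_sp_env; then
    echo "service principal variables are set"
fi
```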
12. Configure the fencing agent in the cluster using the db2cm utility.
db2cm -create -azure -fence
13. Monitor the “crm status” output and verify that the fencing agent is started. The result should be similar to the following example.
* fence_db2_azure (stonith:fence_azure_arm): Started Host-1
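The check in step 13 can be scripted by searching the status output for the fencing resource line. The following is a minimal sketch, assuming the line has the form shown in the example above; the function name fence_started is hypothetical, and the exact layout of “crm status” output can differ between Pacemaker versions.

```shell
#!/bin/sh
# Hypothetical helper: read status output on stdin and succeed if a
# fence_azure_arm stonith resource is reported as Started.
fence_started() {
    grep -Eq 'stonith:fence_azure_arm\):[[:space:]]*Started'
}

# Run the check only where the crm command is available.
if command -v crm >/dev/null 2>&1; then
    if crm status | fence_started; then
        echo "fencing agent is started"
    else
        echo "fencing agent is not started" >&2
    fi
fi
```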
Remove fencing resources from the cluster
1. Remove the fence agent from the resource model using the db2cm utility.
db2cm -delete -azure -fence
2. Confirm that the fence agent has been removed from the configuration using the db2cm -list command.
3. If you permanently remove the fencing agent from the cluster, remove the custom role "Linux Fence Agent Role" from the Service Principal on both nodes in the cluster.
  •     Go to https://portal.azure.com
  •     Open “All resources” in the menu on the left.
  •     Select the virtual machine of the first cluster node and click “Access control (IAM)”
  •     Click “Role assignment” and remove the application used for fencing, for example “PCMK1”, from the "Linux Fence Agent Role"
Repeat for the second cluster node.
4. If there are no other clusters using the fencing role, it can also be deleted using the Azure CLI.
az role definition delete --name "Linux Fence Agent Role"

Document Location

Worldwide


Document Information

Modified date:
25 October 2022

UID

ibm16829813