How To
Summary
This document describes an alternative Db2 HADR configuration with Pacemaker that does not use a third lightweight host as a quorum device (QDevice) arbitrator. It discusses the pros and cons of a fencing-only setup so that you can choose between the two configurations based on the tradeoff between cost and recovery time.
Objective
The objective of this document is to detail an alternative to the best-practice Pacemaker solution of a two-node HADR cluster with a quorum device, which is described in the IBM Documentation topic “Quorum devices support on Pacemaker”.
On Microsoft Azure, you do not necessarily need to configure a quorum device on a third host. Instead, you can configure fencing as described in this document.
The advantage of configuring a two-node HADR Pacemaker cluster with fencing is that it removes the requirement of a third host for the quorum device, which reduces ongoing cost.
The disadvantage is a longer recovery time from a primary host failure, because of the added time it takes to successfully fence the failed host from the cluster. In our internal tests in a controlled environment, recovery from a primary host failure took up to 6 times longer with fencing than with a quorum device host. To compensate for this, the HADR_PEER_WINDOW value of all databases must be set to at least 300 seconds.
Base the choice of configuration on your specific business requirements, taking both recovery time and cost of implementation into account. Fencing on Microsoft Azure is done with the fence_azure_arm agent, a fencing agent for Azure Resource Manager. It uses the Azure SDK for Python to connect to Azure.
Db2 does not include the fencing agent for Azure in the Db2 installation image. You must download the Azure fencing agent package from the following website:
To install the fencing agent, perform the following steps:
1. Download the latest version of the Azure fencing agent, for example Db2_Azure_fence_agent_4.7.1-3_noarch.tar.gz, from the website above.
2. Unpack the archive by using the following command:
tar -zxf Db2_Azure_fence_agent_4.7.1-3_noarch.tar.gz
This command creates the directory Db2_Azure_fence_agent_4.7.1-3_noarch.
3. Install the RPM packages.
Change to the directory created above, then to the subdirectory named for your operating system identifier, and issue the matching command:
For SLES: zypper install --allow-unsigned-rpm *.rpm
For RHEL: dnf install *.rpm
Note:
The fencing agents must be installed on both nodes in the cluster.
After the fencing agent is installed, follow the steps below to configure a two-host HADR Pacemaker cluster with fencing on Microsoft Azure.
Environment
Steps
1. Refer to the “Configuring a clustered environment using the Db2 cluster manager (db2cm) utility” page of the IBM Documentation to deploy the automated HADR solution.
2. Create an Azure fence agent STONITH device that uses a service principal to authorize against Microsoft Azure. Follow these steps to create the service principal.
- Go to https://portal.azure.com
- Open “Azure Active Directory” in the menu on the right.
- Go to Properties and make a note of the Directory ID. This is the tenant ID.
- Click “App registrations”
- Click “New registration”
- Enter a Name, for example “PCMK1”
- Select "Accounts in this organization directory only"
- Select Application Type "Web", enter a sign-on URL (for example http://localhost) and click Add. The sign-on URL is not used for the Pacemaker setup and can be any valid URL.
- Select “Certificates and secrets”, then click “New client secret”
- Enter a description for the new key, select "Never expires" and click “Add”
- Make a note of the Value. It is used as the password for the service principal.
- Select “Overview”. Make a note of the Application ID. It is used as the username (login ID in the steps below) of the service principal.
For details, refer to the following tutorial: https://docs.microsoft.com/en-us/azure/active-directory-b2c/tutorial-register-applications?tabs=app-reg-ga
Create a custom role for the fence agent
The service principal does not have permissions to access your Azure resources by default. You need to give the service principal permissions to start and stop (power-off) all virtual machines of the cluster. If you did not already create the custom role, you can create it using PowerShell or Azure CLI on one of the machines in the cluster.
Use the following content for the input file. Adapt the content to your subscriptions, that is, replace 12345678-9abc-def1-2345-6789abcdef12 and 87654321-cba9-1fed-5432-21fedcba9876 with the IDs of your subscriptions. If you only have one subscription, remove the second entry in “assignableScopes”.
{
    "properties": {
        "roleName": "Linux Fence Agent Role",
        "description": "Allows to power-off and start virtual machines",
        "assignableScopes": [
            "/subscriptions/12345678-9abc-def1-2345-6789abcdef12",
            "/subscriptions/87654321-cba9-1fed-5432-21fedcba9876"
        ],
        "permissions": [
            {
                "actions": [
                    "Microsoft.Compute/*/read",
                    "Microsoft.Compute/virtualMachines/powerOff/action",
                    "Microsoft.Compute/virtualMachines/start/action"
                ],
                "notActions": [],
                "dataActions": [],
                "notDataActions": []
            }
        ]
    }
}
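Because every entry in “assignableScopes” must carry a real subscription ID, a quick format check on the input file can catch a mistyped ID before the role is created. The following is a minimal sketch, not part of the documented procedure: the function name check_role_scopes and the file name in the usage note are hypothetical, and the check only verifies the 8-4-4-4-12 hexadecimal shape of each ID, not that the subscription exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every "/subscriptions/<id>" scope in a role
# definition file whose <id> is not shaped like a GUID (8-4-4-4-12 hex digits).
# Returns 0 when all scopes look well formed, 1 when at least one does not,
# and 2 when the file cannot be read.
check_role_scopes() {
    local role_file="$1"
    local guid='[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'
    local bad

    if [ ! -r "$role_file" ]; then
        echo "cannot read role definition file: $role_file" >&2
        return 2
    fi

    # Extract all quoted scope strings, then keep only those that do NOT
    # consist of exactly "/subscriptions/" followed by a GUID.
    bad=$(grep -oE '"/subscriptions/[^"]*"' "$role_file" \
          | grep -vE "^\"/subscriptions/${guid}\"\$" || true)

    if [ -n "$bad" ]; then
        printf '%s\n' "$bad" | sed 's/^/malformed scope: /' >&2
        return 1
    fi
    return 0
}
```

For example, `check_role_scopes linux-fence-agent-role.json` (file name assumed) prints any malformed scope to standard error and returns a non-zero status, so it can gate the role creation step in a script.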
Assign the custom role to the service principal
Assign the custom role "Linux Fence Agent Role" that was created in the last section to the service principal on both nodes in the cluster. Do not use the Owner role any longer.
- Go to https://portal.azure.com
- Open “All resources” in the menu on the left.
- Select the virtual machine of the first cluster node
- Click “Access control (IAM)”
- Click “Add role assignment”
- Select the role "Linux Fence Agent Role"
- Enter the name of the application that you created above, for example “PCMK1”
- Click Save
Repeat the steps above for the second cluster node.
3. Stop the Db2 instance on both hosts:
db2stop force
4. Log in as root, which is required for steps 5 through 9.
5. Stop the Pacemaker cluster on both hosts:
crm cluster stop
6. Edit the /etc/corosync/corosync.conf file on both hosts so that the quorum and totem sections match the following example:
quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 0
}
totem {
    version: 2
    cluster_name: pa2dom
    transport: knet
    token: 30000
    crypto_cipher: aes256
    crypto_hash: sha256
}
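Since the file must be edited identically on both hosts, a small check run on each host can confirm that the quorum section carries the two settings shown above before the cluster is restarted. This is a sketch under assumptions: the function name check_two_node_quorum is hypothetical, and the parser only handles the simple one-setting-per-line layout used in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the quorum section of a corosync
# configuration file contains both "two_node: 1" and "wait_for_all: 0".
# The file defaults to /etc/corosync/corosync.conf.
check_two_node_quorum() {
    local conf="${1:-/etc/corosync/corosync.conf}"
    awk '
        # Enter the quorum section on its opening line.
        /^[ \t]*quorum[ \t]*[{]/ { in_q = 1; next }
        # Leave the section at the first closing brace.
        in_q && /[}]/            { in_q = 0 }
        in_q && /two_node:[ \t]*1[ \t]*$/     { two = 1 }
        in_q && /wait_for_all:[ \t]*0[ \t]*$/ { wfa = 1 }
        # Exit status 0 only when both settings were seen.
        END { exit !(two && wfa) }
    ' "$conf"
}
```

Running `check_two_node_quorum && echo "quorum settings OK"` on each host gives a quick yes-or-no answer without comparing the files by eye.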
7. Start the Pacemaker cluster on both hosts:
crm cluster start
8. Monitor the “crm status” output. Once both hosts report online, start the Db2 instance on both hosts and re-activate all HADR databases.
9. Enable the stonith-enabled property:
crm configure property stonith-enabled=true
Note: The following error is shown and indicates a configuration mismatch. You can ignore this error because the fencing agent is created in a later step.
ERROR: (unpack_resources) error: Resource start-up disabled since no STONITH resources have been defined
(unpack_resources) error: Either configure some or disable STONITH with the stonith-enabled option
(unpack_resources) error: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
10. Set no-quorum-policy to stop.
crm configure property no-quorum-policy=stop
11. Configure the fencing agent in the cluster by running the following two commands:
Note 1: Replace the values for subscriptionId, resourceGroup, tenantId, Login, and passwd with the appropriate values from your Azure account. The same applies to the azr-rd01 and azr-rd02 values above, which must be replaced with your actual hostnames.
Note 2: The “pcmk_host_map” option is required in the command ONLY if the hostnames and the Azure VM names are NOT identical. Specify the mapping in the format “hostname:vm-name”. Refer to the bold section in the command.
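When the hostnames and Azure VM names differ, the “pcmk_host_map” value has to list one “hostname:vm-name” pair per node. The sketch below assembles such a value and rejects entries that do not follow that format. It is an illustration only: the function name build_pcmk_host_map and the VM names in the usage note are hypothetical, and joining multiple pairs with a semicolon follows the general Pacemaker convention for this option rather than anything stated in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join "hostname:vm-name" pairs into a single
# pcmk_host_map value, separated by semicolons (assumed Pacemaker convention).
# Returns 1 if any argument is not of the form hostname:vm-name.
build_pcmk_host_map() {
    local map="" pair
    for pair in "$@"; do
        case "$pair" in
            ?*:?*) ;;  # at least one character on each side of the colon
            *)
                echo "invalid mapping (expected hostname:vm-name): $pair" >&2
                return 1
                ;;
        esac
        # Prepend the separator only when the map already has an entry.
        map="${map:+${map};}${pair}"
    done
    printf '%s\n' "$map"
}
```

For instance, `build_pcmk_host_map azr-rd01:vm-a azr-rd02:vm-b` (VM names assumed) prints `azr-rd01:vm-a;azr-rd02:vm-b`, which can then be passed as the option value.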
12. Monitor the “crm status” output and verify that the fencing agent is started. The result should be similar to the following example.
* rsc_st_azure (stonith:fence_azure_arm): Started azr-rd02
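The status check in this step can also be scripted. The following sketch reads status text on standard input and succeeds only when the fencing resource is reported as Started. The function name fence_agent_started is hypothetical, and the pattern assumes the output layout of the example line above, which may differ between Pacemaker versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "crm status" output on stdin and succeed only when
# the given resource (default rsc_st_azure) of type stonith:fence_azure_arm
# is reported as Started. Assumes the line layout shown in the example above.
fence_agent_started() {
    local rsc="${1:-rsc_st_azure}"
    grep -Eq "^[[:space:]]*\*?[[:space:]]*${rsc}[[:space:]].*\(stonith:fence_azure_arm\):[[:space:]]+Started"
}
```

A possible use is `crm status | fence_agent_started rsc_st_azure || echo "fencing agent not started"`.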
13. Restart your Db2 instance by issuing the db2start command on both hosts. Furthermore, ensure that the HADR_PEER_WINDOW for all your automated HADR databases is set to at least 300 seconds.
14. Re-activate the HADR databases to re-enable automation for your HADR databases.
After this, the cluster is enabled to use the Azure fencing agent and we recommend performing a series of tests to validate that the setup works as planned.
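One item worth including in such a validation is the HADR_PEER_WINDOW requirement of at least 300 seconds. The sketch below checks the value in database configuration output read from standard input. It assumes the output of `db2 get db cfg` contains a line with the token `(HADR_PEER_WINDOW)` followed by `= <value>`; the function name peer_window_ok and the database name SAMPLE in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read database configuration output on stdin and succeed
# only when HADR_PEER_WINDOW is present and at least the given minimum
# (default 300 seconds). Assumes a line of the form
#   ... (HADR_PEER_WINDOW) = <value>
peer_window_ok() {
    local min="${1:-300}"
    awk -v min="$min" '
        # Remember the last field of the matching line as the configured value.
        /\(HADR_PEER_WINDOW\)/ { found = 1; value = $NF }
        # Exit status 0 only when the parameter was found and is large enough.
        END { exit !(found && value + 0 >= min + 0) }
    '
}
```

Run as the instance owner, for example `db2 get db cfg for SAMPLE | peer_window_ok 300`. If the check fails, the value can be raised with `db2 update db cfg for SAMPLE using HADR_PEER_WINDOW 300`.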
Managing, Starting and Stopping fencing resources in the cluster
With the db2cm utility, you can manage the cluster and enable or disable it. However, the db2cm utility only manages resources that were created by db2cm. Because the Azure fencing agent is not created using db2cm, you must manage the fencing agent manually.
Therefore, before you use db2cm to disable the cluster, you must first set the Azure fencing agent to the unmanaged state. To do so, unmanage the resource and check its status using the following commands:
crm resource unmanage rsc_st_azure
crm resource status
As a result, you will see the status for the fencing agent similar to the following:
* rsc_st_azure (stonith:fence_azure_arm): Started azr-rd02 (unmanaged)
After the fencing agent is in status “unmanaged”, you can use the db2cm utility to disable the cluster:
/db2/db2pa2/sqllib/bin/db2cm -disable -all
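Because the order matters here, the three commands can be wrapped so that the cluster is only disabled once the fencing agent is actually reported as unmanaged. This is a sketch, not a supported tool: the function name disable_cluster_with_fencing is hypothetical, and the db2cm path defaults to the example path used above and will differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: unmanage the fencing resource, confirm that the status
# output shows it as "(unmanaged)", and only then disable the cluster.
#   $1  fencing resource name (default: rsc_st_azure)
#   $2  db2cm executable (default: example path from this document)
disable_cluster_with_fencing() {
    local rsc="${1:-rsc_st_azure}"
    local db2cm="${2:-/db2/db2pa2/sqllib/bin/db2cm}"

    crm resource unmanage "$rsc" || return 1

    # Stop here if the resource is not shown as unmanaged.
    if ! crm resource status | grep -F "$rsc" | grep -qF "(unmanaged)"; then
        echo "$rsc is not reported as unmanaged; cluster left enabled" >&2
        return 1
    fi

    "$db2cm" -disable -all
}
```

Called as root with no arguments, the function uses rsc_st_azure and the example db2cm path; pass both explicitly if your resource name or instance path differs.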
The same is true when you enable a cluster with the db2cm utility: the Azure fencing agent might need to be enabled separately. To do so, check the status of the resource after the cluster has been started with “db2cm -enable -all”, then set the resource to managed and start it if required, using the following commands:
crm resource status rsc_st_azure
crm resource manage rsc_st_azure
crm resource start rsc_st_azure
Remove fencing resources from the cluster
With the db2cm utility, you can add and remove resources in the cluster or remove the cluster completely. However, the db2cm utility only removes resources that were created by db2cm. Because the Azure fencing agent is not created using db2cm, you must remove the fencing agent manually, both if you want to operate the cluster without the Azure fencing agent and before you delete the cluster entirely.
To do so, run the following commands:
crm configure property stonith-enabled=false
crm configure delete rsc_st_azure --force
crm resource refresh
If you want to delete the entire cluster, you can then use db2cm -delete -cluster.
If you permanently remove the fencing agent from the cluster, also remove the custom role "Linux Fence Agent Role" from the service principal on both nodes in the cluster.
- Go to https://portal.azure.com
- Open “All resources” in the menu on the left.
- Select the virtual machine of the first cluster node and click “Access control (IAM)”
- Click “Role assignment” and remove the fencing application, for example “PCMK1”, from the "Linux Fence Agent Role"
Repeat the steps above for the second cluster node.
Document Location
Worldwide
Document Information
Modified date:
10 November 2022
UID
ibm16465977