Fencing agent configuration for z/VM management nodes
## Introduction
The z/VM® fencing agent is used for cluster management nodes that are z/VM virtual machines. This topic describes the detailed steps to configure the z/VM fencing agent.
When a multi-node cluster is available, log on to any z/VM management node and run `pcs stonith describe fence_zvmip` to get detailed information about the `fence_zvmip` fencing agent.
## Create user and set up SMAPI authorization
1. Create the user by 3270.
   Example:

       USER ZCLUSTER xxxxxxxx 32M 128M G
       INCLUDE IBMDFLT
       IPL CMS
       MDISK 0191 3390 AUTOV 10 T513T1 MR

   Note: Log on with this user ID by 3270 and change the expired password that is generated when the user ID is created, so that authorization validation can pass.
2. Add the user to the z/VM directory by 3270.

       DIRM ADD ZCLUSTER

3. Set the LPAR authorization policy.

       Authorization_Policy = Authorization_Policy_AuthlistOnly

4. Configure the file VSMWORK1 NAMELIST.
   Make sure the NAMELIST setting has the following priority:

       :nick.ZVM_FENCE
       :list.
       IMAGE_ACTIVATE
       IMAGE_DEACTIVATE
       IMAGE_STATUS_QUERY
       CHECK_AUTHENTICATION

5. Configure AUTHLIST.
   Configure the AUTHLIST according to the information in `pcs stonith describe fence_zvmip`.
   According to the detailed description of the `fence_zvmip` fencing agent, to use this agent the z/VM SMAPI service needs to be configured to allow the virtual machine running this agent to connect to it and issue the image_recycle operation. This involves updating the VSMWORK1 AUTHLIST VMSYS:VSMWORK1. file. The entry should look something similar to this (the three fields start in columns 1, 66, and 131; the spacing is shortened here):

       Column 1       Column 66      Column 131
       |              |              |
       V              V              V
       XXXXXXXX       ALL            IMAGE_CHARACTERISTICS

   Where XXXXXXXX is the name of the virtual machine used in the authuser field of the request. This virtual machine also has to be authorized to access the system's directory manager.
   User ID AUTHLIST configuration example:

       COLUMN 1       COLUMN 66      COLUMN 131
       |              |              |
       |              |              |
       |              |              |
       V              V              V
       DO.NOT.REMOVE  DO.NOT.REMOVE  DO.NOT.REMOVE
       ZCLUSTER       ALL            ZVM_FENCE

Notes: Steps 1 to 4 are performed outside of IBM® Cloud Infrastructure Center. The configuration takes effect on all virtual machines in the LPAR.
## Validate SMAPI authorization
1. Use SSH to log in to a virtual machine in the same LPAR as the cluster management node. The virtual machine must have `smcli` installed and `authlist` enabled.
2. Get the user ID and password that were set up in step 1 of Create user and set up SMAPI authorization.
3. Run `smcli Check_Authentication -u ZCLUSTER -p <password>` on this virtual machine to validate the authorization, and check the result.

Command result examples.
Authentication successful:

    [root@compute ~]# smcli Check_Authentication -u ZCLUSTER -p <password>
    Done

Password does not match with user ID:

    [root@compute ~]# smcli Check_Authentication -u ZCLUSTER -p <password>
    Validating userid/password pair ... Failed
    Return Code: 120
    Reason Code: 0
    Description: ULGSMC5120E Authentication error; userid or password not valid
    API issued : Check_Authentication

Authentication configuration failed:

    [root@compute ~]# smcli Check_Authentication -u ZCLUSTER -p <password>
    Validating userid/password pair ... Failed
    Return Code: 100
    Reason Code: 16
    Description: ULGSMC5100E Request not authorized by server
    API issued : Check_Authentication

Notes: An SMAPI authorization failure leaves the fencing agent in a stopped or failed status, and it can even lead to the user ID being revoked.
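The three result cases above can be told apart in a script by looking at the text that `smcli` prints. The following is a minimal sketch, not part of the product: the function name `classify_smapi_check` and its result labels are hypothetical, and it assumes the output contains the same `Done` and `Return Code:` strings as the examples in this section.

```shell
#!/bin/sh
# Classify captured `smcli Check_Authentication` output into the cases
# documented above. Function name and labels are illustrative only.
classify_smapi_check() {
    output=$1
    case $output in
        *"Return Code: 120"*) echo "bad-credentials" ;;  # user ID and password do not match
        *"Return Code: 100"*) echo "not-authorized" ;;   # SMAPI authorization setup problem
        *Done*)               echo "ok" ;;               # authentication successful
        *)                    echo "unknown" ;;          # anything not shown in the examples
    esac
}

# Sample text taken from the failure example above.
sample='Validating userid/password pair ... Failed Return Code: 120 Reason Code: 0'
classify_smapi_check "$sample"
```

On a real system the argument would be the captured command output. Because repeated authorization failures can lead to the user ID being revoked, a script built on this check should stop after the first failure instead of retrying in a loop.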
## Enable stonith monitor
Use SSH to log in to a management node in the cluster and enable stonith with the command **`pcs property set stonith-enabled=true`**. Then check the configuration with the command **`pcs property config`**.
If the result includes **`stonith-enabled: true`**, stonith is enabled.
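For scripted checks, the same test can be sketched as a small shell function. This is an illustration under assumptions: the function name `stonith_enabled` is hypothetical, and it expects the property to appear on its own line as `stonith-enabled: true`, as in the output of `pcs property config`.

```shell
#!/bin/sh
# Succeed (exit status 0) if captured `pcs property config` output
# contains a line reading "stonith-enabled: true".
stonith_enabled() {
    printf '%s\n' "$1" |
        grep -Eq '^[[:space:]]*stonith-enabled:[[:space:]]*true[[:space:]]*$'
}

# Shortened sample output, for illustration only.
sample='Cluster Properties:
 cluster-infrastructure: corosync
 stonith-enabled: true'

if stonith_enabled "$sample"; then
    echo "stonith is enabled"
else
    echo "stonith is NOT enabled"
fi
```

On a management node, the function could be called as `stonith_enabled "$(pcs property config)"`.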
Example:
```
[root@management ~]# pcs property config
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: hatest7
 cluster-recheck-interval: 30min
 dc-version: 2.1.2-4.el8-ada5c3b36e2
 have-watchdog: false
 no-quorum-policy: ignore
 placement-strategy: balanced
 start-failure-is-fatal: false
 stonith-enabled: true
```
## Create Resources
Create the `fence_zvmip` fencing agent with the command:
`pcs stonith create zvm-smapi1 fence_zvmip ip=zvm-smapi1.example.com username="ZCLUSTER" password="<password>" pcmk_host_map="node1:RHELHA-1;node2:RHELHA-2;node3:RHELHA-3"`
**Notes**:
1. In `pcmk_host_map="<NODENAME>:<IMAGE>[;<NODENAME>:<IMAGE>...]"`, **`NODENAME`** refers to an entry in the **`Node List`** item in the result of the command **`pcs cluster status`**.
Example:
The nodes are **`node1-ip-address node2-ip-address node3-ip-address`**.
```
[root@management ~]# pcs cluster status
Cluster Status:
Cluster Summary:
* Stack: corosync
* Current DC: node1-ip-address (version 2.1.2-4.el8_6.3-ada5c3b36e2) - partition with quorum
* Last updated: Wed Jun 7 03:04:06 2023
* Last change: Fri Jun 2 03:58:34 2023 by hacluster via crmd on node1-ip-address
* 3 nodes configured
* 105 resource instances configured (9 DISABLED)
Node List:
* Online: [ node1-ip-address node2-ip-address node3-ip-address ]
PCSD Status:
node2-ip-address: Online
node1-ip-address: Online
node3-ip-address: Online
```
2. In `pcmk_host_map="<NODENAME>:<IMAGE>[;<NODENAME>:<IMAGE>...]"`, **`IMAGE`** is the user ID of a management node in the cluster. Use SSH to log in to the management node and get the user ID with **`vmcp q userid`**.
Example:
```
[root@management ~]# vmcp q userid
IAAS06A0 AT BOEM5403
```
IAAS06A0 is the user ID of the management node.
3. **`pcmk_reboot_action and pcmk_off_action`**
Each of these two parameters can be set to **`reboot`** or **`off`**. The **`reboot`** value restarts the OS when the operation is triggered, and the **`off`** value shuts down the OS when the operation is triggered.
The default value of **`pcmk_reboot_action`** is **`reboot`**, and the default value of **`pcmk_off_action`** is **`off`**.
To change a default value, specify **`pcmk_reboot_action=off`** or **`pcmk_off_action=reboot`** when the fencing agent is created or updated.
4. Fencing agents must be created according to where the management nodes are located. A single fencing agent for the cluster is sufficient when all management nodes are located in the same LPAR, but if the management nodes are located in different LPARs, multiple fencing agents are needed.
The [Create user and set up SMAPI authorization](#introduction__create-user-and-set-up-smapi-authorization) step needs to be performed on all LPARs.
Example:
```
Node List:
* Online: [ node1-ip-address node2-ip-address node3-ip-address ]
```
The query result from step 2 for node 'node1-ip-address' is **`IAAS06A0 AT BOEM5403`**.
The query result from step 2 for node 'node2-ip-address' is **`IAAS0609 AT BOEM5404`**.
The query result from step 2 for node 'node3-ip-address' is **`IAAS0610 AT BOEM5404`**.
The 3 management nodes are located in 2 different LPARs, **`BOEM5403`** and **`BOEM5404`**, so you need to create 2 fencing agents to accomplish stonith:
`pcs stonith create zvm-smapi-BOEM5403 fence_zvmip ip=zvm-smapi1.BOEM5403.com username="ZCLUSTER" password="<password>" pcmk_host_map="node1-ip-address:IAAS06A0"`
and
`pcs stonith create zvm-smapi-BOEM5404 fence_zvmip ip=zvm-smapi1.BOEM5404.com username="ZCLUSTER" password="<password>" pcmk_host_map="node2-ip-address:IAAS0609;node3-ip-address:IAAS0610"`
**Note**:
See the output of the command **`pcs stonith -h`** for detailed fencing agent usage.
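The grouping described in notes 2 and 4 can be sketched in shell. The sketch below is an illustration, not a product tool: the function name `host_maps_by_lpar` and the input layout (one line per node, made of the node name followed by that node's `vmcp q userid` result) are assumptions chosen for this example.

```shell
#!/bin/sh
# Input (stdin):  one line per node, "<NODENAME> <USERID> AT <LPAR>".
# Output:         one line per LPAR, "<LPAR> <pcmk_host_map value>",
#                 in the order in which each LPAR first appears.
host_maps_by_lpar() {
    awk '
        NF == 4 && $3 == "AT" {
            lpar = $4
            if (!(lpar in map)) {
                order[++n] = lpar              # remember first-seen order
                map[lpar] = $1 ":" $2
            } else {
                map[lpar] = map[lpar] ";" $1 ":" $2
            }
        }
        END { for (i = 1; i <= n; i++) print order[i], map[order[i]] }
    '
}

# Values taken from the example in note 4.
host_maps_by_lpar <<'EOF'
node1-ip-address IAAS06A0 AT BOEM5403
node2-ip-address IAAS0609 AT BOEM5404
node3-ip-address IAAS0610 AT BOEM5404
EOF
```

Each output line corresponds to one fencing agent: the first field is the LPAR and the second is the value to pass as `pcmk_host_map` in that LPAR's `pcs stonith create` command. The SMAPI host name for each LPAR still has to be supplied separately.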
## Update Resources
Update the fencing agent configuration with **`pcs stonith update`**. For detailed information, see **`pcs stonith update -h`**.
Example:
`pcs stonith update fencing-agent pcmk_host_map="node1:RHELHA-1;node2:RHELHA-2;node3:RHELHA-3"`
## Show fencing agent configuration
Show the fencing agent configuration with **`pcs stonith config`**.
Example:
[root@management ~]# pcs stonith config
Resource: wngzhetest (class=stonith type=fence_zvmip)
Attributes: inet4_only=1 ip=zvm-smapi1.BOEM5403.com password="<password>" pcmk_host_map=node1-ip-address:IAAS06A0;node2-ip-address:IAAS0609;node3-ip-address:IAAS0610 username=ZCLUSTER
Operations: monitor interval=60s (wngzhetest-monitor-interval-60s)
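When checking a configuration like the one above, it can help to look up which z/VM user ID a given node maps to. The following sketch assumes the `pcmk_host_map` value uses only the `NODENAME:IMAGE` pairs separated by `;` shown in this topic; the function name `image_for_node` is hypothetical.

```shell
#!/bin/sh
# Print the z/VM user ID (IMAGE) mapped to a node name in a
# pcmk_host_map value. Exit status is non-zero if the node is not mapped.
image_for_node() {
    node=$1
    host_map=$2
    printf '%s\n' "$host_map" | tr ';' '\n' |
        awk -F: -v n="$node" '$1 == n { print $2; found = 1 } END { exit !found }'
}

# Value copied from the configuration example above.
host_map='node1-ip-address:IAAS06A0;node2-ip-address:IAAS0609;node3-ip-address:IAAS0610'
image_for_node node2-ip-address "$host_map"
```

The printed user ID can then be compared with the result of `vmcp q userid` on that node to confirm that the mapping is correct.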
## Manually fence node in cluster
The manual fence command **`pcs stonith fence node3-ip-address`** triggers **`pcmk_reboot_action`** for node **`node3-ip-address`** in the cluster. The manual fence command **`pcs stonith fence node3-ip-address --off`** triggers **`pcmk_off_action`** for node **`node3-ip-address`** in the cluster.
The **`pcmk_reboot_action`** and **`pcmk_off_action`** values are set when the fencing agent is created or updated.
Example:
`pcs stonith create wngzhetest fence_zvmip inet4_only=1 ip=zvm-smapi1.BOEM5403.com username="ZCLUSTER" password="<password>" pcmk_host_map="node1-ip-address:IAAS06A0;node2-ip-address:IAAS0609;node3-ip-address:IAAS0610" pcmk_reboot_action=off`
Or
`pcs stonith update wngzhetest pcmk_off_action=reboot`
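The relationship between the manual fence command, the two parameters, and their defaults can be summarized as a small lookup. This is only a sketch of the rules stated in this topic; the function name `effective_action` and the request labels `fence` and `fence-off` are hypothetical.

```shell
#!/bin/sh
# Print the action that a manual fence request results in.
#   $1: "fence"     for `pcs stonith fence <node>`
#       "fence-off" for `pcs stonith fence <node> --off`
#   $2: configured pcmk_reboot_action (empty means the default, reboot)
#   $3: configured pcmk_off_action    (empty means the default, off)
effective_action() {
    case $1 in
        fence)     echo "${2:-reboot}" ;;
        fence-off) echo "${3:-off}" ;;
        *)         echo "unknown request: $1" >&2; return 1 ;;
    esac
}

effective_action fence "" ""            # defaults only
effective_action fence off ""           # after pcmk_reboot_action=off
effective_action fence-off "" reboot    # after pcmk_off_action=reboot
```

With the first example command above (`pcmk_reboot_action=off`), a plain `pcs stonith fence` request therefore shuts the node down instead of restarting it.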
## Automate fence node in cluster
With the following configuration, the fencing agent automatically fences nodes in the cluster when they go offline, which avoids split-brain problems.
First refer to [Enable stonith monitor](#introduction__enable-stonith-monitor) to enable stonith monitor. Then refer to [Create Resources](#introduction__create-resources) to create fencing agent.
Finally, refer to [Check fencing agent status](#introduction__check-fencing-agent-status) to check that the fencing agent status is started.
Once started, the fencing agent monitors the management nodes in the cluster and performs the **`pcmk_reboot_action`** action when it finds a node offline.
For the **`pcmk_reboot_action`** action types, refer to step 3, **`pcmk_reboot_action and pcmk_off_action`**, in [Create Resources](#introduction__create-resources).
## Enable/disable fencing agent
Stop the fencing agent with the command **`pcs stonith disable fencing-agent-id`**.
Start the fencing agent with the command **`pcs stonith enable fencing-agent-id`**.
## Check fencing agent status
Show the fencing agent status with the command **`pcs stonith status`**.
Example:
Fencing agent is stopped.
```
[root@management ~]# pcs stonith status
* wngzhetest (stonith:fence_zvmip): Stopped (disabled)
```
Fencing agent is started.
```
[root@management ~]# pcs stonith status
* wngzhetest (stonith:fence_zvmip): Started node2-ip-address
```
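To use the status in a script, the state word can be extracted from the output. The sketch below assumes the one-line-per-agent layout shown in the two examples above; the function name `agent_state` is hypothetical.

```shell
#!/bin/sh
# Print the state (for example "Started" or "Stopped") of one fencing
# agent from captured `pcs stonith status` output.
# Exit status is non-zero if the agent is not listed.
agent_state() {
    agent=$1
    printf '%s\n' "$2" |
        awk -v a="$agent" '$1 == "*" && $2 == a { print $4; found = 1 } END { exit !found }'
}

# Sample line copied from the stopped example above.
sample='  * wngzhetest (stonith:fence_zvmip): Stopped (disabled)'
agent_state wngzhetest "$sample"
```

On a management node, the second argument would be `"$(pcs stonith status)"`.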
## Delete fencing agent
Delete the fencing agent with **`pcs stonith delete fencing-agent-id`**. For detailed information, see **`pcs stonith delete -h`**.