Fencing agent configuration for z/VM management nodes

Introduction

The z/VM® fencing agent is used for cluster management nodes that are z/VM virtual machines. This topic describes how to configure the z/VM fencing agent. When a multi-node cluster is available, log on to any z/VM management node and run pcs stonith describe fence_zvmip to get detailed information about the fence_zvmip fencing agent.

Create user and set up SMAPI authorization

  1. Create the user from a 3270 session.

    Example:

    USER ZCLUSTER xxxxxxxx 32M 128M G
    INCLUDE IBMDFLT
    IPL CMS
    MDISK 0191 3390 AUTOV 10 T513T1 MR
    

    Note: Log on with the new user ID from a 3270 session and change the expired password that was generated when the user ID was created. Otherwise, authorization validation does not pass.

  2. Add the user to the z/VM directory from a 3270 session.

    DIRM ADD ZCLUSTER

  3. Set LPAR Authorization Policy.

    Authorization_Policy = Authorization_Policy_AuthlistOnly

  4. Configure file VSMWORK1 NAMELIST.

    Make sure that the NAMELIST file contains the following entry:

    :nick.ZVM_FENCE
    :list.
    IMAGE_ACTIVATE
    IMAGE_DEACTIVATE
    IMAGE_STATUS_QUERY
    CHECK_AUTHENTICATION
    
  5. Configure AUTHLIST.

    Configure the AUTHLIST according to the output of pcs stonith describe fence_zvmip.

    According to the description of the `fence_zvmip` fencing agent, to use this agent the z/VM SMAPI service needs to be configured to allow the virtual machine running this agent to connect to it and issue
    the image_recycle operation. This involves updating the VSMWORK1 AUTHLIST VMSYS:VSMWORK1. file. The entry should look similar to
    this:
    
    Column 1                   Column 66                Column 131
       |                          |                        |
       V                          V                        V
    XXXXXXXX                     ALL                    IMAGE_CHARACTERISTICS
    
    Where XXXXXXXX is the name of the virtual machine used in the authuser field of the request. This virtual machine must also be authorized
    to access the system's directory manager.
    

    User ID AUTHLIST configuration example

    COLUMN 1              COLUMN 66           COLUMN 131
      |                      |                   |
      |                      |                   |
      |                      |                   |
      V                      V                   V
    DO.NOT.REMOVE        DO.NOT.REMOVE         DO.NOT.REMOVE
    ZCLUSTER                ALL                  ZVM_FENCE
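
The column positions in the AUTHLIST example above are easy to get wrong by hand. The following is a minimal sketch, assuming only the layout shown above (user ID at column 1, target at column 66, function or NAMELIST nickname at column 131). The helper name `authlist_line` is hypothetical, and the function only prints a padded line for comparison; the AUTHLIST file itself still has to be edited on z/VM.

```shell
# Print one AUTHLIST record with the fields at columns 1, 66, and 131.
# $1 = requesting user ID, $2 = target (for example ALL),
# $3 = function name or NAMELIST nickname (for example ZVM_FENCE)
authlist_line() {
  # Two fields padded to 65 characters each place the third field at column 131.
  printf '%-65s%-65s%s\n' "$1" "$2" "$3"
}

authlist_line ZCLUSTER ALL ZVM_FENCE
```

Piping the result through `cut -c66-68` or `cut -c131-` is a quick way to confirm that each field starts where expected.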
    

Notes: Steps 1 to 4 are performed outside of IBM® Cloud Infrastructure Center. The configuration takes effect on all virtual machines in the LPAR.

Validate SMAPI authorization

  1. Use SSH to log in to a virtual machine in the same LPAR as the cluster management node. This virtual machine must have smcli installed and the AUTHLIST enabled.

  2. Get the user ID and password that were set up in step 1 of Create user and set up SMAPI authorization.

  3. Run `smcli Check_Authentication -u ZCLUSTER -p <password>` on this virtual machine to validate the authorization, and check the result.

    Command result examples.

    Authentication successful

    [root@compute ~]# smcli Check_Authentication -u ZCLUSTER -p <password>
    Done
    

    Password does not match with user ID

     [root@compute ~]# smcli Check_Authentication -u ZCLUSTER -p <password>
     Validating userid/password pair ...
     Failed
     Return Code: 120
     Reason Code: 0
     Description: ULGSMC5120E Authentication error; userid or password not valid
     API issued : Check_Authentication
    

    Authentication configuration failed

    [root@compute ~]# smcli Check_Authentication -u ZCLUSTER -p <password>
    Validating userid/password pair ...
    Failed
    Return Code: 100
    Reason Code: 16
    Description: ULGSMC5100E Request not authorized by server
    API issued : Check_Authentication
    

    Note: A SMAPI authorization failure causes the fencing agent to enter the stopped or failed status, and can even lead to the user ID being revoked.
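
The three results above can be told apart by the return code and reason code in the command output. The following is a minimal sketch that classifies captured `smcli Check_Authentication` output read from standard input. It recognizes only the three outcomes shown in this topic; the function name `classify_smcli_output` and the result strings are hypothetical, and other return codes are reported as unknown.

```shell
# Classify the text printed by "smcli Check_Authentication" (read from stdin).
# Only the outcomes documented above are recognized.
classify_smcli_output() (
  out=$(cat)
  # Extract the numeric return code and reason code, if present.
  rc=$(printf '%s\n' "$out" | sed -n 's/^[[:space:]]*Return Code:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  rs=$(printf '%s\n' "$out" | sed -n 's/^[[:space:]]*Reason Code:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  if [ -z "$rc" ] && printf '%s\n' "$out" | grep -q 'Done'; then
    echo "authenticated"
  elif [ "$rc" = "120" ]; then
    echo "invalid user ID or password"
  elif [ "$rc" = "100" ] && [ "$rs" = "16" ]; then
    echo "not authorized by SMAPI (check AUTHLIST)"
  else
    echo "unknown"
  fi
)
```

To use it, pipe the output of the validation command from step 3 into `classify_smcli_output`.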

Enable stonith monitor

Use SSH to log in to a management node in the cluster and enable stonith by running **`pcs property set stonith-enabled=true`**. Then check the configuration by running **`pcs property config`**.
If the result includes **`stonith-enabled: true`**, stonith is enabled.

Example:
```
[root@management ~]# pcs property config
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: hatest7
 cluster-recheck-interval: 30min
 dc-version: 2.1.2-4.el8-ada5c3b36e2
 have-watchdog: false
 no-quorum-policy: ignore
 placement-strategy: balanced
 start-failure-is-fatal: false
 stonith-enabled: true
```

## Create Resources
  Create the fence_zvmip fencing agent by running the following command:
  `pcs stonith create zvm-smapi1 fence_zvmip ip=zvm-smapi1.example.com username="ZCLUSTER" password="<password>" pcmk_host_map="node1:RHELHA-1;node2:RHELHA-2;node3:RHELHA-3"`

**Notes**:
1. In `pcmk_host_map="<NODENAME>:<IMAGE>[;<NODENAME>:<IMAGE>...]"`, **`NODENAME`** is a node name from the **`Node List`** item in the output of **`pcs cluster status`**.

  Example:
  The nodes are **`node1-ip-address node2-ip-address node3-ip-address`**.

  ```
  [root@management ~]#  pcs cluster status
  Cluster Status:
   Cluster Summary:
     * Stack: corosync
     * Current DC: node1-ip-address (version 2.1.2-4.el8_6.3-ada5c3b36e2) - partition with quorum
     * Last updated: Wed Jun  7 03:04:06 2023
     * Last change:  Fri Jun  2 03:58:34 2023 by hacluster via crmd on node1-ip-address
     * 3 nodes configured
     * 105 resource instances configured (9 DISABLED)
  Node List:
    * Online: [ node1-ip-address node2-ip-address node3-ip-address ]
  PCSD Status:
    node2-ip-address: Online
    node1-ip-address: Online
    node3-ip-address: Online
  ```

2. In `pcmk_host_map="<NODENAME>:<IMAGE>[;<NODENAME>:<IMAGE>...]"`, **`IMAGE`** is the z/VM user ID of a management node in the cluster. To get the user ID, use SSH to log in to the management node and run **`vmcp q userid`**.

  Example: 
  ```
  [root@management ~]# vmcp q userid
  IAAS06A0 AT BOEM5403
  ```
  IAAS06A0 is the user ID of the management node.

3. **`pcmk_reboot_action`** and **`pcmk_off_action`**
  Each of these two parameters can be set to **`reboot`** or **`off`**. The **`reboot`** value restarts the OS when the operation is triggered; the **`off`** value shuts down the OS when the operation is triggered.
  The default value of **`pcmk_reboot_action`** is **`reboot`**, and the default value of **`pcmk_off_action`** is **`off`**.
  To change a default, specify the value when you create or update the fencing agent, for example **`pcmk_reboot_action=off`** or **`pcmk_off_action=reboot`**.

4. Fencing agents must be created according to where the management nodes are located. A single fencing agent is sufficient for the cluster when all management nodes are located in the same LPAR. If the management nodes are located in different LPARs, one fencing agent is needed for each LPAR.
 The [Create user and set up SMAPI authorization](#introduction__create-user-and-set-up-smapi-authorization) step needs to be performed on all LPARs.
 Example:
  ```
     Node List:
    * Online: [ node1-ip-address node2-ip-address node3-ip-address ] 
  ```
  For node 'node1-ip-address', the query result from note 2 is **`IAAS06A0 AT BOEM5403`**.

  For node 'node2-ip-address', the query result from note 2 is **`IAAS0609 AT BOEM5404`**.

  For node 'node3-ip-address', the query result from note 2 is **`IAAS0610 AT BOEM5404`**.

  The 3 management nodes are located in 2 different LPARs, **`BOEM5403`** and **`BOEM5404`**, so you need to create 2 fencing agents to accomplish stonith:
  `pcs stonith create zvm-smapi-BOEM5403 fence_zvmip ip=zvm-smapi1.BOEM5403.com username="ZCLUSTER" password="<password>" pcmk_host_map="node1-ip-address:IAAS06A0"`
  and
  `pcs stonith create zvm-smapi-BOEM5404 fence_zvmip ip=zvm-smapi1.BOEM5404.com username="ZCLUSTER" password="<password>" pcmk_host_map="node2-ip-address:IAAS0609;node3-ip-address:IAAS0610"`
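
Grouping nodes by LPAR and assembling `pcmk_host_map` by hand becomes error-prone as the cluster grows. The following is a minimal sketch that reads one line per node, made up of the node name followed by that node's `vmcp q userid` output, and prints one `pcs stonith create` command per z/VM system. It only prints the commands and runs nothing. The function name `print_fence_commands`, the input format, and the `zvm-smapi1.<SYSTEM>.com` address pattern are assumptions that mirror the example above; substitute the real SMAPI address and password for each LPAR.

```shell
# Read lines of the form "<NODENAME> <USERID> AT <SYSTEM>" from stdin and print
# one "pcs stonith create" command per z/VM system. Nothing is executed.
# ASSUMPTION: the ip= value follows the example pattern zvm-smapi1.<SYSTEM>.com.
print_fence_commands() {
  awk '
    NF == 4 && $3 == "AT" {
      sys = $4
      if (!(sys in map)) {
        order[++n] = sys            # remember first-seen order of systems
        map[sys] = $1 ":" $2
      } else {
        map[sys] = map[sys] ";" $1 ":" $2
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        sys = order[i]
        printf "pcs stonith create zvm-smapi-%s fence_zvmip ip=zvm-smapi1.%s.com username=\"ZCLUSTER\" password=\"<password>\" pcmk_host_map=\"%s\"\n", sys, sys, map[sys]
      }
    }'
}

# Usage with the three nodes from the example above:
print_fence_commands <<'EOF'
node1-ip-address IAAS06A0 AT BOEM5403
node2-ip-address IAAS0609 AT BOEM5404
node3-ip-address IAAS0610 AT BOEM5404
EOF
```

Review the printed commands before running them, because the host name pattern is only illustrative.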

**Note**:
  See the output of **`pcs stonith -h`** for detailed fencing agent usage.

## Update Resources
To update the fencing agent configuration, use **`pcs stonith update`**. For detailed usage, see **`pcs stonith update -h`**.
Example:
  `pcs stonith update fencing-agent pcmk_host_map="node1:RHELHA-1;node2:RHELHA-2;node3:RHELHA-3"`

## Show fencing agent configuration
 Show the fencing agent configuration by running **`pcs stonith config`**.
 Example:

    [root@management ~]#  pcs stonith config
    Resource: wngzhetest (class=stonith type=fence_zvmip)
    Attributes: inet4_only=1 ip=zvm-smapi1.BOEM5403.com password="<password>" pcmk_host_map=node1-ip-address:IAAS06A0;node2-ip-address:IAAS0609;node3-ip-address:IAAS0610 username=ZCLUSTER
    Operations: monitor interval=60s (wngzhetest-monitor-interval-60s)

## Manually fence node in cluster
The command **`pcs stonith fence node3-ip-address`** manually fences node **`node3-ip-address`** in the cluster and triggers **`pcmk_reboot_action`**. The command **`pcs stonith fence node3-ip-address --off`** triggers **`pcmk_off_action`** for node **`node3-ip-address`**.
The **`pcmk_reboot_action`** and **`pcmk_off_action`** values are set when the fencing agent is created or updated.
Example:

`pcs stonith create wngzhetest fence_zvmip inet4_only=1 ip=zvm-smapi1.BOEM5403.com username="ZCLUSTER" password="<password>" pcmk_host_map="node1-ip-address:IAAS06A0;node2-ip-address:IAAS0609;node3-ip-address:IAAS0610" pcmk_reboot_action=off`

Or

`pcs stonith update wngzhetest pcmk_off_action=reboot`
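
Because either parameter can be remapped, the action that a manual fence request actually performs depends on both the request type and the configured values. The following is a minimal sketch of that mapping as described in this topic. The function name `effective_fence_action` is hypothetical, and the function does not query the cluster; the configured values are passed in as arguments.

```shell
# Report which action a manual fence request performs.
# $1 = "reboot" for "pcs stonith fence <node>",
#      "off"    for "pcs stonith fence <node> --off"
# $2 = configured pcmk_reboot_action (empty means the default, reboot)
# $3 = configured pcmk_off_action (empty means the default, off)
effective_fence_action() {
  case "$1" in
    reboot) echo "${2:-reboot}" ;;
    off)    echo "${3:-off}" ;;
    *)      echo "unknown request: $1" >&2; return 1 ;;
  esac
}

# With pcmk_reboot_action=off, a plain fence request shuts the node down.
effective_fence_action reboot off ""
```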

## Automate fence node in cluster
Based on the following configuration, the fencing agent automatically fences nodes in the cluster that go offline, which avoids split-brain problems.
First, refer to [Enable stonith monitor](#introduction__enable-stonith-monitor) to enable the stonith monitor. Then refer to [Create Resources](#introduction__create-resources) to create the fencing agent.
Finally, refer to [Check fencing agent status](#introduction__check-fencing-agent-status) to check that the fencing agent status is started.

A started fencing agent monitors the management nodes in the cluster and performs the **`pcmk_reboot_action`** action when it finds a node offline.
For the **`pcmk_reboot_action`** action type, see note 3, **`pcmk_reboot_action`** and **`pcmk_off_action`**, in [Create Resources](#introduction__create-resources).

## Enable/disable fencing agent
Stop the fencing agent by running **`pcs stonith disable fencing-agent-id`**.
Start the fencing agent by running **`pcs stonith enable fencing-agent-id`**.

## Check fencing agent status
Show the fencing agent status by running **`pcs stonith status`**.
Example:
Fencing agent is stopped.
 ```
 [root@management ~]# pcs stonith status
   * wngzhetest (stonith:fence_zvmip):   Stopped (disabled)
 ```
Fencing agent is started.
  ```
 [root@management ~]# pcs stonith status
   * wngzhetest (stonith:fence_zvmip):   Started node2-ip-address
 ```
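
For scripted checks, the state can be extracted from the status lines shown above. The following is a minimal sketch that reads `pcs stonith status` output from standard input and prints the resource ID and state of each `fence_zvmip` resource. The function name `stonith_states` is hypothetical, and the sketch assumes the line layout shown in the examples above, which may differ between pcs versions.

```shell
# Print "<resource-id> <state>" for each fence_zvmip resource found in the
# output of "pcs stonith status" (read from stdin).
# ASSUMPTION: lines look like "* <id> (stonith:fence_zvmip): <state> ...".
stonith_states() {
  awk '$1 == "*" && $3 == "(stonith:fence_zvmip):" { print $2, $4 }'
}
```

On a management node, `pcs stonith status | stonith_states` would then print one line per fencing agent, such as `wngzhetest Started`.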

## Delete fencing agent
Delete the fencing agent by running **`pcs stonith delete fencing-agent-id`**. For detailed usage, see **`pcs stonith delete -h`**.