Virtual machine failover

In IBM Fusion, you can achieve automatic failover during unplanned node failures by using MachineHealthCheck and Self Node Remediation operators.

About this task

A node can be shut down gracefully in a planned way, or unexpectedly because of a power outage or other external factors. A node shutdown can therefore be either graceful or non-graceful, and it can lead to workload failure if the node is not drained before the shutdown.

Without VM failover automation, a worker node failure leaves the VMs on that node in an uncertain state, waiting for recovery. This downtime can make the application unavailable until the node is back online. After you set up the failover described in this procedure, the VMs running on a failed node are automatically migrated to another available healthy node, which ensures high availability and minimizes downtime without user intervention. This procedure does not cover live migration for maintenance or manual node migrations. For more information about live migration in IBM Fusion HCI, see Live migration for virtual machines.

In the event of a non-graceful node shutdown, you can also manually initiate a failover by stopping the kubelet service on the node.
sudo systemctl stop kubelet.service && sleep 360s && sudo systemctl start kubelet.service
Important: Do not use the examples in this procedure as-is; replace them with values relevant to your environment.

Procedure

  1. Check whether the Virtualization operator is in a healthy state on the cluster.
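    One way to check the operator status is from the command line with oc. This is a sketch; the namespace openshift-cnv (used by OpenShift Virtualization) is an assumption and may differ in your environment:

    ```shell
    # List the Virtualization operator's ClusterServiceVersion and its phase.
    # The namespace openshift-cnv is an assumption; adjust it for your cluster.
    oc get csv -n openshift-cnv
    ```

    A healthy operator reports its PHASE as Succeeded.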
  2. Install the Self Node Remediation (SNR) operator by Red Hat® on the cluster.

    The Self Node Remediation (SNR) operator is used as a remediation operator to fence and recover affected nodes and workloads. The Node Health Check operator automatically installs the SNR operator. If it is not installed, go to the OperatorHub and install the Self Node Remediation operator.

    For more information about self node remediation, see Red Hat Documentation.

    To deploy in an offline environment, mirror these operator images in the Red Hat operator catalog index image.
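
    For the offline case, an oc-mirror image set configuration might look like the following sketch. The package names (node-healthcheck-operator, self-node-remediation) and the catalog index version are assumptions that you must verify against your catalog:

    ```yaml
    # Sketch of an oc-mirror ImageSetConfiguration for mirroring the
    # remediation operators. Package names and the catalog version are
    # assumptions - verify them for your cluster before use.
    kind: ImageSetConfiguration
    apiVersion: mirror.openshift.io/v1alpha2
    mirror:
      operators:
        - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.14
          packages:
            - name: node-healthcheck-operator
            - name: self-node-remediation
    ```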

  3. Update the SelfNodeRemediationConfig Custom Resource (CR) and create a SelfNodeRemediationTemplate CR with the remediation strategy set to OutOfServiceTaint.
    By default, when the Self Node Remediation operator is installed, it creates a default remediation template that uses the Automatic strategy.
    Note: Update this strategy to OutOfServiceTaint so that a node is tainted as out of service when it goes down.
    Do the following steps to update the Self Node Remediation Operator strategy:
    1. In Operators > Installed Operators, search for and open the Self Node Remediation operator.
    2. Go to the Self Node Remediation Template tab.
    3. Click Create SelfNodeRemediationTemplate to create the CR of SNR operator in the openshift-machine-api namespace.
      By default, the SelfNodeRemediationTemplate is created in the openshift-workload-availability namespace. Create this template in the openshift-machine-api namespace instead.
    4. Update spec.template.spec.remediationStrategy to OutOfServiceTaint.
      Create a template under openshift-machine-api namespace similar to the following example:
      apiVersion: self-node-remediation.medik8s.io/v1alpha1
      kind: SelfNodeRemediationTemplate
      metadata:
        annotations:
          remediation.medik8s.io/multiple-templates-support: 'true'
        name: self-node-remediation-oost-strategy-template
        namespace: openshift-machine-api
        labels:
          remediation.medik8s.io/default-template: 'true'
      spec:
        template:
          spec:
            remediationStrategy: OutOfServiceTaint
    5. Go to the Self Node Remediation Config tab.
    6. Open the self node remediation config file to update it.
    7. Open the YAML tab.
      When the SNR is installed, it creates the SelfNodeRemediationConfig CR that contains the configuration information.

      Example:

      apiVersion: self-node-remediation.medik8s.io/v1alpha1
      kind: SelfNodeRemediationConfig
      metadata:
        generation: 2
        name: self-node-remediation-config
        namespace: openshift-workload-availability
        uid: b0c8e0b7-e433-47e3-94d2-e5ebf98da085
      spec:
        apiServerTimeout: 5s
        peerApiServerTimeout: 5s
        hostPort: 30001
        isSoftwareRebootEnabled: false
        watchdogFilePath: /dev/watchdog
        peerDialTimeout: 5s
        peerUpdateInterval: 15m
        apiCheckInterval: 15s
        peerRequestTimeout: 5s
        maxApiErrorThreshold: 3
      Note: By default, when the SNR operator is installed, it creates the SelfNodeRemediationConfig CR. For more information about the configuration, see the Red Hat documentation. Update spec.isSoftwareRebootEnabled to false so that the node does not reboot. During a node failure, the node must be tainted to prevent scheduling of new pods, without rebooting the node.
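    The template and configuration updates can also be made from the command line. This is a sketch, assuming the CR names shown in the examples above and that the template YAML is saved as snr-template.yaml:

    ```shell
    # Apply the OutOfServiceTaint remediation template
    # (assumes the example YAML above is saved as snr-template.yaml).
    oc apply -f snr-template.yaml

    # Disable software reboot on the default SelfNodeRemediationConfig CR
    # so that a failed node is tainted without being rebooted.
    oc patch selfnoderemediationconfig self-node-remediation-config \
      -n openshift-workload-availability --type merge \
      -p '{"spec":{"isSoftwareRebootEnabled":false}}'
    ```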
  4. From the OpenShift menu, go to Compute > MachineHealthChecks and create a new Machine Health Check (MHC) under your namespace.
    The Machine Health Check (MHC) checks the health of nodes and triggers the remediation agents if the node status is Unknown or NotReady for a defined amount of time. For more information about the procedure, see Red Hat Documentation.
  5. Open the MachineHealthChecks file details and click YAML.
    Example MachineHealthCheck CR:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineHealthCheck
    metadata:
      name: machinehealthcheck
      namespace: openshift-machine-api
    spec:
      maxUnhealthy: 40% # Allow up to 40% of worker nodes to be unhealthy at once before remediation
      nodeStartupTimeout: 20m # If a node does not join the cluster within nodeStartupTimeout, the machine is remediated
      remediationTemplate:
        apiVersion: self-node-remediation.medik8s.io/v1alpha1
        kind: SelfNodeRemediationTemplate
        name: self-node-remediation-oost-strategy-template
        namespace: openshift-machine-api
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: <cluster_name>
          machine.openshift.io/cluster-api-machine-role: <machine_role>
          machine.openshift.io/cluster-api-machine-type: <machine_type>
          machine.openshift.io/cluster-api-machineset: <machineset_name>
      unhealthyConditions:
        - status: Unknown
          timeout: 10s # Trigger remediation if the node Ready condition is Unknown for more than 10 seconds
          type: Ready
        - status: 'False'
          timeout: 10s
          type: Ready
    Note: For example, the matchLabels for an m13 cluster are as follows and can be found in the Compute > MachineSets details.
    • machine.openshift.io/cluster-api-cluster=rackm13-rbzr6
    • machine.openshift.io/cluster-api-machine-role=worker
    • machine.openshift.io/cluster-api-machine-type=worker
    • machine.openshift.io/cluster-api-machineset=rackm13-rbzr6-worker-0
    • When you configure a MachineHealthCheck, update the values under the matchLabels based on the following guidance:
      machine.openshift.io/cluster-api-cluster
      Use the cluster name, which is the prefix of the machine set name (for example, rackm13-rbzr6). To find the name, run the following command:
      oc get machineset -A
      machine.openshift.io/cluster-api-machine-role
      Specify the machine role. For example, master or worker.
      machine.openshift.io/cluster-api-machine-type
      Indicate the machine type. For example, worker.
      machine.openshift.io/cluster-api-machineset
      Combine cluster name, a custom label, and the zone. The syntax is <cluster_name>-<label>-<zone>.
    • The maxUnhealthy field controls when remediation is allowed, based on how many of the machines checked by the MachineHealthCheck are unhealthy. The maxUnhealthy field accepts either an absolute number or a percentage of the total machines.
      When maxUnhealthy is set to a fixed number, for example, 2:
      • Remediation occurs only when the number of unhealthy nodes is within the allowed limit, that is, not more than two. For example, if the cluster has five nodes and maxUnhealthy is set to 2, remediation is not triggered when three nodes become unhealthy.
      • If there are zero, one, or two unhealthy nodes, remediation is allowed.
      • If there are three or more unhealthy nodes, remediation is skipped.
      When maxUnhealthy is set to a percentage, for example, 40%:
      • The allowed number of unhealthy nodes is calculated as a percentage of the total nodes. For instance, if a cluster consists of five nodes and maxUnhealthy is set to 40%, remediation is not triggered when three nodes become unhealthy, because three nodes exceed the 40% limit (two nodes).
      • If the number of unhealthy nodes is ≤ 40% of the total nodes, remediation is allowed.
      • If the number of unhealthy nodes is > 40% of the total nodes, remediation is skipped.
      The following table summarizes the same examples:
      maxUnhealthy setting | Cluster size | Allowed unhealthy nodes | Remediation triggered when
      2 (fixed number)     | 5 nodes      | Up to 2 nodes           | ≤ 2 nodes unhealthy
      40% (percentage)     | 5 nodes      | Up to 2 nodes (40%)     | ≤ 2 nodes unhealthy
      Note: Remediation is triggered only when the count of unhealthy nodes remains within the threshold set by maxUnhealthy, whether specified as a fixed value or a percentage of the total nodes.

      This configuration ensures that the health check is applied only to the machines that match these specific labels.
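
      The maxUnhealthy behavior described above can be sketched as follows. This is an illustrative model of the documented rules, not the operator's actual implementation:

      ```python
      import math

      def allowed_unhealthy(total_nodes: int, max_unhealthy) -> int:
          """Return how many unhealthy nodes are tolerated before
          remediation is skipped, per the rules described above."""
          if isinstance(max_unhealthy, str) and max_unhealthy.endswith("%"):
              percent = int(max_unhealthy.rstrip("%"))
              # A percentage is taken of the total machines, rounded down.
              return math.floor(total_nodes * percent / 100)
          return int(max_unhealthy)

      def remediation_allowed(unhealthy: int, total_nodes: int, max_unhealthy) -> bool:
          """Remediation runs only while the unhealthy count stays
          within the maxUnhealthy threshold."""
          return unhealthy <= allowed_unhealthy(total_nodes, max_unhealthy)
      ```

      With five nodes, maxUnhealthy of 2 and maxUnhealthy of 40% behave identically: two unhealthy nodes are remediated, while a third causes remediation to be skipped.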