KVM Hypervisor

Live Guest Migration of Red Hat OpenShift worker nodes using Scale as a shared file system

One deployment scenario for Red Hat OpenShift Container Platform (RHOCP) on IBM Z is to install the complete RHOCP cluster in a single LPAR using the KVM hypervisor (see Figure 1). For the cluster to survive a reboot of that LPAR, it is preferable to live migrate the RHOCP nodes to another LPAR so that the cluster keeps running without interruption. This topic describes how to achieve this by configuring the two LPARs as a 2-node Scale cluster and using the Scale file system as a shared file system that holds the image files of the RHOCP nodes. A worker node serves as the example to demonstrate live migration (see Figure 2).

Figure 1. RHOCP installation with all nodes on a private LAN in an LPAR
Figure 2. RHOCP installation with one node in a different LPAR

Prerequisites

  1. A minimum of two LPARs with RHEL 9.X
  2. A Scale cluster consisting of these LPARs and a Scale file system that is configured on this cluster
  3. A SAN storage device that is attached to all the LPARs
  4. An RHOCP cluster installed in one of the LPARs with a private Linux bridge for intra-cluster communication
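
Prerequisite 2 can be spot-checked on each LPAR before any image is moved. The following is a minimal sketch and not part of the original procedure: the function name gpfs_mounted_at is hypothetical, and the mount point /gpfs/fs1 is an assumption derived from the image paths used later in this topic. The function reads a mount table in /proc/mounts format and succeeds only if the given mount point is listed with file system type gpfs.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the mount table read from stdin, in
# /proc/mounts format, lists mount point $1 with file system type gpfs.
# Hypothetical helper for illustration; not taken from the original text.
gpfs_mounted_at() {
    awk -v mp="$1" '$2 == mp && $3 == "gpfs" { found = 1 } END { exit !found }'
}

# Usage on each LPAR (the mount point /gpfs/fs1 is an assumption):
#   gpfs_mounted_at /gpfs/fs1 < /proc/mounts && echo "Scale file system is mounted"
```

Live migration expects the disk image to be reachable under the same path on both hosts, so the check is only meaningful if it succeeds on both LPARs.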

Overview of the required steps

  1. Connect the two LPARs with, for example, a VXLAN tunnel
  2. Configure (at least) one worker node to use the Scale file system for its virtual disk image
  3. Migrate the worker node to the other LPAR
  4. Verify that the cluster still works and the nodes can communicate with each other

Detailed steps

  1. Connect the two LPARs with, for example, a VXLAN tunnel

    LPAR 1

    nmcli connection add type bridge con-name boessl6 ifname boessl6 ipv4.method disabled ipv6.method disabled
    
    nmcli connection add type vxlan slave-type bridge con-name boessl6-vxlan10 ifname vxlan10 id 10 local 172.20.145.57 remote 172.20.145.56 master boessl6
    
    nmcli connection up boessl6
    
    ifconfig  boessl6 192.168.128.1

    LPAR 2

    nmcli connection add type bridge con-name boessl7 ifname boessl7 ipv4.method disabled ipv6.method disabled
    
    nmcli connection add type vxlan slave-type bridge con-name boessl7-vxlan10 ifname vxlan10 id 10 local 172.20.145.56 remote 172.20.145.57 master boessl7
    
    nmcli connection up boessl7
    
    ifconfig  boessl7 192.168.128.2
  2. Configure (at least) one worker node to use the Scale file system for its virtual disk image
    1. Shut down the worker node

      Copy the disk images

      mkdir -p /gpfs/fs1/var/lib/libvirt/images/ocp413/
      cp /var/lib/libvirt/images/ocp413/worker.ign /gpfs/fs1/var/lib/libvirt/images/ocp413/
      
      cp /var/lib/libvirt/images/ocp413/worker2.qcow2 /gpfs/fs1/var/lib/libvirt/images/ocp413/

      Change the domain XML

      @@ -22,13 +22,13 @@
           <emulator>/usr/libexec/qemu-kvm</emulator>
           <disk type='file' device='disk'>
             <driver name='qemu' type='qcow2'/>
      -      <source file='/var/lib/libvirt/images/ocp413/worker2.qcow2'/>
      +      <source file='/gpfs/fs1/var/lib/libvirt/images/ocp413/worker2.qcow2'/>
             <target dev='vda' bus='virtio'/>
             <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
           </disk>
           <disk type='file' device='disk'>
             <driver name='qemu' type='raw'/>
      -      <source file='/var/lib/libvirt/images/ocp413/worker.ign' startupPolicy='optional'/>
      +      <source file='/gpfs/fs1/var/lib/libvirt/images/ocp413/worker.ign' startupPolicy='optional'/>
             <target dev='vdb' bus='virtio'/>
             <readonly/>
             <serial>ignition</serial>
    2. Boot the worker node
  3. Migrate the worker node to the other LPAR

    virsh migrate --live ocp413-worker2 qemu+ssh://boessl6.gpfs.boe/system

    Migrate the worker node back to the original LPAR

    virsh migrate --live ocp413-worker2 qemu+ssh://boessl7.gpfs.boe/system
  4. Verify that the cluster still works and the nodes can communicate with each other.
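
The verification step can be approached with the standard oc client. The following is a sketch under stated assumptions, not a prescribed method: the function name not_ready_nodes is hypothetical, and it presumes that oc is logged in to the cluster with permission to list nodes. The function filters the output of `oc get nodes --no-headers` and prints every node whose status is not exactly Ready.

```shell
#!/bin/sh
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Input: the output of `oc get nodes --no-headers` on stdin.
# Note: a cordoned node ("Ready,SchedulingDisabled") is reported as well.
# Hypothetical helper for illustration; not taken from the original text.
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Usage on a host with cluster access (assumes a logged-in oc client):
#   oc get nodes --no-headers | not_ready_nodes
```

Empty output after each migration suggests that all nodes, including the migrated worker, still report to the control plane. It does not by itself prove pod-to-pod connectivity across the tunnel, so a workload-level check remains advisable.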