Restoring the monitor pods in Fusion Data Foundation

Use this information to manually restore the monitor pods in Fusion Data Foundation, when necessary.

About this task

Restore the monitor pods if all three of them go down and Fusion Data Foundation is not able to recover them automatically.

Note: This is a disaster recovery procedure and must be performed under the guidance of the IBM support team. Contact the IBM support team before you proceed.

Procedure

  1. Scale down the rook-ceph-operator and ocs-operator.
    oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
    oc scale deployment ocs-operator --replicas=0 -n openshift-storage
  2. Create a backup of all deployments in the openshift-storage namespace.
    mkdir backup
    cd backup
    oc project openshift-storage
    for d in $(oc get deployment|awk -F' ' '{print $1}'|grep -v NAME); do echo $d;oc get deployment $d -o yaml > oc_get_deployment.${d}.yaml; done
  3. Patch the OSD deployments to remove the livenessProbe parameter, and run them with the command parameter as sleep.
    for i in $(oc get deployment -l app=rook-ceph-osd -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}' ; done
  4. Retrieve the monstore cluster map from all the OSDs.
    1. Create the recover_mon.sh script.
      #!/bin/bash
      ms=/tmp/monstore
      
      rm -rf $ms
      mkdir $ms
      
      for osd_pod in $(oc get po -l app=rook-ceph-osd -oname -n openshift-storage); do
      
        echo "Starting with pod: $osd_pod"
      
        podname=$(echo $osd_pod|sed 's/pod\///g')
        oc exec $osd_pod -- rm -rf $ms
        oc cp $ms $podname:$ms
      
        rm -rf $ms
        mkdir $ms
      
        echo "pod in loop: $osd_pod ; done deleting local dirs"
      
        oc exec $osd_pod -- ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-$(oc get $osd_pod -ojsonpath='{ .metadata.labels.ceph_daemon_id }') --op update-mon-db --no-mon-config --mon-store-path $ms
        echo "Done with COT on pod: $osd_pod"
      
        oc cp $podname:$ms $ms
      
        echo "Finished pulling COT data from pod: $osd_pod"
      done
    2. Run the recover_mon.sh script.
      chmod +x recover_mon.sh
      ./recover_mon.sh
  5. Patch the MON deployments, and run them with the command parameter as sleep.
    1. Edit the MON deployments.
      for i in $(oc get deployment -l app=rook-ceph-mon -oname);do oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'; done
    2. Patch the MON deployments to increase the initialDelaySeconds.
      oc get deployment rook-ceph-mon-a -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
      oc get deployment rook-ceph-mon-b -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
      oc get deployment rook-ceph-mon-c -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
  6. Copy the previously retrieved monstore to the mon-a pod.
    oc cp /tmp/monstore/ $(oc get po -l app=rook-ceph-mon,mon=a -oname |sed 's/pod\///g'):/tmp/
  7. Navigate into the mon-a pod, and change the ownership of the retrieved monstore.
    oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
     chown -R ceph:ceph /tmp/monstore
  8. Copy the keyring template file before rebuilding the mon db.
    oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
     cp /etc/ceph/keyring-store/keyring /tmp/keyring
     cat /tmp/keyring
      [mon.]
        key = AQCleqldWqm5IhAAgZQbEzoShkZV42RiQVffnA==
        caps mon = "allow *"
      [client.admin]
        key = AQCmAKld8J05KxAArOWeRAw63gAwwZO5o75ZNQ==
        auid = 0
        caps mds = "allow *"
        caps mgr = "allow *"
        caps mon = "allow *"
        caps osd = "allow *”
  9. Identify the keyring of all other Ceph daemons (MGR, MDS, RGW, Crash, CSI, and CSI provisioners) from their respective secrets.
    oc get secret rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-keyring -ojson  | jq .data.keyring | xargs echo | base64 -d
    
    [mds.ocs-storagecluster-cephfilesystem-a]
    key = AQB3r8VgAtr6OhAAVhhXpNKqRTuEVdRoxG4uRA==
    caps mon = "allow profile mds"
    caps osd = "allow *"
    caps mds = "allow"
    Example keyring file, /etc/ceph/ceph.client.admin.keyring:
    [mon.]
            key = AQDxTF1hNgLTNxAAi51cCojs01b4I5E6v2H8Uw==
            caps mon = "allow "
    [client.admin]
            key = AQDxTF1hpzguOxAA0sS8nN4udoO35OEbt3bqMQ==
            caps mds = "allow "
            caps mgr = "allow *"
            caps mon = "allow *"
            caps osd = "allow *"
    [mds.ocs-storagecluster-cephfilesystem-a]
            key = AQCKTV1horgjARAA8aF/BDh/4+eG4RCNBCl+aw==
            caps mds = "allow"
            caps mon = "allow profile mds"
            caps osd = "allow *"
    [mds.ocs-storagecluster-cephfilesystem-b]
            key = AQCKTV1hN4gKLBAA5emIVq3ncV7AMEM1c1RmGA==
            caps mds = "allow"
            caps mon = "allow profile mds"
            caps osd = "allow *"
    [client.rgw.ocs.storagecluster.cephobjectstore.a]
            key = AQCOkdBixmpiAxAA4X7zjn6SGTI9c1MBflszYA==
            caps mon = "allow rw"
            caps osd = "allow rwx"
    [mgr.a]
            key = AQBOTV1hGYOEORAA87471+eIZLZtptfkcHvTRg==
            caps mds = "allow *"
            caps mon = "allow profile mgr"
            caps osd = "allow *"
    [client.crash]
            key = AQBOTV1htO1aGRAAe2MPYcGdiAT+Oo4CNPSF1g==
            caps mgr = "allow rw"
            caps mon = "allow profile crash"
    [client.csi-cephfs-node]
            key = AQBOTV1hiAtuBBAAaPPBVgh1AqZJlDeHWdoFLw==
            caps mds = "allow rw"
            caps mgr = "allow rw"
            caps mon = "allow r"
            caps osd = "allow rw tag cephfs *="
    [client.csi-cephfs-provisioner]
            key = AQBNTV1hHu6wMBAAzNXZv36aZJuE1iz7S7GfeQ==
            caps mgr = "allow rw"
            caps mon = "allow r"
            caps osd = "allow rw tag cephfs metadata="
    [client.csi-rbd-node]
            key = AQBNTV1h+LnkIRAAWnpIN9bUAmSHOvJ0EJXHRw==
            caps mgr = "allow rw"
            caps mon = "profile rbd"
            caps osd = "profile rbd"
    [client.csi-rbd-provisioner]
            key = AQBNTV1hMNcsExAAvA3gHB2qaY33LOdWCvHG/A==
            caps mgr = "allow rw"
            caps mon = "profile rbd"
            caps osd = "profile rbd"
    Important:
    • For the client.csi related keyrings, refer to the previous keyring file output and add the default caps after fetching the key from the respective Fusion Data Foundation secret.
    • The OSD keyrings are added automatically after recovery.
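    For example, the following sketch dumps the keyring stored in each daemon keyring secret so that the entries can be added, together with the caps shown in the example file, to /tmp/keyring in the mon-a pod. This is a suggested helper rather than part of the recorded procedure; it assumes that the daemon keyring secrets follow the *-keyring naming convention, and it does not cover the client.csi keys, which must be fetched from their own secrets as noted above.
    # List every keyring secret in openshift-storage and print its decoded keyring entry.
    for s in $(oc get secret -n openshift-storage -o name | grep keyring); do
      echo "# ${s}"
      oc get ${s} -n openshift-storage -ojson | jq -r '.data.keyring // empty' | base64 -d
      echo
    done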
  10. Navigate into the mon-a pod, and verify that the monstore has the monmap.
    1. Navigate into the mon-a pod.
      oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
    2. Verify that the monstore has the monmap.
       ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap
       monmaptool /tmp/monmap --print
  11. Optional: If the monmap is missing, create a new monmap.
     monmaptool --create --add <mon-a-id> <mon-a-ip> --add <mon-b-id> <mon-b-ip> --add <mon-c-id> <mon-c-ip> --enable-all-features --clobber /root/monmap --fsid <fsid>
    <mon-a-id>
    Is the ID of the mon-a pod.
    <mon-a-ip>
    Is the IP address of the mon-a pod.
    <mon-b-id>
    Is the ID of the mon-b pod.
    <mon-b-ip>
    Is the IP address of the mon-b pod.
    <mon-c-id>
    Is the ID of the mon-c pod.
    <mon-c-ip>
    Is the IP address of the mon-c pod.
    <fsid>
    Is the file system ID (fsid) of the Ceph cluster; it matches the id field in the ceph -s output.
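    For example, a hypothetical invocation with illustrative monitor IDs and IP addresses. The IP addresses and fsid below are placeholders only; substitute the values from your own cluster.
     monmaptool --create --add a 10.128.2.25 --add b 10.131.0.22 --add c 10.129.2.18 --enable-all-features --clobber /root/monmap --fsid f111402f-84d1-4e06-9fdb-c27607676e55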
  12. Verify the monmap.
     monmaptool /root/monmap --print
  13. Import the monmap.
    Important: Use the previously created keyring file.
     ceph-monstore-tool /tmp/monstore rebuild -- --keyring /tmp/keyring --monmap /root/monmap
     chown -R ceph:ceph /tmp/monstore
  14. Create a backup of the old store.db file.
    mv /var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-a/store.db.corrupted
    mv /var/lib/ceph/mon/ceph-b/store.db /var/lib/ceph/mon/ceph-b/store.db.corrupted
    mv /var/lib/ceph/mon/ceph-c/store.db /var/lib/ceph/mon/ceph-c/store.db.corrupted
  15. Copy the rebuilt store.db file from the monstore directory to the mon-a daemon directory, and change its ownership.
    mv /tmp/monstore/store.db /var/lib/ceph/mon/ceph-a/store.db
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-a/store.db
  16. After rebuilding the monstore directory, copy the store.db file from the mon-a pod to the local machine, and then to the rest of the MON pods, where <id> is the ID of the target MON pod.
    oc cp $(oc get po -l app=rook-ceph-mon,mon=a -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-a/store.db /tmp/store.db
    oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=<id> -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-<id>
  17. Navigate into the rest of the MON pods and change the ownership of the copied monstore, where <id> is the ID of the MON pod.
    oc rsh $(oc get po -l app=rook-ceph-mon,mon=<id> -oname)
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-<id>/store.db
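    For example, assuming the default monitor IDs a, b, and c, the commands in steps 16 and 17 for the mon-b pod are as follows; repeat them with c for the mon-c pod.
    oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=b -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-b
    oc rsh $(oc get po -l app=rook-ceph-mon,mon=b -oname)
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-b/store.db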
  18. Revert the patched changes.
    Important: Ensure that the MON, MGR, and OSD pods are up and running.
    For MON deployments:
    Use the following command, where <mon-deployment.yaml> is the MON deployment yaml file.
    oc replace --force -f <mon-deployment.yaml>
    For OSD deployments:
    Use the following command, where <osd-deployment.yaml> is the OSD deployment yaml file.
    oc replace --force -f <osd-deployment.yaml>
    For MGR deployments:
    Use the following command, where <mgr-deployment.yaml> is the MGR deployment yaml file.
    oc replace --force -f <mgr-deployment.yaml>
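    For example, a sketch that replays the deployment backups taken in step 2. It assumes the backup directory and the oc_get_deployment.<name>.yaml file names that the step creates; adjust the paths if you stored the backups elsewhere.
    cd backup
    # Restore the original MON, OSD, and MGR deployment definitions from the backup files.
    for f in oc_get_deployment.rook-ceph-mon-*.yaml oc_get_deployment.rook-ceph-osd-*.yaml oc_get_deployment.rook-ceph-mgr-*.yaml; do
      oc replace --force -f ${f} -n openshift-storage
    done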
  19. Scale up the rook-ceph-operator and ocs-operator deployments.
    oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1
    oc -n openshift-storage scale deployment ocs-operator --replicas=1

What to do next

  1. Check the Ceph status to confirm that CephFS is running.
    ceph -s
    Example output:
    cluster:
      id:     f111402f-84d1-4e06-9fdb-c27607676e55
      health: HEALTH_ERR
              1 filesystem is offline
              1 filesystem is online with fewer MDS than max_mds
              3 daemons have recently crashed

    services:
      mon: 3 daemons, quorum b,c,a (age 15m)
      mgr: a(active, since 14m)
      mds: ocs-storagecluster-cephfilesystem:0
      osd: 3 osds: 3 up (since 15m), 3 in (since 2h)

    data:
      pools:   3 pools, 96 pgs
      objects: 500 objects, 1.1 GiB
      usage:   5.5 GiB used, 295 GiB / 300 GiB avail
      pgs:     96 active+clean
    Important: If the filesystem is offline or the MDS service is missing, you must restore the CephFS. For more information, see Restoring the CephFS.
  2. Check the Multicloud Object Gateway (MCG) status. It should be active, and the backingstore and bucketclass should be in the Ready state.
    noobaa status -n openshift-storage
    Note: If the MCG is not in the active state, and the backingstore and bucketclass are not in the Ready state, you must restart all the MCG related pods. For more information, see Restoring the Multicloud Object Gateway.