Use this information to manually restore the monitor pods in Fusion Data Foundation, when necessary.
About this task
Restore the monitor pods if all three of them go down and Fusion Data Foundation is not able to recover them automatically.
Note: This is a disaster recovery procedure and must be performed under the guidance of the IBM support team. Contact the IBM support team before you proceed.
Procedure
- Scale down the rook-ceph-operator and ocs-operator.
oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
oc scale deployment ocs-operator --replicas=0 -n openshift-storage
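Optionally, confirm that both operator deployments now report zero replicas before continuing.
oc get deployment rook-ceph-operator ocs-operator -n openshift-storage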
- Create a backup of all deployments in the openshift-storage namespace.
mkdir backup
cd backup
oc project openshift-storage
for d in $(oc get deployment|awk -F' ' '{print $1}'|grep -v NAME); do echo $d;oc get deployment $d -o yaml > oc_get_deployment.${d}.yaml; done
- Patch the OSD deployments to remove the livenessProbe parameter, and run them with the command parameter as sleep.
for i in $(oc get deployment -l app=rook-ceph-osd -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}' ; done
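After the patch is applied, the OSD pods restart with the sleep command. Optionally, verify that they return to the Running state before continuing.
oc get pods -l app=rook-ceph-osd -n openshift-storage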
- Retrieve the monstore cluster map from all the OSDs.
- Create the recover_mon.sh script.
#!/bin/bash
ms=/tmp/monstore
rm -rf $ms
mkdir $ms
for osd_pod in $(oc get po -l app=rook-ceph-osd -oname -n openshift-storage); do
echo "Starting with pod: $osd_pod"
podname=$(echo $osd_pod|sed 's/pod\///g')
oc exec $osd_pod -- rm -rf $ms
oc cp $ms $podname:$ms
rm -rf $ms
mkdir $ms
echo "pod in loop: $osd_pod ; done deleting local dirs"
oc exec $osd_pod -- ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-$(oc get $osd_pod -ojsonpath='{ .metadata.labels.ceph_daemon_id }') --op update-mon-db --no-mon-config --mon-store-path $ms
echo "Done with COT on pod: $osd_pod"
oc cp $podname:$ms $ms
echo "Finished pulling COT data from pod: $osd_pod"
done
- Run the recover_mon.sh script.
chmod +x recover_mon.sh
./recover_mon.sh
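When the script completes, the monitor store data collected from the OSDs should be present under /tmp/monstore on the local host. A quick listing confirms that the cluster map was retrieved.
ls -R /tmp/monstore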
- Patch the MON deployments, and run them with the command parameter as sleep.
- Edit the MON deployments.
for i in $(oc get deployment -l app=rook-ceph-mon -oname);do oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'; done
- Patch the MON deployments to increase the initialDelaySeconds.
oc get deployment rook-ceph-mon-a -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
oc get deployment rook-ceph-mon-b -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
oc get deployment rook-ceph-mon-c -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
- Copy the previously retrieved monstore to the mon-a pod.
oc cp /tmp/monstore/ $(oc get po -l app=rook-ceph-mon,mon=a -oname |sed 's/pod\///g'):/tmp/
- Navigate into the MON pod and change the ownership of the retrieved monstore.
oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
chown -R ceph:ceph /tmp/monstore
- Copy the keyring template file before rebuilding the mon db.
oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
cp /etc/ceph/keyring-store/keyring /tmp/keyring
cat /tmp/keyring
[mon.]
key = AQCleqldWqm5IhAAgZQbEzoShkZV42RiQVffnA==
caps mon = "allow *"
[client.admin]
key = AQCmAKld8J05KxAArOWeRAw63gAwwZO5o75ZNQ==
auid = 0
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *”
- Identify the keyrings of all the other Ceph daemons (MGR, MDS, RGW, Crash, CSI, and CSI provisioners) from their respective secrets.
oc get secret rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-keyring -ojson | jq .data.keyring | xargs echo | base64 -d
[mds.ocs-storagecluster-cephfilesystem-a]
key = AQB3r8VgAtr6OhAAVhhXpNKqRTuEVdRoxG4uRA==
caps mon = "allow profile mds"
caps osd = "allow *"
caps mds = "allow"
Example keyring file, /etc/ceph/ceph.client.admin.keyring:
[mon.]
key = AQDxTF1hNgLTNxAAi51cCojs01b4I5E6v2H8Uw==
caps mon = "allow "
[client.admin]
key = AQDxTF1hpzguOxAA0sS8nN4udoO35OEbt3bqMQ==
caps mds = "allow "
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *"
[mds.ocs-storagecluster-cephfilesystem-a]
key = AQCKTV1horgjARAA8aF/BDh/4+eG4RCNBCl+aw==
caps mds = "allow"
caps mon = "allow profile mds"
caps osd = "allow *"
[mds.ocs-storagecluster-cephfilesystem-b]
key = AQCKTV1hN4gKLBAA5emIVq3ncV7AMEM1c1RmGA==
caps mds = "allow"
caps mon = "allow profile mds"
caps osd = "allow *"
[client.rgw.ocs.storagecluster.cephobjectstore.a]
key = AQCOkdBixmpiAxAA4X7zjn6SGTI9c1MBflszYA==
caps mon = "allow rw"
caps osd = "allow rwx"
[mgr.a]
key = AQBOTV1hGYOEORAA87471+eIZLZtptfkcHvTRg==
caps mds = "allow *"
caps mon = "allow profile mgr"
caps osd = "allow *"
[client.crash]
key = AQBOTV1htO1aGRAAe2MPYcGdiAT+Oo4CNPSF1g==
caps mgr = "allow rw"
caps mon = "allow profile crash"
[client.csi-cephfs-node]
key = AQBOTV1hiAtuBBAAaPPBVgh1AqZJlDeHWdoFLw==
caps mds = "allow rw"
caps mgr = "allow rw"
caps mon = "allow r"
caps osd = "allow rw tag cephfs *="
[client.csi-cephfs-provisioner]
key = AQBNTV1hHu6wMBAAzNXZv36aZJuE1iz7S7GfeQ==
caps mgr = "allow rw"
caps mon = "allow r"
caps osd = "allow rw tag cephfs metadata="
[client.csi-rbd-node]
key = AQBNTV1h+LnkIRAAWnpIN9bUAmSHOvJ0EJXHRw==
caps mgr = "allow rw"
caps mon = "profile rbd"
caps osd = "profile rbd"
[client.csi-rbd-provisioner]
key = AQBNTV1hMNcsExAAvA3gHB2qaY33LOdWCvHG/A==
caps mgr = "allow rw"
caps mon = "profile rbd"
caps osd = "profile rbd"
Important:
- For the client.csi related keyrings, refer to the previous keyring file output and add the default caps after fetching the key from its respective Fusion Data Foundation secret.
- The OSD keyring is added automatically after recovery.
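These keyrings typically need to be added to the /tmp/keyring file that is used later for the rebuild. One possible approach, shown only as a sketch with the MDS secret from the earlier example, is to decode the secret and append it to /tmp/keyring inside the mon-a pod:
oc get secret rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-keyring -ojson | jq -r .data.keyring | base64 -d | oc exec -i $(oc get po -l app=rook-ceph-mon,mon=a -oname) -- sh -c 'cat >> /tmp/keyring'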
- Navigate into the mon-a pod, and verify that the monstore has the monmap.
- Navigate into the mon-a pod.
oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
- Verify that the monstore has the monmap.
ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap
monmaptool /tmp/monmap --print
- Optional: If the monmap is missing, create a new monmap. An example invocation with placeholder values follows the parameter descriptions below.
monmaptool --create --add <mon-a-id><mon-a-ip> --add <mon-b-id><mon-b-ip> --add <mon-c-id><mon-c-ip> --enable-all-features --clobber /root/monmap --fsid <fsid>
- <mon-a-id> is the ID of the mon-a pod.
- <mon-a-ip> is the IP address of the mon-a pod.
- <mon-b-id> is the ID of the mon-b pod.
- <mon-b-ip> is the IP address of the mon-b pod.
- <mon-c-id> is the ID of the mon-c pod.
- <mon-c-ip> is the IP address of the mon-c pod.
- <fsid> is the file system ID.
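For illustration only, an invocation might look like the following. The monitor IDs, IP addresses, and FSID shown here are hypothetical placeholders; substitute the real values from your cluster.
monmaptool --create --add a 10.128.2.25 --add b 10.129.2.18 --add c 10.131.0.22 --enable-all-features --clobber /root/monmap --fsid f111402f-84d1-4e06-9fdb-c27607676e55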
- Verify the monmap.
monmaptool /root/monmap --print
- Import the monmap.
Important: Use the previously created keyring file.
ceph-monstore-tool /tmp/monstore rebuild -- --keyring /tmp/keyring --monmap /root/monmap
chown -R ceph:ceph /tmp/monstore
- Create a backup of the old store.db file.
mv /var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-a/store.db.corrupted
mv /var/lib/ceph/mon/ceph-b/store.db /var/lib/ceph/mon/ceph-b/store.db.corrupted
mv /var/lib/ceph/mon/ceph-c/store.db /var/lib/ceph/mon/ceph-c/store.db.corrupted
- Copy the rebuilt store.db file from the monstore directory to the mon-a data directory.
mv /tmp/monstore/store.db /var/lib/ceph/mon/ceph-a/store.db
chown -R ceph:ceph /var/lib/ceph/mon/ceph-a/store.db
- After rebuilding the monstore directory, copy the store.db file from the local host to the rest of the MON pods, where <id> is the ID of the MON pod.
oc cp $(oc get po -l app=rook-ceph-mon,mon=a -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-a/store.db /tmp/store.db
oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=<id> -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-<id>
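For example, if the remaining monitors are mon-b and mon-c:
oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=b -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-b
oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=c -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-c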
- Navigate into the rest of the MON pods and change the ownership of the copied monstore, where <id> is the ID of the MON pod.
oc rsh $(oc get po -l app=rook-ceph-mon,mon=<id> -oname)
chown -R ceph:ceph /var/lib/ceph/mon/ceph-<id>/store.db
- Revert the patched changes. A worked example using the backup files follows these sub-steps.
Important: Ensure that the MON, MGR, and OSD pods are up and running.
- For MON deployments, use the following command, where <mon-deployment.yaml> is the MON deployment YAML file.
oc replace --force -f <mon-deployment.yaml>
- For OSD deployments, use the following command, where <osd-deployment.yaml> is the OSD deployment YAML file.
oc replace --force -f <osd-deployment.yaml>
- For MGR deployments, use the following command, where <mgr-deployment.yaml> is the MGR deployment YAML file.
oc replace --force -f <mgr-deployment.yaml>
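For example, assuming the backups taken at the start of this procedure (files named oc_get_deployment.<deployment-name>.yaml), reverting the mon-a deployment from within the backup directory might look like this:
oc replace --force -f oc_get_deployment.rook-ceph-mon-a.yaml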
- Scale up the rook-ceph-operator and ocs-operator deployments.
oc -n openshift-storage scale deployment ocs-operator --replicas=1
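If the rook-ceph-operator deployment does not scale back up on its own, restore it in the same way that it was scaled down earlier:
oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1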
What to do next
- Check the Ceph status to confirm that CephFS is running.
ceph -s
Example output:
cluster:
id: f111402f-84d1-4e06-9fdb-c27607676e55
health: HEALTH_ERR
1 filesystem is offline
1 filesystem is online with fewer MDS than max_mds
3 daemons have recently crashed
services:
mon: 3 daemons, quorum b,c,a (age 15m)
mgr: a(active, since 14m)
mds: ocs-storagecluster-cephfilesystem:0
osd: 3 osds: 3 up (since 15m), 3 in (since 2h)
data:
pools: 3 pools, 96 pgs
objects: 500 objects, 1.1 GiB
usage: 5.5 GiB used, 295 GiB / 300 GiB avail
pgs: 96 active+clean
Important: If the filesystem is offline or the MDS service is missing, you need to restore the CephFS. For more information, see Restoring the CephFS.
- Check the Multicloud Object Gateway (MCG) status. It should be active, and the backingstore and bucketclass should be in the Ready state.
noobaa status -n openshift-storage
Note: If the MCG is not in the active state, and the backingstore and bucketclass are not in the Ready state, you need to restart all the MCG-related pods. For more information, see Restoring the Multicloud Object Gateway.