Restoring ceph-monitor quorum in Fusion Data Foundation
In some circumstances, the ceph-mons might lose quorum. If the mons cannot form quorum on their own, there is a manual procedure to restore it. The only requirement is that at least one mon must be healthy. Use this information to remove the unhealthy mons from quorum, re-form quorum with the single healthy mon, and then grow the quorum back to its original size.
About this task
An example use case: you have three mons and lose quorum. You need to remove the two bad mons from quorum, notify the good mon that it is the only mon in quorum, and then restart the good mon.
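Before you begin, it can help to confirm which mon is still healthy. A minimal sketch, assuming the standard Rook label app=rook-ceph-mon on the mon pods; this check is not part of the formal procedure:

  oc -n openshift-storage get pods -l app=rook-ceph-mon
  oc -n openshift-storage logs <mon-pod>

Look for the mon pod that is Running and stable; its logs typically show it probing for the other, unreachable mons.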
Procedure
- Stop the rook-ceph-operator so that the mons are not failed over while you are modifying the monmap.

  oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0

- Inject a new monmap.
  Note: You must inject the monmap very carefully. If run incorrectly, your cluster could be permanently destroyed. The Ceph monmap keeps track of the mon quorum. The monmap is updated to contain only the healthy mon. In this example, the healthy mon is rook-ceph-mon-b, while the unhealthy mons are rook-ceph-mon-a and rook-ceph-mon-c.
  - Take a backup of the current rook-ceph-mon-b deployment:

    oc -n openshift-storage get deployment rook-ceph-mon-b -o yaml > rook-ceph-mon-b-deployment.yaml

  - Open the YAML file and copy the command and args (arguments) from the mon container. These are part of the containers list, as shown in the following example, and are needed for the monmap changes.

    [...]
     containers:
     - args:
       - --fsid=41a537f2-f282-428e-989f-a9e07be32e47
       - --keyring=/etc/ceph/keyring-store/keyring
       - --log-to-stderr=true
       - --err-to-stderr=true
       - --mon-cluster-log-to-stderr=true
       - '--log-stderr-prefix=debug '
       - --default-log-to-file=false
       - --default-mon-cluster-log-to-file=false
       - --mon-host=$(ROOK_CEPH_MON_HOST)
       - --mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
       - --id=b
       - --setuser=ceph
       - --setgroup=ceph
       - --foreground
       - --public-addr=10.100.13.242
       - --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db
       - --public-bind-addr=$(ROOK_POD_IP)
       command:
       - ceph-mon
    [...]

  - Clean up the copied command and args fields to form a pastable command, as follows:

    # ceph-mon \
        --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
        --keyring=/etc/ceph/keyring-store/keyring \
        --log-to-stderr=true \
        --err-to-stderr=true \
        --mon-cluster-log-to-stderr=true \
        --log-stderr-prefix=debug \
        --default-log-to-file=false \
        --default-mon-cluster-log-to-file=false \
        --mon-host=$ROOK_CEPH_MON_HOST \
        --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
        --id=b \
        --setuser=ceph \
        --setgroup=ceph \
        --foreground \
        --public-addr=10.100.13.242 \
        --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
        --public-bind-addr=$ROOK_POD_IP

    Note: Make sure to remove the single quotes around the --log-stderr-prefix flag and the parentheses around the variables being passed (ROOK_CEPH_MON_HOST, ROOK_CEPH_MON_INITIAL_MEMBERS, and ROOK_POD_IP).

  - Patch the rook-ceph-mon-b deployment to stop this mon from running without deleting the mon pod:

    oc -n openshift-storage patch deployment rook-ceph-mon-b --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
    oc -n openshift-storage patch deployment rook-ceph-mon-b -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'
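    Before continuing, you can confirm that the patched pod has restarted and is idling on the temporary sleep command. A hedged sketch, assuming the standard deployment and container names used above:

    oc -n openshift-storage get pods | grep rook-ceph-mon-b
    oc -n openshift-storage get deployment rook-ceph-mon-b -o jsonpath='{.spec.template.spec.containers[0].command}'

    The second command should return the temporary ["sleep","infinity"] command set by the patch.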
  - Perform the following steps on the mon-b pod:
    - Connect to the pod of a healthy mon and run the following command:

      oc -n openshift-storage exec -it <mon-pod> bash

    - Set the variable:

      monmap_path=/tmp/monmap

    - Extract the monmap to a file by pasting the ceph-mon command from the good mon deployment and adding the --extract-monmap=${monmap_path} flag:

      # ceph-mon \
          --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
          --keyring=/etc/ceph/keyring-store/keyring \
          --log-to-stderr=true \
          --err-to-stderr=true \
          --mon-cluster-log-to-stderr=true \
          --log-stderr-prefix=debug \
          --default-log-to-file=false \
          --default-mon-cluster-log-to-file=false \
          --mon-host=$ROOK_CEPH_MON_HOST \
          --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
          --id=b \
          --setuser=ceph \
          --setgroup=ceph \
          --foreground \
          --public-addr=10.100.13.242 \
          --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
          --public-bind-addr=$ROOK_POD_IP \
          --extract-monmap=${monmap_path}

    - Review the contents of the monmap:

      monmaptool --print /tmp/monmap

    - Remove the bad mons from the monmap:

      monmaptool ${monmap_path} --rm <bad_mon>

      In this example, remove mon a and mon c:

      monmaptool ${monmap_path} --rm a
      monmaptool ${monmap_path} --rm c

    - Inject the modified monmap into the good mon by pasting the ceph-mon command and adding the --inject-monmap=${monmap_path} flag, as follows:

      # ceph-mon \
          --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
          --keyring=/etc/ceph/keyring-store/keyring \
          --log-to-stderr=true \
          --err-to-stderr=true \
          --mon-cluster-log-to-stderr=true \
          --log-stderr-prefix=debug \
          --default-log-to-file=false \
          --default-mon-cluster-log-to-file=false \
          --mon-host=$ROOK_CEPH_MON_HOST \
          --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
          --id=b \
          --setuser=ceph \
          --setgroup=ceph \
          --foreground \
          --public-addr=10.100.13.242 \
          --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
          --public-bind-addr=$ROOK_POD_IP \
          --inject-monmap=${monmap_path}

    - Exit the shell to continue.
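    Before running the exit step above, you can optionally confirm that the injection took effect by re-extracting the monmap to a scratch file and printing it. A minimal sketch; the scratch path is illustrative and the exact output format varies by Ceph release:

      verify_path=/tmp/monmap-verify
      # paste the same ceph-mon command as above, this time adding:
      #   --extract-monmap=${verify_path}
      monmaptool --print ${verify_path}

    Expect a single mon entry, similar to: 0: [v2:10.100.13.242:3300/0,v1:10.100.13.242:6789/0] mon.b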
- Edit the Rook configmaps.
  - Edit the configmap that the operator uses to track the mons:

    oc -n openshift-storage edit configmap rook-ceph-mon-endpoints

  - Verify that in the data element you see three mons, such as the following (or more, depending on your mon count):

    data: a=10.100.35.200:6789;b=10.100.13.242:6789;c=10.100.35.12:6789

  - Delete the bad mons from the list to end up with a single good mon. For example:

    data: b=10.100.13.242:6789

  - Save the file and exit.
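    You can confirm the result without reopening the editor. A hedged check, assuming the endpoints are stored under the data key of the configmap as shown above:

    oc -n openshift-storage get configmap rook-ceph-mon-endpoints -o jsonpath='{.data.data}'

    The expected output is the single remaining entry, for example b=10.100.13.242:6789.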
  - Adapt a Secret which is used for the mons and other components.
    - Set a value for the variable good_mon_id. For example:

      good_mon_id=b

    - Use the oc patch command to patch the rook-ceph-config secret and update the two key/value pairs mon_host and mon_initial_members:

      mon_host=$(oc -n openshift-storage get svc rook-ceph-mon-b -o jsonpath='{.spec.clusterIP}')
      oc -n openshift-storage patch secret rook-ceph-config -p '{"stringData": {"mon_host": "[v2:'"${mon_host}"':3300,v1:'"${mon_host}"':6789]", "mon_initial_members": "'"${good_mon_id}"'"}}'

      Note: If you are using hostNetwork: true, replace the mon_host variable with the node IP the mon is pinned to (nodeSelector), because no rook-ceph-mon-* service is created in that mode.
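      To double-check the patched values, you can read them back from the secret. A minimal sketch, assuming the values are stored base64-encoded under .data, as is usual for Kubernetes secrets:

      oc -n openshift-storage get secret rook-ceph-config -o jsonpath='{.data.mon_initial_members}' | base64 -d
      oc -n openshift-storage get secret rook-ceph-config -o jsonpath='{.data.mon_host}' | base64 -d

      The output should show only the good mon ID and its endpoints.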
- Restart the mon. You need to restart the good mon pod with the original ceph-mon command to pick up the changes.
  - Use the oc replace command on the backup of the mon deployment YAML file:

    oc replace --force -f rook-ceph-mon-b-deployment.yaml

    Note: The --force option deletes the deployment and creates a new one.

  - Verify the status of the cluster. The status should show one mon in quorum. If the status looks good, your cluster should be healthy again.
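    One way to check the quorum status is from the toolbox, assuming the rook-ceph-tools pod is deployed in your cluster; a sketch, since the label and namespace may differ in your environment:

    TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
    oc -n openshift-storage exec -it ${TOOLS_POD} -- ceph -s

    The mon line of the output should report a single daemon, for example "mon: 1 daemons, quorum b".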
- Delete the two mon deployments that are no longer expected to be in quorum. For example:

  oc delete deploy <rook-ceph-mon-1>
  oc delete deploy <rook-ceph-mon-2>

  In this example, the deployments to be deleted are rook-ceph-mon-a and rook-ceph-mon-c.

- Restart the operator.
  - Start the rook operator again to resume monitoring the health of the cluster:

    oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1

    Note: It is safe to ignore errors stating that a number of resources already exist. The operator automatically adds more mons to increase the quorum size again, depending on the mon count.
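After the operator scales the mons back up, you can watch the new mon deployments and pods come online. A hedged example, assuming Rook applies its usual app=rook-ceph-mon label:

  oc -n openshift-storage get deployments -l app=rook-ceph-mon
  oc -n openshift-storage get pods -l app=rook-ceph-mon

Once all mon pods are Running, re-check the quorum from the toolbox with ceph -s; it should again report the full mon count in quorum.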