CephMgrIsAbsent
Not having a Ceph manager running disrupts the monitoring of the cluster. Persistent Volume Claim (PVC) creation and deletion requests should be resolved as soon as possible.
Impact: High
Diagnosis
Verify that the rook-ceph-mgr pod is failing, and restart if necessary. If the Ceph mgr pod restart fails, follow the general pod troubleshooting steps to resolve the issue.
- Verify that the Ceph mgr pod is failing, using the oc get pods | grep mgr command.
- Describe the Ceph mgr pod for more details, using the oc describe pods/<pod_name> command, where <pod_name> specifies the rook-ceph-mgr pod name from the previous step.
- Analyze the errors related to resource issues.
- Delete the pod, and wait for the pod to restart. Monitor the restart using the oc get pods | grep mgr command.
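The checks above can be partly scripted. The following is a minimal sketch, not an official tool: the function name failing_mgr_pods is hypothetical, and it assumes the default oc get pods column layout (NAME, READY, STATUS, RESTARTS, AGE) and that the pods run in the openshift-storage namespace.

```shell
# Hypothetical helper: reads `oc get pods` output on stdin and prints the
# names of rook-ceph-mgr pods that are not Running or not fully ready.
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
failing_mgr_pods() {
  awk '$1 ~ /^rook-ceph-mgr/ {
    split($2, ready, "/")   # READY column has the form ready/total
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Example usage against a live cluster (not run here):
#   oc get pods -n openshift-storage --no-headers | failing_mgr_pods
#   oc describe pods/<pod_name>
#   oc delete pod <pod_name>    # then wait for the pod to restart
```

Filtering on both the STATUS and READY columns catches a pod that is reported as Running but has containers that are not ready, which a plain grep mgr would not flag.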
Use the following steps for general pod troubleshooting:
- pod status: pending
  - Check for resource issues, pending Persistent Volume Claims (PVCs), node assignment, and kubelet problems, using the following commands:
    - oc project openshift-storage
    - oc get pod | grep rook-ceph-mgr
  - Examine the output for a rook-ceph-mgr pod that is in the pending state, not running, or not ready.
  - Set MYPOD as the variable for the problem pod, specifying the name of that pod for <pod_name>:
    MYPOD=<pod_name>
  - Look for resource limitations or pending PVCs. Otherwise, check the node assignment, using the oc get pod/${MYPOD} -o wide command.
- pod status: NOT pending, running, but NOT ready
  - Check the readiness probe, using the oc describe pod/${MYPOD} command.
- pod status: NOT pending, but NOT running
  - Check for application or image issues, using the oc logs pod/${MYPOD} command.
  Important: If a node was assigned, check the kubelet on the node.
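The three status branches above amount to a small decision table. As a rough sketch under stated assumptions, a hypothetical helper (next_step is an illustrative name, not an oc subcommand) could map a pod's phase and container readiness to the command that the steps suggest. It assumes the phase and readiness values are read with oc get and a jsonpath output, and that the pod phase is an acceptable stand-in for the STATUS column, which is more detailed.

```shell
# Hypothetical helper: prints the command suggested by the steps above for
# a given pod name, pod phase, and container readiness string.
next_step() {
  pod=$1
  phase=$2
  ready=$3
  if [ "$phase" = "Pending" ]; then
    # Pending: look at resources, PVCs, and node assignment.
    echo "oc get pod/$pod -o wide"
  elif [ "$phase" != "Running" ]; then
    # Not pending and not running: look at application or image issues.
    echo "oc logs pod/$pod"
  else
    case "$ready" in
      # Running but at least one container not ready: check the probe.
      *false*|"") echo "oc describe pod/$pod" ;;
      *)          echo "pod is running and ready" ;;
    esac
  fi
}

# Example usage against a live cluster (not run here), assuming MYPOD is set:
#   phase=$(oc get pod/${MYPOD} -o jsonpath='{.status.phase}')
#   ready=$(oc get pod/${MYPOD} -o jsonpath='{.status.containerStatuses[*].ready}')
#   next_step "$MYPOD" "$phase" "$ready"
```

The helper only prints the suggested command rather than running it, so the output can be reviewed before anything is executed against the cluster.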
Mitigation
- (Optional) Debugging log information
  - Run the following command to gather the debugging information for the Ceph cluster:
    oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:v4.6