CephMgrIsAbsent

Not having a Ceph manager running impacts the monitoring of the cluster, as well as Persistent Volume Claim (PVC) creation and deletion requests. Resolve the issue as soon as possible.

Impact: High

Diagnosis

Verify that the rook-ceph-mgr pod is failing, and restart it if necessary. If the Ceph mgr pod restart fails, follow the general pod troubleshooting steps to resolve the issue.
  1. Verify that the Ceph mgr pod is failing, using the oc get pods | grep mgr command.
  2. Describe the Ceph mgr pod for more details, using the oc describe pods/<pod_name> command, where <pod_name> specifies the rook-ceph-mgr pod name from the previous step.
  3. Analyze the errors related to resource issues.
  4. Delete the pod, then confirm that a replacement pod starts, using the oc get pods | grep mgr command.
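The four steps above can be sketched as a pair of shell functions. This is a minimal sketch under stated assumptions, not a prescribed procedure: it assumes the pod runs in the openshift-storage namespace (the project used later in this document), and the function names find_mgr_pod and restart_mgr_pod are hypothetical.

```shell
# Sketch only: locate the rook-ceph-mgr pod, show its details, then delete it
# so that its controller recreates it.
# Assumption: the pod lives in the openshift-storage namespace.
NAMESPACE="${NAMESPACE:-openshift-storage}"

# Print the name of the first pod whose name contains "rook-ceph-mgr".
find_mgr_pod() {
  oc get pods -n "$NAMESPACE" --no-headers | awk '/rook-ceph-mgr/ {print $1; exit}'
}

# Hypothetical helper covering steps 2 to 4.
restart_mgr_pod() {
  mgr_pod="$(find_mgr_pod)"
  if [ -z "$mgr_pod" ]; then
    echo "no rook-ceph-mgr pod found in $NAMESPACE" >&2
    return 1
  fi
  oc describe "pods/$mgr_pod" -n "$NAMESPACE"   # step 2: check Events for resource errors
  oc delete pod "$mgr_pod" -n "$NAMESPACE"      # step 4: delete the failing pod
  oc get pods -n "$NAMESPACE" | grep mgr        # confirm that a replacement pod appears
}
```

Calling restart_mgr_pod after sourcing the file runs the sequence once. Review the describe output before relying on the delete step, since a restart does not help if the pod is failing for lack of node resources.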
Use the following steps for general pod troubleshooting:
pod status: pending
  1. Check for resource issues, pending Persistent Volume Claims (PVCs), node assignment, and kubelet problems, using the following commands:
    • oc project openshift-storage
    • oc get pod | grep rook-ceph-mgr
    Examine the output for a rook-ceph-mgr pod that is in the pending state, not running, or not ready.
  2. Set MYPOD as the variable for the pod that is identified as the problem pod, replacing <pod_name> with the name of that pod:
    MYPOD=<pod_name>
  3. Look for the resource limitations or pending PVCs. Otherwise, check for the node assignment, using the oc get pod/${MYPOD} -o wide command.
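For a pod that is stuck in pending, the checks above can be collected into one helper. This is a minimal sketch: the function name show_pending_details is hypothetical, and the events and PVC queries are one assumed way to surface resource limitations and unbound claims, not commands that the procedure itself prescribes.

```shell
# Sketch only: gather the details that usually explain a pending pod.
# Usage: show_pending_details <pod_name>
show_pending_details() {
  MYPOD="$1"
  if [ -z "$MYPOD" ]; then
    echo "usage: show_pending_details <pod_name>" >&2
    return 2
  fi
  # Node assignment: the NODE column shows <none> while the pod is unscheduled.
  oc get "pod/${MYPOD}" -n openshift-storage -o wide
  # Events for this pod, for example FailedScheduling on insufficient CPU or memory.
  oc get events -n openshift-storage --field-selector "involvedObject.name=${MYPOD}"
  # PVCs in the namespace; a claim that stays Pending can block a pod that mounts it.
  oc get pvc -n openshift-storage
}
```

For example, show_pending_details "${MYPOD}" prints the node assignment, the related events, and the PVC list in one pass.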
pod status: NOT pending, running, but NOT ready
Check the readiness probe, using the oc describe pod/${MYPOD} command.
pod status: NOT pending, but NOT running
Check for application or image issues, using the oc logs pod/${MYPOD} command.
Important: If a node was assigned, check the kubelet on the node.
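The three pod status cases above amount to a small decision table. The sketch below expresses it as a shell function that prints the command to run next. The function names next_check and pod_next_check are hypothetical, and reading readiness from the first container status is an assumption that fits a single-container pod.

```shell
# Sketch only: map a pod's phase and readiness to the next diagnostic command.
# Usage: next_check <phase> <ready: true|false> <pod_name>
next_check() {
  phase="$1"; ready="$2"; pod="$3"
  if [ "$phase" = "Pending" ]; then
    # pod status: pending
    echo "oc get pod/${pod} -o wide"
  elif [ "$phase" = "Running" ] && [ "$ready" != "true" ]; then
    # pod status: NOT pending, running, but NOT ready
    echo "oc describe pod/${pod}"
  elif [ "$phase" != "Running" ]; then
    # pod status: NOT pending, but NOT running
    echo "oc logs pod/${pod}"
  else
    echo "pod ${pod} is running and ready"
  fi
}

# Hypothetical wrapper: read the live phase and readiness, then suggest a command.
# Assumption: the first container status reflects the pod's readiness.
pod_next_check() {
  p="$(oc get pod "$1" -o jsonpath='{.status.phase}')"
  r="$(oc get pod "$1" -o jsonpath='{.status.containerStatuses[0].ready}')"
  next_check "$p" "$r" "$1"
}
```

With MYPOD set as described earlier, pod_next_check "${MYPOD}" prints the command that matches the pod's current state.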

Mitigation

(Optional) Debugging log information
Run the following command to gather the debugging information for the Ceph cluster:
oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:v4.6