CephDataRecoveryTakingTooLong
Data recovery is slow. Check whether all the Object Storage Devices (OSDs) are up and running.
Impact: High
Diagnosis
- pod status: pending
  - Check for resource issues, pending Persistent Volume Claims (PVCs), node assignment, and kubelet problems, using the following commands:
    - oc project openshift-storage
    - oc get pod | grep rook-ceph-osd
  - Examine the output for a rook-ceph-osd pod that is in the pending state, not running, or not ready. Set MYPOD as the variable for that problem pod, specifying its name for <pod_name>:
    - MYPOD=<pod_name>
  - Look for resource limitations or pending PVCs. Otherwise, check the node assignment, using the oc get pod/${MYPOD} -o wide command. A consolidated sketch of these commands follows this step.
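  The same checks as a minimal consolidated sketch; <pod_name> is a placeholder for the OSD pod identified in the grep output:
    # switch to the storage namespace and list the OSD pods
    oc project openshift-storage
    oc get pod | grep rook-ceph-osd
    # point MYPOD at the pod that is pending, not running, or not ready
    MYPOD=<pod_name>
    # show the pod status and the node it was assigned to
    oc get pod/${MYPOD} -o wide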
- pod status: NOT pending, running, but NOT ready
  - Check the readiness probe, using the oc describe pod/${MYPOD} command (see the sketch after this step).
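  A minimal sketch for inspecting the readiness probe; the grep filter is an assumption about the usual field names in the describe output and may need adjusting:
    # full pod description, including probe configuration and recent events
    oc describe pod/${MYPOD}
    # assumption: narrow the output to readiness probe settings and failures
    oc describe pod/${MYPOD} | grep -i -A2 readiness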
- pod status: NOT pending, but NOT running
  - Check for application or image issues, using the oc logs pod/${MYPOD} command.

Important: If a node was assigned, check the kubelet on the node. A sketch of both checks follows this note.
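A minimal sketch for both checks; the --previous flag and the oc adm node-logs invocation are assumptions about the standard OpenShift CLI, so verify them against your oc version:
    # container logs for the problem pod
    oc logs pod/${MYPOD}
    # assumption: logs from the previous container instance, useful after a crash loop
    oc logs pod/${MYPOD} --previous
    # assumption: kubelet logs on the node the pod was assigned to
    oc adm node-logs <node_name> -u kubelet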
Mitigation
- (Optional) Debugging log information
  - Run the following command to gather the debugging information for the Ceph cluster:
    - oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:v4.6
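  Optionally, the gathered data can be written to a specific local directory, for example to attach it to a support case; --dest-dir is a standard oc adm must-gather option, but verify it against your oc version:
    # assumption: collect the Ceph debugging data into <local_directory>
    oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:v4.6 --dest-dir=<local_directory>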