A disk device is not accessible on one of the hosts, and the Ceph cluster has marked its corresponding Object Storage Device (OSD) out. This alert is raised when a Ceph node fails to recover within 10 minutes.

Impact: High


Determine the failed node
  1. List the worker nodes and check the status of each:
    oc get nodes --selector='node-role.kubernetes.io/worker','!node-role.kubernetes.io/infra'
  2. Describe the node that is in the NotReady state to get more information about the failure:
    oc describe node <node_name>
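
On clusters with many workers, the two steps above can be combined. The following is a minimal sketch, not part of the official procedure: not_ready_nodes is a hypothetical helper name, and it assumes the default "oc get nodes" table layout, with the node name in the first column and the status in the second.

```shell
# Hypothetical helper: reads "oc get nodes" table output on stdin and
# prints the name of every node whose STATUS column starts with NotReady
# (this also matches values such as "NotReady,SchedulingDisabled").
# The header row is skipped naturally because "STATUS" does not match.
not_ready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}
```

With the helper defined in the current shell, each failed worker can be described in turn:

    oc get nodes --selector='node-role.kubernetes.io/worker','!node-role.kubernetes.io/infra' | not_ready_nodes | while read -r node; do oc describe node "$node"; done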