CephMonVersionMismatch
This alert typically triggers during an upgrade that is taking a long time, while the Ceph Monitor daemons are still running different versions.
Impact: Medium
Diagnosis
Check the ocs-operator subscription status and the operator pod health to determine whether an operator upgrade is in progress.
- Check the ocs-operator subscription health:
SUB=$(oc get sub -n openshift-storage -o custom-columns=NAME:.metadata.name --no-headers | grep ocs-operator)
oc get sub ${SUB} -n openshift-storage -o json | jq .status.conditions
The status condition types are CatalogSourcesUnhealthy, InstallPlanMissing, InstallPlanPending, and InstallPlanFailed. The status for each type should be False.
Example output:
[
  {
    "lastTransitionTime": "2021-01-26T19:21:37Z",
    "message": "all available catalogsources are healthy",
    "reason": "AllCatalogSourcesHealthy",
    "status": "False",
    "type": "CatalogSourcesUnhealthy"
  }
]
The example output shows a False status for type CatalogSourcesUnhealthy, which means that the catalog sources are healthy.
- Check the OCS operator pod status to see whether an OCS operator upgrade is in progress:
oc get pod -n openshift-storage | grep ocs-operator
OCSOP=$(oc get pod -n openshift-storage -o custom-columns=POD:.metadata.name --no-headers | grep ocs-operator)
echo $OCSOP
oc get pod/${OCSOP} -n openshift-storage
oc describe pod/${OCSOP} -n openshift-storage
If you determine that an ocs-operator upgrade is in progress, wait for 5 minutes; the alert should then resolve itself. If the alert persists after you have waited, or you see a different error status condition, continue troubleshooting.
Mitigation
- (Optional) Debugging log information
- Run the following command to gather the debugging information for the Ceph cluster:
oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:v4.6
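If you need to hand the must-gather output to support, the command can be wrapped so that the data lands in a predictable local directory and is then compressed. This is a sketch under stated assumptions: the `v4.6` image tag must match your installed OCS version, and the function name `gather_ceph_debug` and the default directory name are hypothetical. `--dest-dir` is the `oc adm must-gather` option that sets the local directory the gathered data is written to.

```shell
#!/usr/bin/env bash
# Sketch: run the OCS must-gather image into a local directory and compress
# the result. Assumptions: the image tag matches your OCS version; the
# function name and default directory name are hypothetical.

# Usage: gather_ceph_debug [IMAGE_TAG] [DEST_DIR]
# Prints the path of the created .tar.gz archive on success.
gather_ceph_debug() {
  local version=${1:-v4.6}
  local dest=${2:-ocs-must-gather-$(date +%Y%m%d-%H%M%S)}
  mkdir -p "$dest" || return 1
  # --dest-dir tells must-gather where to write the collected data locally.
  oc adm must-gather \
    --image="registry.redhat.io/ocs4/ocs-must-gather-rhel8:${version}" \
    --dest-dir="$dest" || return 1
  # Compress the gathered directory so it can be attached to a support case.
  tar -czf "${dest}.tar.gz" "$dest" && echo "${dest}.tar.gz"
}
```

For example, `gather_ceph_debug v4.6 ocs-debug` writes the data to `ocs-debug/` and prints `ocs-debug.tar.gz`.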