Troubleshooting
Problem
After upgrading from a version such as 4.8.42 to a later version, the cluster operators report that they are at the latest version, but the "machine-config" and "monitoring" cluster operators have the "degraded" status.
Cause
The Machine Config Operator component that manages each individual node is the Machine Config Daemon, which runs as a daemonset in the openshift-machine-config-operator namespace. If the system state differs in any way from what the daemon expects, it marks the machineconfigpool as Degraded and reflects that state in the machineconfiguration.openshift.io/state node annotation. It also stops taking any further action, to avoid breaking anything.
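The annotation described above can be read directly from the node objects. The following is a minimal sketch, not an official procedure: the function names `list_node_states` and `filter_degraded` are hypothetical helpers, and the demo input at the end is made-up sample data that reuses the example host names from this document.

```shell
#!/bin/sh
# Hypothetical helper: print "<node name><TAB><state>" for every node,
# reading the machineconfiguration.openshift.io/state annotation.
# Requires a logged-in oc client; it is defined here but not called.
list_node_states() {
    oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.machineconfiguration\.openshift\.io/state}{"\n"}{end}'
}

# Hypothetical helper: keep only the names of nodes whose state is Degraded.
filter_degraded() {
    awk -F '\t' '$2 == "Degraded" { print $1 }'
}

# Demo on made-up sample data (no cluster access needed).
printf 'worker-1.ocp.example.net\tDone\nworker-2.ocp.example.net\tDegraded\n' | filter_degraded
```

On a live cluster the two helpers would be combined as `list_node_states | filter_degraded`.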
Diagnosing The Problem
- Check whether any nodes are degraded. In the node listing, a degraded node can show a different version from the other nodes:
$ oc get node
NAME                       STATUS   ROLES    AGE   VERSION
master-0.ocp.example.net   Ready    master   34d   v1.17.1+9d33dd3
master-1.ocp.example.net   Ready    master   34d   v1.17.1+9d33dd3
master-2.ocp.example.net   Ready    master   34d   v1.17.1+9d33dd3
worker-0.ocp.example.net   Ready    worker   34d   v1.17.1+9d33dd3
worker-1.ocp.example.net   Ready    worker   34d   v1.17.1+9d33dd3
worker-2.ocp.example.net   Ready    worker   34d   v1.17.1+912792b   <-------- Degraded
- Look in the machine-config-daemon logs for an error message similar to:
Marking Degraded due to: unexpected on-disk state validating against rendered-worker-<node>: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<image>", have "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<image>"
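The expected image reference can be pulled out of a message like the one above with a small text filter. This is a sketch under stated assumptions: `extract_expected_os_image` is a hypothetical helper name, the message is assumed to arrive on a single line, and the pod name in the usage comment is a placeholder that must be replaced with the machine-config-daemon pod running on the affected node.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print the value that
# follows 'expected target osImageURL' in a "Marking Degraded" message.
extract_expected_os_image() {
    sed -n 's/.*expected target osImageURL "\([^"]*\)".*/\1/p'
}

# Demo using the sample message from this document (placeholders kept as-is).
sample='Marking Degraded due to: unexpected on-disk state validating against rendered-worker-<node>: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<image>", have "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<image>"'
printf '%s\n' "$sample" | extract_expected_os_image

# Assumed usage against a live cluster (pod name is a placeholder):
#   oc logs -n openshift-machine-config-operator <machine-config-daemon-pod> \
#       -c machine-config-daemon | extract_expected_os_image
```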
Resolving The Problem
Procedure to resolve the issue.
- Locate the correct osImageURL for that node in the machine-config-daemon logs. It looks similar to the following example:
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:328a1e57fe5281f4faa300167cdf63cfca1f28a9582aea8d6804e45f4c0522a8
- Access the failing node:
$ oc debug node/[node_name]
sh-4.4# chroot /host
- If you are using a proxy server, export any proxy variables needed.
sh-4.4# export HTTP_PROXY=myproxy.com:80
sh-4.4# export HTTPS_PROXY=myproxy.com:80
- Where ${IMAGE} is the image obtained in step 1, run the command:
sh-4.4# /run/bin/machine-config-daemon pivot "${IMAGE}"
- The output of the command in step 4 looks similar to the following example:
sh-4.4# /run/bin/machine-config-daemon pivot 'quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:47e1213c98063dfd7f5ccae41e611a25446c8ac493cfdd05d8f1c46b61ab13d4'
I1124 15:52:40.270660 36886 run.go:18] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-736585590 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:47e1213c98063dfd7f5ccae41e611a25446c8ac493cfdd05d8f1c46b61ab13d4
I1124 15:52:57.831857 36886 rpm-ostree.go:261] Running captured: rpm-ostree status --json
I1124 15:52:58.300189 36886 rpm-ostree.go:179] Previous pivot: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:12c8c2c4fb915e49e2f1a42f5761b6f8cf1ee84393d22a3fe143bdabc98c05a8
I1124 15:52:58.706233 36886 rpm-ostree.go:211] Pivoting to: 46.82.202011061621-0 (944e410d59634e95ebffd364b148a1ac4008b1d323459f37cbca97d689722366)
I1124 15:52:58.706282 36886 rpm-ostree.go:243] Executing rebase from repo path /run/mco-machine-os-content/os-content-736585590/srv/repo with customImageURL pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:47e1213c98063dfd7f5ccae41e611a25446c8ac493cfdd05d8f1c46b61ab13d4 and checksum 944e410d59634e95ebffd364b148a1ac4008b1d323459f37cbca97d689722366
I1124 15:52:58.706296 36886 rpm-ostree.go:261] Running captured: rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-736585590/srv/repo:944e410d59634e95ebffd364b148a1ac4008b1d323459f37cbca97d689722366 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:47e1213c98063dfd7f5ccae41e611a25446c8ac493cfdd05d8f1c46b61ab13d4 --custom-origin-description Managed by machine-config-operato
Note: If the rpm-ostree rebase output displays an error similar to the following, go to step 6.
error: error running rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-736585590/srv/repo:944e410d59634e95ebffd364b148a1ac4008b1d323459f37cbca97d689722366 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:47e1213c98063dfd7f5ccae41e611a25446c8ac493cfdd05d8f1c46b61ab13d4 --custom-origin-description Managed by machine-config-operator: error: No enabled repositories
- Run the rebase command again with the -C parameter added:
sh-4.4# rpm-ostree rebase -C --experimental /run/mco-machine-os-content/<path in the error> --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<SHA image> --custom-origin-description "Managed by machine-config-operator"
- Restart the node by running the command:
sh-4.4# reboot
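For step 6, the repository path and origin URL can be lifted from the error text instead of being retyped. The sketch below is one possible way to do that and is not part of the product: `build_rebase_command` is a hypothetical helper name, and it assumes the error is on a single line with the repository path as one space-free argument. It only prints the command so that it can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical helper: read the failed-rebase error on stdin and print the
# same rpm-ostree rebase command with -C added and the description quoted.
build_rebase_command() {
    sed -n 's/.*rpm-ostree rebase --experimental \([^ ]*\) --custom-origin-url \([^ ]*\) --custom-origin-description.*/rpm-ostree rebase -C --experimental \1 --custom-origin-url \2 --custom-origin-description "Managed by machine-config-operator"/p'
}

# Demo using the example error from this document.
err='error: error running rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-736585590/srv/repo:944e410d59634e95ebffd364b148a1ac4008b1d323459f37cbca97d689722366 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:47e1213c98063dfd7f5ccae41e611a25446c8ac493cfdd05d8f1c46b61ab13d4 --custom-origin-description Managed by machine-config-operator: error: No enabled repositories'
printf '%s\n' "$err" | build_rebase_command
```

Printing rather than executing keeps the decision to run the rebase with the person on the node, which matters because the next step reboots it.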
Document Location
Worldwide
Document Information
Modified date:
31 October 2022
UID
ibm16831313