IBM Support

OpenShift Container Platform (OCP) upgrade from v4.14 to v4.16 fails

Troubleshooting


Problem

During an upgrade of an OpenShift Container Platform (OCP) cluster from version 4.14 to 4.16, several issues were encountered. At first, the cluster had pending certificate signing requests (CSRs), a degraded master MachineConfigPool (MCP), and a config operator pod in a crash loop. After these issues were resolved, two control plane pods on one of the master nodes (etcd and openshift-kube-scheduler) remained in an 'Init' state, which prevented the upgrade from proceeding. The etcd pod failure was related to certificate expiration and initialization failures.
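Certificate expiration is one of the causes named above. To check whether a given certificate has expired, you can read its end date with `openssl x509 -noout -enddate -in <cert-file>`. That command prints a line such as `notAfter=Jan  5 09:34:43 2018 GMT`. The following is a minimal sketch that parses the line with Python's standard `ssl.cert_time_to_seconds` and reports the days remaining. The function names are hypothetical, and this technote does not say which certificate files on the node should be checked.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Days until the certificate expires; negative if it already has.

    `enddate_line` is the output of `openssl x509 -noout -enddate`,
    for example "notAfter=Jan  5 09:34:43 2018 GMT".
    """
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {enddate_line!r}")
    # ssl.cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" form
    # and returns seconds since the Unix epoch (UTC).
    expires_at = ssl.cert_time_to_seconds(line[len(prefix):])
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def is_expired(enddate_line: str, now: Optional[float] = None) -> bool:
    """True if the certificate's notAfter time is at or before `now`."""
    return days_until_expiry(enddate_line, now) <= 0
```

The optional `now` argument keeps the check deterministic when you compare against a fixed point in time, such as the timestamp of a must-gather.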

Symptom

The etcd pod (etcd-nonprod1ocpmaster2.tchtest.org) on master2 is in 'Init:0/3' status and does not start. The openshift-kube-scheduler pod on the same node (openshift-kube-scheduler-nonprod1ocpmaster2.tchtest.org) is in 'Init:0/1' status. The issue appears to be related to certificate expiration and prevents both pods from becoming ready. It is similar to a known issue documented in Red Hat solution 7124179.
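A status such as 'Init:0/3' means that none of the pod's three init containers has completed yet. The following sketch approximates how `oc get pods` builds that STATUS value from a pod's `spec.initContainers` and `status.initContainerStatuses`. You can apply it to pod JSON from a live cluster or from a must-gather. It is a simplified approximation, not the exact oc/kubectl printer logic: it ignores signals and restartable (sidecar) init containers. The function names are hypothetical.

```python
from typing import Any, Dict, Optional


def init_status(pod: Dict[str, Any]) -> Optional[str]:
    """Approximate the 'Init:...' value that `oc get pods` shows for a pod.

    Returns None once every init container has completed successfully.
    Simplified: ignores signals and restartable (sidecar) init containers.
    """
    total = len(pod.get("spec", {}).get("initContainers", []))
    statuses = (pod.get("status") or {}).get("initContainerStatuses") or []
    completed = 0
    for cs in statuses:
        state = cs.get("state") or {}
        terminated = state.get("terminated")
        if terminated is not None:
            code = terminated.get("exitCode", 0)
            if code == 0:
                completed += 1  # this init container finished successfully
                continue
            reason = terminated.get("reason")
            return f"Init:{reason}" if reason else f"Init:ExitCode:{code}"
        reason = (state.get("waiting") or {}).get("reason")
        if reason and reason != "PodInitializing":
            return f"Init:{reason}"  # e.g. Init:CrashLoopBackOff
        break  # running or about to start: later init containers have not run
    if completed < total:
        return f"Init:{completed}/{total}"
    return None


def pods_in_init(pod_list: Dict[str, Any]) -> Dict[str, str]:
    """Map pod name -> init status for every pod still initializing."""
    stuck = {}
    for pod in pod_list.get("items", []):
        status = init_status(pod)
        if status is not None:
            stuck[pod["metadata"]["name"]] = status
    return stuck
```

Because init containers run one at a time, the number before the slash is the index of the init container the pod is stuck on. That index tells you which init container's logs to inspect first.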

Cause

The upgrade was blocked by several problems in the customer's OCP cluster: pending CSRs, a degraded master MCP, crashlooping pods, and, for the etcd pod on master2, certificate expiration. All of these had to be resolved before the upgrade could proceed.
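Pending CSRs and a degraded MCP can both be detected in JSON output, either from `oc get csr -o json` and `oc get mcp -o json` or from equivalent output collected with `omc` from a must-gather. The following is a minimal sketch under these assumptions:

- A CSR is pending when its `status` has no conditions (neither Approved nor Denied).
- An MCP is degraded when its `Degraded` condition is `True`.

The sketch only builds `oc adm certificate approve` commands for you to review; it does not run them. Approve only CSRs you expect, such as those from cluster nodes. The function names are hypothetical.

```python
import shlex
from typing import Any, Dict, List


def pending_csrs(csr_list: Dict[str, Any]) -> List[str]:
    """Names of CSRs that have no Approved/Denied condition yet."""
    names = []
    for item in csr_list.get("items", []):
        conditions = (item.get("status") or {}).get("conditions") or []
        if not conditions:
            names.append(item["metadata"]["name"])
    return names


def approve_commands(names: List[str]) -> List[str]:
    """Build `oc adm certificate approve` commands to review before running."""
    return [f"oc adm certificate approve {shlex.quote(n)}" for n in names]


def degraded_pools(mcp_list: Dict[str, Any]) -> List[str]:
    """Names of MachineConfigPools whose Degraded condition is True."""
    degraded = []
    for item in mcp_list.get("items", []):
        for cond in (item.get("status") or {}).get("conditions") or []:
            if cond.get("type") == "Degraded" and cond.get("status") == "True":
                degraded.append(item["metadata"]["name"])
                break
    return degraded
```

The sketch keeps detection separate from approval. That way, the list of CSRs can be checked by a person before any command is executed against the cluster.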

Diagnosing The Problem

  1. Reviewed the initial must-gather to identify pre-upgrade issues.
  2. Identified pending CSRs and provided the command to approve them.
  3. Resolved the degraded master MCP by investigating and fixing its underlying causes.
  4. Addressed the crashlooping config operator pod by applying the relevant fixes.
  5. Verified cluster health in subsequent must-gathers to confirm readiness for the upgrade.
  6. Provided guidance on backing up etcd before proceeding with the upgrade.
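For the etcd backup in step 6, the OCP documentation describes this procedure:

1. Open a debug session on a control plane node with `oc debug node/<node_name>`.
2. Run `chroot /host`.
3. Run `/usr/local/bin/cluster-backup.sh /home/core/assets/backup`.

The script writes two files: `snapshot_<timestamp>.db` and `static_kuberesources_<timestamp>.tar.gz`. A quick sanity check before upgrading is to confirm that both files exist for the same timestamp. The following sketch performs that check on a list of file names. The function name, and the choice to report the newest complete pair, are illustrative assumptions.

```python
import re
from typing import Iterable, Optional, Set

# File names written by cluster-backup.sh, e.g. snapshot_2021-06-25_190035.db
SNAPSHOT_RE = re.compile(r"^snapshot_(?P<ts>.+)\.db$")
RESOURCES_RE = re.compile(r"^static_kuberesources_(?P<ts>.+)\.tar\.gz$")


def latest_complete_backup(filenames: Iterable[str]) -> Optional[str]:
    """Return the newest timestamp that has both backup files, else None."""
    snapshots: Set[str] = set()
    resources: Set[str] = set()
    for name in filenames:
        match = SNAPSHOT_RE.match(name)
        if match:
            snapshots.add(match.group("ts"))
            continue
        match = RESOURCES_RE.match(name)
        if match:
            resources.add(match.group("ts"))
    complete = snapshots & resources
    # YYYY-MM-DD_HHMMSS timestamps sort chronologically as plain strings.
    return max(complete) if complete else None
```

You can pass the function the result of `os.listdir()` on the backup directory, or on a copy of it. A `None` result means no complete backup pair was found.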

Resolving The Problem

The OCP cluster upgrade from v4.14 to v4.16 succeeded after the issues identified during the proactive assessment were resolved. The steps included:
  1. Reviewing cluster health with 'omc' commands against the must-gather data.
  2. Identifying and resolving issues on the master nodes, including applying the Red Hat solution at https://access.redhat.com/solutions/6990188 to one of the nodes.
  3. Confirming that all 11 nodes were in a 'Ready' state and that all Cluster Operators (COs) were available and not degraded.
  4. Verifying that the ClusterVersion, CSRs, ClusterServiceVersions (CSVs), PVs, PVCs, MCPs, StatefulSets, and pods were in the expected state.
  5. Proceeding with the upgrade on the scheduled date after validation.
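Steps 3 and 4 mostly involve checking status fields across several resource types. The following is a minimal sketch that applies such checks to JSON lists, for example from `oc get nodes -o json`, `oc get clusteroperators -o json`, and `oc get pvc,csv,statefulset -A -o json`, or from equivalent `omc` output. Each resource passes under these conditions:

- **Node:** its `Ready` condition is `True`.
- **ClusterOperator:** `Available` is `True`, and `Progressing` and `Degraded` are `False`.
- **PVC:** its phase is `Bound`.
- **CSV:** its phase is `Succeeded`.
- **StatefulSet:** its `readyReplicas` reaches `spec.replicas`.

The default of 11 nodes comes from this case. The function name and the exact pass criteria are assumptions, and an operator that is briefly `Progressing` may still be acceptable.

```python
from typing import Any, Dict, List

Obj = Dict[str, Any]


def _conditions(item: Obj) -> Dict[str, str]:
    """Map each status condition type to its status ("True"/"False"/"Unknown")."""
    return {c["type"]: c["status"]
            for c in (item.get("status") or {}).get("conditions") or []}


def _name(item: Obj) -> str:
    """Return namespace/name for namespaced objects, else just the name."""
    meta = item["metadata"]
    ns = meta.get("namespace")
    return f"{ns}/{meta['name']}" if ns else meta["name"]


def health_problems(nodes: Obj, operators: Obj, pvcs: Obj, csvs: Obj,
                    statefulsets: Obj, expected_nodes: int = 11) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    node_items = nodes.get("items", [])
    if len(node_items) != expected_nodes:
        problems.append(f"expected {expected_nodes} nodes, found {len(node_items)}")
    for node in node_items:
        if _conditions(node).get("Ready") != "True":
            problems.append(f"node {_name(node)} is not Ready")
    for co in operators.get("items", []):
        c = _conditions(co)
        if (c.get("Available"), c.get("Progressing"), c.get("Degraded")) != ("True", "False", "False"):
            problems.append(f"clusteroperator {_name(co)} is not healthy")
    for pvc in pvcs.get("items", []):
        if (pvc.get("status") or {}).get("phase") != "Bound":
            problems.append(f"pvc {_name(pvc)} is not Bound")
    for csv in csvs.get("items", []):
        if (csv.get("status") or {}).get("phase") != "Succeeded":
            problems.append(f"csv {_name(csv)} is not Succeeded")
    for sts in statefulsets.get("items", []):
        want = (sts.get("spec") or {}).get("replicas", 1)
        ready = (sts.get("status") or {}).get("readyReplicas", 0)
        if ready < want:
            problems.append(f"statefulset {_name(sts)} has {ready}/{want} ready")
    return problems
```

Load each input with `json.load()` from a saved file. Proceed only when the returned list is empty or every remaining entry is understood.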

Document Location

Worldwide

Product:
RH OPENSHIFT (SSDE6LV); Cloud Pak RHOCP COC (SSR5HY)

Business unit:
IBM Software (Red Hat line of business)

Platform:
Platform Independent

Case number:
TS019602066

Document Information

Modified date:
18 December 2025

UID

ibm17254870