Upgrade from 10.0.5.x to 10.0.8.x Blocked Due to Residual Crunchy pgcluster

Troubleshooting


Problem

During an upgrade from version 10.0.5.x to 10.0.8.x, the process failed repeatedly because the preupgrade job was stuck. The associated pod logged errors indicating that it was unable to locate the Postgres leader. This preupgrade pod must complete successfully for the upgrade to proceed.

Upon reviewing the preupgrade pod logs, the following error was observed (formatted here for readability):

{
  "level": "error",
  "ts": "<timestamp>",
  "logger": "controllers.ManagementCluster.Reconcile.reconcileCrunchy2dcdrCleanup",
  "msg": "error from CheckCrunchyPostgresPrimaryHealth",
  "mgmt-instance": {
    "name": "apicluster-mgmt",
    "namespace": "apic"
  },
  "error": "more than 1 role=master pods found in running state, bootstrap might be running, cannot proceed further",
  "stacktrace": "github.ibm.com/velox/apiconnect-operator/ibm-apiconnect/internal/controller/management.apiconnect"
}
This indicates that multiple role=master pods were detected, which prevents the upgrade from continuing. This situation typically arises when a leftover Crunchy pgcluster resource is present from a previous deployment.
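You can confirm the condition by counting the running pods that carry the role=master label (the Patroni label used by Crunchy PostgreSQL; the exact selector can vary by deployment and operator version, so treat it as an assumption). A minimal sketch, using sample output in place of a live `kubectl get pods -A -l role=master --no-headers` call so the counting logic is self-contained:

```shell
#!/bin/sh
# Sketch: detect more than one running role=master pod.
# In a live cluster, the pod list would come from:
#   kubectl get pods -A -l role=master --no-headers
# The pod names below are hypothetical sample data.
pods=$(cat <<'EOF'
apic  old-crunchy-pgcluster-abc123  1/1  Running  0  30d
apic  apicluster-mgmt-edb-1         1/1  Running  0  2h
EOF
)

# Count pods reported as Running.
count=$(printf '%s\n' "$pods" | grep -c 'Running')
echo "role=master pods running: $count"

if [ "$count" -gt 1 ]; then
  echo "More than one role=master pod found: the preupgrade check will fail"
fi
```

With the sample data above, two running role=master pods are found, reproducing the condition that blocks the preupgrade job.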

Resolving The Problem

This is a known issue. To resolve it:

  1. Ensure you have a backup of your API Connect environment. This should have been taken before starting the upgrade process to avoid any risk of data loss.

  2. Ensure the EDB cluster is up and running before attempting to delete the pgcluster resource.
    Use the following command to check the cluster status:

    kubectl get cluster

  3. To inspect the health of the EDB cluster and verify that the database pods are running, use:

    kubectl get cluster -o yaml

    In the YAML output, check the status section to confirm that the database is operational: the cluster phase should report healthy and the number of ready instances should match the expected instance count.
    If the EDB cluster is not healthy, or its pods are not up and ready, do not proceed with the remaining steps.

  4. Identify the leftover pgcluster resource:

    kubectl get pgcluster

  5. Manually delete the stale pgcluster:

    kubectl delete pgcluster <pgcluster-name>
    Replace <pgcluster-name> with the actual name of the leftover resource.
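The health gate in steps 2 and 3 can be sketched as a script. The field names below (phase, instances, readyInstances) follow the EDB/CloudNativePG Cluster CRD status and should be verified against your operator version; a sample status block stands in for the live `kubectl get cluster -o yaml` output so the parsing logic is self-contained:

```shell
#!/bin/sh
# Sketch: verify EDB cluster health before deleting the stale pgcluster.
# Assumption: status.phase / status.readyInstances as exposed by the
# EDB/CloudNativePG Cluster CRD. In a live environment, the YAML would
# come from:
#   kubectl get cluster -o yaml
status_yaml=$(cat <<'EOF'
status:
  instances: 3
  readyInstances: 3
  phase: Cluster in healthy state
EOF
)

phase=$(printf '%s\n' "$status_yaml" | sed -n 's/^  phase: //p')
ready=$(printf '%s\n' "$status_yaml" | sed -n 's/^  readyInstances: //p')
total=$(printf '%s\n' "$status_yaml" | sed -n 's/^  instances: //p')

if [ "$phase" = "Cluster in healthy state" ] && [ "$ready" = "$total" ]; then
  echo "EDB cluster healthy ($ready/$total instances ready)"
  # Only at this point is it safe to run:
  #   kubectl delete pgcluster <pgcluster-name>
else
  echo "EDB cluster not healthy; do not delete the pgcluster" >&2
  exit 1
fi
```

The script only reaches the delete step when the phase reports healthy and all instances are ready, mirroring the manual checks above.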

Once the stale pgcluster is removed, the preupgrade pod should be able to complete successfully, allowing the upgrade to proceed.

Document Location

Worldwide


Document Information

Modified date:
26 August 2025

UID

ibm17243142