Status of Postgres cluster custom resource is stuck in the Setting up primary state or no state
After you install or upgrade to IBM Cloud Pak foundational services version 4.6 or later while using Postgres cluster as a database, the status of the common-service-db Postgres cluster custom resource (CR) is stuck in the Setting up primary state or shows no state.
Symptom
The installation or upgrade process does not complete and you can observe the following symptoms:
-
There are no pods named common-service-db-* running for the common-service-db Postgres cluster CR. Run the following command to check for the common-service-db pods.
oc get pods -n <namespace-where-foundational services-are-installed> | grep common-service-db
The output is empty.
-
The common-service-db Postgres cluster CR status is stuck in the Setting up primary state. Run the following command to check the status of the common-service-db Postgres cluster CR.
oc get cluster.postgresql.k8s common-service-db -n <namespace-where-foundational services-are-installed> -o jsonpath='{.status.phase}'
You either see an empty output, or the following output, which shows the CR in the Setting up primary status:
Setting up primary
-
You get an error in the postgresql-operator-controller-manager pod logs. Run the following command to view the logs:
oc logs -n <namespace-where-foundational services-operator-are-installed> -l app.kubernetes.io/name=cloud-native-postgresql
One of the following errors is displayed in the logs:
{"level":"info","ts":"2023-10-24T16:08:35Z","msg":"Selected PVC is not ready yet, waiting for 1 second","controller":"cluster","controllerGroup":"postgresql.k8s.enterprisedb.io","controllerKind":"Cluster","Cluster":{"name":"common-service-db","namespace":"ibm-common-services"},"namespace":"ibm-common-services","name":"common-service-db","reconcileID":"b8bdc142-b7ce-470e-a46f-7387f7a02494","uuid":"96140b94-7287-11ee-8076-0a580afe0e67","pvc":"common-service-db-1","status":"initializing","instance":"common-service-db-1"}
{"level":"info","ts":"2024-12-09T19:20:26.276373532Z","msg":"refusing to create the primary instance while the latest generated serial is not zero","controller":"cluster","controllerGroup":"postgresql.k8s.enterprisedb.io","controllerKind":"Cluster","Cluster":{"name":"common-service-db","namespace":"ibm-common-services"},"namespace":"ibm-common-services","name":"common-service-db","reconcileID":"342eb40e-d895-4a86-a42b-49472ad30de6","latestGeneratedNode":1}
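The three checks above can be combined into one pass. The following is a minimal sketch, not an official IBM tool: the function names (classify_phase, matches_known_error, check_symptoms) are hypothetical helpers, and the default namespace ibm-common-services is an assumption taken from the sample log lines. Set NS and OPERATOR_NS to the namespaces used in your installation.

```shell
#!/bin/sh
# Assumption: both defaults come from the sample log output; override as needed.
NS="${NS:-ibm-common-services}"            # namespace where foundational services are installed
OPERATOR_NS="${OPERATOR_NS:-$NS}"          # namespace where the operators are installed

# Hypothetical helper: map the CR .status.phase value to a short verdict.
classify_phase() {
  case "$1" in
    "")                         echo "no-state" ;;
    "Setting up primary")       echo "setting-up-primary" ;;
    "Cluster in healthy state") echo "healthy" ;;
    *)                          echo "other" ;;
  esac
}

# Hypothetical helper: read log lines on stdin and succeed if either of the
# two messages shown in this topic is present.
matches_known_error() {
  grep -q \
    -e 'Selected PVC is not ready yet' \
    -e 'refusing to create the primary instance while the latest generated serial is not zero'
}

# Run the three symptom checks with the same oc commands used in this topic.
check_symptoms() {
  phase="$(oc get cluster.postgresql.k8s common-service-db -n "$NS" -o jsonpath='{.status.phase}')"
  echo "phase: $(classify_phase "$phase")"

  if oc get pods -n "$NS" | grep -q common-service-db; then
    echo "pods: present"
  else
    echo "pods: none"
  fi

  if oc logs -n "$OPERATOR_NS" -l app.kubernetes.io/name=cloud-native-postgresql | matches_known_error; then
    echo "logs: known error found"
  else
    echo "logs: no known error"
  fi
}
```

To use the sketch, source the file in a shell that is logged in to the cluster and call check_symptoms. A result of phase: no-state or phase: setting-up-primary together with pods: none matches the symptoms in this topic.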
Cause
This issue occurs when the cloud-native PostgreSQL operator runs into a race condition and cannot create the required resources for the common-service-db Postgres cluster CR.
For more information, see the descriptions of similar issues in the cloudnative-pg community: 4147 and 5235.
Resolving the problem
Note: Perform the following actions only if the issue occurs while you install or upgrade to IBM Cloud Pak foundational services version 4.6 or later. If the issue occurs after the installation or upgrade is complete, contact IBM® support.
To resolve the issue, delete the existing common-service-db Postgres cluster CR and re-create it. Complete the following steps.
-
Delete the common-service-db Postgres cluster CR.
oc delete cluster.postgresql.k8s common-service-db -n <namespace-where-foundational services-are-installed>
-
Delete the operand-deployment-lifecycle-manager pod to re-create the common-service-db Postgres cluster CR.
oc delete pod -n <namespace-where-foundational services-are-installed> -l name=operand-deployment-lifecycle-manager
-
Make sure that the common-service-db Postgres cluster CR is created successfully.
oc get cluster.postgresql.k8s common-service-db -n <namespace-where-foundational services-are-installed>
-
Wait for the common-service-db Postgres cluster CR status to change to Cluster in healthy state.
oc get cluster.postgresql.k8s common-service-db -n <namespace-where-foundational services-are-installed> -o jsonpath='{.status.phase}'
The following output shows the CR in the Cluster in healthy state status:
Cluster in healthy state
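Instead of rerunning the status command by hand, the last two steps can be polled in a loop. The following is a minimal sketch under stated assumptions: wait_for_healthy is a hypothetical helper name, the default namespace ibm-common-services is taken from the sample log lines, and the 600-second timeout and 10-second interval are arbitrary values, not documented limits.

```shell
#!/bin/sh
# Assumption: default namespace comes from the sample log output; override as needed.
NS="${NS:-ibm-common-services}"

# Hypothetical helper: poll the CR phase until it reports
# "Cluster in healthy state" or the timeout expires.
#   $1 = timeout in seconds (assumed default: 600)
#   $2 = polling interval in seconds (assumed default: 10)
wait_for_healthy() {
  timeout_s="${1:-600}"
  interval_s="${2:-10}"
  elapsed=0
  phase=""
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # Errors are suppressed because the CR might not exist yet while it is
    # being re-created; an empty phase simply means "keep waiting".
    phase="$(oc get cluster.postgresql.k8s common-service-db -n "$NS" \
      -o jsonpath='{.status.phase}' 2>/dev/null)"
    if [ "$phase" = "Cluster in healthy state" ]; then
      echo "healthy after ${elapsed}s"
      return 0
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  echo "timed out after ${timeout_s}s; last phase: ${phase:-<empty>}" >&2
  return 1
}
```

For example, wait_for_healthy 900 15 polls every 15 seconds for up to 15 minutes and returns a nonzero exit code if the CR never reaches the healthy state, which makes it usable as a gate in a larger script.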