IBM Support

PostgreSQL pods do not start with Version 1.10 of the EnterpriseDB operator

Troubleshooting


Problem

If Version 1.10 of the EnterpriseDB operator is installed on your cluster, PostgreSQL pods might not start. The EnterpriseDB operator crashes when it attempts to process the contents of its custom resource, and it doesn't run the mandatory init job when it restarts.
This problem affects the IBM Watson Speech Version 4.0.3 services on IBM Cloud Pak for Data Version 4.0 Refresh 3.

Symptom

The EnterpriseDB operator fails to start the PostgreSQL pods. As a result, the IBM Watson Speech to Text and IBM Watson Text to Speech services also fail to start.

Cause

The problem is caused by a regression in Version 1.10 of the EnterpriseDB operator. It occurs when you deploy the EnterpriseDB operator in the ibm-common-services project and set the operator to watch a different namespace.

Environment

The problem occurs for IBM Watson Speech services that are deployed on IBM Cloud Pak for Data Version 4.0 Refresh 3:
  • IBM Watson Speech to Text Version 4.0.3
  • IBM Watson Text to Speech Version 4.0.3

Diagnosing The Problem

The following containers for the IBM Watson Speech services remain in the Init state until they can connect to PostgreSQL:
  • stt-async
  • stt-customization
  • tts-customization
If the Speech services don't start, use the following command to check the status of the EDB cluster object:
oc get cluster <speech-cr-name>-postgres --namespace <project-name>
  • <speech-cr-name> is the name of the Speech services custom resource (typically speech-prod-cr)
  • <project-name> is the name of the project (the namespace) where you installed the Speech services.
If the EnterpriseDB error occurred, the output includes the following message for the PostgreSQL cluster:
Cluster is in an unrecoverable state, needs manual intervention
In addition, the error leaves only a single PostgreSQL replica pod, which attempts to start but fails. You can use the following command to check the status of the pod:
oc get pods | grep postgres
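The cluster check above can be scripted. The following is a minimal sketch, not an IBM-supplied tool: the function name is_unrecoverable is hypothetical, and it matches only on the message text quoted above. It reads the output of oc get cluster from standard input, so it makes no assumption about the column layout of that output.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or the EDB operator).
# Reads `oc get cluster` output on stdin and exits 0 if it contains
# the unrecoverable-state message quoted in this document.
is_unrecoverable() {
  grep -q "Cluster is in an unrecoverable state"
}

# Usage sketch; replace the placeholders before running:
#   oc get cluster <speech-cr-name>-postgres --namespace <project-name> \
#     | is_unrecoverable && echo "EDB cluster needs manual intervention"
```

If the function reports a match, continue with the resolution steps in the next section.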

Resolving The Problem

Use the following commands to remove the namespaceSelector field from two cluster-scope webhooks. Then delete the EDB cluster object. After you delete the EDB cluster object, the PostgreSQL operator pod might take a few minutes to restart.
You must be authenticated as a cluster administrator to resolve the problem.
  1. Determine the name of the mutatingwebhookconfiguration webhook.
     
    oc get mutatingwebhookconfiguration | grep mcluster
    

    The command returns a name with the format mcluster.kb.io-xxxxx.
     
  2. Edit the webhook to remove the namespaceSelector field.
     
    oc edit mutatingwebhookconfiguration mcluster.kb.io-xxxxx
  3. Save your changes and close the editor. For example, if you are using vi, press Esc and type :wq.
     
  4. Determine the name of the validatingwebhookconfiguration webhook.
     
    oc get validatingwebhookconfiguration | grep vcluster 

    The command returns a name with the format vcluster.kb.io-xxxxx.
     
  5. Edit the named webhook to remove the namespaceSelector field.
     
    oc edit validatingwebhookconfiguration vcluster.kb.io-xxxxx
  6. Save your changes and close the editor. For example, if you are using vi, press Esc and type :wq.
     
  7. Determine the name of the EDB cluster object.
     
    oc get cluster <speech-cr-name>-postgres --namespace <project-name>
     
    • <speech-cr-name> is the name of the Speech services custom resource (typically speech-prod-cr)
    • <project-name> is the name of the project (the namespace) where you installed the Speech services.
       
  8. Delete the named EDB cluster object. 
     
    oc delete cluster <cluster-object-name> --namespace <project-name>

    The Speech services operator will automatically re-create it.
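
The edits in steps 2, 3, 5, and 6 can also be made without an editor by using oc patch with a JSON patch. The following is a sketch under stated assumptions, not a verified procedure: the helper remove_selector_patch is hypothetical, and it assumes that each of the first N entries in the configuration's webhooks list has a namespaceSelector field. A JSON patch remove operation fails if the path does not exist, so inspect the object with oc get ... -o yaml first.

```shell
#!/bin/sh
# Hypothetical helper: prints a JSON patch that removes the
# namespaceSelector field from the first N entries of .webhooks.
# Assumption: each of those entries actually has the field.
remove_selector_patch() {
  count=$1
  i=0
  patch=""
  while [ "$i" -lt "$count" ]; do
    # Separate operations with a comma after the first one.
    if [ -n "$patch" ]; then
      patch="${patch},"
    fi
    patch="${patch}{\"op\":\"remove\",\"path\":\"/webhooks/${i}/namespaceSelector\"}"
    i=$((i + 1))
  done
  printf '[%s]\n' "$patch"
}

# Usage sketch (names as returned in steps 1 and 4; here assuming
# one webhook entry per configuration):
#   oc patch mutatingwebhookconfiguration mcluster.kb.io-xxxxx \
#     --type=json -p "$(remove_selector_patch 1)"
#   oc patch validatingwebhookconfiguration vcluster.kb.io-xxxxx \
#     --type=json -p "$(remove_selector_patch 1)"
```

Building the patch separately from the oc call makes it possible to print and review the patch before applying it to a cluster-scope object.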

Document Location

Worldwide

Document Information

Modified date:
15 December 2021

UID

ibm16525340