Installation common issues
This document provides solutions to common issues encountered during installation and operation.
Common issues during installation
- Symptoms
  - Kafka not verifying
- Solution
  - Delete the Kafka resource and allow it to be re-created automatically.
- Symptoms
  - Role binding conflicts
- Solution
  - Delete existing role bindings before you proceed with the installation.
- Symptoms
  - Image pull errors
- Solution
  - Verify that entitlement key secrets are correctly created and configured.
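One way to check an entitlement key secret is to decode its payload and confirm that it holds an entry for the registry you pull from. The following is a minimal sketch, not an official procedure. It assumes the secret is of type kubernetes.io/dockerconfigjson, that GNU base64 is available, and that you pass your own secret name, namespace, and registry host. The values in the usage comment (ibm-entitlement-key, cp.icr.io) are assumptions for illustration only.

```shell
# Sketch: confirm that a pull secret contains credentials for a given registry.
# Assumption: the secret is of type kubernetes.io/dockerconfigjson.
check_pull_secret() {
  local secret="$1" namespace="$2" registry="$3"
  local encoded config

  # Read the base64-encoded .dockerconfigjson payload; fail if the secret is missing.
  encoded="$(oc get secret "$secret" -n "$namespace" \
    -o jsonpath='{.data.\.dockerconfigjson}')" || return 1
  [ -n "$encoded" ] || return 1

  # Decode the payload and look for the registry host as a JSON key.
  config="$(printf '%s' "$encoded" | base64 -d)" || return 1
  printf '%s' "$config" | grep -q -F "\"$registry\""
}

# Example call (secret name and registry are assumptions, adjust to your cluster):
# check_pull_secret ibm-entitlement-key "${PROJECT_CPD_INST_OPERATORS}" cp.icr.io \
#   && echo "registry entry found" || echo "registry entry missing"
```

The function only confirms that an entry for the registry exists. It does not validate that the key itself is still accepted by the registry.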
Conflicts when multiple operator versions attempt to manage the same PostgreSQL resources
- Symptoms
  - When you migrate from OLM (Operator Lifecycle Manager) to non-OLM deployments, both the old OLM-based Postgres operator and the new Helm-based operator might run simultaneously. Both operators then attempt to manage the same PostgreSQL resources, which causes conflicts and resource contention. You must manually remove the old CSV (ClusterServiceVersion) so that only the Helm-based operator manages PostgreSQL instances.
- Solution
  - Follow these steps to remove CSV-based Postgres deployments when you migrate from OLM to non-OLM.
- Step 1 - Verify Helm-based Postgres Operator
  - Check whether the Helm-based Postgres operator is running:
    oc get deploy postgresql-operator-controller-manager-1-25-4 \
      -n ${PROJECT_CPD_INST_OPERATORS}
    oc get deploy postgresql-operator-controller-manager-1-25-3 \
      -n ${PROJECT_CPD_INST_OPERATORS}
  - If both operators are active (versions 1-25-4 and 1-25-3), you need to remove the older version (1-25-3).
- Step 2 - Inspect Finalizers on old OLM CSV
    oc get csv cloud-native-postgresql.v1.25.3 \
      -n ${PROJECT_CPD_INST_OPERATORS} \
      -o jsonpath='{.metadata.finalizers}'
  - Expected output:
    ["operators.coreos.com/csv-cleanup"]
- Step 3 - Remove the Finalizer
  - Force the unlock by removing the finalizer from the old Postgres CSV:
    oc patch csv cloud-native-postgresql.v1.25.3 \
      -n ${PROJECT_CPD_INST_OPERATORS} \
      --type=json \
      -p='[{"op":"remove","path":"/metadata/finalizers"}]'
- Step 4 - Delete the CSV
    oc delete csv cloud-native-postgresql.v1.25.3 \
      -n ${PROJECT_CPD_INST_OPERATORS}
- Step 5 - Verify Removal
  - Wait 30 seconds, then verify that the OLM-based operator deployment is removed:
    oc get deploy -n ${PROJECT_CPD_INST_OPERATORS} | grep postgresql
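Steps 2 through 4 can be wrapped in a small helper so that the finalizer patch runs only when finalizers are actually set. This matters because a JSON patch "remove" operation fails when the target path does not exist. The following is a sketch under stated assumptions, not part of the product tooling: the function name remove_old_csv is hypothetical, and it uses only the oc get, oc patch, and oc delete calls shown in the steps plus the standard --ignore-not-found flag.

```shell
# Sketch: remove an old OLM CSV, clearing finalizers only when any are set.
# The function name and messages are hypothetical helpers for illustration.
remove_old_csv() {
  local csv="$1" namespace="$2"
  local finalizers

  # Nothing to do if the CSV is already gone.
  if ! oc get csv "$csv" -n "$namespace" >/dev/null 2>&1; then
    echo "CSV $csv not found in $namespace; nothing to remove"
    return 0
  fi

  # Step 2: inspect the finalizers on the CSV.
  finalizers="$(oc get csv "$csv" -n "$namespace" \
    -o jsonpath='{.metadata.finalizers}')"

  # Step 3: remove the finalizers only if the list is present and non-empty.
  if [ -n "$finalizers" ] && [ "$finalizers" != "[]" ]; then
    oc patch csv "$csv" -n "$namespace" --type=json \
      -p='[{"op":"remove","path":"/metadata/finalizers"}]' || return 1
  fi

  # Step 4: delete the CSV; tolerate the case where it has just been removed.
  oc delete csv "$csv" -n "$namespace" --ignore-not-found
}

# Example call, using the CSV name from the steps above:
# remove_old_csv cloud-native-postgresql.v1.25.3 "${PROJECT_CPD_INST_OPERATORS}"
```

After the helper returns, the verification in Step 5 still applies: wait, then list the deployments to confirm that only the Helm-based operator remains.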
Watson Assistant TensorFlow deployment issues
Problem: WA TF Deployment Not Starting
Symptom: The pod restarts before the models are downloaded because the startup probe times out.
Solution: Adjust the startup probe with a temporary patch to accommodate slow download speeds.
Apply the following patch:
cat <<EOF | oc apply -f -
apiVersion: assistant.watson.ibm.com/v1
kind: TemporaryPatch
metadata:
  name: wa-tf-probe-fix
spec:
  apiVersion: assistant.watson.ibm.com/v1
  kind: WatsonAssistantClu
  name: wo-wa
  patchType: patchStrategicMerge
  patch:
    tf:
      deployment:
        spec:
          template:
            spec:
              containers:
              - name: tensorflow-serving-adapter
                startupProbe:
                  periodSeconds: 30
EOF
This increases the period between startup probe checks from the default to 30 seconds.
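To judge whether 30 seconds is enough, note that Kubernetes allows a container roughly failureThreshold multiplied by periodSeconds to pass its startup probe before restarting it. The following sketch computes that window. The function name is hypothetical, and the failure threshold of 3 in the usage comment is an assumed value; check the actual value that is set on your deployment.

```shell
# Sketch: approximate time (in seconds) a container has to pass its startup probe.
# Both inputs are whole numbers; the helper name is hypothetical.
startup_window_seconds() {
  local period_seconds="$1" failure_threshold="$2"
  echo $(( period_seconds * failure_threshold ))
}

# Example with periodSeconds=30 and an assumed failureThreshold of 3:
# startup_window_seconds 30 3    # prints 90
```

If the model download takes longer than the computed window, increase either value until the window covers the slowest expected download.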
Alternative
Increase the failureThreshold instead of adjusting periodSeconds.
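The alternative can be sketched by reusing the same TemporaryPatch structure and setting failureThreshold in place of periodSeconds. This is an illustrative sketch only: the function name is hypothetical, the nesting mirrors the patch shown earlier in this section, and the value 60 in the usage comment is an assumption rather than a recommended setting.

```shell
# Sketch: print a TemporaryPatch that raises the startup probe failureThreshold.
# The resource names mirror the patch in this section; the helper name is hypothetical.
render_failure_threshold_patch() {
  local failure_threshold="$1"
  cat <<EOF
apiVersion: assistant.watson.ibm.com/v1
kind: TemporaryPatch
metadata:
  name: wa-tf-probe-fix
spec:
  apiVersion: assistant.watson.ibm.com/v1
  kind: WatsonAssistantClu
  name: wo-wa
  patchType: patchStrategicMerge
  patch:
    tf:
      deployment:
        spec:
          template:
            spec:
              containers:
              - name: tensorflow-serving-adapter
                startupProbe:
                  failureThreshold: ${failure_threshold}
EOF
}

# Example (60 is an assumed value): review the output first, then apply it.
# render_failure_threshold_patch 60
# render_failure_threshold_patch 60 | oc apply -f -
```

Because the sketch reuses the name wa-tf-probe-fix, applying it replaces a periodSeconds patch that was created earlier under the same name.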