Troubleshooting your Watson Speech services installation
You can use this troubleshooting information to diagnose and resolve problems with your Speech services installation. The information documents example scenarios of things that can go wrong and how to identify and debug the root-cause problems.
- Permissions you need for these tasks:
- You must be an administrator of the Red Hat® OpenShift® project to manage the cluster.
Troubleshooting topics
See the following scenarios for more information about troubleshooting the different problems:
- Installation of Watson Speech services fails
- The Watson Speech operator pod fails to start
- Some pods are in the pending state
- The Watson Speech operator is running but no microservices are installed
- Jobs to pull Speech to Text models are taking a long time to finish
- Speech services initContainers are running for a long time
- The Watson Speech operator log indicates that the TLS secret was not created on time
- MinIO pods fail to start or have errors
- No status is reported for the Watson Speech service
- Some Speech services are not running
- Training of custom acoustic models is failing
- PostgreSQL pods stuck in Terminating state on upgrade
- Upgrade to Watson Speech services version 4.6.3 and later fails to complete
- Upgrade to Watson Speech services version 4.6.0 and later leaves unneeded PostgreSQL pods
${PROJECT_CPD_OPS} is the name of the project (namespace) in which the Watson Speech operator is deployed, and ${PROJECT_CPD_INSTANCE} is the name of the project (namespace) in which the Speech services are installed.
The Watson Speech operator pod fails to start
The Watson Speech operator pod fails to start.
Learn the name of the pod for the operator:
oc get pods -l app.kubernetes.io/name=watson-speech -n ${PROJECT_CPD_OPS}
Use the following command to learn more about the nature of the problem. In the command, pod-name is the name of a pod whose status you want to learn.
oc describe pod pod-name -n ${PROJECT_CPD_OPS}
You can send the log files for the pod to IBM Support for further help. For more information, see Retrieving logs for the Watson Speech operator.
Some pods are in the pending state
Some Speech services pods are stuck in the Pending status.
Use the following command to learn more about the nature of the problem. In the command, pod-name is the name of a pod whose status is Pending.
oc describe pod pod-name -n ${PROJECT_CPD_INSTANCE}
Some possible causes of the problem follow:
Insufficient resources (memory and CPU) are available for the pod.
The pod is unable to pull the container image or images.
Installation of Watson Speech services fails
Installation of the Watson Speech services returns an error message of the following form:
TASK [utils : applying CR <speech-cr> for Watson Speech to Text] ********************************************
Tuesday 1 November 2022 17:44:48 +0000 (0:00:02.140) 0:01:08.881 ****** fatal: [localhost]: FAILED! =>
{"changed": false, "error": 422, "msg": "Failed to create object: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},
\"status\":\"Failure\",\"message\":\"WatsonSpeech.speech.watson.ibm.com \\\\\"<speech-cr>\\\\\" is invalid: spec.tags:
Required value\",\"reason\":\"Invalid\",\"details\":{\"name\":\"<speech-cr>",\"group\":\"speech.watson.ibm.com\",\"kind
\":\"WatsonSpeech\",\"causes\":[{\"reason\":\"FieldValueRequired\",\"message\":\"Required value\",\"field\":\"spec.tags\"}]},
\"code\":422}\\n'", "reason": "Unprocessable Entity", "status": 422}
This message indicates that all of the Speech microservices were set to false during initial installation of the Speech services with the param-file option. You must set at least one of the microservices to true for the installation to succeed. For more information, see Specifying additional installation options.
The Watson Speech operator is running but no microservices are installed
The Watson Speech operator is running but none of the Speech microservices is being installed.
Use the following command to verify that you created the Speech services custom resource in the desired namespace:
oc get WatsonSpeech -n ${PROJECT_CPD_INSTANCE}
If the command does not return any information in its output, you might need to create the custom resource. For more information, see Installing Watson Speech services.
If the custom resource exists, check the operator logs to determine why the microservices are not being installed. For more information about checking the operator logs, see Retrieving logs for the Watson Speech operator.
Jobs to pull Speech to Text models are taking a long time to finish
Jobs with names like <custom-resource-name>-stt-models-string pull the images for Speech to Text models from Docker and upload them to the MinIO datastore. The custom resource runs one job per enabled model.
The Speech to Text models are large. In some cases, these jobs can take 20 to 25 minutes to complete. If you are concerned about the time it is taking to pull and upload the images, check the log files for your pods to see whether any errors have occurred. For more information, see Retrieving logs for pods.
Speech services initContainers are running for a long time
An initContainer is a special container that is run for a pod. All initContainers for a pod must complete before the pod can start its regular container.
The Speech to Text and Text to Speech runtimes use an initContainer to wait for the MinIO datastore to be running and for all installed models and voices to be uploaded.
The initContainer for either of the runtime microservices might run for as long as 30 minutes, especially in the case of an online install, while it pulls the images from the IBM registry.
If the initContainer for either runtime microservice continues to run for more than 30 minutes, use the appropriate command to check its log file:
For the Speech to Text service, run the following command to check the status of the service's models:
oc logs -f runtime-pod-name -c wait4models -n ${PROJECT_CPD_INSTANCE}
For the Text to Speech service, run the following command to check the status of the service's voices:
oc logs -f runtime-pod-name -c wait4voices -n ${PROJECT_CPD_INSTANCE}
In both commands, runtime-pod-name specifies the name of the pod for the Speech to Text or Text to Speech runtime. For example, the name is something like speech-cr-stt-runtime-85957944ff-wrzl4 for the Speech to Text runtime or speech-cr-tts-runtime-858bd6f96f-g7dcw for the Text to Speech runtime.
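The choice between the two log commands can be sketched as a small shell helper. This is only an illustration: the function names wait_container_for and follow_init_log are hypothetical, and the sketch assumes that runtime pod names contain -stt-runtime- or -tts-runtime-, as the example names do.

```shell
# Map a service abbreviation to the initContainer named in the commands above:
# stt (Speech to Text) -> wait4models, tts (Text to Speech) -> wait4voices.
wait_container_for() {
  case "$1" in
    stt) echo wait4models ;;
    tts) echo wait4voices ;;
    *)   echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Follow the initContainer log of a runtime pod.
# $1 is the pod name. The service is inferred from the -stt-runtime- or
# -tts-runtime- part of the name (an assumption based on the example names).
follow_init_log() {
  case "$1" in
    *-stt-runtime-*) svc=stt ;;
    *-tts-runtime-*) svc=tts ;;
    *) echo "not a runtime pod: $1" >&2; return 1 ;;
  esac
  oc logs -f "$1" -c "$(wait_container_for "$svc")" -n "${PROJECT_CPD_INSTANCE}"
}

# Example (requires an active oc login):
#   follow_init_log speech-cr-stt-runtime-85957944ff-wrzl4
```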
Possible reasons for the initContainer to run for a long time include the following:
The runtime pod is not able to connect to MinIO. MinIO might be in the process of starting up or might have experienced an error. Wait for MinIO to start or check its log file. For more information, see Retrieving logs for pods.
MinIO might be waiting for all of the required models and voices to be installed. Wait for all of the models and voices to be uploaded. You can use the following command to check the status of the jobs that are uploading the models and voices:
oc get jobs -l 'app.kubernetes.io/component in (stt-models,tts-voices)' -n ${PROJECT_CPD_INSTANCE}
You can check the log files for the pods to determine whether a failure has occurred. Otherwise, wait for the upload jobs to complete.
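To see at a glance which upload jobs are still running, you can filter the job listing. The following sketch is an assumption-laden helper, not part of the product: the function name incomplete_jobs is hypothetical, and it relies on the default tabular output of oc get jobs containing a completions field of the form done/desired, such as 0/1.

```shell
# Print the names of jobs that have not yet finished.
# Reads the default tabular output of `oc get jobs` on standard input.
# The first field of the form N/M on each line is treated as the
# COMPLETIONS column; a job is reported when N differs from M.
incomplete_jobs() {
  awk 'NR > 1 {
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^[0-9]+\/[0-9]+$/) {
        split($i, c, "/")
        if (c[1] != c[2]) print $1
        break
      }
    }
  }'
}

# Example (requires an active oc login):
#   oc get jobs -l 'app.kubernetes.io/component in (stt-models,tts-voices)' \
#     -n "${PROJECT_CPD_INSTANCE}" | incomplete_jobs
```

An empty result suggests that all listed jobs have completed; it does not by itself prove that the uploads succeeded, so the pod log files remain the place to look for failures.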
The Watson Speech operator log indicates that the TLS secret was not created on time
The Watson Speech operator uses the certificate manager from the foundational services to create a secret that the Speech services microservices can use as a TLS certificate. The following error in the Watson Speech operator log might indicate that microservices of the foundational services are not configured properly. In the message, <custom-resource-name> is the name of your Speech services custom resource.
Secret: <custom-resource-name>-instance-tls did not get created in time.
If this error occurs, contact IBM® Support for assistance.
MinIO pods fail to start or have errors
If the MinIO pods fail to start or generate errors, check the log files for the pods for possible problems. For more information, see Retrieving logs for pods.
Some possible causes of problems follow:
- The required MinIO secret does not exist. Make sure you specified the correct secret in the custom resource.
- The persistent volume claims (PVCs) were not bound. Use the following command to make sure that the PVCs are bound in your namespace:
  oc get pvc -l "release in (${CUSTOM_RESOURCE_SPEECH}, ${CUSTOM_RESOURCE_SPEECH}-name-rabbitmq)" -n ${PROJECT_CPD_INSTANCE}
  If the PVCs are not bound, use the following command to describe the PVC to determine the cause, where <pvc-name> is the name of an unbound PVC:
  oc describe pvc <pvc-name> -n ${PROJECT_CPD_INSTANCE}
  Your storage classes might not have been created, or the storage class might have been omitted from or set incorrectly in the Speech services custom resource. Use the following command to make sure that the storage classes were created in your cluster:
  oc get storageclass | grep -e portworx-db-gp3-sc -e portworx-shared-gp3
  This command uses the Portworx storage classes. Substitute the names of the block and file storage classes that are associated with the storage solution that you are using.
- If you created the Speech services custom resource multiple times, a stale PVC from a previous custom resource might still exist. Remove the Speech services custom resource, then remove any stale PVCs for MinIO and RabbitMQ. You can then re-create the custom resource.
  - For more information about removing the custom resource, see Uninstalling Watson Speech services.
  - For more information about reinstalling the custom resource, see Installing Watson Speech services.
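The PVC check described in the list above can be narrowed to only the claims that need attention. The following is a minimal sketch: the function name unbound_pvcs is hypothetical, and it assumes the default tabular output of oc get pvc, in which the second column is STATUS.

```shell
# Print the names of PVCs whose STATUS column is anything other than Bound.
# Reads the default tabular output of `oc get pvc` on standard input and
# assumes that the first two columns are NAME and STATUS.
unbound_pvcs() {
  awk 'NR > 1 && $2 != "Bound" { print $1 }'
}

# Example (requires an active oc login): describe every unbound PVC.
#   for pvc in $(oc get pvc -n "${PROJECT_CPD_INSTANCE}" | unbound_pvcs); do
#     oc describe pvc "$pvc" -n "${PROJECT_CPD_INSTANCE}"
#   done
```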
No status is reported for the Watson Speech service
The following command fails to report any status for the Watson Speech service (the response is empty):
oc get WatsonSpeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INSTANCE}
To determine the cause of this problem, do the following:
Use the following command to determine whether the Watson Speech operator pod is running:
oc get pods -n ${PROJECT_CPD_INSTANCE}
The operator must be running for the oc get WatsonSpeech command to report its status. If the status of the operator pod indicates that it is still in the process of starting, wait for the operator to start running. The operator can take 20 to 60 minutes to create or apply changes to your custom resource.
Check the log file for the Watson Speech operator pod for any errors or problems. For more information, see Retrieving logs for the Watson Speech operator.
Some Speech services are not running
The following command reports the status of some Speech services as NotRunning:
oc get WatsonSpeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INSTANCE}
A status of NotRunning can indicate that the process is still starting up. It can take 20 to 60 minutes for the operator to complete and for the service to start running.
You can also check the log file of the pod for any service that is not running. For more information, see Retrieving logs for pods.
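Because startup can take up to an hour, it can be convenient to poll the status instead of rerunning the command by hand. The following sketch shows one way to do that. All three function names are hypothetical, and the check only looks for the NotRunning string that is described in this section; it makes no other assumptions about the format of the status output.

```shell
# Re-run a check command until it succeeds or a time limit passes.
# Arguments: timeout in seconds, polling interval in seconds, then the
# command to run. Returns 0 on success and 1 on timeout.
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  while ! "$@"; do
    elapsed=$((elapsed + interval_s))
    if [ "$elapsed" -gt "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
  return 0
}

# Succeed only if the given status text is non-empty and does not
# contain NotRunning.
status_is_running() {
  [ -n "$1" ] || return 1
  case "$1" in
    *NotRunning*) return 1 ;;
    *) return 0 ;;
  esac
}

# Query the custom resource and evaluate its status output.
speech_services_running() {
  status_is_running "$(oc get WatsonSpeech "${CUSTOM_RESOURCE_SPEECH}" -n "${PROJECT_CPD_INSTANCE}")"
}

# Example (requires an active oc login): poll every minute for up to an hour.
#   wait_until 3600 60 speech_services_running
```

If the helper times out, the services are still not running after the expected window, which is the point at which the pod log files are worth inspecting.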
Training of custom acoustic models is failing
When you attempt to train custom acoustic models for Speech to Text, the service reports the following error message:
Unresponsive backend detected. Please try later.
This message indicates that the Speech to Text AM Patcher does not have sufficient resources to handle its requests. To increase the number of CPUs that are available to the AM Patcher, use the custom resource property named sttAMPatcher.resources.requestsCPU to increase the value of the property from 1 to 5.
Allocating more resources prevents this error and enables custom acoustic models to be trained as expected. Increasing the value of the property increases the size of the deployment.
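One possible way to apply the change is a merge patch against the custom resource. The following is a sketch under stated assumptions, not a documented procedure: it assumes that the property lives under spec as spec.sttAMPatcher.resources.requestsCPU and that it accepts an integer value. Verify the property path against your own custom resource before you use it. The function name am_patcher_cpu_patch is hypothetical.

```shell
# Build a JSON merge patch that sets the AM Patcher CPU request.
# Assumption: the property sits under spec as
# spec.sttAMPatcher.resources.requestsCPU and accepts an integer.
am_patcher_cpu_patch() {
  printf '{"spec":{"sttAMPatcher":{"resources":{"requestsCPU":%d}}}}' "$1"
}

# Example (requires an active oc login):
#   oc patch WatsonSpeech "${CUSTOM_RESOURCE_SPEECH}" -n "${PROJECT_CPD_INSTANCE}" \
#     --type merge -p "$(am_patcher_cpu_patch 5)"
```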
PostgreSQL pods stuck in Terminating state on upgrade
When you upgrade the Watson Speech services, you might encounter an issue where the PostgreSQL pods become stuck in the Terminating state. If this problem occurs during your upgrade, perform the following steps to resolve the problem.
Use the following command to identify pods that remain in the Terminating state:
oc get pods -n ${PROJECT_CPD_INSTANCE} -o wide | grep Terminating | awk '{print $1}'
Use the following command to set the shell variable pods to the list of pods that remain in the Terminating state:
pods=$(oc get pods -n ${PROJECT_CPD_INSTANCE} -o wide | grep Terminating | awk '{print $1}')
Use the following command to delete the stuck pods so that the upgrade process can continue:
oc delete pod $pods -n ${PROJECT_CPD_INSTANCE} --force=true --grace-period=0
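The steps above can be combined into one guarded helper. This is a sketch, with hypothetical function names, that matches the STATUS column exactly instead of using grep and that skips the delete when no pods are stuck, so that oc delete pod is never called without pod names. It assumes the default column order of oc get pods, in which STATUS is the third column.

```shell
# Print the names of pods whose STATUS column is Terminating.
# Reads `oc get pods` tabular output on standard input and assumes
# that the columns start with NAME READY STATUS.
terminating_pods() {
  awk 'NR > 1 && $3 == "Terminating" { print $1 }'
}

# Force-delete the stuck pods, but only if at least one was found.
delete_terminating_pods() {
  pods=$(oc get pods -n "${PROJECT_CPD_INSTANCE}" -o wide | terminating_pods)
  if [ -z "$pods" ]; then
    echo "No pods are in the Terminating state."
    return 0
  fi
  # $pods is intentionally unquoted so that each name becomes one argument.
  oc delete pod $pods -n "${PROJECT_CPD_INSTANCE}" --force=true --grace-period=0
}

# Example (requires an active oc login):
#   delete_terminating_pods
```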
Upgrade to Watson Speech services version 4.6.3 and later fails to complete
When you upgrade to Watson Speech services version 4.6.3 and later, upgrade of the MinIO custom resource can fail because the MinIO backup job or the MinIO PVC creation job failed to be deleted in the previous upgrade procedure. The solution is to delete the backup and PVC creation jobs. The upgrade then proceeds normally. Perform the following steps to resolve the problem.
To check the status of the MinIO custom resource, issue the following command:
oc get MinioCluster ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INSTANCE}
The failed MinIO custom resource is identified by an entry of the following form:
<custom-resource-name> MinioCluster 8d 4 ReleaseFailed True UpgradeError
You can run the following command to get more detailed information about the failure:
oc describe MinioCluster ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INSTANCE}
The custom resource returns a status message similar to the following:
- lastTransitionTime: "2023-04-18T11:05:05Z" message: 'failed to upgrade release: pre-upgrade hooks failed: warning: Hook pre-upgrade ibm-minio/templates/minio-createpvc-job.yaml failed: jobs.batch "<custom-resource-name>-ibm-minio-create-pvc" already exists' reason: UpgradeError status: "True" type: ReleaseFailed
To delete the failed MinIO PVC creation job, issue the following command:
oc delete job ${CUSTOM_RESOURCE_SPEECH}-ibm-minio-create-pvc --namespace ${PROJECT_CPD_INSTANCE}
To determine whether the MinIO backup job remains undeleted, issue the following command:
oc get job --namespace ${PROJECT_CPD_INSTANCE} | grep ${CUSTOM_RESOURCE_SPEECH}-ibm-minio-backup
The MinIO backup job that is not deleted is identified by an entry of the following form:
<custom-resource-name>-ibm-minio-backup 1/1 3m25s 1d
To delete the backup job, issue the following command:
oc delete job ${CUSTOM_RESOURCE_SPEECH}-ibm-minio-backup --namespace ${PROJECT_CPD_INSTANCE}
After you delete these jobs, the upgrade continues and completes.
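Both deletions can be scripted together. The following sketch derives the two job names from the custom resource name, following the naming pattern shown in the commands above. The function names are hypothetical. The --ignore-not-found option of oc delete keeps the loop from failing if one of the jobs was already removed.

```shell
# Print the names of the two MinIO jobs that this procedure deletes,
# given the name of the Speech services custom resource.
minio_upgrade_jobs() {
  printf '%s\n' "$1-ibm-minio-create-pvc" "$1-ibm-minio-backup"
}

# Delete both jobs in the project where the Speech services are installed.
delete_minio_upgrade_jobs() {
  for job in $(minio_upgrade_jobs "${CUSTOM_RESOURCE_SPEECH}"); do
    oc delete job "$job" --namespace "${PROJECT_CPD_INSTANCE}" --ignore-not-found
  done
}

# Example (requires an active oc login):
#   delete_minio_upgrade_jobs
```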
Upgrade to Watson Speech services version 4.6.0 and later leaves unneeded PostgreSQL pods
Prior to version 4.6.0, the PostgreSQL datastore was installed with all Watson Speech services deployments, but PostgreSQL was not used by the Speech to Text and Text to Speech runtime microservices. As of version 4.6.0, PostgreSQL is installed only if at least one of the following microservices is installed:
- Speech to Text asynchronous microservice
- Speech to Text customization microservice
- Text to Speech customization microservice
When you upgrade from a version earlier than 4.6.0 to version 4.6.0 or later, unnecessary pods for the PostgreSQL datastore can remain in your environment. If you do not use the asynchronous or customization microservices listed previously, you can use the following procedure to delete the unnecessary PostgreSQL pods. Do not delete the PostgreSQL pods if you use the asynchronous or customization microservices.
To query for the presence and status of PostgreSQL pods, run the following command:
oc get pods -n ${PROJECT_CPD_INSTANCE} | grep ${CUSTOM_RESOURCE_SPEECH}-postgres
Three PostgreSQL pods exist. The command returns status similar to the following for each pod. Unused PostgreSQL pods are in the crashed state, CrashLoopBackOff:
zen <custom-resource-name>-postgres-3 0/1 CrashLoopBackOff 206 (2m31s ago) 17h
If you use only the runtime microservices, use the following command to delete the unnecessary PostgreSQL pods and the associated PVCs:
oc delete cluster ${CUSTOM_RESOURCE_SPEECH}-postgres -n ${PROJECT_CPD_INSTANCE}