Troubleshooting Setup

If the bai-setup pods do not work as expected, use the following diagnoses and solutions.
No pods are in the ready state, or some pods are prevented from starting.
Tip:

Make sure that the jq command-line JSON processor is installed. Some of these troubleshooting procedures require this tool. The jq tool is available from this page: https://stedolan.github.io/jq/

bai-setup pods in Error status

bai-setup pods are defined as <custom_resource_name>-bai-setup-<id>.

Problem
One or several bai-setup pods are in Error status. These pods ensure that the Kafka topics are created and deploy the OpenSearch mappings and the dashboard visualizations. They run when IBM Business Automation Insights starts.
Diagnosis
Either of the following situations can occur.
Not all pods are in error.
One or more bai-setup pods are in Error status but another one is in 0/1 Completed status. This is a normal, harmless situation. The restart strategy of these pods is such that if a pod does not succeed, another one is started, unless the bai_configuration.setup.backoff_limit value is reached. This parameter specifies the number of retries before the job is considered failed. For each try, a new pod is started with a different identifier. No action is required.
The limit is not reached.
All bai-setup pods are in Error status but the backoff_limit has not been reached yet.

To determine whether the maximum number of retries has been reached, check the current value of the backoff_limit parameter. The default value of this parameter is 6 in Business Automation Insights 20.0.3. It might have been customized through the bai_configuration.setup.backoff_limit parameter in the custom resource. Until the number of retries is exceeded, pods in Error status are a normal, harmless situation. Wait until a pod reaches the 0/1 Completed status or until the number of retries is exceeded.

All pods are in error and the limit is reached.
All bai-setup pods are in Error status and the backoff_limit has been reached. As a consequence, the retry strategy does not start any new pod. This is the only harmful situation.
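
To tell the three situations apart, it can help to count the bai-setup pods by status. The following is a minimal sketch, not an official procedure. The helper name summarize_setup_pods is hypothetical, and the sketch assumes the default column layout of kubectl get pods (NAME, READY, STATUS, with STATUS in the third column).

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and counts the bai-setup pods that are in Completed and in Error status.
# Assumes the default column layout, where STATUS is the third column.
summarize_setup_pods() {
  awk '$1 ~ /-bai-setup-/ { count[$3]++ }
       END { printf "Completed=%d Error=%d\n", count["Completed"], count["Error"] }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods --no-headers | summarize_setup_pods
#
# To read the retry limit that is set on the job itself (assumption: the
# backoff_limit value from the custom resource ends up in the standard
# Kubernetes Job field .spec.backoffLimit):
#   kubectl get job <custom_resource_name>-bai-setup -o jsonpath='{.spec.backoffLimit}'
```

If the Completed count is 1 or more, the situation is harmless. If only Error pods are listed, compare their number with the retry limit to decide whether to keep waiting.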
Cause
The possible causes include these situations.
  • The Kafka server is not healthy.
  • The OpenSearch server is not healthy or the storage volumes that are assigned to OpenSearch pods are not fast enough.
Solutions
If all the bai-setup pods are in Error status and the backoff_limit is reached, apply one of the following solutions.
  • Check the last part of the log of the most recent bai-setup pod. If the log shows errors such as Failed to create Kafka topics, check that the Kafka server is healthy.
    Note: If bai-setup pods fail because of Kafka server health issues, the same cause usually affects both the bai-setup and bai-admin pods.
  • Otherwise, try the following solutions.
    1. Ensure that your Kubernetes nodes have very fast access to the storage volumes that are assigned to OpenSearch pods. If you are using Kubernetes Service, use the gold storage class for this purpose.
    2. Increase the value of the backoff_limit parameter for the setup job by using the bai_configuration.setup.backoff_limit parameter in the custom resource. For information about the update procedure, see Updating your Business Automation Insights custom resource.
    3. Restart the bai-setup Kubernetes job, which restarts the bai-setup pods. See the Tip at the top of this page about the jq tool.
      kubectl get job <custom_resource_name>-bai-setup -o json | jq 'del(.spec.selector)' | jq 'del(.spec.template.metadata.labels)' | kubectl replace --force -f -
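
The log check in the first solution can be sketched as follows. This is an illustration only. The helper name triage_setup_log is hypothetical, the tail length of 50 lines is an arbitrary choice, and the only error string that is matched is the one quoted above (Failed to create Kafka topics); real logs might report the problem in other words.

```shell
# Hypothetical helper: reads bai-setup log text on stdin and prints which
# branch of the solutions to follow.
#   kafka -> the log contains "Failed to create Kafka topics"
#   other -> no such error found; continue with the OpenSearch-related steps
triage_setup_log() {
  if grep -q 'Failed to create Kafka topics'; then
    echo "kafka"
  else
    echo "other"
  fi
}

# Usage against a live cluster (requires kubectl access). Replace the
# placeholders with the name of the most recent bai-setup pod:
#   kubectl logs --tail=50 <custom_resource_name>-bai-setup-<id> | triage_setup_log
```

A result of kafka points to the Kafka server health check. A result of other only means that this particular message was not found, so read the log tail before you move on to the storage and backoff_limit steps.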