Troubleshooting Setup
If a bai-setup pod does not work as expected, use the following diagnoses and solutions.
Tip: Make sure that the jq command-line JSON processor is installed. Some of these troubleshooting procedures require this tool. The jq tool is available from this page: https://stedolan.github.io/jq/
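Before you run the procedures on this page, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch; the require_tool function is a hypothetical helper introduced here for illustration and is not part of the product.

```shell
# Hypothetical helper: succeed only if the named command is found on the PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "Missing required tool: $1" >&2
  return 1
}

# Example usage:
# require_tool jq && require_tool kubectl
```

If either check fails, install the missing tool before continuing.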
bai-setup pods in Error status
bai-setup pods are named <custom_resource_name>-bai-setup-<id>.
- Problem
- One or several bai-setup pods are in Error status. These pods ensure that the Kafka topics are created and deploy the OpenSearch mappings and the dashboard visualizations. They run when IBM Business Automation Insights starts.
- Diagnosis
- Any of the following situations can occur.
- Not all pods are in error.
- One or more bai-setup pods are in Error status but another one is in 0/1 Completed status. This is a normal, harmless situation. The restart strategy of these pods is such that if a pod does not succeed, another one is started, unless the bai_configuration.setup.backoff_limit value is reached. This parameter specifies the number of retries before a job is considered failed. For each try, a new pod is started with a different identifier. No action is required.
- The limit is not reached.
- All bai-setup pods are in Error status but the backoff_limit has not been reached yet. To know when the maximum number of retries is exceeded, check the current value of the backoff_limit parameter. The default value of this parameter is 6 in Business Automation Insights 20.0.3. It might have been customized through the bai_configuration.setup.backoff_limit parameter in the custom resource. Until the number of retries is exceeded, pods in Error status are a normal, harmless situation. Wait until a pod reaches the 0/1 Completed status or until the number of retries is exceeded.
- All pods are in error and the limit is reached.
- All bai-setup pods are in Error status and the backoff_limit has been reached. As a consequence, the retry strategy does not start any new pod. This is the only harmful situation.
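The three situations in the diagnosis can be told apart from the status of the bai-setup Kubernetes job. The following is a minimal sketch, assuming that the job reports the standard Kubernetes Job conditions (Complete and Failed); the classify_setup_job function and its output labels are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: classify the setup job from two condition values.
# $1 = status of the "Complete" condition ("True", "False", or empty)
# $2 = status of the "Failed" condition   ("True", "False", or empty)
classify_setup_job() {
  if [ "$1" = "True" ]; then
    echo "completed"       # a pod succeeded; earlier Error pods are harmless
  elif [ "$2" = "True" ]; then
    echo "limit-reached"   # no new pod is started; action is needed
  else
    echo "retrying"        # wait for a pod to complete or for the limit
  fi
}

# Example usage against a live cluster:
# JOB="<custom_resource_name>-bai-setup"
# complete=$(kubectl get job "$JOB" -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}')
# failed=$(kubectl get job "$JOB" -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')
# classify_setup_job "$complete" "$failed"
```

Reading the job conditions avoids counting pods in Error status by hand and comparing the count with the backoff_limit value.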
- Cause
- The possible causes include these situations.
- The Kafka server is not healthy.
- The OpenSearch server is not healthy or the storage volumes that are assigned to OpenSearch pods are not fast enough.
- Solutions
- If all the bai-setup pods are in Error status and the backoff_limit is reached, apply one of the following solutions.
- Check the last part of the log of the most recent bai-setup pod. If the log shows errors such as Failed to create Kafka topics, check that the Kafka server is healthy.
Note: If bai-setup pods fail because of Kafka server health issues, the same cause usually affects both the bai-setup and bai-admin pods.
- Otherwise, try the following solutions.
- Ensure that your Kubernetes nodes have very fast access to the storage volumes that are assigned to OpenSearch pods. If you are using Kubernetes Service, use the gold storage class for this purpose.
- Increase the value of the backoff_limit parameter for the setup job by using the bai_configuration.setup.backoff_limit parameter in the custom resource. For information about the update procedure, see Updating your Business Automation Insights custom resource.
- Restart the bai-setup Kubernetes job, which restarts the bai-setup pods. See the Tip at the top of this page about the jq tool.
kubectl get job <custom_resource_name>-bai-setup -o json | jq 'del(.spec.selector)' | jq 'del(.spec.template.metadata.labels)' | kubectl replace --force -f -
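To check the last part of the log of the most recent bai-setup pod, you first need the name of that pod. The following is a minimal sketch, assuming that the pod names contain -bai-setup- as described on this page; the latest_setup_pod function is a hypothetical helper introduced here for illustration, and the number of log lines is arbitrary.

```shell
# Hypothetical helper: read pod names on stdin, oldest first, and print
# the last one whose name contains "-bai-setup-".
latest_setup_pod() {
  grep -- '-bai-setup-' | tail -n 1
}

# Example usage against a live cluster:
# pod=$(kubectl get pods --sort-by=.metadata.creationTimestamp -o name | latest_setup_pod)
# kubectl logs --tail=50 "$pod"
```

Sorting by creation time puts the newest pod last, so the final matching line is the most recent retry.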