Troubleshooting

Troubleshoot issues that may occur while using Integration tracing.

"Application is not available" error, although all pods are ready

The error message "Application is not available" may display when you attempt to access the Web Console with a web browser, even though all pods of Operations Dashboard are ready. If your ingress controller (router-default) uses HostNetwork, adding the network.openshift.io/policy-group: ingress label to the default namespace resolves the issue by allowing ingress traffic to reach the pods.

  1. Run the following command to check whether a hostNetwork configuration exists:

oc get deployment router-default -n openshift-ingress -o jsonpath='{.spec.template.spec.hostNetwork}'
  2. If the command returns true (hostNetwork is configured), run this command to resolve the issue:

oc label namespace default network.openshift.io/policy-group=ingress
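After applying the label, you can confirm it is present. The sketch below is hedged: the oc call requires cluster access, so it is wrapped in a function, and the label check itself is plain text filtering.

```shell
# Sketch: verify the policy-group label on the "default" namespace.

# Cluster call: print the namespace with its labels (requires "oc" and
# cluster access).
show_default_labels() {
  oc get namespace default --show-labels
}

# Pure helper: succeed if the expected label appears in a label string.
has_ingress_label() {
  echo "$1" | grep -q 'network.openshift.io/policy-group=ingress'
}
```

Once the fix has been applied, show_default_labels should list network.openshift.io/policy-group=ingress in the LABELS column.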

"An unexpected authentication error has occurred" error message is displayed

The error message "An unexpected authentication error has occurred" may display when you try to access the Web Console with a web browser. This may occur because:

  • IBM Cloud Pak® foundational services authentication/authorization services are experiencing issues. Look in the Cloud Pak foundational services namespace (usually ibm-common-services) for pods whose names start with auth- and that report error messages.

  • Operations Dashboard does not yet support signing in with SSO or OpenShift authentication.
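To check the first cause, the following sketch lists auth pods that are not in the Running state. Assumptions: the namespace defaults to ibm-common-services, and the output uses the standard oc get pods column layout (name, ready, status).

```shell
# Sketch: list foundational services auth pods that are not Running.

# Cluster call (requires "oc" and access to the namespace).
list_unhealthy_auth_pods() {
  oc get pods -n "${1:-ibm-common-services}" --no-headers | find_unhealthy_auth
}

# Pure helper: from "oc get pods" output, keep auth-* pods whose status
# column (field 3) is not Running.
find_unhealthy_auth() {
  awk '$1 ~ /^auth-/ && $3 != "Running" {print $1}'
}
```

Any pod name this prints is a candidate for closer inspection with oc describe and oc logs.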

The integration capability registration process with the API does not complete successfully

The Operations Dashboard service binding API for automatically registering capabilities is managed by the Operations Dashboard operator. Once the registration process completes successfully, the registration request should appear on the Registration requests page with a status of Approved. If the OperationsDashboardServiceBinding custom resource has been created and the registration process has not completed successfully, complete the following steps:

  • Make sure the prerequisites of the capability registration API are met.

  • Inspect the logs of the Operations Dashboard operator that watches the integration capability namespace (this might be a different pod than the pod that is watching the namespace where the Operations Dashboard instance has been installed), and look for errors.
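The operator-log check can be sketched as follows. The deployment name ibm-integration-operations-dashboard-operator is an assumption; confirm the actual name with oc get deployments in the operator namespace.

```shell
# Sketch: pull recent Operations Dashboard operator logs and filter for
# common error markers. The deployment name below is an assumption.

operator_errors() {
  oc logs "deployment/ibm-integration-operations-dashboard-operator" \
    -n "$1" --tail=500 | grep_errors
}

# Pure helper: case-insensitive match on common error markers.
grep_errors() {
  grep -iE 'error|failed|exception'
}
```

Run operator_errors against the namespace the operator watches; remember this may differ from the namespace where the Operations Dashboard instance itself is installed.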

Manual integration capability registration request does not appear in Operations Dashboard Web Console

When a new capability instance is deployed with IBM Cloud Pak for Integration (the IBM Automation UI), it must register with Operations Dashboard before it can send distributed tracing data. Complete the following steps to troubleshoot a case where a registration request does not appear in the Operations Dashboard Web Console for manual approval:

  • Make sure the instance of the integration capability has been deployed with tracing enabled. If tracing has not been enabled, the instance of the integration capability needs to be redeployed with tracing enabled.

  • Make sure the Operations Dashboard namespace name that was provided during the integration capability installation is correct. If the namespace name is incorrect, the instance of the integration capability needs to be redeployed with the correct namespace name.

  • Registration request approval is required once per integration capability namespace. A new registration request appears only if the namespace of the newly deployed capability instance has not been approved before. If it has been approved before, there is no need to approve it again.

  • A registration job within the integration capability namespace is responsible for making the registration request to Operations Dashboard. Inspect the logs of this job and make sure this request was successful.

  • One of the registration-endpoint containers of Operations Dashboard's front end pods receives the registration requests and logs them. To inspect the requests log, execute bash within these containers and review the file /var/log/httpd/access_log.

  • The registration-processor container of the same Operations Dashboard front end pod analyzes the requests log and executes REST requests to the ui-manager container of the same pod. Inspect the logs of this container and make sure the request was successful.

  • The ui-manager container of the same Operations Dashboard front end pod logs this REST request and inserts it into the configuration database. Inspect the logs of this container and make sure the request processing was successful.

  • On the Registration requests page, try changing the filter from New request to All requests to display requests that have been approved or archived before.
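The access-log check above can be sketched as follows. Hedged: the front end pod name must first be discovered with oc get pods, and the log path comes from the steps above.

```shell
# Sketch: show recent registration requests logged by a
# registration-endpoint container of an Operations Dashboard front end pod.

tail_registration_log() {
  ns="$1"; pod="$2"
  oc exec "$pod" -c registration-endpoint -n "$ns" -- \
    tail -n 50 /var/log/httpd/access_log
}

# Pure helper: keep only POST entries from an httpd access log, since
# registration requests arrive as POSTs.
filter_posts() {
  grep '"POST '
}
```

For example, tail_registration_log my-od-namespace my-frontend-pod | filter_posts shows only the incoming registration POSTs (namespace and pod names here are placeholders).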

No tracing data is displayed

Complete the following steps to troubleshoot a case where no tracing data is displayed:

  • Make sure tracing is supported for your specific use case, as described in the overview. For example, only MQ messages that include an MQRFH2 header or message properties are supported.
  • Make sure the capability registration process has completed successfully, and that the integration capability instance is ready and processing requests.

  • The collector container in the integration capability pod is responsible for sending tracing data to the Operations Dashboard Store. Inspect the logs of this container and make sure no errors are reported. If there are HTTP 401 error messages, this is probably caused by a registration request reprocess or a re-installation of Operations Dashboard, and the integration capability pods need to be restarted so the newly created Secret becomes effective. See registration requests for more information.

  • Try to sign out from Operations Dashboard and sign in again. Operations Dashboard permissions are based on the permissions a user has for namespaces. If the user is a Cluster Administrator, and namespaces are added after the user signs in, the user must sign out and sign in again to force an immediate refresh of the user's permissions. Otherwise, it might take some time until it is refreshed.

  • By default, only ten percent (10%) of traces are sampled. To change this value (for example, to increase it to 100% so that all traces are sampled), navigate to the Sampling policy page.

  • The basic entitlement has some limitations, such as maximum duration of two hours. Make sure there are requests/traces within this time range.
  • Make sure system time is set correctly and is synchronized between all worker nodes.
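The collector-log check above can be sketched like this. Hedged: the integration capability pod name must be found with oc get pods, and the 401 pattern is a plain-text heuristic on the log output.

```shell
# Sketch: check the collector container of an integration capability pod
# for HTTP 401 responses, which usually indicate a stale registration
# Secret after a registration reprocess or re-installation.

collector_401s() {
  ns="$1"; pod="$2"
  oc logs "$pod" -c collector -n "$ns" --tail=200 | count_401
}

# Pure helper: count log lines mentioning an HTTP 401 status.
count_401() {
  grep -c ' 401 '
}
```

A nonzero count suggests restarting the integration capability pods so the newly created Secret takes effect.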

Operations Dashboard store pod fails to start, vm.max_map_count is too low

Sometimes, due to an issue with the tuned pods (managed by the Node Tuning Operator), the Operations Dashboard store pod fails to start with the following error message:

ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

If you observe this error message in the logs, complete the following steps to resolve the issue:

  • Execute the following command to restart all tuned pods: oc get pods -n openshift-cluster-node-tuning-operator | awk '{print $1}' | grep tuned- | xargs -I ppp oc -n openshift-cluster-node-tuning-operator delete pod ppp

  • Wait for the store pod to restart.
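If the problem persists, you can read vm.max_map_count directly on a worker node. Hedged sketch: it assumes cluster-admin access for oc debug, and the 262144 threshold is the one from the bootstrap check above.

```shell
# Sketch: read vm.max_map_count on a worker node via a debug pod
# (requires cluster-admin).
node_map_count() {
  oc debug "node/$1" -- chroot /host sysctl -n vm.max_map_count
}

# Pure helper: succeed if a value meets the store's required minimum
# of 262144 (from the bootstrap check error message).
map_count_ok() {
  [ "$1" -ge 262144 ]
}
```

If map_count_ok fails for the value a node reports, the tuned pod on that node has not applied the sysctl yet.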

MustGather: Gathering data for solving problems

If you are asked to provide a MustGather to the IBM Support team to assist with investigating an issue, complete the following steps. Replace <OD_NAMESPACE> in the following commands (it appears several times) with the namespace where Operations Dashboard is installed in your environment, run the commands, and provide the output:

oc get pods -n <OD_NAMESPACE>
oc get pods -n <OD_NAMESPACE> -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep ibm-integration-o | xargs oc describe pod -n <OD_NAMESPACE>
oc get pods -n <OD_NAMESPACE> -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep ibm-integration-o | xargs -I ppp sh -c 'oc get pod ppp -n <OD_NAMESPACE> -o jsonpath='"'"'{range .spec.initContainers[*]}{.name}{"\n"}{end}'"'"' | xargs -I ccc sh -c '"'"'echo "=== ppp ccc ==="; oc logs -n <OD_NAMESPACE> ppp -c ccc'"'"
oc get pods -n <OD_NAMESPACE> -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep ibm-integration-o | xargs -I ppp sh -c 'oc get pod ppp -n <OD_NAMESPACE> -o jsonpath='"'"'{range .spec.containers[*]}{.name}{"\n"}{end}'"'"' | xargs -I ccc sh -c '"'"'echo "=== ppp ccc ==="; oc logs -n <OD_NAMESPACE> ppp -c ccc'"'"
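The nested xargs quoting in the last two commands can be hard to adapt. The same collection can be sketched as a plain loop; assumptions carried over from the commands above are the ibm-integration-o pod name filter and the init/regular container fields.

```shell
# Readable sketch of the log-gathering commands above.

# Pure helper: strip the "pod/" prefix from "oc get -o name" output.
strip_pod_prefix() {
  sed 's|^pod/||'
}

# Loop over Operations Dashboard pods and dump logs from every init
# container and regular container (requires "oc" and cluster access).
gather_od_logs() {
  ns="$1"
  for pod in $(oc get pods -n "$ns" -o name | strip_pod_prefix | grep ibm-integration-o); do
    containers=$(oc get pod "$pod" -n "$ns" -o jsonpath='{range .spec.initContainers[*]}{.name}{"\n"}{end}{range .spec.containers[*]}{.name}{"\n"}{end}')
    for c in $containers; do
      echo "=== $pod $c ==="
      oc logs -n "$ns" "$pod" -c "$c"
    done
  done
}
```

For example, gather_od_logs my-od-namespace > od-logs.txt (namespace is a placeholder) produces one "=== pod container ===" section per container.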

In addition, collect the following information:

  • A full screenshot of the Operations Dashboard Web Console About modal, which includes the version of the product as well as time zone information (available under the user menu in the top navigation bar).

  • A full screenshot of Internal alerts page.

  • A full screenshot of Scheduler info.

  • Export of traces that demonstrate the issue (if applicable); see the instructions in Exporting a single trace below.

  • Any other screenshot or description that may help recreate the issue.

Exporting a single trace

If you are asked to provide a single trace export to the IBM Support team to assist with investigating an issue, complete the following steps:

  • Locate a trace that demonstrates the issue and copy its trace ID from the Traces page.

  • See "Debug log" in Store status.

  • Use the Trace ID filter to display only spans of that trace.

  • Copy the output.

Enable debug logs

If you are asked by the IBM Support team to increase the log level, complete the following steps:

ui-proxy container

To enable debug logs, change the following values in the following command and execute it:

  • <OD_NAMESPACE> - The namespace where Operations Dashboard is deployed.

  • <OD_POD_NAME> - Operations Dashboard management pod name.

oc exec -it <OD_POD_NAME> -c ui-proxy -n <OD_NAMESPACE> -- sh -c "sed -i 's/LogLevel warn/LogLevel debug/' /etc/httpd/conf/httpd.conf && httpd -k graceful"

To disable debug logs, change the following values in the following command and execute it:

  • <OD_NAMESPACE> - The namespace where Operations Dashboard is deployed.

  • <OD_POD_NAME> - Operations Dashboard management pod name.

oc exec -it <OD_POD_NAME> -c ui-proxy -n <OD_NAMESPACE> -- sh -c "sed -i 's/LogLevel debug/LogLevel warn/' /etc/httpd/conf/httpd.conf && httpd -k graceful"

App Connect (ACE) tracing runtime component

To enable debug logs (in /var/log/ACEOpenTracing), change the following values in the following command and execute it:

  • <ACE_NAMESPACE> - The namespace where App Connect is deployed.

  • <ACE_POD_NAME> - App Connect server pod name.

  • <ACE_CONTAINER_NAME> - App Connect server container name.

oc exec -it <ACE_POD_NAME> -c <ACE_CONTAINER_NAME> -n <ACE_NAMESPACE> -- sh -c "sed -i 's/ = info/ = debug/g' /etc/ACEOpenTracing/loggers.properties"

To disable debug logs (in /var/log/ACEOpenTracing), change the following values in the following command and execute it:

  • <ACE_NAMESPACE> - The namespace where App Connect is deployed.

  • <ACE_POD_NAME> - App Connect server pod name.

  • <ACE_CONTAINER_NAME> - App Connect server container name.

oc exec -it <ACE_POD_NAME> -c <ACE_CONTAINER_NAME> -n <ACE_NAMESPACE> -- sh -c "sed -i 's/ = debug/ = info/g' /etc/ACEOpenTracing/loggers.properties"

Useful CLI commands for gathering information

  • Get a list of pods: oc get pods -n <NAMESPACE>

  • Display logs of a container: oc logs <POD_NAME> -c <CONTAINER_NAME> -n <NAMESPACE>

  • Display a pod configuration (describe a pod): oc describe pod <POD_NAME> -n <NAMESPACE>, or display its full YAML: oc get pod <POD_NAME> -n <NAMESPACE> -o yaml

  • Display recent events: oc get events -n <NAMESPACE>

  • Execute "bash" within a container to access its files and execute commands within the container: oc exec -it <POD_NAME> -c <CONTAINER_NAME> -n <NAMESPACE> -- /bin/bash