Troubleshooting ChatOps
Learn how to isolate and resolve problems with ChatOps in IBM Cloud Pak® for AIOps.
- Slack buttons are unresponsive
- Log preview dialog or View alerts dialog hangs intermittently in an incident in Slack or Microsoft Teams
- Alerts and incidents do not appear in Slack
- Slack ChatOps integration is not working within an air-gapped (offline) environment
- Microsoft Teams ChatOps integration shows no response in the console
- During a ChatOps secure tunnel creation an 'installation failed' message displays
- Show more button does not work in Slack after an upgrade
- Links in ChatOps incident messages are invalid
- Disabling ChatOps in IBM Cloud Pak® for AIOps
Slack buttons are unresponsive
Unresponsive buttons might be caused by a number of issues. If you encounter this issue, try completing the following steps to find the cause of the problem:
-
Ensure two-way communication is established:
- Test two-way communication: Incoming incidents are proof that one-way communication from IBM Cloud Pak for AIOps to Slack is successful. To check two-way communication, tag the IBM Cloud Pak for AIOps bot in the Slack channel. It responds with "Hey there I'm up and running." If it doesn't, IBM Cloud Pak for AIOps might not be receiving any communication from Slack.
- Verify endpoints: If the IBM Cloud Pak for AIOps bot did not respond, Slack might be trying to communicate with the wrong endpoints, or the endpoints might not be publicly accessible. To make sure that the endpoints are correct, open the Slack API page for the app and navigate to the app manifest. The event_subscriptions and interactivity URLs should match the ones that are produced by the IBM Cloud Pak for AIOps Slack configuration and should be verified by Slack. If they match but Slack has not verified them, Slack cannot access the endpoints, most likely because they are private. Slack needs a proxy to access private endpoints; a secure tunnel can be used in this scenario. For more information, see Secure Tunnel.
-
Check communication speed: If there are no problems with the steps that are outlined above, then it is possible the communication between Slack and IBM Cloud Pak for AIOps is taking too long. If a button is clicked, Slack waits 3 seconds to receive a response. If it doesn't get one, the communication is rejected and the button click fails. This is a rare occurrence, but can happen if the network communication is too slow, or if something else is preventing communication from happening in a timely manner. Try reconnecting and reconfiguring the Slack integration. For more information, see Creating a Slack integration. Then, verify that the issue is resolved.
-
Look for errors in the ChatOps pods: If two-way communication is successful and network communication seems to be fine, log in to the cluster and check the logs. Look at the aimanager-aio-chatops-slack-integrator and aimanager-aio-chatops-orchestrator pods for any unusual errors, and then contact the L2 support team.
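As a rough sketch of the log check in the last step, the following shell function lists the pods whose names contain the two pod name fragments mentioned above and prints any log lines that look like errors. The function name `scan_chatops_logs`, the default one-hour window, and the `error|exception|fail` search pattern are illustrative assumptions and are not part of the product. The sketch assumes that you are already logged in with `oc login` and have switched to the project where IBM Cloud Pak for AIOps is installed.

```shell
# Sketch: print suspicious log lines from the ChatOps pods.
# Assumes: already logged in (oc login) and in the IBM Cloud Pak for AIOps project.
scan_chatops_logs() {
  since="${1:-1h}"   # hypothetical default look-back window
  # Pod name fragments are taken from the troubleshooting step above.
  oc get pods -o name \
    | grep -E 'aimanager-aio-chatops-(slack-integrator|orchestrator)' \
    | while read -r pod; do
        echo "== ${pod} =="
        # The search pattern is an assumption; widen or narrow it as needed.
        oc logs "${pod}" --since="${since}" </dev/null 2>&1 \
          | grep -iE 'error|exception|fail' \
          || echo "(no matching lines)"
      done
}
```

For example, `scan_chatops_logs 30m` limits the search to the last 30 minutes. Save the output so that you can share it with the support team.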
Log preview dialog or View alerts dialog hangs intermittently in an incident in Slack or Microsoft Teams
This issue might happen when Log Anomaly detection is under high load and takes longer than normal to respond. When it occurs, you might encounter the following symptoms:
-
When you select View alerts in an incident in Slack or Microsoft Teams, it hangs with the Loading ... message.
Figure. View alerts
-
The Log preview dialog in Slack or Microsoft Teams hangs with the Loading log data.. message.
Figure. Log preview
-
Checking the log results in something similar to this:
Figure. View log
Solution: Close the View alerts dialog or the Log preview dialog in Slack or Microsoft Teams, and retry the operation.
Alerts and incidents not appearing in Slack
Alerts and incidents are showing up in the IBM Cloud Pak for AIOps console but not in Slack ChatOps.
Several causes are possible when alerts and incidents do not show up in Slack. Try one or more of the following checks to help resolve the problem.
-
Make sure that alerts and incidents are being created from events: alerts and incidents can be seen in the Incidents and alerts page in the IBM Cloud Pak for AIOps console. If events are flowing but no alerts or incidents are being created, review Troubleshooting alert management.
-
Double check channel IDs: if alerts and incidents are visible in the IBM Cloud Pak for AIOps console, make sure the channel IDs are correctly pasted in the ChatOps data integration form. Ensure the channel ID rather than the channel name is being used and make sure that no extra blank spaces are in the form field. Instructions for retrieving channel IDs can be found in the Configuring Slack applications for integration page.
-
Ensure that the Slack app is a channel member: to post incident messages, the IBM Cloud Pak for AIOps Slack app must be a member of every channel whose ID is used in the integration. To add the app to a channel, tag the bot in that channel with the "@" symbol followed by the bot name.
-
Look for any visible errors in the chatops-slack-integrator logs. If the IDs are correct and the app is a channel member, look for error messages in the pod and contact the L2 support team.
Note: If the Slack app is not properly configured when an incident is initially created, that incident is not posted to Slack. Only when the app is successfully configured do new incidents, and updates to those incidents, get posted.
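The channel ID check can be partly automated before you paste a value into the form. The following function is a minimal sketch: the name `check_channel_id` is hypothetical, and the pattern relies on the assumption that Slack channel IDs start with `C` or `G` followed by uppercase letters and digits. It only catches the mistakes that are described above (blank spaces, or a channel name instead of an ID); it does not confirm that the channel exists.

```shell
# Sketch: sanity-check a value before pasting it into the ChatOps integration form.
# Assumption: Slack channel IDs start with C or G, followed by uppercase letters and digits.
check_channel_id() {
  id="$1"
  case "${id}" in
    '')            echo "empty value"; return 1 ;;
    *[[:space:]]*) echo "contains blank space"; return 1 ;;
    '#'*)          echo "looks like a channel name, not an ID"; return 1 ;;
  esac
  if printf '%s' "${id}" | grep -Eq '^[CG][A-Z0-9]+$'; then
    echo "looks like a channel ID"
    return 0
  fi
  echo "does not look like a channel ID"
  return 1
}
```

For example, `check_channel_id "C01ABCDEF23 "` reports the trailing blank space, which is easy to miss in a form field.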
Slack ChatOps integration is not working within an air-gapped (offline) environment
If you create a Slack integration within an air-gapped (offline) environment, the integration might not work as expected. When you first create the Slack integration, the connection test (run by clicking Test connection to Slack.com) must be successful. If the test was successful but the Slack integration is still not working, restart the Slack integrator pod to retrieve the latest integration information.
To restart the pod, complete the following steps:
-
From a command line, use the OpenShift CLI oc login command to log in to your cluster.
-
Switch to the project (namespace) where IBM Cloud Pak for AIOps is installed:
oc project <project>
Where <project> is the project (namespace) where IBM Cloud Pak for AIOps is installed.
-
Retrieve the Slack integrator pod name:
oc get pods | grep slack
Record the name of the Slack integrator pod.
-
Delete the pod to cause the pod to restart:
oc delete pod <slack integrator pod>
Where <slack integrator pod> is the name of the Slack integrator pod.
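The steps above can be combined into one small function, shown here as a sketch rather than a supported script. The function name `restart_slack_integrator` is hypothetical, and the project is passed as an argument instead of being set with `oc project`. The sketch assumes that you are already logged in with `oc login`, and it uses the same `grep slack` filter as the manual steps, so review the matching pods first if other pods in the project have `slack` in their names.

```shell
# Sketch: find the Slack integrator pod and delete it so that it restarts.
# Assumes: already logged in with `oc login`; the project is the first argument.
restart_slack_integrator() {
  project="$1"
  pods="$(oc get pods -n "${project}" -o name | grep slack)"
  if [ -z "${pods}" ]; then
    echo "No pod with 'slack' in its name found in ${project}" >&2
    return 1
  fi
  # Deleting the pod causes its controller to create a replacement.
  for pod in ${pods}; do
    oc delete -n "${project}" "${pod}"
  done
}
```

Run it as `restart_slack_integrator <project>`, where `<project>` is the project (namespace) where IBM Cloud Pak for AIOps is installed.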
Microsoft Teams ChatOps integration shows no response in the console
If you have a Microsoft Teams ChatOps integration, you can encounter an issue where the Microsoft Teams console shows no response through the integration. This issue can occur when you install and configure a Secure Tunnel for IBM Cloud Pak for AIOps on an on-premises Red Hat OpenShift Container Platform and configure a corresponding Secure Tunnel connector for Microsoft Teams on a virtual machine (VM) that is accessible from a public network.
If you encounter this issue, complete the following steps:
-
Verify that the Secure Tunnel is working.
- Log in to the IBM Cloud Pak for AIOps console.
- Go to Administration > Secure Tunnel.
- Verify that the Secure Tunnel connection is configured and that the status is Ready.
- Click into the Application mappings column of your Secure Tunnel connection. Verify the URLs listed in the Application mapping address column.
Notes:
- If the status is not Ready, check whether the Secure Tunnel Connector is installed and running. For more information about setting up the connector, see Install the Secure Tunnel Connector.
- If your Secure Tunnel Connector is running, but you do not see the status as Ready for the connection, check the log messages from the Secure Tunnel Connector container.
  - If the log message indicates TLS errors, you might not have the correct Secure Tunnel Connector package. Verify that the installed package is correct. If necessary, download and install the correct package. For more information about installing the connector, see Install the Secure Tunnel Connector.
  - If the log message indicates a connection error, check whether you can access the virtual machine (VM) where the Secure Tunnel Connector is installed from the Secure Tunnel pod.
-
Verify that the network between the Secure Tunnel and the VM on which Secure Tunnel Connector is installed is accessible.
-
Log in to the Red Hat OpenShift Container Platform console where IBM Cloud Pak for AIOps is installed.
-
Go to Workloads > Pods.
-
Switch to the project (namespace) for IBM Cloud Pak for AIOps.
-
Filter the pods to find the secure tunnel pod. For instance, filter by sre-tunnel-network.
-
Click the listed pod. Then, click Terminal.
-
Access the IP address of your VM by running a command similar to the following curl command on port 50443:
curl -k https://your.vm.external.host.com:50443
-
If you get any SSL error from the command, the error indicates that the network connection is working.
-
If you get a message that indicates Connection refused from the command, the message indicates that the network connection is not reachable. Contact your network administrator to fix the network connection.
-
Verify that the configuration of the Microsoft Azure Bot for the ChatOps integration is correct.
- Log in to the Microsoft Azure portal, and click Bot services. Then, click your Azure bot.
- Click Configuration under Settings.
- Check the messaging endpoint setting. Replace the request URL value with the application mapping URL from the Secure Tunnel connection. For more information about the settings for the connection, see Creating a Microsoft Teams connection.
- Attempt to send the message @<bot_name> /waiops_welcome in Microsoft Teams. If the connection still does not work, other issues might exist. Contact your network administrator for assistance with investigating and fixing the network connection.
-
Verify that the network between the Microsoft Azure Bot and the VM on which the Secure Tunnel connector is installed is accessible:
-
Log in to the terminal of the VM on which you installed the Secure Tunnel connector.
-
Find the network interface (NIC) that is accessible from the public network. You can use a command like ip address to find this interface. For example, if your network interface is eth1, listen to port 12443 on that network interface by using the following command:
tcpdump -i eth1 port 12443
-
Test the access to the application mapping URL of your Secure Tunnel connection from your device or from another device outside of your on-premises network. For example, run the following command, where your.vm.external.host.com is your VM host:
curl -k -X POST https://your.vm.external.host.com:12443/aiops/aimanager/instances/1683613041902750/teams/api/messages
-
Verify that there is output from the tcpdump command while you run the curl command.
If you do not find any output from the tcpdump command, you might have firewalls or ACL restrictions that are blocking inbound traffic to the VM. Contact your network administrator to allow inbound traffic on port 12443 for the VM and retest the command.
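The curl checks in the preceding steps are interpreted by reading the error text. As a hedged sketch, the same interpretation can be based on the curl exit code. The code meanings used here (6, 7, 28, 35, 60) follow the curl manual; the function names `classify_curl_exit` and `probe_endpoint`, the message wording, and the 10-second connect timeout are illustrative assumptions.

```shell
# Sketch: interpret the exit code of the curl checks in the steps above.
# Exit code meanings follow the curl manual; the message wording is illustrative.
classify_curl_exit() {
  case "$1" in
    0)     echo "reachable: request completed" ;;
    35|60) echo "reachable: SSL error, so the network path is working" ;;
    7)     echo "unreachable: connection refused or failed" ;;
    6)     echo "unreachable: host name could not be resolved" ;;
    28)    echo "no response: timed out, possibly blocked by a firewall" ;;
    *)     echo "unknown: curl exit code $1" ;;
  esac
}

# Hypothetical wrapper: probe a host and port, then classify the result.
probe_endpoint() {
  host="$1"
  port="$2"
  rc=0
  curl -k -s -o /dev/null --connect-timeout 10 "https://${host}:${port}" || rc=$?
  classify_curl_exit "${rc}"
}
```

For example, run `probe_endpoint your.vm.external.host.com 50443` from the Secure Tunnel pod terminal, or use port 12443 from a device outside your network while tcpdump is running on the VM.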
During a ChatOps secure tunnel creation an 'installation failed' message displays
When you create a ChatOps integration and it fails, wait a few minutes to see whether the installation retries. If it does not, create the integration again.
Show more button does not work in Slack after an upgrade
You might notice an issue where the Show more button does not work in Slack after an IBM Cloud Pak for AIOps upgrade.
This issue occurs because the TLS certificate CA for ChatOps is updated during the upgrade, but the tunnel ApplicationMapping still uses the previous TLS certificate CA.
If you encounter this issue, use the following steps to restart the sre-tunnel-controller-a-* pod. This restart causes the Secure tunnel controller pod to update the TLS certificate CA:
-
Get the name of the Secure tunnel controller pod:
oc get pod | grep sre-tunnel-controller-a-
-
Restart the pod sre-tunnel-controller-a-*:
oc delete pod <Secure tunnel controller pod>
Where <Secure tunnel controller pod> is the name you retrieved in the preceding step.
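After you delete the pod, it can be useful to confirm that a replacement has started before you retry the Show more button. The following polling loop is a sketch under stated assumptions: the function name `wait_for_tunnel_controller` and the default of 30 checks at 10-second intervals are hypothetical, and the check looks only at the STATUS column, so a pod that is Running might still not be fully ready.

```shell
# Sketch: after deleting the pod, poll until a replacement reports Running.
# Assumes: already logged in and in the IBM Cloud Pak for AIOps project.
wait_for_tunnel_controller() {
  attempts="${1:-30}"   # hypothetical: number of checks
  delay="${2:-10}"      # hypothetical: seconds between checks
  i=0
  while [ "${i}" -lt "${attempts}" ]; do
    # Column 3 of `oc get pod --no-headers` is the pod STATUS.
    if oc get pod --no-headers \
        | grep sre-tunnel-controller-a- \
        | awk '$3 == "Running" { found = 1 } END { exit !found }'; then
      echo "Secure tunnel controller pod is Running"
      return 0
    fi
    i=$((i + 1))
    sleep "${delay}"
  done
  echo "Secure tunnel controller pod is not Running after ${attempts} checks" >&2
  return 1
}
```

Call `wait_for_tunnel_controller` right after the `oc delete pod` step. A nonzero return code means that the pod did not reach the Running status within the polling window.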
Links in ChatOps incident messages are invalid
In ChatOps incident messages for IBM Cloud Pak for AIOps 4.6.0, the following incident message links are invalid:
- On the Associated automations message, the View associated automations link is invalid.
- On the Incident resources message, the View incident resource topology link is invalid.
For IBM Cloud Pak for AIOps 4.6.0, go to the incident page to view the associated runbooks and resources.
Disabling ChatOps in IBM Cloud Pak® for AIOps
By default, ChatOps works only after a Slack or Microsoft Teams integration is set up and configured in a policy. However, the ChatOps pods are ready and running regardless of whether an integration is set up. To stop these pods from running, scale the ChatOps Slack or Microsoft Teams integrator pod replicas down to 0 in the deployment.
The same can be done for the ChatOps orchestrator, but this also affects the ServiceNow integration, which relies on the ChatOps orchestrator. If the ServiceNow integration is not being used, the ChatOps orchestrator replicas can safely be scaled down to 0 as well.
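Scaling down can be sketched with `oc scale`. This page does not list the deployment names, so the following function assumes that they contain the same fragments as the pod names used elsewhere in this topic, such as `chatops-slack-integrator` and `chatops-orchestrator`. Confirm the actual names with `oc get deployment` first. The function name `scale_down_chatops` is hypothetical.

```shell
# Sketch: scale ChatOps deployments down to 0 replicas.
# Assumption: deployment names contain the same fragments as the pod names
# (for example chatops-slack-integrator); confirm with `oc get deployment`.
scale_down_chatops() {
  pattern="${1:-chatops-slack-integrator}"   # name fragment to match
  oc get deployment -o name \
    | grep -- "${pattern}" \
    | while read -r deploy; do
        oc scale "${deploy}" --replicas=0 </dev/null
      done
}
```

Run `scale_down_chatops` for the Slack integrator. Run `scale_down_chatops chatops-orchestrator` only if the ServiceNow integration is not being used.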