Troubleshooting ChatOps

Learn how to isolate and resolve problems with ChatOps in IBM Cloud Pak® for AIOps.

Slack buttons are unresponsive

Unresponsive buttons might be caused by a number of issues. If you encounter this issue, try completing the following steps to find the cause of the problem:

  1. Ensure two-way communication is established:

    1. Test two-way communication: Incoming incidents are proof that one-way communication from IBM Cloud Pak for AIOps to Slack is successful. To check two-way communication, tag the IBM Cloud Pak for AIOps bot in the Slack channel. It responds with "Hey there I'm up and running." If it doesn't, IBM Cloud Pak for AIOps might not be receiving any communication from Slack.
    2. Verify endpoints: If the IBM Cloud Pak for AIOps bot did not respond, Slack might be trying to communicate with the wrong endpoints, or the endpoints might not be publicly accessible. To make sure that the endpoints are correct, open the Slack API page for the app and navigate to the app manifest. The event_subscriptions and interactivity URLs should match the ones that are produced by the IBM Cloud Pak for AIOps Slack configuration and should be verified by Slack. If they match but Slack has not verified them, Slack cannot access the endpoints, most likely because they are private. Slack needs a proxy to reach private endpoints. A secure tunnel can be used in scenarios like this. For more information, see Secure Tunnel.
  2. Check communication speed: If the previous checks pass, the communication between Slack and IBM Cloud Pak for AIOps might be taking too long. When a button is clicked, Slack waits 3 seconds to receive a response. If it does not get one, the communication is rejected and the button click fails. This is a rare occurrence, but it can happen if the network is too slow or if something else prevents a timely response. Try reconnecting and reconfiguring the Slack integration. For more information, see Creating a Slack integration. Then, verify that the issue is resolved.

  3. Look for errors in the ChatOps pods: If two-way communication is successful and network communication seems to be fine, log in to the cluster and look at the logs. Check the aimanager-aio-chatops-slack-integrator and aimanager-aio-chatops-orchestrator pods for any unusual errors, and then contact the L2 support team.
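
Steps 2 and 3 above can be approximated from a command line. The following is a minimal sketch, not an official procedure. The function names are hypothetical, the URL passed to time_endpoint is a placeholder for the request URL in your Slack app manifest, and the keywords in the log filter are an assumption about what is worth flagging. An empty POST is only a rough stand-in for a real Slack request. Whatever status the endpoint returns, the elapsed time is the value of interest.

```shell
# Hypothetical helpers; none of these names are part of the product.

# Print the total time, in seconds, that one POST to the given URL takes.
# The URL is a placeholder: pass the request URL from your Slack app manifest.
time_endpoint() {
  curl -k -s -o /dev/null --max-time 10 -w '%{time_total}\n' -X POST "$1"
}

# Succeed when a duration in seconds is under the 3-second Slack limit.
within_slack_deadline() {
  awk -v t="$1" 'BEGIN { exit !(t + 0 < 3) }'
}

# Print recent log lines that look like errors for both ChatOps pods.
# Assumes you are logged in with oc and switched to the right project.
show_chatops_errors() {
  for name in aimanager-aio-chatops-slack-integrator aimanager-aio-chatops-orchestrator; do
    oc get pods -o name | grep "$name" | while read -r pod; do
      echo "== $pod"
      oc logs "$pod" --all-containers --tail=500 \
        | grep -iE 'error|exception|timed out' || echo "(no matching lines)"
    done
  done
}

# Example use (commented out so that loading this file runs nothing):
#   t=$(time_endpoint "https://<your request URL>")
#   within_slack_deadline "$t" || echo "Response took ${t}s, over the limit"
#   show_chatops_errors
```

A single timing is only a snapshot, so repeating the measurement a few times gives a better picture of whether the 3-second limit is at risk.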

Log preview modal or View alerts modal hangs intermittently in an incident in Slack or Microsoft Teams

This issue might happen when Log Anomaly detection is under high load and takes longer than normal to respond. When it occurs, you might encounter the following symptoms:

  • When you select View alerts in an incident in Slack or Microsoft Teams, it hangs with the Loading ... message.

    Figure. View alerts

  • The Log preview dialog in Slack or Microsoft Teams hangs with the Loading log data.. message.

    Figure. Log preview

  • Checking the log results in something similar to this:

    Figure. View log

Solution: Close the View alerts dialog or the Log preview dialog in Slack or Microsoft Teams, and retry the operation.

Alerts and incidents not appearing in Slack

Alerts and incidents are showing up in the IBM Cloud Pak for AIOps console but not in Slack ChatOps.

Several causes are possible when alerts do not show up in Slack. Try one or more of the following checks to help resolve the problem.

  1. Make sure that alerts and incidents are being created from events: Alerts and incidents can be seen in the Incidents and alerts page in the IBM Cloud Pak for AIOps console. If events are flowing but no alerts or incidents are being created, review Troubleshooting alert management.

  2. Double-check channel IDs: If alerts and incidents are visible in the IBM Cloud Pak for AIOps console, make sure that the channel IDs are correctly pasted in the ChatOps data integration form. Ensure that the channel ID, rather than the channel name, is used, and that the form field contains no extra blank spaces. Instructions for retrieving channel IDs can be found in the Configuring Slack applications for integration page.

  3. Ensure that the Slack app is a channel member: To post incident messages, the IBM Cloud Pak for AIOps Slack app must be a member of every channel whose ID is used. To add an application to a channel, tag the bot in that channel with the "@" symbol followed by the bot name.

  4. Look for any visible errors in the chatops-slack-integrator logs. If the IDs are correct and the app is a channel member, look for error messages in the pod and contact the L2 support team.

Note: If the Slack app is not properly configured when an incident is initially created, that incident is not posted to Slack. Only when the app is successfully configured do new incidents, and updates to those incidents, get posted.
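
The channel ID check in step 2 can be done before a value is pasted into the form. The following is a minimal sketch with hypothetical helper names. It assumes that a channel ID starts with C or G and contains only uppercase letters and digits, which matches commonly seen Slack IDs but is not taken from this document, so treat a failed check as a prompt to look again rather than as proof of an error.

```shell
# Hypothetical helpers for checking a value before pasting it into the form.

# Strip all whitespace (spaces, tabs, newlines) from the value.
normalize_channel_id() {
  printf '%s' "$1" | tr -d '[:space:]'
}

# Succeed only when the value looks like a channel ID rather than a channel name.
# Assumption: IDs start with C or G and use only uppercase letters and digits.
looks_like_channel_id() {
  case "$1" in
    [CG]?*) ;;            # plausible first character, at least two characters
    *) return 1 ;;        # empty, "#name", or any other leading character
  esac
  case "$1" in
    *[![:upper:][:digit:]]*) return 1 ;;   # lowercase, spaces, punctuation
  esac
}

# Example use (commented out so that loading this file runs nothing):
#   id=$(normalize_channel_id "  C024BE91L ")
#   looks_like_channel_id "$id" && echo "OK: $id" || echo "Check this value: $id"
```

Normalizing first and validating second keeps the two concerns separate: stray blanks are repaired silently, while a channel name such as #general is rejected.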

Slack ChatOps integration is not working within an air-gapped (offline) environment

If you create a Slack integration within an air-gapped (offline) environment, the integration might not work as expected. When you first create the Slack integration, the test of the integration (run by clicking Test connection to Slack.com) must be successful. If the test was successful but the Slack integration is still not working, restart the Slack integrator pod so that it retrieves the latest integration information.

To restart the pod, complete the following steps:

  1. From a command line, use the OpenShift CLI oc login command to log in to your cluster.

  2. Switch to the project (namespace) where IBM Cloud Pak for AIOps is installed:

    oc project <project>
    

    Where <project> is the project (namespace) where IBM Cloud Pak for AIOps is installed.

  3. Retrieve the Slack integrator pod name:

    oc get pods | grep slack
    

    Record the name of the Slack integrator pod.

  4. Delete the pod to cause the pod to restart:

    oc delete pod <slack integrator pod>
    

    Where <slack integrator pod> is the pod name that you recorded in the previous step.
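
Steps 3 and 4 can be combined so that the pod name does not need to be copied by hand. The following is a minimal sketch under the same assumptions as the steps above: you are already logged in and switched to the correct project. The function name restart_slack_integrator is hypothetical. The sketch refuses to delete anything unless exactly one pod name matches slack, which is a safety choice of this example and not a product requirement.

```shell
# Hypothetical helper that combines steps 3 and 4.
# Assumes you already ran oc login and oc project <project>.
restart_slack_integrator() {
  # -o name prints one resource per line, for example pod/<name>.
  pods=$(oc get pods -o name | grep slack || true)
  count=$(printf '%s\n' "$pods" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "Expected exactly one pod matching 'slack', found $count:" >&2
    printf '%s\n' "$pods" >&2
    return 1
  fi
  # Deleting the pod causes it to be re-created, which restarts it.
  oc delete "$pods"
}

# Example use (commented out so that loading this file runs nothing):
#   restart_slack_integrator
```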

Microsoft Teams ChatOps integration shows no response in the console

If you have a Microsoft Teams ChatOps integration, you can encounter an issue where the Microsoft Teams console shows no response through the integration. This issue can occur when you install and configure a Secure Tunnel for IBM Cloud Pak for AIOps on an on-premises Red Hat OpenShift Container Platform and configure a corresponding Secure Tunnel connector for Microsoft Teams on a virtual machine (VM) that is accessible from a public network.

If you encounter this issue, complete the following steps:

  1. Verify that the Secure Tunnel is working.

    1. Log in to the IBM Cloud Pak for AIOps console.
    2. Go to Administration > Secure Tunnel.
    3. Verify that the Secure Tunnel connection is configured and that the status is Ready.
    4. Click into the Application mappings column of your Secure Tunnel connection. Verify the URLs that are listed in the Application mapping address column.

    Notes:

    • If the status is not Ready, check whether the Secure Tunnel Connector is installed and running. For more information about setting up the connector, see Install the Secure Tunnel Connector.
    • If your Secure Tunnel Connector is running, but you do not see the status as Ready for the connection, check the log messages from the Secure Tunnel Connector container.
      • If the log message indicates TLS errors, you might not have the correct Secure Tunnel Connector package. Verify that the installed package is correct. If necessary, download and install the correct package. For more information about installing the connector, see Install the Secure Tunnel Connector.
      • If the log message indicates a connecting error, check whether you can access the virtual machine (VM) where the Secure Tunnel Connector is installed from the Secure Tunnel pod.
  2. Verify that the network between the Secure Tunnel and the VM on which Secure Tunnel Connector is installed is accessible.

    1. Log in to the Red Hat OpenShift Container Platform console where IBM Cloud Pak for AIOps is installed.

    2. Go to Workloads > Pods.

    3. Switch to the project (namespace) for IBM Cloud Pak for AIOps.

    4. Filter the pods to find the secure tunnel pod. For instance, filter by sre-tunnel-network.

    5. Click the listed pod. Then, click Terminal.

    6. Access the IP address of your VM by running a command similar to the following curl command on port 50443:

      curl -k https://your.vm.external.host.com:50443
      
      • If the command returns an SSL error, the network connection is working.

      • If the command returns a message that indicates Connection refused, the VM is not reachable over the network. Contact your network administrator to fix the network connection.

  3. Verify that the configuration of the Microsoft Azure Bot for the ChatOps integration is correct.

    1. Log in to the Microsoft Azure portal, and click Bot services. Then, click your Azure bot.
    2. Click Configuration under Settings.
    3. Check the messaging endpoint setting. Replace the request URL value with the application mapping URL from the Secure Tunnel connection. For more information about the settings for the connection, see Creating a Microsoft Teams connection.
    4. Attempt to send the message @<bot_name> /waiops_welcome in Microsoft Teams. If the connection still does not work, other issues might exist. Contact your network administrator for assistance with investigating and fixing the network connection.
  4. Verify that the network between the Microsoft Azure Bot and the VM on which the Secure Tunnel connector is installed is accessible:

    1. Log in to the terminal of the VM on which you installed the Secure Tunnel connector.

    2. Find the network interface (NIC) that is accessible from the public network. You can use a command like ip address to find this interface. For example, if your network interface is eth1, listen on port 12443 on that network interface by using the command tcpdump -i eth1 port 12443.

    3. Test the access to the application mapping URL of your Secure Tunnel connection from your device or from another device outside of your on-premises network. For example, run the following command, where your.vm.external.host.com is your VM host.

      curl -k -X POST https://your.vm.external.host.com:12443/aiops/aimanager/instances/1683613041902750/teams/api/messages
      
    4. Verify that there are outputs from the tcpdump command while you run the curl command.

      If you do not find any output from the tcpdump command, you might have firewalls or ACL restrictions that are blocking inbound traffic to the VM. Contact your network administrator to allow inbound traffic on port 12443 for the VM, and then retest the command.
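
The curl checks in steps 2 and 4 come down to reading how curl failed. The following is a minimal sketch that maps a curl exit status to what it suggests about the network. The function names are hypothetical. The mapping covers only a few well-known curl exit statuses (0, 6, 7, 28, 35, 60); other TLS-related statuses exist and are reported as unknown here.

```shell
# Hypothetical helper: map a curl exit status to what it suggests about the network.
classify_curl_exit() {
  case "$1" in
    0)     echo "reachable: request completed" ;;
    35|60) echo "reachable: TLS error, so the TCP connection was made" ;;
    7)     echo "unreachable: connection refused or blocked" ;;
    28)    echo "unreachable: timed out, possibly a firewall dropping traffic" ;;
    6)     echo "unresolved: host name did not resolve" ;;
    *)     echo "unknown: curl exit status $1" ;;
  esac
}

# Run curl with the given arguments, discard the response, and report the result.
probe() {
  status=0
  curl -k -s -o /dev/null --connect-timeout 10 "$@" || status=$?
  classify_curl_exit "$status"
}

# Example use (commented out so that loading this file runs nothing):
#   probe https://your.vm.external.host.com:50443
#   probe -X POST "https://your.vm.external.host.com:12443/<application mapping path>"
```

A timeout and a refused connection are reported separately because they usually point to different causes: a refused connection suggests that nothing is listening on the port, while a timeout is more typical of traffic that is silently dropped.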

During ChatOps secure tunnel creation, an 'installation failed' message displays

When you create a ChatOps integration and it fails, wait a few minutes to see whether the installation retries. If it does not, create the integration again.

Show more button does not work in Slack after an upgrade

You might notice an issue where the Show more button does not work in Slack after an IBM Cloud Pak for AIOps upgrade.

The reason for this issue is that the TLS certificate CA for ChatOps is updated through the upgrade, and the tunnel ApplicationMapping is still using the previous TLS certificate CA.

If you encounter this issue, use the following steps to restart the pod sre-tunnel-controller-a-. This restart causes the Secure tunnel controller pod to update the TLS certificate CA:

  1. Get the name of the Secure tunnel controller pod:

    oc get pod | grep sre-tunnel-controller-a-
    
  2. Restart the pod sre-tunnel-controller-a-*:

    oc delete pod <Secure tunnel controller pod>
    

    Where <Secure tunnel controller pod> is the name that you retrieved in the preceding step.
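
After the delete, it can help to confirm that a replacement pod is running before you retry the Show more button. The following is a minimal sketch with a hypothetical function name. It assumes the default column order of oc get pod output (NAME, READY, STATUS, RESTARTS, AGE) and that you are in the project where IBM Cloud Pak for AIOps is installed.

```shell
# Hypothetical helper: print the names of Running pods whose name starts with a prefix.
# Relies on the default columns of oc get pod: NAME READY STATUS RESTARTS AGE.
running_pods_with_prefix() {
  oc get pod --no-headers \
    | awk -v p="$1" 'index($1, p) == 1 && $3 == "Running" { print $1 }'
}

# Example use (commented out so that loading this file runs nothing):
#   running_pods_with_prefix sre-tunnel-controller-a-
```

If the printed name differs from the one that you deleted, the pod was re-created. An empty result means that the replacement is not yet in the Running state, so run the check again after a short wait.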