IBM Cloud Pak® for Watson AIOps has partnered with IBM Research and customers to help make AI more explainable.
In the IBM Cloud Pak® for Watson AIOps, we have artificial intelligence (AI) that helps clients and users manage their applications, IT infrastructure and services. It’s AI that can use log, metric, topology, event and ticket data and chat history to learn normal behaviour and help customers avoid issues, resolve them faster when they do occur and automate resolutions. But how do you trust that the artificial intelligence is doing what you want it to do?
Establishing a foundation of trust
For a start, we work closely with our colleagues in IBM Research — one of the largest industrial research organizations in the world. We embrace “inner source” — the sharing of ideas and technology, developing them together for the common good of our customers.
From a data science perspective, there are a lot of tests that can be performed to determine the accuracy of AI. Those tests often rely on data sets that have an associated “ground truth,” which records the expected behaviour so that we can measure how well the AI replicates it. The data is typically provided by clients who work closely with us to help meet the desired use cases. This way, when other clients use our AI, they are using something that has been tested and validated with real-world data and honest feedback.
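As a rough illustration of what testing against a ground truth can look like, the sketch below scores a set of predicted anomalies against labelled ones using precision and recall. It is a minimal example under stated assumptions, not the validation suite used for the product: the function name, the ticket-style identifiers and the choice of metrics are all hypothetical.

```python
def precision_recall(predicted, ground_truth):
    """Compare predicted items with labelled ground truth.

    precision: share of predictions that were correct.
    recall:    share of labelled items that were found.
    """
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical labels supplied by a client, and hypothetical model output.
labelled_anomalies = {"t03", "t07", "t12", "t15"}
predicted_anomalies = {"t03", "t07", "t09", "t12"}

precision, recall = precision_recall(predicted_anomalies, labelled_anomalies)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Three of the four predictions match a label, and three of the four labels were found, so both scores come out at 0.75 for this made-up data.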
Insights, understanding and decisions
Ultimately, however, the best way to ensure that our users trust AI is by making it easily explainable. We present insights and allow users to understand how each one was reached.
Example 1: Temporal correlation
In the following example, we have presented a group of events that tend to co-occur, using our Temporal Correlation algorithm. To help build trust, a user can drill down to a view that shows them why we made the decision to group them together. Every green line represents an occurrence of the event, and the user can immediately see that most of the time, these events occur together.
A strength of the algorithm is that it does not need 100% overlap between events to determine that they tend to occur together. You can see this in the chart, where the events sometimes do not occur together, but the relationship is still discovered.
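To make the idea of grouping without 100% overlap concrete, here is a minimal sketch that buckets event occurrences into fixed time windows and scores two events by the Jaccard overlap of those windows. This is an illustrative stand-in, not the Temporal Correlation algorithm itself: the window size, the threshold and all names are assumptions.

```python
def window_ids(timestamps, window_seconds=300):
    """Map each occurrence (epoch seconds) to the fixed window it falls in."""
    return {int(ts) // window_seconds for ts in timestamps}


def co_occurrence_score(a, b, window_seconds=300):
    """Jaccard overlap of the windows in which each event fired (0.0 to 1.0)."""
    wa = window_ids(a, window_seconds)
    wb = window_ids(b, window_seconds)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def tend_to_co_occur(a, b, threshold=0.6, window_seconds=300):
    """Group two events when most, not necessarily all, windows are shared."""
    return co_occurrence_score(a, b, window_seconds) >= threshold


# Hypothetical occurrences in seconds. Event B misses one occurrence of
# event A (at 10800) and fires once on its own (at 20000).
event_a = [0, 3600, 7200, 10800, 14400]
event_b = [30, 3630, 7230, 14430, 20000]

print(f"{co_occurrence_score(event_a, event_b):.2f}")  # prints 0.67
print(tend_to_co_occur(event_a, event_b))              # prints True
```

Fixed windows are the simplest choice, but two occurrences a few seconds apart can land on either side of a window boundary and be counted as separate. A sliding window or a tolerance around each occurrence would avoid that.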
Example 2: Metric anomaly
In this insight, we highlight that an anomaly has occurred because a metric, Number of Active Connections, “is now a flat line, where before it was varying.” The user can drill down into a view that shows the history of the metric over time, together with a baseline and a red zone indicating precisely where the anomaly is occurring. The user can see that, previously, the metric has occasionally had a value of zero, but now it is at zero for much longer than normal. This is a good indication that the service has been interrupted or stopped.
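The reasoning behind this insight can be sketched in a few lines: learn how long the metric has ever stayed flat in its history, then flag the present when it has been flat for much longer than that. The code below is a simplified illustration under that assumption, not the product's anomaly detector; the function names and the tolerance factor are hypothetical.

```python
def longest_flat_run(values):
    """Length of the longest run of consecutive identical values."""
    longest = current = 0
    previous = object()  # sentinel that equals no metric value
    for value in values:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


def trailing_flat_run(values):
    """Length of the run of identical values at the end of the series."""
    if not values:
        return 0
    run = 1
    for value in reversed(values[:-1]):
        if value != values[-1]:
            break
        run += 1
    return run


def is_flat_line_anomaly(history, recent, tolerance=2.0):
    """Flag when the metric is flat for far longer than it ever was before."""
    baseline = max(longest_flat_run(history), 1)
    return trailing_flat_run(recent) > tolerance * baseline


# Hypothetical samples of Number of Active Connections.
history = [12, 15, 0, 0, 9, 14, 11, 0, 13, 10]  # short dips to zero are normal
recent = [11, 0, 0, 0, 0, 0, 0]                 # stuck at zero

print(is_flat_line_anomaly(history, recent))      # prints True
print(is_flat_line_anomaly(history, [11, 0, 0]))  # prints False
```

Because the baseline comes from the metric's own history, brief dips to zero that happened before are not flagged. Only a flat stretch well beyond the historical maximum is.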
Example 3: Seasonal events
For our final example, we use AI to highlight when events are occurring at non-random times. Knowing that an event occurs with a certain regular frequency is a good indication that you might be fixing the same problem over and over again. This is something that should be automated away, or the underlying cause should be addressed once and for all. Alternatively, it might show that this event is just noise that experienced operators know to ignore, so it would be good to filter it out altogether. To build trust, the user can drill down, where we present simple, concise statements and easy-to-understand visualisations, as shown in the following diagram:
Knowing that this event seems to always occur on Fridays between 2pm and 3pm is useful information. The user can also see that it is not occurring at any other time. Through explainable AI, the user can build trust that the same enrichment on other events is doing what is expected.
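A statement such as "always on Fridays between 2pm and 3pm" can be derived with a very simple count. The sketch below tallies occurrences by weekday and hour and reports a slot only when it holds most of the occurrences. It is an assumption-laden illustration rather than the seasonality detection used in the product; the function name and both thresholds are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def dominant_time_slot(timestamps, min_share=0.8, min_events=4):
    """Return (day name, hour, share) if one weekday/hour slot dominates.

    Returns None when there are too few events to judge, or when no
    single slot holds at least min_share of the occurrences.
    """
    if len(timestamps) < min_events:
        return None
    slots = Counter((ts.weekday(), ts.hour) for ts in timestamps)
    (weekday, hour), count = slots.most_common(1)[0]
    share = count / len(timestamps)
    if share < min_share:
        return None
    return DAY_NAMES[weekday], hour, share


# Hypothetical occurrences: four consecutive Fridays, all between 14:00 and 15:00.
fridays = [
    datetime(2024, 1, 5, 14, 10),
    datetime(2024, 1, 12, 14, 25),
    datetime(2024, 1, 19, 14, 5),
    datetime(2024, 1, 26, 14, 40),
]

print(dominant_time_slot(fridays))  # prints ('Friday', 14, 1.0)
```

Returning the share alongside the slot keeps the result explainable: the user sees not only when the event tends to occur but how consistently it does so.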
Why is trust so important? The primary goal of AI is to help make our lives better and more efficient. If you trust the AI, you will be more likely to put it to use. When you understand the AI and are confident that it is doing what you expect, you know your time is well spent taking action, investigating, triaging and automating the resolution: avoiding incidents, resolving them faster and resolving them automatically the next time they occur.
Let us help you build trust in IBM Cloud Pak® for Watson AIOps.