IBM Cloud Pak® for Watson AIOps has partnered with IBM Research and customers to help make AI more explainable.

In IBM Cloud Pak® for Watson AIOps, artificial intelligence (AI) helps clients and users manage their applications, IT infrastructure and services. It can use log, metric, topology, event and ticket data, as well as chat history, to learn normal behaviour and help customers avoid issues, resolve them faster when they do occur and automate resolutions. But how do you trust that the AI is doing what you want it to do?

Establishing a foundation of trust

For a start, we work closely with our colleagues in IBM Research, one of the largest industrial research organizations in the world. We embrace “inner source”: sharing ideas and technology and developing them together for the common good of our customers.

From a data science perspective, there are many tests that can be performed to determine the accuracy of AI. Those tests often rely on data sets with an associated “ground truth” that records the expected behaviour, so we can measure how well the AI replicates it. The data is typically provided by clients who work closely with us to help meet the desired use cases. This way, when other clients use our AI, they are using something that has been tested and validated with real-world data and honest feedback.
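
As an illustration of what a ground-truth test can look like, here is a minimal sketch that scores a set of flagged events against client-labelled events using precision and recall. The function name, the event IDs and the choice of metrics are assumptions made for this example; they are not taken from the product or its test suite.

```python
def precision_recall(predicted, ground_truth):
    """Score predicted positives against ground-truth positives.

    Hypothetical helper for illustration only.
    """
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    # Precision: how many flagged items were really problems.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: how many real problems were flagged.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Made-up event IDs: what the AI flagged vs. what the client labelled.
flagged = ["evt-1", "evt-2", "evt-3", "evt-7"]
labelled = ["evt-1", "evt-2", "evt-3", "evt-4"]

precision, recall = precision_recall(flagged, labelled)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both numbers matters: high precision with low recall means the AI is trustworthy when it speaks but misses issues, while the reverse means it catches issues but adds noise.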

Insights, understanding and decisions

Ultimately, however, the best way to ensure that our users trust AI is to make it easily explainable. We present insights and let users see how each one was reached.

Example 1: Temporal correlation

In the following example, we present a group of events that tend to co-occur, found using our Temporal Correlation algorithm. To help build trust, a user can drill down to a view that shows why we decided to group them together. Every green line represents an occurrence of the event, and the user can immediately see that most of the time, these events occur together.

The strength of the algorithm can be seen in that we don’t need 100% overlap of events to determine that they tend to occur together. You can see this in the chart, where sometimes the events don’t occur together, but the relationship is still discovered:
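
To make the idea of partial overlap concrete, here is a minimal sketch. It is not the Temporal Correlation algorithm itself, whose details are not described here. It assumes a simple stand-in: bucket each event's timestamps into fixed windows and compare the two sets of windows with a Jaccard score. The function names, the 60-second window, the sample timestamps and the 0.6 threshold are all hypothetical.

```python
def window_set(timestamps, window_seconds=60):
    """Map timestamps (in seconds) to the set of time windows they fall in."""
    return {int(ts // window_seconds) for ts in timestamps}


def co_occurrence(a_times, b_times, window_seconds=60):
    """Jaccard overlap of the windows in which two events occurred."""
    a = window_set(a_times, window_seconds)
    b = window_set(b_times, window_seconds)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up occurrence times for two events, in seconds.
disk_full = [10, 305, 610, 915, 1210]
db_slow = [25, 320, 640, 1230, 1500]

score = co_occurrence(disk_full, db_slow)
# Hypothetical threshold: group the events even without 100% overlap.
should_group = score >= 0.6
print(f"overlap={score:.2f} group={should_group}")
```

In this sample, each event occurs once without the other, yet four of the six occupied windows are shared, so the pair still clears the assumed threshold. One limitation of fixed windows is that two occurrences a few seconds apart can land in different windows; a real implementation would need to handle that.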

Example 2: Metric anomaly

In this insight, we highlight that an anomaly has occurred because a metric, Number of Active Connections, “is now a flat line, where before it was varying.” The user can drill down into a view like the one shown below to view the history of the metric over time, together with a baseline and a red zone indicating precisely where the anomaly is occurring. The user can see that, previously, the metric has occasionally had a value of zero, but now it is at zero for much longer than normal. This is a good indication that the service has been interrupted or stopped:
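
The reasoning in this example (occasional zeros are normal, a long unbroken run is not) can be sketched as follows. This is an assumed heuristic for illustration, not the product's anomaly detector: it compares the longest constant run in recent samples with the longest constant run seen in the history. The function names, the `factor` parameter and the sample values are hypothetical.

```python
def longest_constant_run(values):
    """Length of the longest run of identical consecutive values."""
    longest = current = 0
    previous = object()  # sentinel that never equals a metric value
    for value in values:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


def is_flat_line(history, recent, factor=3):
    """Flag the recent samples if they stay constant far longer than usual.

    `factor` is an assumed sensitivity setting: how many times longer than
    the historical maximum a constant run must be before it is flagged.
    """
    baseline = max(longest_constant_run(history), 1)
    return longest_constant_run(recent) >= factor * baseline


# Made-up samples of "Number of Active Connections".
history = [12, 9, 0, 0, 14, 11, 0, 8, 13, 10]  # brief zeros are normal
stuck = [0, 0, 0, 0, 0, 0, 0, 0]               # zero for much longer
varying = [7, 0, 0, 9, 12, 0, 5, 11]

print(is_flat_line(history, stuck), is_flat_line(history, varying))
```

Comparing against the metric's own history is what lets the explanation say "zero for much longer than normal" rather than simply "zero".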

Example 3: Seasonal events

For our final example, we use AI to highlight when events are occurring at non-random times. Knowing that an event occurs with a certain regular frequency is a good indication that you might be fixing the same problem over and over again. That work should be automated away, or the underlying cause addressed once and for all. Alternatively, the pattern might show that the event is just noise that experienced operators know to ignore, in which case it would be better to filter it out altogether. To build trust, the user can drill down to a view with simple, concise statements and easy-to-understand visualisations, as shown in the following diagram:

Knowing that this event seems to always occur on Fridays between 2pm and 3pm is good information. The user can also see that it is not occurring at any other time. Through explainable AI, the user can build trust that the AI is doing what is expected for other events enriched in the same way.
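
A statement such as "always occurs on Fridays between 2pm and 3pm" can be derived from raw occurrence times with very little code. The sketch below is an assumption-laden illustration, not the product's seasonality detection: it counts occurrences per (weekday, hour) slot and reports the most common slot with its share of all occurrences. The function name and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def dominant_slot(timestamps):
    """Return (weekday, hour, share) for the most common weekday/hour slot."""
    if not timestamps:
        raise ValueError("need at least one occurrence")
    counts = Counter((ts.weekday(), ts.hour) for ts in timestamps)
    (weekday, hour), hits = counts.most_common(1)[0]
    return WEEKDAYS[weekday], hour, hits / len(timestamps)


# Made-up occurrences: four on Friday afternoons, one stray on a Wednesday.
occurrences = [
    datetime(2023, 3, 3, 14, 5),
    datetime(2023, 3, 10, 14, 40),
    datetime(2023, 3, 17, 14, 12),
    datetime(2023, 3, 24, 14, 55),
    datetime(2023, 3, 29, 9, 30),
]

weekday, hour, share = dominant_slot(occurrences)
print(f"{share:.0%} of occurrences fall on {weekday} in the hour starting {hour}:00")
```

A high share in a single slot supports a plain-language explanation; a low share suggests the event is not seasonal and should not be described as such.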


Why is trust so important? The primary goal of AI is to help make our lives better and more efficient. If you trust the AI, you will be more likely to put it to use. When you are confident that the AI is doing what you expect and you can understand it, then you feel confident knowing your time is well spent taking action, investigating, triaging and automating the resolution — avoiding incidents, resolving them faster and resolving them automatically the next time they occur.

Let us help you build trust in IBM Cloud Pak® for Watson AIOps.
