Comparing IBM Watson OpenScale to open source on AI explainability

4 minute read | April 7, 2020

IBM Watson OpenScale helps organizations detect and correct AI model bias and drift, explain AI outcomes, monitor model accuracy, analyze payloads, and more. There are algorithms available in open source that provide some of these capabilities. Some of these open source algorithms originated at IBM, such as AI Fairness 360 (AIF360) and AI Explainability 360 (AIX360). A question that often gets asked is “What are the advantages of using OpenScale instead of the algorithms available in open source?”

While we contribute to open source projects, we also adopt them into our products and make them enterprise-ready. Hence, the first and most obvious advantage of using OpenScale is that it is enterprise-grade software designed to handle production workloads, which standalone open source algorithms are not built to do on their own. The difference between the two is similar to the difference between an engine and a car. An engine can generate power, but you need to build a great deal around it to make it useful.

Detecting AI bias

There are open source algorithms available for detecting bias in AI models. In an earlier blog post, I explore the bias detection capability in Watson OpenScale and compare it to open source. I show how open-source bias detection algorithms often lead to false positives. Watson OpenScale avoids this problem by using a data perturbation-based bias detection technique. It also gives an advance warning of model bias before the bias has an impact in production. This is another key capability for enterprise-grade AI model management.
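To make the idea of perturbation-based bias detection more concrete, here is a minimal Python sketch. It is an illustration only, not the OpenScale implementation: the function `flip_rate`, the two toy models, and the field names `gender` and `income` are all assumptions. The idea is to change only the protected attribute on each record and measure how often the model's prediction changes.

```python
def flip_rate(model, records, attribute, values):
    """Share of records whose prediction changes when only `attribute` is swapped."""
    changed = 0
    for record in records:
        original = model(record)
        for value in values:
            if value == record[attribute]:
                continue
            # Perturb a copy of the record: same data, different protected value.
            perturbed = dict(record, **{attribute: value})
            if model(perturbed) != original:
                changed += 1
                break
    return changed / len(records)


# Hypothetical toy model that uses gender to set the income threshold.
def biased_model(record):
    threshold = 50 if record["gender"] == "M" else 60
    return "approve" if record["income"] >= threshold else "deny"


# Hypothetical toy model that ignores gender.
def fair_model(record):
    return "approve" if record["income"] >= 50 else "deny"


records = [
    {"gender": "M", "income": 55},
    {"gender": "F", "income": 55},
    {"gender": "M", "income": 80},
    {"gender": "F", "income": 30},
]

biased_rate = flip_rate(biased_model, records, "gender", ["M", "F"])
fair_rate = flip_rate(fair_model, records, "gender", ["M", "F"])
print(biased_rate, fair_rate)
```

A nonzero flip rate means the protected attribute itself is driving some decisions, which is a different signal from comparing outcome rates between groups.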

Explaining AI outcomes

Explainability is another important capability. Watson OpenScale provides two kinds of explanations for AI models: the first is based on local interpretable model-agnostic explanations (LIME), and the second is a proprietary IBM Research algorithm for contrastive explanations.

Let’s first look at the LIME-based algorithm. This algorithm provides a local explanation for the model prediction. In OpenScale we have made multiple enhancements to the LIME algorithm to ensure that it is enterprise-ready and works well in a real-world production environment. I will explain some of these enhancements:

One of the biggest challenges with LIME is that in many cases, the explanation that it generates is not faithful to or reflective of the underlying predictive model. We have seen this happen as much as 25 percent of the time. The problem occurs when LIME struggles to locally approximate the model behavior around the data point. We have worked with IBM Research to identify these situations and made our own proprietary enhancements to the LIME algorithm to generate an accurate local explanation which is also intuitive to understand. We have extensively tested this algorithm on a variety of data and models and have ensured that it generates an accurate explanation all the time.
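One generic way to spot an unfaithful local explanation is to measure how well the local surrogate fits the model around the data point. The sketch below is not the proprietary OpenScale enhancement; it is a simplified, single-feature illustration in which `local_fidelity`, the kernel, and the two toy models are assumptions. It samples perturbations around a point, fits a proximity-weighted line, and reports the weighted R² as a fidelity score.

```python
import math
import random


def local_fidelity(model, x0, width=1.0, n_samples=500, seed=0):
    """Fit a proximity-weighted line around x0; return (slope, weighted R^2)."""
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0.0, width) for _ in range(n_samples)]
    ys = [model(x) for x in xs]
    # Exponential kernel: perturbations closer to x0 count more.
    ws = [math.exp(-((x - x0) ** 2) / (width ** 2)) for x in xs]

    total = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / total
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    ss_res = sum(w * (y - (intercept + slope * x)) ** 2
                 for w, x, y in zip(ws, xs, ys))
    ss_tot = sum(w * (y - y_mean) ** 2 for w, y in zip(ws, ys))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, r2


# A model that is linear near the point: the surrogate is faithful.
linear_slope, linear_r2 = local_fidelity(lambda x: 3 * x + 1, x0=2.0)

# A model that curves around the point: a line cannot describe it locally.
curved_slope, curved_r2 = local_fidelity(lambda x: x * x, x0=0.0)
print(linear_r2, curved_r2)
```

A low score flags an explanation that should not be trusted as is, which is the situation described above where the local approximation fails.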

An advantage of LIME is that its explanations are easy to understand and that it treats all models as black boxes. Therefore, it can generate an explanation for a wide variety of models. However, a disadvantage of LIME is that for every explanation it needs to understand the model behavior in the vicinity of the data point. To do this, it generates data perturbations that are scored against the model. This creates a cost for the customer, as almost all AI platforms charge the user based on the number of scoring requests. In OpenScale, we have come up with an innovative caching-based technique that leads to a very significant drop in the number of scorings required to generate a local explanation. This helps reduce the cost associated with generating an explanation, which is very important when the model is being used in an enterprise setting where the number of explanation requests can potentially be very large. The caching-based technique also leads to a significant improvement in the speed of generating an explanation.
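The details of the OpenScale caching technique are not described here, so the following is only a generic sketch of how caching can cut scoring requests. The class `CachingScorer`, the toy model, and the field names are assumptions. It remembers the score of every perturbed record it has already sent to the model and only forwards records it has not seen before.

```python
class CachingScorer:
    """Wraps a scoring function and reuses results for records already scored."""

    def __init__(self, score_batch):
        self._score_batch = score_batch  # callable: list of records -> list of scores
        self._cache = {}
        self.records_sent = 0  # how many records actually reached the model

    def score(self, records):
        # Records are dicts with hashable values; build a stable cache key.
        keys = [tuple(sorted(r.items())) for r in records]
        # Deduplicate the cache misses while preserving their order.
        missing = list(dict.fromkeys(k for k in keys if k not in self._cache))
        if missing:
            scores = self._score_batch([dict(k) for k in missing])
            self.records_sent += len(missing)
            self._cache.update(zip(missing, scores))
        return [self._cache[k] for k in keys]


# Hypothetical model endpoint that would normally be billed per scored record.
def toy_model(batch):
    return [1.0 if r["age"] == "30-40" and r["owns_home"] else 0.0 for r in batch]


scorer = CachingScorer(toy_model)
perturbations = [
    {"age": "30-40", "owns_home": True},
    {"age": "20-30", "owns_home": True},
    {"age": "30-40", "owns_home": True},   # duplicate of the first record
    {"age": "30-40", "owns_home": False},
]
first = scorer.score(perturbations)
second = scorer.score(perturbations)  # served entirely from the cache
print(scorer.records_sent)
```

In this toy run, eight records are requested across the two calls but only three reach the model, because perturbations over categorical or binned features repeat often.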

Making explainability clearer and more resilient

Another challenge when working in enterprise settings is that things can fail intermittently. For example, when LIME scores the perturbed data, it does so in batches of multiple records. If even one of the scoring requests fails, the LIME explanation fails. We have added fault tolerance to the algorithm in OpenScale to help ensure that it is resilient to all kinds of failures. This ensures that clients are able to generate an explanation even when some of the model scorings fail.
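As a rough illustration of this kind of fault tolerance, the sketch below retries a failed batch and, if it keeps failing, drops it and continues with the scores that did come back. This is an assumed design, not the OpenScale code: `score_with_tolerance`, its parameters, and the flaky toy model are hypothetical.

```python
def score_with_tolerance(score_batch, records, batch_size=2, retries=2,
                         min_fraction=0.5):
    """Score records in batches; retry failures and drop batches that keep failing.

    Returns (kept_records, scores). Raises RuntimeError if too few records survive.
    """
    kept, scores = [], []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        for _ in range(retries + 1):
            try:
                result = score_batch(batch)
            except Exception:
                continue  # failure: try this batch again
            kept.extend(batch)
            scores.extend(result)
            break
    if len(kept) < min_fraction * len(records):
        raise RuntimeError("too many scoring failures to build an explanation")
    return kept, scores


calls = {"count": 0}


# Hypothetical endpoint: the first call hits an outage, negative inputs always fail.
def flaky_model(batch):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("transient outage")
    if any(x < 0 for x in batch):
        raise ValueError("bad payload")
    return [x * 2 for x in batch]


kept, scores = score_with_tolerance(flaky_model, [1, 2, -3, 4, 5, 6])
print(kept, scores)
```

The explanation can then be fitted on the surviving records only, with `min_fraction` guarding against building it from too little data.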

The second explainability algorithm available in OpenScale provides contrastive explanations. This algorithm is specifically tailored to models that accept structured data. We compared open source algorithms that generate contrastive explanations to OpenScale and found that OpenScale is able to generate a contrastive explanation with far fewer scoring requests. As a result, Watson OpenScale generates explanations faster and is easier to understand and use than open source algorithms.
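For readers new to contrastive explanations, the following sketch shows the basic question they answer on structured data: what is the smallest change to the input that would change the outcome? It is a naive search written for illustration, not the IBM Research algorithm; `closest_flip`, the toy loan model, and the candidate values are assumptions.

```python
def closest_flip(model, record, candidates):
    """For each numeric feature, find the candidate value nearest to the current
    one that changes the prediction. Returns {feature: new_value}."""
    original = model(record)
    flips = {}
    for feature, values in candidates.items():
        current = record[feature]
        # Try small changes first so the first flip found is the closest one.
        for value in sorted(values, key=lambda v: abs(v - current)):
            if value == current:
                continue
            if model(dict(record, **{feature: value})) != original:
                flips[feature] = value
                break
    return flips


# Hypothetical loan model on structured data.
def loan_model(record):
    return "approve" if record["income"] - record["debt"] >= 20 else "deny"


applicant = {"income": 40, "debt": 30}
candidates = {
    "income": [30, 40, 50, 60, 70],
    "debt": [0, 10, 20, 30, 40],
}
flips = closest_flip(loan_model, applicant, candidates)
print(flips)
```

Every candidate tried costs one scoring request, which is why the number of requests needed to find a contrastive explanation matters in practice.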

These are just some of the enhancements that we have made in OpenScale to make it enterprise-ready and capable of managing real-world AI production environments at scale. Getting to this point required much exploration and research, with the time and effort of a large team. A similar effort is needed to go from building an engine to building a car. So if you are part of an enterprise, rather than investing the time and energy to convert open source to enterprise-grade, it’s always prudent to use the pre-built, tested car and be assured of its robustness and quality. Bon Voyage!

Get started with Watson OpenScale at no charge here. To learn more about the solution, watch short videos that show how Watson OpenScale can detect and correct bias and drift, and provide explainability for AI outcomes. And read the analyst report that reveals why 451 Research awarded Watson OpenScale its Firestarter Award for innovation.

