
Published: 8 October 2024
Contributors: Amanda McGrath, Alexandra Jonker

What is AI interpretability?

AI interpretability helps people better understand and explain the decision-making processes that power artificial intelligence (AI) models.  

AI models use a complex web of data inputs, algorithms, logic, data science and other processes to return insights. The more complex the model, the more difficult it can be for humans to understand the steps that led to its insights—even if those humans are the ones who designed and built it. An interpretable model is one whose decisions can be easily understood by users.

The use of AI is expanding. AI systems are becoming routine parts of daily life, from smart home devices and credit card fraud detection to the broad use of ChatGPT and other generative AI tools built on large language models (LLMs). As highly complex models, including deep-learning algorithms and neural networks, become more common, AI interpretability becomes more important.

Additionally, AI systems and machine-learning algorithms are increasingly prevalent in healthcare, finance and other industries that involve critical or life-altering decisions. With such high stakes, the public needs to be able to trust that the outcomes are fair and reliable. That trust depends on understanding how AI systems arrive at their predictions and make their decisions.

White-box models vs. black-box models

White-box AI models have inputs and logic that are easy to see and understand. For example, basic decision trees, which show a clear flow between each step, are not difficult for the average person to decipher. White-box models tend to rely on simpler, more linear decision-making, which is easy to interpret but can come at the cost of accuracy or of less compelling insights and applications.
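For a concrete sense of what a white-box model looks like in practice, here is a minimal sketch in Python, assuming scikit-learn is available; the iris dataset and the shallow tree depth are illustrative choices for the example, not part of any particular production system.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative dataset and model (assumptions for this sketch).
# Keep the tree deliberately shallow so its entire logic stays readable.
X, y = load_iris(return_X_y=True, as_frame=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text prints the learned rules as plain if/else conditions.
print(export_text(tree, feature_names=list(X.columns)))
```

The printed rules read as ordinary if/else statements, which is what makes a model like this straightforward to audit.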

Black-box AI models are more complicated and offer less transparency into their inner workings. The user generally doesn’t know how the model reaches its results. These more complex models tend to be more accurate and precise. But because they are difficult or impossible to understand, they come with concerns about their reliability, fairness, biases and other ethical issues. Making black-box models more interpretable is one way to build trust in their use.

AI interpretability vs. AI explainability

AI interpretability focuses on understanding the inner workings of an AI model while AI explainability aims to provide reasons for the model's outputs.

Interpretability is about transparency, allowing users to comprehend the model's architecture, the features it uses and how it combines them to deliver predictions. An interpretable model’s decision-making processes are easily understood by humans. Greater interpretability requires greater disclosure of a model’s internal operations.

Explainability is about verification, or providing justifications for the model's outputs, often after it makes its predictions. Explainable AI (XAI) is used to identify the factors that led to the results. Various explainability methods can present a model’s complex processes and underlying data science in natural language that is clear to a human being.

Why is AI interpretability important?

AI interpretability helps to debug models, detect biases, ensure compliance with regulations and build trust with users. It allows developers and users to see how their models affect people and businesses and to develop them responsibly.

Interpretability is important for several reasons:

  • Trust
  • Bias and fairness
  • Debugging
  • Regulatory compliance
  • Knowledge transfer

Trust
 

Without interpretability, users are left in the dark. This lack of accountability can erode public trust in the technology. When stakeholders fully understand how a model makes its decisions, they are more likely to accept its outputs. Model interpretability allows for transparency and clarity, which makes users feel comfortable relying on it in real-world applications such as medical diagnoses or financial decisions.
 

Bias and fairness
 

AI models can amplify biases within their training data. The resulting discriminatory outcomes not only perpetuate societal inequalities but also expose organizations to legal and reputational risks. Interpretable AI systems can help detect whether a model is making biased decisions based on protected characteristics, such as race, age or gender. Interpretability allows model developers to identify and mitigate discriminatory patterns, helping ensure fairer outcomes.
 

Debugging
 

Interpretable machine learning allows the creators of ML algorithms and ML models to identify and fix errors. No machine learning model is 100% accurate from the start, and without understanding the AI's reasoning, debugging is an inefficient and risky process. By understanding how the ML model works, developers and data scientists can pinpoint the sources of incorrect predictions and optimize the model's performance, which in turn increases its overall reliability.
 

Regulatory compliance
 

Some regulations, such as the Equal Credit Opportunity Act (ECOA) in the United States or the General Data Protection Regulation (GDPR) in the European Union, require that decisions made by automated systems be transparent and explainable. And a growing number of AI-specific regulations, including the European Union’s AI Act, are setting standards for AI development and use. Interpretable AI models can provide clear explanations for their decisions, helping to meet these regulatory requirements. Interpretability can also help with auditing issues, liability and data privacy protections.
 

Knowledge transfer
 

Without interpretability, developers and researchers might struggle to translate AI insights into actionable results or advance the technology with changes. Interpretability makes it easier to transfer knowledge about a model’s underpinnings and decisions among stakeholders and to use its knowledge to inform other model development.

Types of interpretability

Stanford University researcher Nigam Shah identifies three main types of interpretability: engineers’ interpretability, causal interpretability and trust-inducing interpretability.1

Engineers’ interpretability
 

This type focuses on how the AI model reached its output. It involves understanding the model's internal workings and is relevant to developers and researchers who need to debug or improve the model.
 

Causal interpretability
 

This type focuses on why the model produced its output. It involves identifying the factors that have the greatest influence on the model's predictions and how changes in these factors affect the outcomes.
 

Trust-inducing interpretability
 

This type focuses on providing explanations that build trust in the model's outputs. It involves presenting the model's decision-making process in a way that is understandable and relatable to users, even if they do not have technical expertise.

Factors in interpretability

Several characteristics influence the interpretability of AI models:

  • Intrinsic vs. post-hoc
  • Local vs. global
  • Model-specific vs. model-agnostic

Intrinsic vs. post-hoc
 

Intrinsic interpretability refers to models that are inherently interpretable, such as decision trees and linear regression models. Their simple structures are easy to understand. By contrast, post-hoc interpretability involves applying interpretation methods to pre-trained models to explain their behavior. Post-hoc interpretation is best suited to more complex or black-box models.
 

Local vs. global
 

Local interpretability focuses on explaining individual predictions and helps show why the model reached a particular result. Global interpretability aims to understand the model's behavior across the entire dataset, showing its overall patterns and trends.
 

Model-specific vs. model-agnostic
 

Model-specific interpretability methods use a model's internal structure to provide explanations. Model-agnostic methods work with any type of model.
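As a rough illustration of the distinction, the sketch below contrasts a model-specific measure (the impurity-based importances stored inside a fitted tree ensemble) with a model-agnostic one (permutation importance, which only queries predictions). scikit-learn, the random forest and the breast cancer dataset are assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Illustrative dataset and model (assumptions for this sketch).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Model-specific: importances read from the tree ensemble's own structure.
specific = dict(zip(X.columns, model.feature_importances_))

# Model-agnostic: permutation importance only needs predictions,
# so the same call would work for any fitted estimator.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
agnostic = dict(zip(X.columns, result.importances_mean))

print(sorted(specific.items(), key=lambda kv: -kv[1])[:5])
print(sorted(agnostic.items(), key=lambda kv: -kv[1])[:5])
```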

Methods of interpretability

Various methods can establish interpretability in AI models.

Some models are simple enough for intrinsic interpretation. These inherently interpretable models rely on straightforward structures such as decision trees, rule-based systems and linear regressions. Humans can easily understand the decision-making patterns and processes of such models.
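A minimal sketch of intrinsic interpretation, assuming scikit-learn: a linear regression whose coefficients can be read directly as the explanation. The diabetes dataset is an illustrative choice, not a reference to any specific application.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Illustrative dataset and model (assumptions for this sketch).
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = LinearRegression().fit(X, y)

# Each coefficient states how the prediction shifts per unit change in a
# feature, holding the others fixed: the model is its own explanation.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:+.1f}")
```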

More complex models require post-hoc interpretation, in which interpretation methods are applied to pre-trained models to explain the model’s output. Some common post-hoc interpretation methods include:

  • Local Interpretable Model-Agnostic Explanations (LIME)
  • SHapley Additive exPlanations (SHAP)
  • Partial Dependence Plots (PDPs)
  • Individual Conditional Expectation (ICE) Plots

Local interpretable model-agnostic explanations (LIME)
 

LIME helps to explain a model's predictions by focusing on a single prediction at a time. It takes a specific prediction and generates many similar instances by slightly tweaking, or perturbing, the feature values (such as shape, color or another data point). It then fits a simpler, interpretable surrogate model to these perturbed instances and the complex model’s responses to them, and uses feature attribution to determine how much each characteristic influenced the output. In short, LIME provides a simplified, local explanation of how the complex model behaves around that prediction.
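A hedged sketch of how this might look in code, assuming the open source lime package is installed (along with scikit-learn for an example model); the dataset, the random forest and the number of features shown are illustrative assumptions rather than a prescribed setup.

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Illustrative black-box model and dataset (assumptions for this sketch).
data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain one prediction: LIME perturbs this row, queries the model on the
# perturbed copies and fits a simple local surrogate to the responses.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # feature conditions and their local weights
```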
 

Shapley additive explanations (SHAP)
 

SHAP is an approach to interpretability rooted in cooperative game theory that considers all possible combinations of features and how they affect the prediction. It assigns a value (called a Shapley value) to each feature based on how much it contributes to the prediction in different scenarios. SHAP can work with any machine learning system, and it offers both local explanations for individual predictions and global explanations for the model as a whole. However, because of its computational complexity, SHAP can be a slower and more expensive method.
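The sketch below shows one common way to use the open source shap package, assuming it is installed alongside scikit-learn; the regression model and the diabetes dataset are illustrative assumptions for the example.

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Illustrative dataset and model (assumptions for this sketch).
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

# shap.Explainer picks a suitable algorithm for the model (tree-based here).
explainer = shap.Explainer(model, X)
shap_values = explainer(X)

# Local explanation: feature contributions to a single prediction.
shap.plots.waterfall(shap_values[0])

# Global explanation: contributions summarized across the whole dataset.
shap.plots.beeswarm(shap_values)
```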
 

Partial dependence plots (PDPs)
 

PDPs show how a feature affects the model's predictions, on average, across the dataset. They help visualize the relationship between a feature and the model's output, holding all other features constant. This method is useful for interpreting a small number of features or when stakeholders want to focus on a specific subset of features.
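A minimal sketch using scikit-learn's built-in partial dependence tooling; the gradient boosting model, the diabetes dataset and the chosen features are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Illustrative dataset and model (assumptions for this sketch).
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Average effect of "bmi" and "bp" on the prediction across the dataset.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "bp"])
plt.show()
```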
 

Individual conditional expectation (ICE) plots
 

ICE plots show how much a predicted outcome depends on a specific feature. They are similar to PDPs but show the relationship between a feature and the model's output for individual instances, rather than averaging across the dataset. They can complement PDPs by providing a more detailed view of the model’s behavior, for example by highlighting variability and showing interactions between features at an instance level. They are also useful when researchers or stakeholders want to identify outliers or unusual patterns in the model’s operations.
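ICE curves can be produced with the same scikit-learn tooling by changing the kind argument; as before, the model, dataset and feature are illustrative assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Illustrative dataset and model (assumptions for this sketch).
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" draws one curve per sampled instance (ICE) plus the averaged
# PDP line, making per-instance variation around the average visible.
PartialDependenceDisplay.from_estimator(
    model, X, features=["bmi"], kind="both", subsample=50, random_state=0
)
plt.show()
```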

Interpretability: Examples and use cases

AI interpretability is important in any industry that uses AI models to make decisions that impact individuals or society. Some industries in which AI interpretability is relevant include:

Healthcare

Medical professionals use artificial intelligence for diagnosis, treatment recommendations and research. Interpretability can help doctors and patients trust and understand an AI model’s decisions and identify bias or errors in its reasoning.

Finance

Finance professionals can use AI to detect fraud, quantify risk, assign credit scores and make recommendations for investments. Interpretability is essential to regulatory compliance and auditing in the finance and banking industry. And understanding a model’s decision-making process for activities such as loan approvals and risk management can help prevent biased results.

Criminal justice

The criminal justice sector can use AI to analyze crime scenes, DNA and forensic evidence and local or national crime patterns. Users might also turn to AI to offer sentencing recommendations and perform other routine judicial operations. Interpretability is critical to ensuring fairness, accuracy and accountability.

Human resources

Some human resources departments use AI for resume screening and candidate evaluation. Interpretability is one way to prevent discrimination in the initial hiring process.

Insurance

The insurance industry uses artificial intelligence for assessing risk, processing insurance claims and setting pricing. Interpretability can help customers understand their premiums and insurers justify their decisions.

Customer support

As more marketing, sales and customer service functions rely on AI-powered chatbots, interpretability can offer important safeguards. Understanding why a chatbot makes a recommendation or decision builds trust in the AI system and helps improve or personalize its offerings.

Challenges and limitations of AI interpretability

Interpretability comes with some challenges and limitations.

Often there is a trade-off between model performance and interpretability. Simpler or white-box models are more interpretable but might have lower accuracy compared to complex black-box models such as deep neural networks.

Interpretability also suffers from a lack of standardization. Different methods can provide different explanations for the same model, making it difficult to compare and validate them without formal frameworks. And interpretability is often subjective: what one user considers easy to understand might not be clear enough for another.

Some experts say that interpretability is not necessary in some cases, or can be counterproductive in others. If the model is private or has no significant impact, or the problem is already the subject of much accepted study, greater interpretability could be redundant or unnecessary. In some cases, greater interpretability might present safety concerns, as more transparency could allow bad actors to exploit a system or allow users to game the system in a way that undermines its efficacy.

Footnotes

All links reside outside ibm.com.

1 Miller, Katharine. "Should AI models be explainable? That depends." Stanford Institute for Human-Centered Artificial Intelligence, March 2021.