Published: 8 October 2024
Contributors: Amanda McGrath, Alexandra Jonker
AI interpretability helps people better understand and explain the decision-making processes that power artificial intelligence (AI) models.
AI models use a complex web of data inputs, algorithms, logic, data science and other processes to return insights. The more complex the model, the more difficult it can be for humans to understand the steps that led to its insights—even if those humans are the ones who designed and built it. An interpretable model is one whose decisions can be easily understood by users.
The use of AI is expanding. Systems that use large language models (LLMs) are becoming routine parts of daily life, from smart home devices to credit card fraud detection to the broad use of ChatGPT and other generative AI tools. As highly complex models (including deep-learning algorithms and neural networks) become more common, AI interpretability becomes more important.
Additionally, AI systems and machine-learning algorithms are increasingly prevalent in healthcare, finance and other industries that involve critical or life-altering decisions. With such high stakes, the public needs to be able to trust that the outcomes are fair and reliable. That trust depends on understanding how AI systems arrive at their predictions and make their decisions.
White-box AI models have inputs and logic that are easy to see and understand. For example, basic decision trees, which show a clear flow between each step, are not difficult for the average person to decipher. White-box models tend to use simpler, more linear decision-making processes that are easy to interpret, but they can deliver lower accuracy or less compelling insights and applications.
Black-box AI models are more complicated and offer less transparency into their inner workings. The user generally doesn’t know how the model reaches its results. These more complex models tend to be more accurate and precise. But because they are difficult or impossible to understand, they come with concerns about their reliability, fairness, biases and other ethical issues. Making black-box models more interpretable is one way to build trust in their use.
AI interpretability focuses on understanding the inner workings of an AI model, while AI explainability aims to provide reasons for the model's outputs.
Interpretability is about transparency, allowing users to comprehend the model's architecture, the features it uses and how it combines them to deliver predictions. An interpretable model's decision-making processes are easily understood by humans. Greater interpretability requires greater disclosure of the model's internal operations.
Explainability is about verification, or providing justifications for the model's outputs, often after it makes its predictions. Explainable AI (XAI) is used to identify the factors that led to the results. Various explainability methods can present a model's complex processes and underlying data science in natural language that a human being can understand.
AI interpretability helps to debug models, detect biases, ensure compliance with regulations and build trust with users. It allows developers and users to see how their models affect people and businesses and to develop them responsibly.
Interpretability is important for several reasons:
Without interpretability, users are left in the dark. This lack of accountability can erode public trust in the technology. When stakeholders fully understand how a model makes its decisions, they are more likely to accept its outputs. Model interpretability allows for transparency and clarity, which makes users feel comfortable relying on it in real-world applications such as medical diagnoses or financial decisions.
Biases within training data can be amplified by AI models. The resulting discriminatory outcomes not only perpetuate societal inequalities but also expose organizations to legal and reputational risks. Interpretable AI systems can help detect if a model is making biased decisions based on protected characteristics, such as race, age or gender. Interpretability allows model developers to identify and mitigate discriminatory patterns, helping ensure fairer outcomes.
Interpretable machine learning allows the creators of ML algorithms and ML models to identify and fix errors. No machine learning model is 100% accurate from the start. Without understanding the AI's reasoning, debugging is an inefficient and risky process. By understanding how the ML model works, developers and data scientists can pinpoint the sources of incorrect predictions and optimize the model's performance, which in turn increases its overall reliability.
Some regulations, such as the Equal Credit Opportunity Act (ECOA) in the United States or the General Data Protection Regulation (GDPR) in the European Union, require that decisions made by automated systems be transparent and explainable. And a growing number of AI-specific regulations, including the European Union's AI Act, are setting standards for AI development and use. Interpretable AI models can provide clear explanations for their decisions, helping to meet these regulatory requirements. Interpretability can also help with auditing issues, liability and data privacy protections.
Without interpretability, developers and researchers might struggle to translate AI insights into actionable results or advance the technology with changes. Interpretability makes it easier to transfer knowledge about a model’s underpinnings and decisions among stakeholders and to use its knowledge to inform other model development.
Stanford University researcher Nigam Shah identifies three main types of interpretability: engineers’ interpretability, causal interpretability and trust-inducing interpretability.1
This type focuses on how the AI model reached its output. It involves understanding the model's internal workings and is relevant to developers and researchers who need to debug or improve the model.
This type focuses on why the model produced its output. It involves identifying the factors that have the greatest influence on the model's predictions and how changes in these factors affect the outcomes.
This type focuses on providing explanations that build trust in the model's outputs. It involves presenting the model's decision-making process in a way that is understandable and relatable to users, even if they do not have technical expertise.
Several characteristics influence the interpretability of AI models:
Intrinsic interpretability refers to models that are inherently interpretable, such as decision trees and linear regression models. Their simple structures are easy to understand. Post-hoc interpretability, in contrast, involves applying interpretation methods to pre-trained models to explain their behavior. Post-hoc interpretation is best suited to more complex or black-box models.
Local interpretability focuses on explaining individual predictions and helps show why the model reached a particular result. Global interpretability aims to understand the model's behavior across the entire dataset, showing its overall patterns and trends.
Model-specific interpretability methods use a model's internal structure to provide explanations. Model-agnostic methods work with any type of model.
Various methods can establish interpretability in AI models.
Some models are simple enough for intrinsic interpretation. These inherently interpretable models rely on straightforward structures such as decision trees, rule-based systems and linear regressions. Humans can easily understand the decision-making patterns and processes of such models.
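To make this concrete, here is a minimal sketch of an intrinsically interpretable model: a shallow decision tree whose learned rules can be printed and read directly. The use of scikit-learn, the iris dataset and the depth limit are illustrative assumptions, not tools prescribed by this article.

```python
# A small, inherently interpretable model: a shallow decision tree whose
# learned rules can be printed and read as plain if/else statements.
# Assumes scikit-learn is installed; the dataset and depth are illustrative.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True, as_frame=True)

# Limiting depth keeps the rule set small enough for a person to follow.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the full decision logic as human-readable rules.
print(export_text(tree, feature_names=list(X.columns)))
```

The printed output is the model's complete decision logic, which is the sense in which such models are interpretable by design.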
More complex models require post-hoc interpretation, in which interpretation methods are applied to pre-trained models to explain the model’s output. Some common post-hoc interpretation methods include:
LIME helps to explain a model's predictions by focusing on a single prediction at a time. It uses feature attribution to determine how much a particular characteristic (such as shape, color or another data point) influences the model's output. Starting from a specific prediction, LIME generates many similar instances by slightly tweaking the feature values, observes how the complex model responds to these "perturbed" instances, and then fits a simpler, interpretable model to those results. In short, LIME provides a simplified, local explanation for how the complex model behaves around that prediction.
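The sketch below shows what this can look like in practice, using the third-party lime package with a scikit-learn classifier. The specific libraries, dataset and black-box model are assumptions made for demonstration, not part of the method's definition.

```python
# A minimal sketch of a local, post-hoc LIME explanation for one prediction.
# Assumes the third-party `lime` package and scikit-learn are installed;
# the dataset and black-box model are illustrative.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    training_data=data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)

# LIME perturbs this one instance, queries the black-box model on the
# perturbed copies, and fits a simple local surrogate to the results.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=4
)
print(explanation.as_list())  # per-feature contributions for this prediction
```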
SHAP is a cooperative game theory-style approach to interpretability that considers all possible combinations of features and how they affect the prediction. It assigns a value (called a Shapley value) to each feature based on how much it contributes to the prediction in different scenarios. SHAP can work with any machine learning system, and it offers both local explanations for individual predictions and global explanations for the model as a whole. However, because of its computational complexity, SHAP can be a slower and more expensive method.
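As a rough illustration, the snippet below uses the third-party shap package with a scikit-learn regressor (an assumed setup chosen for demonstration) to produce one local and one global explanation.

```python
# A minimal sketch of SHAP attributions, local and global. Assumes the
# third-party `shap` package and scikit-learn; the dataset and model
# are illustrative.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# shap.Explainer selects a suitable algorithm (a fast tree explainer here).
explainer = shap.Explainer(model)
shap_values = explainer(X)

shap.plots.waterfall(shap_values[0])  # local: one prediction's attributions
shap.plots.beeswarm(shap_values)      # global: feature impact across the data
```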
PDPs show how a feature affects the model's predictions, on average, across the dataset. They help visualize the relationship between a feature and the model's output by averaging out the effects of all other features. This method is useful for interpreting a small number of features or when stakeholders want to focus on a specific subset of features.
ICE plots show how much a predicted outcome depends on a specific feature. They are similar to PDPs but show the relationship between a feature and the model's output for individual instances, rather than averaging across the dataset. They can complement PDPs by providing a more detailed view of the model's behavior, such as highlighting variability and revealing interactions between features at the instance level. They are also useful when researchers or stakeholders want to identify outliers or unusual patterns in the model's operations.
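Both kinds of plot can be drawn with scikit-learn's built-in inspection API. The sketch below, an illustrative setup rather than one referenced in this article, overlays per-instance ICE curves on the partial dependence average for a single feature.

```python
# A minimal sketch of PDP and ICE curves for one feature, using
# scikit-learn's inspection API. The dataset, model and feature choice
# are illustrative.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" draws one thin ICE curve per instance plus the PDP average,
# making instance-level variability and outliers visible.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi"], kind="both")
plt.show()
```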
AI interpretability is important in any industry that uses AI models to make decisions that impact individuals or society. Some industries in which AI interpretability is relevant include:
Medical professionals use artificial intelligence for diagnosis, treatment recommendations and research. Interpretability can help doctors and patients trust and understand an AI model's decisions and identify bias or errors in its reasoning.
Finance professionals can use AI to detect fraud, quantify risk, assign credit scores and make recommendations for investments. Interpretability is essential to regulatory compliance and auditing in the finance and banking industry. And understanding a model’s decision-making process for activities such as loan approvals and risk management can help prevent biased results.
The criminal justice sector can use AI to analyze crime scenes, DNA and forensic evidence and local or national crime patterns. Users might also turn to AI to offer sentencing recommendations and perform other routine judicial operations. Interpretability is critical to ensuring fairness, accuracy and accountability.
Some human resources departments use AI for resume screening and candidate evaluation. Interpretability is one way to prevent discrimination in the initial hiring process.
The insurance industry uses artificial intelligence for assessing risk, processing insurance claims and setting pricing. Interpretability can help customers understand their premiums and insurers justify their decisions.
As more marketing, sales and customer service functions rely on AI-powered chatbots, interpretability can offer important safeguards. Understanding why a chatbot makes a recommendation or decision builds trust in the AI system and helps improve or personalize its offerings.
Interpretability comes with some challenges and limitations.
Often there is a trade-off between model performance and interpretability. Simpler or white-box models are more interpretable but might have lower accuracy compared to complex black-box models such as deep neural networks.
Interpretability also suffers from a lack of standardization. Different methods can provide different explanations for the same model, making it difficult to compare and validate them without formal frameworks. And interpretability is often subjective: an explanation that is easy for one user to understand might not be sufficient for another.
Some experts say that interpretability is not necessary in some cases, or can be counterproductive in others. If the model is private or has no significant impact, or the problem is already the subject of much accepted study, greater interpretability could be redundant or unnecessary. In some cases, greater interpretability might present safety concerns, as more transparency could allow bad actors to exploit a system or allow users to game the system in a way that undermines its efficacy.
1. Miller, Katharine. "Should AI models be explainable? That depends." Stanford Institute for Human-Centered Artificial Intelligence, March 2021.