What is black box AI?

29 October 2024

Authors

Matthew Kosinski

Enterprise Technology Writer

What is black box artificial intelligence (AI)?

A black box AI is an AI system whose internal workings are a mystery to its users. Users can see the system’s inputs and outputs, but they can’t see what happens within the AI tool to produce those outputs.

Consider a black box model that evaluates job candidates’ resumes. Users can see the inputs—the resumes they feed into the AI model. And users can see the outputs—the assessments the model returns for those resumes. But users don’t know exactly how the model arrives at its conclusions—the factors it considers, how it weighs those factors and so on. 
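
As a rough illustration of this input/output boundary, here is a minimal sketch that assumes a hypothetical hosted resume-scoring service; the endpoint URL, payload shape and score_resume helper are invented for the example, not a real API.

```python
import requests  # calling a hypothetical hosted scoring service

def score_resume(resume_text: str) -> float:
    """Send a resume to a hypothetical black box scoring service.

    The caller sees only the input (the resume text) and the output
    (a score). The features, weights and decision logic that produce
    the score stay hidden behind the service boundary.
    """
    response = requests.post(
        "https://example.com/v1/score",    # placeholder endpoint
        json={"resume": resume_text},      # assumed request shape
        timeout=30,
    )
    return response.json()["score"]        # assumed response shape

# score = score_resume("10 years of Python experience ...")
# print(score)   # for example 0.82, but *why* it is 0.82 is not observable
```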

Many of the most advanced machine learning models available today, including large language models such as OpenAI’s ChatGPT and Meta’s Llama, are black box AIs. These artificial intelligence models are trained on massive data sets through complex deep learning processes, and even their own creators do not fully understand how they work. 

These complex black boxes can deliver impressive results, but the lack of transparency can sometimes make it hard to trust their outputs. Users cannot easily validate a model’s outputs if they don’t know what’s happening under the hood. Furthermore, the opacity of a black box model can hide cybersecurity vulnerabilities, biases, privacy violations and other problems. 

To address these challenges, AI researchers are working to develop explainable AI tools that balance the performance of advanced models with the need for transparency into AI outcomes. 

Why do black box AI systems exist?

Black box AI models arise for one of two reasons: Either their developers make them into black boxes on purpose, or they become black boxes as a by-product of their training. 

Some AI developers and programmers obscure the inner workings of AI tools before releasing them to the public. This tactic is often meant to protect intellectual property. The system’s creators know exactly how it works, but they keep the source code and decision-making process a secret. Many traditional, rule-based AI algorithms are black boxes for this reason.

However, many of the most advanced AI technologies, including generative AI tools, are what one might call “organic black boxes.” The creators of these tools do not intentionally obscure their operations. Rather, the deep learning systems that power these models are so complex that even the creators themselves do not understand exactly what happens inside them.

Deep learning algorithms are a type of machine learning algorithm that uses multilayered neural networks. Where a traditional machine learning model might use a network with one or two layers, deep learning models can have hundreds or even thousands of layers. Each layer contains many artificial neurons: computational units, loosely modeled on neurons in the human brain, that apply weighted transformations to the data they receive.
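
The sketch below is a minimal PyTorch illustration of that layered structure; the layer sizes are arbitrary, and production deep learning models stack far more layers and parameters than this.

```python
import torch
from torch import nn

# A small multilayer ("deep") network. Each nn.Linear layer holds many
# artificial neurons: weighted sums that feed into a nonlinearity.
model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),   # hidden layer 1
    nn.Linear(256, 256), nn.ReLU(),   # hidden layer 2
    nn.Linear(256, 256), nn.ReLU(),   # hidden layer 3
    nn.Linear(256, 10),               # output layer (for example, 10 classes)
)

x = torch.randn(1, 128)               # one input example with 128 features
logits = model(x)                     # the visible output
print(logits.shape)                   # torch.Size([1, 10])
```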

Deep neural networks can consume and analyze raw, unstructured big data sets with little human intervention. They can take in massive amounts of data, identify patterns, learn from these patterns and use what they learn to generate new outputs, such as images, video and text. 

This capacity for large-scale learning with minimal supervision enables AI systems to perform advanced language processing, create original content and accomplish other feats that can seem close to human intelligence.

However, these deep neural networks are inherently opaque. Users—including AI developers—can see what happens at the input and output layers, also called the “visible layers.” They can see the data that goes in and the predictions, classifications or other content that comes out. But they do not know what happens at all the network layers in between, the so-called “hidden layers.”

AI developers broadly know how data moves through each layer of the network, and they have a general sense of what the models do with the data they ingest. But they don’t know all the specifics. For example, they might not know what it means when a certain combination of neurons activates, or exactly how the model finds and combines vector embeddings to respond to a prompt. 
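
Continuing the toy network from the earlier sketch, the example below uses PyTorch forward hooks to capture what the hidden layers produce. Even with full access, the hidden activations are just long vectors of floating-point numbers; nothing about them says what any individual neuron "means."

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register hooks on the "hidden layers" so we can inspect them directly.
model[1].register_forward_hook(save_activation("hidden_1"))
model[3].register_forward_hook(save_activation("hidden_2"))

_ = model(torch.randn(1, 128))

# The hidden activations are visible here, but they are only raw numbers;
# they carry no labels explaining which concepts they represent.
print(activations["hidden_1"][0, :8])
```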

Even open-source AI models that share their underlying code are ultimately black boxes because users still cannot interpret what happens within each layer of the model when it’s active.

The black box problem

The most advanced AI and ML models available today are extremely powerful, but this power comes at the price of lower interpretability. 

Generative AI models rely on complex neural networks to respond to natural language commands, solve novel problems and create original content, but it’s difficult to interpret what happens inside those networks. Simpler, rule-based AI models are easier to explain, but they’re generally not as powerful or flexible as generative AI models.

So organizations cannot solve the black box problem by simply using more explainable, traditional AI tools. Traditional AI models can perform many functions, but there are some things that only an advanced AI model can do.

While there might be practical reasons to use black box machine learning models, the lack of transparency can be an obstacle to getting the full value from these advanced models.  

Specifically, black box AI poses challenges such as:

Reduced trust in model outputs

Users don’t know how a black box model makes the decisions that it does—the factors it weighs and the correlations it draws. Even if the model’s outputs seem accurate, validation can be difficult without a clear understanding of the processes that lead to those outputs. 

Unbeknownst to their users, black box models can arrive at the right conclusions for the wrong reasons. This phenomenon is sometimes called the “Clever Hans effect,” after a horse that could supposedly count and do simple arithmetic by tapping his hoof. In truth, Hans was picking up on subtle cues from his owner’s body language to tell when it was time to stop tapping.

The Clever Hans effect can have serious consequences when models are applied to fields like healthcare. For example, AI models trained to diagnose COVID-19 based on lung x-rays have been known to reach high accuracy levels with training data but perform less capably in the real world. 

This performance gap often arises because the models learn to identify COVID based on irrelevant factors. One experimental model “diagnosed” COVID based on the presence of annotations on the x-rays rather than the lung images themselves. COVID-positive x-rays in the model’s training data were more likely to carry annotations, since physicians had marked up relevant features for their colleagues.1
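
The sketch below is a synthetic, simplified illustration of this kind of shortcut learning, not a reproduction of the cited study: a spurious "annotation present" flag that happens to correlate with the label ends up dominating the model's learned weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
label = rng.integers(0, 2, size=n)                 # 1 = positive case

# Genuine signal: only weakly related to the label.
signal = label + rng.normal(scale=2.0, size=n)

# Spurious shortcut: positive examples are far more likely to carry an
# annotation flag, mirroring the x-ray example above.
annotation = (rng.random(n) < np.where(label == 1, 0.9, 0.1)).astype(float)

X = np.column_stack([signal, annotation])
clf = LogisticRegression().fit(X, label)

# The coefficient on the spurious flag dwarfs the genuine signal, so the
# apparently accurate model is mostly reading the shortcut.
print(dict(zip(["signal", "annotation"], clf.coef_[0].round(2))))
```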

Difficulty adjusting model operations

If a black box model does make the wrong decisions or consistently produces inaccurate or harmful outputs, it can be hard to adjust the model to correct this behavior. Without knowing exactly what happens inside the model, users cannot pinpoint exactly where it is going wrong.

This problem poses a significant challenge in the field of autonomous vehicles, where developers train sophisticated AI systems to make real-time driving decisions. If an autonomous vehicle makes the wrong decision, the consequences can be fatal. But because the models behind these vehicles are so complex, understanding why they make bad decisions, and how to correct them, can be difficult. 

To get around this issue, many autonomous vehicle developers supplement their AIs with more explainable systems, such as radar and lidar sensors. While these systems do not shed light on the AI itself, they do provide developers with insight into the environments and situations that seem to cause AI models to make bad calls.2

Security issues

Because organizations can’t see everything happening in a black box model, they might miss vulnerabilities lurking inside. Generative AI models are also susceptible to prompt injection and data poisoning attacks, which can quietly change a model’s behavior. If users can’t see into a model’s processes, they won’t know when those processes have been tampered with.

Ethical concerns

Black box models might be susceptible to bias. Any AI tool can reproduce human biases if those biases are present in its training data or design. With black box models, it can be especially hard to pinpoint the existence of bias or its causes.

Bias can lead to outcomes that range from suboptimal to outright harmful or illegal. For example, an AI model trained to screen job candidates can learn to filter out talented female applicants if its training data skews male.
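
One common first check for this kind of bias is to compare selection rates across groups, as in the minimal pandas sketch below; the data and the 0.8 "four-fifths" threshold are illustrative only, and real fairness audits go much further.

```python
import pandas as pd

# Hypothetical screening decisions from a black box model (illustrative data).
results = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "selected": [1,   1,   1,   0,   1,   0,   0,   0],
})

selection_rate = results.groupby("group")["selected"].mean()
impact_ratio = selection_rate.min() / selection_rate.max()

print(selection_rate)   # A: 0.75, B: 0.25
print(impact_ratio)     # 0.33, well below the 0.8 "four-fifths" heuristic
```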

Some criminal justice systems use sophisticated AI models to assess a person’s risk of reoffending. These models are often black boxes, at least to the public, who might not know exactly what factors the models consider. If the algorithm is not transparent, it can be hard to trust its predictions or appeal them when they’re wrong.3

Regulatory noncompliance

Certain regulations, such as the European Union AI Act and the California Consumer Privacy Act (CCPA), set rules on how organizations can use sensitive personal data in AI-powered decision-making tools. With black box models, it can be hard for an organization to know whether it is compliant or to prove compliance in the event of an audit.

Black box AI vs. white box AI

White box AI, also called explainable AI (XAI) or glass box AI, is the opposite of black box AI. It is an AI system with transparent inner workings. Users understand how the AI takes in data, processes it and arrives at a conclusion. 

White box AI models make it easier to trust and validate outcomes, and to tweak models to correct errors and adjust performance. But it isn’t easy to turn every AI into a white box.

Traditional AI models can often be made transparent by sharing their source code. But sophisticated machine learning models develop their own parameters through deep learning algorithms. Simply having access to the architectures of these models doesn’t always fully explain what they’re doing.
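
For contrast, the sketch below shows a classic white box model: a small scikit-learn decision tree whose entire decision logic can be printed and audited. A deep learning model's millions or billions of learned parameters offer no comparable readout.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# A shallow decision tree is a textbook "white box": every rule it uses
# can be printed, read and audited.
data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

print(export_text(tree, feature_names=list(data.feature_names)))
# Output (abridged) looks like:
# |--- petal width (cm) <= 0.80
# |   |--- class: 0
# |--- petal width (cm) >  0.80
# ...
```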

That said, there are efforts underway to make advanced AI models more explainable. For example, researchers at Anthropic are applying sparse autoencoders, a type of neural network, to the company’s Claude 3 Sonnet LLM to understand which combinations of neuron activations correspond to which concepts. So far, researchers have identified combinations that signify concepts such as the Golden Gate Bridge and the field of neuroscience.
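
The toy PyTorch sketch below shows the general shape of that approach, not Anthropic's actual implementation: a sparse autoencoder is trained on recorded hidden-layer activations, and the sparsity penalty encourages features that are easier to inspect and label.

```python
import torch
from torch import nn

# Toy sparse autoencoder over hidden-layer activations (illustrative only).
d_act, d_feat = 256, 1024            # activation width, dictionary size (arbitrary)
encoder = nn.Linear(d_act, d_feat)
decoder = nn.Linear(d_feat, d_act)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

acts = torch.randn(4096, d_act)      # stand-in for activations recorded from an LLM

for _ in range(100):
    features = torch.relu(encoder(acts))   # sparse feature activations
    recon = decoder(features)
    # Reconstruction loss plus an L1 penalty that pushes features toward sparsity.
    loss = ((recon - acts) ** 2).mean() + 1e-3 * features.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, researchers look for features that fire consistently on a
# single concept, such as mentions of a specific landmark.
```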

OpenAI’s recent o1 model shares information about the steps it takes to reach its outputs, which can help illustrate how it arrives at its answers. However, this is not a direct look inside the model but a model-generated summary of its own activity. Much of the model’s operation, including the raw chain of thought, remains hidden.5

Other researchers have developed techniques to help explain how models arrive at specific conclusions. For example, local interpretable model-agnostic explanations (LIME) is a technique that perturbs a black box model’s inputs, observes the corresponding outputs and fits a simple, interpretable surrogate model to that local behavior, with the goal of identifying the features that most influence a particular prediction.

These conclusion-focused techniques are often designed to work on models with clearly structured inputs and outputs. For example, LIME can help explain predictions and classifications, but it sheds less light on open-ended AI systems with deep neural networks.
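
The sketch below captures the general idea behind LIME rather than the library's exact algorithm: perturb a single input, query the black box for each perturbation, weight the samples by their proximity to the original and fit a small linear surrogate whose coefficients serve as a local explanation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A stand-in black box: we only ever call its predict_proba method.
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

instance = X[0]

# 1. Perturb the instance and query the black box for each perturbation.
perturbed = instance + rng.normal(scale=0.5, size=(500, 4))
preds = black_box.predict_proba(perturbed)[:, 1]

# 2. Weight perturbations by how close they are to the original input.
weights = np.exp(-np.linalg.norm(perturbed - instance, axis=1) ** 2)

# 3. Fit a simple, interpretable surrogate that mimics the black box locally.
surrogate = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
print(surrogate.coef_.round(3))   # local feature influence around this one instance
```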

Dealing with the challenges of black box AI

Organizations can opt for transparent models wherever possible, but some workflows require sophisticated black box AI tools. That said, there are ways to make black box models more trustworthy and mitigate some of their risks.

Open-source models

Open-source models can give users more transparency into their development and operations than closed-source AI tools that keep their model architectures private.

An open-source generative AI model might ultimately be a black box due to its complex neural network, but it can offer users more insight than a closed-source model.

AI governance

AI governance—the processes, standards and guardrails that help ensure AI systems and tools are safe and ethical—enables organizations to establish robust control structures for AI implementations.

Governance tools can offer more insight into model operations through automated monitoring, performance alerts, health scores and audit trails. AI governance might not make a black box transparent, but it can help catch anomalies and thwart inappropriate use.
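
As a minimal sketch of the audit-trail idea only (the wrapper, model name and log format below are invented for illustration), every call to a black box model can be logged with its input and output so that its behavior can be reviewed later; governance platforms layer monitoring, alerting and policy enforcement on top of records like these.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(filename="model_audit.log", level=logging.INFO)

def audited_call(model_fn, payload, model_name="resume-screener-v2"):
    """Wrap a black box model call with a simple audit record.

    model_fn can be any callable that hides its internals, such as the
    hypothetical score_resume helper sketched earlier in this article.
    """
    output = model_fn(payload)
    logging.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "input": payload,
        "output": output,
    }))
    return output

# result = audited_call(score_resume, "10 years of Python experience ...")
```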

AI security

AI security processes and tools can help identify and fix vulnerabilities in AI models, applications and related data sets that IT and security teams might not find on their own. 

AI security tools can also offer insights into each AI deployment’s data, model and application usage, as well as the applications accessing the AI. 

Responsible AI

A responsible AI framework supplies an organization with a set of principles and practices to make AI more trustworthy.

For example, IBM’s Pillars of Trust for AI include explainability, fairness, robustness, transparency and privacy. Where black box models are necessary, adhering to a framework can help an organization use those models in a more transparent way.

Footnotes

Related solutions

watsonx.governance™

Direct, manage and monitor your risk and compliance with an end-to-end toolkit for AI governance across the entire AI lifecycle.

Guardium® AI Security

Manage the security risk of sensitive AI data and AI models.

Artificial intelligence (AI) consulting services

Redefine how you work with AI for business.

Resources

Cybersecurity in the era of generative AI
Report

What is data security?
Explainer

IBM X-Force Cloud Threat Landscape Report 2024
Report

What is a cyberattack?
Explainer

Take the next step

IBM Security® provides transformative, AI-powered solutions that optimize analysts’ time by accelerating threat detection, expediting responses and protecting user identity and datasets while keeping cybersecurity teams in the loop and in charge.

Explore IBM's AI-powered solutions