Efforts to make AI self-examining are multiplying across the industry. Tech companies and universities are exploring how models might not only generate outputs, but also describe the reasoning behind them.

The Anthropic finding aligns with a broader effort to understand the relationship between model complexity and transparency. Larger models in the study exhibited stronger introspective signals, suggesting that richer internal representations may help them detect internal changes. That effect, researchers said in interviews, may reflect differences in scale rather than reliability or usefulness. “It makes sense that richer internal representations allow more structure to detect when something is off,” Ramamurthy said. “That could mean awareness, in the technical sense, grows with capability.”

Interpreting the word “awareness” requires some caution, IBM researchers said. In AI research, the term has nothing to do with consciousness or emotion; it simply describes a system’s ability to detect a statistical irregularity within its own patterns. “Awareness here means the model can sense a discrepancy,” Ramamurthy said. “There is no feeling attached to it.”

IBM Fellow Kush Varshney, who leads projects on trustworthy and explainable AI at IBM Research, sees the work not as a step toward sentience, but as another way of probing how these systems reason. “It’s interesting to use the language of metacognition,” he said in an interview. “But the technology as it exists today is really just a form of interpretability. Borrowing terms from human cognition is fine; we just have to resist believing it’s the same thing as introspection.”

IBM’s own research has long focused on designing systems with built-in checks and balances. One example is the company’s open-source AI Steerability 360 toolkit, which enables developers to observe internal activations and adjust model behavior in real time.

“We built these toolkits to see why models behave the way they do and to change that behavior when needed,” Varshney said. By analyzing activation patterns, the toolkits can detect when an AI starts to hallucinate or drift from policy guidelines. “It’s like a health monitor for reasoning,” he said.

That kind of built-in oversight could make AI systems safer and more accountable, IBM experts said. Models that can flag their own inconsistencies might give developers earlier warnings about bias, misinformation or misuse. “If an AI system can explain why it took a certain action, it’s easier to audit,” Varshney said. “That’s especially important in industries where you can’t rely on intuition.”

The push for transparency is becoming a central focus in the next phase of AI development. Models can now handle decisions that were once made by human experts, from evaluating loans to identifying disease, and each leap in performance brings greater pressure to understand how those systems think. “The systems have become so large that it is nearly impossible to track what is happening inside them,” Cox said. “Introspection could help us recover some of that visibility.”