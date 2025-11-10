The pursuit of trustworthy AI has also prompted a reconsideration of what constitutes intelligence. For decades, the goal for many has been artificial general intelligence (AGI), machines that can match human performance across a wide range of tasks. By that metric, Chen admits, the field has arguably already arrived.

“If AGI means solving multiple problems at a human level, then yes, we’ve reached it,” he said. “But that’s not the same as understanding.”

In conversation, he replaced the capital letters with a lowercase aspiration he calls “artificial good intelligence”: systems that behave responsibly and understand their limits. “These models can write essays, pass exams, even compose music,” he said. “But they don’t know what they’re doing. The next step is to teach them awareness of their own boundaries.”

That awareness begins, paradoxically, with failure. Chen’s group builds adversarial tests for today’s systems, designed to expose vulnerabilities through prompts that trick models into bias, contradictions or security breaches.

“You have to think like an attacker,” he said. “If we can predict how something will be misused, we can defend against it.”

He approaches persuasion with similar caution. In the same way he probes technical vulnerabilities, Chen examines how modern AI assistants are tuned for agreeableness, rewarding compliance over correctness.

“One version of a chatbot became so compliant that people complained it was useless,” he said. “At first, they liked how polite it was. Then they realized it never challenged them.” For Chen, the behavior revealed a deeper tension between truth and customer satisfaction. “The system learns that agreement gets rewarded,” he said. “But that’s not the same as being right.”

That insight underlies a broader debate within the AI development community. Should assistants prioritize accuracy or empathy? Politeness or precision? Chen favors models that occasionally correct their users. “AI should assist thinking, not mirror it,” he said.

Within enterprise deployments, the answer often begins with data, Chen said. Most industries already possess valuable information, he noted, but lack the infrastructure to use it safely. He describes foundation models as engines for representation. “One way I think about them is as converters that turn raw data into structured vectors,” he explained. “Once you encode the raw data, you can train simpler, auditable models on top. You get scale without losing interpretability.”

The approach offers a way to keep AI flexible yet accountable. A foundation model can turn raw data into a useful structure, while smaller, transparent systems handle the final calls. A manufacturer might process sensor data this way, and a hospital might use it to summarize notes while doctors make the diagnoses. “You can have power and clarity at the same time,” Chen said.
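
The pattern Chen describes, a large encoder feeding a small, inspectable model, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `encode` is a toy word-count stand-in for a frozen foundation model, the vocabulary and sensor-log examples are invented, and the nearest-centroid classifier is just one possible choice of "simpler, auditable" model, not a method attributed to Chen.

```python
from collections import Counter

# Hypothetical stand-in for a frozen foundation model. A real encoder would
# return learned embeddings; this toy version just counts vocabulary words.
VOCAB = ["vibration", "temperature", "normal", "spike", "stable", "overheat"]

def encode(text):
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]

def train_centroids(examples):
    """Fit one mean vector per label; the parameters can be read directly."""
    sums, counts = {}, Counter()
    for text, label in examples:
        acc = sums.setdefault(label, [0.0] * len(VOCAB))
        for i, value in enumerate(encode(text)):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, text):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = encode(text)
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))

# Invented sensor-log snippets, used only to exercise the sketch.
examples = [
    ("temperature stable vibration normal", "ok"),
    ("vibration normal temperature stable", "ok"),
    ("temperature spike overheat", "fault"),
    ("vibration spike overheat", "fault"),
]
centroids = train_centroids(examples)
print(predict(centroids, "temperature spike"))
```

Because the downstream model is only a table of per-label averages, a reviewer can print `centroids` and see which encoded features drive each decision, which is the kind of auditability the quote points to.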

His insistence on boundaries stems partly from his previous research. Early in his career, he demonstrated how imperceptible changes to an image, involving just a few pixels, could cause a classifier to label a bagel as a piano. “We realized how fragile these systems were,” he said. “That fragility doesn’t disappear with size; it just becomes harder to detect.”
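
The fragility Chen describes can be shown in miniature. The sketch below is not a reconstruction of his experiments: it assumes a made-up linear scorer over four pixel intensities and nudges every pixel by a small amount against the sign of its weight, a simplified gradient-sign style perturbation. The weights, labels and `epsilon` are hypothetical.

```python
# Hypothetical linear "classifier" over four pixel intensities. The weights,
# labels and epsilon are illustrative assumptions, not values from any real model.
WEIGHTS = [0.5, -0.3, 0.8, -0.1]

def score(pixels):
    return sum(w * p for w, p in zip(WEIGHTS, pixels))

def classify(pixels):
    return "bagel" if score(pixels) > 0 else "piano"

def sign(value):
    return (value > 0) - (value < 0)

def perturb(pixels, epsilon):
    """Shift each pixel by at most epsilon in the direction that lowers the score."""
    return [p - epsilon * sign(w) for w, p in zip(WEIGHTS, pixels)]

image = [0.2, 0.4, 0.1, 0.3]
adversarial = perturb(image, epsilon=0.05)
print(classify(image), classify(adversarial))
```

No pixel moves by more than 0.05, yet the label flips, because the original score sat close to the decision boundary. Real image classifiers are far larger, but they can be vulnerable in an analogous way.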

The same, he said, holds for language. The seamless paragraphs generated by modern models can conceal deep structural uncertainty. A sentence that reads like certainty may in fact be statistical improvisation. “The better they sound,” Chen said, “the less we can tell when they’re wrong.”

Companies eager to monetize conversational interfaces often prioritize responsiveness over restraint, Chen said. And that, he added, is where engineering discipline matters most. “If the training and evaluation reward guessing,” he said, “then guessing is what the model will learn to do.”
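
The arithmetic behind that remark is simple to sketch. Assume a hypothetical grading scheme, not one described in the article: a correct answer scores 1, an abstention scores 0, and a wrong answer costs `wrong_penalty`. When the penalty is zero, any nonzero chance of being right makes guessing pay off. Only a penalty for wrong answers makes abstaining the better choice at low confidence.

```python
def expected_score(p_correct, wrong_penalty):
    """Expected score of answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct - wrong_penalty * (1.0 - p_correct)

def should_answer(p_correct, wrong_penalty):
    """Answer only if doing so beats the abstention score of 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0

def break_even_confidence(wrong_penalty):
    """Confidence at which answering and abstaining score the same."""
    return wrong_penalty / (1.0 + wrong_penalty)

# With no penalty, a 20%-confident guess is still rewarded.
print(should_answer(0.2, wrong_penalty=0.0))
# With a penalty of 1 per wrong answer, the same guess is not worth making.
print(should_answer(0.2, wrong_penalty=1.0))
```

Under these assumptions the break-even confidence is `wrong_penalty / (1 + wrong_penalty)`, so a penalty of 1 means a model should answer only when it is more than 50% likely to be right.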

He believes the real test of maturity will be whether the industry can value silence. “A model that can admit uncertainty,” he said, “is a model you can trust.”

In Introduction to Foundation Models, Chen and Liu describe that capability as the convergence of technical design and moral architecture. The authors call for cross-disciplinary standards combining software verification with ethics, regulation and user education. “You need checks at every layer,” the authors explain, “from data collection and model training to deployment and feedback.” The vision is not of perfect AI, but of responsible infrastructure.

That framing also reflects the tone of IBM’s broader research agenda, Chen said. Rather than chase the next benchmark, the company has spent years developing governance frameworks for foundation models, including those focused on explainability and audit pipelines. Chen sees the attention as overdue.

“We have built competent systems,” he said. “Now we need to make sure we can explain them.”

The approach aligns with a broader movement in AI research that treats introspection as a technical property rather than a metaphor. Tools like IBM’s Attention Tracker or Anthropic’s interpretability probes attempt to visualize internal reasoning.

Still, there’s only so much we can see. Even with new transparency tools, the inner workings of these models can be baffling. Studying them, Chen said, is a bit like neuroscience, where you can watch the neurons light up without really knowing why. “We can see which neurons fire,” he said, “but we’re still learning what that means.”

The goal, Chen said, is to embed humility in design: “Technology doesn’t have to be perfect, but it should be honest about what it can and can’t do.”

That may sound modest, but it amounts to a quiet redefinition of progress. For years, success in AI was measured by the next benchmark, the next leap in scale. The coming era, Chen believes, will use other metrics: reproducibility, transparency, restraint. “It’s easy to build bigger models,” he said. “It’s much harder to make them trustworthy.”

The irony, Chen observed, is that the same predictive machinery that fuels hallucination also contains the seeds of its solution. A model trained to predict things could, in principle, learn to predict its own uncertainty. “If it knows when it doesn’t know,” he said, “that’s when it becomes useful.”

He paused before adding, “That’s when we can start to believe what it says.”