A chatbot will tell you that honesty matters.
Ask whether it is acceptable to lie to a coworker to avoid embarrassment, and the answer often arrives in calm, careful prose. The system may explain that honesty builds trust, that deception erodes relationships, that transparency helps organizations function. The response can read like the work of someone who has paused to weigh competing principles. But researchers say that impression can be misleading.
Two recent studies suggest that AI systems can produce convincing ethical language without actually reasoning about morality. One paper from researchers at Google DeepMind calls for new tests that measure what the authors describe as “moral competence,” rather than rewarding models simply for producing answers that sound morally appropriate. Another study from Anthropic analyzed hundreds of thousands of conversations with its Claude chatbot to examine how values appear in practice.
“A system that sounds ethical is not the same as a system that reasons ethically,” Phaedra Boinodiris, IBM Global Leader for Trustworthy AI, told IBM Think in an interview. “Conflating the two is how organizations end up deploying a very expensive autocomplete function in life-altering decisions.”
Large language models (LLMs), the technology behind systems like ChatGPT and Claude, generate responses by predicting the most likely next word in a sequence. Engineers train these systems on enormous collections of text drawn from books, websites and academic writing.
Over time, the models learn statistical patterns in language rather than formal rules for reasoning. Because their training data includes vast amounts of human writing about fairness, responsibility and harm, the systems learn how people typically talk about ethical questions.
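The statistical process described above can be illustrated with a toy bigram model. This is not how production LLMs work (they use neural networks over token sequences, not word counts), and the miniature corpus below is invented for illustration, but it shows how plausible-sounding continuations can emerge purely from frequency patterns in text:

```python
from collections import Counter, defaultdict

# Tiny invented corpus standing in for training text about ethics.
corpus = (
    "honesty builds trust . deception erodes trust . "
    "honesty builds relationships . transparency builds trust ."
).split()

# Count which word tends to follow each word (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def most_likely_next(word):
    """Return the statistically most frequent continuation of `word`."""
    return following[word].most_common(1)[0][0]

print(most_likely_next("honesty"))  # "builds"
print(most_likely_next("builds"))   # "trust"
```

The model "knows" that honesty builds trust only in the sense that those words co-occur in its data; no rule about honesty exists anywhere in the system.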
“What we are seeing is not moral reasoning,” Ignacio Cofone, a legal scholar at the Institute for Ethics in AI at Oxford who studies AI governance, said in an interview with IBM Think. “Large language models generate outputs by predicting the most plausible continuation of a prompt, given statistical structure learned from vast text.”
Scholars say that process can create the impression that a chatbot is reasoning about morality when it is actually reproducing patterns from its training data.
“What looks like moral reasoning is the result of statistical pattern formation during pretraining on vast corpora of human text,” Jake Okechukwu Effoduh, Assistant Professor of Law at Toronto Metropolitan University’s Lincoln Alexander School of Law, told IBM Think in an interview.

Evidence of how those patterns appear in everyday use emerged in the study from Anthropic. Researchers analyzed more than 300,000 subjective conversations with the company’s Claude chatbot and sought to identify the values it expressed in its responses.
The team identified 3,307 distinct values in those conversations. Some reflected practical goals, such as clarity or professionalism. Others reflected ethical priorities like honesty, transparency or harm prevention.
The analysis found that the model typically aligned with the user’s values. When people raised ideas such as community building or personal growth, Claude often reinforced those themes in its responses.
Moreover, the system frequently mirrored a user’s value language. For example, the same value might appear in both the user’s prompt and the model’s reply, particularly when the conversation involved topics such as authenticity, personal growth or cooperation.
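The mirroring pattern can be sketched with a simple overlap check. The value lexicon and conversation below are invented for illustration; the actual study used a far richer taxonomy of 3,307 values and automated labeling rather than naive substring matching:

```python
# Hypothetical value lexicon (the real study identified 3,307 values).
VALUE_TERMS = {"honesty", "transparency", "authenticity",
               "personal growth", "cooperation"}

def values_in(text):
    """Return the value terms mentioned in a piece of text."""
    lowered = text.lower()
    return {term for term in VALUE_TERMS if term in lowered}

def mirrored_values(prompt, reply):
    """Values that appear in both the user's prompt and the model's reply."""
    return values_in(prompt) & values_in(reply)

prompt = "I want advice on personal growth and staying true to my values."
reply = "Personal growth often starts with honesty about where you are now."
print(mirrored_values(prompt, reply))  # {'personal growth'}
```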
Instances of the model strongly resisting a user’s request were rare, appearing in roughly 3% of conversations. Those cases typically involved requests that violated the system’s usage policies, such as attempts to generate harmful or deceptive material. In those exchanges, the model often invoked values such as ethical integrity, honesty or harm prevention.
“Honestly, I think this [study] says more about humans than it does about the tools,” Michael Hilton, a Teaching Professor at Carnegie Mellon University who studies software engineering, said in an interview with IBM Think. “The models are trained on a lot of data that represents a lot of different viewpoints on a lot of different issues.”
Hilton said that diversity makes it difficult to describe any single moral perspective inside the system.
“If the systems are not truly reasoning, but just reflecting what is in their training data, then people are delegating moral decisions based on some unidentified, stochastically determined subset of the training data,” he said.
Researchers say that dynamic raises difficult questions for developers about how to design systems that behave consistently across different ethical contexts.
Some researchers argue that genuine machine ethics, meaning systems that reason about ethical rules rather than reproduce patterns in language, would require a very different kind of system: instead of predicting the next likely word in a sequence, it would need explicit representations of ethical rules and legal frameworks that it could reason over. Selmer Bringsjord, Professor of Cognitive Science and Computer Science at Rensselaer Polytechnic Institute, made that argument in an interview with IBM Think.
“Such a capacity requires that the system has on hand a formalization of ethical theories, associated ethical codes … and relevant laws,” Bringsjord said. “I’m not even aware of a precise capturing into formal computational logic of even traffic laws.”
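A minimal sketch can show what "explicit representations of ethical rules" means in contrast to statistical prediction. The rules and action names below are invented for illustration; real formalizations of the kind Bringsjord describes, such as those in deontic logic, are vastly more involved:

```python
# Hypothetical rule base: each entry is an explicit, inspectable rule,
# not a pattern inferred from text (names invented for illustration).
FORBIDDEN = {
    "deceive": "deception violates the duty of honesty",
    "harm": "causing harm violates the duty of non-maleficence",
}

def evaluate(action_verb):
    """Check an action against the rule base and return a verdict
    together with the rule that produced it."""
    if action_verb in FORBIDDEN:
        return ("forbidden", FORBIDDEN[action_verb])
    return ("permitted", "no rule prohibits this action")

verdict, reason = evaluate("deceive")
print(verdict, "-", reason)  # forbidden - deception violates the duty of honesty
```

The key difference is traceability: every verdict points back to a stated rule, whereas a language model's output points back only to statistical regularities in its training data.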
Even if these systems cannot perform genuine moral reasoning, some researchers say they can still be useful. Nigel Melville, Associate Professor of Information Systems at the University of Michigan, said AI systems can still help people think through complex questions—especially when organizations treat them as advisory tools rather than decision-makers.
“If AI systems are employed effectively, they can enrich arguments and human understanding of all sides of morally inflected decisions,” Melville said in an interview with IBM Think. “If they are used unwisely, they can create significant damage and harm.”
Researchers say the stakes will only grow as AI moves deeper into workplaces and public-facing services. Boinodiris said developers should build systems that acknowledge uncertainty rather than present moral advice with unwarranted confidence.
“The most important output an AI system can generate in a morally sensitive context is not a confident answer,” she said. “It is an honest acknowledgment of the limits of what it knows.”