Date: 2025-07-22

Most cognitive models strip experiments down to raw numbers. Centaur does the opposite. It reads each task in full, complete with natural language instructions and every step of the human response. The model was trained on a dataset called Psych 101, a collection of classic psychology problems that includes everything from visual puzzles and memory tests to moral dilemmas and language games. By seeing the same information a person would, Centaur learns to follow the task like a human.

That approach enabled generalization well beyond the training data. When researchers reworded a standard reinforcement learning problem, switching the framing from astronauts to magic carpets, Centaur still exhibited the same behavioral tendencies. It also performed well on entirely new types of tasks, such as LSAT-style logic puzzles.

The use of language, rather than compressed numerical descriptions, was deliberate. “We wanted the model to see what participants saw,” Binz explained. “Full instructions, full context. No shortcuts.”

Centaur isn’t built to explain the workings of the brain. Instead, it focuses on reproducing what people do in behavioral studies. That predictive power has immediate implications for researchers, who often rely on narrow, hand-built models for each type of cognitive function.

Russell Poldrack, a Professor of Psychology at Stanford University who was not involved in the project, views Centaur as part of a larger shift in the field.

“Historically, we’ve given models highly reduced versions of tasks,” he told IBM Think in an interview. “Now, we can give them what we’d give a person and see behavior that mirrors what a person would do.”

The difference isn’t just in scale, but in intent. Most cognitive models are constructed to explain a specific behavior. Centaur is built to observe and replicate behavior across domains, such as visual reasoning and memory tasks. That opens the possibility of discovering new patterns that researchers might otherwise miss.

In one example from the study, the team examined how people choose between products with multiple expert ratings. Centaur’s behavior revealed a two-step strategy: people appeared first to count the number of positive ratings, and to use expert credibility only as a tiebreaker. That insight led to a new, interpretable model of human decision-making, one that Centaur was able to match after refinement.
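To make the two-step strategy concrete, here is a minimal Python sketch of a count-first, credibility-second choice rule. It is an illustration only, not the model from the study: the `Product` structure, the function names, and the specific tiebreak (summing the credibility of the experts who gave positive ratings) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical product record; not taken from the study."""
    name: str
    # One (is_positive, expert_credibility) pair per expert rating.
    ratings: list[tuple[bool, float]]


def positive_count(product: Product) -> int:
    """Step 1: count how many experts rated the product positively."""
    return sum(1 for is_positive, _ in product.ratings if is_positive)


def credibility_of_positives(product: Product) -> float:
    """Step 2 (assumed tiebreak): total credibility behind the positive ratings."""
    return sum(cred for is_positive, cred in product.ratings if is_positive)


def choose(a: Product, b: Product) -> Product:
    """Pick the product with more positive ratings; use credibility only on a tie."""
    count_a, count_b = positive_count(a), positive_count(b)
    if count_a != count_b:
        return a if count_a > count_b else b
    # Counts are tied, so fall back to expert credibility.
    # If credibility is also tied, this sketch arbitrarily keeps the first option.
    if credibility_of_positives(a) >= credibility_of_positives(b):
        return a
    return b
```

Under this rule, a product with two positive ratings from less credible experts beats a product with one positive rating from a highly credible expert, because credibility is consulted only when the counts are equal.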

“We’re not trying to replace cognitive models,” said Binz. “We want to give researchers better tools for exploring what people might be doing.”