In 2026, the smartest AI models may not be the biggest ones.

That is the bet now being placed by labs, investors and researchers who spent the past year watching their assumptions collapse. The coming 12 months will be defined not by the race to build larger systems, but by the scramble to develop wiser ones: models that think before they speak and do more with less.

“You can get a small language model performing at the same level, or even better, than much larger models,” Kush Varshney, an IBM Fellow, told IBM Think in an interview.

A year ago, that would have sounded like heresy. For a decade, AI had operated according to a brutally simple catechism: more data, more parameters, more computing power, more intelligence. Labs competed to announce parameter counts like bodybuilders flexing in a mirror. Training runs consumed the electrical output of small cities. The whole enterprise had the feeling of a land rush, except the territory being claimed was measured in teraflops.

Then came January 2025. A company called DeepSeek, based in China, released a model that sent Nvidia’s stock down by 17% in a single day. The lesson the market drew was that algorithmic cleverness could substitute for brute computational force. You didn’t need a cathedral. You needed a better blueprint.

The major American labs pivoted fast. Within months, they moved from building ever-larger systems to building ones that pause and reason before they answer. Seyed Emadi, an Associate Professor of Operations at the University of North Carolina Kenan-Flagler, put it bluntly when he spoke with IBM Think: “If I had to summarize 2025 in AI, we stopped making models bigger and started making them wiser.”

That pivot now shapes what comes next. The consensus among researchers is striking, almost eerie. When asked to identify the most significant development of the past year, Misha Belkin, a Professor of Machine Learning at UC San Diego, pointed to “the rise of thinking models and inference-time scaling” and, in an interview, called it the foundation for 2026. Rada Mihalcea, who directs the AI Laboratory at the University of Michigan, offered a complementary view: “advances in multi-agent systems, as well as a deeper understanding of ... weaknesses” would define the path forward, she told IBM Think.

The shift represents a rethinking of what intelligence means in silicon. The old approach treated it as something you baked in during training, like seasoning in a stew. Once training was complete, the model was frozen. The new approach treats intelligence as something that can emerge at runtime, by giving the model more time to reason. Researchers call this inference-time compute.
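For readers who want to see the idea in miniature, here is a toy sketch in Python. It is not how any lab mentioned here builds its models; it illustrates one well-known way of spending extra compute at runtime, which is to sample several answers and take the most common one. The "model" is a hypothetical stand-in (`noisy_solver`) that is right only 40% of the time, and that figure, the answer 42 and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter


def noisy_solver(correct_answer: int, p_correct: float, rng: random.Random) -> int:
    """Hypothetical stand-in for one sampled model response."""
    if rng.random() < p_correct:
        return correct_answer
    # Wrong answers are scattered across nine different values.
    return correct_answer + rng.randint(1, 9)


def majority_vote(n_samples: int, correct_answer: int,
                  p_correct: float, rng: random.Random) -> int:
    """Sample the solver n_samples times and return the most common answer."""
    votes = Counter(
        noisy_solver(correct_answer, p_correct, rng) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 2000,
             p_correct: float = 0.4, seed: int = 0) -> float:
    """Fraction of trials in which the majority answer is the correct one."""
    rng = random.Random(seed)
    hits = sum(
        majority_vote(n_samples, 42, p_correct, rng) == 42 for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # More samples per question means more compute spent at inference time.
    for n in (1, 5, 25):
        print(f"samples={n:>2}  accuracy={accuracy(n):.2f}")
```

The simulated solver never changes between runs; only the number of samples per question does. Because its mistakes are scattered while its correct answers agree with one another, the majority answer becomes more reliable as the sample count grows, which is the intuition behind trading runtime compute for quality.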

The implications are still being worked out. Gabriel Poesia, a researcher who studies AI reasoning at Stanford University, has observed models getting better at “thinking for longer periods of time” and “seamlessly using tools during long thinking periods.” The plain-English version: the machines learned to think before they speak.

The old models worked like reflexes: input in, prediction out, no pause for thought. The new ones deliberate. Ask a hard question, and the model will sit with it, sometimes for minutes, checking its logic, backtracking from dead ends. It looks remarkably like thinking. Whether it is thinking, in any meaningful sense, remains one of the great unanswered questions.