In 2026, the smartest AI models may not be the biggest ones.
That is the bet now being placed by labs, investors and researchers who spent the past year watching their assumptions collapse. The coming 12 months will be defined not by the race to build larger systems, but by the scramble to develop wiser ones, models that think before they speak, that do more with less.
“You can get a small language model performing at the same level, or even better, than much larger models,” Kush Varshney, an IBM Fellow, told IBM Think in an interview.
A year ago, that would have sounded like heresy. For a decade, AI had operated according to a brutally simple catechism: more data, more parameters, more computing power, more intelligence. Labs competed to announce parameter counts like bodybuilders flexing in a mirror. Training runs consumed the electrical output of small cities. The whole enterprise had the feeling of a land rush, except the territory being claimed was measured in teraflops.
Then came January 2025. A company called DeepSeek, based in China, released a model that sent Nvidia’s stock down by 17% in a single day. The lesson markets drew was that algorithmic cleverness could substitute for brute computational force. You didn’t need a cathedral. You needed a better blueprint.
The major American labs pivoted fast. Within months, they moved from building ever-larger systems to building ones that pause and reason before they answer. Seyed Emadi, an Associate Professor of Operations at the University of North Carolina Kenan-Flagler, put it bluntly when he spoke with IBM Think: “If I had to summarize 2025 in AI, we stopped making models bigger and started making them wiser.”
That pivot now shapes what comes next. The consensus among researchers is striking, almost eerie. When asked to identify the most significant development of the past year, Misha Belkin, a Professor of Machine Learning at UC San Diego, pointed to “the rise of thinking models and inference-time scaling”—and in an interview called it the foundation for 2026. Rada Mihalcea, who directs the AI Laboratory at the University of Michigan, offered a complementary view: “advances in multi-agent systems, as well as a deeper understanding of ... weaknesses” would define the path forward, she told IBM Think.
The shift represents a rethinking of what intelligence means in silicon. The old approach treated it as something you baked in during training, like seasoning in a stew. Once training was complete, the model was frozen. The new approach treats intelligence as something that can emerge at runtime by giving the model more time to reason, an approach known as inference-time compute.
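One simple way to see why extra compute at answer time can help is majority voting over several sampled answers. This is not the method any lab quoted here describes, and reasoning models mostly work by extending a single chain of thought instead. The sketch below assumes a toy setting in which each sampled answer is independently right with a fixed probability and every wrong answer is the same wrong answer. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Return the most common answer among the sampled answers."""
    return Counter(answers).most_common(1)[0][0]


def majority_accuracy(p_correct, n_samples):
    """Chance that more than half of n independent samples are correct.

    Assumes n_samples is odd and that each sample is right with
    probability p_correct, independently of the others.
    """
    need = n_samples // 2 + 1
    return sum(
        comb(n_samples, k) * p_correct**k * (1 - p_correct) ** (n_samples - k)
        for k in range(need, n_samples + 1)
    )


# More samples per question means more inference-time compute.
for n in (1, 3, 11, 51):
    print(n, round(majority_accuracy(0.6, n), 3))
```

Under these assumptions, a model that is right 60% of the time on a single try becomes more accurate as the number of samples grows, at the price of proportionally more compute per query.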
The implications are still being worked out. Gabriel Poesia, a researcher who studies AI reasoning at Stanford University, has observed models getting better at “thinking for longer periods of time” and “seamlessly using tools during long thinking periods.” The plain-English version: the machines learned to think before they speak.
The old models worked like reflexes: input in, prediction out, no pause for thought. The new ones deliberate. Ask a hard question, and the model will sit with it, sometimes for minutes, checking its logic, backtracking from dead ends. It looks remarkably like thinking. Whether it is thinking, in any meaningful sense, remains one of the great unanswered questions.
If thinking models were the intellectual story of 2025, the commercial bombshell was cruder: frontier AI turned out to be much cheaper than anyone thought. The economics that had seemed as immutable as gravity turned out to be more like fashion. That revelation will reshape competition in 2026.
DeepSeek’s January release landed like a bomb. The model matched Western systems using roughly one-tenth the training compute. “That pushed things,” Varshney said. “Now there’s another competitor, and everyone needs to up their game.”
Model architecture has undergone its own quiet changes. The hot new pattern, mixture of experts, routes inputs to specialized subnetworks instead of activating every parameter for every query. Think of it like consulting the right specialist rather than asking one doctor to know everything. Andrew Chin, a Law Professor at UNC who studies technology policy, explained the economics to IBM Think: “Dense models incur roughly the same computational cost for every token,” he said. “Sparse systems route tokens through only a subset of parameters.” The implication for enterprises is significant: “Scale becomes something to manage, not merely to maximize.”
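The routing Chin describes can be sketched in a few lines. This is a minimal illustration, not any production system: it assumes a common top-k gating scheme, uses eight hypothetical "experts" that are trivial functions standing in for subnetworks, and draws random gate scores where a real model would compute them from the input.

```python
import math
import random


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate_scores, k=2):
    """Run only the k selected experts and blend their outputs."""
    routes = route_top_k(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]


# Eight hypothetical experts; each just scales its input.
experts = [lambda x, s=s: x * s for s in range(1, 9)]
random.seed(0)
gate_scores = [random.gauss(0, 1) for _ in experts]

output, used = moe_forward(3.0, experts, gate_scores, k=2)
active_fraction = len(used) / len(experts)
print(f"experts used: {used}, fraction active: {active_fraction:.0%}")
```

With two of eight experts active, only a quarter of the expert parameters do any work for this input, which is the sense in which a sparse model's cost per token is lower than its total size suggests.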
The democratization extends beyond architecture to fine-tuning. Christelle Scharff, a Computer Science Professor at Pace University, told IBM Think that she’s witnessed “a clear shift toward LoRA and lightweight fine-tuning, enabling powerful models to be adapted with limited compute.” Researchers with modest budgets can now customize models that would have been beyond reach a year ago. The gates are opening.
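A back-of-the-envelope count shows why LoRA is cheap. Rather than updating a full weight matrix, LoRA trains two thin matrices whose product is a low-rank update to it. The layer size and rank below are hypothetical, chosen only for illustration, and the function names are not from any library.

```python
def full_finetune_params(d_out, d_in):
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable weights for a rank-r update B @ A.

    B has shape (d_out x rank) and A has shape (rank x d_in).
    """
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, for illustration only.
d_out, d_in, rank = 4096, 4096, 8
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {rank}): {lora:,} trainable weights")
print(f"fraction trained: {lora / full:.4%}")
```

For this one layer, the low-rank update trains 65,536 weights instead of 16,777,216, or 1/256 of the total.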
The efficiency gains also include system design. Kandyce Brennan, an Assistant Professor at the UNC School of Nursing who works on AI in healthcare, told IBM Think that approaches like MIT’s DisCIPL planner—where “a large model plans and coordinates ... many small models”—achieve results with “much lower computational cost.” The efficiency also reduces energy use and environmental burden.
“Data limitations and energy concerns have now become a real challenge,” Mihalcea said, “which has pushed research in the direction of smaller models.” Those constraints will only tighten.
What enterprises actually need, it turns out, is not the ability to do everything, Varshney said. He offered a whimsical example: you could ask a model to comment on civil rights on the moon, and it would produce something fluent. “But most enterprise tasks are not that,” he said. “They’re more targeted.” The theology of scale is giving way to the pragmatism of fit-for-purpose.
The advances have been real. So have the limits. Despite their newfound capacity for deliberation, AI models remain capable of a particular kind of wrongness: the confident mistake, delivered with the serene assurance of a tour guide who has wandered into the wrong museum.
Poesia identified the core problems: “Two major challenges continue to be reliability and creativity. Even succeeding 99.9% of the time is not enough,” he said. The math is unforgiving. A system that fails once in a thousand attempts will fail a thousand times while processing a million queries. In medicine, law or finance, those are not acceptable odds.
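The arithmetic is easy to check, and it gets worse for multi-step work. The sketch below reproduces the million-query figure and then adds one assumption that is not in Poesia's remarks: that a task chains many steps, each of which succeeds independently at the same rate. The function names are hypothetical.

```python
def expected_failures(queries, success_rate):
    """Average number of failed queries at a given per-query success rate."""
    return queries * (1.0 - success_rate)


def chain_success(success_rate, steps):
    """Chance that every step of a multi-step task succeeds.

    Assumes each step succeeds independently at the same rate.
    """
    return success_rate**steps


print(f"failures in 1,000,000 queries: {expected_failures(1_000_000, 0.999):,.0f}")
for steps in (10, 100, 1000):
    print(f"{steps} steps: {chain_success(0.999, steps):.1%} chance all succeed")
```

Under that independence assumption, a 100-step task built on 99.9%-reliable steps finishes cleanly only about nine times in ten.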
Creativity is another issue. “For open-ended tasks ... even models from different companies tend to give similar outputs,” Poesia observed. The models have become remarkably good at finding correct answers. They remain strangely uniform when asked to be original.
The reasoning models have their own blind spots. Varshney noted that “on tasks where there’s the ability to verify intermediate steps ... these longer flows help. But there are tons of things where there are no intermediate verifiable steps.”
A benchmark called ARC-AGI-2 illustrates the gap. The test presents problems humans find easy, but AI finds extraordinarily difficult. “Even state-of-the-art thinking models score well below human performance,” Emadi said. “Models can reason better than before, but they can still be confidently wrong.”
Hallucination, the field’s polite term for making things up, has changed in how it manifests but still remains. Mohammad Hossein Jarrahi, a professor at UNC who studies human-AI interaction, told IBM Think that “hallucinations have shifted in character but not vanished completely.” The tendency to generate plausible-sounding but factually incorrect information is stubbornly persistent.
Some researchers worry about the broader trajectory. Todd Cherner, who directs an educational technology program at the University of North Carolina, told IBM Think that “the advancing capability of AI agents is provocative. I think the future is headed to AGI faster than people are aware. We should make good use of what we have before really pushing for AGI.”
The foundational principle of computing still applies. “Garbage in, garbage out,” Nathalie Volkheimer, a User Engagement Specialist at RENCI, told IBM Think. “We are focusing on the machine making the sausage, and not the sausage itself. But in the end we eat what we make.”
One less heralded advance: the expansion of context windows, the amount of information a model can hold in working memory. “We see much better repository-scale context, up to around one million tokens,” Jarrahi said. A million tokens is roughly the length of several novels. Models can now maintain coherent understanding across much longer interactions, which matters enormously for legal document review, software development and research synthesis, he said.
Citation features have also improved, with “built-in ... grounding features that can point to specific passages,” Jarrahi added. When a model can show its work, users can verify rather than accept on faith. Trust, but verify. Or rather: don’t trust, and definitely verify.
But verification only gets you so far. Aude Oliva, MIT Director of the MIT-IBM Watson AI Lab, told IBM Think that “the future of AI-human collaboration is a dialogue. An artificial agentic system must possess some degree of theory of mind. Understanding an AI system’s inner workings ... forms the foundation of trust.” Theory of mind—the ability to understand that others have different perspectives—is fundamental to human interaction. Its absence in AI creates friction that no amount of capability can overcome.
The metrics for success are shifting accordingly. “The field is inexorably headed toward models judged less by raw fluency and more by traceability, calibration and interactional robustness,” Jarrahi said. The glamour metrics are giving way to reliability metrics. Flash is out. Predictability is in.
“The dominant theme has been capabilities-through-constraints,” Chin said. “Instead of treating scale as an end in itself, leading efforts focus on making systems work predictably under real limits.” Progress looks less like a moonshot and more like an engineering problem.
Three constraints will shape what organizations can do with AI in 2026, a range of experts told IBM Think. The first is economic, the second is physical and the third is regulatory.
Start with money. “Inference economics will increasingly act as a hard ceiling,” Chin said. “Many recent reasoning gains rely on materially more compute per query.” A model that takes minutes to think cannot be deployed where real-time responses are required at scale, he noted.
The physical constraints are equally daunting. “Global data center electricity consumption is projected to more than double by 2030,” Emadi said. “Next year’s constraint for many organizations won’t be chip availability but gigawatts to plug them into.” The industry has spent years obsessing over chips. The bottleneck is moving to power plants.
“The computational demands, and therefore the environmental costs, remain high,” Brennan added, “raising important ethical questions about sustainability.” The carbon footprint of AI has become impossible to ignore.
Then there’s regulation. “Governance-by-design pressures will shape model development more directly,” Chin said. “For many deployments, the requirement is not just high performance, but auditable and bounded behavior.” The era of the black box may be ending.
The growing gap between industry and academia troubles some observers. “Universities must refocus on foundational AI,” Scharff said, “and invest in ideas that will shape the field 10 to 20 years from now.” The largest models are increasingly beyond academic reach, raising uncomfortable questions about where the next generation of ideas will come from.
One development has gone underreported: the rise of sovereign AI. “In many countries, people have been developing their own models,” Varshney said. These matter because training data is more culturally responsive, and they shift economic control closer to home, he noted.
For 2026, Varshney expects continued experimentation rather than dramatic breakthroughs. “Not everything has to be exactly a transformer,” he said. Mihalcea offered a similar forecast: “smaller specialized mix of expert models, leveraging multi-agent systems.” When asked whether big leaps are coming, Varshney was cautious. “There’s always a chance ... another ChatGPT moment,” he said. “But I don’t expect that.” The honest answer is that nobody knows.
Practitioners are already adapting to this new landscape. Jayashankar Swaminathan, a Professor of Global Operations at UNC Kenan-Flagler, told IBM Think that “the biggest advancements are around ... autonomous agentic capability, where AI is now able to do multiple tasks in a simple order. The second relates to reasoning the logic behind the decision-making.”
In healthcare, the transformation is already underway. Maureen Baker, a Clinical Associate Professor at the UNC School of Nursing, told IBM Think that “AI models are advancing at an incredible pace.” But she distinguished capability from deployment: “Critical thinking, clinical reasoning and judgment must remain at the forefront.” Her approach is pragmatic: “I seek easy wins with minimal risk.”
The ecosystem is differentiating. David Sachs, a Professor of Information Technology at Pace University, told IBM Think that “there seem to be two types of models appearing: the large, we can do everything model, and the more focused ones like Julius or Perplexity.” Much as software evolved from monolithic applications to specialized tools, AI is fragmenting into niches.
“The actual use of these systems ... is shaped ... by designing symbiotic workflows,” Jarrahi said. Humans bring judgment, creativity and accountability. AI brings speed, consistency and the capacity to process vast amounts of information. Organizations that figure out how to combine them will have an edge.
“Frontier AI is moving away from an era defined by raw scale,” Chin said, “toward one defined by procedures, constraints and operational trade-offs.” Technologies mature when engineers start optimizing for real-world limits. By that measure, AI is finally growing up.
But Varshney is thinking about something deeper than technology. “What will the tasks be that get delegated to AI systems, and which ones will humans continue to do?” he asked. “Is it because humans find meaning ... from doing certain things? What does it mean to be human, in many ways?”
