This article was featured in the Think newsletter.
Can open-source AI systems outperform closed ones? Thanks to the newest model from Chinese AI research lab Moonshot AI, the answer to that hotly debated question may finally be yes. Kimi K2 Thinking is an open-source agent that uses a mixture-of-experts architecture and outperforms OpenAI’s ChatGPT and xAI’s Grok on key benchmarks like Humanity’s Last Exam and BrowseComp, according to Moonshot AI.
“This is actually a big open source milestone and a challenge to the entire closed AI economy,” said IBM Principal Research Scientist Kaoutar El Maghraoui on a recent episode of the Mixture of Experts podcast. “If the best model in the world is open weight, the center of gravity in AI shifts from secret models to shared ecosystems.”
A core feature of the Kimi K2 Thinking model is its ability to reason step by step while using tools. Specifically, it can execute as many as 200 to 300 sequential tool calls without human intervention: the model calls external tools one after another, in a specific order, using one tool’s output as the next tool’s input. This is especially useful for multi-step research and reasoning tasks, where a model must gather new information, update its reasoning and explore different tools. This alternation between reasoning and tool use is called “interleaved thinking.” Kimi K2 Thinking’s mixture-of-experts architecture further boosts its efficiency: of its roughly one trillion parameters, it activates only the relevant subset for each token it processes.
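To make the idea of sequential tool calls concrete, here is a minimal sketch of an agent loop, assuming a scripted policy in place of a real model. The tools (`search`, `add`), the `toy_policy` function and the 300-call budget are hypothetical illustrations, not Moonshot AI’s API. The point is only that each reasoning step sees earlier tool outputs and can feed them into the next call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tools standing in for real external tools.
def search(query: str) -> str:
    # Toy lookup table in place of a real web search.
    facts = {"population of A": "1200", "population of B": "800"}
    return facts.get(query, "unknown")

def add(expression: str) -> str:
    # Expects input of the form "x+y" and returns the sum as text.
    a, b = expression.split("+")
    return str(int(a) + int(b))

TOOLS: dict[str, Callable[[str], str]] = {"search": search, "add": add}

@dataclass
class Step:
    tool: Optional[str]  # None means "stop and give the final answer"
    arg: str

def toy_policy(history: list[tuple[str, str, str]]) -> Step:
    """Hypothetical stand-in for the model's reasoning between tool calls."""
    n = len(history)
    if n == 0:
        return Step("search", "population of A")
    if n == 1:
        return Step("search", "population of B")
    if n == 2:
        # Feed the two previous tool outputs into the next tool's input.
        return Step("add", f"{history[0][2]}+{history[1][2]}")
    return Step(None, history[-1][2])

def run_agent(policy: Callable[[list], Step], max_calls: int = 300) -> tuple[str, list]:
    """Alternate reasoning (policy) and tool use until the policy stops or the budget runs out."""
    history: list[tuple[str, str, str]] = []
    for _ in range(max_calls):
        step = policy(history)
        if step.tool is None:
            return step.arg, history
        output = TOOLS[step.tool](step.arg)
        history.append((step.tool, step.arg, output))
    raise RuntimeError("tool-call budget exhausted")

answer, trace = run_agent(toy_policy)
print(answer)  # 2000
```

In a real deployment the policy would be the model itself, deciding between calls which tool to invoke next; the hard cap on calls keeps a runaway loop from running indefinitely.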
That combination of performance and efficiency is particularly interesting for companies, El Maghraoui said. “For enterprises, this open-weight dominance really means that you can finally bring top-tier reasoning in-house, with much lower costs.”
IBM Fellow Aaron Baughman said it’s exciting to see open-source models performing so well, but cautioned that this standout performance still needs additional verification. “I think a third-party independent assessment needs to be made around this model, too,” he said on the Mixture of Experts episode.
But capability and efficiency may, in fact, no longer be enough to distinguish a model, say experts like El Maghraoui. Case in point: OpenAI’s newest (closed) frontier model GPT-5.1, which arrived last week, just six days after Kimi K2 Thinking. Instead of focusing on raw power alone, OpenAI took its cues from users: “We heard clearly from users that great AI should not only be smart, but also enjoyable to talk to,” the company stated in a blog post for GPT-5.1. To that end, OpenAI’s newest agent is warmer, even playful, as it chats with users in a more conversational style than earlier models.
Why this new focus? “I think it develops a sense of empathy with the user and trust so that if the model can have a warmer personality and respond in that way, then it develops that relationship further,” Baughman said. Beyond personality, because GPT-5.1 uses a router, it can provide an instant response more cost-effectively when that’s desired, but “if I need it to go into a deeper chain of thought, it can do that too,” he said.
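The following is not OpenAI’s router; it is a sketch of the general idea under stated assumptions. A cheap check inspects each prompt and sends it either to a fast tier or to a tier that spends more compute on a longer chain of thought. The tier names, the `HARD_CUES` list and the 40-word threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    model: str       # hypothetical tier name: "instant" or "thinking"
    reasoning: bool  # whether to spend extra compute on a longer chain of thought

# Hypothetical cues suggesting a request needs multi-step reasoning.
HARD_CUES = ("prove", "proof", "step by step", "debug", "compare")

def route(prompt: str, max_fast_words: int = 40) -> Route:
    """Send short, simple prompts to the cheap tier and everything else to the deep tier."""
    text = prompt.lower()
    too_long = len(text.split()) > max_fast_words
    has_cue = any(cue in text for cue in HARD_CUES)  # crude substring check
    if too_long or has_cue:
        return Route(model="thinking", reasoning=True)
    return Route(model="instant", reasoning=False)

print(route("What's the capital of France?"))
print(route("Walk me through this proof step by step"))
```

A production router would more likely be a learned classifier than a keyword list, but the trade-off is the same: pay for deep reasoning only when the request seems to need it.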
According to El Maghraoui, differentiating models through the user experience may signal a broader shift toward a world “where raw intelligence is becoming a commodity.” “We’re starting to see a segmentation of markets between models that are focused on pure efficiency, and models that are trying to win with user experience and personality,” she said. “It’s a battle between model IQ versus model EQ.”
AI companies hoping to capture enterprise users are also segmenting their models along additional dimensions such as “trust, governance and compliance,” said El Maghraoui. IBM’s open-source Granite Guardian models, for example, function as digital guardians that can help ensure that other models are safe, compliant and auditable for business uses. Being open source does not by itself guarantee “trust and transparency,” Baughman said.
For El Maghraoui, “the next frontier may not be model quality, but also the integration. Who builds the most trusted, compliant and secure deployment pipelines?”
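The guardian pattern can be sketched as a wrapper that screens both the prompt and the model’s response and records each decision for later audit. This is not Granite Guardian’s interface: `toy_guard` is a hypothetical keyword check standing in for a call to a guardian model, and `guarded_generate` and its messages are illustrative names.

```python
from typing import Callable, Optional

# Hypothetical risk policy; a real pipeline would query a guardian model here.
BLOCKED_TERMS = ("ssn:", "password:")

def toy_guard(text: str) -> bool:
    """Return True if the text looks risky under the toy policy."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    guard: Callable[[str], bool] = toy_guard,
    audit_log: Optional[list] = None,
) -> str:
    """Check the input, call the model, check the output, and log every decision."""
    log = audit_log if audit_log is not None else []
    if guard(prompt):
        log.append(("blocked_input", prompt))
        return "Request blocked by policy."
    response = generate(prompt)
    if guard(response):
        log.append(("blocked_output", response))
        return "Response withheld by policy."
    log.append(("allowed", prompt))
    return response

log: list = []
echo_model = lambda p: f"echo: {p}"  # stand-in for any model being guarded
print(guarded_generate("hello", echo_model, audit_log=log))
print(log)
```

Checking the output as well as the input matters because a safe-looking prompt can still produce a response that leaks sensitive data; the audit log supports the “auditable” requirement.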