Anthropic's new Claude 3.7 Sonnet can now turn its deep thinking mode on and off like a light switch, answering simple questions instantly while reserving the computational heavy lifting for complex problems that need it.
This hybrid reasoning approach marks a shift in artificial intelligence that experts say can both cut costs and boost capabilities, with IBM's Granite models also adopting similar toggling features based on task complexity. This evolution comes as organizations worldwide struggle with the financial realities of advanced AI, potentially making sophisticated reasoning more accessible while conserving valuable computing resources.
"The cost structure of thinking models matters; not all questions require a 32-second pause for the model to think through it," Maya Murad, Product Manager for AI at IBM Research, says during a recent episode of the Mixture of Experts podcast. "This capability allows enterprises to use resources intelligently, applying extensive computation only when the problem requires it, creating AI systems that better match how humans approach different cognitive tasks."
Hybrid reasoning signals a shift in the AI industry's focus from simply building more powerful systems to creating ones that are practical to use, Abraham Daniels, a Senior Program Manager with IBM Research, tells IBM Think. For businesses, this change could be crucial, as the cost of operating sophisticated AI has become a major consideration.
Models consume significantly more computational resources—and therefore cost more money—during deep reasoning than when providing simple responses. Hybrid reasoning lets companies optimize AI spending by matching computation levels to task complexity.
Anthropic recently launched Claude 3.7 Sonnet with "extended thinking mode," allowing users to request more thorough analysis when needed. IBM similarly equipped its Granite models with "toggling" capabilities, giving users control over when to activate intensive reasoning.
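In practice, the switch is an API parameter. A minimal sketch using Anthropic's Python SDK shows the same model handling a quick question without extended thinking and a harder one with it; the prompts and token budgets here are illustrative choices, not recommendations from either company.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Fast path: no extended thinking, suitable for a simple lookup-style question.
quick = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=512,
    messages=[{"role": "user", "content": "What year was the transistor invented?"}],
)

# Deep path: extended thinking enabled, with a budget of tokens the model may
# spend on visible reasoning before it writes the final answer.
deep = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[{"role": "user", "content": "Outline a migration plan for moving a payments monolith to microservices."}],
)
```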
"We built hybrid reasoning with a different philosophy to other reasoning models on the market," an Anthropic spokesperson told IBM Think. "Our approach is based on how the human brain works. As humans, we don’t have two separate brains for fast versus deep thinking—and at Anthropic, we regard reasoning as something that needs to be deeply integrated into the capabilities of all of our models versus a separate feature. This approach is based on how we see Claude integrating with our customers across all applications. While some interactions require quick responses, like brainstorming marketing collateral, others, like complex financial analysis or industry research, require deeper, longer thinking. We wanted to make both of these functionalities as simple and cost-effective as possible for our customers to access and use."
The AI's thought process becomes more transparent with this approach. "The model itself is still a black box, but at least on the output, you can kind of see how the model came to that conclusion," Daniels says. This visibility can improve results and address explainability concerns, which is particularly important for regulated industries, he says.
Daniels and other experts see this development as addressing a practical need: eliminating unnecessary computational overhead for straightforward questions.
"You don't need a ton of reasoning for all tasks, and it gives you the ability, basically, when you have more complicated things, to pay more—both in terms of latency and cost," says Kate Soule, Director of Technical Product Management at IBM Research, on the podcast.
The inner workings of large language models (LLMs) have traditionally been opaque. A model would receive a prompt and generate a response, without revealing its internal reasoning steps.
Hybrid reasoning changes this dynamic by exposing a model’s step-by-step thinking process. When activated, systems like Granite 3.2 show their work, making the logical paths they follow visible.
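With Anthropic's API, that visible work arrives as dedicated "thinking" content blocks alongside the final answer; Granite 3.2 surfaces its reasoning in its own output format. A hedged sketch for the Anthropic case, assuming a response object from the Messages API:

```python
def show_reasoning(message) -> None:
    """Print the model's visible reasoning separately from its final answer."""
    for block in message.content:
        if block.type == "thinking":
            print("--- model reasoning ---")
            print(block.thinking)
        elif block.type == "text":
            print("--- final answer ---")
            print(block.text)
```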
"Our decision to make Claude’s reasoning process visible reflects consideration of multiple factors. One of those factors includes enhanced user experience and trust transparency in Claude’s reasoning process," the Anthropic spokesperson said. "This provides users with insight into how conclusions are reached, fostering appropriate levels of trust and understanding. Users generally trust outputs more when they can observe the chain of thought. We hope this visibility allows users to better evaluate the quality and thoroughness of Claude’s reasoning, and helps users better understand Claude’s capabilities. Furthermore, we hope users and developers can create better prompts by reading Claude’s thinking outputs and providing targeted feedback on specific reasoning steps."
"Being able to expose the actual thinking of the model is great for explainability," says Daniels. "Prior to being able to demonstrate the chain-of-thought (CoT) reasoning, it was really just the next token probability. So a little bit of a black box."
These technologies have business applications that extend across many industries. "Finance and legal are natural fits because they deal with structured documentation," says Daniels, adding that "any regulated industry stands to gain tremendous value" from these advanced thinking models.
But hybrid reasoning can be especially useful in domains requiring complex analysis.
"Math and code are really the two focus points that I've seen in terms of benchmarks for reasoning," says Daniels. For software development, the benefits could be substantial: "Using a thinking model would be able to frame out what the scope of the project should look like given the requirements that you've laid out," he says.
Standard LLMs generate responses by predicting the most likely next word based on patterns in their training data. This approach works well for many tasks, but these models can struggle with multi-step reasoning problems.
Hybrid reasoning models can switch into a computationally intensive mode, explicitly generating intermediate reasoning steps before providing a final answer. The model uses these steps to work through complex problems, much as a person writes out intermediate steps when solving a difficult math problem.
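The same idea can be approximated by prompting alone: chain-of-thought prompting asks an ordinary model to write its intermediate steps before the answer, which is roughly what hybrid models do natively when their thinking mode is on. The wording below is an illustrative assumption, not either vendor's recommended prompt.

```python
question = (
    "A warehouse ships 3 pallets of 48 boxes each, then 17 boxes are returned. "
    "How many boxes remain shipped?"
)

# Direct prompting: the model predicts an answer with no visible intermediate steps.
direct_prompt = question

# Chain-of-thought prompting: the model is asked to lay out the steps
# (3 x 48 = 144, then 144 - 17 = 127) before committing to a final answer.
cot_prompt = question + "\n\nWork through the problem step by step, then state the final count."
```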
The architecture enabling hybrid reasoning builds upon what researchers call "test-time compute," which involves dedicating computational resources during inference rather than only during training.
"A lot of times, traditionally, all your computing power would be used to train the model, and then inferencing the model would be relatively light in terms of computational requirements," Daniels says.
But as AI systems grow more complex, the challenge won’t just be processing power—it’ll be knowing when to use it efficiently. That’s why the next frontier for hybrid reasoning, Daniels says, will be smarter self-regulation: teaching AI when to activate its deeper thinking mode on its own, without humans telling it to do so.
"The next step in terms of reasoning models, or hybrid reasoning models, is how can we better understand or better triage inputs within the test-time compute, or within the thinking framework," he says.