Think smart, not hard: How Claude's hybrid reasoning could change AI economics


By Sascha Brodsky, Staff Writer, IBM

Anthropic's new Claude 3.7 Sonnet can now turn its deep thinking mode on and off like a light switch, answering simple questions instantly while reserving the computational heavy lifting for complex problems that need it.

This hybrid reasoning approach marks a shift in artificial intelligence that experts say could both cut costs and boost capabilities; IBM's Granite models have adopted similar toggling features based on task complexity. The evolution comes as organizations worldwide grapple with the financial realities of advanced AI, potentially making sophisticated reasoning more accessible while conserving valuable computing resources.

"The cost structure of thinking models matters; not all questions require a 32-second pause for the model to think through it," Maya Murad, Product Manager for AI at IBM Research, says during a recent episode of the Mixture of Experts podcast. "This capability allows enterprises to use resources intelligently, applying extensive computation only when the problem requires it, creating AI systems that better match how humans approach different cognitive tasks."

The economics of machine thought

Hybrid reasoning signals a shift in the AI industry's focus from simply building more powerful systems to creating ones that are practical to use, Abraham Daniels, a Senior Program Manager with IBM Research, tells IBM Think. For businesses, this change could be crucial, as the cost of operating sophisticated AI has become a major consideration.

Models consume significantly more computational resources—and therefore cost more money—during deep reasoning than when providing simple responses. Hybrid reasoning lets companies optimize AI spending by matching computation levels to task complexity.
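The trade-off can be made concrete with a back-of-the-envelope calculation. The toy function below compares the per-request cost of a short direct answer against a request that first spends a large budget of "thinking" tokens. The prices and token counts are placeholders for illustration, not Anthropic's or IBM's actual rates.

```python
def request_cost(output_tokens: int, thinking_tokens: int = 0,
                 price_per_mtok: float = 15.0) -> float:
    """Cost of one request in dollars, assuming thinking tokens are
    billed like output tokens at `price_per_mtok` dollars per million."""
    return (output_tokens + thinking_tokens) * price_per_mtok / 1_000_000

# A quick factual answer vs. the same model with a deep-thinking budget
quick = request_cost(output_tokens=200)
deep = request_cost(output_tokens=800, thinking_tokens=16_000)

print(f"quick: ${quick:.4f}  deep: ${deep:.4f}  ratio: {deep / quick:.0f}x")
```

Under these illustrative numbers, the deep-thinking request costs roughly 84 times more than the quick one, which is why routing only genuinely hard questions into the expensive mode matters at scale.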

Anthropic recently launched Claude 3.7 Sonnet with "extended thinking mode," allowing users to request more thorough analysis when needed. IBM similarly equipped its Granite models with "toggling" capabilities, giving users control over when to activate intensive reasoning.
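In Anthropic's Messages API, extended thinking is toggled per request via a `thinking` field with a token budget. The sketch below builds such a request body without calling the API; the model name, budget, and token limits are illustrative, and real requests would go through an API client.

```python
def build_request(prompt: str, deep: bool, budget_tokens: int = 8_000) -> dict:
    """Build a Messages-API-style request body, enabling extended
    thinking only when `deep` is True."""
    body = {
        "model": "claude-3-7-sonnet-20250219",
        "max_tokens": 16_000 if deep else 1_024,
        "messages": [{"role": "user", "content": prompt}],
    }
    if deep:
        # Extended thinking: reserve a budget of reasoning tokens
        body["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return body

fast = build_request("What is the capital of France?", deep=False)
slow = build_request("Model our Q3 cash-flow risk under three scenarios.", deep=True)
print("thinking" in fast, "thinking" in slow)  # False True
```

The same pattern applies to Granite's toggle: the caller, not the model architecture, decides per request whether to pay for deliberate reasoning.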

"We built hybrid reasoning with a different philosophy to other reasoning models on the market," an Anthropic spokesperson told IBM Think. "Our approach is based on how the human brain works. As humans, we don’t have two separate brains for fast versus deep thinking—and at Anthropic, we regard reasoning as something that needs to be deeply integrated into the capabilities of all of our models versus a separate feature. This approach is based on how we see Claude integrating with our customers across all applications. While some interactions require quick responses, like brainstorming marketing collateral, others, like complex financial analysis or industry research, require deeper, longer thinking. We wanted to make both of these functionalities as simple and cost-effective as possible for our customers to access and use."

The AI's thought process becomes more transparent with this approach. "The model itself is still a black box, but at least on the output, you can kind of see how the model came to that conclusion," Daniels says. This visibility can improve results and address explainability concerns, which is particularly important for regulated industries, he says.

Daniels and other experts see this development as addressing a practical need: eliminating unnecessary computational overhead for straightforward questions.

"You don't need a ton of reasoning for all tasks, and it gives you the ability, basically, when you have more complicated things, to pay more—both in terms of latency and cost," says Kate Soule, Director of Technical Product Management at IBM Research, on the podcast.



Inside the black box

The inner workings of large language models (LLMs) have traditionally been opaque. A model would receive a prompt and generate a response, without revealing its internal reasoning steps.

Hybrid reasoning changes this dynamic by exposing a model’s step-by-step thinking process. When activated, systems like Granite 3.2 show their work, making the logical paths they follow visible.

"Our decision to make Claude’s reasoning process visible reflects consideration of multiple factors. One of those factors includes enhanced user experience and trust transparency in Claude’s reasoning process," the Anthropic spokesperson said. "This provides users with insight into how conclusions are reached, fostering appropriate levels of trust and understanding. Users generally trust outputs more when they can observe the chain of thought. We hope this visibility allows users to better evaluate the quality and thoroughness of Claude’s reasoning, and helps users better understand Claude’s capabilities. Furthermore, we hope users and developers can create better prompts by reading Claude’s thinking outputs and providing targeted feedback on specific reasoning steps."

"Being able to expose the actual thinking of the model is great for explainability," says Daniels. "Prior to being able to demonstrate the chain-of-thought (CoT) reasoning, it was really just the next token probability. So a little bit of a black box."

These technologies have business applications that extend across many industries. "Finance and legal are natural fits because they deal with structured documentation," says Daniels, adding that "any regulated industry stands to gain tremendous value" from these advanced thinking models.

But hybrid reasoning can be especially useful in domains requiring complex analysis.

"Math and code are really the two focus points that I've seen in terms of benchmarks for reasoning," says Daniels. For software development, the benefits could be substantial: "Using a thinking model would be able to frame out what the scope of the project should look like given the requirements that you've laid out," he says.

Standard LLMs generate responses by predicting the most likely next word based on patterns in their training data. This approach works well for many tasks, but these models can struggle with multi-step reasoning problems.

Hybrid reasoning models can switch into a computationally intensive mode, explicitly generating intermediate reasoning steps before providing a final answer. The model uses these steps to work through complex problems, similar to how humans write out intermediate steps when solving complex math problems.
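The difference can be mimicked with a toy solver: one function returns only a final answer, while the other records each intermediate step before answering, the way a visible reasoning trace does. This is purely illustrative of the output format, not of how the models are implemented internally.

```python
def solve_direct(prices: list[float], tax_rate: float) -> float:
    """Answer only: total cost with tax, no visible working."""
    return round(sum(prices) * (1 + tax_rate), 2)

def solve_with_steps(prices: list[float], tax_rate: float) -> tuple[list[str], float]:
    """Answer plus a trace of intermediate steps, like a reasoning mode."""
    steps = []
    subtotal = sum(prices)
    steps.append(f"Subtotal of {len(prices)} items: {subtotal:.2f}")
    tax = subtotal * tax_rate
    steps.append(f"Tax at {tax_rate:.0%}: {tax:.2f}")
    total = round(subtotal + tax, 2)
    steps.append(f"Total: {total:.2f}")
    return steps, total

steps, total = solve_with_steps([20.00, 5.00, 25.00], 0.08)
assert total == solve_direct([20.00, 5.00, 25.00], 0.08)
for step in steps:
    print(step)
```

Both functions reach the same answer; the second simply spends extra work producing a trace a reader (or a downstream checker) can inspect.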

The architecture enabling hybrid reasoning builds upon what researchers call "test-time compute," which involves dedicating computational resources during inference rather than only during training.

"A lot of times, traditionally, all your computing power would be used to train the model, and then inferencing the model would be relatively light in terms of computational requirements," Daniels says.

But as AI systems grow more complex, the challenge won’t just be processing power—it’ll be knowing when to use it efficiently. That’s why the next frontier for hybrid reasoning, Daniels says, will be smarter self-regulation: teaching AI when to activate its deeper thinking mode on its own, without humans telling it to do so.

"The next step in terms of reasoning models, or hybrid reasoning models, is how can we better understand or better triage inputs within the test-time compute, or within the thinking framework," he says.
