Claude Sonnet 4.5 pushes agents closer to becoming the next OS


Author

Anabelle Nicoud

Staff Writer

IBM

This article was featured in the Think newsletter.

Could agents be the next OS?  

Ismael Faro, a VP of Quantum and AI at IBM Research, has been integrating Claude Sonnet models in his workflows since they were released. Anthropic’s newest coding model, Claude Sonnet 4.5, released earlier this week, sends an important signal, according to him: agents can effectively become new operating systems.

The model, which the company calls its strongest for building complex agents and its best at using computers, shows how agentic AI can create apps and tools on the fly, said Faro. To make the point, he showed how an agent built a custom team management tool for a leader from a single prompt.

Faro, who has been exploring how agents could operate within a Unix-like paradigm (think small, composable processes managed by a privileged “system” layer), gave IBM Think a demonstration during a recent interview.

“We’re witnessing a fundamental shift: from static applications to dynamic, self-modifying systems that blur the line between code and prompts,” he wrote in a LinkedIn post.

Faro said agentic systems are turning LLMs and coding models into something more dynamic: tools that can build and adapt themselves to fit each user’s needs.

“This means many things, but one of them is: are we going to need app stores in the future?” Faro said.

Faro believes the tools will only get better. Only four months have passed since Anthropic released Claude Sonnet 4, a model widely adopted by the developer community, and Sonnet 4.5 already ranks among the top performers on SWE-bench, solving 82% of real-world software engineering tasks, according to Anthropic, compared to 72% for its previous version. “The systems are going to create everything by themselves,” he said.

Building the next generation of hardware

Of course, the ability of coding models to build apps raises many questions about user interfaces, design and objects. “What is going to be the next user interface for all of that? Are there going to be icons? Are we going to use voice?” Faro asked. “Many people are working on this question.”

The smartphone company Nothing, for example, recently launched Playground, a community platform for sharing user-generated apps, and Essential Apps, where users can build and customize AI apps. Silicon Valley is also clearly captivated by the future of AI design. This past spring, OpenAI teamed up with LoveFrom, the new company from iPhone designer Jony Ive, with the goal of building hardware for the AI age. 


What this means for enterprise

When it comes to the enterprise, however, AI agents that act like operating systems could introduce their own set of challenges. “What happens with the security, what happens with the data?” Faro said. “If these systems can create this automatically and can create this magically, where is the origin of this magic decision? What is the humanity in the middle to try to validate all these things?” 

IBM Research has been working on solutions to these questions. For example, the BeeAI framework’s RequirementAgent aims to add a new layer of reliability to the AI agent ecosystem. “A lot of the time, one of the blockers about why agents aren’t fully in production is because companies are concerned about their reliability,” said Sandi Besen, an AI Research Engineer and Ecosystem Lead at IBM Research, during an interview with IBM Think.  

“We are very clever and clear in defining the new protocols, the new interfaces, and that is the reason why I pay attention to Agent Communication Protocol or A2A [Agent2Agent],” Faro said. “First, we need to define how all these agents interact. Then, we define how you can modify one direction [an agent takes] or another direction, how you can put the information in one direction or another direction. The operating system leaders are going to take the information from all of them.”  

For Faro, the security question is paramount. “We are going to need to have other agents that have more privilege than the applications, because if they modify the operating system, the security is broken.” 
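
To make the idea concrete, here is a minimal Python sketch of the pattern Faro describes: small, composable agents chained like Unix pipes, with a privileged system layer that alone may change what is installed. It is an illustration only, not IBM's or Anthropic's implementation. Every name in it (`Privilege`, `Agent`, `AgentOS`, `register`, `pipeline`) is hypothetical, and the "agents" are plain text functions standing in for model calls.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Dict, List


class Privilege(IntEnum):
    APP = 0      # ordinary, task-specific agents
    SYSTEM = 1   # the privileged "system" layer


@dataclass
class Agent:
    name: str
    privilege: Privilege
    run: Callable[[str], str]  # one small, composable step: text in, text out


@dataclass
class AgentOS:
    """Hypothetical registry playing the role of the system layer."""
    agents: Dict[str, Agent] = field(default_factory=dict)

    def register(self, caller: Agent, new_agent: Agent) -> None:
        # Only a system-level caller may change what is installed.
        if caller.privilege < Privilege.SYSTEM:
            raise PermissionError(f"{caller.name} may not modify the system")
        self.agents[new_agent.name] = new_agent

    def pipeline(self, names: List[str], text: str) -> str:
        # Chain agents like a Unix pipe: each output feeds the next input.
        for name in names:
            text = self.agents[name].run(text)
        return text


# Stand-in agents; a real system would wrap model calls here.
kernel = Agent("kernel", Privilege.SYSTEM, lambda s: s)
upper = Agent("upper", Privilege.APP, str.upper)
exclaim = Agent("exclaim", Privilege.APP, lambda s: s + "!")

system = AgentOS()
system.register(kernel, upper)
system.register(kernel, exclaim)
result = system.pipeline(["upper", "exclaim"], "ship it")
print(result)
```

In this sketch, an application-level agent such as `upper` that tries to call `register` is rejected with a `PermissionError`, which mirrors Faro's point that agents able to modify the "operating system" need more privilege than the applications running on it.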
