ibm cloud developers working inside ibm

When one AI model isn’t enough

Artificial intelligence is learning to delegate.

For the past few years, the story of AI has been about making individual models bigger, smarter and faster. Now, a different idea is gaining ground: that the real power might come not from a single brilliant model, but from many models working together, each doing what it does best. Think: less lone genius, more well-run team.

Perplexity Computer, launched recently, is the latest expression of that idea. Instead of routing a complex task to one AI model and hoping for the best, the system breaks the job into pieces and sends each piece to a specialized AI agent, a software program designed to take actions and make decisions on its own. The agents work simultaneously—typically in the cloud, rather than on a user’s own machine—and report back when done.

“The space around orchestration and building something that can manage a team is getting a bit crowded,” Aaron Baughman, an IBM Distinguished Engineer and Master Inventor, said on a recent episode of the Mixture of Experts podcast.

The reason so many researchers and companies are chasing this idea, Baughman said, comes down to two stubborn limits that even the most powerful AI models run into. The first is speed: a single model will work through a long, complicated task one step at a time, like a solo surgeon performing every role in an operating room at once. The second is memory: every AI model has a context window, a limit on how much information it can hold in its working memory at once. Push a model past that limit, and it starts to lose the thread, forgetting earlier instructions and making mistakes that compound.

“If coordinated correctly and the task can be parallelized, meaning split into simultaneous streams, N agents can complete the task N times as fast as a single agent alone,” Eugene Vinitsky, an Assistant Professor at New York University, said in an interview with IBM Think. “As the length of the input to an agent grows, its performance degrades accordingly, and it can start to forget things or fail to execute its task. Spawning things to sub-agents with dedicated roles can be useful for wringing the best performance out of agents.”

There is a diagnostic benefit, too, according to Niranjan Balasubramanian, an Assistant Professor of Computer Science at Stony Brook University. When a single model makes a mistake in the course of completing a long task, finding and fixing the error can be like searching a room for a lost key in the dark. Distributed systems make the problem smaller and the solution cleaner.

“The partitioning of roles across the agents actually allows for effective debugging and analysis of failure modes,” Balasubramanian said in an interview with IBM Think. “Getting into multi-agent systems not only has immediate computational and modularity benefits. It is where I believe systems development is: AI as services.”

The approach draws on a design principle that has been central to software engineering for decades, Balasubramanian said: complex systems work better when they are built from smaller, independent parts that each do one thing well, a principle called “modularity.” “Specialized AI models working together naturally mirrors this need when building complex workflows,” he said.

None of this fully explains why agent systems captured the popular imagination when they did. Gabe Goodhart, Chief Architect of AI Open Innovation at IBM, pointed to something that sounds almost comically mundane. What changed the conversation, he said on Mixture of Experts, was not a technical breakthrough, but the addition of a for loop, a basic programming instruction that tells a system to repeat a task, and cron jobs, scheduled commands that fire automatically at set times. “That had a self-improvement aspect to it,” Goodhart said. “It actually tried curate almost a persona for your bot. That part may not have had a whole lot of utility to it, but it certainly started capturing the imagination and personified these things in a way that just having a long-running agent doesn’t.”

Giving an AI system a personality, it turns out, is more persuasive than giving it a benchmark. Perplexity Computer manages long-running tasks by routing them to sub-agents, Goodhart said, but what made early agent systems resonate with users was something it does not appear to replicate. “I think the kernel of what Perplexity Computer is trying to do is manage extremely long-running tasks that are very high-order in nature,” he said.

Separating real science from inflated claims

Years of academic work underpin these systems, Baughman said. Work on deep learning frameworks for optimal agent selection, systems that decide which AI model is best suited to a given subtask, and on the theory of manager agents that coordinate others has been developing in academic literature for years. “Lots of this work is built around that, which has been years in  development,” he said. “That gives us a foundation within science and engineering.”

That research foundation has not protected any product from overpromising, Baughman noted. Perplexity describes its new system as enabling “fully autonomous project management” and “broad accessibility.”

This is a pattern familiar to anyone who has watched a technology cycle play out .“What they’re really doing is trying to build you an end-to-end agentic platform where all of these elements come together,” said Chris Hay, a Distinguished Engineer at IBM, on the podcast. The vision is coherent. Whether any current product fully delivers it is a different question.

The security problems agents haven’t solved

There is a less comfortable aspect to all of this that tends to get lost in the product announcements. More capable AI architectures have introduced new security risks, Goodhart said, and neither open nor closed systems have fully resolved them. And the instinct to treat closed, curated systems as inherently safer than open ones misses the actual threat landscape.

“It’s not actually about open versus closed,” he said on the podcast. “It’s about building chain of trust. And that can happen in an open ecosystem or a closed ecosystem.”

Goodhart identified three distinct threats. The first is running an agent with unrestricted access to a user’s own computer and files—something cloud-based architectures address by keeping the agent off the device entirely. The second is giving an agent destructive permissions, such as the ability to delete emails or remove files. That risk persists regardless of where the system runs.

“If it’s given delete access, it can choose to do that, sandboxed or not,” Goodhart said, using “sandboxed” to describe the contained environment in which cloud agents typically operate. “It can just make the wrong decision. That part is not mitigated.”

The third threat is the most technically involved, and probably the most underappreciated. Prompt injection is an attack in which a threat actor hides malicious instructions inside content that an agent reads, hijacking its behavior. Imagine handing a new employee a folder of research documents, one of which contains a hidden note telling her to wire money to a stranger. Goodhart pointed to efforts such as skill.sh, a repository of reusable tools that AI agents can call to perform specific tasks. Curated libraries like this can limit one version of the prompt-injection problem by restricting agents to vetted integrations, he said, and the open-source community is building comparable vetting systems around those tools.

“If you look at skill.sh, you’ll see that all the skills are rated, and they have basically the same level of inspection you would see on a curated Docker hub type of ecosystem,” he said, referring to the industry-standard registry for vetted software components.

The web itself is the harder problem, Goodhart said. Any page an agent visits could carry hidden instructions in its metadata, the invisible code that describes a webpage to software.

“If these agents are going to go scrape the web, there is absolutely nothing preventing web page X, Y and Z from hiding prompt injection in the metadata and letting that get slurped into your agent,” Goodhart said. “I don’t think any of these systems are tackling that yet.”

Smart Talks

Redefining beauty through AI innovation

Malcolm Gladwell dives into the exciting collaboration between L'Oréal and IBM, exploring how a custom AI foundation model could revolutionize cosmetic product development and drive more innovation and sustainability.

OpenClaw, the open-source agent framework that drew widespread attention earlier this year for its ability to coordinate multiple AI agents to complete complex tasks, offers a useful point of comparison. Where Perplexity Computer runs in the cloud on a curated list of approved integrations, OpenClaw gives users unrestricted access to their own machine and an open ecosystem of skills and tools.

Perplexity Computer trades that openness for a narrower, more controlled environment, Goodhart said, which reduces some risks but does not eliminate them.

For organizations considering whether to deploy tools like Computer, researchers say moving too fast is one of the main ways things go wrong. “Tie it to a measurable value, ensure you measure that value rather than a proxy and iterate quickly,” Vinitsky said. Balasubramanian stressed that the biggest risk often has nothing to do with the technology itself. “The most important thing is to build on institutional knowledge about the existing processes rather than a blind overhaul that loses this critical knowledge,” he said.

Economics needs the same scrutiny, Baughman said. As the cost of running AI inference falls, the volume of messages passing between agents rises, and with it the network costs. “The economics just shift around,” he said. “We need to better understand how that’s going to work.”

On the question of whether users will be permanently tethered to whichever platform holds their AI memory, Goodhart said the trend is toward portability. Progress on interoperability, the ability of different systems to share information, will allow AI memory and user context to move across platforms rather than stay locked inside a single product.

He said memory presents the same engineering challenge as other components in the agent ecosystem. “There’s no fundamental reason why memory should be any different than anything else in this ecosystem. The same shape of problem that you can move from one place to the other.”

Author

Sascha Brodsky

Staff Writer

IBM

Related solutions
IBM Granite

Achieve over 90% cost savings with Granite's smaller and open models, designed for developer efficiency. These enterprise-ready models deliver exceptional performance against safety benchmarks and across a wide range of enterprise tasks from cybersecurity to RAG.

Explore Granite
Artificial intelligence solutions

Put AI to work in your business with IBM's industry-leading AI expertise and portfolio of solutions at your side.

Explore AI solutions
AI consulting and services

Reinvent critical workflows and operations by adding AI to maximize experiences, real-time decision-making and business value.

Explore AI services
Take the next step

Explore the IBM library of foundation models in the IBM watsonx portfolio to scale generative AI for your business with confidence.

  1. Discover watsonx.ai
  2. Explore IBM Granite AI models