This article was featured in the Think newsletter.
OpenAI’s recently released ChatGPT agent expands the role of its artificial intelligence from conversational support to autonomous task execution. The system can interact with websites, write and run code, complete forms and perform multi-step actions on a virtual computer with limited human involvement.
The rollout highlights a broader shift in enterprise AI adoption. Instead of tools that require prompt-by-prompt input, companies are now evaluating systems that can act with partial or full autonomy. IBM researchers say this creates new opportunities for efficiency but also raises questions around oversight, system reliability and security. As businesses consider deployment, key decisions will hinge on task criticality, acceptable risk and the level of control organizations are willing to delegate to AI systems.
“At the simplest level, what OpenAI has released is an agent framework that has many more degrees of freedom than previous agentic capabilities,” Gabe Goodhart, Chief Architect for AI Open Innovation at IBM, told IBM Think in an interview. “[This allows] the system to accomplish tasks that require more complicated planning and a wider range of capabilities to complete.”
The technology builds on OpenAI’s earlier tools, such as Operator, which enables AI systems to control computers and web browsers. The new agent platform combines multiple tools into a single interface that allows for planning and executing multi-step tasks iteratively.
The agent scored 41.6% on the benchmark known as Humanity’s Last Exam and 27.4% on FrontierMath, a math benchmark. While those numbers suggest strong capabilities, they also show that these systems still make mistakes.
“With degrees of freedom comes also degrees of risk,” Goodhart said. The increased autonomy means moving into what he calls “probabilistic computing rather than deterministic computing.” Unlike traditional software that executes the same instructions every time, AI agents operate on a “best effort” basis.
ChatGPT agent targets individual users and personal tasks, in contrast to the enterprise-first approach taken by other providers, who have built their systems with corporate security, governance and control in mind from the start. Business customers want different things: higher accuracy and tighter control. IBM’s watsonx Orchestrate promises more robust enterprise integration and on-premises deployment to keep corporate data secure.
The release also sharpened the industry conversation around personal versus enterprise agents. OpenAI’s design assumes a single user can act across tools and data without guardrails—an approach that poses significant risks inside corporate environments. IBM’s watsonx Orchestrate, by contrast, was built from the ground up for governed collaboration, shared workflows and secure coordination across teams.
The ChatGPT agent currently connects with consumer services, such as Gmail and GitHub, while Orchestrate is integrated with core enterprise platforms, including Salesforce, SAP, Workday and ServiceNow.
The OpenAI system appears to use what researchers call “agentic flows”: patterns that allow large language models to reason through problems, select tools and execute plans. Two approaches drive these capabilities: ReAct, where the agent works in a loop, taking actions and observing results; and ReWOO, which plans all steps before execution. OpenAI’s system seems to combine elements of both.
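The difference between the two patterns can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's implementation: the tool registry, `toy_policy` and `toy_planner` are hypothetical stand-ins for real tools and for calls to a language model, and the ReWOO version is simplified (it omits the final step that synthesizes an answer from the collected evidence).

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]  # (tool name, tool argument)

# Hypothetical toy tools; a real agent would call a browser, terminal or API.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query}",
    "summarize": lambda text: f"summary of ({text})",
}


def react_agent(
    task: str,
    policy: Callable[[str, List[str]], Optional[Action]],
    max_steps: int = 5,
) -> List[str]:
    """ReAct-style loop: choose one action, observe the result, repeat."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = policy(task, observations)  # stands in for a model call
        if action is None:  # the policy decides the task is done
            break
        tool, arg = action
        observations.append(TOOLS[tool](arg))
    return observations


def rewoo_agent(task: str, planner: Callable[[str], List[Action]]) -> List[str]:
    """ReWOO-style: plan every step up front, then execute without re-planning."""
    plan = planner(task)  # stands in for a single planning call
    evidence: List[str] = []
    for tool, arg in plan:
        # Placeholders such as "#E1" refer to the results of earlier steps.
        # Substitute higher indices first so "#E10" is not mistaken for "#E1".
        for i in range(len(evidence), 0, -1):
            arg = arg.replace(f"#E{i}", evidence[i - 1])
        evidence.append(TOOLS[tool](arg))
    return evidence


def toy_policy(task: str, observations: List[str]) -> Optional[Action]:
    """Hypothetical policy: search first, then summarize what was found."""
    if not observations:
        return ("search", task)
    if len(observations) == 1:
        return ("summarize", observations[0])
    return None


def toy_planner(task: str) -> List[Action]:
    """Hypothetical planner: the same two steps, decided before any tool runs."""
    return [("search", task), ("summarize", "#E1")]


react_trace = react_agent("AI agents", toy_policy)
rewoo_trace = rewoo_agent("AI agents", toy_planner)
```

In this toy case both agents run the same two tool calls. The practical difference is when decisions are made: the ReAct loop consults the model after every observation and can change course, while the ReWOO plan is fixed before any tool runs and needs fewer model calls.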
“Prior to the launch of the ChatGPT agent, OpenAI released several other key technologies,” Goodhart said, describing how the company has built what he sees as “a fairly principled approach to tool usage where they are creating well-defined buckets of tools for the most common problems that users would like to solve.”
The setup combines visual browsers, terminal access and direct API calls. Each additional AI component could increase the potential for errors; however, some configurations may provide checks that improve overall performance.
“Every time you add additional uncertainty, an additional step of AI into the system, you have some additional place where it could go off the rails,” Goodhart said. “Sometimes these things can act as checks and balances to each other and actually bring the uncertainty down.”
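Goodhart's point can be illustrated with a small back-of-the-envelope calculation. The sketch below assumes that each step succeeds independently with the same probability and that a checker catches a fixed share of failures and allows one retry. These are simplifying assumptions, and every number is an illustrative placeholder, not a measurement of any real system.

```python
def chain_success(step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independence."""
    return step_success ** steps


def checked_step_success(step_success: float, detection_rate: float) -> float:
    """Per-step success when a checker catches some failures and retries once.

    A step succeeds outright, or it fails, the failure is detected,
    and the single retry succeeds.
    """
    failure = 1.0 - step_success
    return step_success + failure * detection_rate * step_success


# Illustrative placeholder values, not measured data.
P_STEP = 0.90       # assumed chance that one AI step is correct
N_STEPS = 5         # assumed number of chained steps
DETECTION = 0.80    # assumed share of failures the checker catches

unchecked = chain_success(P_STEP, N_STEPS)
checked = chain_success(checked_step_success(P_STEP, DETECTION), N_STEPS)

print(f"chain without checks: {unchecked:.3f}")
print(f"chain with checks:    {checked:.3f}")
```

Under these assumptions, adding steps lowers the chance that the whole chain succeeds, while a checking step raises per-step reliability enough to recover much of the loss. A real checker is itself an AI component that can be wrong, which this sketch does not model.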
For Maryam Ashoori, VP of Product and Engineering at IBM, the technical capabilities matter less than the business context.
“It’s not about what the agent can do,” she told IBM Think in an interview. “It’s about what’s at stake—and how much control you’re willing to give up.”
IBM emphasizes a security-first architecture that includes strict access controls to manage user permissions, identity integration to verify users, and isolated execution to keep processes separated and secure. The company says these are not optional features but foundational to how the system is built.
Moving from research to business use requires organizations to balance productivity gains against new risks. Ashoori envisions agents in email systems that automatically summarize messages and draft responses, calendar applications that optimize schedules and procurement tools that analyze supplier contracts.
But each capability requires granting new permissions. “Every single permission granted to these agents becomes a vulnerability point,” Ashoori said. Organizations must balance convenience against exposure to data breaches and operational disruptions.
Goodhart said the utility depends heavily on job function. “If you are an everyday business user whose job relies on synthesizing information, gathering a wide variety of sources and drawing connections, this could be a huge accelerator,” he said. “If you are a line of business user that exists in a role that has very precise controls and very precise workflow associated with it, this is likely not going to help you very much.”
The enterprise market operates under different constraints than consumer applications. While individual users can decide what information to share, business deployments require institutional frameworks for regulatory compliance and data protection.
“[The] consumer is usually in charge of all the permission grants and everything is there,” Ashoori said. “But for enterprise users, it’s usually governed by a body of like policies or constructs of the company.”
This governance requirement could create opportunities for specialized enterprise AI providers who understand regulatory frameworks. It also suggests that broad deployment might require new insurance and audit capabilities.
Some of the biggest potential gains may come from automating the kinds of loosely defined, repetitive (or “fuzzy”) tasks that typically slow people down, such as gathering information across systems, preparing reports or filling out forms.
“I think with the appropriate guardrails in place, an agentic system like this could go a long way inside an enterprise towards accelerating the efficacy of those fuzzy tasks and towards reducing friction that people experience on a day-to-day basis with those fuzzy tasks,” Goodhart said.