Estimated at around USD 5 billion in 2024, the AI agents market is projected to grow to about USD 50 billion by 2030 [1]. Yet as more enterprises build AI agents to streamline and automate workflows, new challenges emerge in monitoring the behavior of those agents and ensuring they perform as intended. AgentOps is a loosely defined set of emerging best practices for evaluating agent performance. It builds on precepts established in the related fields of DevOps (which standardized software delivery) and MLOps (which did the same for machine learning models).

But managing agents isn’t as straightforward as building traditional software or even AI models. “Agentic” systems are complex and dynamic, essentially involving software with a mind of its own. Agents act autonomously, chain tasks, make decisions and behave non-deterministically. The idea behind AgentOps is to bring observability and reliability to a realm that could otherwise be chaotic, enabling developers to peer into the black box of agent interactions and other agent behavior.
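To make the idea of observability concrete, here is a minimal sketch of recording each step of an agent run (its inputs, output, error and duration) so that a non-deterministic run can be inspected afterward. It uses only the Python standard library. The names `Step`, `Trace`, `choose_tool`, `run_tool` and `run_agent` are hypothetical and invented for illustration; they are not the API of any AgentOps tool or agent framework, and the random choice merely stands in for a model's decision.

```python
import json
import random
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """One recorded agent action (hypothetical structure)."""
    name: str
    inputs: dict
    output: Any = None
    error: Optional[str] = None
    duration_ms: float = 0.0


@dataclass
class Trace:
    """All steps recorded during a single agent run."""
    run_id: str
    steps: list = field(default_factory=list)

    def record(self, name, fn, **inputs):
        # Call fn(**inputs) and log what went in, what came out, and how long it took.
        step = Step(name=name, inputs=inputs)
        start = time.perf_counter()
        try:
            step.output = fn(**inputs)
            return step.output
        except Exception as exc:
            step.error = repr(exc)  # keep the failure visible in the trace
            raise
        finally:
            step.duration_ms = (time.perf_counter() - start) * 1000
            self.steps.append(step)


def choose_tool(query):
    # Stand-in for a model's decision; random to mimic non-determinism.
    return random.choice(["search", "calculator"])


def run_tool(tool, query):
    # Stand-in for a real tool call.
    return f"{tool} result for {query!r}"


def run_agent(query):
    trace = Trace(run_id="demo-run")
    tool = trace.record("choose_tool", choose_tool, query=query)
    trace.record("run_tool", run_tool, tool=tool, query=query)
    return trace


if __name__ == "__main__":
    # Print the full run as JSON so each decision can be reviewed.
    print(json.dumps(asdict(run_agent("2 + 2")), indent=2))
```

Because the tool choice can differ from run to run, the recorded trace, rather than the code alone, is what tells a developer which path the agent actually took. Production tooling typically adds far more (token counts, costs, nested spans), which this sketch leaves out.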

There is no single tool to manage AgentOps, but rather an entire ecosystem; a recent study identified 17 tools on GitHub and other code repositories relevant to the practice, from Agenta to LangSmith to TruLens. (One ambitiously named AgentOps tool is called, simply, “AgentOps.”) These tools typically support the developer’s agent framework of choice, be it IBM’s watsonx Agents or OpenAI’s Agents SDK. In this heated space, many popular platforms and frameworks have emerged, including AutoGen, LangChain and CrewAI (the last of these optimized for the orchestration of multi-agent systems).