As AI agents become more sophisticated and autonomous, understanding their behavior, performance and decision-making processes is critical for ensuring reliability and governance. AgentOps, the practice of monitoring, observing and managing AI agents in production, provides the visibility needed to build trustworthy agentic AI systems.
This tutorial provides a step-by-step guide to setting up and using IBM Telemetry with watsonx Orchestrate® Developer Edition to monitor and govern AI agents. You’ll learn how to enable observability for AI agents and analyze their behavior in depth, from individual LLM calls to complete multistep workflows.
By the end of this tutorial, you’ll be able to:
- Install and configure the watsonx Orchestrate ADK and Developer Edition with IBM Telemetry enabled
- Import a tool and an agent, then interact with the agent to generate telemetry data
- Analyze agent behavior through the IBM Telemetry dashboard’s tasks, spans, workflows, evaluations, issues and trajectory views
IBM Telemetry is the native observability framework of watsonx Orchestrate that captures detailed information about how your AI agents execute requests. It records every step of the agent lifecycle, from routing decisions and prompt construction to LLM invocations and tool calls, providing complete visibility into agent behavior.
With IBM Telemetry, you can track performance metrics, monitor LLM cost, identify errors and ensure that your agents are operating as intended. IBM Telemetry provides enterprise-grade observability designed for production environments and AI systems at scale.
Before you begin, ensure that the following prerequisites are installed and configured on your system:
- A recent Python 3 release (see the ADK documentation for the supported versions)
- A container engine such as Docker, which the Developer Edition uses to run its services
- Git, for cloning the tutorial repository
- The watsonx Orchestrate ADK
- Access to a watsonx Orchestrate instance
This guide includes installation steps for the ADK.
Authorization steps are provided later in this guide.
To get started, clone the GitHub repository by using https://github.com/IBM/ibmdotcom-tutorials.git as the HTTPS URL. For detailed steps on how to clone a repository, refer to the GitHub documentation.
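From a terminal, the clone command looks like this:

```
git clone https://github.com/IBM/ibmdotcom-tutorials.git
```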
Open the repository in your preferred integrated development environment (IDE) (for example, Visual Studio Code) and locate this tutorial’s project folder, wxo-agentops.
The IBM watsonx Orchestrate Agent Development Kit (ADK) is a CLI tool that simplifies the installation, configuration and management of the watsonx Orchestrate Developer Edition.
To use the ADK, you must connect it to an existing watsonx Orchestrate environment. If you don’t have a watsonx Orchestrate account yet, you can sign up for a free 30-day trial. If you already have an account, you can use it to provide the environment credentials needed by the ADK.
These steps will guide you through installation by using a Python virtual environment, which is the recommended approach for keeping dependencies isolated. For alternative installation methods and detailed instructions, see the Getting started with the ADK documentation.
Create a new Python virtual environment in your project directory:
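A minimal sketch, naming the environment venv (any name works, but the later activation commands assume this one):

```
# On Windows, use: python -m venv venv
python3 -m venv venv
```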
This step creates a venv directory in your project that contains the isolated Python interpreter and its packages.
The activation command differs depending on your operating system.
macOS and Linux
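Assuming the environment is named venv as above:

```
source venv/bin/activate
```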
Windows
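In Command Prompt or PowerShell, again assuming the name venv:

```
venv\Scripts\activate
```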
Once activated, your terminal prompt should change to indicate you’re working inside the virtual environment (typically showing (venv) at the beginning of the prompt).
With your virtual environment activated, install the ADK by using pip:
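Assuming the ADK is published under its usual package name on PyPI:

```
pip install ibm-watsonx-orchestrate
```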
This command downloads and installs the ADK along with all its dependencies. The installation can take a few minutes to complete.
Note: If you have an earlier version of the ADK installed, upgrade it to the latest release before continuing (for example, by adding the --upgrade flag to the pip command above).
The ADK uses a .env file to store the credentials it needs to connect to your watsonx Orchestrate environment.
For alternative authentication methods and detailed configuration instructions, see the configuring your environment file documentation.
Inside the wxo-agentops directory, create a file named .env.
Open the .env file and add the credentials for your watsonx Orchestrate service instance.
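A minimal sketch of the file’s contents. The variable names below follow the conventions in the ADK documentation, but treat them as assumptions and confirm them in the configuring your environment file documentation:

```
WO_DEVELOPER_EDITION_SOURCE=orchestrate
WO_INSTANCE=<your-service-instance-url>
WO_API_KEY=<your-api-key>
```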
The URL follows this format:
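A representative shape only; the host varies by region and hosting option, so copy the exact value from your instance settings rather than constructing it by hand:

```
https://api.<region>.watson-orchestrate.ibm.com/instances/<instance-id>
```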
Copy and paste your service instance URL to replace the template value in your .env file.
Keep your API key secure and never commit it to version control. The .env file should be added to your .gitignore so that it never reaches a remote repository.
Now, you’re ready to install the watsonx Orchestrate Developer Edition, which will run a local instance of the watsonx Orchestrate server on your machine. This step also enables IBM Telemetry, giving you immediate access to observability features.
The ADK provides a single command that handles the entire installation process:
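A sketch of the command, assuming your credentials live in the .env file created earlier. Flag names can vary across ADK versions, and whether IBM Telemetry needs an extra option depends on your release, so verify with orchestrate server start --help:

```
# Start the local Developer Edition server, reading credentials from .env
orchestrate server start --env-file=.env
```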
Let’s break down what this command does: it reads your environment credentials from the .env file, pulls the required container images and starts the local watsonx Orchestrate server with IBM Telemetry enabled.
Execute the command from your wxo-agentops directory. It initializes the server environment, starts the watsonx Orchestrate Developer Edition server and installs it with IBM Telemetry.
This command creates internal containers managed by the ADK for the core Orchestrate server, its supporting services and the IBM Telemetry components.
The ADK automatically configures a virtual network that allows these containers to communicate with each other on your local machine.
The installation process can take several minutes, especially on the first run as the necessary images are downloaded. A successful installation produces output similar to this example:
If you see this message, congratulations! Your local watsonx Orchestrate environment with IBM Telemetry is now running.
If the installation fails or hangs, try the following steps:
1. Reset the server:
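Assuming the reset subcommand follows the same CLI layout as server start (verify with orchestrate server --help):

```
orchestrate server reset
```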
This command stops and removes all containers created for watsonx Orchestrate, giving you a clean slate.
2. Restart the installation:
After resetting, run the start command again:
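Using the same invocation as before:

```
orchestrate server start --env-file=.env
```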
3. Check the server logs and container status:
You can view service logs for the Orchestrate server to check for warnings or errors:
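Assuming the logs subcommand exists alongside the other server commands in your ADK version:

```
orchestrate server logs
```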
If the preceding steps do not work, reset the server again and completely remove the server environment before reinstalling.
With the watsonx Orchestrate server successfully installed, you now need to activate your local environment and launch the chat interface where you’ll interact with your AI agents.
The watsonx Orchestrate ADK supports multiple environments (local, development, production and so on). You need to explicitly activate the local environment you created:
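Assuming the Developer Edition registered the environment under the default name local:

```
orchestrate env activate local
```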
You should receive confirmation that the environment is active:
This sets the local environment as your default context for all subsequent ADK commands. Any agents, tools or configurations you work with will now target this local instance.
Start the watsonx Orchestrate chat UI service with the following command:
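The subcommand below follows the ADK’s CLI layout; verify with orchestrate --help if your version differs:

```
orchestrate chat start
```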
This command initializes the web-based chat interface and automatically opens it in your default browser. You should see output similar to:
The chat interface provides an easy way to interact with your AI agents. If the browser doesn’t open automatically, you can manually navigate to the local URL printed in the terminal output.
Once the chat interface loads, you should see a clean chat window ready for interaction. At this stage, you haven’t imported any agents yet, so the interface will be mostly empty. That’s expected; you’ll add your first agent in the next step.
Now that your environment is set up, it’s time to import a preconfigured AI agent that demonstrates the monitoring capabilities of IBM Telemetry. This weather agent uses an external API tool to fetch real-time weather data, giving you a practical example to observe and analyze.
The weather agent is an ideal starting point because it exercises the full agent loop: it interprets a natural-language question, invokes an external API tool and summarizes the structured result in plain language.
From your project root (wxo-agentops), open the directory that contains the agent and tool definitions.
This directory contains two YAML configuration files: one that defines the weather tool and one that defines the agent itself.
Tools are reusable capabilities that agents can invoke to perform specific actions. Import the weather tool first:
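A sketch of the import, assuming the tool is packaged as an OpenAPI specification; the kind flag and the file name here are assumptions, so match them to the actual file in the repository:

```
orchestrate tools import -k openapi -f weather_tool.yaml
```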
The tool is now registered in your local environment and available for agents to use.
Now import the agent that will use this tool:
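Again a sketch; the file name is an assumption:

```
orchestrate agents import -f weather_agent.yaml
```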
This command registers the Weather Agent in your local watsonx Orchestrate environment. The agent is preconfigured with instructions, an underlying LLM and access to the weather tool you just imported.
Return to your browser where the chat interface is running. You might need to refresh the page to see the newly imported agent.
Click the agent dropdown menu (typically located at the top of the chat interface) and select Weather_Agent from the list.
With the Weather Agent selected, try asking some questions to generate telemetry data:
Example queries:
- What’s the current weather in New York City?
- How windy is it in Chicago right now?
The agent will process each request by interpreting your question, calling the weather tool with the appropriate parameters and summarizing the structured response in natural language.
Every interaction you have with the Weather Agent is being captured by IBM Telemetry. The system is recording each LLM call and its token usage, every tool invocation with its inputs and outputs, routing decisions, per-step latency and any errors that occur.
In the next step, you’ll explore this telemetry data in detail to understand exactly how your agent behaves.
Now comes the most powerful part of this tutorial: using IBM Telemetry to gain deep visibility into your agent’s behavior. IBM Telemetry provides multiple views and analytics tools that let you understand every aspect of how your agent processes requests.
Open your browser and navigate to https://localhost:8765/?serviceName=wxo-server. The interface provides session replays that let you revisit past agent interactions for analysis.
Note: The URL uses the serviceName=wxo-server query parameter to scope the dashboard to traces emitted by your local Orchestrate server.
When the login screen appears, enter any name (to identify your local session) and click Login.
You’ll be taken to the main IBM Telemetry dashboard.
The dashboard shows a list of recent traces, each representing a single user interaction with an agent. Click the first trace in the Trace and Group Selection panel to view detailed analytics about your most recent chat with the Weather Agent.
This step takes you to the Agent Analytics screen, which serves as the central hub for understanding agent behavior.
The Agent Analytics screen provides an overview of the selected trace, including its overall status, end-to-end duration, token usage and the sequence of tasks that were executed.
This high-level view gives you immediate insight into whether the agent performed as expected and how efficiently it operated.
The Tasks section is where you’ll spend most of your time analyzing agent behavior. It provides a visual, step-by-step timeline of everything the agent did during a request (every LLM call, tool invocation, routing decision and output generation).
Tasks are organized hierarchically to reflect how the agent actually executed the workflow, making it easy to understand the sequence of operations and their relationships.
Let’s examine the standard execution path for a watsonx Orchestrate agent request. Your Weather Agent trace should show a structure similar to this example:
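An illustrative sketch of the hierarchy, reconstructed from the task descriptions that follow; the exact names in your trace may differ:

```
root task (end-to-end request)
├── router (selects the downstream agent)
└── agent task (assembles prompt and context)
    ├── LLM invocation (model reasoning, tool selection)
    ├── tool call (weather API request)
    └── output generation (formats the final response)
```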
This workflow shows the entire lifecycle of a single user query. Here’s what each task represents:
Root task: its duration tells you the total latency the user experienced. If the number is too high, you can analyze its child tasks to identify bottlenecks.
Router task: the router ensures that the correct downstream logic is invoked. If requests are being misrouted, this is where you’d identify the problem.
Agent task: this is where the “intelligence” of orchestration happens. The agent task ensures that the LLM receives all the context it needs to make informed decisions.
LLM invocation: this is the “thinking” step where the model processes information and makes decisions. Token usage, latency and quality issues all stem from this task. If your agent is slow or expensive, this step is usually the primary contributor.
Output generation: this task ensures that the user receives a properly formatted response. If answers are being truncated or improperly formatted, this is where you’d investigate.
Task workflow summary
To summarize the complete workflow: the router selects the right agent, the agent task assembles the prompt and context, the LLM invocation reasons over it and decides when to call the weather tool, and the output step formats the final response for the user.
All of this workflow is wrapped under the root task, which represents the end-to-end request and its total duration.
Each task in the hierarchy contains three categories of attributes that provide detailed metadata about what the task consumed and produced:
1. Input attributes: Show everything the task received before execution: messages, tool responses, system instructions and the internal state.
Example: For the LLM invocation task, the input attributes would show the full prompt sent to the model, including the system instructions and the user’s weather question.
2. Output attributes: Show what the task produced, including LLM completions, tool calls and decisions.
Example: The same LLM invocation task’s output attributes would show the model’s completion, including any tool call it decided to make.
3. General attributes: Provide telemetry metadata: token usage, timing information, identifiers like unique IDs and model information.
Example: You might see that a task used 450 input tokens and 120 output tokens, took 1.2 seconds to execute and ran on a particular model, identified by its model ID.
How to use task attributes
Together, these attributes let you fully understand what the model saw, what it decided and how it responded.
This level of detail is invaluable for debugging, optimization and validation.
Every task includes performance and cost-related metrics that summarize how the task executed. These metrics provide quantitative data about agent performance.
Key metrics include total task duration, input and output token counts and the time spent in LLM calls relative to other operations.
These metrics help you optimize performance and debug agent behavior. They can also aid in identifying slow tasks that could be parallelized or cached. This view is integral for capacity planning because it allows you to understand resource requirements for scaling and track token usage to control expenses.
For example, if you notice that a trace took 8 seconds but only 0.5 seconds were spent on LLM calls, you know that the bottleneck is elsewhere (likely in tool execution or network latency).
While tasks show you the logical workflow of your agent, spans represent the underlying system-level operations that occur during execution. Clicking the Spans tab reveals what the platform is doing internally to process each request.
Spans provide visibility into the low-level execution steps recorded by the orchestration framework (in this case, LangGraph, an open source framework running inside the wxo-server). Each span represents a discrete operation, such as a graph node execution, a state update, a model API call or a callback invocation.
While tasks show the logical steps of agent execution (what the agent is trying to accomplish), spans show the technical steps (how the system accomplishes it). This dual view gives you both the high-level understanding and low-level debugging capability.
Example: A single task like the LLM invocation might map to several spans: one for preparing the prompt, one for the model API call itself and one for parsing the response.
Each span includes tags that provide additional metadata and context. These tags are essential for filtering, debugging and analyzing agent performance.
Common span tags include session and trace identifiers, the name of the internal component that emitted the span, model information and status or error codes.
Spans are useful for tracking down latency, understanding failures (by seeing which internal component failed), analyzing patterns (by filtering spans by tag to identify trends) and cross-referencing requests (by linking spans across multiple traces through session IDs).
For example, if your agent occasionally hangs, you can filter spans by duration to identify which internal operations are taking unexpectedly long, perhaps a database query or a network call to an external service.
The Workflows tab provides a hierarchical visualization called the Runnables Tree, which shows the complete execution structure of your agent workflow. This view is especially useful for understanding complex multi-agent systems and nested execution patterns.
In the watsonx Orchestrate framework, a runnable is a unit of work or task that can be executed. Runnables can be agents, tools, LLM calls or composite workflows made up of other runnables.
The Runnables Tree displays parent-child relationships, making it easy to see which runnable triggered which child operations, how deeply the execution nested and where time was spent.
For simple agents like the Weather Agent, the workflow view mirrors the task view closely. However, workflows become invaluable when you’re working with multi-agent systems, conditional branching or deeply nested tool chains.
For example, imagine an agent that first checks if a query requires web search, then decides between using a calculator tool or a database query tool, and finally validates the result before responding. The Runnables Tree would show this entire branching structure clearly.
You can interact with the tree by expanding and collapsing branches and by selecting individual nodes to inspect their details.
The visualization makes debugging workflows significantly easier than trying to follow text logs or trace data alone.
The Eval (evaluation) tab provides a quality assurance and monitoring view that measures the correctness and reliability of your agent’s execution. This view is where you move from observing what happened to evaluating how well it happened.
The Eval tab shows evaluation results that assess quality through guardrails, such as checks that the agent invoked the right tools, stayed grounded in the tool output and returned a well-formed final answer.
Evaluations help you monitor reliability by tracking how consistently your agent produces correct results, identify when changes degrade agent performance, prioritize improvements and build confidence by validating that agents work correctly before production deployment.
You can leverage evaluation measurements to set up alerts, track improvements, identify patterns and use feedback to guide development to improve prompts or tools.
If you notice that 15% of weather queries fail evaluation, you can investigate those specific traces to understand whether the issue is bad input handling, API failures or incorrect response formatting.
The Issues tab provides a centralized view of everything that went wrong during workflow execution. This tab is your first stop when debugging agent failures or unexpected behavior.
The Issues tab lists problems such as tool errors, failed API calls, model errors and guardrail or evaluation failures.
In the screenshot above, you can see a Tool Error that occurred when the weather API returned a 424 (Failed Dependency) or 404 (Not Found) error. The Issues tab shows the error type, its severity, the full error message and a direct link to the task where it occurred.
This approach makes it simple to understand what went wrong without digging through logs or trace data.
The Issues tab is especially valuable because it aggregates failures instead of forcing you to hunt through individual tasks. It provides complete context by including full error details and related data, while severity levels enable quick triage so you can prioritize which issues to address first. The direct links to source tasks mean that one click takes you to the exact execution point where things went wrong.
The Trajectory tab provides a chronological, conversation-style view of the agent interaction between the user and any tools the agent invokes. This view is invaluable for understanding the full context and flow of agent behavior.
The Trajectory view is useful because it allows you to see exactly how the agent processes requests from start to finish, giving you complete visibility into agent behavior. You can validate tool integration by ensuring tools are called with correct parameters and receive appropriate responses. When debugging unexpected responses, the trajectory helps you trace where the logic diverged from your expectations. You can also analyze how context builds over multiple conversation turns, watching the workflow evolve naturally. Beyond debugging, the trajectory serves as documentation, letting you capture examples of correct behavior that can be shared with team members or used as reference cases for future development. This view is particularly valuable for teams building with generative AI who need to validate agent adaptability across diverse scenarios.
Let’s walk through the Weather Agent trajectory shown in the screenshot:
1. The user query
The conversation starts with a clear, specific request about weather in New York City.
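An illustrative version of the request (the exact wording in your session will differ):

```
What's the current weather in New York City?
```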
2. Agent makes a tool call
The agent recognizes it needs external data and invokes the weather tool:
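A sketch of what the tool call might look like in the trajectory view; the tool name and parameter names are assumptions reconstructed from the description that follows:

```
{
  "name": "get_weather",
  "arguments": {
    "latitude": 40.71,
    "longitude": -74.01,
    "current_weather": true
  }
}
```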
This example shows that the agent correctly identified NYC’s approximate coordinates, properly structured the request for the API, and set the appropriate flag for current weather.
IBM Telemetry displays this result both as raw JSON and a nicely parsed, expandable tree view.
3. The tool returns data
The weather API responds with structured weather data:
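An illustrative response shape; the field names and values here are representative only:

```
{
  "current_weather": {
    "temperature": 21.4,
    "weathercode": 3,
    "windspeed": 9.8
  }
}
```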
This example shows that the tool successfully retrieved data, the response follows the expected schema and all required fields are present. Being able to inspect the raw tool response is crucial for debugging issues where the agent misinterprets tool outputs.
4. Agent summarizes the result
Finally, the agent processes the structured data and responds naturally:
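An illustrative final answer, consistent with the representative data above:

```
The current temperature in New York City is about 21°C, with overcast skies and a light breeze.
```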
The agent correctly extracted the temperature and weather code and converted the structured data into natural language. The response is concise and answers the user’s question.
The Trajectory tab also supports filtering by role to view only user messages, agent messages or tool interactions. You can also expand and collapse parts of long conversations to focus on the details that matter to you. For further analysis or debugging, you can export the data as JSON, or jump from a trajectory step to its linked task for the corresponding execution details.
Congratulations! You’ve successfully set up IBM Telemetry with watsonx Orchestrate and learned how to monitor and analyze AI agent behavior in depth. IBM Telemetry provides multiple layers of visibility to give you complete observability into how your AI agents think, decide and act. The capabilities you’ve explored are crucial for effective lifecycle management of agent operations in production and for integrating with other agent frameworks in your environment.
If you encounter issues or have questions, check the documentation. Most common issues are covered in the troubleshooting guide. You can also review GitHub issues to see whether others have experienced similar problems.
Agent monitoring through platforms like IBM Telemetry underpins a growing AgentOps ecosystem and becomes essential as autonomous agents take on more complex tasks involving SDKs, tools and external APIs. The visibility you’ve gained into agent behavior enables you to build more reliable, efficient and trustworthy AI systems.