As the number of AI agents grows daily, enterprises are exploring autonomous systems to support decision-making and operational workflows. At the same time, organizations often express concerns about explainability, governance and production readiness, especially in multi-agent systems that rely on large language models (LLMs) for reasoning. Agents that respond without a clear explanation, or that behave inconsistently, are difficult to use in business settings.
IBM® watsonx Orchestrate® addresses these challenges with a structured platform that supports self-hosting, tool-based reasoning, API calls and enterprise-grade control without requiring custom orchestration code.
In this tutorial, you will learn how to create an AI agent in watsonx Orchestrate with the watsonx Orchestrate Agent Development Kit (ADK). In LangChain or LangGraph-based AI systems, developers frequently handle retries, state management and async execution manually. The watsonx Orchestrate ADK simplifies this work through integrated lifecycle management that enables consistent, enterprise-ready agent deployments.
You will use Python to create the risk evaluation logic, define agent behavior, and incorporate Langfuse to monitor and enhance agent behavior with real execution traces. You can find this tutorial on GitHub as well.
Langfuse is an open source LLM observability platform built on OpenTelemetry for monitoring LLM applications and agents. It collects telemetry (traces, metrics and logs) from agent executions, including LLM calls, tool calls, token usage, metadata and latency, at both the per-request and per-session level.
It enables developers to gain deep insights into how agents perform in real-world scenarios, pinpoint the root causes of incorrect outputs and systematically optimize agent behavior for greater reliability and efficiency.
The goal of this tutorial is to build an AI agent for enterprise governance that evaluates vendor risk through deterministic, rule-based analysis. We created a synthetic dataset of vendors with financial ratings, security certifications and incident history. The agent processes this data and classifies vendors into specific risk tiers (low, medium or high) while providing justifications for its decisions.
It also supports interactive follow-ups, allowing users to compare vendors or simulate how changing a data point would affect a risk score, which provides the auditability required for corporate compliance.
To complete this tutorial, you need:
Python 3.11 or later installed on your system.
A watsonx Orchestrate account. For this tutorial, a trial account is sufficient. If you don’t have an account, you can use IBM Cloud to create a free 30-day trial.
A watsonx Orchestrate API key from the Orchestrate user interface (UI).
The watsonx Orchestrate ADK installed on your system.
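The Python version requirement can be checked before you continue. The following is a minimal sketch; the helper name meets_minimum and the constant MIN_VERSION are illustrative and not part of the ADK.

```python
import sys

# Minimum interpreter version stated in the prerequisites.
MIN_VERSION = (3, 11)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    # Report the problem without stopping the interpreter.
    print(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or later is required.")
```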
Sign in to watsonx Orchestrate through IBM Cloud and open the watsonx Orchestrate UI. Open Settings from the profile menu and go to API details. Create a new API key, copy it and store it securely. During local development, the watsonx Orchestrate ADK authenticates with this API key. The ADK acts as a local software development kit (SDK) for building and testing agent logic.
In this step, you create a local development environment. Throughout this tutorial, you will run a local development server, import tools, set up environments and create agents through the orchestrate command-line interface (CLI) provided by the ADK, using PowerShell. Start by navigating to the directory where you want to build your project. Then, create a new Python virtual environment:
This creates an isolated Python environment in a .venv folder. Using a virtual environment keeps all dependencies for this tutorial separate from your system-wide Python installation.
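If you prefer to script this step, the standard-library venv module can create the same .venv folder from Python. This is a minimal sketch; the function name create_virtualenv is illustrative and the tutorial itself uses the command line.

```python
import venv
from pathlib import Path


def create_virtualenv(target: Path, with_pip: bool = True) -> Path:
    """Create an isolated Python environment at `target` and return its path."""
    # EnvBuilder is the standard-library API behind `python -m venv`.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(str(target))
    return target


# Example (not executed here): create_virtualenv(Path(".venv"))
```

The environment still has to be activated in your shell afterward, because activation changes the current terminal session rather than the Python process.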
Then, activate the virtual environment. Your operating system determines which activation command to use.
On Windows:
On macOS and Linux:
Once activated, your terminal prompt will display .venv at the beginning to show that you are working in a virtual environment.
With the virtual environment active, install the watsonx Orchestrate ADK on your local machine. To use the ADK, connect it to your existing watsonx Orchestrate environment. Run the following command in PowerShell:
For the remaining installation steps, follow the official ADK installation documentation.
Note: This tutorial runs the watsonx Orchestrate Developer Edition runtime locally and connects it to the watsonx Orchestrate SaaS instance with your credentials.
To use this setup, create a file named .env in the root of the project folder and add the following values:
This .env file is required to run the server. In a later step, you will start the server with the command orchestrate server start -e .env -l.
Next, you need to set up your ADK with a valid watsonx Orchestrate API key to connect your local environment to watsonx Orchestrate.
Note: The watsonx Orchestrate ADK is compatible with multiple environment types, including IBM Cloud, AWS and on-premises deployments. In this tutorial, we use the on-prem environment and authenticate with an API key through the ADK CLI. The ADK securely manages credentials internally, so you do not need to configure environment variables manually for this step.
From your project directory (with the virtual environment activated), run the following command to add your watsonx Orchestrate environment:
Here, service-instance-url is your watsonx Orchestrate instance URL. You can find it on the same API details tab under Settings in the watsonx Orchestrate UI.
Next, activate the environment that you added:
When prompted, enter the watsonx Orchestrate API key that you created in step 1. Once the environment is activated, all subsequent ADK commands, such as importing agents and tools or running the server, run against that watsonx Orchestrate environment.
Note: If you want to run everything locally with the Developer Edition, you can activate the default local environment through:
This switches your ADK to the built-in local Orchestrate environment, which is useful for local testing.
In this step, you create the project skeleton that holds your Vendor Risk Intelligence Agent’s definition, tools and source code. Start with the folder structure required for local ADK-based development:
Run the following command to create the agent’s folder structure:
Each folder has a specific purpose:
The agents folder contains the YAML file with the instructions, reasoning rules and configuration of the agent model. It determines how the system answers user questions. The tools folder holds a YAML file that describes the tool provided to the agent. The src folder contains the Python implementation for custom tools and business logic.
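The same three folders can also be created from Python. The sketch below uses only the folder names given in the text; the function name scaffold_project is illustrative, and the project root is whatever directory you pass in.

```python
from pathlib import Path

# Folder names taken from the tutorial text.
PROJECT_FOLDERS = ("agents", "tools", "src")


def scaffold_project(root: Path) -> list[Path]:
    """Create the agents, tools and src folders under `root` if they are missing."""
    created = []
    for name in PROJECT_FOLDERS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Example (not executed here): scaffold_project(Path.cwd())
```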
Before importing tools and agents, start the watsonx Orchestrate server so that it can receive and store your imports. Run this command from the root of the project folder:
The next step is agent observability, where you analyze how the agent behaves at runtime: which tools are called, how long each step takes and where errors occur. To do this, enable Langfuse observability in the watsonx Orchestrate ADK environment.
In this tutorial, the Langfuse SaaS version is used, which allows you to capture traces without running Langfuse locally.
Make sure that the watsonx Orchestrate server is running (step 6) before configuring Langfuse.
Next, create a Langfuse account at https://cloud.langfuse.com. After signing in, create a new organization and project. From the project settings, copy your project ID, your public key, your secret key and your host URL.
Now set up Langfuse in the watsonx Orchestrate ADK with the following command, replacing the placeholders with the values you copied:
Once this command has finished successfully, Langfuse is integrated into the watsonx Orchestrate environment. From this point, all interactions with the agents are automatically recorded in Langfuse.
Now that the server is running and Langfuse is configured, you can define the agent and then test it.
The next step is to define the Vendor Risk Intelligence Agent. Agents in watsonx Orchestrate ADK are declared with YAML files that describe the agent’s goal, reasoning limitations, model configuration and permitted tools.
Inside the agents directory, create a file named vendor-risk-agent.yaml. This file acts as the agent’s prompt management layer and constrains answers to deterministic, tool-based reasoning rather than free-form inference.
Copy and paste the following agent definition into your agents/vendor-risk-agent.yaml file, then save the file.
This configuration is designed to make the agent behave predictably across what, why, how and comparison questions. It reduces hallucinations by forbidding assumptions about financial ratings or risk meanings unless the tool’s output explicitly defines them.
In this step, implement the Python tool that performs the vendor risk assessment. In the watsonx Orchestrate ADK, custom business logic is implemented as Python tools that agents call to obtain results. Rather than relying on fine-tuning a model for domain-specific behavior, this tutorial demonstrates how deterministic, rule-based reasoning can be implemented through Python tools, ensuring consistent outcomes.
Inside the src folder, create a file named main.py. This file contains the vendor dataset that we created, the risk evaluation rules and a Python tool named evaluate_all_vendor_risks that provides the logic to the agent.
The risk logic is rule-based. A vendor is classified as high risk because of poor financial performance, past security incidents or regulatory inquiries. A vendor is classified as medium risk because of missing certifications, a single operational incident or weather-related disruptions.
The evaluate_all_vendor_risks function is defined as a watsonx Orchestrate tool, which makes it callable by the agent during execution. The tool returns structured output that includes the final risk level along with the exact reasons that caused the classification.
Note: The main.py file included in this tutorial is the final updated version of the vendor risk evaluation logic. Earlier versions of the code caused erroneous agent behavior, which was diagnosed through Langfuse traces. Screenshots of those traces appear in later steps of this tutorial.
The complete implementation of the main.py file used in this tutorial is given next. Copy and paste it into src/main.py, then save the file.
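To illustrate the shape of this rule-based logic, here is a minimal sketch in plain Python. It is not the tutorial’s main.py: the Vendor fields, the letter rating scale, the POOR_RATINGS threshold and the function names evaluate_vendor and simulate_change are all assumptions made for illustration, and the ADK tool registration is left out so the sketch runs without the ADK installed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Vendor:
    # Hypothetical schema; the tutorial's dataset may use different fields.
    name: str
    financial_rating: str  # assumed letter scale, "A" (best) to "D" (worst)
    certifications: tuple[str, ...] = ()
    security_incidents: int = 0
    regulatory_inquiries: int = 0
    operational_incidents: int = 0
    weather_disruptions: int = 0


# Assumption: ratings treated as "poor financial performance".
POOR_RATINGS = frozenset({"C", "D"})


def evaluate_vendor(vendor: Vendor) -> dict:
    """Classify one vendor and return the risk level with the rules that fired."""
    high = []
    if vendor.financial_rating in POOR_RATINGS:
        high.append(f"Poor financial rating ({vendor.financial_rating})")
    if vendor.security_incidents > 0:
        high.append(f"Past security incidents: {vendor.security_incidents}")
    if vendor.regulatory_inquiries > 0:
        high.append(f"Regulatory inquiries: {vendor.regulatory_inquiries}")

    medium = []
    if not vendor.certifications:
        medium.append("No security certifications on record")
    if vendor.operational_incidents > 0:
        medium.append(f"Operational incidents: {vendor.operational_incidents}")
    if vendor.weather_disruptions > 0:
        medium.append(f"Weather-related disruptions: {vendor.weather_disruptions}")

    # High-risk rules take precedence over medium-risk rules.
    if high:
        level, reasons = "high", high
    elif medium:
        level, reasons = "medium", medium
    else:
        level, reasons = "low", ["No high-risk or medium-risk signals found"]
    return {"vendor": vendor.name, "risk_level": level, "reasons": reasons}


def simulate_change(vendor: Vendor, **changes) -> dict:
    """Re-evaluate a copy of the vendor with some fields changed (what-if analysis)."""
    return evaluate_vendor(replace(vendor, **changes))
```

Because every reason string comes from a rule that fired, the agent can quote the reasons list directly instead of inferring an explanation. A what-if follow-up maps to a call such as simulate_change(vendor, security_incidents=0).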
In this step, the tool and agent must be made available to watsonx Orchestrate. This process is done by importing these components into the activated ADK environment. Before running the import commands, make sure that you are inside the root folder of your project (the same folder that contains the agents, src and tools directories).
First, import the Python tool so that watsonx Orchestrate can register it as an executable capability. From the root of your project directory, run the following command:
This command packages the Python code into a module, registers the evaluate_all_vendor_risks tool, and allows it to be called by agents.
The Vendor Risk Intelligence Agent is then fully integrated with the Python tool in the watsonx Orchestrate environment.
With the local watsonx Orchestrate server running and agent observability enabled, you can now test the agent and analyze its behavior in real time.
Start the chat interface by running the following command:
Now open the watsonx Orchestrate chat UI in your browser. From the agent selector, choose Vendor_Risk_Intelligence_Agent and begin asking questions related to vendor risk.
To verify that the agent is responding correctly (and calling the Python tool deterministically), here are a few sample questions you can use for testing:
As you interact with the agent, open the Langfuse dashboard in your browser. Each user query creates a new trace that includes a session ID (session_id) and a user ID. These traces record the full execution path of the agent. By selecting a trace, you can analyze:
The complete input and output of the agent
The Python tool invocation
The structured data returned by the tool
The reasoning process followed to reach the final answer
Latency for each step in the process
Session-based conversation flow for multiple questions
This observability helps you pinpoint problems such as incorrect assumptions, incomplete rules or unexpected agent behavior, along with performance bottlenecks.
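As a rough illustration of the per-step latency analysis, the sketch below works on a simplified, hypothetical trace record. The dictionary layout, the step names plan_answer and compose_answer and the millisecond fields are assumptions for this example; they are not the Langfuse data model or SDK.

```python
# Hypothetical, simplified trace record (not the Langfuse export format).
sample_trace = {
    "session_id": "session-001",
    "user_id": "analyst-01",
    "observations": [
        {"name": "plan_answer", "type": "llm", "start_ms": 0, "end_ms": 850},
        {"name": "evaluate_all_vendor_risks", "type": "tool", "start_ms": 850, "end_ms": 910},
        {"name": "compose_answer", "type": "llm", "start_ms": 910, "end_ms": 2100},
    ],
}


def step_latencies(trace: dict) -> dict[str, int]:
    """Return the duration of each recorded step in milliseconds."""
    return {
        obs["name"]: obs["end_ms"] - obs["start_ms"]
        for obs in trace["observations"]
    }


def slowest_step(trace: dict) -> str:
    """Return the name of the step with the highest latency."""
    latencies = step_latencies(trace)
    return max(latencies, key=latencies.get)


print(step_latencies(sample_trace))
print(slowest_step(sample_trace))
```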
After testing the agent, use Langfuse traces to identify incorrect responses. Langfuse shows each tool invocation, reasoning step and response latency, making it easy to understand why an answer was produced.
In this use case, trace analysis revealed that some incorrect answers occurred when the agent inferred financial meanings or responded without explicitly citing rule-based evidence. To correct this behavior, the agent instructions and the corresponding Python code were modified to enforce strict rule-based reasoning with a mandatory tool call.
After reimporting the updated files, the agent was tested again. The new traces confirmed that responses were grounded in structured tool output, explanations were deterministic and answers to all descriptive questions were correct.
This observe-and-refine loop shows how Langfuse supports the safe and reliable development of agents based on observed behavior and user feedback.
This tutorial has walked you through building a structured and trustworthy AI agent with watsonx Orchestrate, a platform designed for enterprise-grade agents. By combining simple rule-based Python logic, clear agent instructions and reusable tools, you implemented a vendor risk assessment use case with minimal code and high transparency.
Adding Langfuse made it easy to observe agent behavior and identify reasoning issues, so you can improve accuracy continuously without guesswork. This approach helps enterprises optimize agent systems end to end, automate complex workflows and deploy AI systems with greater transparency.
More importantly, watsonx Orchestrate ADK and Langfuse together allow enterprises to design, debug and scale complex agentic workflows and AI applications faster, with stronger governance, clearer reasoning and reduced development time.
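One way to automate part of this review is to flag traces in which the agent answered without calling the tool. The sketch below assumes a simplified, hypothetical trace layout (an id plus a list of observations with type and name); it is not the Langfuse SDK, and only the tool name evaluate_all_vendor_risks comes from the tutorial.

```python
# Tool that every vendor risk answer is expected to call.
REQUIRED_TOOL = "evaluate_all_vendor_risks"


def called_required_tool(trace: dict, tool_name: str = REQUIRED_TOOL) -> bool:
    """Return True when the trace contains a call to the required tool."""
    return any(
        obs.get("type") == "tool" and obs.get("name") == tool_name
        for obs in trace.get("observations", [])
    )


def ungrounded_trace_ids(traces: list[dict], tool_name: str = REQUIRED_TOOL) -> list[str]:
    """List the ids of traces where the agent answered without the tool."""
    return [t["id"] for t in traces if not called_required_tool(t, tool_name)]


# Hypothetical sample data for illustration only.
sample_traces = [
    {"id": "t1", "observations": [{"type": "tool", "name": REQUIRED_TOOL}]},
    {"id": "t2", "observations": [{"type": "llm", "name": "compose_answer"}]},
    {"id": "t3", "observations": []},
]

print(ungrounded_trace_ids(sample_traces))
```

Traces flagged this way are candidates for manual inspection in the Langfuse dashboard; a tool call alone does not prove that the final answer quoted the tool output correctly.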