LLM agent orchestration is the process of managing and coordinating the interactions between a large language model (LLM) and various tools, APIs or processes to perform complex tasks within AI systems. It involves structuring workflows in which an LLM-powered agent acts as the central decision-maker or reasoning engine, directing its actions based on inputs, context and outputs from external systems. Using an orchestration framework, LLMs can integrate seamlessly with APIs, databases and other AI applications, enabling functionalities such as chatbots and automation tools. Open-source agent frameworks further enhance the adaptability of these systems, making LLMs more effective in real-world scenarios.
Many people misunderstand the difference between LLM orchestration and LLM agent orchestration. The following illustration highlights the key differences:
In this tutorial, you will learn how to build an autonomous agent powered by large language models (LLMs) by using IBM® Granite™ models and LangChain. We’ll explore how agents leverage key components such as memory, planning and action to perform intelligent tasks. You’ll also implement a practical system that processes text from a book, answers queries dynamically and evaluates its performance by using accuracy metrics such as BLEU, precision, recall and F1 score.
The framework presented in figure 1 provides a holistic design for large language model (LLM)-based autonomous agents, emphasizing the interplay between key components: profile, memory, planning and action. Each component represents a critical stage in building an autonomous agent capable of reasoning, decision-making and interacting with dynamic environments.1
1. Profile: defining the agent’s identity
The profile gives the agent its identity by embedding information such as demographics, personality traits and social context. This process ensures that the agent can interact in a personalized way. Profiles can be manually crafted, generated by gen AI models such as IBM Granite models or OpenAI’s GPT (generative pretrained transformer), or aligned with specific datasets to meet task requirements. Leveraging prompt engineering, profiles can be dynamically refined to optimize responses. Additionally, within multiagent orchestration, the profile helps define roles and behaviors, ensuring seamless coordination across AI algorithms and decision-making systems.
2. Memory: storing and using context
Memory helps the agent retain and retrieve past interactions, enabling contextual responses. It can be unified (all data in one place) or hybrid (structured and unstructured). Operations including reading, writing and reflection allow the agent to learn from experience and provide consistent, informed outputs. Well-structured memory enhances multiagent orchestration by ensuring that different agents, including specialized agents designed for a specific task, can share and retrieve relevant data efficiently. In frameworks such as AutoGen and Crew AI, memory plays a crucial role in maintaining continuity within the ecosystem of collaborating agents, ensuring seamless coordination and optimized task execution.
3. Planning: strategizing actions
The planning component lets the agent devise strategies to achieve goals. It can follow predefined steps or adapt dynamically based on feedback from the environment, humans or the LLM itself. By integrating AI algorithms and leveraging a knowledge base, planning can be optimized to improve reasoning efficiency and problem-solving accuracy. In LLM applications, planning plays a crucial role in ensuring natural language understanding and decision-making processes align with the agent's objectives. Additionally, retrieval-augmented techniques enhance the agent's ability to access relevant information dynamically, improving response accuracy. This flexibility ensures that the agent remains effective in changing scenarios, especially in multiagent orchestration, where various agents coordinate their plans to achieve complex objectives while maintaining scalability for handling large and diverse tasks.
4. Action: executing decisions
Actions are the agent’s way of interacting with the world, whether by completing tasks, gathering information or communicating. The agent uses memory and planning to guide execution, employs tools when needed and adapts its internal state based on results for continuous improvement. Optimizing the action execution algorithm ensures efficiency, especially when integrating GPT-powered reasoning models and gen AI techniques for real-time decision-making.
By combining these components, the framework transforms LLMs into adaptable agents capable of reasoning, learning and performing tasks autonomously. This modular design makes it ideal for applications such as customer service, research assistance and creative problem-solving.
This tutorial demonstrates the creation of a queryable knowledge agent designed to process large text documents (like books) and answer user queries accurately. Using IBM Granite models and LangChain, the agent is built following the principles outlined in the framework for LLM-based autonomous agents. The framework's components align seamlessly with the agent's workflow to ensure adaptability and intelligent responses.
Let's understand how the framework applies in our use case.
Profile: The agent is designed with a "knowledge assistant" profile, focusing on summarization, question answering and reasoning tasks. Its context is personalized to process a specific document (for example, The Adventures of Sherlock Holmes).
Memory: The agent employs hybrid memory by embedding chunks of the book into a FAISS vector store. This ability allows it to retrieve relevant context dynamically during queries. Memory operations such as reading (retrieval) and writing (updating embeddings) ensure that the agent can adapt to new queries over time.
Planning: Query resolution involves single-path reasoning. The agent retrieves relevant chunks of text, generates answers by using IBM’s Granite LLM and evaluates the output for accuracy. Planning without feedback ensures simplicity, while the system’s modularity allows feedback loops to be incorporated in future iterations.
Action: The agent executes query resolution by integrating memory retrieval and LLM processing. It completes tasks such as generating answers, calculating accuracy metrics (BLEU, precision, recall and F1 score) and visualizing results for user interpretation. These outputs reflect the agent’s capability to act intelligently based on reasoning and planning.
You need an IBM Cloud® account to create a watsonx.ai® project.
While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook.
This step opens a notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite tutorials, check out the IBM Granite Community. This tutorial is also available on GitHub.
To work with the LangChain framework and integrate IBM WatsonxLLM, we need to install some essential libraries. Let’s start by installing the required packages:
Note: If you are using an older version of these libraries, upgrade them before proceeding to avoid compatibility errors.
In the preceding code cell, we install the packages required for this tutorial.
This step ensures that your environment is ready for the tasks ahead.
Now that we’ve installed the necessary libraries, let’s import the modules required for this tutorial:
In the preceding code cell, we import all of the modules that the tutorial depends on.
This step sets up all the tools and modules that we need to process text, create embeddings, store them in a vector database and interact with IBM's WatsonxLLM.
This code sets up credentials for accessing the IBM Watson machine learning (WML) API and ensures that the project ID is correctly configured.
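The credentials cell itself is not reproduced here; as a rough sketch (the key names and endpoint are assumptions based on common watsonx.ai setups, not verbatim from this tutorial), it might look like:

```python
# Hypothetical credential setup; replace the placeholders with your own
# watsonx.ai values. Key names follow the usual watsonx.ai pattern but
# may differ in your SDK version.
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",  # regional watsonx.ai endpoint
    "apikey": "YOUR_IBM_CLOUD_API_KEY",
}
project_id = "YOUR_WATSONX_PROJECT_ID"
```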
This code initializes the IBM WatsonxLLM for use in the application:
This step prepares the WatsonxLLM for generating responses in the workflow.
To process the text from a document, we need a function that can read and extract its contents. The following function is designed to handle plain text files:
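The code cell for this function is not reproduced above, but a minimal sketch might look like the following (the function name is illustrative):

```python
def read_text_file(path: str) -> str:
    """Read a plain text file and return its full contents as one string."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```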
This function opens the specified file in read mode and reads its entire content into a single string variable, which it then returns.
This function allows us to process the input file (The Adventures of Sherlock Holmes) and extract its content for further operations such as text chunking and embedding. It ensures that the raw text is readily available for analysis.
To efficiently process and index large blocks of text, we need to divide the text into smaller, manageable chunks. The following function handles this task:
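A dependency-free sketch of the chunking-with-overlap idea follows (the function name and default sizes are illustrative, not the tutorial's exact code):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    with `overlap` characters shared between consecutive chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping the overlap
    return chunks
```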
The chunking function splits the text into fixed-size segments with an overlap between consecutive chunks; this overlap preserves contextual continuity across chunk boundaries. The function utilizes a text splitter to divide the text while respecting the configured chunk size and overlap.
Chunking is essential when working with large documents, as language models often have token limitations and cannot process lengthy text directly.
To enable efficient semantic search, we need to convert text chunks into vector embeddings and store them in a searchable index. This step uses FAISS and HuggingFace embeddings to create the vector index, forming the foundation for retrieving relevant information based on queries.
The vector index function converts the text chunks into dense embeddings and stores them in a FAISS index. It first initializes a HuggingFaceEmbeddings model to generate the embeddings. The function then uses FAISS to build a searchable vector store from the chunks and their embeddings.
The resulting vector store is returned and will be used to find relevant chunks based on user queries, forming the backbone of the agent's search and retrieval process.
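In the tutorial this retrieval role is played by FAISS over HuggingFace embeddings; conceptually, it reduces to nearest-neighbor search over vectors. The following toy, dependency-free sketch illustrates that idea (the bag-of-words "embedding" is a stand-in for a real embedding model, not what the tutorial uses):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector, standing in for a
    real sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_search(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query -- what a FAISS
    vector store does efficiently at scale."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]
```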
This step involves querying the vector index to retrieve relevant information and using IBM's Granite LLM to generate a refined response. By integrating similarity search and LLM reasoning, the function provides a dynamic and intelligent query resolution process.
The query resolution function takes a user query, the vector index and the LLM as inputs. It first performs a similarity search on the vector index to retrieve the most relevant chunks of text. These chunks, referred to as observations, supply the context for answering the query. The function then constructs a prompt by combining the query and the retrieved context. This prompt is passed to the Granite LLM, which generates the final answer. Throughout the process, intermediate steps such as the agent's thought, action and observation are recorded for transparency. Finally, the function returns a dictionary containing all components, including the thought process, action taken, retrieved observation and the final answer.
This step is critical for transforming raw data retrieval into actionable and meaningful insights by using the LLM's reasoning capabilities.
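The prompt-construction step can be sketched as follows (the template wording is an assumption, not the tutorial's exact prompt):

```python
def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Combine the retrieved context and the user query into the single
    prompt string that is later passed to the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```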
This step dynamically processes multiple queries, retrieves relevant information and saves the results in a structured format for analysis. The function integrates querying, data structuring and export capabilities.
The batch-processing function accepts a list of queries along with the vector index and the LLM. For each query, it uses the query resolution function described earlier to retrieve context and generate an answer, collecting each result as a structured record.
Once all queries are processed, the list of results is converted into a pandas DataFrame. This tabular format allows easy analysis and visualization of the query results. The DataFrame is printed for review and saved as a CSV file for future use.
This step is essential for organizing the output in a user-friendly format, enabling downstream tasks such as accuracy evaluation and visualization.
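The tutorial uses pandas' to_csv for the export; the same serialization can be sketched with the standard library (the column names follow the DataFrame shown later in the output):

```python
import csv
import io

def results_to_csv(results: list[dict]) -> str:
    """Serialize query-result rows (Thought/Action/Observation/Final Answer)
    to CSV text, mirroring what DataFrame.to_csv produces."""
    fieldnames = ["Thought", "Action", "Action Input", "Observation", "Final Answer"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()
```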
This step combines all the previous steps into a single workflow to process a text file, answer user queries and save the results in a structured format. The main workflow function orchestrates text extraction, chunking, indexing and query processing end to end.
Let's understand how this workflow executes:
Input a text file: The path to the input text file (the book) is passed to the workflow.
Text extraction: The raw text is extracted from the file by using the file-reading function defined earlier.
Text chunking: The extracted text is divided into smaller chunks by using the chunking function, preserving overlap between chunks.
Create a vector index: The text chunks are converted into embeddings and stored in a FAISS vector index for similarity search.
Define queries: A list of sample queries is provided, each designed to retrieve specific information from the text. These queries will be answered by the agent.
Process queries: The queries are resolved one by one, and the structured results are exported to a CSV file.
This step integrates all components of the tutorial into a cohesive workflow. It automates the process from text extraction to query resolution, allowing you to test the agent's capabilities and examine the results in a structured format.
To execute the workflow, simply call the main workflow function with the path to your text file and the list of queries.
Output
> Entering new AgentExecutor chain...
Thought: The query 'What is the plot of 'A Scandal in Bohemia'?' requires context from the book to provide an accurate response.
Action: Search FAISS Vector Store
Action Input: "What is the plot of 'A Scandal in Bohemia'?"
Observation: I. A SCANDAL IN BOHEMIA I. “I was aware of it,” said Holmes dryly. “The circumstances are of great delicacy, and every precaution has to be taken to quench what might grow to be an immense scandal and seriously compromise one of the reigning families of Europe. To speak plainly, the matter implicates the great House of Ormstein, hereditary kings of Bohemia.” “I was also aware of that,” murmured Holmes, settling himself down in his armchair and closing his eyes. Contents I. A Scandal in Bohemia II. The Red-Headed League III. A Case of Identity IV. The Boscombe Valley Mystery V. The Five Orange Pips VI. The Man with the Twisted Lip VII. The Adventure of the Blue Carbuncle VIII. The Adventure of the Speckled Band IX. The Adventure of the Engineer’s Thumb X. The Adventure of the Noble Bachelor XI. The Adventure of the Beryl Coronet XII. The Adventure of the Copper Beeches
Thought: Combining retrieved context with the query to generate a detailed answer.
/var/folders/4w/smh16qdx6l98q0534hr9v52r0000gn/T/ipykernel_2648/234523588.py:23: LangChainDeprecationWarning: The method `BaseLLM.__call__` was deprecated in langchain-core 0.1.7 and will be removed in 1.0. Use :meth:`~invoke` instead.
  final_answer = llm(prompt)
Final Answer:
Step 1: Identify the main characters and their roles.
- Sherlock Holmes: The detective who is approached by a client with a delicate matter.
- An unnamed client: A representative of the great House of Ormstein, hereditary kings of Bohemia, who seeks Holmes' help to prevent a potential scandal.
Step 2: Understand the main issue or conflict.
- The main issue is a delicate matter that, if exposed, could lead to a massive scandal and compromise one of the reigning families of Europe, specifically the House of Ormstein.
Step 3: Ident
> Finished chain.

> Entering new AgentExecutor chain...
Thought: The query 'Who is Dr. Watson, and what role does he play in the stories?' requires context from the book to provide an accurate response.
Action: Search FAISS Vector Store
Action Input: "Who is Dr. Watson, and what role does he play in the stories?"
Observation: “Sarasate plays at the St. James’s Hall this afternoon,” he remarked. “What do you think, Watson? Could your patients spare you for a few hours?” “I have nothing to do to-day. My practice is never very absorbing.” “Try the settee,” said Holmes, relapsing into his armchair and putting his fingertips together, as was his custom when in judicial moods. “I know, my dear Watson, that you share my love of all that is bizarre and outside the conventions and humdrum routine of everyday life. You have shown your relish for it by the enthusiasm which has prompted you to chronicle, and, if you will excuse my saying so, somewhat to embellish so many of my own little adventures.” “My God! It’s Watson,” said he. He was in a pitiable state of reaction, with every nerve in a twitter. “I say, Watson, what o’clock is it?” “Nearly eleven.” “Of what day?” “Of Friday, June 19th.” “Good heavens! I thought it was Wednesday. It is Wednesday. What d’you want to frighten a chap for?” He sank his face onto his arms and began to sob in a high treble key. “I tell you that it is Friday, man. Your wife has been waiting this two days for you. You should be ashamed of yourself!”
Thought: Combining retrieved context with the query to generate a detailed answer.
Final Answer: Dr. Watson is a character in the Sherlock Holmes stories, written by Sir Arthur Conan Doyle. He is a former military surgeon who becomes the narrator and chronicler of Holmes' adventures. Watson is a close friend and confidant of Holmes, often accompanying him on cases and providing a more human perspective to the stories. He is known for his enthusiasm for the bizarre and unconventional, as well as his skill in recording the details of their investigations. Watson's role is crucial in presenting the narrative and offering insights into Holmes' character and methods.
> Finished chain.

Final DataFrame:
                                             Thought                     Action  \
0  The query 'What is the plot of 'A Scandal in B...  Search FAISS Vector Store
1  The query 'Who is Dr. Watson, and what role do...  Search FAISS Vector Store
2  The query 'Describe the relationship between S...  Search FAISS Vector Store
3  The query 'What methods does Sherlock Holmes u...  Search FAISS Vector Store

                                        Action Input  \
0        What is the plot of 'A Scandal in Bohemia'?
1  Who is Dr. Watson, and what role does he play ...
2  Describe the relationship between Sherlock Hol...
3  What methods does Sherlock Holmes use to solve...

                                         Observation  \
0  I. A SCANDAL IN BOHEMIA\n\n\nI.\n“I was aware ...
1  “Sarasate plays at the St. James’s Hall this a...
2  “You have really got it!” he cried, grasping S...
3  to learn of the case was told me by Sherlock H...

                                        Final Answer
0  Step 1: Identify the main characters and their...
1  Dr. Watson is a character in the Sherlock Holm...
2  Sherlock Holmes and Irene Adler have a profess...
3  Sherlock Holmes uses a variety of methods to s...

Output saved to output.csv
After running the workflow, the agent's full reasoning trace for each query is displayed, showing the thought, action, observation and final answer at each step.
Additionally, the results for all queries have been structured into a DataFrame and saved as output.csv.
In this process, we combined text retrieval with LLM reasoning to answer complex queries about the book. The agent dynamically retrieved relevant information, used the context to generate precise answers and organized the output in a structured format for easy analysis.
With the output.csv file created, we will now proceed to visualize the query results and their associated accuracy metrics, providing deeper insights into the agent's performance.
In the following code cell, we load the saved query results from the output.csv file back into a pandas DataFrame for inspection.
OUTPUT
                                             Thought                     Action  \
0  The query 'What is the plot of 'A Scandal in B...  Search FAISS Vector Store
1  The query 'Who is Dr. Watson, and what role do...  Search FAISS Vector Store
2  The query 'Describe the relationship between S...  Search FAISS Vector Store
3  The query 'What methods does Sherlock Holmes u...  Search FAISS Vector Store

                                        Action Input  \
0        What is the plot of 'A Scandal in Bohemia'?
1  Who is Dr. Watson, and what role does he play ...
2  Describe the relationship between Sherlock Hol...
3  What methods does Sherlock Holmes use to solve...

                                         Observation  \
0  I. A SCANDAL IN BOHEMIA\n\n\nI.\n“I was aware ...
1  “Sarasate plays at the St. James’s Hall this a...
2  “You have really got it!” he cried, grasping S...
3  to learn of the case was told me by Sherlock H...

                                        Final Answer
0  Step 1: Identify the main characters and their...
1  Dr. Watson is a character in the Sherlock Holm...
2  Sherlock Holmes and Irene Adler have a profess...
3  Sherlock Holmes uses a variety of methods to s...
In this code, the DataFrame includes key components such as Thought, Action, Action Input, Observation and Final Answer for each query.
To create visualizations of the query results, we import the necessary libraries:
Important note: If you encounter an error importing these visualization libraries (for example, a ModuleNotFoundError), install the missing package with pip and rerun the cell.
This code creates a horizontal bar chart to compare the lengths of observations (retrieved context) and answers (generated responses) for each query. This visualization provides insight into how much context the agent uses compared to the length of the generated answers.
This function calculates the character lengths of both observations and answers, adding them as new columns to the DataFrame before plotting.
The bar chart is color-coded to differentiate between observation and answer lengths, and includes labels, a legend and a title for clarity.
This visualization helps analyze the balance between the size of the retrieved context and the detail in the generated response, offering insights into how the agent processes and responds to queries.
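The length computation behind the chart can be sketched without pandas (the added keys are illustrative stand-ins for the DataFrame columns):

```python
def add_length_columns(rows: list[dict]) -> list[dict]:
    """Annotate each result row with the character lengths of its
    retrieved context and generated answer -- the two quantities
    compared in the bar chart."""
    for row in rows:
        row["Observation Length"] = len(row["Observation"])
        row["Answer Length"] = len(row["Final Answer"])
    return rows
```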
This step visualizes how much of the total text processed by the agent is used in observations (retrieved context) compared to the remaining text. A pie chart is created to provide an intuitive representation of the proportion.
The function computes the combined length of all retrieved observations and compares it with the total length of the processed text. This data is visualized in a pie chart, with clear labels for the observation portion and the remaining text.
This visualization provides a high-level overview of how much text the agent uses as context during query processing, offering insights into the efficiency and focus of the retrieval process.
This code generates two word clouds to visually represent the most frequently occurring words in the observations and the final answers.
To create a side-by-side visualization, subplots are used: the first subplot displays the word cloud for the observations, and the second displays the word cloud for the final answers.
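Under the hood, a word cloud is driven by word frequencies; a minimal frequency count looks like this (the wordcloud library computes these frequencies internally when given raw text):

```python
from collections import Counter

def word_frequencies(texts: list[str]) -> Counter:
    """Count word occurrences across a list of texts -- the statistic
    that determines word sizes in a word cloud."""
    words = " ".join(texts).lower().split()
    return Counter(words)
```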
In this section, we evaluate the agent's performance by using multiple accuracy metrics: keyword matching, BLEU score, precision, recall and F1 score.
Before we begin the tests, we import the necessary libraries for accuracy evaluation.
These libraries include tools for keyword matching, BLEU score calculation, precision and recall evaluation. Ensure that you have installed these libraries in your environment to avoid import errors.
This test evaluates how well the generated answers include the keywords from the queries. It uses simple keyword extraction and matching to check whether each query's key terms appear in the corresponding answer.
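An illustrative stand-in for such a keyword check follows (the stopword list and normalization are assumptions, not the tutorial's exact routine):

```python
STOPWORDS = frozenset({"what", "who", "is", "the", "a", "an", "does", "do", "in", "and"})

def keyword_match(query: str, answer: str) -> bool:
    """Return True if every non-stopword keyword of the query
    appears somewhere in the answer."""
    keywords = {w.strip("?.,'\"").lower() for w in query.split()} - STOPWORDS
    answer_words = {w.strip("?.,'\"").lower() for w in answer.split()}
    return keywords <= answer_words
```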
This test measures how closely the generated answers match the retrieved observations by computing the BLEU score between them.
Precision and recall are calculated to evaluate the relevance and completeness of the answers. Precision measures the proportion of retrieved words in the answer that are relevant, while recall measures the proportion of relevant words in the observation that appear in the answer.
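A simplified word-overlap version of these two metrics can be written as follows (the tutorial's exact tokenization may differ):

```python
def precision_recall(answer: str, observation: str) -> tuple[float, float]:
    """Word-overlap precision and recall of the answer against the
    retrieved observation. Precision: fraction of answer words found in
    the observation. Recall: fraction of observation words found in the answer."""
    answer_words = set(answer.lower().split())
    obs_words = set(observation.lower().split())
    overlap = answer_words & obs_words
    precision = len(overlap) / len(answer_words) if answer_words else 0.0
    recall = len(overlap) / len(obs_words) if obs_words else 0.0
    return precision, recall
```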
These metrics are appended to the DataFrame under new precision and recall columns.
The F1 score combines precision and recall into a single metric, providing a balanced evaluation of relevance and completeness. The formula for the F1 score is: F1 = 2 × (precision × recall) / (precision + recall).
The calculated F1 scores are appended to the DataFrame alongside precision and recall.
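The F1 computation itself is a one-liner, the harmonic mean of precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```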
Finally, a summary function consolidates all the metrics to provide an overview of the agent's performance. It calculates the total number of queries, the count and percentage of accurate responses and the average BLEU and F1 scores.
OUTPUT
Total Entries: 4
Accurate Entries: 4 (100.00%)
Average BLEU Score: 0.04
Average F1 Score: 0.24
These accuracy tests offer a detailed evaluation of the agent’s ability to generate relevant and accurate responses. Each test focuses on a specific aspect, from keyword inclusion to text similarity and response completeness. The summary consolidates these metrics to give an overall performance snapshot.
This tutorial guided you through building an autonomous agent powered by IBM’s Granite LLM and LangChain. Starting from text extraction to vectorization and query resolution, we covered the entire process of designing and implementing a functional LLM-based agent. Key steps included memory management with vector stores, query processing and generating responses by using Granite.
We evaluated the agent’s performance by using accuracy metrics such as keyword matching, BLEU scores, precision, recall and F1 scores. Visualizations such as bar charts, pie charts and word clouds provided additional insights into the agent’s behavior and effectiveness.
By completing this tutorial, you’ve learned how to design, test and visualize an LLM agent's performance. This foundation can be extended to tackle more complex datasets, improve accuracy and explore advanced features such as multiagent systems.