Date: 2024-10-21
In this tutorial, you will create a LangChain agentic RAG system that uses the IBM Granite-3.0-8B-Instruct model, now available on watsonx.ai, to answer complex queries about the 2024 US Open using external information.
Retrieval-augmented generation (RAG) is a technique in natural language processing (NLP) that combines information retrieval and generative models to produce more accurate, relevant and contextually aware responses. In traditional language generation tasks, large language models (LLMs) such as Meta's Llama models or IBM’s Granite models construct responses based on an input prompt. Chatbots are a common real-world use case of these large language models. RAG is a powerful tool when a model's knowledge base lacks relevant, up-to-date information.
At the core of agentic RAG systems are artificial intelligence (AI) agents. An AI agent refers to a system or program that is capable of autonomously performing tasks on behalf of a user or another system by designing its workflow and using available tools. Agentic technology implements tool use on the backend to obtain up-to-date information from various data sources, optimize workflow and create subtasks autonomously to solve complex tasks. These external tools can include external data sets, search engines, APIs and even other agents. Step-by-step, the agent reassesses its plan of action in real time and self-corrects.
Agentic RAG frameworks are powerful as they can encompass more than just one tool. In traditional RAG applications, the LLM is provided with a vector database to reference when forming its responses. In contrast, agentic RAG implementations are not restricted to document agents that only perform data retrieval. RAG agents can also have tools for tasks such as solving mathematical calculations, writing emails, performing data analysis and more. These tools can be supplemental to the agent's decision-making process. AI agents are context-aware in their multistep reasoning and can determine when to use appropriate tools.
AI agents, or intelligent agents, can also work collaboratively in multiagent systems, which tend to outperform singular agents. This scalability and adaptability are what set agentic RAG apart from traditional RAG pipelines.
You need an IBM Cloud® account to create a watsonx.ai™ project.
While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook.
Log in to watsonx.ai using your IBM Cloud account.
Create a watsonx.ai project.
You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Create a Jupyter Notebook.
This step will open a Notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite tutorials, check out the IBM Granite Community. This Jupyter Notebook along with the datasets used can be found on GitHub.
Create a Watson Machine Learning service instance (select your appropriate region and choose the Lite plan, which is a free instance).
Generate an API Key in WML.
Associate the WML service to the project that you created in watsonx.ai.
We'll need a few libraries and modules for this tutorial. Make sure to import the following ones; if they're not installed, you can resolve this with a quick pip installation.
Common Python frameworks for building agentic RAG systems include LangChain and LlamaIndex. In this tutorial, we will be using LangChain.
Set up your credentials. Store your PROJECT_ID and APIKEY in a separate .env file in the same directory as this notebook.
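As a minimal sketch of what loading these credentials involves, the standard-library snippet below parses KEY=VALUE lines of the kind a .env file contains. The helper name `parse_env_text` and the sample values are hypothetical and only for illustration; in practice a package such as python-dotenv is commonly used, and the notebook's own loading code may differ.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical file contents; real values come from your own IBM Cloud account.
sample = """
# watsonx.ai credentials
PROJECT_ID=my-project-id
APIKEY="my-api-key"
"""

credentials = parse_env_text(sample)

# Fail early if either required setting is absent.
missing = [name for name in ("PROJECT_ID", "APIKEY") if name not in credentials]
if missing:
    raise KeyError(f"Missing required settings: {missing}")
```

Keeping the values in a separate file means the notebook itself never contains the API key.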
This step is important as it will produce a clear example of an agent's behavior with and without external data sources. Let's start by setting our parameters.
The model parameters available can be found here. We experimented with various model parameters, including temperature, minimum and maximum new tokens and stop sequences. Learn more about model parameters and what they mean in the watsonx docs. It is important to set our stop_sequences here in order to limit agent hallucinations. Stop sequences tell the agent to stop producing further output upon encountering particular substrings. In our case, we want the agent to end its response upon reaching an observation and to not hallucinate a human response. Hence, one of our stop_sequences is 'Human:' and another is 'Observation', which halts generation once a final response is produced.
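To make the effect of stop sequences concrete, here is a simplified, plain-Python illustration that cuts a piece of text at the earliest stop sequence. The function name `truncate_at_stop` and the sample string are assumptions made for this sketch. In the hosted service, stopping happens during generation, and whether the matched substring itself is kept in the returned text depends on the service.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Return text up to (not including) the earliest stop sequence found."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Hypothetical raw generation that runs past the point where it should end.
raw = 'Action: {"action": "Final Answer"}\nObservation: done\nHuman: What next?'
trimmed = truncate_at_stop(raw, ["Human:", "Observation"])
print(repr(trimmed))
```

Without the stop sequences, the text after the action (an invented observation and an invented human turn) would be treated as part of the agent's output.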
For this tutorial, we suggest using IBM's Granite-3.0-8B-Instruct model as the LLM to achieve similar results. You are free to use any AI model of your choice. The foundation models available through watsonx can be found here. The purpose of these models in LLM applications is to serve as the reasoning engine that decides which actions to take.
We'll set up a prompt template in case you want to ask multiple questions.
And now we can set up a chain with our prompt and our LLM. This allows the generative model to produce a response.
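Conceptually, a chain of this kind fills the prompt template with the user's query and passes the result to the model. The sketch below shows that composition in plain Python. The template wording, `make_chain` and `fake_llm` are hypothetical stand-ins, not the tutorial's template or LangChain's classes, which provide their own prompt-template and chain abstractions.

```python
from typing import Callable

# Hypothetical template text; the tutorial's actual wording is not reproduced here.
TEMPLATE = "Answer the following question.\nQuestion: {query}\nAnswer:"


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Compose 'fill the template' with 'call the model' into one callable."""
    def run(query: str) -> str:
        prompt = template.format(query=query)
        return llm(prompt)
    return run


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: echoes the prompt it received.
    return "ECHO: " + prompt


chain = make_chain(TEMPLATE, fake_llm)
result = chain("What sport is played at the US Open?")
print(result)
```

Because the template is fixed once, the same chain can be reused for any number of questions.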
Let's test to see how our agent responds to a basic query.
Output: ' Do not try to make up an answer.\n\nThe sport played at the US Open is tennis.'
The agent successfully responded to the basic query with the correct answer. In the next step of this tutorial, we will be creating a RAG tool for the agent to access relevant information about IBM's involvement in the 2024 US Open. As we have covered, traditional LLMs cannot obtain current information on their own. Let's verify this.
Output: ' Do not make up an answer.\n\nThe 2024 US Open Tennis Championship has not been officially announced yet, so the location is not confirmed. Therefore, I do not know the answer to this question.'
Evidently, the LLM is unable to provide us with the relevant information. The training data used for this model contained only information from before the 2024 US Open, and without the appropriate tools, the agent does not have access to this information.
The first step in creating the knowledge base is listing the URLs we will be extracting content from. In this case, our data source will be collected from our online content summarizing IBM’s involvement in the 2024 US Open. The relevant URLs are established in the
Next, load the documents using LangChain
Output: Document(metadata={'source': 'https://www.ibm.com/case-studies/us-open', 'title': 'U.S. Open | IBM', 'description': 'To help the US Open stay on the cutting edge of customer experience, IBM Consulting built powerful generative AI models with watsonx.', 'language': 'en'}, page_content='\n\n\n\n\n\n\n\n\n\nU.S. Open | IBM\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHome\n\n\n\n\nCase Studies\n\n\n\nUS Open \n\n\n\n \n\n\n\n \n Acing the US Open digital experience\n\n\n\n\n\n\n \n\n\n \n\n \n\n\n \n \n AI models built with watsonx transform data into insight\n \n\n\n\n\n \n\n\n \n\n\nGet the latest AI and tech insights\n\n\nLearn More\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFor two weeks at the end of summer, nearly one million people make the journey to Flushing, New York, to watch the best tennis players in the world compete in the US Open Tennis Championships...')
In order to split the data in these documents to chunks that can be processed by the LLM, we can use a text splitter such as
Once the text splitter is initiated, we can apply it to our
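The idea behind chunking can be sketched with a fixed-size character splitter that overlaps adjacent chunks. This is a simplification under stated assumptions: the function `split_text` and its parameters are hypothetical, and LangChain's text splitters typically also try to break on separators such as paragraphs and sentences rather than at arbitrary character positions.

```python
def split_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that content cut at a
    chunk boundary still appears intact in one of the two neighbors.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

Smaller chunks make retrieval more precise, while the overlap reduces the chance that a relevant sentence is split across two chunks.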
The embedding model that we are using is an IBM Slate™ model through the watsonx.ai embeddings service. Let's initialize it.
In order to store our embedded documents, we will use Chroma DB, an open source vector store.
To access information in the vector store, we must set up a retriever.
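The following toy example illustrates what a vector store and retriever do together: embed each chunk, embed the query, and return the most similar chunks. Everything here is an assumption for illustration. The word-count "embedding", the `ToyVectorStore` class and the sample sentences stand in for the Slate embedding model, Chroma DB and the scraped IBM pages, respectively.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a dense model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Stores chunks with their vectors and retrieves the closest ones."""

    def __init__(self, chunks: list[str]):
        self.chunks = list(chunks)
        self.vectors = [embed(chunk) for chunk in self.chunks]

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        query_vector = embed(query)
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(query_vector, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


# Made-up sample chunks for illustration only.
store = ToyVectorStore([
    "the us open is played in flushing new york",
    "watsonx powers match reports and ai commentary",
])
top = store.retrieve("where is the us open played")
print(top)
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval step follows the same pattern of ranking chunks by similarity to the query.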
Let's define the
Next, we will set up a new prompt template to ask multiple questions. This template is more complex. It is referred to as a structured chat prompt and can be used for creating agents that have multiple tools available. In our case, the tool we are using was defined in Step 6. The structured chat prompt will be made up of a
First, we will set up the
In the following code, we are establishing the
Next, we establish the order of our newly defined prompts in the prompt template. We create this new template to feature the
Now, let's finalize our prompt template by adding the tool names, descriptions and arguments using a partial prompt template. This allows the agent to access the information pertaining to each tool, including the intended use cases. This also means we can add and remove tools without altering our entire prompt template.
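The partial-template idea can be sketched in plain Python: fill in the tool-related fields once and leave the user input as a placeholder to be filled on each call. The template wording, the tool description, and the helpers `render_tools` and `partial_fill` are hypothetical; only the tool name `get_IBM_US_Open_context` comes from the agent output shown later in this tutorial.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[[str], str]


class _KeepMissing(dict):
    """Leave unknown placeholders untouched instead of raising KeyError."""
    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def partial_fill(template: str, **fixed: str) -> str:
    """Fill only the given fields; other {placeholders} stay in the text."""
    return template.format_map(_KeepMissing(fixed))


def render_tools(tools: list[Tool]) -> str:
    return "\n".join(f"{tool.name}: {tool.description}" for tool in tools)


# Hypothetical template and tool description for illustration.
SYSTEM_TEMPLATE = (
    "You can use these tools:\n{tools}\n"
    "Valid action names: {tool_names}, Final Answer\n"
    "Question: {input}"
)
tools = [
    Tool(
        name="get_IBM_US_Open_context",
        description="Look up information about IBM's involvement in the 2024 US Open.",
        func=lambda query: "",  # placeholder; a real tool would call the retriever
    )
]

prompt_with_tools = partial_fill(
    SYSTEM_TEMPLATE,
    tools=render_tools(tools),
    tool_names=", ".join(tool.name for tool in tools),
)
final_prompt = prompt_with_tools.format(input="Where was the 2024 US Open?")
print(final_prompt)
```

Adding or removing a tool only changes the `tools` list; the template text itself stays the same.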
An important feature of AI agents is their memory. Agents are able to store past conversations and past findings in their memory to improve the accuracy and relevance of their responses going forward. In our case, we will use LangChain's
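As a rough illustration of conversation memory, the sketch below keeps each exchange and renders it in the same `Human: ... / AI: ...` layout that appears in the `history` field of the agent outputs in this tutorial. The class `ConversationBuffer` is a hypothetical stand-in, not LangChain's memory implementation.

```python
class ConversationBuffer:
    """Keeps past exchanges and renders them as a history string."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> str:
        # One "Human:" line and one "AI:" line per exchange, oldest first.
        return "\n".join(f"Human: {human}\nAI: {ai}" for human, ai in self.turns)


memory = ConversationBuffer()
empty_history = memory.history()  # nothing saved yet, so this is ''

memory.save("What is the capital of France?", "The capital of France is Paris.")
print(memory.history())
```

The rendered history is inserted into the prompt on each new call, which is how earlier answers remain visible to the model.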
And now we can set up a chain with our agent's scratchpad, memory, prompt and the LLM. The
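The control flow of such an agent loop can be sketched as follows: the model emits a JSON action, the loop either runs the named tool and appends the observation to a scratchpad, or returns the final answer. This is a minimal sketch under stated assumptions, not LangChain's AgentExecutor. `parse_action`, `run_agent`, `scripted_llm` and `fake_retriever` are hypothetical names, and the scripted model simply replays canned responses.

```python
import json
import re
from typing import Callable


def parse_action(text: str) -> dict:
    """Extract the JSON action object from the model's reply."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON action found in model output")
    return json.loads(match.group(0))


def run_agent(question: str,
              llm: Callable[[str, str], str],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    scratchpad = ""  # accumulates earlier actions and their observations
    for _ in range(max_steps):
        reply = llm(question, scratchpad)
        action = parse_action(reply)
        if action["action"] == "Final Answer":
            return action["action_input"]
        observation = tools[action["action"]](action["action_input"])
        scratchpad += f"{reply}\nObservation: {observation}\n"
    raise RuntimeError("agent did not finish within max_steps")


calls: list[str] = []


def fake_retriever(query: str) -> str:
    calls.append(query)  # record tool usage so it can be inspected
    return "The US Open is held in Flushing, New York."


def scripted_llm(question: str, scratchpad: str) -> str:
    # Canned behavior: call the tool once for US Open questions, else answer.
    if "US Open" in question and "Observation:" not in scratchpad:
        action = {"action": "get_IBM_US_Open_context", "action_input": question}
    elif "US Open" in question:
        action = {"action": "Final Answer", "action_input": "Flushing, Queens, New York"}
    else:
        action = {"action": "Final Answer", "action_input": "Paris"}
    return "Action:\n" + json.dumps(action)


tools = {"get_IBM_US_Open_context": fake_retriever}
answer = run_agent("Where was the 2024 US Open?", scripted_llm, tools)
print(answer, calls)
```

The `max_steps` limit guards against a model that never produces a final answer.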
We are now able to ask the agent questions. Recall the agent's previous inability to provide us with information pertaining to the 2024 US Open. Now that the agent has its RAG tool available to use, let's try asking the same questions again.
Output: (some
> Entering new AgentExecutor chain...
Thought: The human is asking about the location of the 2024 US Open Tennis Championship. I need to find out where it was held.
Action:
```
{
"action": "get_IBM_US_Open_context",
"action_input": "Where was the 2024 US Open Tennis Championship held?"
}
```
Observation[Document(metadata={'description': "IBM and the United States Tennis Association (USTA) announced several watsonx-powered fan features coming to the US Open digital platforms ahead of this year's tournament. These new and enhanced capabilities – a product of collaboration between IBM and the USTA digital team – aim to deliver a more informative and engaging experience for millions of tennis fans around the world.", 'language': 'en-us', 'source': 'https://newsroom.ibm.com/2024-08-15-ibm-and-the-usta-serve-up-new-and-enhanced-generative-ai-features-for-2024-us-open-digital-platforms', 'title': 'IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms'}, page_content="IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms\n-New Match Report summaries offer...")]
Action:
```
{
"action": "Final Answer",
"action_input": "The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York."
}
```
Observation
> Finished chain.
{'input': 'Where was the 2024 US Open Tennis Championship?',
'history': '',
'output': 'The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.'}
Great! The agent used its available RAG tool to return the location of the 2024 US Open, per the user's query. We even get to see the exact document that the agent is retrieving its information from. Now, let's try a slightly more complex query. This time, the query will be about IBM's involvement in the 2024 US Open.
Output: (some
> Entering new AgentExecutor chain...
```
{
"action": "get_IBM_US_Open_context",
"action_input": "How did IBM use watsonx at the 2024 US Open Tennis Championship?"
}
```
Observation[Document(metadata={'description': 'To help the US Open stay on the cutting edge of customer experience, IBM Consulting built powerful generative AI models with watsonx.', 'language': 'en', 'source': 'https://www.ibm.com/case-studies/us-open', 'title': 'U.S. Open | IBM'}, page_content='The US Open is a sprawling, two-week tournament, with hundreds of matches played on 22 different courts. Keeping up with all the action is a challenge, both for tennis fans and the USTA editorial team covering the event...)]
Action:
```
{
"action": "Final Answer",
"action_input": "IBM used watsonx at the 2024 US Open Tennis Championship to create generative AI-powered features such as Match Reports, AI Commentary, and SlamTracker. These features enhance the digital experience for fans and scale the productivity of the USTA editorial team."
}
```
Observation
> Finished chain.
{'input': 'How did IBM use watsonx at the 2024 US Open Tennis Championship?',
'history': 'Human: Where was the 2024 US Open Tennis Championship?\nAI: The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.',
'output': 'IBM used watsonx at the 2024 US Open Tennis Championship to create generative AI-powered features such as Match Reports, AI Commentary, and SlamTracker. These features enhance the digital experience for fans and scale the productivity of the USTA editorial team.'}
Again, the agent successfully retrieved the relevant information pertaining to the user query. Additionally, the agent is updating its memory with each new interaction, as seen in the history output.
Now, let's test if the agent can decipher when tool calling is not necessary to answer the user query. We can test this by asking the RAG agent a question that is not about the US Open.
Output:
> Entering new AgentExecutor chain...
{
"action": "Final Answer",
"action_input": "The capital of France is Paris."
}
Observation
> Finished chain.
{'input': 'What is the capital of France?',
'history': 'Human: Where was the 2024 US Open Tennis Championship?\nAI: The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.\nHuman: How did IBM use watsonx at the 2024 US Open Tennis Championship?\nAI: IBM used watsonx at the 2024 US Open Tennis Championship to create generative AI-powered features such as Match Reports, AI Commentary, and SlamTracker. These features enhance the digital experience for fans and scale the productivity of the USTA editorial team.',
'output': 'The capital of France is Paris.'}
As seen in the AgentExecutor chain, the agent recognized that it could answer this question from the model's existing knowledge, without using its tools.
In this tutorial, you created a RAG agent using LangChain in Python with watsonx. The LLM you worked with was the IBM Granite-3.0-8B-Instruct model. The sample output is important as it shows the significance of this generative AI advancement. The AI agent was successfully able to retrieve relevant information via the
For more AI agent content, we encourage you to check out our AI agent tutorial that returns today's Astronomy Picture of the Day using NASA's open source API and a date tool.
