LangChain agentic RAG tutorial using Granite

Author

AI Engineer, Developer Advocate

IBM

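What is agentic RAG?

In this tutorial, you will create a LangChain agentic RAG system using the IBM Granite-3.0-8B-Instruct model, now available on watsonx.ai. The system draws on external information to answer complex queries about the 2024 US Open tennis tournament.

Overview of agentic RAG

What is RAG?

Retrieval-augmented generation (RAG) is a technique in natural language processing (NLP) that combines information retrieval with generative models to produce more accurate, relevant and context-aware responses. In traditional language generation tasks, large language models (LLMs) such as Meta's Llama models or IBM's Granite models construct responses from an input prompt. A common real-world use case for these models is chatbots. RAG is a powerful tool when a model lacks relevant, up-to-date information.

What are AI agents?

At the core of agentic RAG systems are artificial intelligence (AI) agents. An AI agent is a system or program that can autonomously perform tasks on behalf of a user or another system by designing its workflow and using available tools. Agentic technology uses tools on the back end to obtain up-to-date information from various data sources, optimize workflows and create subtasks autonomously to solve complex tasks. These external tools can include external datasets, search engines, APIs and even other agents. Step by step, the agent reassesses its plan of action in real time and self-corrects.

Agentic RAG versus traditional RAG

Agentic RAG frameworks are powerful because they can encompass more than one tool. In traditional RAG applications, the LLM is provided with a vector database to reference when forming its responses. In contrast, agentic AI applications are not restricted to document agents that only perform data retrieval. RAG agents can also have tools for tasks such as solving mathematical calculations, writing emails and performing data analysis. These tools can supplement the agent's decision-making process. AI agents are context-aware in their multistep reasoning and can determine when to use the appropriate tools.

AI agents, also called intelligent agents, can also collaborate in multi-agent systems, which tend to outperform single agents. This scalability and adaptability is what sets agentic RAG agents apart from traditional RAG pipelines.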

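Prerequisites

You need an IBM® Cloud account to create a watsonx.ai project.

Steps

Step 1: Set up your environment

While you can choose from several tools, this tutorial walks you through setting up an IBM account to use a Jupyter Notebook.

Log in to watsonx.ai by using your IBM Cloud account.
Create a watsonx.ai project.

You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Create a Jupyter Notebook.

This step opens a notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite tutorials, visit the IBM Granite Community. This Jupyter Notebook, along with the datasets used, can be found on GitHub.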

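Step 2: Set up a watsonx.ai Runtime instance and API key

Create a watsonx.ai Runtime service instance (select the appropriate region and choose the Lite plan, which is a free instance).
Generate an API key.
Associate the watsonx.ai Runtime service instance with the project that you created in watsonx.ai.

Step 3: Install and import relevant libraries and set up your credentials

This tutorial needs a few dependencies. Make sure to import the following ones; any that are not installed can be added with a quick pip install.

Common Python frameworks for building agentic AI systems include LangChain, LangGraph and LlamaIndex. In this tutorial, we use LangChain.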

# imports

import os
from dotenv import load_dotenv
from langchain_ibm import WatsonxEmbeddings, WatsonxLLM
from langchain.vectorstores import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.prompts import PromptTemplate
from langchain.tools import tool
from langchain.tools.render import render_text_description_and_args
from langchain.agents.output_parsers import JSONAgentOutputParser
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents import AgentExecutor
from langchain.memory import ConversationBufferMemory
from langchain_core.runnables import RunnablePassthrough
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes

Set up your credentials. Store your project ID and API key as PROJECT_ID and WATSONX_APIKEY in a separate .env file in the same directory as this notebook.

load_dotenv(os.getcwd()+"/.env", override=True)
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": os.getenv("WATSONX_APIKEY", ""),
}
project_id = os.getenv("PROJECT_ID", "")
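
Because both lookups fall back to an empty string, a missing or misnamed entry in the .env file would only surface later as an authentication error. The following is a minimal sketch of an early check, assuming the same two variable names as above. The helper name missing_settings is hypothetical and not part of any library.

```python
import os


def missing_settings(names, env=None):
    """Return the names whose values are unset or blank in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# Variable names taken from the credentials code above.
missing = missing_settings(["WATSONX_APIKEY", "PROJECT_ID"])
if missing:
    print("Missing settings in .env:", ", ".join(missing))
```

Running this right after load_dotenv() makes a configuration problem visible before any model call is made.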

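Step 4: Initialize a basic agent with no tools

This step is important because it produces a clear example of an agent's behavior with and without external data sources. Let's start by setting our parameters.

We experimented with several model parameters, including temperature, minimum and maximum new tokens, and stop sequences; the watsonx documentation lists the available parameters and explains what they mean. It is important to set stop_sequences here to limit agent hallucinations. Stop sequences tell the agent to stop producing further output when it encounters particular substrings. In our case, we want the agent to end its response upon reaching an observation and not to hallucinate a human response. Hence, one of our stop sequences is "Human:" and another is "Observation", which halts generation once a final response has been produced.

For this tutorial, we suggest using IBM's Granite-3.0-8B-Instruct model as the LLM to achieve similar results. You are free to use any AI model of your choice. The foundation models available through watsonx are listed in the watsonx documentation. The purpose of these models in LLM applications is to serve as the reasoning engine that decides which actions to take.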

llm = WatsonxLLM(
    model_id="ibm/granite-3-8b-instruct",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params={
        GenParams.DECODING_METHOD: "greedy",
        GenParams.TEMPERATURE: 0,
        GenParams.MIN_NEW_TOKENS: 5,
        GenParams.MAX_NEW_TOKENS: 250,
        GenParams.STOP_SEQUENCES: ["Human:", "Observation"],
    },
)
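
To make the effect of the two stop sequences concrete, here is a small sketch in plain Python. It is an illustration under assumptions, not how the watsonx service is implemented: the function name truncate_at_stop is hypothetical, and the sketch keeps the stop string in the result because the traces later in this tutorial end with the word Observation.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text right after the earliest stop sequence, if any occurs."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index + len(stop))
    return text[:cut]


# A made-up generation that drifts into an invented observation and human turn.
raw = 'Action: {"action": "Final Answer"}\nObservation: invented result\nHuman: invented question'
print(truncate_at_stop(raw, ["Human:", "Observation"]))
```

Everything after the first stop sequence is discarded, so the invented observation and the invented human turn never reach the agent loop.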

We'll set up a prompt template in case you want to ask multiple questions.

template = "Answer the {query} accurately. If you do not know the answer, simply say you do not know."
prompt = PromptTemplate.from_template(template)

Now we can set up a chain with our prompt template and our LLM. This allows the generative model to produce a response.

agent = prompt | llm

Let's test to see how our agent responds to a basic query.

agent.invoke({"query": 'What sport is played at the US Open?'})

Output: ' Do not try to make up an answer.\n\nThe sport played at the US Open is tennis.'

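The agent successfully responded to the basic query with the correct answer. In a later step of this tutorial, we create a RAG tool so the agent can access relevant information about IBM's involvement in the 2024 US Open. As we have covered, traditional LLMs cannot obtain current information on their own. Let's verify this.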

agent.invoke({"query": 'Where was the 2024 US Open Tennis Championship?'})

Output: ' Do not make up an answer.\n\nThe 2024 US Open Tennis Championship has not been officially announced yet, so the location is not confirmed. Therefore, I do not know the answer to this question.'

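Evidently, the LLM is not able to provide us with the relevant information. The training data used for this model contained information from before the 2024 US Open and, without the appropriate tools, the agent does not have access to this information.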

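Step 5: Establish the knowledge base and retriever

The first step in creating the knowledge base is listing the URLs we will extract content from. In this case, our data source is online content summarizing IBM's involvement in the 2024 US Open. The relevant URLs are set in the urls list.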

urls = ['https://www.ibm.com/cn-zh/case-studies/us-open',
        'https://www.ibm.com/cn-zh/sports/usopen',
        'https://newsroom.ibm.com/US-Open-AI-Tennis-Fan-Engagement',
        'https://newsroom.ibm.com/2024-08-15-ibm-and-the-usta-serve-up-new-and-enhanced-generative-ai-features-for-2024-us-open-digital-platforms']

Next, load the documents for the URLs we listed by using LangChain's WebBaseLoader. We also print a sample document to see how it loaded.

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
docs_list[0]

Output: Document(metadata={'source': 'https://www.ibm.com/cn-zh/case-studies/us-open', 'title': 'U.S. Open | IBM', 'description': 'To help the US Open stay on the cutting edge of customer experience, IBM Consulting built powerful generative AI models with watsonx.', 'language': 'en'}, page_content='\n\n\n\n\n\n\n\n\n\nU.S. Open | IBM\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHome\n\n\n\n\nCase Studies\n\n\n\nUS Open \n\n\n\n \n\n\n\n \n Acing the US Open digital experience\n\n\n\n\n\n\n \n\n\n \n\n \n\n\n \n \n AI models built with watsonx transform data into insight\n \n\n\n\n\n \n\n\n \n\n\nGet the latest AI and tech insights\n\n\nLearn More\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFor two weeks at the end of summer, nearly one million people make the journey to Flushing, New York, to watch the best tennis players in the world compete in the US Open Tennis Championships...')

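To split the data in these documents into chunks that the LLM can process, we can use a text splitter such as RecursiveCharacterTextSplitter. This text splitter splits the content on the following characters: ["\n\n", "\n", " ", ""]. The aim is to keep related text, such as paragraphs, sentences and words, together in the same chunk.

Once the text splitter is initialized, we can apply it to our docs_list.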

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=250, chunk_overlap=0)
doc_splits = text_splitter.split_documents(docs_list)
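
The idea of trying coarse separators first and falling back to finer ones can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: the function name recursive_split is hypothetical, and chunk size is measured in characters here, whereas the code above counts tokens through from_tiktoken_encoder.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    separator, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep neighbouring pieces together
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph here.\n\nSecond paragraph is a bit longer than the first.\n\nThird."
for chunk in recursive_split(sample, chunk_size=40):
    print(repr(chunk))
```

The short paragraphs stay whole, and only the paragraph that exceeds the limit is broken further at word boundaries.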

The embedding model that we are using is an IBM Slate model, accessed through the watsonx.ai embeddings service. Let's initialize it.

embeddings = WatsonxEmbeddings(
    model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=project_id,
)

To store our embedded documents, we use Chroma DB, an open-source vector store.

vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="agentic-rag-chroma",
    embedding=embeddings,
)

To access information in the vector store, we must set up a retriever.

retriever = vectorstore.as_retriever()
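
Conceptually, a retriever embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with cosine similarity over tiny hand-written vectors. It is an assumption-laden illustration: the helper names, the three-dimensional vectors and the chunk texts are all made up, and the distance metric a real vector store uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical index: (chunk text, embedding) pairs with toy 3-dimensional vectors.
index = [
    ("chunk about match reports", [0.9, 0.1, 0.0]),
    ("chunk about the venue", [0.1, 0.9, 0.1]),
    ("chunk about AI commentary", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], index, k=2))
```

In the tutorial's pipeline, the embedding model produces the vectors and Chroma performs the nearest-neighbour search, so none of this needs to be written by hand.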

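Step 6: Define the agent's RAG tool

Let's define the get_IBM_US_Open_context() tool that our agent will use. The tool's only parameter is the user query. The tool description, given in its docstring, tells the agent what the tool is for, so the agent knows when to call it. The agentic RAG system can use this tool to route a user query to the vector store when the query pertains to IBM's involvement in the 2024 US Open.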

@tool
def get_IBM_US_Open_context(question: str):
    """Get context about IBM's involvement in the 2024 US Open Tennis Championship."""
    context = retriever.invoke(question)
    return context

tools = [get_IBM_US_Open_context]

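Step 7: Establish the prompt template

Next, we set up a new prompt template to ask multiple questions. This template is more complex. It is referred to as a structured chat prompt and can be used to create agents that have multiple tools available. In our case, the tool we are using was defined in Step 6. The structured chat prompt is made up of a system_prompt, a human_prompt and our RAG tool.

First, we set up the system_prompt. This prompt instructs the agent to print its "thought process", which covers the agent's subtasks, the tools that were used and the final output. This gives us insight into the agent's function calling. The prompt also instructs the agent to return its responses in JSON blob format.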

system_prompt = """Respond to the human as helpfully and accurately as possible. You have access to the following tools: {tools}
Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).
Valid "action" values: "Final Answer" or {tool_names}
Provide only ONE action per $JSON_BLOB, as shown:
```
{{
"action": $TOOL_NAME,
"action_input": $INPUT
}}
```
Follow this format:
Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{{
"action": "Final Answer",
"action_input": "Final response to human"
}}
```
Begin! Reminder to ALWAYS respond with a valid json blob of a single action.
Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation"""

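In the following code, we establish the human_prompt. This prompt tells the agent to display the user input followed by the intermediate steps the agent has taken, as part of the agent_scratchpad.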

human_prompt = """{input}
{agent_scratchpad}
(reminder to always respond in a JSON blob)"""

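Next, we establish the order of our newly defined prompts in the prompt template. The new template features the system_prompt first, followed by an optional list of messages collected in the agent's memory, if any, and finally the human_prompt, which includes both the human input and the agent_scratchpad.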

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", human_prompt),
    ]
)

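Now, let's finalize our prompt template by adding the tool names, descriptions and arguments by using a partial prompt template. This gives the agent access to the information about each tool, including its intended use cases. It also means we can add and remove tools without altering the entire prompt template.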

prompt = prompt.partial(
    tools=render_text_description_and_args(list(tools)),
    tool_names=", ".join([t.name for t in tools]),
)
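
What the partial fill accomplishes can be sketched with plain string formatting. This is a rough illustration, not the text that render_text_description_and_args actually produces: describe_tools and get_context are hypothetical names, and the one-line format for each tool is an assumption made for the example.

```python
import inspect


def describe_tools(functions):
    """Render one 'name(args): docstring' line per tool function."""
    lines = []
    for fn in functions:
        args = ", ".join(inspect.signature(fn).parameters)
        lines.append(f"{fn.__name__}({args}): {inspect.getdoc(fn)}")
    return "\n".join(lines)


def get_context(question: str):
    """Get context about the 2024 US Open."""
    return []


# Doubled braces survive formatting as literal braces, as in system_prompt.
template = (
    "You have access to the following tools: {tools}\n"
    'Valid "action" values: "Final Answer" or {tool_names}\n'
    '{{"action": $TOOL_NAME, "action_input": $INPUT}}'
)
available = [get_context]
filled = template.format(
    tools=describe_tools(available),
    tool_names=", ".join(fn.__name__ for fn in available),
)
print(filled)
```

Adding a second function to the available list changes the rendered tool text without touching the template, which is the property the paragraph above describes.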

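Step 8: Set up the agent's memory and chain

An important feature of AI agents is their memory. Agents can store past conversations and past findings in their memory to improve the accuracy and relevance of their responses going forward. In our case, we use LangChain's ConversationBufferMemory() as the means of memory storage.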

memory = ConversationBufferMemory()
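
A buffer memory simply keeps every turn and replays the full transcript on the next call. The class below is a hypothetical stand-in written for illustration, not LangChain's ConversationBufferMemory. Its output format mirrors the history field shown in the Step 9 results.

```python
class BufferMemorySketch:
    """Stores every exchange and returns the whole transcript as one string."""

    def __init__(self):
        self.turns = []

    def save(self, human_text, ai_text):
        """Record one question and answer pair."""
        self.turns.append(("Human", human_text))
        self.turns.append(("AI", ai_text))

    def history(self):
        """Join all stored turns, one 'Role: text' line per turn."""
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


sketch = BufferMemorySketch()
sketch.save("What sport is played at the US Open?", "Tennis.")
print(sketch.history())
```

Because nothing is ever dropped, the transcript grows with each interaction, which is worth keeping in mind for long conversations and models with limited context windows.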

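Now we can set up a chain with our agent's scratchpad, memory, prompt and the LLM. The AgentExecutor class is used to execute the agent. It takes the agent, its tools, an error-handling approach, a verbose parameter and memory.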

chain = ( RunnablePassthrough.assign(
    agent_scratchpad=lambda x: format_log_to_str(x["intermediate_steps"]),
    chat_history=lambda x: memory.chat_memory.messages,
    )
    | prompt | llm | JSONAgentOutputParser())

agent_executor = AgentExecutor(agent=chain, tools=tools, handle_parsing_errors=True, verbose=True, memory=memory)
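
The loop that an agent executor runs can be sketched without any framework: ask the model, parse the JSON blob, call the named tool, append the observation to the scratchpad and repeat until the action is "Final Answer". The code below is a simplified illustration and not AgentExecutor's implementation. The names parse_action, run_agent, scripted_llm and the lookup tool are hypothetical, and scripted_llm is a canned stand-in for a real model.

```python
import json
import re


def parse_action(text):
    """Pull the JSON blob out of a model reply and return (action, action_input)."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON blob found in model output")
    blob = json.loads(match.group(0))
    return blob["action"], blob["action_input"]


def run_agent(llm, tools, question, max_steps=5):
    """Alternate between model replies and tool calls until a final answer appears."""
    scratchpad = ""
    for _ in range(max_steps):
        reply = llm(question, scratchpad)
        action, action_input = parse_action(reply)
        if action == "Final Answer":
            return action_input
        observation = tools[action](action_input)
        # Append the reply and the tool result so the next model call can see them.
        scratchpad += f"{reply}\nObservation: {observation}\nThought: "
    raise RuntimeError("agent did not finish within max_steps")


def scripted_llm(question, scratchpad):
    """Canned stand-in for the LLM: look something up once, then answer."""
    if "Observation:" not in scratchpad:
        return "Action:\n" + json.dumps({"action": "lookup", "action_input": question})
    return "Action:\n" + json.dumps({"action": "Final Answer", "action_input": "Flushing, New York"})


tools = {"lookup": lambda query: "The tournament is played in Flushing, New York."}
print(run_agent(scripted_llm, tools, "Where was the 2024 US Open?"))
```

The max_steps limit is a design choice of the sketch: it bounds the loop in case the model never emits a final answer.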

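Step 9: Generate responses with the agentic RAG system

We are now able to ask the agent questions. Recall that the agent was previously unable to provide us with information about the 2024 US Open. Now that the agent has its RAG tool available, let's try asking the same question again.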

agent_executor.invoke({"input": 'Where was the 2024 US Open Tennis Championship?'})

Output: (some description and page content fields were shortened to succinctly display results)

> Entering new AgentExecutor chain...

Thought: The human is asking about the location of the 2024 US Open Tennis Championship. I need to find out where it was held.
Action:
```
{
"action": "get_IBM_US_Open_context",
"action_input": "Where was the 2024 US Open Tennis Championship held?"
}
```
Observation[Document(metadata={'description': "IBM and the United States Tennis Association (USTA) announced several watsonx-powered fan features coming to the US Open digital platforms ahead of this year's tournament. These new and enhanced capabilities – a product of collaboration between IBM and the USTA digital team – aim to deliver a more informative and engaging experience for millions of tennis fans around the world.", 'language': 'en-us', 'source': 'https://newsroom.ibm.com/2024-08-15-ibm-and-the-usta-serve-up-new-and-enhanced-generative-ai-features-for-2024-us-open-digital-platforms', 'title': 'IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms'}, page_content="IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms\n-New Match Report summaries offer...")]

Action:
```
{
"action": "Final Answer",
"action_input": "The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York."
}
```
Observation

> Finished chain.

{'input': 'Where was the 2024 US Open Tennis Championship?',
'history': '',
'output': 'The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.'}

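Great! Given the user query, the agent used its available RAG tool to return the location of the 2024 US Open. We even get to see the exact document that the agent retrieved its information from. Now, let's try a slightly more complex query. This time, the query is about IBM's involvement in the 2024 US Open.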

agent_executor.invoke({"input": 'How did IBM use watsonx at the 2024 US Open Tennis Championship?'})

Output: (some description and page content fields were shortened to succinctly display results)

> Entering new AgentExecutor chain...
```
{
    "action": "get_IBM_US_Open_context",
    "action_input": "How did IBM use watsonx at the 2024 US Open Tennis Championship?"
}
```
Observation[Document(metadata={'description': 'To help the US Open stay on the cutting edge of customer experience, IBM Consulting built powerful generative AI models with watsonx.', 'language': 'en', 'source': 'https://www.ibm.com/cn-zh/case-studies/us-open', 'title': 'U.S. Open | IBM'}, page_content='The US Open is a sprawling, two-week tournament, with hundreds of matches played on 22 different courts. Keeping up with all the action is a challenge, both for tennis fans and the USTA editorial team covering the event...)]

Action:
```
{
    "action": "Final Answer",
    "action_input": "IBM used watsonx at the 2024 US Open Tennis Championship to create generative AI-powered features such as Match Reports, AI Commentary, and SlamTracker. These features enhance the digital experience for fans and scale the productivity of the USTA editorial team."
}
```
Observation
> Finished chain.

{'input': 'How did IBM use watsonx at the 2024 US Open Tennis Championship?',
'history': 'Human: Where was the 2024 US Open Tennis Championship?\nAI: The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.',
'output': 'IBM used watsonx at the 2024 US Open Tennis Championship to create generative AI-powered features such as Match Reports, AI Commentary, and SlamTracker. These features enhance the digital experience for fans and scale the productivity of the USTA editorial team.'}

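Again, the agent successfully retrieved the relevant information for the user query. Additionally, the history output shows that the agent updates its memory as it acquires new information and experiences new interactions.

Now, let's test whether the agent can decide that tool calling is not necessary to answer the user query. We can test this by asking the RAG agent a question that is not about the US Open.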

agent_executor.invoke({"input": 'What is the capital of France?'})

Output:

> Entering new AgentExecutor chain...

{
"action": "Final Answer",
"action_input": "The capital of France is Paris."
}

Observation
> Finished chain.

{'input': 'What is the capital of France?',
'history': 'Human: Where was the 2024 US Open Tennis Championship?\nAI: The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.\nHuman: How did IBM use watsonx at the 2024 US Open Tennis Championship?\nAI: IBM used watsonx at the 2024 US Open Tennis Championship to create generative AI-powered features such as Match Reports, AI Commentary, and SlamTracker. These features enhance the digital experience for fans and scale the productivity of the USTA editorial team.',
'output': 'The capital of France is Paris.'}

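As seen in the AgentExecutor chain, the agent recognized that it already had enough information to answer the question without using its tools.

Summary

In this tutorial, you created a RAG agent by using LangChain in Python with watsonx. The LLM you worked with was the IBM Granite-3.0-8B-Instruct model. The sample output is important because it shows the significance of this generative AI advancement. The AI agent successfully retrieved relevant information through the get_IBM_US_Open_context tool, updated its memory with each interaction and produced appropriate responses. The agent was also able to determine whether tool calling was appropriate for each specific task. When the agent had the information needed to answer the input query, it did not use any tools for question answering.

For more AI agent content, check out our AI agent tutorial, which returns today's Astronomy Picture of the Day by using NASA's open-source API and a date tool.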
