LLM agent orchestration: A step-by-step guide

Author

AI Advocate | Technical Content Author

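LLM agent orchestration is the process of managing and coordinating the interactions between a large language model (LLM) and various tools, APIs or processes to perform complex tasks within AI systems. It involves structuring workflows in which an AI agent acts as the central decision-maker or reasoning engine and orchestrates actions based on inputs, context and outputs from external systems. With an orchestration framework, LLMs can integrate with APIs, databases and other AI applications, enabling functions such as chatbots and automation tools. Open source frameworks further increase the adaptability of these systems, making LLMs more effective in real-world scenarios.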

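Many people confuse LLM orchestration with LLM agent orchestration. The following figure highlights the key differences:

Figure: Key differences between LLM orchestration and LLM agent orchestration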

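In this tutorial, you will learn how to build an autonomous agent powered by a large language model (LLM) by using IBM Granite models and LangChain. We explore how the agent uses components such as memory, planning and action to carry out intelligent tasks. You will also implement a practical system that processes the text of a book, answers queries dynamically and evaluates its own performance with accuracy metrics such as BLEU, precision, recall and F1 score.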

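Framework for LLM-based autonomous agents

The framework shown in Figure 1 presents a holistic design for autonomous agents based on large language models (LLMs), emphasizing the interplay between four key components: profile, memory, planning and action. Each component represents a critical stage in building an autonomous agent that can reason, make decisions and interact with a dynamic environment.¹

Figure 1. Framework for LLM-based autonomous agents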

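1. Profile: Defining the agent's identity

The profile gives the agent a distinct identity by embedding information such as demographics, personality traits and social context. This ensures that the agent can interact in a personalized way. Profiles can be crafted manually, generated by AI models such as IBM Granite models or OpenAI's GPT (generative pretrained transformer), or aligned with specific datasets to meet task requirements. With prompt engineering, profiles can be refined dynamically to optimize responses. In multiagent orchestration, profiles also help define roles and behaviors, supporting coordination across AI algorithms and decision-making systems.

2. Memory: Storing and using context

Memory helps the agent retain and retrieve information from past interactions, enabling contextual responses. Memory can be unified (all data in one place) or hybrid (structured and unstructured). Memory operations include reading, writing and reflection, which allow the agent to learn from experience and produce consistent, grounded outputs. Well-structured memory improves multiagent orchestration by ensuring that different types of agents, including specialized agents designed for specific tasks, can share and retrieve relevant data efficiently. In frameworks such as AutoGen and CrewAI, memory plays a crucial role in maintaining continuity within a collaborative agent ecosystem, supporting coordination and task execution.

3. Planning: Devising a strategy

The planning component lets the agent devise strategies for reaching its goals. It can follow predefined steps or adapt dynamically based on feedback from the environment, from humans or from the LLM itself. Integrating AI algorithms and knowledge bases can optimize planning to improve reasoning efficiency and problem-solving accuracy. In LLM applications, planning plays a crucial role in keeping natural language understanding and decision-making aligned with the agent's goals. Retrieval-augmented techniques further strengthen the agent's ability to access relevant information dynamically, which improves response accuracy. This flexibility keeps the agent effective in changing scenarios, especially in multiagent orchestration, where agents coordinate their plans to achieve complex goals while remaining scalable across large and diverse tasks.

4. Action: Executing decisions

Actions are how the agent interacts with the world, whether by completing tasks, gathering information or communicating. Actions draw on memory and planning to guide execution, use tools when needed and adjust the agent's internal state based on results so that it keeps improving. Optimizing how actions are executed matters for efficiency, especially when integrating GPT-based reasoning models and generative AI techniques for real-time decision-making.

By combining these components, the framework turns an LLM into an adaptive agent that can reason, learn and perform tasks autonomously. This modular design makes it a good fit for applications such as customer service, research assistance and creative problem-solving.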
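
The interplay of the four components can be sketched in a few lines of plain Python. This is an illustrative toy, not the tutorial's agent: the names Profile, ToyAgent and facts are hypothetical, retrieval is a simple substring lookup, and no LLM is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Identity of the agent: a name and a role description."""
    name: str
    role: str


@dataclass
class ToyAgent:
    """Hypothetical agent wiring profile, memory, planning and action together."""
    profile: Profile
    memory: list = field(default_factory=list)  # past (query, result) pairs

    def plan(self, query: str) -> list:
        # Single-path plan: always retrieve context first, then answer.
        return ["retrieve", "answer"]

    def act(self, query: str, knowledge: dict) -> str:
        context = ""
        result = ""
        for step in self.plan(query):
            if step == "retrieve":
                # Use the first knowledge entry whose key occurs in the query.
                context = next(
                    (text for key, text in knowledge.items() if key in query.lower()),
                    "no context found",
                )
            elif step == "answer":
                result = f"[{self.profile.role}] {context}"
        self.memory.append((query, result))  # write the interaction to memory
        return result


agent = ToyAgent(Profile(name="helper", role="knowledge assistant"))
facts = {"watson": "Dr. Watson narrates the stories."}
print(agent.act("Who is Watson?", facts))
```

The rest of this tutorial replaces each toy piece with a real one: a FAISS vector store for memory and retrieval, and a Granite model for answer generation.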

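Use case: Building a queryable knowledge agent

This tutorial demonstrates how to create a queryable knowledge agent that processes large text documents (such as books) and answers user queries accurately. The agent is built with IBM Granite models and LangChain, following the principles of the framework for LLM-based autonomous agents. Each component of the framework maps directly onto a part of the agent's workflow.

Let's look at how the framework applies to this use case.

Application of the framework

Profile: The agent is built on a "knowledge assistant" profile that focuses on summarization, question answering and reasoning tasks. Its context is tailored to a specific document, here The Adventures of Sherlock Holmes.

Memory: The agent uses hybrid memory by embedding chunks of the book into a FAISS vector store. This lets it retrieve relevant context dynamically at query time. Memory operations such as reading (retrieval) and writing (updating embeddings) allow the agent to adapt to new queries over time.

Planning: Query resolution uses single-path reasoning. The agent retrieves relevant text chunks, generates an answer with IBM's Granite LLM and evaluates the accuracy of the output. Planning without feedback keeps the system simple, and its modularity allows feedback loops to be added in future iterations.

Action: The agent resolves queries by combining memory retrieval with LLM processing. Its actions include generating answers, computing accuracy metrics (BLEU, precision, recall and F1 score) and visualizing the results for users to interpret. These outputs reflect the agent's ability to act on its reasoning and planning.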

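Prerequisites

You need an IBM Cloud account to create a watsonx.ai project.

Steps

Step 1: Set up your environment

While you can choose from several tools, this tutorial walks you through setting up an IBM account to use a Jupyter Notebook.

1. Log in to watsonx.ai with your IBM Cloud account.
2. Create a watsonx.ai project. You can get the project ID from within the project: click the Manage tab, then copy the project ID from the Details section of the General page. You need this ID for this tutorial.
3. Create a Jupyter Notebook.

This step opens a notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite tutorials, check out the IBM Granite Community. This tutorial is also available on GitHub.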

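Step 2: Set up the watsonx.ai Runtime service and API key

1. Create a watsonx.ai Runtime service instance (select the Lite plan, which is a free instance).
2. Generate an application programming interface (API) key.
3. Associate the watsonx.ai Runtime service with the project that you created in watsonx.ai.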

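Step 3: Install the packages

To work with the LangChain framework and integrate IBM WatsonxLLM, we need to install some essential libraries. Let's start by installing the required packages:

Note: If you are using an old version of pip, you can upgrade it with the command pip install --upgrade pip, because the latest packages might not install cleanly with older versions. If you are already on the latest version of pip, you can skip this command.

!pip install --upgrade pip
!pip install langchain faiss-cpu pandas sentence-transformers
!pip install langchain-ibm

In the preceding code cell:

langchain is the core framework for building applications with language models.
faiss-cpu provides efficient similarity search and is used to create and query the vector index.
pandas is used for data manipulation and analysis.
sentence-transformers generates the embeddings used for semantic search.
langchain-ibm integrates IBM WatsonxLLM (in this tutorial, granite-3-8b-instruct) with LangChain.

This step ensures that your environment is ready for the tasks ahead.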

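Step 4: Import the required libraries

Now that the necessary libraries are installed, let's import the modules needed for this tutorial:

import os
from langchain_ibm import WatsonxLLM
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
import pandas as pd
import getpass

In the preceding code cell:

os provides a way to interact with the operating system (for example, to access environment variables).
langchain_ibm.WatsonxLLM lets us use the IBM Granite LLM within the LangChain framework.
langchain.embeddings.HuggingFaceEmbeddings generates text embeddings with a HuggingFace model, which is essential for semantic search.
langchain.vectorstores.FAISS is a wrapper around the FAISS library for efficient vector storage and similarity search, which lets us build and query a vector index.
RecursiveCharacterTextSplitter splits large blocks of text into smaller chunks, which is essential for processing documents efficiently.
pandas is a data analysis and manipulation library used for working with tabular data.
getpass is a secure way to capture sensitive input (such as an API key) without displaying it on screen.

This step sets up all the tools and modules we need to process text, create embeddings, store them in a vector database and interact with IBM's WatsonxLLM. Depending on the LangChain version installed, the langchain.embeddings and langchain.vectorstores imports might emit deprecation warnings or need to be imported from the langchain_community package instead.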

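Step 5: Set up credentials

This code sets up the credentials for accessing the IBM Watson Machine Learning (WML) API and makes sure that the project ID is configured correctly.

The credentials dictionary is created with the WML service URL and the API key. The API key is collected securely with getpass.getpass to avoid exposing sensitive information.
The code tries to read PROJECT_ID from the environment variables by using os.environ. If PROJECT_ID is not found, the user is prompted to enter it manually through input.

# Set up credentials
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",  # Replace with the correct region if needed
    "apikey": getpass.getpass("Please enter your WML API key (hit enter): "),
}

# Set up project_id
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")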
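
The "environment variable first, prompt as fallback" pattern used for PROJECT_ID can be pulled into a small helper. The following is a minimal sketch using only the standard library; the helper name read_setting and the variable name WATSONX_APIKEY are hypothetical and are not part of the tutorial or of any IBM library.

```python
import os


def read_setting(name, env=None, fallback=None):
    """Return env[name] if it is set and non-empty, otherwise call fallback()."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if value:
        return value
    if fallback is None:
        raise KeyError(f"{name} is not set and no fallback was provided")
    return fallback()


# Example with an explicit mapping instead of the real environment.
fake_env = {"PROJECT_ID": "my-project-id"}
print(read_setting("PROJECT_ID", env=fake_env))
print(read_setting("WATSONX_APIKEY", env=fake_env, fallback=lambda: "typed-by-user"))
```

In a notebook, the fallback could be something like lambda: getpass.getpass("API key: "), so that secrets are only requested interactively when the environment does not already provide them.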

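Step 6: Initialize the large language model

The following code initializes IBM WatsonxLLM for use in the application:

The code creates a WatsonxLLM instance that uses the ibm/granite-3-8b-instruct model, which is designed for instruction-based generative AI tasks.
The url, apikey and project_id values from the credentials set up earlier are passed in to authenticate and connect to the IBM WatsonxLLM service.
The max_new_tokens parameter limits the number of tokens the model generates per response (150 tokens in this case). Longer answers are cut off at this limit.

This step prepares WatsonxLLM to generate responses in the workflow.

# Initialize the IBM Granite LLM
llm = WatsonxLLM(
    model_id="ibm/granite-3-8b-instruct",
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=project_id,
    params={
        "max_new_tokens": 150
    },
)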

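Step 7: Define a function to extract text from a file

To process the text of a document, we need a function that reads and extracts its content. The following function handles plain text files:

def extract_text_from_txt(file_path):
    """Extracts text from a plain text file."""
    with open(file_path, "r", encoding="utf-8") as file:
        text = file.read()
    return text

The extract_text_from_txt function reads and returns the content of a plain text file. It takes the file path as an argument and opens the file in read mode with UTF-8 encoding, so that special characters are handled correctly.

The entire file content is read into a variable named text, which is then returned. The function prepares the input data: it extracts the raw text from the document so that it is ready for later operations such as chunking, embedding and querying.

With this function, we can load the input document (The Adventures of Sherlock Holmes) and pass its content on to the chunking and embedding steps.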

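Step 8: Split the text into chunks

To process and index a large body of text efficiently, we need to divide it into smaller, more manageable chunks. The following function handles this task:

def split_text_into_chunks(text, chunk_size=500, chunk_overlap=50):
    """Splits text into smaller chunks for indexing."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return splitter.split_text(text)

The split_text_into_chunks function takes the raw text as input along with two optional parameters: chunk_size, which defines the maximum size of each chunk (500 characters by default), and chunk_overlap, which specifies the number of characters shared between consecutive chunks (50 by default).

The overlap preserves continuity of context across chunk boundaries. The function relies on LangChain's RecursiveCharacterTextSplitter, which tries to split on natural separators such as paragraph and line breaks before falling back to smaller units. It returns a list of text chunks that are ready for embedding and indexing.

This matters when working with large documents, because language models have token limits and cannot process very long texts directly.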
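
To make the roles of chunk_size and chunk_overlap concrete, here is a simplified fixed-window splitter written with the standard library only. It is not the algorithm used by RecursiveCharacterTextSplitter (which prefers to break on separators); the function name sliding_window_chunks is hypothetical and the sketch only illustrates how size and overlap interact.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary is still fully visible in at least one chunk when the overlap is large enough.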

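Step 9: Create the vector index

To enable efficient semantic search, we need to convert the text chunks into vector embeddings and store them in a searchable index. This step uses FAISS and HuggingFace embeddings to create the vector index, which is the basis for retrieving relevant information for a query.

def create_vector_index(chunks):
    """Creates a FAISS vector index from text chunks."""
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vector_store = FAISS.from_texts(chunks, embeddings)
    return vector_store

The create_vector_index function builds a FAISS vector index from the text chunks generated in the previous step. It uses embeddings to map each chunk into a high-dimensional vector space, which makes semantic search possible.

It first initializes a HuggingFaceEmbeddings model, sentence-transformers/all-MiniLM-L6-v2, to generate vector embeddings for the text chunks. These embeddings capture the meaning of each chunk.

The function then uses FAISS to build a vector store by indexing these embeddings, so that similarity searches can be run efficiently later.

The resulting vector store is returned and is used to find relevant chunks for user queries. It is the backbone of the agent's search and retrieval process.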
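
The idea behind similarity search can be shown without any model or index. The sketch below is a toy stand-in: it replaces neural embeddings with word-count vectors and FAISS with a brute-force cosine-similarity ranking. The function names embed, cosine and top_k are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a sparse word-count vector (stand-in for a neural model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=3):
    """Return the k chunks most similar to the query (brute force, no index)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "Holmes examined the letter with his lens.",
    "Watson wrote down the details of the case.",
    "The King of Bohemia asked Holmes for help.",
]
print(top_k("Who asked Holmes for help?", chunks, k=1))
```

Real embeddings place texts with similar meaning close together even when they share no words, and FAISS avoids comparing the query against every chunk one by one, but the ranking principle is the same.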

Step 10: Query the vector index with Granite

This step queries the vector index to retrieve relevant information and uses IBM's Granite LLM to generate a refined response. By combining similarity search with LLM reasoning, the function resolves queries dynamically.

def query_index_with_granite_dynamic(vector_store, query, llm):
    """Searches the vector index, uses Granite to refine the response, and returns all components."""
    # Perform similarity search
    print("\n> Entering new AgentExecutor chain...")
    thought = f"The query '{query}' requires context from the book to provide an accurate response."
    print(f" Thought: {thought}")
    action = "Search FAISS Vector Store"
    print(f" Action: {action}")
    action_input = query
    print(f" Action Input: \"{action_input}\"")
    # Retrieve context
    results = vector_store.similarity_search(query, k=3)
    observation = "\n".join([result.page_content for result in results])
    print(f" Observation:\n{observation}\n")
    # Generate response with Granite
    prompt = f"Context:\n{observation}\n\nQuestion: {query}\nAnswer:"
    print(" Thought: Combining retrieved context with the query to generate a detailed answer.")
    # invoke() replaces the deprecated llm(prompt) call style
    final_answer = llm.invoke(prompt)
    print(f" Final Answer: {final_answer.strip()}")
    print("\n> Finished chain.")
    # Return all components as a dictionary
    return {
        "Thought": thought,
        "Action": action,
        "Action Input": action_input,
        "Observation": observation,
        "Final Answer": final_answer.strip(),
    }

The query_index_with_granite_dynamic function takes three inputs: the vector store (vector_store), the user query (query) and the Granite LLM instance (llm).

The function first runs a similarity search against the vector index to retrieve the most relevant text chunks (the top three, k=3). These chunks, referred to as the observation, are joined into a single block of context.

It then builds a prompt by combining the retrieved context with the query. The prompt is passed to the Granite LLM, which generates a response grounded in that context (final_answer).

Throughout the process, intermediate steps such as the agent's thought, action and action input are printed for transparency.

Finally, the function returns a dictionary with all components: the thought, the action taken, the action input, the retrieved observation and the final answer.

This step is what turns raw data retrieval into actionable insight by using the reasoning capabilities of the LLM.
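
The retrieve-then-generate flow described above can be tried out without a vector store or a model by passing in stand-in functions. In this sketch, build_prompt mirrors the prompt format used in this step, while answer_query, fake_retrieve and fake_generate are hypothetical names introduced only for illustration.

```python
def build_prompt(context_chunks, query):
    """Join retrieved chunks and format them together with the question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def answer_query(retrieve, generate, query, k=3):
    """retrieve(query, k) returns a list of strings; generate(prompt) returns a string."""
    chunks = retrieve(query, k)
    prompt = build_prompt(chunks, query)
    return {"Observation": "\n".join(chunks), "Final Answer": generate(prompt).strip()}


# Stubs standing in for the vector store and the LLM.
def fake_retrieve(query, k):
    return ["Holmes lived at 221B Baker Street."][:k]


def fake_generate(prompt):
    return " (stub answer based on %d prompt characters) " % len(prompt)


result = answer_query(fake_retrieve, fake_generate, "Where did Holmes live?")
print(result["Final Answer"])
```

Separating retrieval and generation behind plain callables like this makes the orchestration logic easy to test and leaves room to swap in a different retriever or model later.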

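Step 11: Generate a DataFrame for the query results

This step processes multiple queries, retrieves the relevant information for each and saves the results in a structured format for analysis. The function combines querying, data structuring and export.

def dynamic_output_to_dataframe(vector_store, queries, llm, csv_filename="output.csv"):
    """Generates a DataFrame dynamically for multiple queries and saves it as a CSV file."""
    # List to store all query outputs
    output_data = []
    # Process each query
    for query in queries:
        # Capture the output dynamically
        output = query_index_with_granite_dynamic(vector_store, query, llm)
        output_data.append(output)
    # Convert the list of dictionaries into a DataFrame
    df = pd.DataFrame(output_data)
    # Display the DataFrame
    print("\nFinal DataFrame:")
    print(df)
    # Save the DataFrame as a CSV file
    df.to_csv(csv_filename, index=False)
    print(f"\nOutput saved to {csv_filename}")

The dynamic_output_to_dataframe function takes four inputs: the vector store (vector_store), a list of queries (queries), the Granite LLM instance (llm) and an optional CSV file name (csv_filename, which defaults to output.csv).

For each query, it calls the query_index_with_granite_dynamic function to retrieve relevant context and generate a response with the LLM. The results, including intermediate components such as the thought, observation and final answer, are collected in a list.

After all queries are processed, the list of results is converted into a pandas DataFrame. This tabular format makes the query results easy to analyze and visualize. The DataFrame is printed for review and saved as a CSV file for later use.

This step organizes the output in a user-friendly format that supports downstream tasks such as accuracy evaluation and visualization.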

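Step 12: Run the main workflow

This step combines all the previous steps into a single workflow that processes a text file, answers user queries and saves the results in a structured format. The main_workflow function acts as the central orchestrator of this tutorial.

def main_workflow():
    # Replace with your text file
    file_path = "aosh.txt"
    # Extract text from the text file
    text = extract_text_from_txt(file_path)
    # Split the text into chunks
    chunks = split_text_into_chunks(text)
    # Create a vector index
    vector_store = create_vector_index(chunks)
    # Define queries
    queries = [
        "What is the plot of 'A Scandal in Bohemia'?",
        "Who is Dr. Watson, and what role does he play in the stories?",
        "Describe the relationship between Sherlock Holmes and Irene Adler.",
        "What methods does Sherlock Holmes use to solve cases?",
    ]
    # Generate and save output dynamically
    dynamic_output_to_dataframe(vector_store, queries, llm)

Let's walk through how this workflow runs:

Input text file: The file_path variable specifies the text file to process. In this tutorial, the input file is "aosh.txt", which contains the text of The Adventures of Sherlock Holmes.

Text extraction: The extract_text_from_txt function reads and extracts the content of the input text file.

Text chunking: The split_text_into_chunks function divides the extracted text into smaller chunks for embedding and indexing.

Vector index creation: The create_vector_index function converts the text chunks into embeddings and stores them in a FAISS vector index.

Query definition: A set of sample queries is provided, each intended to retrieve specific information from the text. These are the questions the agent answers.

Query processing: The dynamic_output_to_dataframe function processes the queries by using the vector index and IBM's Granite LLM. It retrieves relevant context, generates answers and saves the results as a CSV file for further analysis.

This step brings all the components of the tutorial together into one cohesive workflow. It automates the process from text extraction to query resolution, so that you can test the agent's capabilities and inspect the results in a structured format.

To run the workflow, call the main_workflow() function and the entire pipeline executes end to end.

# Run the workflow
main_workflow()

Output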

> Entering new AgentExecutor chain...
Thought: The query 'What is the plot of 'A Scandal in Bohemia'?' requires context from the book to provide an accurate response.
Action: Search FAISS Vector Store
Action Input: "What is the plot of 'A Scandal in Bohemia'?"
Observation:
I. A SCANDAL IN BOHEMIA

I.
“I was aware of it,” said Holmes dryly.

“The circumstances are of great delicacy, and every precaution has to
be taken to quench what might grow to be an immense scandal and
seriously compromise one of the reigning families of Europe. To speak
plainly, the matter implicates the great House of Ormstein, hereditary
kings of Bohemia.”

“I was also aware of that,” murmured Holmes, settling himself down in
his armchair and closing his eyes.
Contents

I. A Scandal in Bohemia
II. The Red-Headed League
III. A Case of Identity
IV. The Boscombe Valley Mystery
V. The Five Orange Pips
VI. The Man with the Twisted Lip
VII. The Adventure of the Blue Carbuncle
VIII. The Adventure of the Speckled Band
IX. The Adventure of the Engineer’s Thumb
X. The Adventure of the Noble Bachelor
XI. The Adventure of the Beryl Coronet
XII. The Adventure of the Copper Beeches

Thought: Combining retrieved context with the query to generate a detailed answer.
Final Answer: Step 1: Identify the main characters and their roles.
- Sherlock Holmes: The detective who is approached by a client with a delicate matter.
- An unnamed client: A representative of the great House of Ormstein, hereditary kings of Bohemia, who seeks Holmes' help to prevent a potential scandal.

Step 2: Understand the main issue or conflict.
- The main issue is a delicate matter that, if exposed, could lead to a massive scandal and compromise one of the reigning families of Europe, specifically the House of Ormstein.

Step 3: Ident

> Finished chain.

> Entering new AgentExecutor chain...
Thought: The query 'Who is Dr. Watson, and what role does he play in the stories?' requires context from the book to provide an accurate response.
Action: Search FAISS Vector Store
Action Input: "Who is Dr. Watson, and what role does he play in the stories?"
Observation:
“Sarasate plays at the St. James’s Hall this afternoon,” he remarked.
“What do you think, Watson? Could your patients spare you for a few
hours?”

“I have nothing to do to-day. My practice is never very absorbing.”
“Try the settee,” said Holmes, relapsing into his armchair and putting
his fingertips together, as was his custom when in judicial moods. “I
know, my dear Watson, that you share my love of all that is bizarre and
outside the conventions and humdrum routine of everyday life. You have
shown your relish for it by the enthusiasm which has prompted you to
chronicle, and, if you will excuse my saying so, somewhat to embellish
so many of my own little adventures.”
“My God! It’s Watson,” said he. He was in a pitiable state of reaction,
with every nerve in a twitter. “I say, Watson, what o’clock is it?”

“Nearly eleven.”

“Of what day?”

“Of Friday, June 19th.”

“Good heavens! I thought it was Wednesday. It is Wednesday. What d’you
want to frighten a chap for?” He sank his face onto his arms and began
to sob in a high treble key.

“I tell you that it is Friday, man. Your wife has been waiting this two
days for you. You should be ashamed of yourself!”

Thought: Combining retrieved context with the query to generate a detailed answer.
Final Answer: Dr. Watson is a character in the Sherlock Holmes stories, written by Sir Arthur Conan Doyle. He is a former military surgeon who becomes the narrator and chronicler of Holmes' adventures. Watson is a close friend and confidant of Holmes, often accompanying him on cases and providing a more human perspective to the stories. He is known for his enthusiasm for the bizarre and unconventional, as well as his skill in recording the details of their investigations. Watson's role is crucial in presenting the narrative and offering insights into Holmes' character and methods.

> Finished chain.

Final DataFrame:
Thought \
0 The query 'What is the plot of 'A Scandal in B...
1 The query 'Who is Dr. Watson, and what role do...
2 The query 'Describe the relationship between S...
3 The query 'What methods does Sherlock Holmes u...

Action \
0 Search FAISS Vector Store
1 Search FAISS Vector Store
2 Search FAISS Vector Store
3 Search FAISS Vector Store

Action Input \
0 What is the plot of 'A Scandal in Bohemia'?
1 Who is Dr. Watson, and what role does he play ...
2 Describe the relationship between Sherlock Hol...
3 What methods does Sherlock Holmes use to solve...

Observation \
0 I. A SCANDAL IN BOHEMIA\n\n\nI.\n“I was aware ...
1 “Sarasate plays at the St. James’s Hall this a...
2 “You have really got it!” he cried, grasping S...
3 to learn of the case was told me by Sherlock H...

Final Answer
0 Step 1: Identify the main characters and their...
1 Dr. Watson is a character in the Sherlock Holm...
2 Sherlock Holmes and Irene Adler have a profess...
3 Sherlock Holmes uses a variety of methods to s...

Output saved to output.csv

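After running the main_workflow() function, we have processed a text file (aosh.txt) and executed four user-defined queries about The Adventures of Sherlock Holmes. The output shows in detail how each query was handled:

Thought describes the reasoning behind the query and the context needed to answer it accurately.
Action indicates the step that was taken, which in this case is a similarity search against the FAISS vector index.
Action Input is the specific query being processed in that iteration.
Observation contains the text chunks retrieved from the vector index that are relevant to the query.
Final Answer is the detailed response that IBM's Granite LLM generated from the retrieved context.

In addition, the results for all queries are structured into a DataFrame and saved as output.csv. The file contains all the components above and can be analyzed further or shared.

In this process, we combined text retrieval with LLM reasoning to answer complex queries about the book. The agent retrieves relevant information dynamically, uses that context to generate answers and organizes the output in a structured format for later analysis.

Visualizing the results

With the output.csv file created, we now visualize the query results and their associated accuracy metrics to get a deeper view of the agent's performance.

In the following code cell, we load the saved query results from output.csv into a pandas DataFrame to prepare for visualization and analysis. The DataFrame lets us manipulate and explore the data in a structured format.

# Load the output.csv file into a DataFrame
df = pd.read_csv("output.csv")
print(df.head()) # Display the first few rows

Output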

Thought  \
0  The query 'What is the plot of 'A Scandal in B...   
1  The query 'Who is Dr. Watson, and what role do...   
2  The query 'Describe the relationship between S...   
3  The query 'What methods does Sherlock Holmes u...   

                      Action  \
0  Search FAISS Vector Store   
1  Search FAISS Vector Store   
2  Search FAISS Vector Store   
3  Search FAISS Vector Store   

                                        Action Input  \
0        What is the plot of 'A Scandal in Bohemia'?   
1  Who is Dr. Watson, and what role does he play ...   
2  Describe the relationship between Sherlock Hol...   
3  What methods does Sherlock Holmes use to solve...   

                                         Observation  \
0  I. A SCANDAL IN BOHEMIA\n\n\nI.\n“I was aware ...   
1  “Sarasate plays at the St. James’s Hall this a...   
2  “You have really got it!” he cried, grasping S...   
3  to learn of the case was told me by Sherlock H...   

                                        Final Answer  
0  Step 1: Identify the main characters and their...  
1  Dr. Watson is a character in the Sherlock Holm...  
2  Sherlock Holmes and Irene Adler have a profess...  
3  Sherlock Holmes uses a variety of methods to s...

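In this output, the DataFrame includes the key components for each query: Thought, Action, Action Input, Observation and Final Answer. Displaying the first few rows with df.head() confirms that the data is formatted correctly and ready for the next stage: creating meaningful visualizations.

Import visualization libraries

To create visualizations of the query results, we import the necessary libraries:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

matplotlib.pyplot is a widely used library for creating static, interactive and animated visualizations in Python. It is used here to produce the bar chart, the pie chart and the other plots.

wordcloud is a library for creating word clouds, which visually highlight the most frequent words in the data. This helps summarize and explore the context retrieved from the text.

Important: If you encounter an error saying that WordCloud cannot be found, you can resolve it by installing the library with the command pip install wordcloud.

Visualize observation and answer lengths

This code creates a horizontal bar chart that compares the length of the observation (retrieved context) with the length of the answer (generated response) for each query. The chart shows how much context the agent used relative to the length of the answer it produced.

def visualize_lengths_with_queries(df):
    """Visualizes the lengths of observations and answers with queries on the y-axis."""
    df["Observation Length"] = df["Observation"].apply(len)
    df["Answer Length"] = df["Final Answer"].apply(len)
    # Extract relevant data
    queries = df["Action Input"]
    observation_lengths = df["Observation Length"]
    answer_lengths = df["Answer Length"]
    # Create a horizontal bar chart
    plt.figure(figsize=(10, 6))
    bar_width = 0.4
    y_pos = range(len(queries))
    plt.barh(y_pos, observation_lengths, bar_width, label="Observation Length", color="skyblue", edgecolor="black")
    plt.barh([y + bar_width for y in y_pos], answer_lengths, bar_width, label="Answer Length", color="lightgreen", edgecolor="black")
    plt.yticks([y + bar_width / 2 for y in y_pos], queries, fontsize=10)
    plt.xlabel("Length (characters)", fontsize=14)
    plt.ylabel("Queries", fontsize=14)
    plt.title("Observation and Answer Lengths by Query", fontsize=16)
    plt.legend(fontsize=12)
    plt.tight_layout()
    plt.show()

# Call the visualization function
visualize_lengths_with_queries(df)

The visualize_lengths_with_queries function computes the character lengths of the observations and the answers and adds them to the DataFrame as new columns (Observation Length and Answer Length). It then uses Matplotlib to plot these lengths for each query, with the queries shown on the y-axis for readability.

The bar chart uses different colors to distinguish observation length from answer length and includes axis labels, a legend and a title.

This visualization helps analyze the balance between the size of the retrieved context and the level of detail in the generated response, which gives some insight into how the agent processes and responds to queries.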

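Visualize the proportion of text used in observations

This step shows how much of the text the agent handled consists of observations (retrieved context) compared with the rest. Here, the total is the combined length of all observations and all final answers, so the remaining share corresponds to the generated answers. A pie chart gives an intuitive view of the proportions.

def visualize_text_proportion(df):
    """Visualizes the proportion of text used in observations."""
    total_text_length = sum(df["Observation"].apply(len)) + sum(df["Final Answer"].apply(len))
    observation_text_length = sum(df["Observation"].apply(len))
    sizes = [observation_text_length, total_text_length - observation_text_length]
    labels = ["Observation Text", "Remaining Text"]
    colors = ["#66b3ff", "#99ff99"]
    plt.figure(figsize=(4, 4))
    plt.pie(sizes, labels=labels, colors=colors, autopct="%1.1f%%", startangle=140)
    plt.title("Proportion of Text Used in Observations", fontsize=16)
    plt.show()

# Call the visualization function
visualize_text_proportion(df)

The visualize_text_proportion function creates a pie chart that shows the share of observation text (retrieved context) in the total text. It computes the total text length by summing the character lengths of all observations and answers, then determines the part contributed by the observations alone.

The data is displayed as a pie chart with the labels "Observation Text" and "Remaining Text" in distinct colors. The chart includes percentage values so that the proportions are easy to read.

This visualization gives a high-level view of how much text the agent used as context while processing the queries, relative to how much it generated.

Generate word clouds for observations and final answers

This code generates two word clouds that visualize the most frequent words in the Observation and Final Answer texts.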

def generate_wordclouds_side_by_side(df):
      """Generates and displays word clouds for Observations and Final Answers side by side."""
      # Combine text for Observations and Final Answers
      observation_text = " ".join(df["Observation"])
      final_answer_text = " ".join(df["Final Answer"])
      # Create word clouds
      observation_wordcloud = WordCloud(width=800, height=400, background_color="white").generate(observation_text)
      final_answer_wordcloud = WordCloud(width=800, height=400, background_color="white").generate(final_answer_text)
      # Create a side-by-side visualization
      plt.figure(figsize=(16, 8))
      # Plot the Observation word cloud
      plt.subplot(1, 2, 1)
      plt.imshow(observation_wordcloud, interpolation="bilinear")
      plt.axis("off")
      plt.title("Word Cloud of Observations", fontsize=16)
      # Plot the Final Answer word cloud
      plt.subplot(1, 2, 2)
      plt.imshow(final_answer_wordcloud, interpolation="bilinear")
      plt.axis("off")
      plt.title("Word Cloud of Final Answers", fontsize=16)
      plt.tight_layout()
      plt.show()

# Call the function to generate and display the word clouds
generate_wordclouds_side_by_side(df)

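This code generates two word clouds that represent the Observation and Final Answer texts and displays them side by side for comparison. The Observation and Final Answer texts are first joined into two separate strings with " ".join(), which merges all rows of the respective column. The WordCloud library is then used to generate a word cloud for each text with the specified configuration.

The side-by-side layout uses subplots: the first subplot shows the word cloud for Observation and the second shows the one for Final Answer. The tight_layout() function keeps the spacing between the plots tidy. These word clouds let us inspect the agent's behavior visually by highlighting the key terms retrieved from the context (Observation) and the terms emphasized in the responses (Final Answer).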
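
A word cloud is essentially a picture of word frequencies, so the same information can be checked numerically. The sketch below counts words with the standard library; the function name top_words, the tiny STOP_WORDS set and the sample sentences are illustrative assumptions, and the wordcloud library applies its own tokenization and stop-word list, so its results can differ.

```python
import re
from collections import Counter

# Small illustrative stop-word list (the wordcloud library ships its own).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it"}


def top_words(texts, n=3):
    """Return the n most frequent non-stop-words across all texts."""
    words = re.findall(r"[a-z']+", " ".join(texts).lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


answers = [
    "Holmes solves the case.",
    "Watson admires Holmes.",
    "The case puzzles Watson and Holmes.",
]
print(top_words(answers, n=1))
# [('holmes', 3)]
```

Applied to the tutorial's DataFrame, a call such as top_words(df["Final Answer"], n=10) would list the terms that appear largest in the Final Answer word cloud, subject to the stop-word differences noted above.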

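Testing the agent's accuracy

In this section, we evaluate the agent's performance with several accuracy metrics: keyword matching, BLEU score, precision and recall, and F1 score. Together, these metrics indicate how accurate and relevant the agent's responses to user queries are.

Import the required libraries

Before starting the tests, we import the libraries needed for the accuracy evaluation.

from sklearn.feature_extraction.text import CountVectorizer
from nltk.translate.bleu_score import sentence_bleu
from sklearn.metrics import precision_score, recall_score

These libraries include tools for keyword matching, BLEU score calculation, and precision and recall evaluation. Make sure that they (scikit-learn and nltk) are installed in your environment to avoid import errors.

Keyword matching accuracy

This test evaluates how many of the keywords in the query also appear in the generated answer. It uses the tokenizer of CountVectorizer to extract keywords from the query and the answer. The function computes the proportion of query keywords that occur in the generated answer and marks the response as accurate if that proportion reaches the threshold (0.5). The results are added to the DataFrame in the Keyword Match Score and Is Accurate columns.

def keyword_matching_accuracy(df):
    """Checks if key phrases from the query are present in the final answer."""
    vectorizer = CountVectorizer(stop_words='english')
    # Note: build_tokenizer() only splits text into tokens; it does not apply the stop-word list.
    tokenize = vectorizer.build_tokenizer()

    def check_keywords(query, answer):
        query_keywords = set(tokenize(query.lower()))
        answer_keywords = set(tokenize(answer.lower()))
        common_keywords = query_keywords & answer_keywords
        # Proportion of matched keywords (0 if the query yields no tokens)
        return len(common_keywords) / len(query_keywords) if query_keywords else 0.0

    df["Keyword Match Score"] = df.apply(lambda row: check_keywords(row["Action Input"], row["Final Answer"]), axis=1)
    df["Is Accurate"] = df["Keyword Match Score"] >= 0.5  # Set a threshold for accuracy
    return df

# Apply keyword matching
df = keyword_matching_accuracy(df)
df.to_csv("output_with_accuracy.csv", index=False)
df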
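
A small worked example shows what the keyword match score measures. This sketch reimplements the score with the standard library, using the same token pattern as CountVectorizer's default tokenizer (words of two or more characters); the function name keyword_match_score and the sample query and answer are illustrative.

```python
import re


def tokenize(text):
    """Words of two or more characters, as in CountVectorizer's default token pattern."""
    return set(re.findall(r"\b\w\w+\b", text.lower()))


def keyword_match_score(query, answer):
    """Fraction of query tokens that also occur in the answer."""
    q, a = tokenize(query), tokenize(answer)
    return len(q & a) / len(q) if q else 0.0


query = "Who is Dr. Watson?"
answer = "Dr. Watson is the narrator of the stories."
score = keyword_match_score(query, answer)
print(score, score >= 0.5)
# 0.75 True
```

Three of the four query tokens (is, dr, watson) occur in the answer, so the score is 0.75. Because common words such as "is" count as keywords, an answer that merely restates the question can score highly, which is worth keeping in mind when reading the accuracy percentage reported later.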

BLEU score calculation

This test measures how closely the generated answer matches the retrieved observation. BLEU (Bilingual Evaluation Understudy) is a common metric for text similarity that is based on n-gram overlap. The function computes the BLEU score for each observation-answer pair and adds it to the DataFrame in the BLEU Score column.

def calculate_bleu_scores(df):
    """Calculates BLEU scores for answers against observations."""
    df["BLEU Score"] = df.apply(
        lambda row: sentence_bleu([row["Observation"].split()], row["Final Answer"].split()),
        axis=1,
    )
    return df

# Apply BLEU score calculation
df = calculate_bleu_scores(df)
df.to_csv("output_with_bleu.csv", index=False)
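
The n-gram overlap at the heart of BLEU is the clipped (modified) n-gram precision. The sketch below computes only that ingredient with the standard library; the full BLEU score computed by sentence_bleu additionally combines several n-gram orders and applies a brevity penalty. The function names ngrams and modified_precision are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, candidate, n=1):
    """Share of candidate n-grams found in the reference, with counts clipped."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    # An n-gram is credited at most as often as it appears in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0


reference = "the cat is on the mat".split()
candidate = "the the the the the the the".split()
print(modified_precision(reference, candidate, n=1))  # 2/7, about 0.286
print(modified_precision(reference, candidate, n=2))  # 0.0
```

Because a generated answer usually paraphrases the retrieved passage rather than copying it, few longer n-grams match, so low BLEU values are to be expected when answers are compared against observations in this way.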

Precision and recall

Precision and recall are computed to assess the relevance and completeness of the answers. Precision measures the proportion of words in the answer that also appear in the observation, while recall measures the proportion of words in the observation that also appear in the answer.

These metrics are added to the DataFrame in the Precision and Recall columns.

def calculate_precision_recall(df):
    """Calculates precision and recall for extractive answers."""
    def precision_recall(observation, answer):
        observation_set = set(observation.lower().split())
        answer_set = set(answer.lower().split())
        precision = len(observation_set & answer_set) / len(answer_set) if answer_set else 0
        recall = len(observation_set & answer_set) / len(observation_set) if observation_set else 0
        return precision, recall

    df[["Precision", "Recall"]] = df.apply(
        lambda row: pd.Series(precision_recall(row["Observation"], row["Final Answer"])),
        axis=1,
    )
    return df

# Apply precision/recall
df = calculate_precision_recall(df)
df.to_csv("output_with_precision_recall.csv", index=False)
df
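
A short worked example makes the two ratios concrete. It uses the same set-based word overlap as the function above on a made-up observation and answer; the sample sentences are illustrative and not taken from the tutorial's output.

```python
def precision_recall(observation, answer):
    """Word-set overlap: precision relative to the answer, recall relative to the observation."""
    obs = set(observation.lower().split())
    ans = set(answer.lower().split())
    overlap = obs & ans
    precision = len(overlap) / len(ans) if ans else 0.0
    recall = len(overlap) / len(obs) if obs else 0.0
    return precision, recall


observation = "holmes settled himself down in his armchair"  # 7 distinct words
answer = "holmes sat in his armchair quietly"                # 6 distinct words
p, r = precision_recall(observation, answer)
print(round(p, 3), round(r, 3))
# 0.667 0.571
```

Four words are shared (holmes, in, his, armchair), so precision is 4/6 and recall is 4/7. In the tutorial's setting, an observation spans three retrieved chunks and is much longer than the answer, so recall in particular tends to be low even for a good answer.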

F1 score calculation

The F1 score combines precision and recall into a single metric that balances relevance and completeness. The formula for the F1 score is: F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

The computed F1 scores are added to the DataFrame in the F1 Score column.

def calculate_f1(df):
    """Calculates F1 scores based on precision and recall."""
    f1 = 2 * (df["Precision"] * df["Recall"]) / (df["Precision"] + df["Recall"])
    # 0/0 yields NaN when precision and recall are both 0; treat that case as 0
    df["F1 Score"] = f1.fillna(0)
    return df

# Apply F1 calculation
df = calculate_f1(df)
df.to_csv("output_with_f1.csv", index=False)
df
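
As a quick check of the formula, the following sketch computes F1 for a single precision-recall pair, including the case where both are zero. The function name f1_score and the input values are illustrative (precision 2/3 and recall 4/7).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


print(round(f1_score(2 / 3, 4 / 7), 4))
# 0.6154
print(f1_score(0.0, 0.0))
# 0.0
```

With precision 2/3 and recall 4/7 the result is 8/13. Because F1 is a harmonic mean, it is pulled toward the smaller of the two values, so a low recall keeps F1 low even when precision is high.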

Summarize accuracy metrics

Finally, a summary function consolidates all the metrics to give an overview of the agent's performance. It reports the total number of queries, the number and percentage of accurate responses, and the average BLEU and F1 scores.

def summarize_accuracy_metrics(df):
    """Summarizes overall accuracy metrics."""
    total_entries = len(df)
    accurate_entries = df["Is Accurate"].sum()
    average_bleu = df["BLEU Score"].mean()
    average_f1 = df["F1 Score"].mean()
    print(f"Total Entries: {total_entries}")
    print(f"Accurate Entries: {accurate_entries} ({accurate_entries / total_entries * 100:.2f}%)")
    print(f"Average BLEU Score: {average_bleu:.2f}")
    print(f"Average F1 Score: {average_f1:.2f}")

# Call summary function
summarize_accuracy_metrics(df)

Output

Total Entries: 4
Accurate Entries: 4 (100.00%)
Average BLEU Score: 0.04
Average F1 Score: 0.24

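These accuracy tests evaluate the agent's ability to generate relevant and accurate responses. Each test focuses on one aspect, from keyword inclusion to text similarity and response completeness. The summary function consolidates these metrics into an overall performance snapshot.

Summary

This tutorial guided you through building an autonomous agent powered by IBM's Granite LLM and LangChain. From text extraction to vectorization and query resolution, we covered the whole process of designing and implementing a functional LLM-based agent. The key steps were memory management with a vector store, query processing and response generation with Granite.

We evaluated the agent's performance with accuracy metrics such as keyword matching, BLEU score, precision, recall and F1 score. Visualizations such as the bar chart, the pie chart and the word clouds gave additional insight into the agent's behavior and efficiency.

Having completed this tutorial, you know how to design an LLM agent and how to test and visualize its performance. This foundation can be extended to handle more complex datasets, improve accuracy and explore advanced capabilities such as multiagent systems.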
