Prompt engineering with DSPy

Data Scientist

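DSPy is an open source Python framework for building large language model (LLM) applications and tuning their prompts through code rather than one-off, hand-written techniques. A DSPy program optimizes prompts to obtain accurate outputs, which gives you a modular way to configure and tune LLM applications. The main advantage of DSPy is that prompt engineering and tracking happen in Python code, so you don't have to track the model's performance by hand.

DSPy's strength is that it uses generative AI to produce natural-language prompts and then tests the results to find the most effective ones. This lets you build a self-improving AI system. DSPy supports a wide variety of retrieval model and language model interfaces. You can run models locally through systems such as ollama or huggingface, or call them through an API if you use OpenAI's ChatGPT or GPT-4. DSPy supports use cases such as chain of thought (CoT), retrieval augmented generation (RAG), and summarization.

In this tutorial, you'll learn how to build a RAG question answering application with DSPy on IBM watsonx. You'll use Llama 3 as the language model and ColBERT as the retrieval model. You'll have DSPy tune the prompts and help structure several approaches to question answering, so you can see how better answers are generated even for highly complex questions.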

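Set up your environment

While you can choose from several tools, this tutorial walks you through setting up an IBM account to use a Jupyter Notebook.

Log in to watsonx.ai using your IBM Cloud account.

Create a watsonx.ai project.

You can get your project ID from within your project.

Click the Manage tab and copy the project ID from the Details section of the General page. You need this ID for this tutorial.

Next, create a Jupyter Notebook in the environment of your choice. You'll copy the code from this tutorial into the new notebook. Alternatively, you can download this notebook from GitHub to your local system and upload it to your watsonx.ai project as an asset.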

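Set up a Watson Machine Learning (WML) service instance and API key

Create a watsonx.ai Runtime service instance (select the appropriate region and choose the Lite plan, which is a free instance).

Generate an API key in watsonx.ai Runtime.

Associate the watsonx.ai Runtime service with the project that you created in watsonx.ai.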

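Install the DSPy library and set up your credentials

To use DSPy, you only need a simple pip installation. You should also install dotenv to manage your environment variables: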

!pip install dspy-ai python-dotenv

Next, you'll import the libraries needed for the rest of this tutorial:

import dspy
from dspy import LM
from dspy.datasets import HotPotQA
from dspy.teleprompt import BootstrapFewShot
import json
import os

from dotenv import load_dotenv
load_dotenv(os.getcwd() + "/.env", override=True)

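To set your credentials, you need the WATSONX_APIKEY and PROJECT_ID that you created during setup. You can either store them in a .env file in your working directory or replace the placeholder text directly. You also need to set the URL that serves as the API endpoint.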

os.environ["WX_URL"] = "https://us-south.ml.cloud.ibm.com"
os.environ["WX_APIKEY"] = os.getenv("WATSONX_APIKEY", "")

WATSONX_APIKEY = os.getenv("WATSONX_APIKEY", "")
PROJECT_ID = os.getenv("PROJECT_ID", "")
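Because the calls above fall back to an empty string, a missing credential only shows up later as an authentication error. The following is a minimal sketch of a fail-fast check, assuming you would rather stop the notebook early; the helper name `require_env` is hypothetical and is not part of DSPy or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value


# Possible usage in the notebook (left commented so the sketch has no side effects):
# WATSONX_APIKEY = require_env("WATSONX_APIKEY")
# PROJECT_ID = require_env("PROJECT_ID")
```

Raising early keeps the error message close to its cause, which is usually easier to debug than a failed API call several cells later.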

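Using watsonx with DSPy

Now you'll configure DSPy to work with watsonx models by using the DSPy LM class. This class lets you call the watsonx APIs both to generate new prompts and to generate responses to those prompts for testing. Under the hood, DSPy uses another library called LiteLLM to access the watsonx services. LiteLLM provides a simple wrapper for calling a very wide range of LLM APIs in the OpenAI format, including Hugging Face, Azure, and watsonx.

Before you access your watsonx account, you need to obtain a token from the watsonx service by using the API key that you generated earlier. Use the os library to call "https://iam.cloud.ibm.com/identity/token", retrieve your token, and store it for later use.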

token = os.popen('curl -k -X POST \
    --header "Content-Type: application/x-www-form-urlencoded" \
    --header "Accept: application/json" \
    --data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
    --data-urlencode "apikey=' + WATSONX_APIKEY + '" \
    "https://iam.cloud.ibm.com/identity/token"').read()
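The curl call above leaves `token` holding the raw JSON text of the IAM response rather than the token itself. As a hedged alternative, the sketch below builds the same form-encoded POST with Python's standard library and pulls out the `access_token` field. The function names `build_iam_token_request` and `extract_access_token` are hypothetical helpers introduced for illustration only.

```python
import json
import urllib.parse
import urllib.request

IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_iam_token_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the form-encoded POST that exchanges an API key for a token."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": api_key,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def extract_access_token(raw_response: str) -> str:
    """Parse the JSON response text and return its access_token field."""
    return json.loads(raw_response)["access_token"]


# Possible usage (commented out because it performs a network call):
# with urllib.request.urlopen(build_iam_token_request(WATSONX_APIKEY)) as resp:
#     access_token = extract_access_token(resp.read().decode("utf-8"))
```

Unlike the curl command with `-k`, `urllib.request.urlopen` verifies TLS certificates by default, and no API key is interpolated into a shell command string.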

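Now you can create a language model (LM) instance that uses watsonx. You supply your credential as the API key and use Meta's 'llama-3-8b-instruct' model as your language model. You pass the path to that model to DSPy, along with the temperature that you want the language model to use. In this case, a temperature of 0.7 allows some creativity without producing too much incorrect information. For more information about configuring LiteLLM to use watsonx, see its GitHub documentation.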

lm = dspy.LM("watsonx/meta-llama/llama-3-8b-instruct", api_key=WATSONX_APIKEY, api_base="https://us-south.ml.cloud.ibm.com")

dspy.configure(lm=lm, trace=[], temperature=0.7, experimental=True)

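Add a retrieval model

Now you can load the retrieval model for the R in RAG. Use ColBERTv2 to load extracts from the Wikipedia 2017 dataset. ColBERT is a fast and accurate retrieval model that enables scalable BERT-based search over large text collections in tens of milliseconds. ColBERT is just one of many options for retrieving information from a vector database. Comparable options include vector databases such as Qdrant, Milvus, Pinecone, Chroma, or Weaviate.

A vector database holds a specific set of information that the language model can access quickly. In this case, you'll use a set of abstracts from Wikipedia 2017 to give your language model a wide range of facts to draw on during generation. The combination of ColBERT and the Wiki 17 dataset is especially useful because the DSPy team hosts a version of it free for anyone to use. It gives you access to a broad range of information without having to ingest data or set up your own vector database system. One downside of this dataset is that it contains nothing about events after 2017, but it is still very useful for demonstration purposes.

If you're interested in running ColBERT with your own data or with an updated dataset, a dedicated tutorial is available.

The following code configures ColBERTv2 as the retrieval model for DSPy. Later in this tutorial, you'll load the HotPotQA dataset and split it into train and test sets that you can use to test your retrieval chain. HotpotQA is a question answering dataset that features natural, multi-hop questions with strong supervision for supporting facts, which enables more explainable question answering systems.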

colbertv2_wiki17_abstracts = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
dspy.configure(rm=colbertv2_wiki17_abstracts)
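To make the idea of "retrieve the top k passages for a question" concrete, here is a toy sketch that ranks passages by simple word overlap. It is a deliberately crude stand-in and not how ColBERT scores text (ColBERT uses BERT-based embeddings); the function names `tokenize` and `retrieve_top_k` and the sample passages are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_top_k(query, passages, k=3):
    """Return the k passages that share the most tokens with the query."""
    query_terms = Counter(tokenize(query))

    def overlap(passage):
        # Multiset intersection counts each shared token at most as often
        # as it appears in both the query and the passage.
        return sum((query_terms & Counter(tokenize(passage))).values())

    # sorted() is stable, so passages with equal scores keep their original order.
    return sorted(passages, key=overlap, reverse=True)[:k]


passages = [
    "Orhan Pamuk is a Turkish novelist and recipient of the 2006 Nobel Prize in Literature.",
    "The 2006 Palanca Awards honor literature in the Philippines.",
    "Paris is the capital of France.",
]
top_two = retrieve_top_k("Who won the Nobel Prize in Literature in 2006?", passages, k=2)
```

A real retriever replaces the overlap score with a learned similarity, but the interface is the same: a question goes in and a short ranked list of passages comes out.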

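Testing basic QA

Now you'll create a Signature for your initial example. A Signature is a class that defines the input and output types of a module, which keeps the different modules in a DSPy program compatible. A Signature groups several tasks together, such as receiving a question, outputting an answer, and showing the model's reasoning. The Signature you use here takes just a question and provides a response: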

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

You can now build a predictor and test it by using DSPy's Predict module. Predict takes the BasicQA class that you just defined and applies it whenever you pass a question to DSPy.

# Define the predictor.
generate_answer = dspy.Predict(BasicQA)

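Now you'll create a question that requires multiple pieces of information to answer correctly and test it with an architecture that uses only the language model. You'll use the generate_answer function that you just created to answer the question.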

# Call the predictor on a particular input.
test_question = "What country was the winner of the Nobel Prize in Literature in 2006 from and what was their name?"

pred = generate_answer(question=test_question)

if pred is None:
    print("no answer")
else:
    # Print the expected answer and the prediction.
    print("Answer: Turkey, Orhan Pamuk")
    print(f"Predicted Answer: {pred.answer}")

The code returns the following (your answer might differ):

Answer: Turkey, Orhan Pamuk
Predicted Answer: The winner was France and the author was Orhan Pamuk.

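Orhan Pamuk won the 2006 Nobel Prize in Literature, but he is not from France, and the framing of the answer is not correct either. Now you'll use retrieval augmented generation to supply the model with retrieved facts and have DSPy engineer better prompts to improve performance.

Retrieval augmented generation (RAG)

Retrieval augmented generation (RAG) is an architecture that optimizes the output of a large language model by using references from an authoritative knowledge base. Before the language model generates a response, RAG supplements what the model learned in training with verified sources. LLMs are trained on large corpora and use billions of parameters to generate output, but they might not be able to draw up-to-date or accurate information from their training corpora. RAG extends the already powerful capabilities of LLMs to specific domains without retraining the model. It is a powerful and potentially cost-effective way to improve LLM output so that it stays relevant, accurate, and useful in many contexts.

In DSPy, you use a RAG architecture by adding a context step to the Signature. This step gathers context from the retrieval model and adds it to the language model's prompt in the hope of eliciting a better response.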

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

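You can use this new GenerateAnswer Signature with your RAG model. You pass GenerateAnswer to the ChainOfThought module so that the retrieved context, the question, and the answer all go through a chain of thought approach.

You also define the forward method so that it retrieves context passages and uses them to generate an answer. DSPy calls this `forward` method each time it generates a new answer to a question: it first gets context from the ColBERT Wiki 17 abstracts dataset and then passes that context to the language model (Llama 3 in this case). During compilation, DSPy compares each generated answer with the expected result to check that the prompt helps the model produce the correct response.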

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
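The retrieve-then-generate flow in `forward` can also be sketched without DSPy. The version below is a simplified illustration, assuming a retriever and a generator that are passed in as plain callables; `RagResult`, `build_prompt`, and `rag_answer` are hypothetical names, and the prompt layout is an assumption rather than the prompt that DSPy actually produces.

```python
from typing import Callable, List, NamedTuple


class RagResult(NamedTuple):
    context: List[str]
    answer: str


def build_prompt(context: List[str], question: str) -> str:
    """Number the retrieved passages and place them ahead of the question."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return f"Context:\n{numbered}\n\nQuestion: {question}\nAnswer:"


def rag_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
    num_passages: int = 3,
) -> RagResult:
    """Retrieve passages, add them to the prompt, then ask the generator for an answer."""
    context = retrieve(question)[:num_passages]
    prompt = build_prompt(context, question)
    return RagResult(context=context, answer=generate(prompt))


# Stub components so the sketch runs without a model or a retrieval server.
def fake_retrieve(question: str) -> List[str]:
    return ["A", "B", "C", "D"]


def fake_generate(prompt: str) -> str:
    return "stub answer"


result = rag_answer("Q?", fake_retrieve, fake_generate)
```

Returning the context alongside the answer mirrors the `dspy.Prediction(context=..., answer=...)` return value above, which is what later lets a metric check that the retrieved passages actually contain the answer.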

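To help DSPy engineer the best prompts for you, you need a test dataset that DSPy can use to test and evaluate those prompts.

To give DSPy test questions, you'll load the HotPotQA dataset. HotpotQA is a question answering dataset that features natural, multi-hop questions, which require multiple retrievals and inferences to reach the correct answer. It is a good tool for testing how well models generate supporting facts, and it helps train and evaluate more explainable question answering systems.

For example, one question from the dataset is: "Who did President Franklin Roosevelt appoint that was responsible to transmit votes of the Electoral College to Congress?" You can see that answering this question correctly requires several pieces of information.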

The answer is: “Robert Digges Wimberly Connor”.

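The relevant context comes from the Wikipedia pages on Robert Digges Wimberly Connor and on the National Archives and Records Administration.

HotPotQA was collected and published by a team of NLP researchers at Carnegie Mellon University, Stanford University, and Université de Montréal. For more information about HotPotQA, see their GitHub site.

After you load the dataset, split it into train and test sets. This lets you test the retrieval chain and helps DSPy find the best prompts for the language model.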

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs("question") for x in dataset.train]
devset = [x.with_inputs("question") for x in dataset.dev]
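A held-out split such as `devset` is typically used to score a program on questions it did not see during compilation. The sketch below shows that idea in plain Python, assuming examples are dictionaries with `question` and `answer` keys and that the program is any callable; the `accuracy` helper is hypothetical. DSPy provides its own evaluation utilities, which this sketch does not reproduce.

```python
def accuracy(program, examples, metric):
    """Return the fraction of examples for which the metric accepts the program's prediction."""
    if not examples:
        return 0.0
    passed = sum(1 for ex in examples if metric(ex, program(ex["question"])))
    return passed / len(examples)


# Toy stand-ins: the "program" upper-cases the question, and the metric
# checks for an exact string match with the labeled answer.
toy_examples = [
    {"question": "a", "answer": "A"},
    {"question": "b", "answer": "x"},
]
toy_score = accuracy(
    lambda question: question.upper(),
    toy_examples,
    lambda ex, pred: ex["answer"] == pred,
)
```

Scoring the uncompiled and compiled programs on the same held-out examples gives a like-for-like comparison of what prompt optimization changed.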

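Next, you'll bootstrap more examples to give DSPy more opportunities to generate and evaluate prompts. Calling compile uses all of the architecture that you've configured, along with the HotPotQA dataset, to generate and test prompts and get the best performance from your language model.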

from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic DSPy optimizer, which will compile your RAG program.
bfs_optimizer = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = bfs_optimizer.compile(RAG(), trainset=trainset)
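The validation logic above combines two checks: the predicted answer must match the labeled answer, and the labeled answer must appear in the retrieved context. The sketch below illustrates both checks and a simplified version of the bootstrapping idea, in which only training examples that pass the metric are kept as demonstrations. It is an approximation written for this tutorial and not DSPy's implementation; all function names, the dictionary-based examples, and the `max_demos` parameter are assumptions.

```python
import re
import string


def normalize_text(text):
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold, predicted):
    """True when the two answers are identical after normalization."""
    return normalize_text(gold) == normalize_text(predicted)


def passage_match(gold, passages):
    """True when the normalized gold answer appears inside any retrieved passage."""
    target = normalize_text(gold)
    return any(target in normalize_text(p) for p in passages)


def select_demos(program, trainset, max_demos=4):
    """Run the program on training examples and keep the ones that pass both checks."""
    demos = []
    for example in trainset:
        context, answer = program(example["question"])
        if exact_match(example["answer"], answer) and passage_match(example["answer"], context):
            demos.append({"question": example["question"], "context": context, "answer": answer})
        if len(demos) >= max_demos:
            break
    return demos


# Toy program: it answers correctly only for the Nobel Prize question.
def toy_program(question):
    if "Nobel" in question:
        return ["Ferit Orhan Pamuk is a Turkish novelist."], "Orhan Pamuk"
    return [], "unknown"


toy_trainset = [
    {"question": "Who won the 2006 Nobel Prize in Literature?", "answer": "Orhan Pamuk"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]
toy_demos = select_demos(toy_program, toy_trainset)
```

Requiring the passage match as well as the exact match filters out demonstrations in which the model guessed the right answer without supporting context.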

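Now that DSPy has done the prompt engineering for you, you'll test it with the custom question about the 2006 Nobel Prize that you used earlier. Because the retrieval model uses Wikipedia abstracts from 2017, it performs best on knowledge that is likely to be in that corpus: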

# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = compiled_rag(test_question)

# Print the question and the predicted answer.
print(f"Question: {test_question}")
print(f"Predicted Answer: {pred.answer}")

Now you get the correct answer.

    Question: What country was the winner of the Nobel Prize in Literature in 2006 from and what was their name?
    Predicted Answer: Turkey, Orhan Pamuk

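Orhan Pamuk is from Turkey, so this answer is correct. The compiled version of the DSPy program not only got the answer right but also framed it correctly, with a short and clear response. Let's look at the context for this predicted response to see how the model arrived at the right answer: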

pred.context

This returns:

    ["Orhan Pamuk | Ferit Orhan Pamuk (generally known simply as Orhan Pamuk; born 7 June 1952) is a Turkish novelist, screenwriter, academic and recipient of the 2006 Nobel Prize in Literature. One of Turkey's most prominent novelists, his work has sold over thirteen million books in sixty-three languages, making him the country's best-selling writer.",
     '2006 Palanca Awards | The Carlos Palanca Memorial Awards for Literature winners in the year 2006 (rank, title of winning entry, name of author).',
     "Miguel Donoso Pareja | Miguel Donoso Pareja (July 13, 1931 – March 16, 2015) was an Ecuadorian writer and 2006 Premio Eugenio Espejo Award-winner (Ecuador's National Prize in literature, given by the President of Ecuador)."]

The answer is in the first passage of context returned. You can see how DSPy engineered the prompt by looking at the language model's history with its inspect_history() method.

lm.inspect_history()

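This history is very long because it includes all of the examples from the compilation process in which DSPy tested its generated prompts. The last part of the history shows how the model arrived at the right answer in the correct format: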

    [[ ## context ## ]]
    [1] «Orhan Pamuk | Ferit Orhan Pamuk (generally known simply as Orhan Pamuk; born 7 June 1952) is a Turkish novelist, screenwriter, academic and recipient of the 2006 Nobel Prize in Literature. One of Turkey's most prominent novelists, his work has sold over thirteen million books in sixty-three languages, making him the country's best-selling writer.»
    [2] «2006 Palanca Awards | The Carlos Palanca Memorial Awards for Literature winners in the year 2006 (rank, title of winning entry, name of author).»
    [3] «Miguel Donoso Pareja | Miguel Donoso Pareja (July 13, 1931 – March 16, 2015) was an Ecuadorian writer and 2006 Premio Eugenio Espejo Award-winner (Ecuador's National Prize in literature, given by the President of Ecuador).»
    
    [[ ## question ## ]]
    What country was the winner of the Nobel Prize in Literature in 2006 from and what was their name?
    
    Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
    
    
    Response:
    
    [[ ## reasoning ## ]]
    The text mentions the 2006 Nobel Prize in Literature and states that Orhan Pamuk, a Turkish novelist, was the winner.
    
    [[ ## answer ## ]]
    Turkey, Orhan Pamuk
    
    [[ ## completed ## ]]

You can see the prompt that DSPy generated:

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

This led to the correct answer and the correct framing.
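The `[[ ## field ## ]]` markers in the history make the response straightforward to split into named fields. The following sketch shows one way to do that with a regular expression; it illustrates the format only and is not DSPy's own parser, and the `parse_fields` name is hypothetical.

```python
import re

# Matches markers such as "[[ ## answer ## ]]" and captures the field name.
FIELD_MARKER = re.compile(r"\[\[ ## (\w+) ## \]\]")


def parse_fields(completion):
    """Split a completion into a {field_name: text} dictionary, ignoring the end marker."""
    # With one capturing group, split() returns
    # [text_before, name1, body1, name2, body2, ...].
    parts = FIELD_MARKER.split(completion)
    fields = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        if name != "completed":
            fields[name] = body.strip()
    return fields


sample_completion = (
    "[[ ## reasoning ## ]]\n"
    "The text states that Orhan Pamuk, a Turkish novelist, was the winner.\n\n"
    "[[ ## answer ## ]]\n"
    "Turkey, Orhan Pamuk\n\n"
    "[[ ## completed ## ]]"
)
parsed = parse_fields(sample_completion)
```

Because each output field has its own marker, the short answer can be read separately from the reasoning that precedes it.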