Date: 2024-12-06

Retrieval-augmented generation (RAG) is an architecture that improves the output of a large language model by supplying references from an authoritative knowledge base. It supplements what the model learned during training with verified sources before the model generates a response. LLMs are trained on large corpora and use billions of parameters to generate output, but their training data may not contain up-to-date or accurate information. RAG extends the capabilities of LLMs to a specific domain without requiring that the model be retrained. It's a potentially cost-effective way to keep LLM outputs relevant, accurate and useful in various contexts.

In DSPy, you use a RAG architecture by adding a context field to the Signature. This field carries the passages gathered by the retrieval model, which are added to the prompt so that the language model can ground its response in them.

import dspy

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

That new `GenerateAnswer` signature can be used with your RAG model. You pass `GenerateAnswer` to the `ChainOfThought` module so that the model reasons step by step over the retrieved context and the question before it produces the answer.

You also update the `forward` method so that it retrieves context passages and uses them to generate answers. DSPy calls this `forward` method each time it generates a new answer in response to a question: it first gathers context from the ColBERT Wiki 17 abstracts dataset and then passes that context to the language model, in this case Llama 3.1. During compilation, DSPy compares each generated answer to the desired output to check that the prompts are helping the model produce correct responses.

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
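To make the flow in `forward` concrete without a retrieval server or a language model, here is a minimal plain-Python sketch of the same retrieve-then-generate pattern. Everything in it is a hypothetical stand-in rather than DSPy API: `CORPUS` is a made-up three-passage index, `toy_retrieve` ranks passages by simple word overlap instead of ColBERT, and `toy_generate` only assembles the prompt that a real model would receive.

```python
import re

# Hypothetical three-passage corpus standing in for the Wikipedia abstracts index.
CORPUS = [
    "Orhan Pamuk | Orhan Pamuk is a Turkish novelist and recipient of the 2006 Nobel Prize in Literature.",
    "2006 Palanca Awards | The Carlos Palanca Memorial Awards for Literature winners in the year 2006.",
    "Hydrogen | Hydrogen is the lightest chemical element.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(question, k=3):
    """Return the k passages that share the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(CORPUS, key=lambda p: len(q_words & tokenize(p)), reverse=True)
    return ranked[:k]


def toy_generate(context, question):
    """Build the prompt a language model would see (no model is called)."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return f"Context:\n{numbered}\n\nQuestion: {question}\nAnswer:"


def toy_rag(question, num_passages=2):
    """Mirror the shape of RAG.forward: retrieve first, then generate from the context."""
    context = toy_retrieve(question, k=num_passages)
    return {"context": context, "prompt": toy_generate(context, question)}


result = toy_rag("Who won the Nobel Prize in Literature in 2006?")
print(result["prompt"])
```

The point of the sketch is the ordering: the retrieved passages become part of the input to the generation step, which is what the `context` field in the signature expresses.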

To help DSPy engineer the best prompts for you, you need a dataset that it can use to try out prompts and then evaluate them.

To give DSPy test questions, you'll load the HotPotQA dataset. HotpotQA is a question-answering dataset featuring natural multi-hop questions that require multiple retrievals and inferences to arrive at the correct answer. It also provides supporting facts for each answer, which makes it well suited to training and testing more explainable question-answering systems.

For instance, one question from the dataset is: "Who did President Franklin Roosevelt appoint that was responsible to transmit votes of the Electoral College to Congress?" You can see that this question requires several pieces of information to answer correctly.

The answer is: "Robert Digges Wimberly Connor".

The supporting context comes from Wikipedia pages about Robert Digges Wimberly Connor and about the National Archives and Records Administration.

HotPotQA is collected and published by a team of NLP researchers at Carnegie Mellon University, Stanford University and Université de Montréal. More information about HotPotQA is available at their GitHub site.

After you load the dataset, split it into training and development sets. This enables you to test the retrieval chain and helps DSPy locate the best prompts for the language model.

from dspy.datasets import HotPotQA

# Load the dataset: 20 training examples and 50 development examples.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]
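The `with_inputs` call is easier to reason about with a small model of what it does. The sketch below is a hypothetical plain-Python stand-in, not DSPy's own `Example` class: it assumes an example is a set of named fields plus a record of which names are inputs, so the program sees only the inputs while the remaining fields stay available as labels for the metric. The `ToyExample` class and its `inputs` and `labels` methods are names chosen for this illustration.

```python
class ToyExample:
    """Hypothetical stand-in for a dataset example with input/label separation."""

    def __init__(self, input_keys=(), **fields):
        self.fields = dict(fields)
        self.input_keys = set(input_keys)

    def with_inputs(self, *keys):
        """Return a copy in which only the named fields count as inputs."""
        return ToyExample(input_keys=keys, **self.fields)

    def inputs(self):
        """Fields the program is allowed to see."""
        return {k: v for k, v in self.fields.items() if k in self.input_keys}

    def labels(self):
        """Fields held back for evaluation."""
        return {k: v for k, v in self.fields.items() if k not in self.input_keys}


raw = ToyExample(
    question="Who did President Franklin Roosevelt appoint that was responsible "
    "to transmit votes of the Electoral College to Congress?",
    answer="Robert Digges Wimberly Connor",
)
example = raw.with_inputs("question")

print(example.inputs())
print(example.labels())
```

Keeping the answer out of the inputs matters because the metric later compares the program's prediction against that held-back label.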

Next, you'll bootstrap more examples to give DSPy more opportunities to generate prompts and evaluate them. Calling `compile` runs the program you've configured against the HotPotQA training set to generate and test prompts and get the best performance from your language model.

from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic DSPy optimizer, which will compile your RAG program.
bfs_optimizer = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = bfs_optimizer.compile(RAG(), trainset=trainset)
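The metric and the bootstrapping step can be illustrated together in plain Python. This is a rough sketch under stated assumptions, not DSPy's implementation: `normalize` assumes SQuAD-style answer normalization (lowercasing, dropping punctuation and English articles), `toy_program` is a hypothetical stand-in for the RAG module that returns canned predictions, and `bootstrap_demos` assumes the optimizer keeps only the training examples whose predictions pass the metric, up to a cap. The training questions are invented for the illustration.

```python
import re
import string


def normalize(text):
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold, predicted):
    """True when the normalized prediction equals the normalized gold answer."""
    return normalize(gold) == normalize(predicted)


def passage_match(gold, passages):
    """True when any retrieved passage contains the normalized gold answer."""
    return any(normalize(gold) in normalize(p) for p in passages)


def validate(example, pred):
    """Same shape as validate_context_and_answer: both checks must pass."""
    return exact_match(example["answer"], pred["answer"]) and passage_match(
        example["answer"], pred["context"]
    )


PAMUK = "Orhan Pamuk | Turkish novelist and recipient of the 2006 Nobel Prize in Literature."
PALANCA = "2006 Palanca Awards | The Carlos Palanca Memorial Awards for Literature winners in the year 2006."
DONOSO = "Miguel Donoso Pareja | Ecuadorian writer and 2006 Premio Eugenio Espejo Award-winner."

# Hypothetical training examples and canned predictions (no model is called).
trainset = [
    {"question": "Who received the 2006 Nobel Prize in Literature?", "answer": "Orhan Pamuk"},
    {"question": "Which country is Orhan Pamuk from?", "answer": "Turkey"},
    {"question": "Who won the 2006 Premio Eugenio Espejo?", "answer": "Miguel Donoso Pareja"},
]
CANNED = {
    trainset[0]["question"]: {"answer": "Orhan Pamuk.", "context": [PAMUK]},  # passes both checks
    trainset[1]["question"]: {"answer": "Turkey", "context": [PALANCA]},  # answer not in context
    trainset[2]["question"]: {"answer": "Orhan Pamuk", "context": [DONOSO]},  # wrong answer
}


def toy_program(question):
    """Stand-in for the RAG module: look up a canned prediction."""
    return CANNED[question]


def bootstrap_demos(program, examples, metric, max_demos=4):
    """Keep predictions that pass the metric as few-shot demonstrations."""
    demos = []
    for example in examples:
        pred = program(example["question"])
        if metric(example, pred):
            demos.append({"question": example["question"], **pred})
        if len(demos) == max_demos:
            break
    return demos


demos = bootstrap_demos(toy_program, trainset, validate)
print(f"kept {len(demos)} of {len(trainset)} examples as demonstrations")
```

Requiring both checks is a deliberately strict design choice: a prediction that is right by luck, with no supporting passage in the retrieved context, is rejected and never becomes a demonstration.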

Now that DSPy has done prompt engineering for you, you'll test it with the custom question about the 2006 Nobel Prize that you used before. Because the retrieval model is using Wikipedia extracts from 2017, it will perform the best with knowledge that might be present in that corpus:

# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = compiled_rag(test_question)

# Print the question and the predicted answer.
print(f"Question: {test_question}")
print(f"Predicted Answer: {pred.answer}")

Now you get back the correct answer.

Question: What country was the winner of the Nobel Prize in Literature in 2006 from and what was their name?
Predicted Answer: Turkey, Orhan Pamuk

Orhan Pamuk is from Turkey, so this answer is correct. The compiled RAG program not only got the answer right but also framed it well, replying with a short and clear response. Look at the context for this prediction to see how the model arrived at the correct answer:

pred.context

This returns:

["Orhan Pamuk | Ferit Orhan Pamuk (generally known simply as Orhan Pamuk; born 7 June 1952) is a Turkish novelist, screenwriter, academic and recipient of the 2006 Nobel Prize in Literature. One of Turkey's most prominent novelists, his work has sold over thirteen million books in sixty-three languages, making him the country's best-selling writer.", '2006 Palanca Awards | The Carlos Palanca Memorial Awards for Literature winners in the year 2006 (rank, title of winning entry, name of author).', "Miguel Donoso Pareja | Miguel Donoso Pareja (July 13, 1931 – March 16, 2015) was an Ecuadorian writer and 2006 Premio Eugenio Espejo Award-winner (Ecuador's National Prize in literature, given by the President of Ecuador)."]

The answer is in the first chunk of context returned. You can see how DSPy engineered the prompts by calling the language model's `inspect_history()` method.

lm.inspect_history()

This history is very long since it includes all of the examples from the compiling process where DSPy tested its generated prompts. The last part of the history shows how the model arrived at the right answer and in the correct format:

[[ ## context ## ]]
[1] «Orhan Pamuk | Ferit Orhan Pamuk (generally known simply as Orhan Pamuk; born 7 June 1952) is a Turkish novelist, screenwriter, academic and recipient of the 2006 Nobel Prize in Literature. One of Turkey's most prominent novelists, his work has sold over thirteen million books in sixty-three languages, making him the country's best-selling writer.»
[2] «2006 Palanca Awards | The Carlos Palanca Memorial Awards for Literature winners in the year 2006 (rank, title of winning entry, name of author).»
[3] «Miguel Donoso Pareja | Miguel Donoso Pareja (July 13, 1931 – March 16, 2015) was an Ecuadorian writer and 2006 Premio Eugenio Espejo Award-winner (Ecuador's National Prize in literature, given by the President of Ecuador).»

[[ ## question ## ]]
What country was the winner of the Nobel Prize in Literature in 2006 from and what was their name?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

Response:

[[ ## reasoning ## ]]
The text mentions the 2006 Nobel Prize in Literature and states that Orhan Pamuk, a Turkish novelist, was the winner.

[[ ## answer ## ]]
Turkey, Orhan Pamuk

[[ ## completed ## ]]

You can see that the prompt DSPy sent to the model ends with this formatting instruction:

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

This leads to the correct answer and framing.
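The `[[ ## field ## ]]` markers also make the response easy to split back into named fields. As an illustration only (this is a hypothetical parser written for this walkthrough, not the code DSPy uses internally), a regular expression can recover the `reasoning` and `answer` values from the response shown in the history above. The names `FIELD_MARKER` and `parse_fields` are assumptions made for the sketch.

```python
import re

# The model response from the history above, with one field per marker.
RESPONSE = (
    "[[ ## reasoning ## ]]\n"
    "The text mentions the 2006 Nobel Prize in Literature and states that "
    "Orhan Pamuk, a Turkish novelist, was the winner.\n\n"
    "[[ ## answer ## ]]\n"
    "Turkey, Orhan Pamuk\n\n"
    "[[ ## completed ## ]]"
)

# Matches a marker such as "[[ ## answer ## ]]" and captures the field name.
FIELD_MARKER = re.compile(r"\[\[ ## (\w+) ## \]\]")


def parse_fields(text):
    """Map each field name to the text between its marker and the next marker."""
    fields = {}
    matches = list(FIELD_MARKER.finditer(text))
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        name = current.group(1)
        if name != "completed":  # the final marker only signals the end of the output
            fields[name] = text[current.end():end].strip()
    return fields


parsed = parse_fields(RESPONSE)
print(parsed["answer"])
```

Because every output field is delimited the same way, the short answer can be extracted separately from the reasoning that preceded it.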