Building a retrieval-augmented generation (RAG) pattern

You can build a retrieval-augmented generation (RAG) pattern to generate factual output that is grounded in information from a knowledge base.

Required services

Some features require one or more of the following additional services:

watsonx.ai Studio
Watson Machine Learning
watsonx.data intelligence
watsonx.data integration

A RAG pattern processes content from a knowledge base into a vector format that is easy to search. When a user asks a question, the RAG pattern retrieves a set of relevant passages and provides them to the large language model (LLM). The LLM generates an answer to the user's question that is grounded in those passages.

RAG pattern capabilities and resources

You can create a RAG pattern with the following capabilities and resources:

Vector stores: Create vector embeddings to encode the meaning of a sentence or passage as a numerical representation. Vectors provide an efficient way to find passages of text in your knowledge base that are most similar to the question that the user asks. Vectors are stored in vector databases and retrieved with a vector index. The watsonx.data Milvus vector store is automatically set up. See Working with Milvus. You can integrate with OpenSearch to store your documents. See Integration with OpenSearch. You can integrate with DataStax to store your documents. See Integration with DataStax.
Encoder models for embedding: Choose from IBM and third-party encoder models for creating embeddings. See Supported encoder foundation models.
Foundation models for inferencing: Choose from a range of foundation models. See Foundation models that support your use case.
Retrieval service: Run data retrieval queries and retrieve results from a vector database and an SQL database. See Retrieval service.
AI agent integration: Agents can use natural language to access and retrieve information from document libraries, document sets, and data assets through an MCP server. See Integrating your RAG pipeline with AI agents.

Ways to work

You can work in a no-code or low-code experience with tools in the UI:

Prompt Lab: You can chat with an uploaded document or with a vector index.
Unstructured data curation: You can analyze documents, convert your documents into a simpler textual format, and extract text and other structured content.
Unstructured data integration: You can ingest, cleanse, transform, and enrich unstructured data.

The retrieval-augmented generation pattern architecture

You can scale out the technique of including context in your prompts by using information from a knowledge base.

The following diagram illustrates the retrieval-augmented generation pattern at run time. Although the diagram shows a question-answering example, the same workflow supports other use cases.

Diagram that shows adding search results that are derived from a vector store to the input for retrieval-augmented generation

The retrieval-augmented generation pattern involves the following steps:

Your knowledge base is preprocessed to convert the content to plain text and vectorize it. Preprocessing can include text extraction to convert information in tables and images into text that the LLM can interpret.
A user asks a question.
The question is converted to a text embedding.
The vector store that contains the vectorized content of the knowledge base is searched for content that is similar to the user's question.
The most relevant search results are added to the prompt in text format, along with an instruction, such as “Answer the following question by using only information from the following passages.”
The combined prompt text (instruction + search results + question) is sent to the foundation model.
The foundation model uses contextual information from the prompt to generate a factual answer.
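
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the word-count `embed` function stands in for a real encoder model, the in-memory list stands in for a vector store, and the names `build_index`, `retrieve`, and `build_prompt` are hypothetical. The final call to a foundation model is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a real encoder model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(passages):
    """Step 1: vectorize each passage of the knowledge base."""
    return [(p, embed(p)) for p in passages]

def retrieve(index, question, top_k=2):
    """Steps 3-4: embed the question and return the most similar passages."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [p for p, _ in ranked[:top_k]]

def build_prompt(passages, question):
    """Steps 5-6: combine instruction + search results + question."""
    instruction = ("Answer the following question by using only "
                   "information from the following passages.")
    context = "\n".join(f"- {p}" for p in passages)
    return f"{instruction}\n\nPassages:\n{context}\n\nQuestion: {question}\nAnswer:"

# Made-up knowledge base content, for illustration only.
knowledge_base = [
    "Password resets are handled on the account settings page.",
    "Invoices are emailed on the first day of each month.",
    "Support is available by chat on weekdays.",
]
index = build_index(knowledge_base)
question = "When are invoices emailed?"
prompt = build_prompt(retrieve(index, question, top_k=1), question)
print(prompt)  # Step 7 would send this text to a foundation model.
```

In a real deployment, the encoder model and the vector store replace `embed` and the in-memory list; the order of the steps stays the same.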

Knowledge base

Your knowledge base can be any collection of information-containing artifacts, such as:

Process information in internal company wiki pages
Files in GitHub
Messages in a collaboration tool
Topics in product documentation, which can include long text blocks
Text passages in a database that supports structured query language (SQL) queries, such as Db2
A document store with a collection of files, such as legal contracts that are stored as PDF files
Customer support tickets in a content management system

You can optimize your RAG solution by adapting the content in your knowledge base to be more accessible to AI. See Optimizing your knowledge base for retrieval-augmented generation.

Content preprocessing

When you set up your RAG pattern, you preprocess the documents in your knowledge base. Preprocessing first converts the content to plain text. You can configure text extraction to convert information in tables and images into text that the LLM can interpret. Then, the embedding model vectorizes the text. The resulting vectors are stored in the vector database, and a vector index is created for retrieving the content.

When a user asks a question, the question text is also vectorized by the embedding model.
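
Before text is vectorized, it is commonly split into passages that are small enough to embed and retrieve individually. The following sketch shows one simple way to do that, by using overlapping windows of words. The splitting strategy, the function name `split_into_passages`, and the `max_words` and `overlap` parameters are assumptions made for illustration; the text above does not prescribe how passages are formed.

```python
def split_into_passages(text, max_words=50, overlap=10):
    """Split plain text into overlapping windows of words.

    The overlap keeps a sentence that straddles a boundary visible in
    both neighboring passages.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # The current window already reaches the end of the text.
    return passages

# Twelve placeholder words, split into windows of 5 words with 2 words of overlap.
sample = " ".join(f"w{i}" for i in range(12))
for passage in split_into_passages(sample, max_words=5, overlap=2):
    print(passage)
```

Each passage that is produced this way would then be passed to the embedding model and stored with its vector.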

Content retrieval and ranking

The retriever searches the vector database for the content that is most similar to the vector embedding of the query text. The retrieved passages are ranked by similarity to the question. You can add a reranking model to your RAG pattern to evaluate the top retrieved passages for relevance to answering the question.
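
The two-stage idea, a similarity search followed by a reranking pass, can be sketched as follows. This is an illustration only: `rerank` is a hypothetical helper, and `overlap_score` is a deliberately simple stand-in for a reranking model that scores each passage by the fraction of question terms it contains. A real reranking model would replace that scoring function.

```python
import re

def rerank(question, candidates, score_fn, top_n=3):
    """Re-order first-stage candidates by a second relevance score."""
    scored = [(score_fn(question, passage), passage) for passage in candidates]
    # The sort is stable, so ties keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_n]]

def overlap_score(question, passage):
    """Stand-in for a reranking model (assumption): share of question terms in the passage."""
    q_terms = set(re.findall(r"[a-z0-9]+", question.lower()))
    p_terms = set(re.findall(r"[a-z0-9]+", passage.lower()))
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0

# Candidates in the order that a similarity search might have returned them.
candidates = [
    "Refunds are available for annual plans.",
    "To request a refund, open a support ticket within 30 days.",
    "Annual plans renew automatically.",
]
top = rerank("How do I request a refund?", candidates, overlap_score, top_n=2)
print(top[0])
```

Because reranking scores each question and passage pair individually, it is typically applied only to the small set of passages that the first-stage search returns.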

Answer generation

The retrieved passages, the user's question, and the instructions are sent to the foundation model in a prompt. The foundation model generates an answer and returns it to the user.