Retrieval-augmented generation (RAG) connects an LLM with an external source of data to expand its knowledge base. When a user submits a query, the RAG system searches the paired database for relevant information, then combines that with the query to give the LLM more context when generating a response.

RAG uses embeddings to transform a database, source code or other information into a searchable vector database. Embeddings represent each data point as a vector in a high-dimensional space, where semantically similar items lie close together. To find relevant data, the information retrieval model in a RAG system converts the user’s query into an embedding and locates similar embeddings in the vector database.
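The embedding search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `embed` is a hypothetical stand-in for a learned embedding model (it hashes words into a fixed number of buckets, so it captures shared words rather than meaning), and `VectorStore` is a toy in-memory substitute for a real vector database. Similarity is measured with cosine similarity, which reduces to a dot product because the vectors are normalized to unit length.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # toy dimensionality; real embedding models typically use hundreds to thousands


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model (feature hashing).

    Each lowercase word is hashed into one of DIM buckets and the counts are
    L2-normalized, so the vector reflects shared words, not meaning.
    """
    vec = [0.0] * DIM
    for word, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(word.encode()) % DIM] += count
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class VectorStore:
    """Toy in-memory vector database holding (text, embedding) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        # Embed once at indexing time and store the vector alongside the text.
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        """Returns the k stored texts most similar to the query.

        Nonzero vectors are unit length, so the dot product equals cosine similarity.
        """
        q = embed(query)
        scored = [(text, sum(a * b for a, b in zip(q, vec))) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = VectorStore()
for doc in [
    "The refund policy allows returns within 30 days.",
    "Our office is open Monday to Friday.",
    "Shipping takes three to five business days.",
]:
    store.add(doc)

for text, score in store.search("What is the refund policy?", k=2):
    print(f"{score:.3f}  {text}")
```

A production system would replace `embed` with a real embedding model and `VectorStore` with an indexed vector database, but the query path stays the same: embed the query, score it against the stored embeddings, and return the closest matches.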

RAG systems typically follow this sequence:

1. Prompting: The user submits a prompt through a user interface, such as an AI-powered chatbot.
2. Querying: An information retrieval model converts the prompt into an embedding and queries the database for similar data.
3. Retrieval: The retrieval model retrieves the relevant data from the database.
4. Generation: The RAG system combines the retrieved data with the user’s query into an augmented prompt and submits it to the LLM, which generates a response.
5. Delivery: The RAG system returns the generated response to the user.
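The five steps can be tied together in a short Python sketch. All names here are hypothetical: `retrieve` ranks documents by word overlap as a placeholder for the embedding search a real retrieval model would run, `build_prompt` shows one possible way to combine retrieved passages with the query, and `llm` is any callable that takes a prompt string and returns text, standing in for a real LLM API.

```python
import re
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase word set used by the placeholder retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Querying and retrieval steps.

    Placeholder: ranks documents by how many words they share with the query,
    instead of comparing embeddings in a vector database.
    """
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combines the retrieved passages with the user's query into one prompt."""
    passages = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{passages}\n"
        f"Question: {query}"
    )


def answer(query: str, documents: list[str], llm: Callable[[str], str], k: int = 2) -> str:
    """Runs one pass of the sequence for a single user prompt."""
    context = retrieve(query, documents, k)  # querying + retrieval
    prompt = build_prompt(query, context)    # combine retrieved data with the query
    return llm(prompt)                       # generation; the caller delivers the result


def echo_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call: echoes the augmented prompt for inspection.
    return prompt


docs = [
    "The refund policy allows returns within 30 days.",
    "Our office is open Monday to Friday.",
    "Shipping takes three to five business days.",
]
print(answer("What is the refund policy?", docs, echo_llm, k=1))
```

Keeping the LLM as a plain callable makes the retrieval and augmentation logic testable without a model; swapping in a real client only changes the function passed as `llm`.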

RAG gets its name from the way these systems retrieve relevant data and use it to augment the LLM’s generated response. More complex RAG systems add components that refine this process and further improve response quality.