Setting up retrieval augmented generation (RAG)

Retrieval augmented generation (RAG) improves large language model (LLM) output by augmenting the prompt with additional context. When you submit a query, watsonx Code Assistant uses RAG tools to retrieve relevant information from your code bases or documentation.

Before you begin

Before you set up RAG, ensure that you meet the following requirements:

You have cluster administrator access to enable RAG on your IBM® Software Hub instance.
You have access to the code repositories or documentation that you want to index.
You have a GitHub personal access token for accessing private repositories.

About this task

watsonx Code Assistant uses RAG to enhance response quality by retrieving relevant, up-to-date context from your code bases and documentation. The retrieved context is appended to your query before the query is sent to the LLM, which reduces model hallucinations and improves the accuracy of generated responses.
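As a rough illustration of prompt augmentation, the following shell sketch joins a user query and a retrieved snippet into a single prompt. The function name `augment_prompt` and the prompt layout are hypothetical. watsonx Code Assistant builds its prompts internally, and the format it actually uses might differ.

```shell
# Hypothetical illustration only: the real prompt layout used by
# watsonx Code Assistant is internal and might differ.
augment_prompt() {
  # $1 = user query, $2 = context retrieved from the vector store
  printf '%s\n\nContext:\n%s\n' "$1" "$2"
}

augment_prompt "How is a chat message processed?" \
  "Snippet retrieved from an indexed repository"
```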

You can configure watsonx Code Assistant to use specific code repositories and project documentation that are stored in Git repositories. Supported documentation formats include API documents, readme files, technical and design documents, Markdown files, PDFs, Word documents, and PowerPoint presentations.

The RAG system determines which sources to include or exclude to generate responses with the most useful information. The following figure illustrates the RAG configuration workflow:

Figure 1. RAG configuration workflow. The diagram shows four main steps: create the OpenSearch cluster, index repositories, set up connection assets, and configure the Git personal access token.

Procedure

Enable RAG on your IBM Software Hub instance.
watsonx Code Assistant uses OpenSearch as the vector store for its RAG implementation. Run the following command in your OpenShift® shell to enable RAG:
```
oc patch -n $PROJECT_CPD_INST_OPERANDS wca/wca-cr --type=merge -p '{"spec":{"rag_enabled":true}}'
```
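To confirm that the patch was recorded, you can read the field back from the custom resource. The following is a minimal sketch that assumes the same `wca/wca-cr` resource and `spec.rag_enabled` field as the patch command. The helper name `rag_status_message` is illustrative. A value of `true` shows only that the setting is stored, not that the operator has finished deploying the RAG components.

```shell
# Turn the value of spec.rag_enabled into a readable message.
rag_status_message() {
  case "$1" in
    true) echo "RAG is enabled" ;;
    *)    echo "RAG is not enabled (spec.rag_enabled=${1:-unset})" ;;
  esac
}

# Query the cluster only when oc is installed and the project variable is set.
if command -v oc >/dev/null 2>&1 && [ -n "${PROJECT_CPD_INST_OPERANDS:-}" ]; then
  rag_status_message "$(oc get -n "$PROJECT_CPD_INST_OPERANDS" wca/wca-cr \
    -o jsonpath='{.spec.rag_enabled}')"
fi
```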
Obtain the OpenSearch instance connection details.
1. Extract the hostname by running the following command:
```
oc get -n $PROJECT_CPD_INST_OPERANDS route cpd -o jsonpath='{.spec.host}{"\n"}'
```
2. Construct the URL for OpenSearch using the following format:
```
https://<hostname>/wca-core-rag/
```
  Replace <hostname> with the IBM Software Hub hostname that you obtained in the previous step.
3. Retrieve the username and password for the vector store by running the following command:
```
oc extract -n $PROJECT_CPD_INST_OPERANDS secret/wca-core-rag --to=-
```
  Save these credentials for use when you create connection assets.
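The hostname lookup and URL construction above can be combined in a small script. This is a sketch, not an official tool: the function name `build_rag_url` and the variable `CPD_HOST` are illustrative, and the script assumes that `oc` is logged in to the cluster and that `PROJECT_CPD_INST_OPERANDS` is set.

```shell
# Build the OpenSearch URL from the IBM Software Hub hostname.
build_rag_url() {
  # Drop an accidental scheme prefix or trailing slash from the input.
  rag_host="${1#https://}"
  rag_host="${rag_host%/}"
  printf 'https://%s/wca-core-rag/\n' "$rag_host"
}

# Query the cluster only when oc is installed and the project variable is set.
if command -v oc >/dev/null 2>&1 && [ -n "${PROJECT_CPD_INST_OPERANDS:-}" ]; then
  CPD_HOST="$(oc get -n "$PROJECT_CPD_INST_OPERANDS" route cpd \
    -o jsonpath='{.spec.host}')"
  build_rag_url "$CPD_HOST"
fi
```

For example, `build_rag_url cpd.example.com` prints `https://cpd.example.com/wca-core-rag/`, where `cpd.example.com` is a placeholder hostname.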
Index your code repositories and documentation in the vector store.

You must index your repositories before watsonx Code Assistant can use them as context for RAG. For detailed instructions, see Indexing code repositories and documentation.
Create connection assets for your indexed repositories.

You must create a connection asset for each index in the vector store. The connection asset enables watsonx Code Assistant to access the indexed content. For detailed instructions, see Setting up connection assets.
Configure your GitHub personal access token in Visual Studio Code.
1. In GitHub, navigate to Settings > Developer settings > Personal access tokens > Tokens (classic).
2. Generate a new token or copy an existing personal access token.
3. In Visual Studio Code, open the Command Palette by clicking View > Command Palette.
4. Search for WCA and select Enter GitHub Personal Access Token for WCA.
5. Enter your GitHub personal access token and press Enter.
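Before you enter the token, you might want to check that it is the kind you expect and that GitHub accepts it. The following sketch uses hypothetical helper names. It assumes the token prefixes that GitHub uses at the time of writing (`ghp_` for classic tokens, `github_pat_` for fine-grained tokens) and the public `api.github.com` endpoint. GitHub Enterprise Server instances use a different API URL.

```shell
# Classify a token by its prefix without sending it anywhere.
github_token_kind() {
  case "$1" in
    ghp_*)        echo "classic" ;;
    github_pat_*) echo "fine-grained" ;;
    *)            echo "unknown" ;;
  esac
}

# Ask the GitHub REST API whether the token is accepted.
# Prints the HTTP status code: 200 means accepted, 401 means rejected.
github_token_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $1" \
    https://api.github.com/user
}

github_token_kind "ghp_example"   # placeholder value, not a real token
```

To check a real token, call `github_token_status "$GITHUB_TOKEN"`, where `GITHUB_TOKEN` is an environment variable that you set yourself. Avoid pasting the token directly on the command line, because it would be stored in your shell history.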
Use RAG-enabled prompts in your chat conversations.
To generate responses that use context from your indexed repositories, use the following command syntax in chat messages:
```
@repo <query>
```
Replace <query> with your question or instruction.

Example:
```
@repo how is a chat message processed?
```
To use indexed documentation, use the @docs command instead:
```
@docs What are the steps to set up a connection to the user data store?
```

Results

watsonx Code Assistant uses the indexed repositories based on the following conditions:

If one repository is opened in Visual Studio Code, watsonx Code Assistant searches for context in the opened repository by default.
If multiple repositories are opened in Visual Studio Code, watsonx Code Assistant searches for context from the repository that is associated with the most recently accessed file.
When you use the @repo command, watsonx Code Assistant checks for a repo.yaml file in the indexed repository. If one or more YAML configuration files are configured, watsonx Code Assistant uses all the configured repositories to generate a response. If no YAML configuration is found, watsonx Code Assistant uses the currently selected repository.

watsonx Code Assistant uses the indexed document collections based on the following conditions:

If a document is opened in Visual Studio Code, watsonx Code Assistant searches for context in the opened document collection by default.
If multiple document collections are opened in Visual Studio Code, watsonx Code Assistant searches for context from the most recently accessed document collection.
When you use the @docs command, watsonx Code Assistant checks for a docs.yaml file in the indexed repository. If one or more YAML configuration files are configured, watsonx Code Assistant uses all the configured document collections to generate a response. If no YAML configuration is found, watsonx Code Assistant uses all documents with the docs_name prefix in your deployment space.
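The fallback rule that applies to both the @repo and @docs commands can be summarized as: use the configured sources when a YAML configuration exists, otherwise use the default. The following sketch only illustrates that rule. The function name and the sample source names are hypothetical, and the sketch does not reflect how watsonx Code Assistant implements the lookup.

```shell
# Print the configured sources, or the fallback when none are configured.
resolve_context_sources() {
  # $1 = space-separated names from the YAML configuration (may be empty)
  # $2 = default used when no YAML configuration is found
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

resolve_context_sources "" "current-repo"                # no YAML configuration
resolve_context_sources "repo-a repo-b" "current-repo"   # YAML lists two repositories
```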

What to do next

Review the use case scenarios to understand how to implement RAG for different team structures and access requirements. For more information, see Use case scenarios for RAG.
Optionally, set up a YAML configuration to allow watsonx Code Assistant to search multiple repositories simultaneously or use specific indexed code repositories or documents in the vector store. For more information, see Setting up YAML configuration for RAG.