Coding an AutoAI RAG experiment with a Chroma vector store

Review the guidelines and code samples to learn how to code an AutoAI RAG experiment using the default, in-memory Chroma database as a vector store.

Storing vectorized content in a Chroma database

When you set up your AutoAI RAG experiment and don't specify a connection to a vector store, the vectorized content is saved to the default, in-memory Chroma database. The content does not persist beyond the experiment, so this approach is not suitable for deploying a RAG pattern to production. However, it provides a fast path for creating a RAG pattern.

The following sections expand on the annotated sample code provided with the Automating RAG pattern with Chroma database notebook.

The notebook uses the watsonx.ai Python client library (version 1.1.11 or later).

Follow these steps to code an AutoAI RAG experiment for your use case.

Prepare the data prerequisites and set up the experiment
Configure the RAG experiment
Run the experiment
Review the patterns and select the best one

Step 1: Prepare the data prerequisites and set up the experiment

Prepare the prerequisites for the experiment.

Install and import the required modules and dependencies. For example:
```
pip install 'ibm-watsonx-ai[rag]>=1.3.33'
pip install langchain-community==0.2.4
```
- Add task credentials. See Adding task credentials.
- Add the watsonx.ai Runtime service. See Creating services.
- Enter your API key. See Managing the user API key.

Use these to initialize the client. For example:

from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
                url = "https://us-south.ml.cloud.mydomain.com",
                api_key = "***********"
                )

client = APIClient(credentials)

Create a project or space for your work. See Creating a project or Creating a space.
Get the ID for the project or space. See Finding the project ID.
Set a default project or a default space, depending on where you work. Call only the method that applies:

client.set.default_project("<Project ID>")

client.set.default_space("<Space GUID>")

Grounding documents

Prepare and connect to the grounding documents you will use to run the RAG experiment. For details, see Getting and preparing data in a project.

Supported formats: PDF, HTML, DOCX, Markdown, plain text
Connect to data in a Cloud Object Storage bucket, a folder in a bucket, or specify up to 20 files.
AutoAI uses a sample of the documents to run the experiment.

For example, to create a data connection when documents are stored in a Cloud Object Storage bucket:

from ibm_watsonx_ai.helpers import DataConnection, S3Location

# Data source type name for IBM Cloud Object Storage connections
# (assumed here; confirm the name for your environment)
datasource_name = "bluemixcloudobjectstorage"

# Replace each <...> placeholder with the value for your bucket
conn_meta_props = {
    client.connections.ConfigurationMetaNames.NAME: f"Connection to input data - {datasource_name}",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: client.connections.get_datasource_type_id_by_name(datasource_name),
    client.connections.ConfigurationMetaNames.DESCRIPTION: "ibm-watsonx-ai SDK documentation",
    client.connections.ConfigurationMetaNames.PROPERTIES: {
        "bucket": "<BUCKET_NAME>",
        "access_key": "<ACCESS_KEY>",
        "secret_key": "<SECRET_ACCESS_KEY>",
        "iam_url": "https://iam.cloud.ibm.com/identity/token",
        "url": "<ENDPOINT_URL>",
    },
}

# Create the connection asset and keep its ID
conn_details = client.connections.create(meta_props=conn_meta_props)
cos_connection_id = client.connections.get_id(conn_details)

# Point the experiment at the bucket (or a folder in it) that holds the grounding documents
input_data_references = [DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(
        bucket="<BUCKET_NAME>",
        path="<BUCKET_PATH>",
    ),
)]

The following example shows how to use a data asset that was created in the project (or promoted to the space).

Note:

core_api.html is an example of a grounding document file used in the sample notebooks.

import os, wget
from ibm_watsonx_ai.helpers import DataConnection

input_data_filename = "core_api.html"
input_data_path = f"https://ibm.github.io/watsonx-ai-python-sdk/{input_data_filename}"

if not os.path.isfile(input_data_filename):
    wget.download(input_data_path, out=input_data_filename)

asset_details = client.data_assets.create(input_data_filename, input_data_filename)
asset_id = client.data_assets.get_id(asset_details)
asset_id

input_data_references = [DataConnection(data_asset_id=asset_id)]

Tip:

input_data_references supports up to 20 DataConnection instances.
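
Before you create the data connections, it can help to check a candidate file list against the limits described above. The following is a minimal sketch, not part of the SDK: the helper name check_grounding_files and the mapping from the supported formats to file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed mapping from the supported formats (PDF, HTML, DOCX, Markdown,
# plain text) to file extensions; adjust it to match your files.
SUPPORTED_EXTENSIONS = {".pdf", ".html", ".docx", ".md", ".txt"}
MAX_FILES = 20  # limit stated for individually specified files


def check_grounding_files(filenames):
    """Return a list of problems found in the candidate grounding files."""
    problems = []
    if len(filenames) > MAX_FILES:
        problems.append(f"{len(filenames)} files given, limit is {MAX_FILES}")
    for name in filenames:
        # Compare extensions case-insensitively.
        if Path(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported format: {name}")
    return problems


print(check_grounding_files(["core_api.html", "notes.xlsx"]))
```

An empty list means that the file list passes both checks.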

Evaluation data

Evaluation data must be in JSON format and follow a fixed schema with these fields: question, correct_answer, correct_answer_document_ids.

For example:

[
    {
        "question": "What is the purpose of get_token()?",
        "correct_answer": "get_token() is used to retrieve an authentication token for secure API access.",
        "correct_answer_document_ids": [
            "core_api.html"
        ]
    },
    {
        "question": "How does the delete_model() function operate?",
        "correct_answer": "delete_model() method allows users to delete models they've created or managed.",
        "correct_answer_document_ids": [
            "core_api.html"
        ]
    }
]
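
A quick local check of the schema can catch malformed evaluation data before you upload it. The following sketch uses only the Python standard library. The helper name validate_evaluation_data is hypothetical, and the check covers only the three field names listed above, not any further rules that the service might apply.

```python
import json

REQUIRED_FIELDS = ("question", "correct_answer", "correct_answer_document_ids")


def validate_evaluation_data(records):
    """Return a list of error strings; an empty list means the check passed."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of records"]
    errors = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            errors.append(f"record {index}: not a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in record:
                errors.append(f"record {index}: missing field '{field}'")
        doc_ids = record.get("correct_answer_document_ids")
        if doc_ids is not None and not isinstance(doc_ids, list):
            errors.append(f"record {index}: correct_answer_document_ids must be a list")
    return errors


sample = json.loads("""
[
    {
        "question": "What is the purpose of get_token()?",
        "correct_answer": "get_token() is used to retrieve an authentication token for secure API access.",
        "correct_answer_document_ids": ["core_api.html"]
    }
]
""")
print(validate_evaluation_data(sample))  # []
```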

To prepare the evaluation data:

import os, wget
from ibm_watsonx_ai.helpers import DataConnection

test_data_filename = "benchmarking_data_core_api.json"
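# Use the raw file URL so that wget downloads the JSON file rather than the GitHub HTML page
test_data_path = f"https://raw.githubusercontent.com/IBM/watsonx-ai-samples/master/cloud/data/autoai_rag/{test_data_filename}"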

if not os.path.isfile(test_data_filename):
    wget.download(test_data_path, out=test_data_filename)

test_asset_details = client.data_assets.create(name=test_data_filename, file_path=test_data_filename)
test_asset_id = client.data_assets.get_id(test_asset_details)

test_data_references = [DataConnection(data_asset_id=test_asset_id)]

Step 2: Configure the RAG optimizer

The rag_optimizer object provides a set of methods for working with the AutoAI RAG experiment. In this step, enter the details to define the experiment. These are the available configuration options:

Parameter	Description	Values
name	Experiment name	Enter a valid name
description	Experiment description	Optionally describe the experiment
embedding_models	Embedding models to try	`ibm/slate-125m-english-rtrvr`, `intfloat/multilingual-e5-large`
retrieval_methods	Retrieval methods to use	`simple`: retrieves and ranks all relevant documents; `window`: retrieves and ranks a fixed number of relevant documents
foundation_models	Foundation models to try	See Foundation models by task
max_number_of_rag_patterns	Maximum number of RAG patterns to create	4-20
optimization_metrics	Metric names to use for optimization	`faithfulness`, `answer_correctness`
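
The value ranges in the table can be checked locally before you call the optimizer. The following is a minimal sketch under stated assumptions: check_rag_config is a hypothetical helper, not an SDK function, and the allowed values are copied from the table rather than read from the service.

```python
# Allowed values copied from the configuration table.
ALLOWED_RETRIEVAL_METHODS = {"simple", "window"}
ALLOWED_METRICS = {"faithfulness", "answer_correctness"}
MIN_PATTERNS, MAX_PATTERNS = 4, 20


def check_rag_config(max_number_of_rag_patterns, optimization_metrics, retrieval_methods=()):
    """Return a list of problems; an empty list means the values match the table."""
    problems = []
    if not MIN_PATTERNS <= max_number_of_rag_patterns <= MAX_PATTERNS:
        problems.append(
            f"max_number_of_rag_patterns must be {MIN_PATTERNS}-{MAX_PATTERNS}, "
            f"got {max_number_of_rag_patterns}"
        )
    for metric in optimization_metrics:
        if metric not in ALLOWED_METRICS:
            problems.append(f"unknown optimization metric: {metric}")
    for method in retrieval_methods:
        if method not in ALLOWED_RETRIEVAL_METHODS:
            problems.append(f"unknown retrieval method: {method}")
    return problems


print(check_rag_config(5, ["answer_correctness"], ["window"]))  # []
print(check_rag_config(2, ["accuracy"]))
```

The second call reports two problems, because 2 is below the minimum number of patterns and accuracy is not a listed metric.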

This sample code shows the configuration options for running the experiment with the ibm-watsonx-ai SDK documentation:

from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, project_id=project_id)

rag_optimizer = experiment.rag_optimizer(
    name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
    description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
    max_number_of_rag_patterns=5,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)

Tip: You can modify the configuration using supported values as described in the configuration table.

rag_optimizer = experiment.rag_optimizer(
    name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
    description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
    embedding_models=["ibm/slate-125m-english-rtrvr"],
    foundation_models=["ibm/granite-13b-chat-v2","mistralai/mixtral-8x7b-instruct-v01"],
    max_number_of_rag_patterns=5,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)

Step 3: Run the experiment

Run the optimizer to create the RAG patterns using the specified configuration options. In this code sample for running a Chroma experiment, the task runs in interactive mode. To run the task in the background instead, set background_mode to True.

run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False
)
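
If you run the task in the background, you need some way to wait for it to finish. The following sketch shows a generic polling loop. It makes these assumptions: get_status is any callable that returns the current run status as a string, and the terminal state names are illustrative (failed and error are mentioned later in this topic, completed is assumed). The demo uses a fake status source so that it runs without a service connection.

```python
import time

# Assumed terminal state names; check them against what your run reports.
TERMINAL_STATES = {"completed", "failed", "error"}


def wait_for_run(get_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state, then return that state."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"run still '{status}' after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with a fake status source and a no-op sleep.
fake_statuses = iter(["running", "running", "completed"])
print(wait_for_run(lambda: next(fake_statuses), sleep=lambda seconds: None))  # completed
```

With the SDK, you would pass a status accessor of the optimizer as get_status. The exact method to use depends on your SDK version, so check the SDK reference.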

Step 4: Review the patterns and select the best one

After the AutoAI RAG experiment completes successfully, you can review the patterns. The summary method returns a Pandas DataFrame that lists the completed patterns with their evaluation metrics, ranked by performance against the optimized metric.

summary = rag_optimizer.summary()
summary

For example, pattern results display like this:

Pattern	mean_answer_correctness	mean_faithfulness	mean_context_correctness	chunking.chunk_size	embeddings.model_id	vector_store.distance_metric	retrieval.method	retrieval.number_of_chunks	generation.model_id
Pattern1	0.6802	0.5407	1.0000	512	ibm/slate-125m-english-rtrvr	euclidean	window	5	meta-llama/llama-3-70b-instruct
Pattern2	0.7172	0.5950	1.0000	1024	intfloat/multilingual-e5-large	euclidean	window	5	ibm/granite-13b-chat-v2
Pattern3	0.6543	0.5144	1.0000	1024	intfloat/multilingual-e5-large	euclidean	simple	5	ibm/granite-13b-chat-v2
Pattern4	0.6216	0.5030	1.0000	1024	intfloat/multilingual-e5-large	cosine	window	5	meta-llama/llama-3-70b-instruct
Pattern5	0.7369	0.5630	1.0000	1024	intfloat/multilingual-e5-large	cosine	window	3	mistralai/mixtral-8x7b-instruct-v01
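
To compare patterns on a metric other than the optimized one, you can re-rank the results yourself. The following sketch copies two metric columns from the example results into plain Python dictionaries so that it runs without pandas. The helper name rank_patterns is hypothetical.

```python
# Metric values copied from the example results table.
rows = [
    {"pattern": "Pattern1", "mean_answer_correctness": 0.6802, "mean_faithfulness": 0.5407},
    {"pattern": "Pattern2", "mean_answer_correctness": 0.7172, "mean_faithfulness": 0.5950},
    {"pattern": "Pattern3", "mean_answer_correctness": 0.6543, "mean_faithfulness": 0.5144},
    {"pattern": "Pattern4", "mean_answer_correctness": 0.6216, "mean_faithfulness": 0.5030},
    {"pattern": "Pattern5", "mean_answer_correctness": 0.7369, "mean_faithfulness": 0.5630},
]


def rank_patterns(rows, metric):
    """Return the rows sorted from the highest to the lowest value of the metric."""
    return sorted(rows, key=lambda row: row[metric], reverse=True)


print(rank_patterns(rows, "mean_answer_correctness")[0]["pattern"])  # Pattern5
print(rank_patterns(rows, "mean_faithfulness")[0]["pattern"])        # Pattern2
```

In this example, the pattern with the best answer correctness is not the one with the best faithfulness, so the choice of metric changes which pattern you would select.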

Select a pattern to test locally

The next step is to select a pattern and test it locally. Because Chroma is in-memory, you must recreate the document index.

Tip:

In the following code sample, the index is built with the documents core_api.html and fm_embeddings.html.

from langchain_community.document_loaders import WebBaseLoader

best_pattern = rag_optimizer.get_pattern()

urls = [
    "https://ibm.github.io/watsonx-ai-python-sdk/core_api.html",
    "https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html",
]
docs_list = WebBaseLoader(urls).load()
doc_splits = best_pattern.chunker.split_documents(docs_list)
best_pattern.indexing_function(doc_splits)

Query the RAG pattern locally.

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": ["How to use new approach of providing credentials to APIClient?"],
        }
    ]
}

best_pattern.query(payload)
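
To test a pattern with several questions, you can build one payload per question in the same shape as the payload shown above. This is a small sketch: build_query_payloads is a hypothetical helper, and the string "input_data" stands in for client.deployments.ScoringMetaNames.INPUT_DATA, which is an assumption about the value of that constant.

```python
def build_query_payloads(input_key, questions):
    """Build one query payload per question, mirroring the single-question payload."""
    return [{input_key: [{"values": [question]}]} for question in questions]


# "input_data" is an assumed stand-in for client.deployments.ScoringMetaNames.INPUT_DATA.
payloads = build_query_payloads(
    "input_data",
    [
        "How to use new approach of providing credentials to APIClient?",
        "What is the purpose of get_token()?",
    ],
)
for payload in payloads:
    print(payload)
```

In a live session, you would pass client.deployments.ScoringMetaNames.INPUT_DATA as input_key and call best_pattern.query(payload) for each payload.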

The model's response looks like this:

According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:


from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials

credentials = Credentials(
                   url = "https://us-south.ml.cloud.ibm.com",
                   token = "***********",
                  )

client = APIClient(credentials)


This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.

Tip:

To retrieve a specific pattern, pass the pattern number to rag_optimizer.get_pattern().

Reviewing experiment results in Cloud Object Storage

If the final status of the experiment is failed or error, use rag_optimizer.get_logs() or refer to experiment results to understand what went wrong. Experiment results and logs are stored in the default Cloud Object Storage instance linked to your account. By default, results are saved in the default_autoai_rag_out directory.

Results are organized by pattern. For example:

|-- Pattern1
|      | -- evaluation_results.json
|      | -- indexing_inference_notebook.ipynb (Chroma)
|-- Pattern2
|    ...
|-- training_status.json

Each pattern contains these results:

The evaluation_results.json file contains evaluation results for each benchmark question.
The indexing_inference_notebook.ipynb file contains the Python code for building the vector database index and for building the retrieval and generation functions. The notebook introduces commands for retrieving data, chunking, and creating embeddings, as well as for retrieving chunks, building prompts, and generating answers.
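
If you download the results directory to your machine, a short script can list what each pattern folder contains. The following sketch assumes a local copy that follows the layout shown above. The helper name list_pattern_results is hypothetical, and the demo builds a throwaway directory instead of reading real results.

```python
import tempfile
from pathlib import Path


def list_pattern_results(results_dir):
    """Map each PatternN folder name to the sorted names of the files it contains."""
    results = {}
    for entry in sorted(Path(results_dir).iterdir()):
        if entry.is_dir() and entry.name.startswith("Pattern"):
            results[entry.name] = sorted(p.name for p in entry.iterdir() if p.is_file())
    return results


# Demo on a temporary directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Pattern1", "Pattern2"):
        folder = Path(tmp) / name
        folder.mkdir()
        (folder / "evaluation_results.json").write_text("[]")
        (folder / "indexing_inference_notebook.ipynb").write_text("{}")
    (Path(tmp) / "training_status.json").write_text("{}")
    print(list_pattern_results(tmp))
```

Top-level files such as training_status.json are skipped because only pattern folders are listed.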

Note:

The results notebook indexing_inference_notebook.ipynb contains the code for embedding and indexing the documents. You can accelerate the document indexing task by changing vector_store.add_documents() to vector_store.add_documents_async().

Get inference and indexing notebook

To download a specific inference notebook, use the get_inference_notebook() method. If you leave pattern_name empty, the method downloads the notebook for the best computed pattern.

rag_optimizer.get_inference_notebook(pattern_name='Pattern3')