Coding an AutoAI RAG experiment with a Chroma vector store
Review the guidelines and code samples to learn how to code an AutoAI RAG experiment using the default, in-memory Chroma database as a vector store.
Storing vectorized content in a Chroma database
When you set up your AutoAI RAG experiment and don't specify a connection to a vector store, the vectorized content is saved to the default, in-memory Chroma database. The content does not persist beyond the experiment, so this is not a viable method for deploying a RAG pattern to production. However, it provides a fast path for creating a RAG pattern.
The following sections expand on the annotated sample code provided with the Automating RAG pattern with Chroma database notebook.
The notebook uses the watsonx.ai Python client library (version 1.1.11 or later).
Follow these steps to code an AutoAI RAG experiment for your use case.
Step 1: Prepare the prerequisites and set up the experiment
Before you use the sample code, you must perform the following setup tasks:
- Contact your Cloud Pak for Data administrator and ask them for your account credentials.
- Install and import the required modules and dependencies. For example:
pip install 'ibm-watsonx-ai[rag]>=1.1.11'
pip install "langchain_community>=0.3,<0.4"
- Connect to WML. Authenticate the Watson Machine Learning service on IBM Cloud Pak for Data. You need to provide the platform url, your username, and your api_key.
username = 'PASTE YOUR USERNAME HERE'
api_key = 'PASTE YOUR API_KEY HERE'
url = 'PASTE THE PLATFORM URL HERE'
Use these values to initialize the client. For example:
from ibm_watsonx_ai import APIClient, Credentials
credentials = Credentials(
    username=username,
    api_key=api_key,
    url=url,
    instance_id="openshift",
    version="5.1"
)
client = APIClient(credentials)
Alternatively, you can use your username and password to authenticate to WML services.
credentials = Credentials(
    username="***",
    password="***",
    url="***",
    instance_id="openshift",
    version="5.1"
)
client = APIClient(credentials)
- Create a project or space for your work. See Creating a project.
- Set a default project or space. For example:
client.set.default_project("<Project ID>")
- Prepare the grounding documents.
- Prepare the evaluation data.
Grounding documents
Prepare and connect to the grounding documents you will use to run the RAG experiment. For details, see Getting and preparing data in a project.
- Supported formats: PDF, HTML, DOCX, Markdown, plain text
- Connect to data in a Cloud Object Storage bucket, a folder in a bucket, or specify up to 20 files.
- AutoAI uses a sample of the documents for running the experiment
For example, to create a data connection when documents are stored in a Cloud Object Storage bucket:
from ibm_watsonx_ai.helpers import DataConnection, S3Location
conn_meta_props= {
client.connections.ConfigurationMetaNames.NAME: f"Connection to input data - {datasource_name} ",
client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: client.connections.get_datasource_type_id_by_name(datasource_name),
client.connections.ConfigurationMetaNames.DESCRIPTION: "ibm-watsonx-ai SDK documentation",
client.connections.ConfigurationMetaNames.PROPERTIES: {
'bucket': <BUCKET_NAME>,
'access_key': <ACCESS_KEY>,
'secret_key': <SECRET_ACCESS_KEY>,
'iam_url': 'https://iam.cloud.ibm.com/identity/token',
'url': <ENDPOINT_URL>
}
}
conn_details = client.connections.create(meta_props=conn_meta_props)
cos_connection_id = client.connections.get_id(conn_details)
input_data_references = [DataConnection(
connection_asset_id=cos_connection_id,
location=S3Location(
bucket=<BUCKET_NAME>,
path=<BUCKET_PATH>
)
)]
The following example shows how to use a data asset created in the project (or promoted to the space). core_api.html is an example of a grounding document file used in the sample notebooks.
import os, wget
from ibm_watsonx_ai.helpers import DataConnection
input_data_filename = "core_api.html"
input_data_path = f"https://ibm.github.io/watsonx-ai-python-sdk/{input_data_filename}"
if not os.path.isfile(input_data_filename):
wget.download(input_data_path, out=input_data_filename)
asset_details = client.data_assets.create(input_data_filename, input_data_filename)
asset_id = client.data_assets.get_id(asset_details)
asset_id
input_data_references = [DataConnection(data_asset_id=asset_id)]
input_data_references supports up to 20 DataConnection instances.
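For example, a brief sketch that references two data assets; second_asset_id is a hypothetical ID for another asset created the same way as asset_id:
# Hypothetical example: combine multiple document assets (up to 20) in one experiment.
input_data_references = [
    DataConnection(data_asset_id=asset_id),
    DataConnection(data_asset_id=second_asset_id),
]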
Evaluation data
- Evaluation data must be in JSON format with a fixed schema that contains these fields: question, correct_answer, correct_answer_document_ids
For example:
[
{
"question": "What is the purpose of get_token()?",
"correct_answer": "get_token() is used to retrieve an authentication token for secure API access.",
"correct_answer_document_ids": [
"core_api.html"
]
},
{
"question": "How does the delete_model() function operate?",
"correct_answer": "delete_model() method allows users to delete models they've created or managed.",
"correct_answer_document_ids": [
"core_api.html"
]
}
]
To prepare the evaluation data:
import os, wget
from ibm_watsonx_ai.helpers import DataConnection
test_data_filename = "benchmarking_data_core_api.json"
test_data_path = f"https://github.com/IBM/watsonx-ai-samples/blob/master/cloud/data/autoai_rag/{test_data_filename}"
if not os.path.isfile(test_data_filename):
wget.download(test_data_path, out=test_data_filename)
test_asset_details = client.data_assets.create(name=test_data_filename, file_path=test_data_filename)
test_asset_id = client.data_assets.get_id(test_asset_details)
test_data_references = [DataConnection(data_asset_id=test_asset_id)]
Step 2: Configure the RAG optimizer
The rag_optimizer
object provides a set of methods for working with the AutoAI RAG experiment. In this step, enter the details to define the experiment. These are the available configuration options:
Parameter | Description | Values |
---|---|---|
name | Experiment name | Enter a valid name |
description | Experiment description | Optionally describe the experiment |
embedding_models | Embedding models to try | ibm/slate-125m-english-rtrvr, intfloat/multilingual-e5-large |
retrieval_methods | Retrieval methods to use | simple retrieves and ranks all relevant documents; window retrieves and ranks a fixed number of relevant documents |
foundation_models | Foundation models to try | See Foundation models by task |
max_number_of_rag_patterns | Maximum number of RAG patterns to create | 4-20 |
optimization_metrics | Metric name(s) to use for optimization | faithfulness, answer_correctness |
This sample code shows the configuration options for running the experiment with the ibm-watsonx-ai SDK documentation:
from ibm_watsonx_ai.experiment import AutoAI
experiment = AutoAI(credentials, project_id=project_id)
rag_optimizer = experiment.rag_optimizer(
name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
max_number_of_rag_patterns=5,
optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
To narrow the search space, you can also specify which embedding models and foundation models the experiment tries. For example:
rag_optimizer = experiment.rag_optimizer(
name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
embedding_models=["ibm/slate-125m-english-rtrvr"],
foundation_models=["ibm/granite-13b-chat-v2","mistralai/mixtral-8x7b-instruct-v01"],
max_number_of_rag_patterns=5,
optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
Step 3: Run the experiment
Run the optimizer to create the RAG patterns with the specified configuration options. In this code sample for running a Chroma experiment, the task runs in interactive mode. To run the task in the background instead, set background_mode to True.
run_details = rag_optimizer.run(
input_data_references=input_data_references,
test_data_references=test_data_references,
background_mode=False
)
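If you run the experiment in the background, you can poll for completion before you retrieve results. The following is a minimal sketch; it assumes that rag_optimizer exposes the same get_run_status and get_run_details methods as other AutoAI optimizers in the ibm-watsonx-ai SDK, so verify the method names against your installed version.
run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=True
)
# Assumption: these status methods mirror the standard AutoAI optimizer interface.
rag_optimizer.get_run_status()    # for example, 'running' or 'completed'
rag_optimizer.get_run_details()   # full metadata for the experiment run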
Step 4: Review the patterns and select the best one
After the AutoAI RAG experiment completes successfully, you can review the patterns. Use the summary method to list the completed patterns and their evaluation metrics as a Pandas DataFrame, ranked according to performance against the optimized metric.
summary = rag_optimizer.summary()
summary
For example, pattern results display like this:
Pattern | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.chunk_size | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
---|---|---|---|---|---|---|---|---|---|
Pattern1 | 0.6802 | 0.5407 | 1.0000 | 512 | ibm/slate-125m-english-rtrvr | euclidean | window | 5 | meta-llama/llama-3-70b-instruct |
Pattern2 | 0.7172 | 0.5950 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 5 | ibm/granite-13b-chat-v2 |
Pattern3 | 0.6543 | 0.5144 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | simple | 5 | ibm/granite-13b-chat-v2 |
Pattern4 | 0.6216 | 0.5030 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 5 | meta-llama/llama-3-70b-instruct |
Pattern5 | 0.7369 | 0.5630 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 3 | mistralai/mixtral-8x7b-instruct-v01 |
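Before you select a pattern, you may want to inspect the full settings behind one row of the summary table. The following is a sketch only; it assumes that rag_optimizer provides a get_pattern_details method that accepts a pattern_name argument, which you should confirm for your SDK version.
# Assumption: get_pattern_details returns the chunking, embedding, retrieval,
# and generation configuration for the named pattern.
pattern_details = rag_optimizer.get_pattern_details(pattern_name="Pattern2")
print(pattern_details)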
Select a pattern to test locally
The next step is to select a pattern and test it locally. Because Chroma is in-memory, you must re-create the document index.
In the following code sample, the index is built with the documents core_api.html and fm_embeddings.html.
from langchain_community.document_loaders import WebBaseLoader
best_pattern = rag_optimizer.get_pattern()
urls = [
"https://ibm.github.io/watsonx-ai-python-sdk/core_api.html",
"https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html",
]
docs_list = WebBaseLoader(urls).load()
doc_splits = best_pattern.chunker.split_documents(docs_list)
best_pattern.indexing_function(doc_splits)
Query the RAG pattern locally.
payload = {
client.deployments.ScoringMetaNames.INPUT_DATA: [
{
"values": ["How to use new approach of providing credentials to APIClient?"],
}
]
}
best_pattern.query(payload)
The model's response looks like this:
According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials
credentials = Credentials(
url = "https://us-south.ml.cloud.ibm.com",
token = "***********",
)
client = APIClient(credentials)
This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.
To retrieve a specific pattern, pass the pattern name to rag_optimizer.get_pattern().
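For example, a short sketch that retrieves Pattern5 instead of the top-ranked pattern; it assumes get_pattern accepts the same pattern_name argument that get_inference_notebook uses:
# Assumption: pattern_name matches a row in the summary table, such as "Pattern5".
pattern_5 = rag_optimizer.get_pattern(pattern_name="Pattern5")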
Get inference and indexing notebook
To download a specified inference notebook, use the get_inference_notebook() method. If you leave pattern_name empty, the method downloads the notebook of the best computed pattern.
rag_optimizer.get_inference_notebook(pattern_name='Pattern3')
Parent topic: Automating a RAG pattern with the AutoAI SDK