Coding an AutoAI RAG experiment with a Milvus vector store

Review the guidelines and code samples to learn how to code an AutoAI RAG experiment with a Milvus database as a vector store.

Important: This feature is a beta release. It is not intended for production use.

For an enterprise or production RAG solution, set up a vector database with Milvus. The vectorized content persists for future patterns and integrations. For details, see Working with Milvus.

The following sections expand on the annotated sample code provided with the Automating RAG pattern with Milvus database notebook.

The notebook uses the watsonx.ai Python client library (version 1.1.11 or later).

Follow these steps to code an AutoAI RAG experiment for your use case.

  1. Prepare the data and set up the experiment
  2. Configure the RAG optimizer
  3. Run the experiment
  4. Review the patterns and select the best one
  5. Deploy a pattern

Step 1: Prepare the data and set up the experiment

Set up the client and prepare the data for the experiment.

  • Install and import the required modules and dependencies. For example:
pip install 'ibm-watsonx-ai[rag]>=1.1.11'
pip install langchain-community==0.2.4

from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.mydomain.com",
    api_key="***********",
)

client = APIClient(credentials)
client.set.default_project("<Project ID>")
client.set.default_space("<Space GUID>")
  • Prepare the grounding documents
  • Prepare the evaluation data
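
The client setup shown in the first bullet hard-codes the URL and API key. As a minimal sketch, assuming you prefer to keep those values out of the notebook, the following helper reads them from environment variables. The helper and the variable names (WATSONX_URL, WATSONX_APIKEY, WATSONX_PROJECT_ID, WATSONX_SPACE_ID) are hypothetical and are not part of the watsonx.ai client library.

```python
import os


def load_watsonx_settings(env=None):
    """Collect connection settings from environment variables.

    The variable names are hypothetical; rename them to match your setup.
    """
    env = os.environ if env is None else env
    required = ["WATSONX_URL", "WATSONX_APIKEY", "WATSONX_PROJECT_ID"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "url": env["WATSONX_URL"],
        "api_key": env["WATSONX_APIKEY"],
        "project_id": env["WATSONX_PROJECT_ID"],
        # Optional: only needed later, when you deploy a pattern to a space.
        "space_id": env.get("WATSONX_SPACE_ID"),
    }
```

You could then pass settings["url"] and settings["api_key"] to Credentials, and settings["project_id"] to client.set.default_project().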

Grounding documents

Prepare and connect to the grounding documents you use to run the RAG experiment. For details, see Getting and preparing data in a project.

  • Supported formats: PDF, HTML, DOCX, plain text
  • Connect to data in a Cloud Object Storage bucket, a folder in a bucket, or specify up to 20 files.
  • AutoAI uses a sample of the documents to run the experiment.
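
The constraints above can be checked before you create any data connections. The following sketch is a hypothetical local helper, not part of the client library. The mapping of the supported formats to file extensions is an assumption.

```python
from pathlib import Path

# Assumed mapping of the supported formats (PDF, HTML, DOCX, plain text) to extensions.
SUPPORTED_EXTENSIONS = {".pdf", ".html", ".docx", ".txt"}
MAX_FILES = 20  # limit stated above for individually specified files


def select_grounding_files(paths):
    """Return the files with a supported extension, raising if more than MAX_FILES remain."""
    selected = [Path(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS]
    if len(selected) > MAX_FILES:
        raise ValueError(f"{len(selected)} files selected; the limit is {MAX_FILES}")
    return selected


files = select_grounding_files(["core_api.html", "notes.md", "guide.PDF"])
print([f.name for f in files])  # ['core_api.html', 'guide.PDF']
```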

For example, to create a data connection when documents are stored in a Cloud Object Storage bucket:

from ibm_watsonx_ai.helpers import DataConnection, S3Location

# datasource_name must already hold the name of your Cloud Object Storage data source type.
conn_meta_props = {
    client.connections.ConfigurationMetaNames.NAME: f"Connection to input data - {datasource_name}",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: client.connections.get_datasource_type_id_by_name(datasource_name),
    client.connections.ConfigurationMetaNames.DESCRIPTION: "ibm-watsonx-ai SDK documentation",
    client.connections.ConfigurationMetaNames.PROPERTIES: {
        'bucket': <BUCKET_NAME>,
        'access_key': <ACCESS_KEY>,
        'secret_key': <SECRET_ACCESS_KEY>,
        'iam_url': 'https://iam.cloud.ibm.com/identity/token',
        'url': <ENDPOINT_URL>
    }
}

conn_details = client.connections.create(meta_props=conn_meta_props)
cos_connection_id = client.connections.get_id(conn_details)

input_data_references = [DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(
        bucket=<BUCKET_NAME>,
        path=<BUCKET_PATH>
    )
)]

The following example demonstrates how to use the data asset created in the project (or promoted to the space):

Note:

core_api.html is an example of a grounding document file used in the sample notebooks.

import os, wget
from ibm_watsonx_ai.helpers import DataConnection

input_data_filename = "core_api.html"
input_data_path = f"https://ibm.github.io/watsonx-ai-python-sdk/{input_data_filename}"

if not os.path.isfile(input_data_filename):
    wget.download(input_data_path, out=input_data_filename)

asset_details = client.data_assets.create(name=input_data_filename, file_path=input_data_filename)
asset_id = client.data_assets.get_id(asset_details)

input_data_references = [DataConnection(data_asset_id=asset_id)]
Tip:

input_data_references supports up to 20 DataConnection instances.

Evaluation data

Evaluation data must be in JSON format with a fixed schema that contains these fields: question, correct_answer, and correct_answer_document_ids.

For example:

[
    {
        "question": "What is the purpose of get_token()?",
        "correct_answer": "get_token() is used to retrieve an authentication token for secure API access.",
        "correct_answer_document_ids": [
            "core_api.html"
        ]
    },
    {
        "question": "How does the delete_model() function operate?",
        "correct_answer": "delete_model() method allows users to delete models they've created or managed.",
        "correct_answer_document_ids": [
            "core_api.html"
        ]
    }
]
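
Before you upload the evaluation file, it can help to check it against this schema locally. The following sketch is a hypothetical helper that uses only the Python standard library. The requirement that correct_answer_document_ids is a list of strings is inferred from the example above, not from a published specification.

```python
import json

REQUIRED_FIELDS = ("question", "correct_answer", "correct_answer_document_ids")


def validate_evaluation_data(records):
    """Return a list of problems found; an empty list means no problems were detected."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of records"]
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append(f"record {i}: missing field '{field}'")
        ids = record.get("correct_answer_document_ids")
        # Assumption: document IDs are given as a list of strings, as in the example.
        if ids is not None and not (isinstance(ids, list) and all(isinstance(d, str) for d in ids)):
            problems.append(f"record {i}: correct_answer_document_ids must be a list of strings")
    return problems


sample = json.loads('[{"question": "What does get_token() do?", "correct_answer": "It retrieves a token."}]')
print(validate_evaluation_data(sample))
# ["record 0: missing field 'correct_answer_document_ids'"]
```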

To prepare the evaluation data:

import os, wget
from ibm_watsonx_ai.helpers import DataConnection

test_data_filename = "benchmarking_data_core_api.json"
test_data_path = f"https://github.com/IBM/watson-machine-learning-samples/raw/master/cloud/data/autoai_rag/{test_data_filename}"

if not os.path.isfile(test_data_filename): 
    wget.download(test_data_path, out=test_data_filename)

test_asset_details = client.data_assets.create(name=test_data_filename, file_path=test_data_filename)
test_asset_id = client.data_assets.get_id(test_asset_details)

test_data_references = [DataConnection(data_asset_id=test_asset_id)]

Connecting to a Milvus vector database

This code snippet demonstrates how to connect to a Milvus vector database.

Note: If you have an existing Milvus connection, you do not need to create the connection.
from ibm_watsonx_ai.helpers import DataConnection

milvus_data_source_type_id = client.connections.get_datasource_type_uid_by_name("milvus")
details = client.connections.create(
    {
        client.connections.ConfigurationMetaNames.NAME: "Milvus Connection",
        client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: milvus_data_source_type_id,
        client.connections.ConfigurationMetaNames.PROPERTIES: {
            "host": <PASTE MILVUS HOST HERE>,
            "port": <PASTE MILVUS PORT HERE>,
            "username": <PASTE MILVUS USERNAME HERE>,
            "password": <PASTE MILVUS PASSWORD HERE>,
            "ssl": True,
        },
    }
)

milvus_connection_id = client.connections.get_id(details)
vector_store_references = [DataConnection(connection_asset_id=milvus_connection_id)]
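
The placeholders in the PROPERTIES dictionary are easy to mistype. As a rough sketch, the following hypothetical helper assembles the same keys that the snippet above uses and runs basic checks first. It is not part of the client library. The port check only confirms that the value is a valid TCP port number, and the value is passed through unchanged.

```python
def milvus_connection_properties(host, port, username, password, ssl=True):
    """Build the PROPERTIES dictionary for a Milvus connection after basic checks."""
    if not host or not str(host).strip():
        raise ValueError("host must not be empty")
    if not 0 < int(port) < 65536:  # int() also rejects non-numeric strings
        raise ValueError(f"port out of range: {port}")
    return {
        "host": str(host).strip(),
        "port": port,  # returned as given; adjust the type if your connection requires it
        "username": username,
        "password": password,
        "ssl": bool(ssl),
    }


# Illustrative values only; 19530 is the default Milvus port.
props = milvus_connection_properties("milvus.example.com", 19530, "user", "secret")
print(sorted(props))  # ['host', 'password', 'port', 'ssl', 'username']
```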

Step 2: Configure the RAG optimizer

The rag_optimizer object provides a set of methods for working with the AutoAI RAG experiment. In this step, enter the details to define the experiment. The available configuration options are as follows:

Parameter | Description | Values
name | Enter a valid name | Experiment name
description | Experiment description | Optionally describe the experiment
embedding_models | Embedding models to try | ibm/slate-125m-english-rtrvr, intfloat/multilingual-e5-large
retrieval_methods | Retrieval methods to use | simple (retrieves and ranks all relevant documents), window (retrieves and ranks a fixed number of relevant documents)
foundation_models | Foundation models to try | See Foundation models by task
max_number_of_rag_patterns | Maximum number of RAG patterns to create | 4-20
optimization_metrics | Metric name(s) to use for optimization | faithfulness, answer_correctness

The following sample code shows the configuration options for running the experiment with the ibm-watsonx-ai SDK documentation:

from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, project_id=project_id)

rag_optimizer = experiment.rag_optimizer(
    name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
    description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
    max_number_of_rag_patterns=5,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
Tip: You can modify the configuration by using supported values as described in the configuration table.
rag_optimizer = experiment.rag_optimizer(
    name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
    description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
    embedding_models=["ibm/slate-125m-english-rtrvr"],
    foundation_models=["ibm/granite-13b-chat-v2","mistralai/mixtral-8x7b-instruct-v01"],
    max_number_of_rag_patterns=5,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
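
The value ranges in the configuration table can be checked locally before you create the optimizer. The following sketch is a hypothetical helper, not part of the SDK. It assumes that retrieval methods and metrics are passed as the plain strings listed in the table.

```python
ALLOWED_RETRIEVAL_METHODS = {"simple", "window"}
ALLOWED_METRICS = {"faithfulness", "answer_correctness"}


def check_rag_config(max_number_of_rag_patterns, retrieval_methods=(), optimization_metrics=()):
    """Return a list of configuration errors; an empty list means the values are in range."""
    errors = []
    if not 4 <= max_number_of_rag_patterns <= 20:
        errors.append("max_number_of_rag_patterns must be between 4 and 20")
    for method in retrieval_methods:
        if method not in ALLOWED_RETRIEVAL_METHODS:
            errors.append(f"unsupported retrieval method: {method}")
    for metric in optimization_metrics:
        if metric not in ALLOWED_METRICS:
            errors.append(f"unsupported optimization metric: {metric}")
    return errors


print(check_rag_config(5, ["window"], ["answer_correctness"]))  # []
print(check_rag_config(3, ["hybrid"]))
# ['max_number_of_rag_patterns must be between 4 and 20', 'unsupported retrieval method: hybrid']
```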

Step 3: Run the experiment

Run the optimizer to create the RAG patterns by using the specified configuration options. In this code sample, the task runs in interactive mode, so the call returns only after the experiment finishes. To run the task in the background instead, set background_mode to True.

run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    vector_store_references=vector_store_references,
    background_mode=False
)
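
If you start the run with background_mode=True, you need some way to wait for it to finish. The following sketch shows a generic polling loop. The get_status argument is any zero-argument callable that returns the current status string, for example a wrapper around a status method of the optimizer, if your client version provides one. The set of terminal status names is an assumption.

```python
import time


def wait_for_run(get_status, poll_seconds=30.0, timeout_seconds=3600.0,
                 terminal=("completed", "failed", "error", "canceled")):
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Demonstration with a stand-in status source instead of a real experiment.
fake_statuses = iter(["running", "running", "completed"])
print(wait_for_run(lambda: next(fake_statuses), poll_seconds=0))  # completed
```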

Step 4: Review the patterns and select the best one

After the AutoAI RAG experiment completes successfully, review the patterns. The summary method returns the completed patterns and their evaluation metrics as a pandas DataFrame, ranked by performance on the optimization metric.

summary = rag_optimizer.summary()
summary

For example, pattern results display like this:

Pattern | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.chunk_size | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id
Pattern1 | 0.6802 | 0.5407 | 1.0000 | 512 | ibm/slate-125m-english-rtrvr | euclidean | window | 5 | meta-llama/llama-3-70b-instruct
Pattern2 | 0.7172 | 0.5950 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 5 | ibm/granite-13b-chat-v2
Pattern3 | 0.6543 | 0.5144 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | simple | 5 | ibm/granite-13b-chat-v2
Pattern4 | 0.6216 | 0.5030 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 5 | meta-llama/llama-3-70b-instruct
Pattern5 | 0.7369 | 0.5630 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 3 | mistralai/mixtral-8x7b-instruct-v01
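
The best pattern can differ depending on which metric you rank by. As a small illustration, the following sketch copies two metric columns from the example results above into plain Python dictionaries and sorts them. The rank_patterns helper is hypothetical. With a real summary DataFrame, you would sort the DataFrame itself instead.

```python
# Values copied from the example results above.
rows = [
    {"pattern": "Pattern1", "mean_answer_correctness": 0.6802, "mean_faithfulness": 0.5407},
    {"pattern": "Pattern2", "mean_answer_correctness": 0.7172, "mean_faithfulness": 0.5950},
    {"pattern": "Pattern3", "mean_answer_correctness": 0.6543, "mean_faithfulness": 0.5144},
    {"pattern": "Pattern4", "mean_answer_correctness": 0.6216, "mean_faithfulness": 0.5030},
    {"pattern": "Pattern5", "mean_answer_correctness": 0.7369, "mean_faithfulness": 0.5630},
]


def rank_patterns(rows, metric):
    """Return the pattern names ordered from the highest to the lowest metric value."""
    ordered = sorted(rows, key=lambda row: row[metric], reverse=True)
    return [row["pattern"] for row in ordered]


print(rank_patterns(rows, "mean_answer_correctness"))
# ['Pattern5', 'Pattern2', 'Pattern1', 'Pattern3', 'Pattern4']
print(rank_patterns(rows, "mean_faithfulness"))
# ['Pattern2', 'Pattern5', 'Pattern1', 'Pattern3', 'Pattern4']
```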

Select a pattern to test locally

The next step is to select a pattern and test it locally.

best_pattern = rag_optimizer.get_pattern()
payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": ["How to use new approach of providing credentials to APIClient?"],
        }
    ]
}

resp = best_pattern.query(payload)
print(resp["predictions"][0]["values"][0][0])

Model's response:

According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:


from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials

credentials = Credentials(
                   url = "https://us-south.ml.cloud.ibm.com",
                   token = "***********",
                  )

client = APIClient(credentials)


This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.

Tip:

To retrieve a specific pattern, pass the pattern name to rag_optimizer.get_pattern().
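
The sample above reads the answer with a chain of index lookups, which fails with a bare KeyError or IndexError if the response has a different shape. The following sketch wraps the same lookup in a hypothetical helper that reports the unexpected response instead. The response layout is taken from the print statement above and is otherwise an assumption.

```python
def first_answer(resp):
    """Return the first answer from a response shaped like the one printed above."""
    try:
        return resp["predictions"][0]["values"][0][0]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {resp!r}") from exc


# Stand-in response used only to demonstrate the helper.
fake_resp = {"predictions": [{"values": [["Use the Credentials class."]]}]}
print(first_answer(fake_resp))  # Use the Credentials class.
```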

Step 5: Deploy a pattern

After you test a pattern locally, you can deploy the pattern to get the endpoint and include it in apps. Deployment is done by storing the defined RAG function, then creating a deployed asset. For more information on deployments, see Deploying and managing AI assets and Online deployments.

To create the deployment:

deployment_details = best_pattern.deploy(
    name="AutoAI RAG deployment - ibm_watsonx_ai documentation",
    space_id=space_id
)

Retrieve the deployment ID and the scoring endpoint for the deployed asset.

deployment_id = client.deployments.get_id(deployment_details)
deployment_scoring_href = client.deployments.get_scoring_href(deployment_details)
print(deployment_scoring_href)

The RAG service is now deployed in a space and available to test.

Testing the deployed pattern

This code sample demonstrates how to test the deployed solution. Enter test questions in the payload by using the following format:

questions = ["How to use new approach of providing credentials to APIClient?"]

payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [
        {
            "values": questions,
            "access_token": client.service_instance._get_token()
        }
    ]
}

resp = client.deployments.score(deployment_id, payload)
print(resp["predictions"][0]["values"][0][0])

Model's response:

According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:


from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials

credentials = Credentials(
                   url = "https://us-south.ml.cloud.ibm.com",
                   token = "***********",
                  )

client = APIClient(credentials)


This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.
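
The local and deployed payloads have the same structure, apart from the access_token entry. The following sketch builds either form with a hypothetical helper. It assumes that client.deployments.ScoringMetaNames.INPUT_DATA resolves to the string "input_data". Pass a different input_key if that is not the case in your client version.

```python
def build_scoring_payload(questions, access_token=None, input_key="input_data"):
    """Build a scoring payload in the format shown above from one or more questions."""
    if isinstance(questions, str):
        questions = [questions]
    cleaned = [q.strip() for q in questions if q and q.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty question is required")
    entry = {"values": cleaned}
    if access_token is not None:
        entry["access_token"] = access_token  # used for the deployed pattern only
    return {input_key: [entry]}


payload = build_scoring_payload("How to use new approach of providing credentials to APIClient?")
print(payload)
# {'input_data': [{'values': ['How to use new approach of providing credentials to APIClient?']}]}
```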

Reviewing experiment results in Cloud Object Storage

If the final status of the experiment is failed or error, use rag_optimizer.get_logs() or refer to experiment results to understand what went wrong. Experiment results and logs are stored in the default Cloud Object Storage instance that is linked to your account. By default, results are saved in the default_autoai_rag_out directory.

Results are organized by pattern. For example:

|-- Pattern1
|      | -- evaluation_results.json
|      | -- indexing_notebook.ipynb (Milvus)
|      | -- inference_notebook.ipynb (Milvus)
|-- Pattern2
|    ...
|-- training_status.json

Each pattern contains these results:

  • The evaluation_results.json file contains evaluation results for each benchmark question.
  • The indexing_notebook.ipynb notebook contains the Python code for building a vector database index. It introduces commands for retrieving data, chunking, and creating embeddings.
  • The inference_notebook.ipynb notebook focuses on retrieving relevant passages from a knowledge base for user queries and generating responses by feeding the retrieved passages into a large language model.
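
If you download the results directory to your local machine, you can check that each pattern folder contains the files listed above. The following sketch is a hypothetical helper that uses only the standard library. It assumes that "(Milvus)" in the example tree is an annotation and not part of the file names.

```python
from pathlib import Path

EXPECTED_FILES = ("evaluation_results.json", "indexing_notebook.ipynb", "inference_notebook.ipynb")


def missing_result_files(results_dir):
    """Map each pattern folder name to the expected files that are missing from it."""
    root = Path(results_dir)
    report = {}
    for pattern_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        report[pattern_dir.name] = [
            name for name in EXPECTED_FILES if not (pattern_dir / name).is_file()
        ]
    return report
```

For example, missing_result_files("default_autoai_rag_out") returns a dictionary in which an empty list means that the pattern folder is complete.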

You can review the notebooks or run them by adding authentication credentials.

Note:

The results notebook indexing_notebook.ipynb contains the code for embedding and indexing the documents. You can accelerate the document indexing task by changing vector_store.add_documents() to vector_store.add_documents_async().

Get inference and indexing notebooks

To download the inference notebook for a specific pattern from the service, use get_inference_notebook(). If you leave pattern_name empty, the method downloads the notebook for the highest-ranked pattern.

rag_optimizer.get_inference_notebook(pattern_name='Pattern3')

Next steps

  • Run the inference notebook with new questions to use the selected RAG pattern.
  • Use the indexed documents from this experiment in the Prompt Lab to ground prompts for a foundation model. See Using an AutoAI Rag index to chat with documents.
