Coding an AutoAI RAG experiment with text extraction
Review the guidelines and code samples to learn how to code an AutoAI RAG experiment by using watsonx text extraction to process input documents.
You can use text extraction to process input documents for an AutoAI RAG experiment. Text extraction transforms high-quality business documents with tables, images, and diagrams into markdown format. The resulting markdown files can then be used in an AutoAI RAG experiment to enhance the quality of generated patterns.
The text extraction service uses the watsonx.ai Python client library (version 1.1.11 or later). For more information about using text extraction with the watsonx.ai Python SDK, see Text Extractions.
Follow these steps to use text extraction in your AutoAI RAG experiment.
- Prepare the prerequisites and set up the experiment
- Process input documents with text extraction
- Configure the RAG optimizer
- Run the experiment
- Review the patterns and select the best one
Step 1: Prepare the prerequisites and set up the experiment
Prepare the prerequisites for the experiment.
Before you use the sample code, you must perform the following setup tasks:
- Contact your Cloud Pak for Data administrator and ask them for your account credentials.
- Install and import the required modules and dependencies. For example:

pip install 'ibm-watsonx-ai[rag]>=1.1.11'
pip install "langchain_community>=0.3,<0.4"
- Connect to WML. Authenticate the Watson Machine Learning service on IBM Cloud Pak for Data. You need to provide the platform url, your username, and your api_key.

username = 'PASTE YOUR USERNAME HERE'
api_key = 'PASTE YOUR API_KEY HERE'
url = 'PASTE THE PLATFORM URL HERE'
- Use these values to initialize the client. For example:

from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    username=username,
    api_key=api_key,
    url=url,
    instance_id="openshift",
    version="5.1"
)

client = APIClient(credentials)
Alternatively, you can use your username and password to authenticate WML services.

credentials = Credentials(
    username="***",
    password="***",
    url="***",
    instance_id="openshift",
    version="5.1"
)

client = APIClient(credentials)
- Create a space for your work. See Creating a space. You can also create the space with the Python client, as shown in the sketch below.
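If you prefer to create the space programmatically, a minimal sketch with the Python client looks like this. The space name shown here is a hypothetical example; adjust the metadata for your environment.

# A sketch: create a deployment space and fetch its GUID.
space_meta_props = {
    client.spaces.ConfigurationMetaNames.NAME: "autoai-rag-space",  # hypothetical name
    client.spaces.ConfigurationMetaNames.DESCRIPTION: "Space for the AutoAI RAG experiment",
}
space_details = client.spaces.store(meta_props=space_meta_props)
space_id = client.spaces.get_id(space_details)
print(space_id)  # use this GUID when you set the default space in the next step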
- Set a default space. For example:

client.set.default_space("<Space GUID>")
- Prepare the grounding documents.
- Prepare the evaluation data.
Grounding documents
Prepare and connect to the grounding documents that you will use to run the AutoAI RAG experiment with the text extraction service.
- Create a connection to Cloud Object Storage and fetch the connection ID.

# datasource_name identifies the connection type,
# for example 'bluemixcloudobjectstorage' for IBM Cloud Object Storage
conn_meta_props = {
    client.connections.ConfigurationMetaNames.NAME: f"Connection to input data - {datasource_name}",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: client.connections.get_datasource_type_id_by_name(datasource_name),
    client.connections.ConfigurationMetaNames.DESCRIPTION: "ibm-watsonx-ai SDK documentation",
    client.connections.ConfigurationMetaNames.PROPERTIES: {
        'bucket': <BUCKET_NAME>,
        'access_key': <ACCESS_KEY>,
        'secret_key': <SECRET_ACCESS_KEY>,
        'iam_url': 'https://iam.cloud.ibm.com/identity/token',
        'url': <ENDPOINT_URL>
    }
}

conn_details = client.connections.create(meta_props=conn_meta_props)
cos_connection_id = client.connections.get_id(conn_details)
- Prepare two connection assets: one for the input and one for the text extraction service output.

from ibm_watsonx_ai.helpers import DataConnection, S3Location

input_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(
        bucket=<BUCKET_NAME>,
        path=<TEXT EXTRACTION INPUT FILENAME>
    ),
)
input_data_reference.set_client(client)

result_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(
        bucket=<BUCKET_NAME>,
        path=<TEXT EXTRACTION OUTPUT FILENAME>
    )
)
result_data_reference.set_client(client)
Evaluation data
For evaluation data input:
- Data must be in JSON format with a fixed schema that has these fields: question, correct_answer, correct_answer_document_ids
- correct_answer_document_ids must refer to the text extraction service output file
benchmarking_data = [
    {
        "question": "What are the two main variants of Granite Code models?",
        "correct_answer": "The two main variants are Granite Code Base and Granite Code Instruct.",
        "correct_answer_document_ids": [<TEXT EXTRACTION OUTPUT FILENAME>]
    },
    {
        "question": "What is the purpose of Granite Code Instruct models?",
        "correct_answer": "Granite Code Instruct models are finetuned for instruction-following tasks using datasets like CommitPack, OASST, HelpSteer, and synthetic code instruction datasets, aiming to improve reasoning and instruction-following capabilities.",
        "correct_answer_document_ids": [<TEXT EXTRACTION OUTPUT FILENAME>]
    },
    {
        "question": "What is the licensing model for Granite Code models?",
        "correct_answer": "Granite Code models are released under the Apache 2.0 license, ensuring permissive and enterprise-friendly usage.",
        "correct_answer_document_ids": [<TEXT EXTRACTION OUTPUT FILENAME>]
    },
]
import os
import json

test_filename = "benchmark.json"
if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data, json_file, indent=4)

# cos_client is your Cloud Object Storage client (for example, created with ibm_boto3)
# and cos_bucket_name is the bucket that holds the experiment data
cos_client.upload_file(test_filename, cos_bucket_name, test_filename)
test_data_reference = DataConnection(
connection_asset_id=cos_connection_id,
location=S3Location(bucket=cos_bucket_name, path=test_filename),
)
test_data_reference.set_client(client)
test_data_references = [test_data_reference]
Step 2: Process input documents with text extraction
- Initialize the text extraction service.

from ibm_watsonx_ai.foundation_models.extractions import TextExtractions

extraction = TextExtractions(
    credentials=credentials,
    space_id=<Space GUID>,
)
- Run the text extraction job.

from ibm_watsonx_ai.metanames import TextExtractionsMetaNames

response = extraction.run_job(
    document_reference=input_data_reference,
    results_reference=result_data_reference,
    steps={
        TextExtractionsMetaNames.OCR: {
            "process_image": True,
            "languages_list": ["en"],
        },
        TextExtractionsMetaNames.TABLE_PROCESSING: {"enabled": True},
    },
    results_format="markdown",
)

job_id = response['metadata']['id']
- Get the job details.

extraction.get_job_details(job_id)
- When the status is completed, move to the next step.
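For example, you can poll the job until it finishes. This is a sketch that assumes the job status is reported under entity.results.status in the job details, with completed and failed as terminal states.

import time

# Poll the text extraction job until it reaches a terminal state.
# Assumption: the status is reported under entity.results.status.
while True:
    details = extraction.get_job_details(job_id)
    status = details["entity"]["results"]["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(10)

print(f"Text extraction job finished with status: {status}")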
Step 3: Configure the RAG optimizer
The rag_optimizer object provides a set of methods for working with the AutoAI RAG experiment. In this step, enter the details to define the experiment. These are the available configuration options:
| Parameter | Description | Values |
|---|---|---|
| name | Experiment name | Enter a valid name |
| description | Experiment description | Optionally describe the experiment |
| embedding_models | Embedding models to try | ibm/slate-125m-english-rtrvr, intfloat/multilingual-e5-large |
| retrieval_methods | Retrieval methods to use | simple retrieves and ranks all relevant documents; window retrieves and ranks a fixed number of relevant documents |
| foundation_models | Foundation models to try | See Foundation models by task |
| max_number_of_rag_patterns | Maximum number of RAG patterns to create | 4-20 |
| optimization_metrics | Metric name(s) to use for optimization | faithfulness, answer_correctness |
This sample code shows the configuration options for running the experiment with the ibm-watsonx-ai SDK documentation:
from ibm_watsonx_ai.experiment import AutoAI
experiment = AutoAI(credentials, project_id=project_id)
rag_optimizer = experiment.rag_optimizer(
name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
max_number_of_rag_patterns=5,
optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
Alternatively, specify which embedding models and foundation models to try. For example:
rag_optimizer = experiment.rag_optimizer(
name='DEMO - AutoAI RAG ibm-watsonx-ai SDK documentation',
description="AutoAI RAG experiment grounded with the ibm-watsonx-ai SDK documentation",
embedding_models=["ibm/slate-125m-english-rtrvr"],
foundation_models=["ibm/granite-13b-chat-v2","mistralai/mixtral-8x7b-instruct-v01"],
max_number_of_rag_patterns=5,
optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
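To double-check the configuration before you run the experiment, you can retrieve the stored parameters. This sketch assumes that the rag_optimizer object exposes the get_params method that other AutoAI optimizers provide.

# Review the experiment configuration before running it.
config_parameters = rag_optimizer.get_params()
print(config_parameters)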
Step 4: Run the experiment
Run the optimizer to create the RAG patterns by using the specified configuration options. Use the output from text extraction as input in your AutoAI RAG experiment.
In this code sample for running a Chroma experiment, the task runs in interactive mode. To run the task in the background instead, set background_mode to True.
input_data_references = [result_data_reference]
rag_optimizer.run(
input_data_references=input_data_references,
test_data_references=test_data_references,
background_mode=False
)
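If you start the run with background_mode=True, you can check on its progress later. A sketch, assuming the optimizer exposes the get_run_status and get_run_details methods that other AutoAI experiments provide:

# Check on a run that was started in the background.
status = rag_optimizer.get_run_status()      # for example, "running" or "completed"
run_details = rag_optimizer.get_run_details()
print(status)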
Step 5: Review the patterns and select the best one
After the AutoAI RAG experiment completes successfully, you can review the patterns. Use the summary method to list the completed patterns and their evaluation metrics as a Pandas DataFrame, ranked by performance against the optimized metric.
summary = rag_optimizer.summary()
summary
For example, pattern results display like this:
| Pattern | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.chunk_size | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id |
|---|---|---|---|---|---|---|---|---|---|
| Pattern1 | 0.6802 | 0.5407 | 1.0000 | 512 | ibm/slate-125m-english-rtrvr | euclidean | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern2 | 0.7172 | 0.5950 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | window | 5 | ibm/granite-13b-chat-v2 |
| Pattern3 | 0.6543 | 0.5144 | 1.0000 | 1024 | intfloat/multilingual-e5-large | euclidean | simple | 5 | ibm/granite-13b-chat-v2 |
| Pattern4 | 0.6216 | 0.5030 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 5 | meta-llama/llama-3-70b-instruct |
| Pattern5 | 0.7369 | 0.5630 | 1.0000 | 1024 | intfloat/multilingual-e5-large | cosine | window | 3 | mistralai/mixtral-8x7b-instruct-v01 |
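Because summary is a Pandas DataFrame indexed by pattern name, you can also pick the top-ranked pattern name programmatically. A sketch, assuming the DataFrame is sorted by the optimization metric:

# The first row of the summary DataFrame is the best pattern
# for the optimized metric.
best_pattern_name = summary.index.values[0]
print(best_pattern_name)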
Select a pattern to test locally
The next step is to select a pattern and test it locally.
best_pattern = rag_optimizer.get_pattern()
Query the RAG pattern locally.
payload = {
client.deployments.ScoringMetaNames.INPUT_DATA: [
{
"values": ["How to use new approach of providing credentials to APIClient?"],
}
]
}
best_pattern.query(payload)
The model's response looks like this:
According to the document, the new approach to provide credentials to APIClient is by using the Credentials class. Here's an example:
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials
credentials = Credentials(
url = "https://us-south.ml.cloud.ibm.com",
token = "***********",
)
client = APIClient(credentials)
This replaces the old approach of passing a dictionary with credentials to the APIClient constructor.
To retrieve a specific pattern, pass the pattern number to rag_optimizer.get_pattern().
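For example, a sketch that assumes get_pattern accepts the same pattern_name parameter as get_inference_notebook:

# Retrieve a specific pattern by name instead of the top-ranked one.
pattern = rag_optimizer.get_pattern(pattern_name='Pattern3')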
Get an inference and indexing notebook
To download a specified inference notebook, use the get_inference_notebook() method. If you leave pattern_name empty, the method downloads the notebook of the best computed pattern.
rag_optimizer.get_inference_notebook(pattern_name='Pattern3')
For more information and code samples, refer to the AutoAI RAG with watsonx Text Extraction service notebook.
Parent topic: Automating the search for a RAG pattern