Retrieval service in watsonx.data with gen AI lakehouse experience

A retrieval service refers to a system or component that enables efficient access to relevant data or information often unstructured or semi structured from a large-scale data lake or lakehouse architecture to support AI and machine learning tasks.

With the retrieval service in watsonx.data with gen AI lakehouse experience, users can run a data retrieval query and retrieve results from two sources:

Vector database: Text chunks or passages that match the search query, ranked by relevance.
SQL database: Tabular results from SQL queries against the data lakehouse.

Users can choose to retrieve from both vectors and SQL or one of them. The service ensures that Access Control Lists (ACLs) are managed according to the source system. Also, only governed data is presented to the user, ensuring that sensitive information is protected.

Additional features include,

Prompt template suggestion: The API provides structured prompt templates to help craft effective inputs for the language model, guiding it to generate clear, intuitive, and user-friendly responses.
Prompt lab: You can use an external prompt lab to experiment with and fine-tune your prompts based on specific requirements.

A document library is the search landscape for the retrieval service API in watsonx.data. The document library is collection of document sets that are created during the ingestion and curation of unstructured data. As part of the retrieval process, two types of queries are constructed:

Text-to-SQL – where the query from the user is converted into an SQL and an SQL query is run on the data source to fetch the answer. Presto engines are used for SQL queries.
Text-to-vector (Milvus) – where the query from the user is converted into a numerical vector that captures the semantic meaning. Milvus service is used for vector queries.

Tip: Either one of them or both types can be configured for queries.

Performance is optimized through connection management, caching, parallel processing, resource management, and query optimization. Security considerations include JWT-based authentication, service-to-service authentication and authorization, data protection, and enforcement of ACLs.

How to use the retrieval service?

Applications such as, watsonx.ai, chatbots, or notebooks can use the retrieval service API, which is the public endpoint. This API provides the result context with suggested templates which can be used by any chatbots to generate the inference. The result context helps in understanding the relevance and accuracy of the output and suggested templates can be used by the chat applications to generate inferences or responses based on the search results.

Note:

For detailed information on watsonx.data Unstructured Retrieval API version v1, see API documentation.
For detailed information on watsonx.data Unstructured Retrieval API version v2, see API documentation.

To establish a secure connection with the retrieval service from your client workstation using curl or any HTTP client, provide the ca.crt, tls.crt, and tls.key files.

Complete the following steps to retrive and use the certificates:

Run the following command separatly for each of the following tls.crt, tls.key, and ca.crt files to extract the certificates from the host.
```
`oc extract secret/ibm-lh-tls-secret --to=- --keys=file name > file name`
```
Replace file name with tls.crt, tls.key, and ca.crt .
Use the certificates with curl or any HTTP client. The following example demonstrates the use of certificates with curl.

curl --cacert /tmp/certs/ca.crt --key /tmp/certs/tls.key --cert /tmp/certs/tls.crt <URL>

Replace <URL> with the endpoint of the Retrieval service, and provide the exact location for tls.crt, tls.key, and ca.crt files in the curl.

Following are the details of the API request:

POST /v1/retrieval/search: The primary endpoint for API version v1 where users (applications) send a query to retrieve relevant context.
POST /v2/retrieval/search: The primary endpoint for API version v2 where users (applications) send a query to retrieve relevant context.

Following is an example request and response:

Sample request for v1:

{
      "query": "Find all document ids",
      "document_library_id": "38bf1ea6-d395-4811-9a6f-9b022815b9e5",
      "container_id": "7482c15f-018f-42df-94b5-19b6fd05b5e3",
      "container_type": "project",
      "searchConfig": [
          {
              "type": "vector",
              "limit": 30,
              "enabled": true,
              "sub_type": "topk"
          },
          {
              "type": "sql",
              "enabled": true
          }
      ],
      "search_inference_config": {
          "enabled": false,
          "inference_model_id": "meta-llama/llama-3-3-70b-instruct"
      },
      "provide_suggested_template": true
}

Sample request for v2:

Note:

The request must include either document_library_id or document_set_id, or both asset_id and asset_type.

{
      "query": "Find all document ids",
      "asset_id": "052acf2e-d1cc-4bab-b34f-5700821b7ec3",
      "asset_type": "document_library",
      "document_set_id": "473daef4-cd2b-427b-afcb-6e9d67d85cf3",
      "document_library_id": "38bf1ea6-d395-4811-9a6f-9b022815b9e5",
      "container_id": "7482c15f-018f-42df-94b5-19b6fd05b5e3",
      "container_type": "project",
      "searchConfig": [
          {
              "type": "vector",
              "limit": 30,
              "enabled": true,
              "sub_type": "topk"
          },
          {
              "type": "sql",
              "enabled": true
          }
   ],
   "search_inference_config": {
       "enabled": false,
       "inference_model_id": "meta-llama/llama-3-3-70b-instruct"
   },
   "provide_suggested_template": true
 }

Understanding the sample requests

The following table provides details of the requests:

Request details
Field	Description	Version
query	A natural language question. In this case, the user wants to know who sent invoice number 2.	v1, v2
asset_id	Unique identifier of the target document set or library for the query.	v2 only
asset_type	Specifies the asset type. This can be either a `document_library` or a `document_set`.	v2 only
document_set_id	Unique identifier of the document set to be queried.	v2 only
document_library_id	The unique identifier of the Document library where the search is performed.	v1, v2
container_id	The ID of your project in watsonx.data.	v1, v2
container_type	Specifies the type of container, in this case, it is a project.	v1, v2
searchConfig.type	Enables vector-based semantic search using embeddings.	v1, v2
searchConfig.limit	Limits the number of vector search results.	v1, v2
searchConfig.enabled	Indicates search vector/SQL or both searches enabled.	v1, v2
searchConfig.sub_type	Specifies the vector search strategy. In this case, it is Top-K (most similar results).	v1, v2
searchConfig.type	Enables SQL-based structured search.	v1, v2
search_inference_config_enabled	Inference based search. In this example, it is disabled.	v1, v2
search_inference_config.inference_model_id	Specifies the model to use if inference-based search were enabled.	v1, v2
provide_suggested_template	Requests the API to return a suggested query template or refinement.	v1, v2

What happens when watsonx.data receives this request?

watsonx.data receives the query and metadata.
It performs:

a. SQL search using Presto.

b. Vector search using Milvus (up to 30 results).
Combines results from both the searches
Responds with relevant documents, metadata, and possibly a suggested query template.

Response

If all the required parameters are met, following is a 200 OK response for the sample request:


{
    "request_id": "e3ddc750-8e13-43a4-baaf-b480af6f1129",
    "results": [
        {
            "source": "sql",
            "data": [
                {
                    "seller_id": 12,
                    "tax_amount": 105.6
                }
            ],
            "metadata": {
                "execution_time": 6631,
                "row_count": 48,
                "sql": "SELECT seller_id, tax_amount FROM invoices"
            },
            "summary": ""
        },
        {
            "source": "vector",
            "data": [
                {
                    "dimension": "768",
                    "distance": "0.883103",
                    "document_id": "D00AB694-0000-CE1B-BEA8-B870AD41196E",
                    "document_name": "",
                    "pk": "199",
                    "text": "INFORMATION SPECIAL INSTRUCTIONS (NOTES) bury FREIGHT TAX SUB TOTAL 0.00 0.00 0.00 GLRGXX: 0.00% soa TAXABLE AMOUNT 'TOTAL 508.60 Currency: USD. [E-& O.E. Seller reserve the Fight to correct clerical errors & omissions"
                }
            ],
            "metadata": {
                "execution_time": 3171
            },
            "summary": ""
        }
    ],
    "suggested_template": {
        "template_name": "Template Llama Models",
        "template_text": "<|start_header_id|>system<|end_header_id|>\nYou are a world-class information retrieval and response agent. Analyse user query and generate accurate responses based on provided context.The context has two parts: the sql search and vector search. The sql search context provides precise data which should be considered the source of truth for any discrepancies. You can use it for natural language response\n\t\t\n\t\t##Format of context:\n\t\t{\n\t\t\"sql_search\": [\n\t\t\t{\n\t\t\t\"sql_query\": \"SQL Query\",\n\t\t\t\"context\": [\n\t\t\t\t{multiple key value pairs corresponding to row1 in sql query result},\n\t\t\t\t{multiple key value pairs corresponding to row2 in sql query result}\n\t\t\t]\n\t\t\t}\n\t\t],\n\t\t\"vector_search\": {\n\t\t\t\"context\": [\n\t\t\t\"Chunk of text top1\",\n\t\t\t\"Chunk of text top2\",\n\t\t\t\"...\",\n\t\t\t\"Chunk of text topk\"\n\t\t\t]\n\t\t}\n\t\t}\n\t\t\n\t\t\n###Context\n{\"sql_search\":[{\"context\":[{\"document_id\":\"4663a561-47b2-4308-81ef-6a0fb9c7276a\"},{\"document_id\":\"05d7f5e3-777b-4bba-afb6-b2566463a802\"},{\"document_id\":\"f91cc0ae-7140-4f7e-ae15-b66203406452\"},{\"document_id\":\"09f34660-04e7-4b1f-ac75-d511ad19dde2\"}],\"sql_query\":\"SELECT document_id FROM invoice_set_1_entities\"}],\"vector_search\":{\"context\":[\"\",\"\",\"\",\"\"]}}<|eot_id|><|start_header_id|>user<|end_header_id|>\nFind all document ids<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n",
        "reasoning": "Template chosen based on model group Llama Models and context type"

},
"status": "success",
"execution_time": 10337,
"status": "success",
"execution_time": 203624

}

Understanding the response

Following are the details of the response:

General fields:
Field	Description
requestId	Unique ID to track the request.
status	The query/request was processed successfully.
executionTime	Total time taken (in ms) to process the query.
results	Array of results from different sources (sql, vector).
suggestedPromptTemplates	A recommended prompt format for future queries.

SQL result:
Field	Description
source	"sql" — result from SQL-based search.
summary	"This provides the seller's ID"
data	[{"sender_ID": "12"}] — the sender ID is 12.
metadata.sql	SQL query used: "SELECT seller_id, tax_amount FROM invoices"

Vector result:
Field	Description
source	"vector" — result from vector-based semantic search.
data	Contains details about the results.
dimension	Dimensionality of the embedding space.
distance	Similarity score from vector search.
document_id	Identifier of the matched document.
pk	Primary key of the matched document.
text	Snippet of the matched document text.
execution_time	Time taken to execute the vector search in milliseconds

Suggested template result:
Field	Description
template_name	Suggested template for rendering the results.
template_text	This is the actual prompt that will be injected into the system message for the model.
reasoning	Reason why the template was used.

Suggested prompt template:

A recommended prompt format for LLM to generate responses:

Treat SQL as the source of truth.
Use vector data to enrich the response.
Present results as if from a unified system.
Format clearly using lists, tables, etc.

Limitations

Assumption is user will have only 1 milvus collection and 1 iceberg schema table for GA. This is not restricted at system level, but we will recommend user to use same for single use case.
If they use different collections/tables, it will degrade performance and not supported for GA.

Retrieval service in watsonx.data with gen AI lakehouse experience

How to use the retrieval service?

Understanding the sample requests

What happens when watsonx.data receives this request?

Response

Understanding the response

Suggested prompt template:

Limitations

Learn more