Retrieval service in watsonx.data with gen AI lakehouse experience
A retrieval service refers to a system or component that enables efficient access to relevant data or information often unstructured or semi structured from a large-scale data lake or lakehouse architecture to support AI and machine learning tasks.
With the retrieval service in watsonx.data with gen AI lakehouse experience, users can run a data retrieval query and retrieve results from two sources:
- Vector database: Text chunks or passages that match the search query, ranked by relevance.
- SQL database: Tabular results from SQL queries against the data lakehouse.
Users can choose to retrieve from both vectors and SQL or one of them. The service ensures that Access Control Lists (ACLs) are managed according to the source system. Also, only governed data is presented to the user, ensuring that sensitive information is protected.
Additional features include,
- Prompt template suggestion: The API provides structured prompt templates to help craft effective inputs for the language model, guiding it to generate clear, intuitive, and user-friendly responses.
- Prompt lab: You can use an external prompt lab to experiment with and fine-tune your prompts based on specific requirements.
A document library is the search landscape for the retrieval service API in watsonx.data. The document library is collection of document sets that are created during the ingestion and curation of unstructured data. As part of the retrieval process, two types of queries are constructed:
- Text-to-SQL – where the query from the user is converted into an SQL and an SQL query is run on the data source to fetch the answer. Presto engines are used for SQL queries.
- Text-to-vector (Milvus) – where the query from the user is converted into a numerical vector that captures the semantic meaning. Milvus service is used for vector queries.
Performance is optimized through connection management, caching, parallel processing, resource management, and query optimization. Security considerations include JWT-based authentication, service-to-service authentication and authorization, data protection, and enforcement of ACLs.
How to use the retrieval service?
Applications such as, watsonx.ai, chatbots, or notebooks can use the retrieval service API, which is the public endpoint. This API provides the result context with suggested templates which can be used by any chatbots to generate the inference. The result context helps in understanding the relevance and accuracy of the output and suggested templates can be used by the chat applications to generate inferences or responses based on the search results.
For detailed information on watsonx.data Unstructured Retrieval API version v2, see API documentation.
To establish a secure connection with the retrieval service from your client workstation using curl or any HTTP client, provide the ca.crt, tls.crt, and tls.key files.
Complete the following steps to retrive and use the certificates:
-
Run the following command separatly for each of the following
tls.crt,tls.key, andca.crtfiles to extract the certificates from the host.`oc extract secret/ibm-lh-tls-secret --to=- --keys=file name > file name`Replace file name with
tls.crt,tls.key, andca.crt. -
Use the certificates with
curlor anyHTTPclient. The following example demonstrates the use of certificates withcurl.curl --cacert /tmp/certs/ca.crt --key /tmp/certs/tls.key --cert /tmp/certs/tls.crt <URL>Replace
<URL>with the endpoint of the Retrieval service, and provide the exact location fortls.crt,tls.key, andca.crtfiles in the curl.
Following are the details of the API request:
- POST /v1/retrieval/search: The primary endpoint for API version v1 where users (applications) send a query to retrieve relevant context.
- POST /v2/retrieval/search: The primary endpoint for API version v2 where users (applications) send a query to retrieve relevant context.
Following is an example request and response:
Sample request for v1:
{
"query": "Find all document ids",
"document_library_id": "38bf1ea6-d395-4811-9a6f-9b022815b9e5",
"container_id": "7482c15f-018f-42df-94b5-19b6fd05b5e3",
"container_type": "project",
"searchConfig": [
{
"type": "vector",
"limit": 30,
"enabled": true,
"sub_type": "topk"
},
{
"type": "sql",
"enabled": true
}
],
"search_inference_config": {
"enabled": false,
"inference_model_id": "meta-llama/llama-3-3-70b-instruct"
},
"provide_suggested_template": true
}
Sample request for v2:
document_library_id or document_set_id, or both asset_id and asset_type.
{
"query": "Find all document ids",
"asset_id": "052acf2e-d1cc-4bab-b34f-5700821b7ec3",
"asset_type": "document_library",
"document_set_id": "473daef4-cd2b-427b-afcb-6e9d67d85cf3",
"document_library_id": "38bf1ea6-d395-4811-9a6f-9b022815b9e5",
"container_id": "7482c15f-018f-42df-94b5-19b6fd05b5e3",
"container_type": "project",
"searchConfig": [
{
"type": "vector",
"limit": 30,
"enabled": true,
"sub_type": "topk"
},
{
"type": "sql",
"enabled": true
}
],
"search_inference_config": {
"enabled": false,
"inference_model_id": "meta-llama/llama-3-3-70b-instruct"
},
"provide_suggested_template": true
}
Understanding the sample requests
The following table provides details of the requests:
| Field | Description | Version |
|---|---|---|
| query | A natural language question. In this case, the user wants to know who sent invoice number 2. | v1, v2 |
| asset_id | Unique identifier of the target document set or library for the query. | v2 only |
| asset_type | Specifies the asset type. This can be either a document_library or a document_set. |
v2 only |
| document_set_id | Unique identifier of the document set to be queried. | v2 only |
| document_library_id | The unique identifier of the Document library where the search is performed. | v1, v2 |
| container_id | The ID of your project in watsonx.data. | v1, v2 |
| container_type | Specifies the type of container, in this case, it is a project. | v1, v2 |
| searchConfig.type | Enables vector-based semantic search using embeddings. | v1, v2 |
| searchConfig.limit | Limits the number of vector search results. | v1, v2 |
| searchConfig.enabled | Indicates search vector/SQL or both searches enabled. | v1, v2 |
| searchConfig.sub_type | Specifies the vector search strategy. In this case, it is Top-K (most similar results). | v1, v2 |
| searchConfig.type | Enables SQL-based structured search. | v1, v2 |
| search_inference_config_enabled | Inference based search. In this example, it is disabled. | v1, v2 |
| search_inference_config.inference_model_id | Specifies the model to use if inference-based search were enabled. | v1, v2 |
| provide_suggested_template | Requests the API to return a suggested query template or refinement. | v1, v2 |
What happens when watsonx.data receives this request?
-
watsonx.data receives the query and metadata.
-
It performs:
a. SQL search using Presto.
b. Vector search using Milvus (up to 30 results).
-
Combines results from both the searches
-
Responds with relevant documents, metadata, and possibly a suggested query template.
Response
If all the required parameters are met, following is a 200 OK response for the sample request:
{
"request_id": "e3ddc750-8e13-43a4-baaf-b480af6f1129",
"results": [
{
"source": "sql",
"data": [
{
"seller_id": 12,
"tax_amount": 105.6
}
],
"metadata": {
"execution_time": 6631,
"row_count": 48,
"sql": "SELECT seller_id, tax_amount FROM invoices"
},
"summary": ""
},
{
"source": "vector",
"data": [
{
"dimension": "768",
"distance": "0.883103",
"document_id": "D00AB694-0000-CE1B-BEA8-B870AD41196E",
"document_name": "",
"pk": "199",
"text": "INFORMATION SPECIAL INSTRUCTIONS (NOTES) bury FREIGHT TAX SUB TOTAL 0.00 0.00 0.00 GLRGXX: 0.00% soa TAXABLE AMOUNT 'TOTAL 508.60 Currency: USD. [E-& O.E. Seller reserve the Fight to correct clerical errors & omissions"
}
],
"metadata": {
"execution_time": 3171
},
"summary": ""
}
],
"suggested_template": {
"template_name": "Template Llama Models",
"template_text": "<|start_header_id|>system<|end_header_id|>\nYou are a world-class information retrieval and response agent. Analyse user query and generate accurate responses based on provided context.The context has two parts: the sql search and vector search. The sql search context provides precise data which should be considered the source of truth for any discrepancies. You can use it for natural language response\n\t\t\n\t\t##Format of context:\n\t\t{\n\t\t\"sql_search\": [\n\t\t\t{\n\t\t\t\"sql_query\": \"SQL Query\",\n\t\t\t\"context\": [\n\t\t\t\t{multiple key value pairs corresponding to row1 in sql query result},\n\t\t\t\t{multiple key value pairs corresponding to row2 in sql query result}\n\t\t\t]\n\t\t\t}\n\t\t],\n\t\t\"vector_search\": {\n\t\t\t\"context\": [\n\t\t\t\"Chunk of text top1\",\n\t\t\t\"Chunk of text top2\",\n\t\t\t\"...\",\n\t\t\t\"Chunk of text topk\"\n\t\t\t]\n\t\t}\n\t\t}\n\t\t\n\t\t\n###Context\n{\"sql_search\":[{\"context\":[{\"document_id\":\"4663a561-47b2-4308-81ef-6a0fb9c7276a\"},{\"document_id\":\"05d7f5e3-777b-4bba-afb6-b2566463a802\"},{\"document_id\":\"f91cc0ae-7140-4f7e-ae15-b66203406452\"},{\"document_id\":\"09f34660-04e7-4b1f-ac75-d511ad19dde2\"}],\"sql_query\":\"SELECT document_id FROM invoice_set_1_entities\"}],\"vector_search\":{\"context\":[\"\",\"\",\"\",\"\"]}}<|eot_id|><|start_header_id|>user<|end_header_id|>\nFind all document ids<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n",
"reasoning": "Template chosen based on model group Llama Models and context type"
},
"status": "success",
"execution_time": 10337,
"status": "success",
"execution_time": 203624
}
Understanding the response
Following are the details of the response:
| Field | Description |
|---|---|
| requestId | Unique ID to track the request. |
| status | The query/request was processed successfully. |
| executionTime | Total time taken (in ms) to process the query. |
| results | Array of results from different sources (sql, vector). |
| suggestedPromptTemplates | A recommended prompt format for future queries. |
| Field | Description |
|---|---|
| source | "sql" — result from SQL-based search. |
| summary | "This provides the seller's ID" |
| data | [{"sender_ID": "12"}] — the sender ID is 12. |
| metadata.sql | SQL query used: "SELECT seller_id, tax_amount FROM invoices" |
| Field | Description |
|---|---|
| source | "vector" — result from vector-based semantic search. |
| data | Contains details about the results. |
| dimension | Dimensionality of the embedding space. |
| distance | Similarity score from vector search. |
| document_id | Identifier of the matched document. |
| pk | Primary key of the matched document. |
| text | Snippet of the matched document text. |
| execution_time | Time taken to execute the vector search in milliseconds |
| Field | Description |
|---|---|
| template_name | Suggested template for rendering the results. |
| template_text | This is the actual prompt that will be injected into the system message for the model. |
| reasoning | Reason why the template was used. |
Suggested prompt template:
A recommended prompt format for LLM to generate responses:
- Treat SQL as the source of truth.
- Use vector data to enrich the response.
- Present results as if from a unified system.
- Format clearly using lists, tables, etc.
Limitations
- Assumption is user will have only 1 milvus collection and 1 iceberg schema table for GA. This is not restricted at system level, but we will recommend user to use same for single use case.
- If they use different collections/tables, it will degrade performance and not supported for GA.