GenaiMultiDocumentQuery class
The GenaiMultiDocumentQuery class performs vector queries across multiple documents and operates in four modes: RAG, entire document, adaptive entire document, and adaptive RAG.
Processing modes
The GenaiMultiDocumentQuery API supports multiple processing modes that
determine how document content is supplied to the LLM for processing. The API caller controls the
mode by specifying a value in the GenaiMaxDocumentChunks parameter.
GenaiMultiDocumentQuery API processing modes (RAG mode, entire document mode, adaptive entire document mode, and adaptive RAG mode) are available in Content Platform Engine 5.7 IF04 and later.
- RAG mode

  By default, GenaiMultiDocumentQuery uses a retrieval-augmented generation (RAG) approach to process document content. In this mode, the API performs a vector search to identify the most relevant portions of the selected documents and sends only those chunks to the generative model. Full document text is not sent.

  The caller controls chunk selection through the GenaiMaxDocumentChunks parameter:
  - If the caller passes null, the API uses a default value of 8 document chunks.
  - If the caller specifies a positive integer, the API uses that value to determine the number of chunks returned.

  Only the selected chunks are sent to the generative model for response generation. The GenaiVectorChunks output parameter contains the document chunks that were used. At least one of the selected documents must be vector indexed.

  Use RAG mode when you need selective retrieval to limit the amount of content that is sent to the model. This mode is appropriate when document citations are sufficient without full document context.
- Entire document mode

  Entire document mode bypasses retrieval-augmented generation (RAG) and sends full document content to the generative model. To enable this mode, the caller passes -1 in the GenaiMaxDocumentChunks parameter. In entire document mode, the API does not perform a vector search. Instead, it sends the entire extracted text of the selected documents to the generative model for processing.

  The API enforces a maximum combined document size of 400,000 characters. If the total extracted text of all selected documents exceeds this limit, the API truncates the text as follows:
  - All text from smaller documents is preserved.
  - Remaining documents are truncated to a common maximum length.

  The GenaiVectorChunks output parameter is returned as an empty string if no truncation occurs, or set to TRUNCATED if truncation occurs. Documents do not need to be vector indexed for this mode.

  Use entire document mode when full document context is required and the overall document size is predictable. This mode never returns document citations.
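
The truncation rule for entire document mode does not say how the common maximum length is chosen. One plausible reading, sketched here as an assumption and not as the documented algorithm, is to share the character budget evenly among the documents that do not fit, while keeping shorter documents whole. The function name `truncate_to_budget` and its return shape are hypothetical and are not part of the Content Platform Engine API.

```python
def truncate_to_budget(texts, budget=400_000):
    """Trim texts so that their combined length is at most `budget` characters.

    Assumed interpretation: documents no longer than a common cap are kept
    whole, and every longer document is cut to that cap.
    Returns (texts, truncated_flag).
    """
    if sum(len(t) for t in texts) <= budget:
        return list(texts), False  # everything fits, nothing to cut

    lengths = sorted(len(t) for t in texts)
    remaining = budget
    cap = 0
    # Walk from the shortest document to the longest. A document is kept
    # whole as long as an equal share of the remaining budget covers it.
    for i, n in enumerate(lengths):
        share = remaining // (len(lengths) - i)
        if n > share:
            cap = share  # this and all longer documents are cut to `cap`
            break
        remaining -= n

    return [t[:cap] for t in texts], True
```

With a budget of 110 characters and documents of 10, 100, and 100 characters, this sketch keeps the 10-character document intact and cuts the other two to 50 characters each. The boolean flag corresponds loosely to the TRUNCATED marker that the API reports in GenaiVectorChunks.
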
- Adaptive entire document mode

  Adaptive entire document mode automatically selects between entire document processing and RAG based on the total document size. To enable this mode, the caller passes 0 in the GenaiMaxDocumentChunks parameter.

  The API measures the total extracted text of all selected documents and applies the following logic:
  - If the total text is less than 400,000 characters, the API sends the entire document text to the model without performing a RAG query, and GenaiVectorChunks is not populated.
  - If the total text exceeds 400,000 characters, the API falls back to RAG processing, selects relevant document chunks by using vector search, and GenaiVectorChunks contains the chunks that were used.

  At least one of the selected documents must be vector indexed. Use this mode when document size varies or is unknown and you want the API to choose the optimal strategy. This mode returns document citations only when the total text exceeds the 400,000-character threshold.
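
The size check in adaptive entire document mode can be pictured with a small sketch. It only illustrates the decision described above. The function name `choose_adaptive_strategy` and the returned labels are hypothetical, and the behavior at exactly 400,000 characters is not stated in the description, so treating that case as entire document processing is an assumption.

```python
ENTIRE_DOCUMENT_LIMIT = 400_000  # characters, as stated for this mode


def choose_adaptive_strategy(document_texts, limit=ENTIRE_DOCUMENT_LIMIT):
    """Return which strategy the adaptive mode would pick for these texts."""
    total = sum(len(text) for text in document_texts)
    # Assumption: a total of exactly `limit` characters still counts as
    # fitting, because the description only covers "less than" and "exceeds".
    if total <= limit:
        return "entire_document"  # GenaiVectorChunks is not populated
    return "rag"  # GenaiVectorChunks holds the chunks that were used
```

A caller that needs citations in every case cannot rely on this mode, because the first branch never produces chunks.
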
- Adaptive RAG mode

  Adaptive RAG mode combines the benefits of RAG mode (document citations) with those of entire document mode (full context). Unlike standard RAG mode, which sends only document chunks, this mode sends both chunks and full documents to the generative model. To enable this mode, the caller passes -2 in the GenaiMaxDocumentChunks parameter.

  In this mode, the API combines vector search with full document processing:
  - The API performs a vector search based on the list of document IDs to find the top N document chunks that match the user query.
  - Both the selected chunks and the full extracted text of the documents are sent to the generative model. The chunks are sent first, followed by the complete document text. If necessary, the full document text is truncated to ensure that the combined content does not exceed the LLM context window.

  The GenaiVectorChunks output parameter always contains the document chunks that were used, so document citations are always available. All selected documents must be vector indexed.

  Use this mode when you need document citations while also providing full document context to the model for comprehensive analysis. This mode addresses the limitations of standard RAG mode (chunks only) and entire document mode (no citations).
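
The ordering rule for adaptive RAG mode (chunks first, then full text, with only the full text trimmed) can be sketched as follows. This is an illustration under stated assumptions, not the server implementation: the function name `build_adaptive_rag_context`, the blank-line separator, and the use of a character budget in place of a token-based context window are all hypothetical.

```python
def build_adaptive_rag_context(chunks, document_texts, max_context_chars):
    """Assemble model context: retrieved chunks first, then full text.

    Assumptions: the budget is counted in characters, parts are joined with
    a blank line, and the chunks themselves are never trimmed.
    """
    separator = "\n\n"
    chunk_part = separator.join(chunks)
    full_text = separator.join(document_texts)

    # Space left for full document text after the chunks and one separator.
    room = max_context_chars - len(chunk_part) - len(separator)
    if room <= 0:
        return chunk_part  # no space left for any full document text

    return chunk_part + separator + full_text[:room]
```

Because the chunks are placed first and are never cut in this sketch, the material behind the citations always reaches the model, and only the supplementary full text shrinks when space is tight.
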
| Property | Data type | Description |
|---|---|---|
| GenaiLLMPrompt | String | The input prompt from the user. The maximum length for the value is 4000 characters. |
| GenaiLLMModelName | String | An optional watsonx LLM model name. The maximum length for the value is 256 characters. |
| GenaiLLMResponse | String | The response from the watsonx LLM. |
| GenaiLLMMaxOutputTokens | Integer | If you set this parameter, it overrides the LLM maximum output tokens parameter, which has a default value of 4096. You cannot set a value less than 10 or greater than 8192. |
| GenaiPromptTemplate | String | An optional parameter that specifies the prompt template for your query operation. If you set this parameter, the template that you pass overrides all the default prompt templates that are already configured in Content Platform Engine. |
| GenaiVectorChunks | String | A JSON value that contains the document chunks that the vector search returns. If GenaiPerformLLMQuery is false, this property holds all the vector query results. If GenaiPerformLLMQuery is true, this property holds the chunks that were submitted to the LLM as context for the LLM query. |
| GenaiPerformLLMQuery | Boolean | If the value is true, the prompt is submitted to an LLM with the vector chunks as context. |
| GenaiMaxDocumentChunks | Integer | Controls the processing mode and the maximum number of document chunks. Valid values: null or a positive integer (RAG mode; null uses the default of 8 chunks), -1 (entire document mode), 0 (adaptive entire document mode), and -2 (adaptive RAG mode). |
| GenaiRelevancyFilterLevel | Float | If you set this parameter, it overrides the LLM relevancy score filter level, which has a default value of 0.5. You can set a value between 0.0 and 1.0. |
| GenaiContextDocuments | List of Id | The list of document IDs that are used as context for the query. |
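
The limits in the table and the GenaiMaxDocumentChunks values described for each mode can be checked on the client side before a request is built. The sketch below is a plain illustration of those rules and does not call the Content Platform Engine API. The function names `resolve_mode` and `validate_query_parameters` and the mode labels are hypothetical, and treating the 0.0 and 1.0 bounds as inclusive is an assumption.

```python
def resolve_mode(max_document_chunks):
    """Map a GenaiMaxDocumentChunks value to (mode label, chunk count)."""
    if max_document_chunks is None:
        return ("rag", 8)  # documented default of 8 chunks
    if max_document_chunks > 0:
        return ("rag", max_document_chunks)
    modes = {
        0: "adaptive_entire_document",
        -1: "entire_document",
        -2: "adaptive_rag",
    }
    if max_document_chunks not in modes:
        raise ValueError(f"unsupported value: {max_document_chunks}")
    # The chunk count used in these modes is not stated, so None is returned.
    return (modes[max_document_chunks], None)


def validate_query_parameters(prompt, model_name=None,
                              max_output_tokens=None,
                              relevancy_filter_level=None):
    """Return a list of problems found; an empty list means all checks pass."""
    problems = []
    if len(prompt) > 4000:
        problems.append("GenaiLLMPrompt exceeds 4000 characters")
    if model_name is not None and len(model_name) > 256:
        problems.append("GenaiLLMModelName exceeds 256 characters")
    if max_output_tokens is not None and not 10 <= max_output_tokens <= 8192:
        problems.append("GenaiLLMMaxOutputTokens must be between 10 and 8192")
    # Assumption: both ends of the 0.0 to 1.0 range are allowed.
    if (relevancy_filter_level is not None
            and not 0.0 <= relevancy_filter_level <= 1.0):
        problems.append("GenaiRelevancyFilterLevel must be between 0.0 and 1.0")
    return problems
```

For example, `resolve_mode(None)` yields `("rag", 8)` and `resolve_mode(-2)` yields `("adaptive_rag", None)`, while a 4001-character prompt combined with `max_output_tokens=5` produces two problems.
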