Adding vectorized documents for grounding foundation model prompts

Add grounding documents to a vector index that can be used to add contextual information to foundation model prompts for retrieval-augmented generation tasks.

Required permissions
To create vector index assets and associate them with a prompt, you must have the Admin or Editor role in a project.
Data format
Differs by vector store.
Data size
Maximum file sizes differ by file type.

For details about data format and size, see Grounding document file types.

When you use foundation models for question-answering tasks, you can help the foundation model generate factual and up-to-date answers by adding contextual information to the foundation model prompt. When a foundation model is given factual information as input, it is more likely to incorporate that factual information in its output.

For more information, see Retrieval-augmented generation pattern.

To provide contextual information to a prompt, first add grounding documents to a vector index asset, and then associate the vector index with a foundation model prompt.
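
For illustration, here is a minimal sketch of the pattern in Python: passages retrieved from a vector index are prepended to the question before the prompt is sent to a foundation model. The function name, prompt template, and sample passages are illustrative assumptions, not a watsonx.ai API.

```python
# A minimal sketch of grounding a prompt with retrieved passages.
# `retrieved_passages` stands in for the text chunks that a vector
# index returns for the user's question; the variable names and the
# prompt template are illustrative, not a watsonx.ai API.

def build_grounded_prompt(question: str, retrieved_passages: list[str]) -> str:
    """Prepend retrieved context so the model can answer from it."""
    context = "\n\n".join(retrieved_passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What is our refund policy?",
    ["Refunds are issued within 30 days of purchase..."],
)
print(prompt)
```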

In the retrieval-augmented generation diagram, the task of adding grounding documents to an index is depicted by the preprocessing step, where company documents are vectorized.

Figure: Close-up of the preprocessing step in the RAG with vector embeddings pattern, where company documents are vectorized by an embedding model and stored in a vector data store.
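
Conceptually, the preprocessing step works as in the following hedged Python sketch, which uses an in-memory Chroma database because that is what the built-in in-memory option is based on. The collection name and sample documents are illustrative, and Chroma's default local embedding model stands in for whichever embedding model your vector index is configured to use; watsonx.ai performs this step for you.

```python
# A minimal sketch of the preprocessing step with an in-memory Chroma
# store. Document texts, IDs, and the collection name are illustrative;
# watsonx.ai creates and manages the in-memory index for you, so this
# only shows what vectorization looks like conceptually.
import chromadb

client = chromadb.Client()                      # ephemeral, in-memory store
collection = client.create_collection("company_docs")

# Chroma applies its default embedding model to each document,
# producing the vectors that later similarity searches run against.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Refunds are issued within 30 days of purchase.",
        "Support is available Monday through Friday.",
    ],
)

# Retrieval: embed the query the same way and return the nearest chunks.
results = collection.query(query_texts=["refund policy"], n_results=1)
print(results["documents"][0])
```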

Ways to work

You can use various methods to create a vector index asset and add grounding documents to the asset in watsonx.ai, such as working in the Prompt Lab.

Types of vector stores

You can use one of the following vector stores to store your grounding documents:

  • In memory: A Chroma database vector index that is associated with your project and provides temporary vector storage. An embedding model must be installed in your cluster for the in-memory vector store to be accessible.

    Note: The in-memory vector index asset is created for you automatically; you don't need to set up the vector store.
  • Elasticsearch: A third-party vector index that you set up and connect to your project.

  • Milvus: A third-party vector index that you set up and connect to your project (see the connection sketch after this list).
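
As a rough illustration of the third-party case, the following sketch connects to a Milvus deployment with the pymilvus client and lists its existing collections. The URI, credentials, and collection names are placeholders for your own deployment; watsonx.ai manages this connection for you when you create the vector index asset.

```python
# A hedged sketch of connecting to an existing Milvus deployment with
# the pymilvus client. The URI and token below are placeholders, not
# values that watsonx.ai defines.
from pymilvus import MilvusClient

client = MilvusClient(
    uri="https://your-milvus-host:19530",   # placeholder endpoint
    token="user:password",                  # placeholder credentials
)

# List collections to confirm the connection and find an existing
# collection of pre-vectorized documents to reuse.
print(client.list_collections())
```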

Choosing a vector store

When you create a vector index for your documents, you can choose the vector store to use. To determine the right vector store for your use case, consider the following factors:

  • What types of files can the vector store index?

    The supported file types differ by vector store. For details, see Grounding document file types.

  • What embedding models can be used with the vector store?

    The embedding models that you can use to vectorize documents that you add to the index differ by vector store. For details, see Embedding models and vectorization settings.

  • How many grounding documents do you want to be able to search from your foundation model prompts?

    When you connect to a third-party vector store, you can choose to do one of the following tasks:

    • Add files to vectorize and store in a new vector index or collection in the vector store.
    • Use vectorized data from an existing index or collection in the vector store.

    The number of files that you can add to the vector store when you create the vector index is limited. For example, you can upload up to 10 documents at a time to an in-memory vector store.

    If you want to vectorize more documents, such as a set of PDF files that is larger than 50 MB, use a third-party vector store. With a third-party vector store, you can create a collection or index with more documents directly from the data store first. Then, you can connect to the existing collection or index when you create a vector index asset to associate with your prompt.

    Caution: Do not add more than 10 files in a single upload when you create a vector index in Prompt Lab.
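
If you have more than 10 files to add, a small helper like the following hedged sketch can organize your local files into compliant batches before you upload them separately. The folder name and file pattern are illustrative assumptions.

```python
# A small helper sketch for staying under the 10-files-per-upload
# limit: split a local file list into batches of at most 10 so that
# each batch can be uploaded separately. Paths are illustrative.
from pathlib import Path

def batches(items, size=10):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

files = sorted(Path("grounding_docs").glob("*.pdf"))  # placeholder folder
for batch in batches(files, size=10):
    print([f.name for f in batch])  # upload each batch separately
```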

Grounding document file types

When you add grounding documents to create a new vector index, you can upload files or connect to a data asset that contains files.

The following table lists the supported file types and maximum file sizes that you can add when you create a new vector index. The supported file types differ by vector store.

File types are listed in the first column. The maximum total file size that is allowed by default for each file type is listed in the second column. A checkmark (✓) indicates that the vector store that is named in the column header supports the file type that is listed in the first column.

Note: The maximum allowable size for each file type applies independently. For example, you can simultaneously upload multiple plain text files whose sizes add up to a maximum of 5 MB and multiple PDF files whose sizes add up to a maximum of 50 MB.
Table 1. Supported file types for grounding documents in different vector stores
| File type | In-memory store maximum total file size | Elasticsearch maximum total file size | Milvus maximum total file size |
|-----------|-----------------------------------------|---------------------------------------|--------------------------------|
| CSV       | Not supported                           | 50 MB                                 | 50 MB                          |
| DOCX      | 50 MB                                   | 500 MB                                | 500 MB                         |
| HTML      | Not supported                           | 50 MB                                 | 50 MB                          |
| JSON      | Not supported                           | 50 MB                                 | 50 MB                          |
| PDF       | 50 MB                                   | 500 MB                                | 500 MB                         |
| PPTX      | 300 MB                                  | 300 MB                                | 300 MB                         |
| TXT       | 5 MB                                    | 50 MB                                 | 50 MB                          |
| XLSX      | Not supported                           | 50 MB                                 | 50 MB                          |
| XML       | Not supported                           | 50 MB                                 | 50 MB                          |
| YAML      | Not supported                           | 50 MB                                 | 50 MB                          |
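
Before uploading to an in-memory store, you can pre-check your files against these limits. The following sketch mirrors the in-memory column of Table 1; the limits dictionary, folder name, and approximate decimal-megabyte math are assumptions for illustration.

```python
# A sketch that totals local file sizes per file type and compares
# them against the in-memory column of Table 1 before an upload.
# Sizes are computed in approximate decimal megabytes.
from pathlib import Path

IN_MEMORY_LIMITS_MB = {".docx": 50, ".pdf": 50, ".pptx": 300, ".txt": 5}

def check_totals(paths: list[Path]) -> dict[str, float]:
    """Sum file sizes per extension and flag any over-limit types."""
    totals: dict[str, float] = {}
    for p in paths:
        ext = p.suffix.lower()
        totals[ext] = totals.get(ext, 0.0) + p.stat().st_size / 1e6
    for ext, total in totals.items():
        limit = IN_MEMORY_LIMITS_MB.get(ext)
        if limit is None:
            print(f"{ext}: not supported by the in-memory store")
        elif total > limit:
            print(f"{ext}: {total:.1f} MB exceeds the {limit} MB limit")
    return totals

check_totals(list(Path("grounding_docs").glob("*")))  # placeholder folder
```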

Embedding models

When you upload grounding documents, an embedding model is used to calculate vectors that numerically represent the document text. You can choose the embedding model to use.

For in-memory and Milvus data stores, the following embedding models are supported:

all-minilm-l6-v2
Requires a smaller chunk size than the IBM Slate embedding models.
all-minilm-l12-v2
Requires a smaller chunk size than the IBM Slate embedding models.
granite-embedding-107m-multilingual
Standard sentence transformer model based on bi-encoders and part of the IBM Granite Embeddings suite.
granite-embedding-278m-multilingual
Standard sentence transformer model based on bi-encoders and part of the IBM Granite Embeddings suite.
granite-embedding-reranker-english-r2
Standard reranker transformer model based on cross-encoders and part of the IBM Granite Embeddings suite.
slate-30m-english-rtrvr
IBM model that is faster than the 125m version.
slate-125m-english-rtrvr
IBM model that is more precise than the 30m version.

For more information about the IBM-provided embedding models, see Supported encoder models.
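
To make the vectorization step concrete, the following hedged sketch embeds two sentences locally with the publicly available all-MiniLM-L6-v2 model from the sentence-transformers library, which corresponds to the all-minilm-l6-v2 entry above. In watsonx.ai the chosen embedding model runs on the platform; this local run only illustrates the text-to-vector step.

```python
# A hedged sketch of what an embedding model does: turn text into
# fixed-length vectors. Uses the public all-MiniLM-L6-v2 model from
# the sentence-transformers library for illustration only.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode([
    "Refunds are issued within 30 days of purchase.",
    "Support is available Monday through Friday.",
])
print(vectors.shape)  # (2, 384): one 384-dimensional vector per text
```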

For the Elasticsearch data store, ELSER (Elastic Learned Sparse EncodeR) embedding models are supported. For more information, see ELSER – Elastic Learned Sparse EncodeR.
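
As a rough illustration, the following sketch runs a sparse-vector (text expansion) query against an Elasticsearch index whose documents were enriched with ELSER, using the official Python client. The endpoint, API key, index name, token field, and model ID are placeholders that depend on your Elasticsearch deployment and ELSER version.

```python
# A hedged sketch of a sparse-vector query against an Elasticsearch
# index whose documents were expanded with ELSER. The endpoint, API
# key, index, token field, and model ID are placeholders, not values
# that watsonx.ai defines.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://your-elasticsearch-host:9200",  # placeholder endpoint
    api_key="your-api-key",                  # placeholder credentials
)

response = es.search(
    index="company-docs",                    # placeholder index name
    query={
        "text_expansion": {
            "ml.tokens": {                   # field holding ELSER tokens
                "model_id": ".elser_model_2",
                "model_text": "What is the refund policy?",
            }
        }
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_source"])
```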

Learn more