Supported embedding models available with watsonx.ai

Use embedding models that are deployed in IBM watsonx.ai to help with semantic search and document comparison tasks.

Embedding models are encoder-only foundation models that create text embeddings. A text embedding encodes the meaning of a sentence or passage in an array of numbers known as a vector. For more information, see Text embedding generation.
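
For illustration, the following is a minimal sketch (with made-up, four-dimensional vectors; real embeddings have hundreds of dimensions) that shows how two embedding vectors can be compared by using cosine similarity:

import numpy as np

# Made-up example vectors. Real text embeddings have hundreds of
# dimensions, for example 384 or 768, depending on the model.
vector_a = np.array([0.12, -0.48, 0.33, 0.91])
vector_b = np.array([0.10, -0.52, 0.35, 0.88])

# Cosine similarity: the closer the score is to 1, the more similar
# the meaning of the two texts that produced the vectors.
score = np.dot(vector_a, vector_b) / (
    np.linalg.norm(vector_a) * np.linalg.norm(vector_b)
)
print(score)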

The following embedding models are available in watsonx.ai:

  • all-minilm-l6-v2
  • multilingual-e5-large
  • slate-125m-english-rtrvr
  • slate-30m-english-rtrvr

For more information about generative foundation models, see Supported foundation models.

To find out which embedding models are available, use the List the available foundation models method in the watsonx.ai API. Specify the filters=function_embedding parameter to return only the available embedding models.

curl -X GET \
  'https://{cluster_url}/ml/v1/foundation_model_specs?version=2024-07-25&filters=function_embedding'
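
You can retrieve the same information programmatically. The following is a minimal sketch that uses the Python requests library and assumes that the response body returns the model specifications in a resources array. Depending on your deployment, you might also need to send an Authorization header.

import requests

# Replace {cluster_url} with the URL of your watsonx.ai instance.
url = "https://{cluster_url}/ml/v1/foundation_model_specs"
params = {"version": "2024-07-25", "filters": "function_embedding"}

response = requests.get(url, params=params)
response.raise_for_status()

# Print the model_id of each available embedding model.
for spec in response.json().get("resources", []):
    print(spec["model_id"])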

IBM embedding models

The following table lists the supported embedding models that IBM provides.

Table 1. IBM embedding models in watsonx.ai

Model name               | API model_id                 | Maximum input tokens | Number of dimensions | More information
slate-125m-english-rtrvr | ibm/slate-125m-english-rtrvr | 512                  | 768                  | Model card
slate-30m-english-rtrvr  | ibm/slate-30m-english-rtrvr  | 512                  | 384                  | Model card

Third-party embedding models

The following table lists the supported third-party embedding models.

Table 2. Supported third-party embedding models in watsonx.ai

Model name            | API model_id                           | Provider                                                                          | Maximum input tokens | Number of dimensions | More information
all-minilm-l6-v2      | sentence-transformers/all-minilm-l6-v2 | Open source natural language processing (NLP) and computer vision (CV) community | 256                  | 384                  | Model card
multilingual-e5-large | intfloat/multilingual-e5-large         | Microsoft                                                                         | 512                  | 1024                 | Model card, Research paper

Embedding model details

You can use the watsonx.ai Python library or REST API to submit sentences or passages to one of the supported embedding models.
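
For example, the following is a minimal sketch that uses the ibm-watsonx-ai Python SDK. The Embeddings class and its embed_documents and embed_query methods are assumed from the 1.x SDK, and the credential fields, project ID, and model ID are placeholders that you adjust for your deployment.

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

# Placeholder credentials. The required fields differ between IBM Cloud
# and Cloud Pak for Data deployments.
credentials = Credentials(
    url="https://{cluster_url}",
    api_key="YOUR_API_KEY",
)

embedding = Embeddings(
    model_id="ibm/slate-30m-english-rtrvr",  # any supported embedding model_id
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# Convert passages and a query into vectors that you can store and compare.
passage_vectors = embedding.embed_documents(
    texts=[
        "Embedding models are encoder-only foundation models.",
        "A text embedding encodes the meaning of a passage as a vector.",
    ]
)
query_vector = embedding.embed_query(text="What is a text embedding?")

print(len(passage_vectors), len(query_vector))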

all-minilm-l6-v2

This model was introduced with the 5.0.3 release.

The all-minilm-l6-v2 embedding model is built by the open source natural language processing (NLP) and computer vision (CV) community and provided by Hugging Face. Use the model as a sentence and short paragraph encoder. Given an input text, it generates a vector which captures the semantic information in the text.

Usage: Use the sentence vectors that are generated by the all-minilm-l6-v2 embedding model for tasks such as information retrieval, clustering, and sentence similarity detection.
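
For example, the following is a minimal clustering sketch that uses scikit-learn. The randomly generated vectors are stand-ins; in practice, you would pass the 384-dimensional vectors that the model returns.

import numpy as np
from sklearn.cluster import KMeans

# Stand-in for real sentence vectors from all-minilm-l6-v2.
sentence_vectors = np.random.rand(12, 384)

# Group semantically similar sentences. Sentences that share a cluster
# label are close to each other in the embedding space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(sentence_vectors)
print(labels)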

Number of dimensions: 384

Input token limits: 256

Supported natural languages: English

Fine-tuning information: This embedding model is a version of the pretrained MiniLM-L6-H384-uncased model from Microsoft that is fine-tuned on a dataset that contains 1 billion sentence pairs.

Model architecture: Encoder-only

License: Apache 2.0 license

Learn more

multilingual-e5-large

This model was introduced with the 5.0.3 release.

The multilingual-e5-large embedding model is built by Microsoft and provided by Hugging Face.

The embedding model architecture has 24 layers that are used sequentially to process data.

Usage: Use this model to generate text embeddings for text in languages other than English. When you submit input to the model, follow these guidelines:

  • Prefix queries with query: and passages with passage: for tasks such as passage or information retrieval (see the sketch after this list).
  • Prefix the input text with query: for tasks such as semantic similarity, bitext mining, and paraphrase retrieval.
  • Prefix the input text with query: if you want to use embeddings as features, such as in linear probing classification or for clustering.
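
The following is a minimal sketch of how the prefixes might be applied for a retrieval task. It assumes an Embeddings client, as shown in the earlier Python example, that is configured with the intfloat/multilingual-e5-large model.

# The query text gets the "query: " prefix and each passage gets the
# "passage: " prefix before the texts are embedded.
query_vector = embedding.embed_query(
    text="query: ¿Cuál es la montaña más alta de España?"
)
passage_vectors = embedding.embed_documents(
    texts=[
        "passage: El Teide, en Tenerife, es la montaña más alta de España.",
        "passage: El Ebro es uno de los ríos más largos de España.",
    ]
)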

Number of dimensions: 1024

Input token limits: 512

Supported natural languages: Up to 100 languages. See the model card for details.

Fine-tuning information: This embedding model is a version of the XLM-RoBERTa model, which is a multilingual version of RoBERTa that is pretrained on 2.5TB of filtered CommonCrawl data. This embedding model was continually trained on a mixture of multilingual datasets.

Model architecture: Encoder-only

License: Microsoft Open Source Code of Conduct

Learn more

slate-125m-english-rtrvr

This model was updated to version 2.0.1.

The slate-125m-english-rtrvr foundation model is provided by IBM. The model generates embeddings for various inputs such as queries, passages, or documents.

The training objective is to maximize cosine similarity between a query and a passage. This process yields two sentence embeddings, one that represents the query and one that represents the passage, allowing the two to be compared through cosine similarity.

Usage: Two to three times slower than the IBM Slate 30m embedding model, but with slightly better performance.

Number of dimensions: 768

Input token limits: 512

Supported natural languages: English

Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.

Model architecture: Encoder-only

License: Terms of use

Learn more

slate-30m-english-rtrvr

This model was updated to version 2.0.1.

The slate-30m-english-rtrvr foundation model is a distilled version of slate-125m-english-rtrvr and is provided by IBM. The IBM Slate embedding model is trained to maximize the cosine similarity between two text inputs so that the resulting embeddings can later be compared based on similarity.

The embedding model architecture has 6 layers that are used sequentially to process data.

Usage: Two to three times faster than the IBM Slate 125m embedding model, with slightly lower performance scores.

Try it out: Using vectorized text with retrieval-augmented generation tasks

Number of dimensions: 384

Input token limits: 512

Supported natural languages: English

Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.

Model architecture: Encoder-only

License: Terms of use

Learn more

Parent topic: Text embedding generation