Supported embedding models available with watsonx.ai
Use embedding models that are deployed in IBM watsonx.ai to help with semantic search and document comparison tasks.
Embedding models are encoder-only foundation models that create text embeddings. A text embedding encodes the meaning of a sentence or passage in an array of numbers known as a vector. For more information, see Text embedding generation.
The following embedding models are available in watsonx.ai:
- all-minilm-l6-v2
- multilingual-e5-large
- slate-125m-english-rtrvr
- slate-30m-english-rtrvr
For more information about generative foundation models, see Supported foundation models.
To find out which embedding models are available for use, call the List the available foundation models method in the watsonx.ai API and specify the filters=function_embedding parameter to return only the available embedding models.
```bash
curl -X GET \
  'https://{cluster_url}/ml/v1/foundation_model_specs?version=2024-07-25&filters=function_embedding'
```
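If you are working in Python, a minimal sketch that calls the same endpoint with the requests library might look like the following. The bearer token handling and the resources and model_id fields that the loop reads from the response are assumptions, so adjust them to match your deployment.

```python
# Minimal sketch: call the foundation_model_specs endpoint from Python.
# Assumptions: CLUSTER_URL and TOKEN are placeholders that you supply, and the
# response lists models in a "resources" array with "model_id" fields.
import requests

CLUSTER_URL = "https://{cluster_url}"  # replace with your cluster URL
TOKEN = "{token}"                      # replace with a valid bearer token

response = requests.get(
    f"{CLUSTER_URL}/ml/v1/foundation_model_specs",
    params={"version": "2024-07-25", "filters": "function_embedding"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()

for spec in response.json().get("resources", []):
    print(spec["model_id"])
```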
IBM embedding models
The following table lists the supported embedding models that IBM provides.
| Model name | API model_id | Maximum input tokens | Number of dimensions | More information |
|---|---|---|---|---|
| slate-125m-english-rtrvr | ibm/slate-125m-english-rtrvr | 512 | 768 | Model card |
| slate-30m-english-rtrvr | ibm/slate-30m-english-rtrvr | 512 | 384 | Model card |
Third-party embedding models
The following table lists the supported third-party embedding models.
| Model name | API model_id | Provider | Maximum input tokens | Number of dimensions | More information |
|---|---|---|---|---|---|
| all-minilm-l6-v2 | sentence-transformers/all-minilm-l6-v2 | Open source natural language processing (NLP) and computer vision (CV) community | 256 | 384 | Model card |
| multilingual-e5-large | intfloat/multilingual-e5-large | Microsoft | 512 | 1024 | Model card, Research paper |
Embedding model details
You can use the watsonx.ai Python library or REST API to submit sentences or passages to one of the supported embedding models.
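For example, a minimal sketch that uses the ibm-watsonx-ai Python library might look like the following. The credential values, project ID, and model choice are placeholders; authentication details vary by deployment.

```python
# Minimal sketch: generate text embeddings with the ibm-watsonx-ai Python library.
# The credential values and project ID are placeholders for your own deployment.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

credentials = Credentials(
    url="https://{cluster_url}",  # replace with your cluster URL
    api_key="{api_key}",          # or use a token, depending on your setup
)

embedding_model = Embeddings(
    model_id="ibm/slate-30m-english-rtrvr",
    credentials=credentials,
    project_id="{project_id}",
)

# embed_documents returns one vector (a list of floats) per input text.
vectors = embedding_model.embed_documents(
    texts=[
        "A foundation model is trained on large amounts of unlabeled data.",
        "Text embeddings represent the meaning of a passage as a vector.",
    ]
)
print(len(vectors), len(vectors[0]))  # number of texts, number of dimensions
```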
all-minilm-l6-v2
This model was introduced with the 5.0.3 release.
The all-minilm-l6-v2 embedding model is built by the open source natural language processing (NLP) and computer vision (CV) community and provided by Hugging Face. Use the model as a sentence and short paragraph encoder. Given an input text, it generates a vector that captures the semantic information in the text.
Usage: Use the sentence vectors that are generated by the all-minilm-l6-v2 embedding model for tasks such as information retrieval, clustering, and sentence similarity detection (see the sketch at the end of this section).
Number of dimensions: 384
Input token limits: 256
Supported natural languages: English
Fine-tuning information: This embedding model is a version of the pretrained MiniLM-L6-H384-uncased model from Microsoft that is fine-tuned on a dataset that contains 1 billion sentence pairs.
Model architecture: Encoder-only
License: Apache 2.0 license
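For illustration, a minimal sketch that embeds two sentences with all-minilm-l6-v2 and scores them with cosine similarity. The credentials and project ID are placeholders, and the setup follows the Python library sketch shown earlier in this topic.

```python
# Minimal sketch: score sentence similarity with all-minilm-l6-v2 embeddings.
# Credentials and project ID are placeholders; see the earlier Python sketch.
import math

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

embedding_model = Embeddings(
    model_id="sentence-transformers/all-minilm-l6-v2",
    credentials=Credentials(url="https://{cluster_url}", api_key="{api_key}"),
    project_id="{project_id}",
)

vec_a, vec_b = embedding_model.embed_documents(
    texts=["How do I reset my password?", "Steps to recover a forgotten password"]
)

# Cosine similarity: values closer to 1.0 mean the sentences are closer in meaning.
dot = sum(a * b for a, b in zip(vec_a, vec_b))
norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
print(dot / norm)
```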
multilingual-e5-large
This model was introduced with the 5.0.3 release.
The multilingual-e5-large embedding model is built by Microsoft and provided by Hugging Face.
The embedding model architecture has 24 layers that are used sequentially to process data.
Usage: Use for generating text embeddings for text in a language other than English. When you submit input to the model, follow these guidelines (a sketch of prefixed inputs follows at the end of this section):
- Prefix the inputs with `query:` and `passage:` respectively for tasks such as passage or information retrieval.
- Prefix the input text with `query:` for tasks such as semantic similarity, bitext mining, and paraphrase retrieval.
- Prefix the input text with `query:` if you want to use embeddings as features, such as in linear probing classification or for clustering.
Number of dimensions: 1024
Input token limits: 512
Supported natural languages: Up to 100 languages. See the model card for details.
Fine-tuning information: This embedding model is a version of the XLM-RoBERTa model, which is a multilingual version of RoBERTa that is pretrained on 2.5TB of filtered CommonCrawl data. This embedding model was continually trained on a mixture of multilingual datasets.
Model architecture: Encoder-only
License: Microsoft Open Source Code of Conduct
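For illustration, a minimal sketch of how the `query:` and `passage:` prefixes might be applied for a passage retrieval task. The credentials and project ID are placeholders, and the setup follows the Python library sketch shown earlier in this topic.

```python
# Minimal sketch: add the query:/passage: prefixes before embedding with
# multilingual-e5-large. Credentials and project ID are placeholders.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

embedding_model = Embeddings(
    model_id="intfloat/multilingual-e5-large",
    credentials=Credentials(url="https://{cluster_url}", api_key="{api_key}"),
    project_id="{project_id}",
)

# For passage retrieval, prefix the question with "query: " and the
# candidate passages with "passage: ".
question = "query: ¿Cuál es la capital de Francia?"
passages = [
    "passage: París es la capital y la ciudad más poblada de Francia.",
    "passage: Berlín es la capital de Alemania.",
]

query_vector = embedding_model.embed_documents(texts=[question])[0]
passage_vectors = embedding_model.embed_documents(texts=passages)
print(len(query_vector), len(passage_vectors))  # 1024 dimensions, 2 passage vectors
```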
slate-125m-english-rtrvr
This model was updated to version 2.0.1.
The slate-125m-english-rtrvr foundation model is provided by IBM. The model generates embeddings for various inputs such as queries, passages, or documents.
The training objective is to maximize cosine similarity between a query and a passage. This process yields two sentence embeddings, one that represents the query and one that represents the passage, which can then be compared through cosine similarity (see the sketch at the end of this section).
Usage: Two to three times slower than the IBM Slate 30m embedding model, but with slightly better performance.
Number of dimensions: 768
Input token limits: 512
Supported natural languages: English
Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.
Model architecture: Encoder-only
License: Terms of use
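For illustration, a minimal sketch that embeds a query and a passage with slate-125m-english-rtrvr and compares them with cosine similarity, as described above. The credentials and project ID are placeholders, and the setup follows the Python library sketch shown earlier in this topic.

```python
# Minimal sketch: compare a query and a passage that are embedded with
# slate-125m-english-rtrvr by using cosine similarity.
# Credentials and project ID are placeholders; see the earlier Python sketch.
import math

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

embedding_model = Embeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    credentials=Credentials(url="https://{cluster_url}", api_key="{api_key}"),
    project_id="{project_id}",
)

query_vector = embedding_model.embed_query(text="What is a text embedding?")
passage_vector = embedding_model.embed_documents(
    texts=["A text embedding encodes the meaning of a passage as a vector of numbers."]
)[0]

# A higher cosine similarity means the passage is more relevant to the query.
dot = sum(q * p for q, p in zip(query_vector, passage_vector))
norm = math.sqrt(sum(q * q for q in query_vector)) * math.sqrt(sum(p * p for p in passage_vector))
print(dot / norm)
```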
slate-30m-english-rtrvr
This model was updated to version 2.0.1.
The slate-30m-english-rtrvr foundation model is a distilled version of the slate-125m-english-rtrvr model and is provided by IBM. The IBM Slate embedding model is trained to maximize the cosine similarity between two text inputs so that the resulting embeddings can later be compared by similarity.
The embedding model architecture has 6 layers that are used sequentially to process data.
Usage: Two to three times faster than the IBM Slate 125m embedding model, with slightly lower performance scores (see the retrieval sketch at the end of this section).
Try it out: Using vectorized text with retrieval-augmented generation tasks
Number of dimensions: 384
Input token limits: 512
Supported natural languages: English
Fine-tuning information: This version of the model was fine-tuned to be better at sentence retrieval-based tasks.
Model architecture: Encoder-only
License: Terms of use
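For illustration, a minimal sketch of the retrieval step in a retrieval-augmented generation flow that ranks passages for a question with slate-30m-english-rtrvr. The credentials, project ID, and passages are placeholders, and the setup follows the Python library sketch shown earlier in this topic.

```python
# Minimal sketch: rank passages for a question with slate-30m-english-rtrvr,
# the retrieval step of a retrieval-augmented generation flow.
# Credentials, project ID, and passages are placeholders; see the earlier Python sketch.
import math

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

embedding_model = Embeddings(
    model_id="ibm/slate-30m-english-rtrvr",
    credentials=Credentials(url="https://{cluster_url}", api_key="{api_key}"),
    project_id="{project_id}",
)

question = "How many dimensions does the slate-30m-english-rtrvr model use?"
passages = [
    "The slate-30m-english-rtrvr model generates embeddings with 384 dimensions.",
    "The multilingual-e5-large model supports up to 100 languages.",
    "Embedding models are encoder-only foundation models.",
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query_vector = embedding_model.embed_query(text=question)
passage_vectors = embedding_model.embed_documents(texts=passages)

# Sort passages from most to least relevant and keep the best match.
ranked = sorted(
    zip(passages, passage_vectors),
    key=lambda pair: cosine(query_vector, pair[1]),
    reverse=True,
)
print(ranked[0][0])
```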
Parent topic: Text embedding generation