Generate text embeddings programmatically
Use the embedding models and the text embeddings API in watsonx.ai to create text embeddings that capture the meaning of sentences or passages for use in your generative AI applications.
Ways to work
You can vectorize text, that is, convert text into numerical representations called embeddings, by using the following programming methods:
- REST API
- Python library
Alternatively, you can use graphical tools from the watsonx.ai UI to vectorize documents as part of a chat workflow or to create vector indexes. See the following resources:
- Chatting with documents and images
- Adding vectorized documents for grounding foundation model prompts
You can also use IBM embedding models from third-party platforms.
Overview
Converting text into text embeddings, or vectorizing text, helps with document comparison, question answering, and retrieval-augmented generation (RAG) tasks, where you need to retrieve relevant content quickly.
For more information, see the following topics:
Supported foundation models
For details about the available embedding models in watsonx.ai, see Supported encoder models.
To find out which embedding models are available for use programmatically, use the List the available foundation models method in the watsonx.ai API. Specify the filters=function_embedding parameter to return only the available embedding models.
curl -X GET \
'https://{region}.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2024-07-25&filters=function_embedding'
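If you prefer to script this lookup, you can make the same GET request from Python. The following is a minimal sketch that assumes the requests package is installed and that the response lists model specifications in a resources array; substitute your own region for us-south and confirm the field names in the API reference.
# Minimal sketch: list the available embedding models programmatically.
# Assumes the requests package is installed and that the response returns
# model specifications in a "resources" array; adjust the region as needed.
import requests

url = "https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs"
params = {"version": "2024-07-25", "filters": "function_embedding"}

response = requests.get(url, params=params)
response.raise_for_status()

# Print the model ID of each embedding model that is returned.
for spec in response.json().get("resources", []):
    print(spec["model_id"])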
REST API
Use the text embeddings method of the watsonx.ai REST API to vectorize text.
For details, see the watsonx.ai API reference documentation.
REST API example
The following code snippet uses the slate-30m-english-rtrvr model to convert these two lines of text into text embeddings:
- A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks.
- Generative AI is a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data.
Although only two lines of text are submitted for conversion in this example, you can specify up to 1,000 lines. Each line that you submit must conform to the maximum input token limit that is defined by the embedding model.
To handle lines that might exceed that limit, the truncate_input_tokens parameter is specified so that longer lines are truncated rather than causing the request to fail. The input_text return option is also included so that the original text is added to the response, which makes it easier to pair the original text with each set of embedding values.
You specify the embedding model that you want to use as the model_id in the payload for the embedding method.
curl -X POST \
'https://{region}.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-02' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer eyJraWQiOi...' \
--data-raw '{
"inputs": [
"A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks.",
"Generative AI a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data."
],
"parameters":{
"truncate_input_tokens": 128,
"return_options":{
"input_text":true
}
},
"model_id": "ibm/slate-30m-english-rtrvr",
"project_id": "81966e98-c691-48a2-9bcc-e637a84db410"
}'
The response looks something like this, although in this sample response, the 384 values in each embedding are reduced to 6 values to improve the readability of the example:
{
"model_id": "ibm/slate-30m-english-rtrvr",
"created_at": "2024-05-02T16:21:56.771Z",
"results": [
{
"embedding": [
-0.023104044,
0.05364946,
0.062400896,
...
0.008527246,
-0.08910927,
0.048190728
],
"input": "A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks."
},
{
"embedding": [
-0.024285838,
0.03582272,
0.008893765,
...
0.0148864435,
-0.051656704,
0.012944954
],
"input": "Generative AI a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data."
}
],
"input_token_count": 57
}
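Because the input_text return option was requested, each entry in the results array carries both the original text and its embedding, so you can pair them directly. The following is a minimal Python sketch that assumes the JSON response shown above was already parsed into a dictionary named response_json:
# Minimal sketch: pair each returned embedding with its original input text.
# Assumes the response body shown above was parsed into a dictionary named
# response_json (for example, with the json module or response.json()).
for result in response_json["results"]:
    text = result["input"]        # returned because input_text was requested
    vector = result["embedding"]  # a list of floats, 384 values for slate-30m
    print(f"{len(vector)} dimensions for: {text[:50]}...")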
Python
See the Embeddings class of the watsonx.ai Python library.
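For example, a minimal sketch of the same embedding request with the Python library might look like the following. The class and parameter names are taken from the ibm-watsonx-ai package; confirm them against the Embeddings class reference for your library version, and substitute your own API key, region URL, and project ID.
# Minimal sketch of the same request with the watsonx.ai Python library.
# Assumes the ibm-watsonx-ai package is installed (pip install ibm-watsonx-ai);
# verify the parameter names against the Embeddings class reference.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # adjust to your region
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

embedding_model = Embeddings(
    model_id="ibm/slate-30m-english-rtrvr",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
    params={
        "truncate_input_tokens": 128,
        "return_options": {"input_text": True},
    },
)

texts = [
    "A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks.",
    "Generative AI is a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data.",
]

# embed_documents returns one embedding (a list of floats) per input text.
vectors = embedding_model.embed_documents(texts=texts)
print(len(vectors), len(vectors[0]))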
To get started, see the following sample notebook: