Generate text embeddings programmatically
Use the embedding models and the text embeddings API in watsonx.ai to create text embeddings that capture the meaning of sentences or passages for use in your generative AI applications.
Ways to work
You can vectorize text, that is, convert text into numerical representations called embeddings, by using the following programming methods:
- REST API
- Python library
Alternatively, you can use graphical tools from the watsonx.ai UI to vectorize documents as part of a chat workflow or to create vector indexes. See the following resources:
- Chatting with documents and images
- Adding vectorized documents for grounding foundation model prompts
You can also use IBM embedding models from third-party platforms.
Overview
Converting text into text embeddings, or vectorizing text, helps with document comparison, question answering, and retrieval-augmented generation (RAG) tasks, where you need to retrieve relevant content quickly.
For more information, see the following topics:
Supported foundation models
For details about the available embedding models in watsonx.ai, see Supported encoder models.
To find out which embedding models are available for use programmatically, use the List the available foundation models method in the watsonx.ai API. Specify the filters=function_embedding parameter to return only the available embedding models.
curl -X GET \
'https://{region}.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2024-07-25&filters=function_embedding'
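If you prefer to script this lookup, you can make the same GET request from Python. The following is a minimal sketch that assumes the requests package is installed and that the response lists model specifications in a resources array; substitute your own region for us-south and confirm the field names in the API reference.
# Minimal sketch: list the available embedding models programmatically.
# Assumes the requests package is installed and that the response returns
# model specifications in a "resources" array; adjust the region as needed.
import requests

url = "https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs"
params = {"version": "2024-07-25", "filters": "function_embedding"}

response = requests.get(url, params=params)
response.raise_for_status()

# Print the model ID of each embedding model that is returned.
for spec in response.json().get("resources", []):
    print(spec["model_id"])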
REST API
Use the text embeddings method of the watsonx.ai REST API to vectorize text.
For details, see the watsonx.ai API reference documentation.
REST API example
The following code snippet uses the slate-30m-english-rtrvr model to convert these two lines of text into text embeddings:
- A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks.
- Generative AI is a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data.
Although only two lines of text are submitted for conversion in this example, you can specify up to 1,000 lines. Each line that you submit must conform to the maximum input token limit that is defined by the embedding model.
To handle lines that might exceed that limit, the truncate_input_tokens parameter is specified so that longer lines are truncated rather than causing the request to fail. The input_text return option is also included so that the original text is added to the response, which makes it easier to pair the original text with each set of embedding values.
You specify the embedding model that you want to use as the model_id in the payload for the embedding method.
curl -X POST \
'https://{region}.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-02' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer eyJraWQiOi...' \
--data-raw '{
"inputs": [
"A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks.",
"Generative AI a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data."
],
"parameters":{
"truncate_input_tokens": 128,
"return_options":{
"input_text":true
}
},
"model_id": "ibm/slate-30m-english-rtrvr",
"project_id": "81966e98-c691-48a2-9bcc-e637a84db410"
}'
The response looks something like this, although in this sample response, the 384 values in each embedding are reduced to 6 values to improve the readability of the example:
{
"model_id": "ibm/slate-30m-english-rtrvr",
"created_at": "2024-05-02T16:21:56.771Z",
"results": [
{
"embedding": [
-0.023104044,
0.05364946,
0.062400896,
...
0.008527246,
-0.08910927,
0.048190728
],
"input": "A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks."
},
{
"embedding": [
-0.024285838,
0.03582272,
0.008893765,
...
0.0148864435,
-0.051656704,
0.012944954
],
"input": "Generative AI a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data."
}
],
"input_token_count": 57
}
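Because the input_text return option was requested, each entry in the results array carries both the original text and its embedding, so you can pair them directly. The following is a minimal Python sketch that assumes the JSON response shown above was already parsed into a dictionary named response_json:
# Minimal sketch: pair each returned embedding with its original input text.
# Assumes the response body shown above was parsed into a dictionary named
# response_json (for example, with the json module or response.json()).
for result in response_json["results"]:
    text = result["input"]        # returned because input_text was requested
    vector = result["embedding"]  # a list of floats, 384 values for slate-30m
    print(f"{len(vector)} dimensions for: {text[:50]}...")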
Python
See the Embeddings class of the watsonx.ai Python library.
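For example, a minimal sketch of the same embedding request with the Python library might look like the following. The class and parameter names are taken from the ibm-watsonx-ai package; confirm them against the Embeddings class reference for your library version, and substitute your own API key, region URL, and project ID.
# Minimal sketch of the same request with the watsonx.ai Python library.
# Assumes the ibm-watsonx-ai package is installed (pip install ibm-watsonx-ai);
# verify the parameter names against the Embeddings class reference.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # adjust to your region
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

embedding_model = Embeddings(
    model_id="ibm/slate-30m-english-rtrvr",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
    params={
        "truncate_input_tokens": 128,
        "return_options": {"input_text": True},
    },
)

texts = [
    "A foundation model is a large-scale generative AI model that can be adapted to a wide range of downstream tasks.",
    "Generative AI is a class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data.",
]

# embed_documents returns one embedding (a list of floats) per input text.
vectors = embedding_model.embed_documents(texts=texts)
print(len(vectors), len(vectors[0]))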
To get started, see the following sample notebook: