Embeddings

Use embedding models to create text embeddings that capture the meaning of a sentence or a passage. You can use these models with classifiers such as support vector machines. Embedding models can also help you with retrieval-augmented generation tasks.

The following diagram illustrates the retrieval-augmented generation pattern with embedding support.

Diagram that shows adding search results derived from a vector store to the input for retrieval-augmented generation

The retrieval-augmented generation pattern with embedding support involves the following steps:

  1. Convert your content into text embeddings and store them in a vector data store.
  2. Use the same embedding model to convert the user input into text embeddings.
  3. Run a similarity or semantic search in your knowledge base for content that is related to a user's question.
  4. Pull the most relevant search results into your prompt as context and add an instruction, such as “Answer the following question by using only information from the following passages.”
  5. Send the combined prompt text (instruction + search results + question) to the foundation model.
  6. The foundation model uses the contextual information in the prompt to generate an answer that is grounded in the retrieved passages.
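
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, the "vector store" is an in-memory list, and `build_rag_prompt` is a name invented for this example. The final call to a foundation model (step 5) is left out because it depends on the service you use.

```python
import math
import re
from collections import Counter

INSTRUCTION = ("Answer the following question by using only information "
               "from the following passages.")

def toy_embed(text):
    # Hypothetical stand-in for an embedding model: word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_rag_prompt(question, passages, embed=toy_embed, top_k=1):
    # Step 1: embed the content and keep it in an in-memory "vector store".
    store = [(p, embed(p)) for p in passages]
    # Step 2: embed the user input with the same embedding function.
    q_vec = embed(question)
    # Step 3: rank the stored passages by similarity to the question.
    ranked = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    # Step 4: pull the most relevant passages into the prompt as context.
    context = "\n".join(p for p, _ in ranked[:top_k])
    # Step 5 input: instruction + search results + question.
    return f"{INSTRUCTION}\n\nPassages:\n{context}\n\nQuestion: {question}"

passages = [
    "The warranty covers battery defects for two years.",
    "Shipping takes five business days within Europe.",
]
prompt = build_rag_prompt("How long does the warranty cover the battery?", passages)
print(prompt)
```

In practice you would replace `toy_embed` with a call to one of the embedding models described in this topic, and the list with a real vector data store.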

USE embeddings

USE embeddings are wrappers around the Google Universal Sentence Encoder embeddings that are available in TFHub. These embeddings are used in the document classification SVM algorithm. For a list of pretrained USE embeddings and their supported languages, see Included pretrained USE embeddings.

When using USE embeddings, consider the following:

  • Choose embedding_use_en_stock if your task involves English text.

  • Choose one of the multilingual USE embeddings if your task involves text in a non-English language, or you want to train multilingual models.

  • The USE embeddings exhibit different trade-offs between quality of the trained model and throughput at inference time, as described below. Try different embeddings to decide the trade-off between quality of result and inference throughput that is appropriate for your use case.

    • embedding_use_multi_small has reasonable quality and is fast at inference time.
    • embedding_use_en_stock is an English-only version of embedding_use_multi_small; it is therefore smaller and has higher inference throughput.
    • embedding_use_multi_large is based on the Transformer architecture; it therefore provides higher-quality results, at the cost of lower throughput at inference time.
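
One way to compare inference throughput between candidate embeddings is a small timing harness like the sketch below. The function name `measure_throughput` and the `fake_embed` stand-in are hypothetical and exist only so the example runs without any model installed; the harness measures speed only, so result quality still has to be judged on your own task.

```python
import time

def measure_throughput(embed, texts, repeats=3):
    """Return texts embedded per second, taking the best of `repeats` passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            embed(text)
        best = min(best, time.perf_counter() - start)
    return len(texts) / best if best > 0 else float("inf")

def fake_embed(text):
    # Hypothetical stand-in for a real embedding call.
    return [float(len(text))] * 512

rate = measure_throughput(fake_embed, ["python"] * 1000)
print(f"{rate:.0f} texts/second")
```

To time a real model, pass a callable that wraps the calls shown in the code sample in this topic, for example `lambda text: embeddings_model.run(syntax_model.run(text))`, and run the harness once per candidate embedding on the same texts.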

Code sample

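import watson_nlp

# Load the English syntax model and the English USE embedding model.
syntax_model = watson_nlp.load("syntax_izumo_en_stock")
embeddings_model = watson_nlp.load("embedding_use_en_stock")

text = "python"
# Run syntax analysis first; the embedding model takes its output as input.
syntax_doc = syntax_model.run(text)
embedding = embeddings_model.run(syntax_doc)
print(embedding)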

Output of the code sample:

{
  "data": {
    "data": [
      -0.01909315399825573,
      -0.009827353060245514,
      ...
      0.008978910744190216,
      -0.0702751949429512
    ],
    "rows": 1,
    "cols": 512,
    "dtype": "float32"
  },
  "offsets": null,
  "producer_id": null
}
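
The output stores the embedding values as a flat list together with its "rows" and "cols" dimensions. The following sketch splits such a result into one vector per row. It assumes that the result is already available as a plain Python dictionary with the same keys as the output above and that the values are laid out row by row; both are assumptions, and `embedding_rows` is a hypothetical helper, not part of the library.

```python
def embedding_rows(result):
    """Split the flat value list into `rows` vectors of length `cols`."""
    payload = result["data"]
    values, rows, cols = payload["data"], payload["rows"], payload["cols"]
    if len(values) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(values)}")
    # Assumption: values are stored in row-major order.
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]

# Small made-up dictionary with the same shape as the output above.
sample = {
    "data": {
        "data": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        "rows": 2,
        "cols": 3,
        "dtype": "float32",
    },
    "offsets": None,
    "producer_id": None,
}
vectors = embedding_rows(sample)
print(vectors)
```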

Included pretrained USE embeddings

The following table lists the pretrained blocks for USE embeddings that are available and the languages that are supported. For a list of the language codes and the corresponding language, see Language codes.

List of pretrained USE embeddings with their supported languages
Block name    Model name                    Supported languages
use           embedding_use_en_stock        English only
use           embedding_use_multi_small     ar, de, el, en, es, fr, it, ja, ko, nb, nl, pl, pt, ru, th, tr, zh_tw, zh
use           embedding_use_multi_large     ar, de, el, en, es, fr, it, ja, ko, nb, nl, pl, pt, ru, th, tr, zh_tw, zh
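
The selection guidance for USE embeddings can be expressed as a small lookup over the table above. This is only a sketch: `choose_use_embedding` and its `prefer_quality` flag are hypothetical names introduced for illustration, and the rule it encodes (English goes to the English-only model, other listed languages go to a multilingual model) is one reading of that guidance.

```python
# Language codes listed for the multilingual USE embeddings in the table above.
USE_MULTI_LANGUAGES = {
    "ar", "de", "el", "en", "es", "fr", "it", "ja", "ko",
    "nb", "nl", "pl", "pt", "ru", "th", "tr", "zh_tw", "zh",
}

def choose_use_embedding(language_code, prefer_quality=False):
    """Return a USE embedding model name for the given language code."""
    if language_code == "en":
        return "embedding_use_en_stock"
    if language_code not in USE_MULTI_LANGUAGES:
        raise ValueError(f"no USE embedding listed for {language_code!r}")
    # The large model trades inference throughput for result quality.
    if prefer_quality:
        return "embedding_use_multi_large"
    return "embedding_use_multi_small"

print(choose_use_embedding("en"))
print(choose_use_embedding("de"))
print(choose_use_embedding("ja", prefer_quality=True))
```

If you want to train a multilingual model that includes English, pick one of the multilingual names directly instead of relying on this helper.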

GloVe embeddings

GloVe embeddings are used by the CNN classifier.

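Block name: embedding_glove_<language>_stock, where <language> is one of the supported language codes (for example, embedding_glove_en_stock)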

Supported languages: ar, de, en, es, fr, it, ja, ko, nl, pt, zh-cn

Code sample

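import watson_nlp

# Load the English syntax model and the English GloVe embedding model.
syntax_model = watson_nlp.load("syntax_izumo_en_stock")
embeddings_model = watson_nlp.load("embedding_glove_en_stock")

text = "python"
# Run syntax analysis first; the embedding model takes its output as input.
syntax_doc = syntax_model.run(text)
embedding = embeddings_model.run(syntax_doc)
print(embedding)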

Output of the code sample:

{
  "data": {
    "data": [
      -0.01909315399825573,
      -0.009827353060245514,
      ...
      0.008978910744190216,
      -0.0702751949429512
    ],
    "rows": 1,
    "cols": 512,
    "dtype": "float32"
  },
  "offsets": null,
  "producer_id": null
}

Transformer embeddings

Block names

  • embedding_transformer_en_slate.125m
  • embedding_transformer_en_slate.30m

Supported languages

English only

Code sample

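import watson_nlp

# Load one of the two Slate models; swap the comment to use the 125m variant.
# embeddings_model = watson_nlp.load("embedding_transformer_en_slate.125m")
embeddings_model = watson_nlp.load("embedding_transformer_en_slate.30m")
text = "python"

# The transformer embedding model takes raw text directly; no syntax model is needed.
embedding = embeddings_model.run(text)
print(embedding)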

Output of the code sample:

{
  "data": {
    "data": [
      -0.055536773055791855,
      0.008286023512482643,
      ...
      -0.3202415108680725,
      5.000295277568512e-05
    ],
    "rows": 1,
    "cols": 384,
    "dtype": "float32"
  },
  "offsets": null,
  "producer_id": {
    "name": "Transformer Embeddings",
    "version": "0.0.1"
  }
}
