Generate output

Use the Generate output node to upload your embeddings into a vector database (Milvus, Elasticsearch, or OpenSearch), to store extracted entities in the Entity store, or to create a new document set. You can use more than one output type in the same flow.

The Generate output node is mandatory and must be the last node in the flow.

Selecting a vector database

When you select a vector database, consider which database the RAG applications that consume the embeddings expect. If you're creating a flow for a RAG application, you must select the same vector database that the RAG application reads from.

You can select one of the following vector databases:

Milvus or watsonx.data Milvus
Elasticsearch
OpenSearch

Upstream nodes (chunking, enrichment, embeddings) produce features, for example: content (text chunk), embeddings (dense vector), sparse_embeddings, id, doc_id_hash, timestamps or metadata. By specifying feature mappings, you define which output of your flow goes into which fields or columns in the target vector store. Feature mappings explicitly connect flow outputs (text, vectors, metadata) to target-specific schema elements, determining how data is stored, indexed, and retrieved during vector and hybrid search.
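As a rough illustration of what a feature mapping does, the following sketch renames the features of one upstream record to the field names of a target store. The apply_feature_mapping helper, the sample record, and the particular mapping are hypothetical and only reuse feature and field names that appear on this page. In the product, the mapping is configured in the properties panel, not in code.

```python
from typing import Any, Dict


def apply_feature_mapping(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename flow features to target field names; unmapped features are dropped."""
    missing = [feature for feature in mapping if feature not in record]
    if missing:
        raise KeyError(f"features not produced by upstream nodes: {missing}")
    return {target: record[feature] for feature, target in mapping.items()}


# Hypothetical output of the upstream chunking and embeddings nodes.
record = {
    "content": "Milvus stores vectors.",
    "embeddings": [0.1, 0.2, 0.3],
    "id": "chunk-0001",
    "doc_id_hash": "ab12",
}

# Illustrative mapping (an assumption): flow feature -> target field.
mapping = {"id": "pk", "content": "text", "embeddings": "vector_embeddings"}

row = apply_feature_mapping(record, mapping)
```

In this sketch, doc_id_hash is not mapped, so it does not appear in the resulting row.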

Milvus

Select this node to store the embeddings in the watsonx.data Milvus database.

You can have multiple collections in your vector database and save your embeddings in one of the collections. The RAG application then reads the embeddings from the collection that you specified.

Specify an existing collection name. The schema of the collection must match the schema that the flow expects. The watsonx.data Milvus schema is expected to have the following fields:

Collection schema details for watsonx.data Milvus
Name	Type	Maximum length	Primary	Auto generated
pk	<DataType.VARCHAR: 21>	65535	True	False
text	<DataType.VARCHAR: 21>	65535	False	False
vector_embeddings	<DataType.FLOAT_VECTOR: 101>			False
sparse-embeddings
id	<DataType.VARCHAR: 21>	65535	False	False

Specify the name of a collection that does not exist yet.
The flow creates a collection with the name that you specified.
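Before you point the flow at an existing collection, it can help to compare the collection's fields with the expected schema. The following is a minimal, library-free sketch of such a check. FieldSpec and schema_mismatches are hypothetical helpers, not part of any Milvus client. The sparse-embeddings field is left out because the schema table does not list its type.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    dtype: str
    max_length: Optional[int] = None
    primary: bool = False


# Expected fields, copied from the collection schema table.
EXPECTED_FIELDS: Dict[str, FieldSpec] = {
    "pk": FieldSpec("VARCHAR", 65535, primary=True),
    "text": FieldSpec("VARCHAR", 65535),
    "vector_embeddings": FieldSpec("FLOAT_VECTOR"),
    "id": FieldSpec("VARCHAR", 65535),
}


def schema_mismatches(actual: Dict[str, FieldSpec]) -> List[str]:
    """Return a description of every expected field that is missing or different."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        found = actual.get(name)
        if found is None:
            problems.append(f"missing field: {name}")
        elif found != expected:
            problems.append(f"field {name}: expected {expected}, found {found}")
    return problems
```

An empty result means that every listed field is present with the expected type, maximum length, and primary-key setting.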

Use the properties panel for feature mapping. If you chose an existing collection, you can map the features to the columns in the Milvus collection. For a new collection, you can change the column names.

For larger documents that have more than 65,535 characters of text, you must also use the chunking operator because Milvus supports fields with a maximum length of 65,535 characters.
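The length constraint can be pictured with the following minimal sketch, which cuts text into pieces that fit the maximum length given in the collection schema table. The split_text function is hypothetical and is not how the chunking operator works. A real chunker typically splits on sentence or section boundaries and uses much smaller chunks.

```python
from typing import List

# Maximum field length from the collection schema table.
MAX_FIELD_LENGTH = 65_535


def split_text(text: str, max_chars: int = MAX_FIELD_LENGTH) -> List[str]:
    """Cut text into consecutive pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


# A hypothetical document that is too long for a single field.
document = "a" * 70_000
chunks = split_text(document)
```

Each piece then fits into a single text field, and the pieces concatenate back to the original document.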

Elasticsearch

Specify the following properties:

Specify connection details.
Map features to index mappings: Select features from connected nodes and map them to your target index mappings.
1. Click Add feature mappings.
2. In the window that opens, search for the index to use or create a new index by providing its name.
3. Select vector similarity:
  - Cosine similarity
  - Euclidean distance
  - Dot product
4. If you chose an existing index, map the features to the index columns. For a new index, you can edit the column names.
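The three vector similarity options in step 3 correspond to standard formulas. The following sketch shows the textbook definitions in plain Python for reference only. It does not show how Elasticsearch computes or normalizes scores internally, and the function names are illustrative.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the vectors after scaling both to unit length."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / norm


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine similarity ignores vector length, whereas dot product and Euclidean distance are sensitive to it. In general, the choice should match how the embedding model was trained.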

OpenSearch

Specify the following properties:

Specify connection details.
Map features: Select features from connected nodes and map them to columns in collections.
Enable Compute sparse vectors for hybrid search to generate sparse vector representations (for example, keyword-based signals) alongside dense embeddings. This allows OpenSearch to combine semantic similarity with keyword relevance, improving search quality for queries that benefit from exact term matching as well as contextual understanding.
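To make the idea of hybrid search concrete, the following toy sketch builds a keyword-count sparse vector and blends a keyword score with a dense similarity score. Everything in it is an assumption for illustration: the whitespace tokenizer, the raw term counts, and the weighted sum are not the sparse encoding or the score combination that OpenSearch or the flow uses.

```python
from collections import Counter
from typing import Dict


def sparse_vector(text: str) -> Dict[str, int]:
    """Toy sparse representation: lowercase term -> number of occurrences."""
    return dict(Counter(text.lower().split()))


def sparse_score(query: Dict[str, int], doc: Dict[str, int]) -> int:
    """Dot product over the terms that the query and the document share."""
    return sum(weight * doc.get(term, 0) for term, weight in query.items())


def hybrid_score(dense_score: float, keyword_score: float, dense_weight: float = 0.5) -> float:
    """Weighted blend of a semantic score and a keyword score."""
    if not 0.0 <= dense_weight <= 1.0:
        raise ValueError("dense_weight must be between 0 and 1")
    return dense_weight * dense_score + (1.0 - dense_weight) * keyword_score


doc_terms = sparse_vector("Error code error")
query_terms = sparse_vector("error code")
keyword_match = sparse_score(query_terms, doc_terms)
```

In practice, the keyword and dense scores are on different scales and are normalized before they are combined. The sketch omits that step.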

Document set

Use this node to generate a new document set. Select it if you want to store the output of the flow for downstream tools to use, or to reuse the generated document set as input for multiple use cases. The document set information is stored in an Iceberg table. The database can be connected through one of these connectors:

watsonx.data Presto - The connection must be configured with the following engine connection properties: hostname or IP address, engine ID, and engine port.
Presto - Only for flows that run in a Python runtime environment.
Iceberg metastore - Only for flows that run in a Python runtime environment.
Microsoft Azure Databricks
PostgreSQL
Db2
Oracle

In the Properties panel for the node, provide a name and description for the document set.
Select a connection, catalog, and schema to use for storing the document set table.
In the Properties panel, provide a name for the new table to be created.

Note: For the pdf, ppt, and pptx formats, the value for Total pages processed in the log is based on the actual number of pages in the input document. For the doc, docx, md, and txt formats, the value is based on the number of characters (3000 characters = 1 page), so the count in the log might not reflect the actual number of pages in the document.
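For character-based formats, the page estimate can be approximated as in the following sketch. The estimated_pages function is hypothetical. Rounding a partial page up is an assumption, because this page states only the ratio of 3000 characters to 1 page.

```python
import math

# Ratio stated in the note: 3000 characters count as 1 page.
CHARS_PER_PAGE = 3000


def estimated_pages(char_count: int) -> int:
    """Estimate pages for doc, docx, md, and txt input (partial pages rounded up: assumption)."""
    if char_count < 0:
        raise ValueError("char_count must not be negative")
    return math.ceil(char_count / CHARS_PER_PAGE)
```

With these assumptions, a 7,500-character text file is reported as 3 pages, regardless of how many pages it has when printed.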

Entity store

Use the Entity store node to store extracted entities in structured entity tables. Entity tables are written to Iceberg tables. Select a connection and schema to store the entity tables.

The database can be connected through one of these connectors:

watsonx.data Presto - The connection must be configured with the following engine connection properties: hostname or IP address, engine ID, and engine port.
Presto - Only for flows that run in a Python runtime environment.
Iceberg metastore - Only for flows that run in a Python runtime environment.
Microsoft Azure Databricks
PostgreSQL
Db2
Oracle

Next node in the flow

The Generate output node is the last one in the flow. You can now run your flow.

Learn more

Creating a data preparation flow