Getting started with IBM Data Integration for Unstructured Data
With IBM Data Integration for Unstructured Data, you can ingest, cleanse, transform, and enrich unstructured data for gen AI. To process your data, use the drag-and-drop user interface with pre-built modules for tasks such as text data extraction, filtering, and PII redaction. You can build repeatable visual data pipelines that continuously process changes and updates, so that your application always uses the latest available data.
Regardless of skill level, data teams can use the pre-built, drag-and-drop operator nodes to ingest unstructured data from a variety of input data sources and store the output in a vector database.
You use the graphical canvas to create a flow, which is a collection of steps for processing your data. From the nodes palette, you choose and configure operator nodes and then connect them in a flow that processes your documents.
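The idea of a flow as an ordered chain of operator nodes can be sketched in a few lines of Python. This is a conceptual illustration only, not the product's API: the operator names (`drop_empty`, `normalize_whitespace`, `redact_emails`), the `run_flow` helper, and the simplified email pattern are all hypothetical and stand in for nodes that you would configure on the canvas.

```python
import re
from typing import Callable, List

# Assumption: a document is plain text, and an operator maps a list of
# documents to a new list of documents. The real service defines its own
# operator nodes; these functions are illustrative stand-ins.
Operator = Callable[[List[str]], List[str]]

# Deliberately simplified email pattern, used only for this sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def drop_empty(docs: List[str]) -> List[str]:
    """Filtering step: discard documents that contain only whitespace."""
    return [d for d in docs if d.strip()]


def normalize_whitespace(docs: List[str]) -> List[str]:
    """Cleansing step: collapse runs of whitespace into single spaces."""
    return [" ".join(d.split()) for d in docs]


def redact_emails(docs: List[str]) -> List[str]:
    """PII redaction step: replace email addresses with a placeholder."""
    return [EMAIL.sub("[REDACTED]", d) for d in docs]


def run_flow(operators: List[Operator], docs: List[str]) -> List[str]:
    """Apply each operator in order, feeding its output to the next one."""
    for op in operators:
        docs = op(docs)
    return docs


# Connecting nodes on the canvas corresponds to ordering the operators.
flow = [drop_empty, normalize_whitespace, redact_emails]
result = run_flow(flow, ["Contact  alice@example.com\nfor details.", "   "])
print(result)
```

Because each operator has the same input and output shape, steps can be reordered, removed, or added without changing the others, which is the property that makes a visual flow easy to rearrange.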
After a flow is created, IBM Data Integration for Unstructured Data maintains the live pipeline. You can use dedicated runtimes and schedule jobs to update embeddings automatically when a source document changes. As source documents are updated, only the embeddings affected by the changes are reprocessed. This addresses the post-vectorization problem of data becoming out of date after it is consumed. It also reduces the manual burden of preparing raw unstructured data for enterprise-grade AI by sifting through the data and determining whether it is relevant and accurate for your use case.
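One common way to reprocess only what changed is to compare a content hash of each source document with the hash recorded on the previous run. The sketch below illustrates that general technique; it is an assumption for illustration and does not describe how the service detects changes internally. The names `refresh`, `changed_documents`, and `toy_embed` are hypothetical, and `toy_embed` is a placeholder rather than a real embedding model.

```python
import hashlib
from typing import Callable, Dict, List


def content_hash(text: str) -> str:
    """Fingerprint a document so that any edit produces a different value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_documents(sources: Dict[str, str], seen: Dict[str, str]) -> List[str]:
    """Return the IDs of documents that are new or differ from the last run."""
    return [
        doc_id
        for doc_id, text in sources.items()
        if seen.get(doc_id) != content_hash(text)
    ]


def refresh(
    sources: Dict[str, str],
    seen: Dict[str, str],
    embed: Callable[[str], List[float]],
    index: Dict[str, List[float]],
) -> List[str]:
    """Re-embed only the changed documents and record their new hashes."""
    changed = changed_documents(sources, seen)
    for doc_id in changed:
        index[doc_id] = embed(sources[doc_id])
        seen[doc_id] = content_hash(sources[doc_id])
    return changed


def toy_embed(text: str) -> List[float]:
    """Placeholder embedding (hypothetical): a real model would go here."""
    return [float(len(text))]


seen: Dict[str, str] = {}
index: Dict[str, List[float]] = {}
sources = {"a.txt": "first version", "b.txt": "unchanged"}

first_run = refresh(sources, seen, toy_embed, index)   # both documents are new
sources["a.txt"] = "second version"
second_run = refresh(sources, seen, toy_embed, index)  # only a.txt is reprocessed
print(first_run, second_run)
```

In this sketch, a scheduled job would call `refresh` on each run, so the cost of keeping the vector store current scales with the number of changed documents rather than with the size of the whole collection.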
Checking whether the service is installed
An administrator must install IBM Data Integration for Unstructured Data.
To check whether the service is installed:
- From the navigation menu, select .
- Search for IBM Data Integration for Unstructured Data.
If the service is installed and ready to use, the tile in the catalog shows Ready to use.
Accessing the service
IBM Data Integration for Unstructured Data is available from the Data Fabric and watsonx.data™ experiences.
- From the navigation menu, select .
- Open your project or create a new one.
- Click New asset and select Create an unstructured data flow to start working with IBM Data Integration for Unstructured Data.