Data integration

With IBM watsonx.data integration, you can use a range of data integration styles, such as streaming, replication, observability, and bulk or batch processing.

You can access and connect data across various data sources, including databases, file systems, real-time web services and messaging systems, and other enterprise applications. You can also create alerts to track the health of the end-to-end data integration process, so that you can investigate data incidents immediately.

The watsonx.data integration tools are available in collaborative workspaces called projects. After you create or join a project, you can use the tools to integrate and observe your data.

Data formats

You can integrate the following types of data:

Structured data, such as data stored in relational databases or CSV files
Semi-structured data, such as JSON or XML files
Unstructured data, such as PDF, HTML, or markdown files
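
The three categories above can be illustrated with a minimal Python sketch that sorts files by extension. The mapping and the function name `categorize` are hypothetical and are not part of any watsonx.data integration API. The sketch assumes that a file extension is a good enough proxy for the data type, which is not always true in practice.

```python
from pathlib import Path

# Hypothetical mapping for illustration only; it mirrors the examples
# in the list above and is not a product API.
FORMAT_CATEGORIES = {
    ".csv": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".pdf": "unstructured",
    ".html": "unstructured",
    ".md": "unstructured",
}


def categorize(filename: str) -> str:
    """Return the data category for a file name, based on its extension."""
    suffix = Path(filename).suffix.lower()
    return FORMAT_CATEGORIES.get(suffix, "unknown")


if __name__ == "__main__":
    for name in ["orders.csv", "events.json", "contract.pdf", "notes.txt"]:
        print(name, "->", categorize(name))
```

Files with an extension that is not in the mapping are reported as "unknown" rather than guessed.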

Data integration capabilities

You can integrate your data in the following ways:

Transform batch data with DataStage: Create batch data flows that extract structured or semi-structured data from multiple source systems, transform the data as required, and deliver the data to target systems. See Transforming data with DataStage.
Stream real-time data with StreamSets: Create streaming data flows that run continuously to read, process, and write data as soon as the data becomes available. StreamSets data flows primarily process structured or semi-structured data. You can use the whole file data format to process unstructured data. See Streaming real-time data.
Replicate data with Data Replication: Build a replication pipeline that synchronizes structured data between a source and target data store. See Replicating data.
Prepare unstructured data with Unstructured Data Integration: Build unstructured data processing flows to ingest, transform, and enrich unstructured data from diverse sources for generative AI use cases. See Working with unstructured data.
Observe your data with Data Observability: Create alerts to track the health of DataStage jobs, so that you can investigate data incidents immediately. See Observing data.
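
As a conceptual illustration of the batch pattern described above (extract from multiple sources, transform, deliver to a target), the following is a minimal sketch in plain Python. It does not use DataStage or any IBM API. The function names, the `id` and `amount` fields, and the in-memory list that stands in for a target system are all assumptions made for the example.

```python
import csv
import io
import json


def extract_csv(text):
    """Extract structured records from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract semi-structured records from a JSON array."""
    return json.loads(text)


def transform(records):
    """Normalize types so that records from both sources share one shape."""
    return [
        {"id": int(r["id"]), "amount": float(r["amount"])}
        for r in records
    ]


def load(records, target):
    """Deliver records to the target (here, a plain list) and return the count."""
    target.extend(records)
    return len(records)


def run_batch(csv_text, json_text, target):
    """Run one batch: extract from both sources, transform, then load."""
    records = extract_csv(csv_text) + extract_json(json_text)
    return load(transform(records), target)


if __name__ == "__main__":
    target = []
    count = run_batch(
        "id,amount\n1,10.5\n2,3\n",
        '[{"id": 3, "amount": "7.25"}]',
        target,
    )
    print(count, target)
```

A streaming flow differs mainly in that it runs continuously and processes each record as it arrives, instead of processing a bounded batch in one run.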

Ways to work

You can work in a no-code or low-code experience with the following tools in the UI:

DataStage
StreamSets
Data Replication
Unstructured Data Integration
Data Observability

You can interact with an AI agent to build data flows using a guided, natural language experience. See Building data flows with the Data Engineering Agent.

You can use AI to observe failed DataStage jobs, detect issues, and identify their probable root cause.

You can write code to integrate your data with the following methods:

IBM watsonx.data integration SDK for Python
IBM APIs for DataStage

See Available APIs and SDKs.
