Getting started with watsonx.data integration

After you sign up for IBM watsonx.data integration, you can start transforming, integrating, and observing your data.

To get started, review the overall workflow and choose the tool that meets your needs.

Data integration workflow

Your data integration workflow includes these basic steps:

  1. Create task credentials.

    Create the task credentials or user-generated API key required for long-running data integration tasks. For more information, see Creating task credentials for jobs.

  2. Create a project.

  3. Add data to your project.

    You can add the following types of data:

    • Data from a connected remote data source. For details, see the connector types that are supported by each watsonx.data integration tool.
    • Data files from your local system.
    • Sample data from the Resource hub.

  4. Set up a streaming engine.

    To build a streaming data flow, create a StreamSets environment for your project and run an engine in your corporate network. For more information, see Administering StreamSets environments.

  5. Create a data flow or a replication asset.

    Choose the watsonx.data integration tool that matches your workload, for example DataStage for batch data, StreamSets for streaming data, or Data Replication for replication assets.

  6. Run a job for the flow or asset.

  7. Create alerts to observe and track the health of jobs.
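
The steps above can be read as an ordered checklist in which only the streaming engine setup depends on the kind of flow you build. The following Python sketch is purely illustrative: `STEPS` and `workflow_steps` are hypothetical names, no watsonx.data integration API is called, and it assumes that step 4 is needed only when you build a streaming data flow.

```python
# Illustrative checklist only; no watsonx.data integration API is called.
STEPS = [
    "Create task credentials",
    "Create a project",
    "Add data to your project",
    "Set up a streaming engine",
    "Create a data flow or a replication asset",
    "Run a job for the flow or asset",
    "Create alerts to observe and track the health of jobs",
]

STREAMING_STEP = "Set up a streaming engine"


def workflow_steps(streaming: bool) -> list[str]:
    """Return the workflow steps in order.

    Assumption: the streaming engine step applies only to streaming
    data flows, so it is skipped when `streaming` is False.
    """
    return [step for step in STEPS if streaming or step != STREAMING_STEP]


if __name__ == "__main__":
    # Print a numbered checklist for a non-streaming (for example, batch) flow.
    for number, step in enumerate(workflow_steps(streaming=False), start=1):
        print(f"{number}. {step}")
```

Calling `workflow_steps(streaming=True)` returns all seven steps, which corresponds to the full workflow for a StreamSets flow.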

Tools for transforming, integrating, and observing data

You can use the following tools:

DataStage
Transform batch data. To learn how to create a DataStage flow that combines data from multiple external sources, follow the Transform batch data tutorial.
StreamSets
Stream real-time data. After you set up and run a streaming engine, create a StreamSets flow.
Data Replication
Replicate data between a source and target data store. To learn how to create a replication asset, follow the Replicate data tutorial.
Unstructured Data Integration
Prepare unstructured data. Create unstructured data flows to ingest, transform, and enrich unstructured data.
Data Observability
Observe your data. To learn how to observe the health of DataStage jobs, follow the Observe data tutorial.
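
The tool list above amounts to a lookup from the kind of work you need to do to the tool that handles it. A minimal Python sketch of that lookup follows. The task labels and the `choose_tool` helper are hypothetical and exist only for illustration; they are not product identifiers or part of any IBM API.

```python
# Illustrative lookup only; the keys are hypothetical labels, not product identifiers.
TOOL_BY_TASK = {
    "batch": "DataStage",
    "streaming": "StreamSets",
    "replication": "Data Replication",
    "unstructured": "Unstructured Data Integration",
    "observability": "Data Observability",
}


def choose_tool(task: str) -> str:
    """Return the tool name for a task label; raise ValueError if the label is unknown."""
    key = task.strip().lower()
    if key not in TOOL_BY_TASK:
        known = ", ".join(sorted(TOOL_BY_TASK))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}")
    return TOOL_BY_TASK[key]


if __name__ == "__main__":
    print(choose_tool("streaming"))
```

Failing loudly on an unknown label, rather than returning a default, keeps a typo from silently selecting the wrong tool.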