Feature differences for watsonx.data integration

IBM watsonx.data integration as a Service includes tools that are also offered as separate service offerings. When included in watsonx.data integration, these tools provide multiple data integration styles. When used separately, each service offering provides a single data integration style.

The watsonx.data integration tools are offered as the following separate service offerings:

IBM StreamSets as a Service
IBM Data Replication on Cloud Pak for Data as a Service
IBM Data Observability by Databand
IBM DataStage on Cloud Pak for Data as a Service

The tools differ from the service offerings in the following ways:

StreamSets differences
Data Replication differences
Data Observability differences
DataStage differences

StreamSets differences

The StreamSets tool included in IBM watsonx.data integration and the IBM StreamSets as a Service product offering provide common functionality. However, they differ in the following key ways:

Differences between the StreamSets product offerings
Feature	Watsonx.data integration capability	IBM StreamSets as a Service
Control plane	IBM watsonx platform	Control Hub
Engine management	An environment defines the engine configuration. You create an environment in your project, then run the engine in your corporate network.	An environment defines where the engine runs. A deployment defines the engine configuration. You create an environment and a deployment for your organization, then deploy the engine to your corporate network.
Automatically provision engines to AWS, Azure, or GCP	Not available	Available
Automatically provision engines to a Kubernetes cluster	Not currently available	Available
Engine versions	Data Collector 6.3.x and later	Data Collector 4.x and later
Engine communication	Engines use direct engine REST APIs and HTTPS. WebSocket tunneling is not currently available.	Engines can use WebSocket tunneling or direct engine REST APIs and HTTPS.
Pipeline development	Pipelines are referred to as flows. Drag stages in the IBM watsonx canvas to build a flow.	Drag stages in the Control Hub canvas to build a pipeline.
Sources and targets	Supported sources and targets	Referred to as origins and destinations. Supported origins and destinations.
Low code and no code transformations	Supported processors and executors	Supported processors and executors
Connections	Supported connections	Supported connections
Preview	Provides basic preview functionality that allows configuring preview properties and viewing how data changes through the flow.	Includes additional features, such as editing preview data and previewing data for a group of stages.
Pipeline versioning	Not available	Available
Parameters	Not currently available	Available
Fragments	Not currently available	Available
Sample pipelines	Not currently available	Available
Draft runs of pipelines	Instead of using draft runs, you can directly run a job from the canvas.	Start a draft run of a pipeline from the canvas for development purposes. Draft runs have limited job functions.
Jobs	A single job run processes a single flow.	A single job run can process multiple instances of the same pipeline.
Job offsets	The first time a job runs, processing starts with the initial offset of the source stage. Subsequent runs start from the last-saved offset, by default. You can view a job offset and reset the offset.	The first time a job runs, processing starts with the initial offset of the source stage. Subsequent runs start from the last-saved offset, by default. You can view a job offset, reset the offset, and upload an initial-offset file.
Run a list of jobs in order	Not currently available	Create a sequence to run a list of jobs in order based on conditions.
Scheduled jobs	Not currently available	Available
Job failover	Not currently available	Available
Snapshots	Not currently available	During a job run for a pipeline, capture and review a set of the data that is being processed.
Operations dashboard	Not currently available	Displays a summary of triggered alerts, jobs with errors, offline engines, and engines that exceed resource thresholds.
Alerts	Not currently available	Available
Subscriptions	Not currently available	Available
Reports	Not currently available	Available
Topologies	Not available	Available
Export and import	Basic export and import for flows.	Export and import pipelines, fragments, jobs, or topologies to migrate the objects from one organization to another.
Users and groups	Managed in the IBM Cloud account	Managed in Control Hub
API credentials	Managed in the IBM Cloud account	Managed in Control Hub
Python SDK	Use the IBM watsonx.data integration SDK for Python.	Use the StreamSets Platform SDK for Python.
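
The job offset behavior described in the table above (start at the initial offset on the first run, resume from the last-saved offset on later runs, and start over after a reset) can be illustrated with a small Python model. This is only a sketch under stated assumptions: the `JobOffsetSketch` class, its fields, and the integer offsets are hypothetical and are not part of the IBM watsonx.data integration SDK for Python or the StreamSets Platform SDK for Python. Real offsets depend on the source stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobOffsetSketch:
    """Hypothetical model of job offset handling; not an SDK class."""

    initial_offset: int = 0             # assumed: offset is a simple record index
    saved_offset: Optional[int] = None  # None means the job has not saved an offset

    def start_position(self) -> int:
        # First run (or first run after a reset): use the initial offset.
        if self.saved_offset is None:
            return self.initial_offset
        # Subsequent runs: resume from the last-saved offset.
        return self.saved_offset

    def run(self, records: List[str], batch: int) -> List[str]:
        # Process up to `batch` records, then save the new offset.
        start = self.start_position()
        processed = records[start:start + batch]
        self.saved_offset = start + len(processed)
        return processed

    def reset(self) -> None:
        # Resetting discards the saved offset, so the next run starts over.
        self.saved_offset = None


records = ["r0", "r1", "r2", "r3", "r4"]
job = JobOffsetSketch()

first = job.run(records, batch=2)    # starts at the initial offset
second = job.run(records, batch=2)   # resumes from the last-saved offset
job.reset()
third = job.run(records, batch=2)    # starts at the initial offset again
```

In this model, the second run continues where the first run stopped, and the run after `reset()` reprocesses the same records as the first run. Uploading an initial-offset file, which the table lists only for IBM StreamSets as a Service, would correspond to setting `initial_offset` before the first run.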

Data Replication differences

The Data Replication tool included in IBM watsonx.data integration and the IBM Data Replication on Cloud Pak for Data as a Service product offering provide common functionality. However, they differ in the following key ways:

Differences between the Data Replication product offerings
Feature	Watsonx.data integration capability	Data Replication on Cloud Pak for Data as a Service
Sources	Amazon RDS for PostgreSQL; IBM Db2 LUW; IBM Db2 on Cloud; PostgreSQL	Amazon RDS for PostgreSQL; IBM Db2 LUW; IBM Db2 on Cloud; PostgreSQL; Oracle
Active replications	Trial plan: 2 replications running at a time. Paid plan: no restrictions.	1 replication running at a time.
Availability	Generally available	Beta only
Region	Toronto	Deployment: Dallas. Service scope: Global.

Data Observability differences

The Data Observability tool included in IBM watsonx.data integration and the IBM Data Observability by Databand product offering provide common functionality. However, they differ in the following key ways:

Differences between the Data Observability product offerings
Feature	Watsonx.data integration capability	IBM Data Observability by Databand
Integrations with other products and technology stacks	For the IBM Data Engineering stack, integration with Next-Gen DataStage is available.	For the Modern Data Stack, the following integrations with IBM Data Observability by Databand are available: Apache Airflow; dbt; Azure Data Factory; Snowflake; Google BigQuery; StreamSets; Next-Gen DataStage; Control-M; Apache Spark; Custom API (you can create a custom API integration to connect with the orchestration or data integration tool of your choice).
Pipeline alerting capabilities	You can set up the following alerts for DataStage jobs: Job run state alert; Job run duration alert.	You can set up the following alerts: Pipeline state; Pipeline SLA; Pipeline duration; dbt test; Schema change.
Task alerting capabilities	You can set up the DataStage system metric alert.	You can set up the following alerts: Task state; Task duration; Custom task metric.
Issue detection engine	Without requiring any setup, the AI agent automatically detects problems in your DataStage jobs and provides diagnostics by reading logs and analyzing the job flow.	Not available.
Issue remediation	Use the issue remediation functionality to get all the necessary details and AI recommendations on how to troubleshoot the detected issues with your data.	Not available.
Dataset alerting capabilities	Not available.	You can set up the following alerts: Tables data quality; Data delay; Data quality metric; Missing operations; Operations data quality.
Alert receivers	You can get notified with the following alert receivers: Slack; Email; PagerDuty; Microsoft Teams.	You can get notified with the following alert receivers: Slack channel; Slack webhook; Email; PagerDuty; Microsoft Teams; Custom webhook.
Anomaly detection	Anomaly detection for pipeline duration alerts and custom task metric alerts is not currently available.	Anomaly detection is available for pipeline duration alerts and custom task metric alerts.
Dashboard	You can check the number of triggered alerts (and whether that number grew or declined), open alerts, and jobs with alerts. The following widgets are not currently available: Top errors widget; Pipeline stats; Pipeline metrics widget; Triggered alerts over time.	You can display: Overall view of pipeline run states; Top errors widget; Pipeline stats; Pipeline metrics widget; Last active runs.

DataStage differences

The DataStage tool included in IBM watsonx.data integration and the IBM DataStage on Cloud Pak for Data as a Service product offering provide common functionality. However, they differ in the following key ways:

Differences between the DataStage tool included in watsonx.data integration and the IBM DataStage on Cloud Pak for Data as a Service product offering
Feature	Watsonx.data integration capability	DataStage on Cloud Pak for Data as a Service
AI flow assistant	Not available in all regions. See Regional availability for more information.	Available

Learn more

Watsonx.data integration services