Feature differences for watsonx.data integration

IBM watsonx.data integration as a Service includes tools that are also available as separate service offerings. When included in watsonx.data integration, these tools together provide multiple data integration styles. When used separately, each service offering provides a single data integration style.

The watsonx.data integration tools are available as the following separate service offerings:

  • IBM StreamSets as a Service
  • IBM Data Replication on Cloud Pak for Data as a Service
  • IBM Data Observability by Databand
  • IBM DataStage on Cloud Pak for Data as a Service

The tools differ from the service offerings in the following ways:

StreamSets differences

The StreamSets tool included in IBM watsonx.data integration and the IBM StreamSets as a Service product offering share common functionality. However, they differ in the following key ways:

Differences between the StreamSets product offerings

| Feature | watsonx.data integration capability | IBM StreamSets as a Service |
| --- | --- | --- |
| Control plane | IBM watsonx platform | Control Hub |
| Engine management | An environment defines the engine configuration. You create an environment in your project, then run the engine in your corporate network. | An environment defines where the engine runs, and a deployment defines the engine configuration. You create an environment and a deployment for your organization, then deploy the engine to your corporate network. |
| Automatically provision engines to AWS, Azure, or GCP | Not available | Available |
| Automatically provision engines to a Kubernetes cluster | Not currently available | Available |
| Engine versions | Data Collector 6.3.x and later | Data Collector 4.x and later |
| Engine communication | Engines use direct engine REST APIs and HTTPS. WebSocket tunneling is not currently available. | Engines can use WebSocket tunneling or direct engine REST APIs and HTTPS. |
| Pipeline development | Pipelines are referred to as flows. Drag stages in the IBM watsonx canvas to build a flow. | Drag stages in the Control Hub canvas to build a pipeline. |
| Sources and targets | Supported sources and targets | Referred to as origins and destinations. Supported origins and destinations. |
| Low-code and no-code transformations | Supported processors and executors | Supported processors and executors |
| Connections | Supported connections | Supported connections |
| Preview | Provides basic preview functionality: you can configure preview properties and view how data changes through the flow. | Includes additional features, such as editing preview data and previewing data for a group of stages. |
| Pipeline versioning | Not available | Available |
| Parameters | Not currently available | Available |
| Fragments | Not currently available | Available |
| Sample pipelines | Not currently available | Available |
| Draft runs of pipelines | Instead of using draft runs, you can directly run a job from the canvas. | Start a draft run of a pipeline from the canvas for development purposes. Draft runs have limited job functions. |
| Jobs | A single job run processes a single flow. | A single job run can process multiple instances of the same pipeline. |
| Job offsets | The first time a job runs, processing starts with the initial offset of the source stage. Subsequent runs start from the last-saved offset by default. You can view a job offset and reset the offset. | The first time a job runs, processing starts with the initial offset of the source stage. Subsequent runs start from the last-saved offset by default. You can view a job offset, reset the offset, and upload an initial-offset file. |
| Run a list of jobs in order | Not currently available | Create a sequence to run a list of jobs in order based on conditions. |
| Scheduled jobs | Not currently available | Available |
| Job failover | Not currently available | Available |
| Snapshots | Not currently available | During a job run for a pipeline, capture and review a set of the data that is being processed. |
| Operations dashboard | Not currently available | Displays a summary of triggered alerts, jobs with errors, offline engines, and engines that exceed resource thresholds. |
| Alerts | Not currently available | Available |
| Subscriptions | Not currently available | Available |
| Reports | Not currently available | Available |
| Topologies | Not available | Available |
| Export and import | Basic export and import for flows | Export and import pipelines, fragments, jobs, or topologies to migrate the objects from one organization to another. |
| Users and groups | Managed in the IBM Cloud account | Managed in Control Hub |
| API credentials | Managed in the IBM Cloud account | Managed in Control Hub |
| Python SDK | Use the IBM watsonx.data integration SDK for Python. | Use the StreamSets Platform SDK for Python. |
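The job-offset behavior in the table is the same resume pattern in both offerings: the first run starts from the source stage's initial offset, and later runs resume from the last-saved offset. The following is a minimal, purely illustrative sketch of that pattern in Python; the file-based offset store is an assumption for illustration, not how the engines actually persist offsets:

```python
import json
from pathlib import Path

def run_job(records, offset_file):
    """Process records, resuming from the last-saved offset if one exists."""
    path = Path(offset_file)
    # First run: no saved offset yet, so start from the initial offset (0).
    offset = json.loads(path.read_text())["offset"] if path.exists() else 0
    processed = []
    for i in range(offset, len(records)):
        processed.append(records[i])
        # Persist the offset after each record so the next run resumes here.
        path.write_text(json.dumps({"offset": i + 1}))
    return processed

def reset_offset(offset_file):
    """Discard the saved offset so the next run starts from the beginning."""
    Path(offset_file).unlink(missing_ok=True)
```

Resetting the offset, which both offerings support, corresponds to discarding the saved position: the next run then processes the data from the start again.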

Data Replication differences

The Data Replication tool included in IBM watsonx.data integration and the IBM Data Replication on Cloud Pak for Data as a Service product offering share common functionality. However, they differ in the following key ways:

Differences between the Data Replication product offerings

| Feature | watsonx.data integration capability | Data Replication on Cloud Pak for Data as a Service |
| --- | --- | --- |
| Sources | Amazon RDS for PostgreSQL, IBM Db2 LUW, IBM Db2 on Cloud, PostgreSQL | Amazon RDS for PostgreSQL, IBM Db2 LUW, IBM Db2 on Cloud, PostgreSQL, Oracle |
| Active replications | Trial plan: 2 replications running at a time. Paid plan: no restrictions. | 1 replication running at a time |
| Availability | Generally available | Beta only |
| Region | Toronto | Deployment: Dallas. Service scope: Global. |
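Whichever offering you use, replication boils down to capturing row-level changes from a source such as PostgreSQL or Db2 and applying them to a target. The sketch below is purely illustrative, a toy in-memory model rather than the log-based capture the product actually performs:

```python
def apply_changes(target, changes):
    """Apply captured changes to a target table modeled as a dict keyed by
    primary key. Inserts and updates upsert the latest row image; deletes
    remove the row -- a toy model of a replication apply step."""
    for op, key, row in changes:
        if op == "DELETE":
            target.pop(key, None)
        else:  # INSERT or UPDATE
            target[key] = row
    return target
```

Note that applying the same change batch twice leaves the target unchanged; an apply step written this way tolerates restarts from a saved position.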

Data Observability differences

The Data Observability tool included in IBM watsonx.data integration and the IBM Data Observability by Databand product offering share common functionality. However, they differ in the following key ways:

Differences between the Data Observability product offerings

| Feature | watsonx.data integration capability | IBM Data Observability by Databand |
| --- | --- | --- |
| Integrations with other products and technology stacks | For the IBM data engineering stack, integration with Next-Gen DataStage is available. | For the modern data stack, the following integrations are available: Apache Airflow, dbt, Azure Data Factory, Snowflake, Google BigQuery, StreamSets, Next-Gen DataStage, Control-M, Apache Spark, and a custom API (you can create a custom API integration to connect with the orchestration or data integration tool of your choice). |
| Pipeline alerting capabilities | You can set up the following alerts for DataStage jobs: job run state and job run duration. | You can set up the following alerts: pipeline state, pipeline SLA, pipeline duration, dbt test, and schema change. |
| Task alerting capabilities | You can set up the DataStage system metric alert. | You can set up the following alerts: task state, task duration, and custom task metric. |
| Issue detection engine | Without requiring any setup, the AI agent automatically detects problems in your DataStage jobs and provides diagnostics by reading logs and analyzing the job flow. | Not available |
| Issue remediation | Use the issue remediation functionality to get the necessary details and AI recommendations on how to troubleshoot the detected issues with your data. | Not available |
| Dataset alerting capabilities | Not available | You can set up the following alerts: tables data quality, data delay, data quality metric, missing operations, and operations data quality. |
| Alert receivers | Slack, email, PagerDuty, and Microsoft Teams | Slack channel, Slack webhook, email, PagerDuty, Microsoft Teams, and custom webhook |
| Anomaly detection | Not currently available for pipeline duration alerts and custom task metric alerts | Available for pipeline duration alerts and custom task metric alerts |
| Dashboard | Shows the number of triggered alerts (and whether that number grew or declined), open alerts, and jobs with alerts. The top errors, pipeline stats, pipeline metrics, and triggered alerts over time widgets are not currently available. | Displays an overall view of pipeline run states, the top errors widget, pipeline stats, the pipeline metrics widget, and last active runs. |
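Anomaly detection for duration alerts, available in Databand but not yet in the watsonx.data integration capability, is conceptually a statistical check over historical run durations. The sketch below is purely illustrative, using a simple mean-plus-k-standard-deviations rule as an assumed stand-in; it is not Databand's actual detection model:

```python
from statistics import mean, stdev

def anomalous_runs(durations_sec, k=3.0):
    """Flag runs whose duration exceeds the historical mean by more than
    k standard deviations -- a toy stand-in for a duration anomaly detector."""
    if len(durations_sec) < 2:
        return []  # not enough history to estimate variability
    mu, sigma = mean(durations_sec), stdev(durations_sec)
    return [i for i, d in enumerate(durations_sec) if d > mu + k * sigma]
```

A rule like this adapts the alert threshold to each pipeline's own history instead of relying on a fixed duration that you would otherwise have to tune per pipeline.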

DataStage differences

The DataStage tool included in IBM watsonx.data integration and the IBM DataStage on Cloud Pak for Data as a Service product offering share common functionality. However, they differ in the following key ways:

Differences between the DataStage product offerings

| Feature | watsonx.data integration capability | DataStage on Cloud Pak for Data as a Service |
| --- | --- | --- |
| AI flow assistant | Not available in all regions. See Regional availability for more information. | Available |

Learn more