Feature differences for watsonx.data integration
IBM watsonx.data integration as a Service includes tools that are also offered as separate service offerings. When included in watsonx.data integration, these tools provide multiple data integration styles. When used separately, each service offering provides a single data integration style.
The watsonx.data integration tools are offered as the following separate service offerings:
- IBM StreamSets as a Service
- IBM Data Replication on Cloud Pak for Data as a Service
- IBM Data Observability by Databand
- IBM DataStage on Cloud Pak for Data as a Service
The tools differ from the service offerings in the following ways:
- StreamSets differences
- Data Replication differences
- Data Observability differences
- DataStage differences
StreamSets differences
The StreamSets tool included in IBM watsonx.data integration and the IBM StreamSets as a Service product offering provide common functionality. However, they differ in the following key ways:
| Feature | Watsonx.data integration capability | IBM StreamSets as a Service |
|---|---|---|
| Control plane | IBM watsonx platform | Control Hub |
| Engine management | An environment defines the engine configuration. You create an environment in your project, then run the engine in your corporate network. | An environment defines where the engine runs. A deployment defines the engine configuration. You create an environment and a deployment for your organization, then deploy the engine to your corporate network. |
| Automatically provision engines to AWS, Azure, or GCP | Not available | Available |
| Automatically provision engines to a Kubernetes cluster | Not currently available | Available |
| Engine versions | Data Collector 6.3.x and later | Data Collector 4.x and later |
| Engine communication | Engines use direct engine REST APIs and HTTPS. WebSocket tunneling is not currently available. | Engines can use WebSocket tunneling or direct engine REST APIs and HTTPS. |
| Pipeline development | Pipelines are referred to as flows. Drag stages in the IBM watsonx canvas to build a flow. | Drag stages in the Control Hub canvas to build a pipeline. |
| Sources and targets | Supported sources and targets | Referred to as origins and destinations. Supported origins and destinations. |
| Low-code and no-code transformations | Supported processors and executors | Supported processors and executors |
| Connections | Supported connections | Supported connections |
| Preview | Provides basic preview functionality that allows configuring preview properties and viewing how data changes through the flow. | Includes additional features, such as editing preview data and previewing data for a group of stages. |
| Pipeline versioning | Not available | Available |
| Parameters | Not currently available | Available |
| Fragments | Not currently available | Available |
| Sample pipelines | Not currently available | Available |
| Draft runs of pipelines | Instead of using draft runs, you can directly run a job from the canvas. | Start a draft run of a pipeline from the canvas for development purposes. Draft runs have limited job functions. |
| Jobs | A single job run processes a single flow. | A single job run can process multiple instances of the same pipeline. |
| Job offsets | The first time a job runs, processing starts with the initial offset of the source stage. Subsequent runs start from the last-saved offset, by default. You can view a job offset and reset the offset. | The first time a job runs, processing starts with the initial offset of the source stage. Subsequent runs start from the last-saved offset, by default. You can view a job offset, reset the offset, and upload an initial-offset file. |
| Run a list of jobs in order | Not currently available | Create a sequence to run a list of jobs in order based on conditions. |
| Scheduled jobs | Not currently available | Available |
| Job failover | Not currently available | Available |
| Snapshots | Not currently available | During a job run for a pipeline, capture and review a set of the data that is being processed. |
| Operations dashboard | Not currently available | Displays a summary of triggered alerts, jobs with errors, offline engines, and engines that exceed resource thresholds. |
| Alerts | Not currently available | Available |
| Subscriptions | Not currently available | Available |
| Reports | Not currently available | Available |
| Topologies | Not available | Available |
| Export and import | Basic export and import for flows. | Export and import pipelines, fragments, jobs, or topologies to migrate the objects from one organization to another. |
| Users and groups | Managed in the IBM Cloud account | Managed in Control Hub |
| API credentials | Managed in the IBM Cloud account | Managed in Control Hub |
| Python SDK | Use the IBM watsonx.data integration SDK for Python. | Use the StreamSets Platform SDK for Python. |
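
The job offset behavior described in the Job offsets row can be illustrated with a small model. This is only a sketch of the stated rules (the first run starts at the initial offset, later runs resume from the last-saved offset, and a reset returns to the initial offset). The `JobOffsetTracker` class and its methods are hypothetical names invented for illustration. They are not part of the IBM watsonx.data integration SDK for Python or the StreamSets Platform SDK for Python, and the sketch assumes a simple list-backed source with integer offsets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobOffsetTracker:
    """Hypothetical model of the offset rules; not a product API."""

    initial_offset: int = 0              # where the very first run starts
    saved_offset: Optional[int] = None   # None until a run has saved an offset

    def start_offset(self) -> int:
        # First run (or first run after a reset): use the initial offset.
        # Subsequent runs: resume from the last-saved offset.
        if self.saved_offset is None:
            return self.initial_offset
        return self.saved_offset

    def run(self, source: List[str]) -> List[str]:
        # Process everything from the start offset to the end of the source,
        # then save the new offset for the next run.
        processed = source[self.start_offset():]
        self.saved_offset = len(source)
        return processed

    def reset(self) -> None:
        # Resetting discards the saved offset, so the next run starts over.
        self.saved_offset = None


tracker = JobOffsetTracker()
source = ["r1", "r2", "r3"]

first_run = tracker.run(source)     # starts at the initial offset
source.extend(["r4", "r5"])
second_run = tracker.run(source)    # resumes from the last-saved offset

tracker.reset()
third_run = tracker.run(source)     # starts at the initial offset again
```

In this model, the second run picks up only the records added after the first run, and the run after a reset reprocesses the whole source. Uploading an initial-offset file, which the table lists only for IBM StreamSets as a Service, is not modeled here.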
Data Replication differences
The Data Replication tool included in IBM watsonx.data integration and the IBM Data Replication on Cloud Pak for Data as a Service product offering provide common functionality. However, they differ in the following key ways:
| Feature | Watsonx.data integration capability | Data Replication on Cloud Pak for Data as a Service |
|---|---|---|
| Sources | Amazon RDS for PostgreSQL, IBM Db2 LUW, IBM Db2 on Cloud, PostgreSQL | Amazon RDS for PostgreSQL, IBM Db2 LUW, IBM® Db2 on Cloud, PostgreSQL, Oracle |
| Active replications | Trial plan: 2 replications running at a time. Paid plan: No restrictions. | 1 replication running at a time. |
| Availability | Generally available | Beta only |
| Region | Toronto | Deployment: Dallas. Service scope: Global. |
Data Observability differences
The Data Observability tool included in IBM watsonx.data integration and the IBM Data Observability by Databand product offering provide common functionality. However, they differ in the following key ways:
| Feature | Watsonx.data integration capability | IBM Data Observability by Databand |
|---|---|---|
| Integrations with other products and technology stacks | For the IBM Data Engineering stack, integration with Next-Gen DataStage is available. | For the Modern Data Stack, the following integrations with IBM Data Observability by Databand are available: Apache Airflow, dbt, Azure Data Factory, Snowflake, Google BigQuery, StreamSets, Next-Gen DataStage, Control M, Apache Spark, and Custom API (you can create a custom API integration to connect with the orchestration or data integration tool of your choice). |
| Pipeline alerting capabilities | You can set up the following alerts for DataStage jobs: Job run state alert, Job run duration alert. | You can set up the following alerts: Pipeline state, Pipeline SLA, Pipeline duration, dbt test, Schema change. |
| Task alerting capabilities | You can set up the DataStage system metric alert. | You can set up the following alerts: Task state, Task duration, Custom task metric. |
| Issue detection engine | Without requiring any setup, the AI agent automatically detects problems in your DataStage jobs and provides diagnostics by reading logs and analyzing the job flow. | Not available. |
| Issue remediation | Use the issue remediation functionality to get all the necessary details and AI recommendations on how to troubleshoot the detected issues with your data. | Not available. |
| Dataset alerting capabilities | Dataset alerts are unavailable. | You can set up the following alerts: Tables data quality, Data delay, Data quality metric, Missing operations, Operations data quality. |
| Alert receivers | You can get notified with the following alert receivers: Slack, PagerDuty, Microsoft Teams. | You can get notified with the following alert receivers: Slack channel, Slack webhook, PagerDuty, Microsoft Teams, Custom webhook. |
| Anomaly detection | Anomaly detection for pipeline duration alerts and custom task metric alerts is not currently available. | Anomaly detection is available for pipeline duration alerts and custom task metric alerts. |
| Dashboard | You can check the number of triggered alerts (and whether that number increased or decreased), open alerts, and jobs with alerts. The following widgets are not currently available: Top errors widget, Pipeline stats, Pipeline metrics widget, Triggered alerts over time. | You can display: Overall view of pipeline run states, Top errors widget, Pipeline stats, Pipeline metrics widget, Last active runs. |
DataStage differences
The DataStage tool included in IBM watsonx.data integration and the IBM DataStage on Cloud Pak for Data as a Service product offering provide common functionality. However, they differ in the following key ways:
| Feature | Watsonx.data integration capability | DataStage on Cloud Pak for Data as a Service |
|---|---|---|
| AI flow assistant | Not available in all regions. See Regional availability for more information. | Available |