Engine versions

When you create a StreamSets environment, you select the Data Collector engine version to use. Use the latest engine version so that you have the latest updates and features.

7.4.x engine versions

Watsonx.data integration includes the following Data Collector engine version:
  • 7.4.0 - released on April 30, 2026

7.4.x new features and enhancements

Connections
You can now use connections in StreamSets flows.
For more information and a list of available connections, see Connections.
Kafka support for IAM access control for Amazon MSK
To access Kafka on Amazon MSK from Kafka stages, you can configure a custom authentication option to use IAM access control.
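As a rough illustration of what IAM access control involves on the Kafka client side, the sketch below collects the client properties documented by the open-source aws-msk-iam-auth library. How these map onto the custom authentication properties of the Kafka stages is not described here, so treat the mapping as an assumption. The names MSK_IAM_CLIENT_PROPERTIES and to_properties_text are hypothetical helpers for this example.

```python
# Kafka client properties documented by the aws-msk-iam-auth library.
# Assumption: the custom authentication option of the Kafka stages accepts
# equivalent key/value pairs; the release notes do not confirm this.
MSK_IAM_CLIENT_PROPERTIES = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "AWS_MSK_IAM",
    "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
    "sasl.client.callback.handler.class":
        "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
}


def to_properties_text(props):
    """Render a dict as key=value lines, one property per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(to_properties_text(MSK_IAM_CLIENT_PROPERTIES))
```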
Data default enhancements for several destinations
The Databricks, Google BigQuery, Snowflake, and Teradata targets now provide a single Data defaults for missing or invalid fields property on the Data Defaults tab.
Use this property to specify default values when a field is missing or contains invalid data. If you do not define a default value, the target writes missing or invalid fields as null. You can specify a different default value for each data type.
Previously, these targets provided a separate default property for each data type. This change has upgrade impact when you use a Snowflake target with the JSON or Parquet staging file format.
Additional Snowflake target enhancements
  • You can enable a new Allow column number mismatch property to avoid parsing errors that can occur when you enable both data drift processing and multithreaded processing.
  • You can now use the target to process list data in Variant fields.

7.4.x upgrade impact

Update Snowflake targets that use the JSON or Parquet staging file format
With this release, when you use the JSON or Parquet staging file format and do not define values for the Data defaults for missing or invalid fields property, the Snowflake target writes missing or invalid fields as null.
Previously, when you used the JSON or Parquet staging file format and did not define default values, the target wrote missing or invalid fields as \N. To keep the same behavior, update the Data defaults for missing or invalid fields property to specify \N as the default value for each data type.
Review flows that process PostgreSQL interval data
With this release, the format for PostgreSQL Interval fields has changed.
Earlier Data Collector releases used an older PostgreSQL driver that included all time units in interval data, even when those units were set to 0. This release includes a PostgreSQL driver update, and the updated driver no longer includes time units that are set to 0 in interval data.
The following table illustrates the difference in processed interval data:
Interval                | Interval data in 7.3.x and earlier                  | Interval data in 7.4.0 and later
2 years                 | 2 years 0 mons 0 days 0 hours 0 mins 0.0 secs       | 2 years
3 months and 15 days    | 0 years 3 mons 15 days 0 hours 0 mins 0.0 secs      | 3 mons 15 days
2 hours and 30 seconds  | 0 years 0 mons 0 days 2 hours 0 mins 30.0 secs      | 2 hours 30 secs
After you upgrade to Data Collector 7.4.0, review flows that process PostgreSQL interval data and update downstream processing as needed.
The following sources can be used to process PostgreSQL data:
  • Aurora PostgreSQL CDC Client
  • JDBC Multitable Consumer
  • JDBC Query Consumer
  • PostgreSQL CDC Client
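
If downstream processing parses interval strings, one way to tolerate both formats is to normalize every value to the same set of units before comparing or storing it. The following is a minimal sketch, assuming interval values arrive as strings shaped like the examples in the table above. The function name parse_interval and the handling of singular unit names such as "year" are assumptions for this example, not part of Data Collector.

```python
import re

# Maps unit spellings to canonical keys. The plural forms appear in the
# table above; the singular forms are an assumption for values such as "1 year".
_UNIT_KEYS = {
    "year": "years", "years": "years",
    "mon": "mons", "mons": "mons",
    "day": "days", "days": "days",
    "hour": "hours", "hours": "hours",
    "min": "mins", "mins": "mins",
    "sec": "secs", "secs": "secs",
}

# One "<number> <unit>" pair, for example "15 days" or "30.0 secs".
_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+([A-Za-z]+)")


def parse_interval(text):
    """Return all six units as a dict, filling omitted units with 0."""
    result = {"years": 0, "mons": 0, "days": 0,
              "hours": 0, "mins": 0, "secs": 0.0}
    for value, unit in _PAIR.findall(text):
        key = _UNIT_KEYS.get(unit.lower())
        if key is None:
            raise ValueError(f"unrecognized interval unit: {unit!r}")
        # Seconds may carry a fraction; the other units are whole numbers.
        result[key] = float(value) if key == "secs" else int(value)
    return result


if __name__ == "__main__":
    old_style = "0 years 3 mons 15 days 0 hours 0 mins 0.0 secs"
    new_style = "3 mons 15 days"
    print(parse_interval(old_style) == parse_interval(new_style))
```

Because both formats normalize to the same dictionary, logic that compares or aggregates intervals does not need to know which driver version produced the value.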

7.4.0 fixed issues

  • When not using continuous mining, the Oracle CDC Client source performs slowly when configured to use the Direct Fetch strategy.
  • Snowflake parsing errors can occur in a multithreaded Snowflake target with data drift enabled.
  • When the Snowflake target runs out of space for the temporary files required for processing, it treats all subsequent data as error records. With this fix, the target generates an error and stops the flow instead.
  • The JDBC Producer target generates error records when processing data that includes timezone fields.
  • The Named Pipe target writes the last record of a batch and the first record of the subsequent batch to the same line.
  • The Azure Data Lake Gen 2 source does not update the offset correctly when used with a path pattern that includes a combination of regular expressions between the file name and wildcards.
  • Flows that include the Snowflake target are not upgraded properly between Data Collector versions 5.11.0 and 7.3.0. With this fix, you can upgrade Snowflake target flows from earlier versions to Data Collector 7.4.0 without errors.

7.4.x known issue

  • The Oracle CDC source can generate a null pointer exception when the Ignore Materialized Views property is disabled.

7.2.x engine versions

Watsonx.data integration includes the following Data Collector engine version:
  • 7.2.0 - released on February 27, 2026

7.2.x new features and enhancements

IBM Cloud Storage source enhancement
You can now use the IBM Cloud Storage source to process Parquet data.
Kafka enhancements
  • Kafka 4.x support - You can now use Kafka stages to process data in Kafka 4.x, in addition to Kafka 3.x. This enhancement has upgrade impact.
  • Support for SASL/OAUTHBEARER authentication - You can now use SASL/OAUTHBEARER authentication with Kafka stages by choosing the Custom authentication security option and defining related properties.
Web Client response length enhancement
You can now define a maximum entity length in characters and bytes for Web Client stages that log responses.
Improved signing key validation
Signing key sizes for the following stages are now validated according to RFC 7518 specifications:
  • Salesforce stages
  • HTTP stages
  • Web Client stages
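
To check a key before a stage rejects it, the HMAC rule from RFC 7518 section 3.2 can be applied directly: the key must be at least as long as the hash output of the algorithm. The sketch below covers only the HS256, HS384, and HS512 algorithms. Which algorithms the stages validate and how they report a failure is not stated here, and the function name hmac_key_is_long_enough is hypothetical.

```python
# Minimum HMAC key sizes from RFC 7518, section 3.2: the key must be at
# least as long as the hash output of the chosen algorithm.
MIN_HMAC_KEY_BITS = {"HS256": 256, "HS384": 384, "HS512": 512}


def hmac_key_is_long_enough(algorithm, key):
    """Return True if the raw key bytes meet the RFC 7518 minimum size."""
    if algorithm not in MIN_HMAC_KEY_BITS:
        raise ValueError(f"unsupported algorithm in this sketch: {algorithm}")
    return len(key) * 8 >= MIN_HMAC_KEY_BITS[algorithm]


if __name__ == "__main__":
    # A 32-byte key is exactly 256 bits, which is enough for HS256 only.
    key = b"k" * 32
    for name in MIN_HMAC_KEY_BITS:
        print(name, hmac_key_is_long_enough(name, key))
```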

7.2.x upgrade impact

Update Web Client stages that use OAuth 2 with signed access tokens
With this release, Web Client stages that use OAuth 2 with signed JWT access tokens now require Base64-encoded signing keys. In earlier releases, signing keys could be provided in plain text and did not require Base64 encoding.
After you upgrade to 7.2.x, ensure that all Web Client stages that use the OAuth 2 authentication method with a signed access token are updated to use a Base64-encoded signing key in the Signing Key property.
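A plain-text key can be converted before you paste it into the Signing Key property. This is a minimal sketch that assumes the property expects standard Base64 (not the URL-safe variant) over the UTF-8 bytes of the key, which the text above does not specify. The helper name encode_signing_key and the sample key are placeholders.

```python
import base64


def encode_signing_key(plain_key):
    """Base64-encode the UTF-8 bytes of a plain-text signing key.

    Assumption: standard Base64 alphabet with padding.
    """
    return base64.b64encode(plain_key.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder value; substitute the signing key that the stage used before the upgrade.
    print(encode_signing_key("example-plain-text-key"))
```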
Create Kafka topics for Kafka Producer flows
With this release, the topics specified in the Topic property of a Kafka Producer target must exist before the flow starts. If those topics do not exist, the flow fails to start. Before this release, the target created the topics when necessary.
After you upgrade to 7.2.x, ensure that every topic specified in the Topic property of the Kafka Producer target exists in the Kafka cluster before you start the flow.
The exception is when you use an expression to specify the topics to write to. In that case, the Kafka Producer target creates topics as needed, so you do not need to create them in advance.
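For topics named directly in the Topic property, the pre-start check can be scripted. The sketch below only builds the command lines for the kafka-topics.sh tool that ships with Apache Kafka, one per missing topic. It does not run them. The function name, the topic names, the broker address, and the partition and replication defaults are all assumptions for this example, and the list of existing topics is expected to come from elsewhere, for example from kafka-topics.sh --list.

```python
def build_create_topic_commands(required, existing, bootstrap_server,
                                partitions=1, replication_factor=1):
    """Return one kafka-topics.sh argument list per required topic that is missing."""
    missing = sorted(set(required) - set(existing))
    return [
        [
            "kafka-topics.sh",
            "--bootstrap-server", bootstrap_server,
            "--create", "--if-not-exists",
            "--topic", topic,
            "--partitions", str(partitions),
            "--replication-factor", str(replication_factor),
        ]
        for topic in missing
    ]


if __name__ == "__main__":
    # Hypothetical topic names and broker address.
    commands = build_create_topic_commands(
        required=["orders", "audit"],
        existing=["audit"],
        bootstrap_server="broker.example.com:9092",
    )
    for command in commands:
        print(" ".join(command))
```

Choose partition and replication values that match your cluster instead of the defaults used here.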

7.2.x known issues

  • The Oracle CDC source can generate a null pointer exception when the Ignore Materialized Views property is disabled.
  • When configured to produce events and a schema change occurs, the SQL Server CDC Client source can generate duplicate schema-change events after a flow restart.