Engine versions
When you create a StreamSets environment, you select the Data Collector engine version to use. Use the latest engine version so that you have the latest updates and features.
7.4.x engine versions
Watsonx.data integration includes the following Data Collector engine version:
- 7.4.0 - released on April 30, 2026
7.4.x new features and enhancements
- Connections
- You can now use connections in StreamSets flows.
- Kafka support for IAM access control for Amazon MSK
- To access Kafka on Amazon MSK from Kafka stages, you can configure a custom authentication option to use IAM access control.
- Data default enhancements for several destinations
- The Databricks, Google BigQuery, Snowflake, and Teradata targets now provide a single property, Data defaults for missing or invalid fields, on the Data Defaults tab.
- Additional Snowflake target enhancements
- You can enable a new Allow column number mismatch property to avoid parsing errors that can occur when you enable both data drift processing and multithreaded processing.
- You can now use the target to process list data in Variant fields.
7.4.x upgrade impact
- Update Snowflake targets that use the JSON or Parquet staging file format
- With this release, when you use the JSON or Parquet staging file format and do not define values for the Data defaults for missing or invalid fields property, the Snowflake target writes missing or invalid fields as null.
- Review pipelines that process PostgreSQL interval data
- With this release, the format for PostgreSQL Interval fields has changed.
7.4.0 fixed issues
- When not using continuous mining, the Oracle CDC Client source performs slowly when configured to use the Direct Fetch strategy.
- Snowflake parsing errors can occur in a multithreaded Snowflake target with data drift enabled.
- When the Snowflake target runs out of space for the temporary files required for processing, it treats all subsequent data as error records. With this fix, the target generates an error and stops the flow instead.
- The JDBC Producer target generates error records when processing data that includes timezone fields.
- The Named Pipe target writes the last record of a batch and the first record of the subsequent batch to the same line.
- The Azure Data Lake Gen 2 source does not update the offset correctly when used with a path pattern that includes a combination of regular expressions between the file name and wildcards.
- Flows that include the Snowflake target are not upgraded properly between Data Collector versions 5.11.0 and 7.3.0. With this fix, you can upgrade Snowflake target flows from earlier versions to Data Collector 7.4.0 without errors.
7.4.x known issue
- The Oracle CDC source can generate a null pointer exception when the Ignore Materialized Views property is disabled.
7.2.x engine versions
Watsonx.data integration includes the following Data Collector engine version:
- 7.2.0 - released on February 27, 2026
7.2.x new features and enhancements
- IBM Cloud Storage source enhancement
- You can now use the IBM Cloud Storage source to process Parquet data.
- Kafka enhancements
- Kafka 4.x support - You can now use Kafka stages to process data in Kafka 4.x, in addition to Kafka 3.x. This enhancement has upgrade impact.
- Support for SASL/OAUTHBEARER authentication - You can now use SASL/OAUTHBEARER authentication with Kafka stages by choosing the Custom authentication security option and defining related properties.
- Web Client response length enhancement
- You can now define a maximum entity length in characters and bytes for Web Client stages that log responses.
- Improved signing key validation
- Signing key sizes for the following stages are now validated according to RFC 7518 specifications:
- Salesforce stages
- HTTP stages
- Web Client stages
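As an illustration of the kind of rule RFC 7518 defines, the following minimal Python sketch checks HMAC signing key lengths. Section 3.2 of the RFC requires a key at least as long as the hash output, for example 256 bits for HS256. The function name and the algorithm table are hypothetical; these notes do not describe which algorithms the stages check or how they perform the validation.

```python
# Minimum HMAC key sizes in bytes, following RFC 7518, Section 3.2:
# the key must be at least as long as the hash output.
# This table and function are illustrative, not Data Collector code.
MIN_HMAC_KEY_BYTES = {"HS256": 32, "HS384": 48, "HS512": 64}


def is_hmac_key_long_enough(algorithm: str, key: bytes) -> bool:
    """Return True if the key meets the RFC 7518 minimum for the algorithm."""
    if algorithm not in MIN_HMAC_KEY_BYTES:
        raise ValueError(f"unsupported HMAC algorithm: {algorithm}")
    return len(key) >= MIN_HMAC_KEY_BYTES[algorithm]


if __name__ == "__main__":
    # A 16-byte key is too short for HS256, which needs at least 32 bytes.
    print(is_hmac_key_long_enough("HS256", b"k" * 16))
    print(is_hmac_key_long_enough("HS256", b"k" * 32))
```

A check like this, run before you upgrade, may help identify signing keys that would fail the stricter validation.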
7.2.x upgrade impact
- Update Web Client stages that use OAuth 2 with signed access tokens
- With this release, Web Client stages that use OAuth 2 with signed JWT access tokens require Base64-encoded signing keys. In earlier releases, you could provide signing keys in plain text without Base64 encoding.
- Create Kafka topics for Kafka Producer flows
- With this release, the topics specified in the Topic property of a Kafka Producer target must exist before the flow starts. If those topics do not exist, the flow fails to start. Before this release, the target created the topics when necessary.
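For the Web Client signing key change described above, a plain-text key can be converted with any standard Base64 tool. The following Python sketch shows one way to do it. It assumes standard (not URL-safe) Base64 and a UTF-8 key, which these notes do not specify, and the function name and sample key are hypothetical.

```python
import base64


def encode_signing_key(plain_key: str) -> str:
    """Return the standard Base64 encoding of a plain-text signing key.

    Assumes the key is UTF-8 text and that standard Base64 is expected;
    confirm both against the stage documentation before relying on this.
    """
    return base64.b64encode(plain_key.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder value for illustration only; never hard-code real keys.
    print(encode_signing_key("example-signing-key"))
```

Paste the encoded value into the stage in place of the plain-text key.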
7.2.x known issues
- The Oracle CDC source can generate a null pointer exception when the Ignore Materialized Views property is disabled.
- When configured to produce events and a schema change occurs, the SQL Server CDC Client source can generate duplicate schema-change events after a flow restart.