What's new and changed in watsonx.data

watsonx.data updates can include new features and fixes. Releases are listed in reverse chronological order so that the latest release is at the beginning of the topic.

You can see a list of the new features for the platform and all of the services at What's new in IBM Software Hub.

Installing or upgrading watsonx.data

Ready to install or upgrade watsonx.data?

To install watsonx.data along with the other IBM® Software Hub services, see Installing IBM Software Hub.
To upgrade watsonx.data along with the other IBM Software Hub services, see Upgrading IBM Software Hub.
To install or upgrade watsonx.data independently, see watsonx.data.
Remember: All of the IBM Software Hub components associated with an instance of IBM Software Hub must be installed at the same version.

IBM Software Hub Version 5.2.2

A new version of watsonx.data was released in October 2025 with IBM Software Hub 5.2.2.

Operand version: 2.2.2

This release includes the following changes:

New features: For a complete list of new and updated features in this release, see the Release notes in the watsonx.data documentation.

Customer-reported issues fixed in this release: For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak® for Data on the IBM Support website.

IBM Software Hub Version 5.2.1

A new version of watsonx.data was released in August 2025 with IBM Software Hub 5.2.1.

Operand version: 2.2.1

This release includes the following changes:

New features

This release of watsonx.data includes the following features:

Engine and service enhancements

This release of watsonx.data includes the following engine and service enhancements:

Introduced version v3 of the watsonx.data API. You can continue to use version v2 until watsonx.data version 2.3.
The Gluten accelerated Spark engine in watsonx.data can now run applications by using Spark version 3.5.
Spark engines can now access Amazon S3 storage by using IAM roles, improving flexibility and security.
You can now provision watsonx.data Spark engine with the Spark runtime set to Spark 4.0, which enables you to run Spark applications on Spark 4.0.
The Milvus service in watsonx.data is now upgraded to version 2.5.12.
You can now use the open-source Milvus backup tool to back up and restore data from Milvus within watsonx.data.
The GET /instance_configuration endpoint now returns internal host URLs for key services such as Meta Data Service (MDS), Common Policy Gateway (CPG), and Data Access Service (DAS) when watsonx.data is deployed on the same cluster as IBM Software Hub.
You can now use the Vector Transport Service (VTS) with Milvus in watsonx.data to migrate or manage vector data across systems.

Query Optimizer enhancement: You can now monitor query performance improvements through the optimizer dashboard. The optimizer is actively managing query plans for the associated catalogs and improving performance for Presto (C++) engines.

Spark enhancement

Customize your Spark application payload: When you submit a Spark application in watsonx.data, you can customize the application payload to include the following features:

Idempotency keys: Ensure that application submissions are processed only once, even if client-server communication fails.
Retry feature: Controls how many times a failed Spark application submission is automatically retried.
Maximum runtime controls: Define a maximum execution time for Spark applications. If no timeout is specified, jobs run until completion, regardless of how long they take.
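
The way idempotency keys and retries work together can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: ToySubmissionService, submit_with_retries, and the payload shape are hypothetical names invented for this illustration and are not part of the watsonx.data API. The sketch shows why a client that reuses one key across all retry attempts ends up with a single recorded job, even when responses are lost in transit.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToySubmissionService:
    """In-memory stand-in for a submission endpoint (hypothetical, not the real API)."""

    fail_first: int = 0                       # how many initial responses get "lost"
    jobs: dict = field(default_factory=dict)  # idempotency key -> job id
    calls: int = 0

    def submit(self, payload, idempotency_key):
        self.calls += 1
        # Deduplicate: a key that was seen before maps to the job already recorded.
        job_id = self.jobs.setdefault(idempotency_key, f"job-{len(self.jobs) + 1}")
        if self.calls <= self.fail_first:
            # The job was recorded, but the client never receives the response.
            raise ConnectionError("response lost after the job was recorded")
        return job_id


def submit_with_retries(service, payload, max_retries=3):
    """Submit once, then retry up to max_retries times with the same key."""
    if max_retries < 0:
        raise ValueError("max_retries must be zero or greater")
    key = str(uuid.uuid4())  # generated once and reused for every attempt
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return service.submit(payload, key)
        except ConnectionError as err:
            last_error = err
    raise last_error
```

Calling submit_with_retries against a service whose first two responses are lost results in three calls but only one recorded job, because every attempt carries the same key. Without the key, each retry in this toy model would create a new job.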

CPDCTL CLI enhancements

This release of watsonx.data introduces the following enhancements to IBM Cloud Pak for Data Command Line Interface (IBM cpdctl):

Starting with version 2.2.1, you can use HashiCorp Vault through cpdctl for secure secrets management and streamlined automation workflows.
A new option for the service command, wx-data service generate-engine-dump, allows you to generate dumps for Presto worker and coordinator nodes in watsonx.data.
Use the new component command to retrieve configuration details and status of various components in watsonx.data.
Starting from CPDCTL version 1.8.5, users no longer need to set the instance ID as an environment variable. This method is deprecated and will be removed in a future release. Instead, set the instance ID directly using the profile command.

Data sources and storage enhancements

You can now import catalogs and projects from the data platform for the following data sources:

IBM Db2
IBM Netezza
MySQL
Oracle
PostgreSQL
Snowflake
SQL Server

Privilege management

Privilege management for a Milvus service in watsonx.data now includes the following global privileges:
- DescribeDatabase: Provides detailed information about the specified database.
- AlterDatabase: Modifies the properties of an existing database.
IBM and third-party products or services can now use a new external Thrift endpoint to seamlessly consume the metadata service through the Thrift over HTTP protocol.

Ingestion enhancement: You can now upload files in the .txt file format for ingestion in addition to other supported formats.

OpenTelemetry enhancement

Expanded OpenTelemetry feature

OpenTelemetry is now available for the Presto (C++) engine, in addition to the existing Presto (Java) engine and the Milvus service. This expanded capability provides a unified way to collect and export telemetry data (traces, metrics, and logs) from all three services, giving you a more complete picture of your system's performance.

Easier UI configuration for OpenTelemetry

The OpenTelemetry configuration UI now streamlines the management of diagnostic metrics. From the diagnostics list, you can access, review, enable, disable, edit, view the details of, and disassociate metrics.

New Instana dashboards

The Instana integration now includes four additional dashboards, bringing the total to eight dashboards. These additions provide a more comprehensive view of system health and performance. The four new dashboards are Query latency health, Query lifecycle health, Anomaly and trend insights, and Log and error health.

New Grafana dashboards

You can now use Grafana dashboards to monitor the performance of your Prometheus data sources. The four dashboards are System health, Query performance health, Data and metadata health, and Workload health.

Semantic automation for data enrichment: watsonx.data now supports semantic search capabilities that allow users to query data using natural language, making data exploration more intuitive and efficient.

To read more about these features, see What's new and changed in watsonx.data.

Customer-reported issues fixed in this release

For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.

Deprecated features

The following features were deprecated in this release:

watsonx.data API version v2 is now deprecated. It will be completely removed from the watsonx.data software edition in version 2.3.0. You must migrate to the latest supported API version, v3, to ensure continued compatibility and access to new features.

IBM Software Hub Version 5.2.0

A new version of watsonx.data was released in June 2025 with IBM Software Hub 5.2.0.

Operand version: 2.2.0

This release includes the following changes:

New features

This release of watsonx.data includes the following features:

Engine and service enhancements

For Presto (C++) engines, the Hive and Iceberg catalogs are now enabled with region configuration.
You can now connect to Presto (C++) engines by using a proxy server when the HTTP_PROXY environment variable is enabled in watsonx.data.
New Gluten accelerated Spark engine: You can now provision a Gluten accelerated Spark engine and use it to run complex analytical workloads by leveraging the high scalability of the Spark SQL framework and the high performance of native libraries.
Run faster workspace queries by using a Spark job to transform Iceberg table data: To speed up the reading of Iceberg tables, you can now use a Spark job to transform Iceberg table data from Merge-on-Read (MOR) format to Copy-on-Write (COW) format.
Introduced Lightweight engine mode for installing the watsonx.data software. You can now choose the lightweight engine installation mode to install only the Milvus service, without Presto or Spark.
Introduced custom size T-shirt configuration for provisioning a Milvus service. You can also scale the Milvus service between the predefined T-shirt sizes (small, medium, and large) or custom sizes.
You can use the Spark API functionality to configure the limit of applications that can be listed and the filter criteria that you can use to filter the applications.
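
To see why the Merge-on-Read to Copy-on-Write transformation mentioned above can speed up reads, consider the following toy model in Python. It is a conceptual sketch only: the dictionaries stand in for data files and position-delete records, the function names are hypothetical, and nothing here uses the Iceberg library or reproduces the Spark job that watsonx.data runs. In the MOR layout, every read must filter out deleted row positions; after the rewrite, the delete list is empty and a read is a plain scan.

```python
def live_rows(data_files, position_deletes):
    """Read a table: scan the data files and skip positions marked as deleted."""
    deleted = {(d["file"], d["pos"]) for d in position_deletes}
    return [
        row
        for name, rows in sorted(data_files.items())
        for pos, row in enumerate(rows)
        if (name, pos) not in deleted
    ]


def rewrite_copy_on_write(data_files, position_deletes):
    """Apply the deletes once and rewrite the data files, leaving no delete records."""
    deleted = {(d["file"], d["pos"]) for d in position_deletes}
    rewritten = {
        name: [row for pos, row in enumerate(rows) if (name, pos) not in deleted]
        for name, rows in data_files.items()
    }
    return rewritten, []  # an empty delete list means readers have nothing to merge


# Merge-on-Read layout: the original files plus a separate record of deletions.
mor_files = {"a.parquet": ["r1", "r2", "r3"], "b.parquet": ["r4"]}
mor_deletes = [{"file": "a.parquet", "pos": 1}]

# Copy-on-Write layout: the same logical table, with the deletes already applied.
cow_files, cow_deletes = rewrite_copy_on_write(mor_files, mor_deletes)
```

Both layouts return the same rows from live_rows, but the Copy-on-Write version pays the cost of applying deletes once at rewrite time rather than on every read.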

CPDCTL CLI enhancements

This release of watsonx.data introduces the following enhancements to IBM Cloud Pak for Data Command Line Interface (IBM cpdctl):

You can use the tablemaint command to execute different Iceberg table maintenance operations in watsonx.data.
You can use the wx-data service command to perform various serviceability-related operations, such as listing tables, retrieving the list of QHMM-enabled buckets, and monitoring QHMM-related statistics and queries. For more information, see IBM cpdctl.

Integration enhancements

This release of watsonx.data introduces the following enhanced integration with other services:

New delivery method: Deliver as a table in watsonx.data
Data products that use supported data sources can now be delivered as tables to your watsonx.data instance by using the deliver as a table in watsonx.data method. This method allows users with the appropriate permissions to create new tables or append to existing ones.
New delivery method: Access in watsonx.data
You can now subscribe to a data product created from the watsonx.data instance by using the access in watsonx.data delivery method. This method lets consumers directly access watsonx.data resources through Data Product Hub. After delivery, consumers will see details on how to access the watsonx.data instance and the specific resources they have access to.
You can now connect to the Spark query server in the following ways to run queries and analyze your data:
- Using DBeaver (JDBC clients)
- Using Java (JDBC Client) code
- Using Python (PyHive JDBC Client)
You can now test the connection for IBM Knowledge Catalog.

Data sources and storage enhancements

This release of watsonx.data includes the following data source and storage enhancements:

You can now use SQL Server with Microsoft Entra authentication.
IBM watsonx.data now supports HashiCorp Vault for credential and secret management for Db2 data sources. Users can now access and use HashiCorp Vault secrets as credentials for registering the Db2 data source in watsonx.data.

Customer-reported issues fixed in this release

For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.

Deprecated features

The following features were deprecated in this release:

Azure Data Lake Storage (ADLS) Gen1 is now deprecated: ADLS Gen1 will be removed in an upcoming release. You must transition to ADLS Gen2 because ADLS Gen1 is no longer available.

User authentication with ibmlhapikey and ibmlhtoken is now deprecated: The user authentication method of using ibmlhapikey and ibmlhtoken as the username is now deprecated and will be removed in a future release. You can use ibmlhapikey_username and ibmlhtoken_username instead.
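
The username change can be sketched as a small Python helper. This sketch assumes that "username" in ibmlhapikey_username and ibmlhtoken_username is a placeholder for the actual user name, so that a user named alice would log in as ibmlhapikey_alice. The function names are hypothetical and are not part of any watsonx.data client library.

```python
# Bare prefixes that were previously accepted as the login username (now deprecated).
DEPRECATED_USERNAMES = ("ibmlhapikey", "ibmlhtoken")


def build_login_username(method, user):
    """Return the login username in the <method>_<user> form (assumed format)."""
    if method not in DEPRECATED_USERNAMES:
        raise ValueError(f"unknown authentication method: {method!r}")
    if not user:
        raise ValueError("a user name is required")
    return f"{method}_{user}"


def uses_deprecated_username(value):
    """Report whether a configured username is one of the bare, deprecated forms."""
    return value in DEPRECATED_USERNAMES
```

A check like uses_deprecated_username can help locate connection settings that still rely on the bare form before it is removed.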