What's new and changed in watsonx.data
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.
IBM® watsonx.data updates can include new features, bug fixes, and security updates. Updates are listed in reverse chronological order so that the latest release is at the beginning of the topic.
You can see a list of the new features for the platform and all of the services at What's new in IBM Cloud Pak for Data.
Related documentation: Installing or upgrading watsonx.data
Cloud Pak for Data Version 4.8.5
A new version of watsonx.data was released in April 2024 with Cloud Pak for Data 4.8.5.
Operand version: 1.1.4
This release includes the following changes:
- New features
- The 1.1.4 release of watsonx.data includes the following features and updates:
- Uploading description files for Apache Kafka data source
- The Apache Kafka data source stores data as
byte messages that producers and consumers must interpret. To query this data, consumers must first
map it into columns. Now you can upload topic description files that convert raw data into a table
format. Each file must be a JSON file that contains a definition for a table.
To upload these JSON files from the UI, go to the overview page of the Apache Kafka database that you registered and select the Add topic option.
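As an illustration, the following Python sketch builds one topic description and serializes it to JSON. The field layout (tableName, schemaName, topicName, message, dataFormat, fields) follows the table definition file format of the open source Presto Kafka connector; that watsonx.data accepts exactly this layout is an assumption. The topic, schema, and column names are hypothetical.

```python
import json

# Hypothetical description for a Kafka topic named "sales.orders".
# Layout assumed from the open source Presto Kafka connector's
# table definition files; all names below are invented.
topic_description = {
    "tableName": "orders",
    "schemaName": "sales",
    "topicName": "sales.orders",
    "message": {
        "dataFormat": "json",
        "fields": [
            # Each field maps one key of the raw JSON message to a typed column.
            {"name": "order_id", "mapping": "order_id", "type": "BIGINT"},
            {"name": "customer", "mapping": "customer", "type": "VARCHAR"},
            {"name": "total", "mapping": "total", "type": "DOUBLE"},
        ],
    },
}


def column_names(description):
    """Return the column names that the description exposes for the table."""
    return [field["name"] for field in description["message"]["fields"]]


def write_topic_description(path, description):
    """Write the description as a JSON file that can then be uploaded."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(description, handle, indent=2)
```

Calling write_topic_description("orders.json", topic_description) would produce a file with one table definition, which is the one-table-per-file shape that the upload expects.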
- New data sources
- The following new data sources are now available:
- Oracle
- Amazon Redshift
- Informix®
- Prometheus
- New BINARY data type for data sources
- In the Query workspace, you can now use the BINARY data type and its equivalents with the SELECT
statement to build and run queries against your data for the following data sources:
- Db2® (BINARY data type)
- Snowflake (BINARY data type)
- PostgreSQL (BYTEA data type)
- Teradata (VARBYTE data type)
- Test SSL connections
- You can now test SSL connections for the MongoDB and SingleStore data sources.
- Kerberos authentication for HDFS connections
- You can now enable Kerberos authentication for secure Apache Hadoop Distributed File System (HDFS) connections.
- Presto engine version upgrade
- The Presto engine is now upgraded to version 0.285.1.
Cloud Pak for Data Version 4.8.4
A new version of watsonx.data was released in March 2024 with Cloud Pak for Data 4.8.4.
Operand version: 1.1.3
This release includes the following changes:
- New features
- The 1.1.3 release of watsonx.data includes the following features and updates:
- New data types for some data sources
- You can now use the BINARY data type with the SELECT statement in the Query workspace to build and run queries against data from the following data sources:
- Elasticsearch
- SAP HANA
- Microsoft SQL Server
- MySQL
You can now use the BLOB and CLOB data types with the SELECT statement in the Query workspace to build and run queries against data from the following data sources:
- MySQL
- PostgreSQL
- Snowflake
- Microsoft SQL Server
- Db2
- Delete data by using the DELETE FROM feature for Apache Iceberg data sources
- You can now delete data from tables in Apache Iceberg data sources by using the DELETE FROM feature. You can use copy-on-write mode or merge-on-read mode to delete data. For more information, see Supported SQL statements.
- ALTER VIEW statement for Iceberg data source
- You can now use the following ALTER VIEW statement in the Query workspace to rename a view:
ALTER VIEW name RENAME TO new_name
- Upload SSL certificates for IBM Netezza® Performance Server data sources
- You can now use the Infrastructure Manager in the watsonx.data console to browse for and upload SSL certificates for SSL connections to IBM Netezza Performance Server data sources. The valid file formats for SSL certificates are .pem, .crt, and .cer.
- Query data from Db2 and Watson Query
- You can now query nicknames that are created in Db2 and virtualized tables from Watson Query instances.
- Use data from Apache Hudi data sources
- You can now connect to and use data from Apache Hudi data sources.
- Add Milvus as a service in watsonx.data
- You can now provision Milvus as a service in watsonx.data. You can provision different storage variants such as starter, medium, and large nodes, and you can assign Administrator or User roles for Milvus users.
- Use time-travel queries in Apache Iceberg tables
- You can now run the following time-travel queries by using branches and tags in Apache Iceberg table snapshots:
SELECT * FROM <table name> FOR VERSION AS OF 'historical-tag'
SELECT * FROM <table name> FOR VERSION AS OF 'test-branch'
- Load data in batch by using the ibm-lh ingestion tool
- You can now use the ibm-lh ingestion tool to run batch ingestion procedures in non-interactive mode, from outside the ibm-lh-tools container, by using the ibm-lh-client package. For more information, see ibm-lh commands and usage.
- Creating schemas by using bulk ingestion in web console
- You can now create a schema by using the bulk ingestion process in the web console if the schema does not already exist.
- Data ingestion is possible only through Spark engine from web console
- The Iceberg copy loader method of ingesting data is no longer available in the web console. Data ingestion through the Spark engine is now the only ingestion method that is available in the web console. For more information, see Ingesting data by using Spark.
- New storage type Hadoop Distributed File System (HDFS)
- You can now use the new HDFS storage type and attach a bucket and catalog to it. For more information, see Hadoop Distributed File System (HDFS).
- Add bucket feature is now available in Add storage option
- You can now add storage and attach a bucket to the respective storage type in the Infrastructure manager in the UI.
Cloud Pak for Data Version 4.8.3
A new version of watsonx.data was released in February 2024 with Cloud Pak for Data 4.8.3.
Operand version: 1.1.2
This release includes the following changes:
- New features
- The 1.1.2 release of watsonx.data includes the following features and updates:
- SSL connections for data sources
- You can now enable SSL connections for the following data sources by using the Add
database window to secure and encrypt the database connection:
- Db2
- IBM Data Virtualization Manager for z/OS®
- PostgreSQL
For IBM Data Virtualization Manager for z/OS and PostgreSQL, select Validate certificate to validate whether the SSL certificate that is returned by the host is trusted.
For the IBM Data Virtualization Manager for z/OS data source, you can choose to provide the hostname in the SSL certificate.
For more information, see Adding a database-catalog pair.
- Secure ingestion job history
- Now users can view only their own ingestion job history. Administrators can view the ingestion job history for all users.
- Use more SQL statements
- You can now use the following SQL statements in the Query workspace to build and run queries
against your data:
- Apache Iceberg data sources
- CREATE VIEW
- DROP VIEW
- MongoDB data sources
- DELETE
- New data types for SAP HANA and Teradata data sources
- SAP HANA and Teradata data sources now support BLOB and CLOB data types. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.
- Create a table during data ingestion
- Previously, a target table had to exist in watsonx.data before you could ingest data. Now, when you ingest data from the Data Manager, you can create a new table directly from your Parquet or CSV source files. You can create the table by using the following methods of ingestion:
- Ingesting data by using the Apache Iceberg copy loader.
For more information, see Ingesting data by using Iceberg copy loader.
- Ingesting data by using Spark.
For more information, see Ingesting data by using Spark.
- Perform ALTER TABLE operations on a column
- With an Apache Iceberg data source, you can now perform ALTER TABLE operations on a column for the following data type conversions:
- int to bigint
- float to double
- decimal to decimal, where the source decimal type has fewer digits than the converted decimal type.
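The supported conversions can be sketched as a small Python check. This is a minimal illustration, not the engine's own validation logic: the function name is hypothetical, and the requirement that the decimal scale stays unchanged is an assumption taken from the Apache Iceberg specification rather than from this topic, which states only that the source type must have fewer digits.

```python
import re

# Matches type strings such as "decimal(10,2)" or "decimal(10, 2)".
_DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def is_supported_conversion(source: str, target: str) -> bool:
    """Return True if an ALTER TABLE column type change looks like a supported widening."""
    source, target = source.strip().lower(), target.strip().lower()
    if (source, target) in {("int", "bigint"), ("float", "double")}:
        return True
    src, dst = _DECIMAL.fullmatch(source), _DECIMAL.fullmatch(target)
    if src and dst:
        src_precision, src_scale = map(int, src.groups())
        dst_precision, dst_scale = map(int, dst.groups())
        # Assumption: only the precision can grow; the scale must not change.
        return dst_precision > src_precision and dst_scale == src_scale
    return False
```

For example, is_supported_conversion("decimal(10,2)", "decimal(18,2)") is accepted, while narrowing a column from bigint back to int is rejected.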
- Grant user access to Spark instance directly from watsonx.data
- If you have the watsonx.data admin permission and you have access to an instance of Analytics Engine powered by Apache Spark in Cloud Pak for Data, you can provide user access directly from the watsonx.data user interface without switching to the Cloud Pak for Data user interface.
- Better query performance by using sorted files
- With an Apache Iceberg data source, you
can generate sorted files, which reduce the query result latency and improve the performance of
Presto.
Data in the Apache Iceberg table is sorted during the writing process within each file. You can configure the order to sort the data by using the sorted_by table property. When you create the table, specify the array of columns involved in sorting.
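To see why sorting within a file can reduce query latency, consider the following simplified Python model. It is a sketch under stated assumptions and not a description of Presto internals: it assumes that a data file is split into row groups, that each row group records the minimum and maximum value of a column, and that a reader skips any row group whose range cannot match the filter. The data, group size, and filter values are invented.

```python
import random


def row_group_ranges(values, group_size):
    """Split values into consecutive row groups and record each group's (min, max)."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return [(min(group), max(group)) for group in groups]


def groups_to_scan(ranges, low, high):
    """Count row groups whose (min, max) range overlaps the filter low <= value <= high."""
    return sum(1 for group_min, group_max in ranges if group_max >= low and group_min <= high)


random.seed(7)
shuffled = list(range(1000))
random.shuffle(shuffled)  # stands in for a file that was written without sorting

# Same data, same filter (250 <= value <= 260), ten row groups of 100 rows each.
unsorted_scan = groups_to_scan(row_group_ranges(shuffled, 100), 250, 260)
sorted_scan = groups_to_scan(row_group_ranges(sorted(shuffled), 100), 250, 260)

print(f"row groups scanned without sorting: {unsorted_scan}")
print(f"row groups scanned with sorting: {sorted_scan}")
```

In the sorted layout the matching values sit in a single row group, so the other groups can be skipped. In the unsorted layout the matching values are scattered, so more groups overlap the filter range and must be read.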
Cloud Pak for Data Version 4.8.1
A new version of watsonx.data was released in December 2023 with Cloud Pak for Data 4.8.1.
Operand version: 1.1.1
This release includes the following changes:
- New features
- The 1.1.1 release of watsonx.data includes the following features and updates:
- Audit logging
- IBM watsonx.data now integrates with the Cloud Pak for Data audit logging service. Auditable events for watsonx.data are forwarded to the security information and event management (SIEM) solution that you integrate with.
For more information, see Audit events for watsonx.data and Exporting Cloud Pak for Data audit records to a security information and event management solution.
- Use self-signed certificates and CA certificates to connect to object stores
- Previously, watsonx.data could connect only to HTTPS endpoints, such as IBM Cloud Object Storage and Amazon S3, that used certificates signed by well-known certificate authorities. Now, you can connect to object stores that use self-signed certificates or certificates signed by other certificate authorities.
For more information, see Connecting to external object stores over https.
- IBM Data Virtualization Manager for z/OS connector
- You can use the new IBM Data Virtualization Manager for z/OS connector to read and write IBM Z® data without having to move, replicate, or transform the data.
For more information, see Connecting to an IBM Data Virtualization Manager (DVM) data source.
- Integration with Db2 and Netezza
- You can now register Db2 or Netezza engines with a valid console URL. You can use the metastore URL that is shown on the Engine detail page to sync the respective engines with the appropriate bucket catalog-based table.
- Better memory management
- Metastore caching and metadata caching (header and footer caching) are now enabled by default to optimize memory usage. You can also create a local staging directory to optimize the use of resources during data operations. For more information, see Enhancing query performance through caching and Configuring a local staging directory.
- Presto case-sensitive behavior
- Presto behavior is changed from case-insensitive to case-sensitive. You can now provide object names in their original case, as they appear in the database. You can also create schemas, tables, and columns in mixed case (uppercase and lowercase) through Presto if the database supports it.
- Teradata connector is enabled for multiple ALTER TABLE statements
- The Teradata connector now supports the ALTER TABLE RENAME TO, ALTER TABLE DROP COLUMN, and ALTER TABLE RENAME COLUMN column_name TO new_column_name statements.
- Simplified shutdown
- Now when you shut down the watsonx.data service from Cloud Pak for Data, watsonx.data and its associated engines also shut down.
- Removal of development (*-devel) packages
- For security reasons, the *-devel packages are removed from watsonx.data. If you already use the development packages, programs that depend on them can no longer be compiled. For any queries, contact IBM Support.
- SSL is enabled for PostgreSQL
- Now ingestion can use mounted certificates when connecting to PostgreSQL.
Cloud Pak for Data Version 4.8.0
A new version of watsonx.data was released in November 2023 with Cloud Pak for Data 4.8.0.
Operand version: 1.1.0
This release includes the following changes:
- New features
- The 1.1.0 release of watsonx.data includes the following features and updates:
- Time-travel and roll-back queries
- You can now run the following time-travel queries to access historical data in Apache Iceberg tables:
SELECT <columns> FROM <iceberg-table> FOR TIMESTAMP AS OF TIMESTAMP <timestamp>;
SELECT <columns> FROM <iceberg-table> FOR VERSION AS OF <snapshotId>;
You can use time-travel queries to query and restore data that was updated or deleted in the past.
You can also roll back an Apache Iceberg table to any existing snapshot.
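The two time-travel query forms can be assembled programmatically, as in the following Python sketch. The helper names, the table name, and the snapshot ID are hypothetical, and the timestamp literal format (a time zone qualified TIMESTAMP string) is an assumption. The helpers interpolate values directly into the SQL text, so they are suitable only for trusted input.

```python
def time_travel_by_timestamp(table, columns, timestamp):
    """Build a query that reads the table as it existed at the given timestamp."""
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list} FROM {table} "
        f"FOR TIMESTAMP AS OF TIMESTAMP '{timestamp}'"
    )


def time_travel_by_snapshot(table, columns, snapshot_id):
    """Build a query that reads the table as of a specific Iceberg snapshot ID."""
    column_list = ", ".join(columns)
    # int() rejects anything that is not a numeric snapshot ID.
    return f"SELECT {column_list} FROM {table} FOR VERSION AS OF {int(snapshot_id)}"


# Hypothetical table name and values, for illustration only.
by_time = time_travel_by_timestamp(
    "iceberg_data.sales.orders", ["*"], "2023-11-01 00:00:00.000 UTC"
)
by_snapshot = time_travel_by_snapshot(
    "iceberg_data.sales.orders", ["order_id", "total"], 8772871542276440693
)
print(by_time)
print(by_snapshot)
```

The resulting strings can be pasted into the Query workspace or passed to any Presto client.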
- Capture historical data about Presto queries
- The Query History Monitoring and Management (QHMM) service captures historical data about
Presto queries and events. The historical
data is stored in a MinIO bucket and you
can use the data to understand the queries that were run and to debug the Presto engine.
For more information, see Monitoring and managing diagnostic data.
- Ingest data by using Spark
- You can now use the IBM Analytics Engine powered by Apache Spark to run
ingestion jobs in watsonx.data.
For more information, see Ingesting data by using Spark.
- Improved query performance with caching
- You can use the following types of caching to improve Presto query performance:
- Metastore caching
- File list caching
- File metadata caching
For more information, see Enhancing query performance through caching.
- New connectors
- You can now use connectors in watsonx.data to establish connections to the
following types of databases:
- SAP HANA
- SingleStoreDB
- Snowflake
- Teradata
For more information, see Adding a database.
- Updates
- Monitor watsonx.data
- You can now monitor the performance and health of the watsonx.data service with the following monitors:
- Presto engine status check
- EDB PostgreSQL status check
- Service health check
To use these monitors, an instance administrator must install the service monitors. For more information, see Installing service monitors.
- New method for integrating with IBM Knowledge Catalog
- Now, you can use the ZenApiKey authorization method to integrate watsonx.data and IBM Knowledge Catalog. For more information, see Integrating with IBM Knowledge Catalog.
- Shut down and restart watsonx.data with cpd-cli commands
- You can now use the following cpd-cli commands to shut down and restart the watsonx.data service:
cpd-cli manage shutdown
cpd-cli manage restart
For more information, see Shutting down and restarting services.