What's new and changed in Data Virtualization

Important: IBM® Cloud Pak for Data Version 4.5 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.5 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

Data Virtualization updates can include new features, bug fixes, and security updates. Updates are listed in reverse chronological order so that the latest release is at the beginning of the topic.

You can see a list of the new features for the platform and all of the services at What's new in IBM Cloud Pak for Data.

Cloud Pak for Data Version 4.5.3

A new version of Data Virtualization was released in October 2022 with Cloud Pak for Data 4.5.3.

Operand version: 1.8.3

This release includes the following changes:

New features

The 1.8.3 release of Data Virtualization includes the following features and updates:

Sharing your virtualized objects is quicker and easier: When you virtualize objects, you can assign the objects to multiple projects or data requests, and you can publish the objects to a catalog, all in one step.
A Data Virtualization connection is now available in the Platform assets catalog by default: You can add a Data Virtualization connection to projects without manually populating the connection details.

Customer-reported issues fixed in this release

DT145154: Data Virtualization cluster unstable, continuous Pod (c-db2u-dv-db2u-0) restarts due to Liveness probes failing.

Security issues fixed in this release

This release includes fixes for the following security issues:

CVE-2022-1012, CVE-2022-1586, CVE-2022-1652, CVE-2022-1705, CVE-2022-1729, CVE-2022-1962, CVE-2022-1976, CVE-2022-2257, CVE-2022-2380, CVE-2022-21540, CVE-2022-21541, CVE-2022-23648, CVE-2022-23772, CVE-2022-23773, CVE-2022-23806, CVE-2022-24675, CVE-2022-24921, CVE-2022-25313, CVE-2022-25314, CVE-2022-25375, CVE-2022-28131, CVE-2022-28327, CVE-2022-28356, CVE-2022-29154, CVE-2022-29217, CVE-2022-30580, CVE-2022-30629, CVE-2022-30630, CVE-2022-30631, CVE-2022-30632, CVE-2022-30633, CVE-2022-30635, CVE-2022-31030, CVE-2022-32148, CVE-2022-32250, CVE-2022-33068, CVE-2022-33099, CVE-2022-33980, CVE-2022-34169, CVE-2022-34494

CVE-2021-3408, CVE-2021-3583, CVE-2021-3701, CVE-2021-3702, CVE-2021-4041, CVE-2021-29923, CVE-2021-30465, CVE-2021-33098, CVE-2021-33150, CVE-2021-39711, CVE-2021-39715, CVE-2021-40528, CVE-2021-41092, CVE-2021-46822

CVE-2020-1734, CVE-2020-1737, CVE-2020-12401

CVE-2019-16884

CVE-2016-9962

Cloud Pak for Data Version 4.5.1

A new version of Data Virtualization was released in July 2022 with Cloud Pak for Data 4.5.1.

Operand version: 1.8.1

This release includes the following changes.

New features

The 1.8.1 release of Data Virtualization includes the following features and updates.

Use Cognos Authentication Method (CAM) credentials to connect to Planning Analytics data sources

You can now use CAM credentials as an authentication method when you create a connection to a Planning Analytics data source in Data Virtualization. For more information, see IBM Planning Analytics connection.

Screenshot of a Planning Analytics connection in Data Virtualization.

Use Watson Knowledge Catalog policies to filter rows in virtualized tables

You might have a data source that has tables with government, enterprise, and retail client data combined. For example, a billing table might have data for all the customers, where some of the rows are for government clients and some are for nongovernment clients. The type of the client is not indicated in the billing table. Now, you can filter the list of client records by using one of the following techniques.

You can use a separate table to identify the customers that are government clients. The IDs from this table are used to filter rows out of the billing table, so that the masked table does not contain rows with government client data.
You can use a table of blocked customer identifiers as a reference table. Any row in the billing table whose customer identifier is included in the blocked customer set is filtered out of the result set.

For more information, see Filtering rows in data protection rules.
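
The reference-table technique described above can be illustrated with a minimal sketch. It is not how Watson Knowledge Catalog or Data Virtualization implements row filtering; it only shows the effect of excluding rows whose identifier appears in a reference set. The table contents, the column name customer_id, and the function filter_rows are hypothetical.

```python
# Hypothetical sample data; table and column names are illustrative only.
billing = [
    {"customer_id": 101, "amount": 250.0},
    {"customer_id": 102, "amount": 75.5},
    {"customer_id": 103, "amount": 410.0},
    {"customer_id": 101, "amount": 19.9},
]

# Reference table: identifiers of customers whose rows must be hidden
# (for example, government clients or a blocked-customer list).
blocked_customer_ids = {101}


def filter_rows(rows, blocked_ids):
    """Return only the rows whose customer_id is not in blocked_ids."""
    return [row for row in rows if row["customer_id"] not in blocked_ids]


visible = filter_rows(billing, blocked_customer_ids)
for row in visible:
    print(row)
```

In this sketch, both billing rows for customer 101 are dropped, and the billing table itself never needs a column that records the client type.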

Security fixes

This release includes fixes for the following security issues:

CVE-2022-0778, CVE-2022-26691, CVE-2022-28733, CVE-2022-28734, CVE-2022-28735, CVE-2022-28736, CVE-2022-28737, CVE-2022-29244, CVE-2022-31129

CVE-2021-3695, CVE-2021-3696, CVE-2021-3697, CVE-2021-34429

Cloud Pak for Data Version 4.5.0

A new version of Data Virtualization was released in June 2022 with Cloud Pak for Data 4.5.0.

Operand version: 1.8.0

This release includes the following changes.

New features

Version 1.8.0 of the Data Virtualization service includes the following features and updates.

Upgrade to Cloud Pak for Data version 4.5 (Data Virtualization 1.8.0)

You can upgrade Data Virtualization from the following Cloud Pak for Data versions to Cloud Pak for Data version 4.5.

Back up and restore Data Virtualization

You can use the Cloud Pak for Data backup and restore utilities to take frequent online backups of Data Virtualization without sacrificing productivity. Alternatively, you can put Cloud Pak for Data in quiesce mode to take a consistent backup of Data Virtualization while your cluster is offline.

For more information, see Backing up and restoring Cloud Pak for Data.

Quickly find and virtualize tables with the Explore tab

You can now quickly find the tables that you want to virtualize. On the Virtualize page, you can use the Explore tab to browse through databases, schemas, and available tables in a connected data source. The List tab displays all of the available tables in all of your connected data sources. On the Data sources page, you can filter your data sources to quickly load the reduced list of available tables in the List tab.

Screenshot of the Explore view on the Virtualize page that shows a table selected and ready to be added to the cart.

For more information, see Creating virtual objects in Data Virtualization.

Improve statistics collection for virtualized tables by using data sampling

Data sampling improves statistics collection by reducing the resources that you need to collect statistics. When you collect statistics by selecting the Remote query collection method in the web client, a default sampling rate of 20% is used. To optimize statistics collection, select Enable table sampling and choose a sampling rate between 1% and 99%.

If you collect statistics by using the DVSYS.COLLECT_STATISTICS procedure, you can use the TABLESAMPLE option with the remote-query statistics collection type to sample data when you collect statistics. For tips, see Usage notes.

You can also use the DVSYS.COLLECT_STATISTICS procedure to collect statistics for virtualized tables over flat files. For more information, see the COLLECT_STATISTICS stored procedure in Data Virtualization.
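
To illustrate why sampling reduces the cost of statistics collection, the following sketch estimates a row count and value range from a random sample instead of a full scan. It is an assumption-laden illustration, not the algorithm that Data Virtualization uses. The functions sample_rows and estimate_statistics are hypothetical, and the sampling rate is expressed as a fraction (0.20 for 20%).

```python
import random


def sample_rows(rows, rate, seed=0):
    """Bernoulli sample: keep each row independently with probability `rate`."""
    if not 0.01 <= rate <= 0.99:
        raise ValueError("rate must be between 0.01 (1%) and 0.99 (99%)")
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < rate]


def estimate_statistics(rows, rate, seed=0):
    """Estimate the row count and value range of a numeric column from a sample."""
    sample = sample_rows(rows, rate, seed)
    return {
        "rows_read": len(sample),
        # Scale the sample size back up by the sampling rate.
        "estimated_row_count": round(len(sample) / rate),
        "min": min(sample) if sample else None,
        "max": max(sample) if sample else None,
    }


# Hypothetical column of 100,000 numeric values.
values = list(range(100_000))
stats = estimate_statistics(values, rate=0.20)
print(stats)
```

Only about one fifth of the values are read in this sketch, yet the scaled row count stays close to the true count. A lower rate reads less data but makes the estimates less precise.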

Virtualize files with column headers in cloud object storage

You can now virtualize flat files in cloud object storage that contain column headers.

For more information, see Creating a virtualized table from files in cloud object storage in Data Virtualization.

Manage access for multiple groups if you are an Admin

As a Data Virtualization Admin, you can now grant and revoke access for multiple users, groups, and roles at the same time.

For more information, see Managing access to virtual objects in Data Virtualization.

Filter rows in virtualized data based on data protection rules in Watson Knowledge Catalog

Data Virtualization supports masking columns in virtualized data based on data protection rules that are defined in Watson Knowledge Catalog. Now, you can create data protection rules to include or exclude rows in your virtualized data to avoid exposing sensitive data.

For more information, see Governing virtual data with data protection rules in Data Virtualization and Designing data protection rules.

Improve query performance and enforcement of data protection rules

Data Virtualization now caches data protection rules from Watson Knowledge Catalog in a policy enforcement point cache so that the rules are not evaluated every time an object is queried. This cache improves the performance of previously executed queries by reducing the number of calls to Watson Knowledge Catalog to fetch the rules. However, newly added or updated data protection rules might take up to 10 minutes to be applied to queries.

For more information, see Enabling enforcement of data protection rules in Data Virtualization.
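
The trade-off between fewer rule lookups and delayed rule updates can be sketched with a simple time-based cache. The source states only that updates can take up to 10 minutes to apply; expiry after a fixed time-to-live is an assumption made here for illustration. The class RuleCache, the function fetch, and the rule strings are all hypothetical.

```python
import time


class RuleCache:
    """Caches rules per object and refetches them only after `ttl_seconds`."""

    def __init__(self, fetch_rules, ttl_seconds=600, clock=time.monotonic):
        self._fetch_rules = fetch_rules  # hypothetical callable: object name -> rules
        self._ttl = ttl_seconds          # 600 s mirrors the 10-minute delay (assumed)
        self._clock = clock
        self._entries = {}               # object name -> (fetched_at, rules)

    def get(self, object_name):
        now = self._clock()
        entry = self._entries.get(object_name)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]              # cache hit: no call to the rule service
        rules = self._fetch_rules(object_name)
        self._entries[object_name] = (now, rules)
        return rules


# Demonstration with a fake clock and a fake rule service.
fake_now = [0.0]
service_rules = {"BILLING": ["mask SSN"]}
calls = []


def fetch(name):
    calls.append(name)
    return list(service_rules[name])


cache = RuleCache(fetch, ttl_seconds=600, clock=lambda: fake_now[0])
first = cache.get("BILLING")             # miss: fetched from the service
service_rules["BILLING"].append("deny government rows")
fake_now[0] = 300.0
stale = cache.get("BILLING")             # hit: still the old rules
fake_now[0] = 600.0
fresh = cache.get("BILLING")             # expired: refetched with the new rule
print(first, stale, fresh, len(calls))
```

In this sketch the rule that was added after the first query becomes visible only once the cached entry expires, and the rule service is called twice instead of three times.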

Manage metadata for Data Virtualization assets with metadata enrichment

Metadata enrichment helps you find data faster, trust your data, and protect your data. Metadata includes terms that define the meaning of the data, rules that document ownership, and quality standards.

For more information, see Managing metadata enrichment.

Support for predicate pushdown on more data sources

Predicate pushdown is an optimization that reduces query times and memory usage. The following data sources now support pushdown of predicates: MySQL (MySQL Community Edition and MySQL Enterprise Edition), Cloudera Impala, and Data Virtualization Manager for z/OS®.

The following enhanced pushdown capabilities have also been implemented on more SQL patterns to improve query performance.

SQL statements with LIKE predicates are now pushed down for: Db2®, SAP HANA, Oracle, PostgreSQL, Apache Hive, MySQL, Microsoft SQL Server, Snowflake, Netezza® Performance Server, and Teradata.
SQL statements with FETCH clauses are now pushed down for: Db2, Db2 for z/OS, Apache Derby, Oracle, Amazon Redshift, Google BigQuery, and Salesforce.com data sources.
SQL statements with a string comparison filter are now pushed down for: Db2, Microsoft SQL Server, Teradata, Netezza Performance Server, and Apache Derby data sources.
SQL statements with OLAP functions are now pushed down for: Db2 and Netezza Performance Server data sources.
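
The effect of pushing a LIKE predicate down to the data source can be sketched as follows. An in-memory SQLite database stands in for a remote source; this is an illustration of the general technique, not of the Data Virtualization query engine. The table customers and the functions without_pushdown and with_pushdown are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a remote data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme Corp"), (2, "Acme Labs"), (3, "Globex"), (4, "Initech")],
)


def without_pushdown(conn, prefix):
    """Fetch every row from the source, then apply the filter locally."""
    fetched = conn.execute(
        "SELECT id, name FROM customers ORDER BY id"
    ).fetchall()
    # Case-insensitive match, like SQLite's default LIKE for ASCII text.
    matches = [row for row in fetched if row[1].lower().startswith(prefix.lower())]
    return matches, len(fetched)


def with_pushdown(conn, prefix):
    """Send the LIKE predicate to the source so only matching rows come back."""
    fetched = conn.execute(
        "SELECT id, name FROM customers WHERE name LIKE ? ORDER BY id",
        (prefix + "%",),
    ).fetchall()
    return fetched, len(fetched)


local_rows, local_transferred = without_pushdown(source, "Acme")
pushed_rows, pushed_transferred = with_pushdown(source, "Acme")
print(local_rows == pushed_rows, local_transferred, pushed_transferred)
```

Both functions return the same rows, but the pushed-down query transfers only the matching rows from the source, which is where the savings in query time and memory come from.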

Customer-reported issues fixed in this release

DT127089: Data Virtualization fails to connect to MS SQL Server with an INSTANCE name
DT128265: Duplicate Virtualized Asset
DT129875: dv-extension-translations-job has 2 same label: job_name and job-name
DT130521: Data Virtualization Showing Incorrect Number of Users on Instance
DT130572: User Must Have at Least One Prior Sign-In to be Granted Permissions in Data Virtualization

Security fixes

This release includes fixes for the following security issues:

CVE-2022-1154, CVE-2022-21426, CVE-2022-21434, CVE-2022-21443, CVE-2022-21476, CVE-2022-21496, CVE-2022-29078

CVE-2021-3634, CVE-2021-3807, CVE-2021-4189, CVE-2021-25219, CVE-2021-41617, CVE-2021-43138, CVE-2021-43818

CVE-2020-19131, CVE-2020-35492

CVE-2018-25032

Bug fixes

This release includes the following fixes:

Issue: Persistent volume on Data Virtualization head node becomes full.
Resolution: The persistent volume (PV) on the Data Virtualization head node no longer becomes full because transaction logs in the embedded Db2 database are archived.
Issue: Minute selector of the cache refresh rate can be incremented beyond maximum and cannot be reset.
Resolution: To set a cache refresh rate, you can select an Hourly frequency and then choose the minute of the hour when the cache refresh is run. The minute selector can no longer be incremented beyond 59.
Issue: You must refresh the SSL certificate that is used by Data Virtualization after the Cloud Pak for Data self-signed certificate is updated.
Resolution: The certificate manager regenerates a new certificate and re-creates the secret for that certificate. For more information, see Securing the Data Virtualization environment.