Limitations and known issues for data sources in Data Virtualization

The following limitations and known issues apply to data sources in Data Virtualization.

For more information about data sources and connections in Data Virtualization, see Supported data sources in Data Virtualization.

For additional solutions to problems that you might encounter with data source connections, see the troubleshooting topic Troubleshooting data source connections in Data Virtualization.

Data source issues

VIRTUALIZETABLE procedure fails on Presto if user only has partial column access

Applies to: 5.2.0

For Presto data sources, the VIRTUALIZETABLE stored procedure fails with a Query failed error when a user has access to only a subset of columns in the underlying table.

Workaround: Create a view in the source that includes only the columns the user can access, and then virtualize that view in Data Virtualization.
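For example, the following sketch assumes a Presto table hive.sales.orders in which the user can access only the order_id and status columns; the catalog, schema, table, and view names are placeholders for illustration. Create a view over the accessible columns on the Presto side, and then virtualize the orders_subset view instead of the base table:

CREATE VIEW hive.sales.orders_subset AS
  SELECT order_id, status
  FROM hive.sales.orders;
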
Connections created using the cpd-cli connection create command can't be found or used

Applies to: 5.2.1

When you create connections by using the cpd-cli connection create command, those connections cannot be found or used in Data Virtualization, although they are available and can be used in projects and catalogs. For more information, see connection create.

Workaround: Create the connection using the ID.
Using the INSERT operation on a Microsoft Azure Data Lake Storage connection causes error

Applies to: 5.2.1

When you use the INSERT operation to insert data into a table on a Microsoft Azure Data Lake Storage connection, the operation might fail and cause an error resembling this example:
2025-08-06T20:08:16,210 WARN com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsTempDirManager [TThreadPoolServer WorkerProcess-4] SId:32.0-6633: Error calling setPermission on abfs://bigdatacontainer2@bigdataqasa2.dfs.core.windows.net/FVT/dv_fvt_cos_data2124/COSBVT/path5571/_TEMP_1754499267690_320574492_20250906080915760

2025-08-06T20:08:26,720 ERROR com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsBaseWriterCommitHandler [TThreadPoolServer WorkerProcess-4] {bigsql.COMMIT} SId:32.0-6633: Failed to commit for cosbvt.table5571
org.apache.hadoop.fs.FileAlreadyExistsException: Operation failed: "The specified path already exists.", 409, PUT, https://bigdataqasa2.dfs.core.windows.net/bigdatacontainer2/FVT/dv_fvt_cos_data2124/COSBVT/path5571/.COMMITTING___TEMP_1754499267690_320574492_20250906080915760?resource=file&timeout=90, rId: dbf44b18-601f-0090-6e0d-05e4c1000000, PathAlreadyExists,
"The specified path already exists. RequestId:dbf22b38-603f-0090-7f0d-07e8s1000000 Time:2025-08-06T20:08:26.6748181Z"
Workaround:
Runtime Error displays when you try to find data sources

Applies to: 5.2.1

On the Virtualize page, if you select the Files tab and then select the Find data sources drop-down, a runtime error is displayed. To go back to the previous page, refresh the current page. There is currently no workaround for this issue.
Collecting statistics for a virtualized Presto table results in an error

Applies to: 5.2.0 and later

When you try to collect the statistics of a virtualized Presto table using the COLLECT_STATISTICS stored procedure in Run SQL, you might encounter the error No statistics returned for table.

Workaround: Wait for an interval of 2-3 minutes and then attempt to collect the statistics again.
Note: You can apply a permanent fix on the data source side by setting the parameter http-server.max-request-header-size=<value up to 10TB> in the Presto coordinator configuration file. For more information, see Update Presto engine.
Unable to connect to an SSL-enabled data source in Data Virtualization by using a remote connector if you use custom certificates

Applies to: 5.2.0 and later

After you create your own custom certificates to use when Data Virtualization connects to the IBM® Software Hub platform, you are not able to connect to a remote SSL-enabled data source.

For more information about remote data sources and connectors in Data Virtualization, see Accessing data sources by using remote connectors in Data Virtualization.
Workaround:
Complete the following steps to resolve this issue and connect to remote data sources:
  1. Download the CA certificate files from /etc/pki/ca-trust/source/anchors in the Data Virtualization head pod c-db2u-dv-db2u-0.
  2. Upload the CA certificate files that you downloaded in step 1 to each remote connector computer.
  3. If a file has more than one CA certificate, split the certificates into individual files.
  4. Run the following command to set AGENT_JAVA_HOME to the Java home that is used by the remote connectors:
    AGENT_JAVA_HOME=$(tac <Remote connector install path>/datavirtualization.env | grep -i java_home | grep -v "#" -m 1 | cut -d'=' -f2-)
  5. For each file that you created in step 3, run the following command to add the certificates to the Java cacerts truststore in AGENT_JAVA_HOME/lib/security/cacerts. Make sure that you provide a unique alias value for the -alias command parameter for each file.
    keytool -storepass changeit -keystore $AGENT_JAVA_HOME/lib/security/cacerts -importcert -alias <pick an alias> -file <absolute path to cert generated in step 3> -noprompt
  6. Restart the remote connector. For more information, see Managing connectors on remote data sources.
Querying a virtualized table in a Presto catalog with a matching schema from a previous catalog might result in an error

Applies to: 5.2.0 and later

When you switch to a new catalog and then query a virtualized table that was created from a previous catalog with an identical schema name, the query runs if the schema also exists in the new catalog, but it might return an error. The error can be caused by differences in the column definitions of tables that share the same schema and table names across catalogs.
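For example, with hypothetical catalog, schema, and column names, a virtual table that was created against the first definition below can fail when the query resolves to the second definition in the new catalog:

-- definition in the previous catalog, used when the table was virtualized
CREATE TABLE hive_old.sales.orders (order_id integer, amount varchar);

-- definition in the new catalog, found at query time
CREATE TABLE hive_new.sales.orders (order_id integer, amount double);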

Special characters are not supported in MongoDB database names

Applies to: 5.2.0 and later

You cannot use special characters such as semicolons and single quotes in a MongoDB database name.
Limited file types are supported with the Microsoft Azure Data Lake Storage Gen2 data source connection

Applies to: 5.2.0 and later

You can connect to Microsoft Azure Data Lake Storage Gen2 data source from Data Virtualization. However, only the following file types are supported with this connection in Data Virtualization:
  • CSV
  • TSV
  • ORC
  • Parquet
  • JSON Lines
Special characters are not preserved in databases or schemas with MongoDB connections after you upgrade to Data Virtualization on IBM Software Hub

Applies to: 5.2.0 and later

After you upgrade from an earlier version of Data Virtualization on IBM Software Hub to version 5.2.0, special characters in MongoDB database or schema names are not preserved, even though SpecialCharBehavior=Include is set in the data source connection. The updated MongoDB driver (version 6.1.0) that is used in Data Virtualization on IBM Software Hub does not recognize special characters in tables by default. This might cause issues with your results when you query a virtual table that has special characters.
The DECFLOAT data type is not supported in Data Virtualization

Applies to: 5.2.0 and later

The DECFLOAT data type is not supported in Data Virtualization. As a result, the type DECFLOAT is converted to DOUBLE and the special numeric values NaN, INF, and -INF are converted to NULL.
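For example, assuming a hypothetical virtual table MYSCHEMA.RATES whose source column AMOUNT is defined as DECFLOAT, the virtual column is exposed as DOUBLE and the special numeric values come back as NULL:

-- AMOUNT is DECFLOAT at the source but is exposed as DOUBLE in the virtual table
SELECT AMOUNT FROM MYSCHEMA.RATES;   -- source values of NaN, INF, or -INF are returned as NULL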

The Data sources page might fail to load data sources when remote connectors are added, edited, or removed

Applies to: 5.2.0 and later

If you remove a connection to a remote connector or the remote connector becomes unavailable because credentials expire, the Data sources page fails to load this connection.

Unable to add a connection to an SAP S/4HANA data source with an SAP OData connection

Applies to: 5.2.0 and later

If you try to connect to an SAP S/4HANA data source that contains many tables, the connection attempt might time out and fail. Increasing timeout parameters has no effect.
To work around this issue, run the following commands.
db2 connect to bigsql
db2 "call DVSYS.setRdbcX('SAPS4Hana', '<Data source IP:port>', '0', '', 'CreateSchema=ForceNew', '<username>', '<password>', '0', '0', '', '', '<internal connector instance and port>', ?,?,?);"
db2 terminate
You cannot connect to a MongoDB data source with special characters in a database name

Applies to: 5.2.0 and later

The current MongoDB JDBC driver does not support connections to databases whose names contain special characters.

When you virtualize data that contains LOB (CLOB/BLOB) or Long Varchar data types, the preview might show the columns as empty

Applies to: 5.2.0 and later

After you virtualize the table, the data for the columns that contain LOB or Long Varchar data types is available in Virtualized data.
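For example, assuming a hypothetical virtual table MYSCHEMA.DOCUMENTS with a CLOB column NOTES, you can confirm that the data is present by querying the virtual table in Run SQL, even if the preview shows the column as empty:

SELECT NOTES FROM MYSCHEMA.DOCUMENTS FETCH FIRST 5 ROWS ONLY;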

Remote data sources - Performance issues when you create a data source connection

Applies to: 5.2.0 and later

You try to create a data source connection by searching for a different host, but the process takes several minutes to complete. This performance issue occurs only when these two conditions are met:
  • The remote data source is connected to multiple IBM Software Hub clusters.
  • Data Virtualization connects to multiple data sources in different IBM Software Hub clusters by using the remote connectors.

To solve this issue, ensure that your Data Virtualization connections are on a single IBM Software Hub cluster.

Query fails due to unexpectedly closed connection to data source

Applies to: 5.2.0 and later

Data Virtualization does not deactivate the connection pool for the data source when your instance runs a continuous workload against virtual tables from a particular data source. Instead, Data Virtualization waits for a period of complete inactivity before it deactivates the connection pool. The waiting period can create stale connections in the connection pool that get closed by the data source service and lead to query failures.

Workaround: Check the persistent connection (keep-alive) properties for your data sources. You can try two approaches:

  • Consider disabling the keep-alive parameter on any data sources that receive a continuous workload from Data Virtualization.
  • You can also decrease the settings for corresponding Data Virtualization properties, RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC and RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC, as shown in the following examples: 

    CALL DVSYS.SETCONFIGPROPERTY('RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC', '10', '', ?, ?);    -- default 20s, minimum 5s
    CALL DVSYS.SETCONFIGPROPERTY('RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC', '30', '', ?, ?);    -- default 120s, minimum 5s
    Decreasing the RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC and RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC settings might help if there are small gaps of complete inactivity that were previously too short for the Data Virtualization shrink and deactivate timeouts to take effect.
Schema map refresh in-progress message appears for reloaded connections that do not require a refresh schema map

Applies to: 5.2.0 and later

The Schema map refresh in-progress message appears when you reload connections in Data Virtualization, even when the data source does not require a schema map refresh.
Only connections to data sources such as Google BigQuery, MongoDB, SAP S/4HANA, and Salesforce.com require a schema map refresh to update any changes in tables and columns for existing connections.