Limitations and known issues for data sources in Data Virtualization

The following limitations and known issues apply to data sources in Data Virtualization.

For more information about data sources and connections in Data Virtualization, see Supported data sources in Data Virtualization.

For additional solutions to problems that you might encounter with data source connections, see the troubleshooting topic Troubleshooting data source connections in Data Virtualization.

Data source issues

VIRTUALIZETABLE procedure fails on Presto if user only has partial column access

Applies to: 5.2.0

For Presto data sources, the VIRTUALIZETABLE stored procedure fails with a Query failed error when a user has access to only a subset of columns in the underlying table.

Workaround: Create a view in the source that includes only the columns the user can access, and then virtualize that view in Data Virtualization.
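For example, the following sketch assumes a Presto table hive.sales.orders in which the user can access only the order_id and status columns; the catalog, schema, table, and view names are placeholders for illustration. Create a view over the accessible columns on the Presto side, and then virtualize the orders_subset view instead of the base table:

CREATE VIEW hive.sales.orders_subset AS
  SELECT order_id, status
  FROM hive.sales.orders;
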
Connections created using the cpd-cli connection create command can't be found or used

Applies to: 5.2.1

When you create connections by using the cpd-cli connection create command, those connections cannot be found or used in Data Virtualization, although they are available and can be used in projects and catalogs. For more information, see connection create.

Workaround: Create the connection using the ID.
Using the INSERT operation on a Microsoft Azure Data Lake Storage connection causes error

Applies to: 5.2.1

When you use the INSERT operation to insert data into a table on a Microsoft Azure Data Lake Storage connection, the operation might fail and cause an error resembling this example:
2025-08-06T20:08:16,210 WARN com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsTempDirManager [TThreadPoolServer WorkerProcess-4] SId:32.0-6633: Error calling setPermission on abfs://bigdatacontainer2@bigdataqasa2.dfs.core.windows.net/FVT/dv_fvt_cos_data2124/COSBVT/path5571/_TEMP_1754499267690_320574492_20250906080915760

2025-08-06T20:08:26,720 ERROR com.ibm.biginsights.bigsql.dfsrw.scheduler.DfsBaseWriterCommitHandler [TThreadPoolServer WorkerProcess-4] {bigsql.COMMIT} SId:32.0-6633: Failed to commit for cosbvt.table5571
org.apache.hadoop.fs.FileAlreadyExistsException: Operation failed: "The specified path already exists.", 409, PUT, https://bigdataqasa2.dfs.core.windows.net/bigdatacontainer2/FVT/dv_fvt_cos_data2124/COSBVT/path5571/.COMMITTING___TEMP_1754499267690_320574492_20250906080915760?resource=file&timeout=90, rId: dbf44b18-601f-0090-6e0d-05e4c1000000, PathAlreadyExists,
"The specified path already exists. RequestId:dbf22b38-603f-0090-7f0d-07e8s1000000 Time:2025-08-06T20:08:26.6748181Z"
Workaround:
Runtime Error displays when you try to find data sources

Applies to: 5.2.1

On the Virtualize page, if you select the Files tab and then select the Find data sources drop-down, a runtime error is displayed. To go back to the previous page, refresh the current page. There is currently no workaround for this issue.
Collecting statistics for a virtualized Presto table results in an error

Applies to: 5.2.0 and later

When you try to collect the statistics of a virtualized Presto table using the COLLECT_STATISTICS stored procedure in Run SQL, you might encounter the error No statistics returned for table.

Workaround: Wait for an interval of 2-3 minutes and then attempt to collect the statistics again.
Note: You can apply a permanent fix on the data source side by setting the parameter http-server.max-request-header-size=<value up to 10TB> in the Presto coordinator configuration file. For more information, see Update Presto engine.
Unable to connect to an SSL-enabled data source in Data Virtualization by using a remote connector if you use custom certificates

Applies to: 5.2.0 and later

After you create your own custom certificates to use when Data Virtualization connects to the IBM® Software Hub platform, you are not able to connect to a remote SSL-enabled data source.

For more information about remote data sources and connectors in Data Virtualization, see Accessing data sources by using remote connectors in Data Virtualization.
Workaround:
Complete the following steps to resolve this issue and connect to remote data sources:
  1. Download the CA certificate files from /etc/pki/ca-trust/source/anchors in the Data Virtualization head pod c-db2u-dv-db2u-0.
  2. Upload the CA certificate files that you downloaded in step 1 to each remote connector computer.
  3. If a file has more than one CA certificate, split the certificates into individual files.
  4. Run the following command to set AGENT_JAVA_HOME to the Java home that is used by the remote connectors:
    AGENT_JAVA_HOME=$(tac <Remote connector install path>/datavirtualization.env | grep -i java_home | grep -v "#" -m 1 | cut -d'=' -f2-)
  5. For each file that you created in step 3, run the following command to add the certificates to the Java cacerts truststore in AGENT_JAVA_HOME/lib/security/cacerts. Make sure that you provide a unique alias value for the -alias command parameter for each file.
    keytool -storepass changeit -keystore $AGENT_JAVA_HOME/lib/security/cacerts -importcert -alias <pick an alias> -file <absolute path to cert generated in step 3> -noprompt
  6. Restart the remote connector. For more information, see Managing connectors on remote data sources.
Querying a virtualized table in a Presto catalog with a matching schema from a previous catalog might result in an error

Applies to: 5.2.0 and later

When you switch to a new catalog and then query a virtualized table that was created from a previous catalog with an identical schema name, the query runs if the schema also exists in the new catalog, but it might return an error. The error can be caused by differences in the column definitions of tables that share the same schema and table names across catalogs.
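For example, with hypothetical catalog, schema, and column names, a virtual table that was created against the first definition below can fail when the query resolves to the second definition in the new catalog:

-- definition in the previous catalog, used when the table was virtualized
CREATE TABLE hive_old.sales.orders (order_id integer, amount varchar);

-- definition in the new catalog, found at query time
CREATE TABLE hive_new.sales.orders (order_id integer, amount double);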

Special characters are not supported in MongoDB database names

Applies to: 5.2.0 and later

You cannot use special characters such as semicolons and single quotes in a MongoDB database name.
Limited file types are supported with the Microsoft Azure Data Lake Storage Gen2 data source connection

Applies to: 5.2.0 and later

You can connect to Microsoft Azure Data Lake Storage Gen2 data source from Data Virtualization. However, only the following file types are supported with this connection in Data Virtualization:
  • CSV
  • TSV
  • ORC
  • Parquet
  • JSON Lines
Special characters are not preserved in databases or schemas with MongoDB connections after you upgrade to Data Virtualization on IBM Software Hub

Applies to: 5.2.0 and later

After you upgrade from an earlier version of Data Virtualization on IBM Software Hub to version 5.2.0, special characters in MongoDB database or schema names are not preserved, even though SpecialCharBehavior=Include is set in the data source connection. The updated MongoDB driver (version 6.1.0) that is used in Data Virtualization on IBM Software Hub does not recognize special characters in tables by default. This might cause issues with your results when you query a virtual table that has special characters.
The DECFLOAT data type is not supported in Data Virtualization

Applies to: 5.2.0 and later

The DECFLOAT data type is not supported in Data Virtualization. As a result, the type DECFLOAT is converted to DOUBLE and the special numeric values NaN, INF, and -INF are converted to NULL.
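For example, assuming a hypothetical virtual table MYSCHEMA.RATES whose source column AMOUNT is defined as DECFLOAT, the virtual column is exposed as DOUBLE and the special numeric values come back as NULL:

-- AMOUNT is DECFLOAT at the source but is exposed as DOUBLE in the virtual table
SELECT AMOUNT FROM MYSCHEMA.RATES;   -- source values of NaN, INF, or -INF are returned as NULL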

The Data sources page might fail to load data sources when remote connectors are added, edited, or removed

Applies to: 5.2.0 and later

If you remove a connection to a remote connector or the remote connector becomes unavailable because credentials expire, the Data sources page fails to load this connection.

Unable to add a connection to an SAP S/4HANA data source with an SAP OData connection

Applies to: 5.2.0 and later

If you try to connect to an SAP S/4HANA data source that contains many tables, the connection attempt might time out and fail. Increasing timeout parameters has no effect.
To work around this issue, run the following commands.
db2 connect to bigsql
db2 "call DVSYS.setRdbcX('SAPS4Hana', '<Data source IP:port>', '0', '', 'CreateSchema=ForceNew', '<username>', '<password>', '0', '0', '', '', '<internal connector instance and port>', ?,?,?);"
db2 terminate
You cannot connect to a MongoDB data source with special characters in a database name

Applies to: 5.2.0 and later

The current MongoDB JDBC driver does not support connections to databases whose names contain special characters.

When you virtualize data that contains LOB (CLOB/BLOB) or Long Varchar data types, the preview might show the columns as empty

Applies to: 5.2.0 and later

After you virtualize the table, the data for the columns that contain LOB or Long Varchar data types is available in Virtualized data.
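For example, assuming a hypothetical virtual table MYSCHEMA.DOCUMENTS with a CLOB column NOTES, you can confirm that the data is present by querying the virtual table in Run SQL, even if the preview shows the column as empty:

SELECT NOTES FROM MYSCHEMA.DOCUMENTS FETCH FIRST 5 ROWS ONLY;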

Remote data sources - Performance issues when you create a data source connection

Applies to: 5.2.0 and later

You try to create a data source connection by searching for a different host, but the process takes several minutes to complete. This performance issue occurs only when these two conditions are met:
  • The remote data source is connected to multiple IBM Software Hub clusters.
  • Data Virtualization connects to multiple data sources in different IBM Software Hub clusters by using the remote connectors.

To solve this issue, ensure that your Data Virtualization connections are on a single IBM Software Hub cluster.

Query fails due to unexpectedly closed connection to data source

Applies to: 5.2.0 and later

Data Virtualization does not deactivate the connection pool for the data source when your instance runs a continuous workload against virtual tables from a particular data source. Instead, Data Virtualization waits for a period of complete inactivity before it deactivates the connection pool. The waiting period can create stale connections in the connection pool that get closed by the data source service and lead to query failures.

Workaround: Check the persistent connection (keep-alive) properties for your data sources. You can try two approaches:

  • Consider disabling the keep-alive parameter on any data sources that receive a continuous workload from Data Virtualization.
  • You can also decrease the settings for corresponding Data Virtualization properties, RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC and RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC, as shown in the following examples: 

    CALL DVSYS.SETCONFIGPROPERTY('RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC', '10', '', ?, ?);    -- default 20s, minimum 5s
    CALL DVSYS.SETCONFIGPROPERTY('RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC', '30', '', ?, ?);    -- default 120s, minimum 5s
    Decreasing the RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC and RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC settings might help if there are small gaps of complete inactivity that were previously too short for the Data Virtualization shrink and deactivate timeouts to take effect.
Schema map refresh in-progress message appears for reloaded connections that do not require a refresh schema map

Applies to: 5.2.0 and later

The Schema map refresh in-progress message appears when you reload connections in Data Virtualization, even when the data source does not require a schema map refresh.
Only connections to data sources such as Google BigQuery, MongoDB, SAP S/4HANA, and Salesforce.com require a schema map refresh to update any changes in tables and columns for existing connections.