Limitations and known issues for data sources in Data Virtualization

The following limitations and known issues apply to data sources in Data Virtualization.

For more information about data sources and connections in Data Virtualization, see Supported data sources in Data Virtualization.

For additional solutions to problems that you might encounter with data source connections, see the troubleshooting topic Troubleshooting data source connections in Data Virtualization.

Data source issues

setrdbcx fails with java.lang.NullPointerException error when connecting to the Snowflake data source

Applies to: 5.2.0

When you attempt to connect to the Snowflake data source, the setrdbcx stored procedure fails to run, and displays a java.lang.NullPointerException error.

The error might resemble this example:
2025-05-04 10:01:26 [    INFO] Stmt: Call dvsys.setrdbcx('Snowflake', 'wv03539.us-east-2.aws.snowflakecomputing.com', 443, 'TM_DV_DB', 'warehouse=DV_WH;schema=TM_DV;role=TM_DV_ROLE', 'TM_DV_USER', 'DpEE9k9ucqkppGp@NdV2', 1, 0, '', '', 'qpendpoint_3', '', ?, ?, ?) (fvt_utils.py:4677)
2025-05-04 10:01:52 [ WARNING] dvsys.setrdbcx diags = qpendpoint_3, failed with The exception 'java.lang.Exception: java.lang.NullPointerException: Cannot invoke "java.io.File.exists()" because "this.cacheFile" is null' was thrown while evaluating an expression.; (fvt_utils.py:4692)
Workaround: Attempt to connect to the Snowflake data source again until the connection is established.
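For example, you can rerun the same call to dvsys.setrdbcx from Run SQL until it succeeds. The following sketch follows the call format in the log example above; all angle-bracket values are placeholders that you replace with your own connection details.
    CALL DVSYS.SETRDBCX('Snowflake', '<account host>.snowflakecomputing.com', 443, '<database>', 'warehouse=<warehouse>;schema=<schema>;role=<role>', '<username>', '<password>', 1, 0, '', '', '<endpoint, for example qpendpoint_3>', '', ?, ?, ?);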
Note: The workaround might not work in all cases.

Collecting statistics for a virtualized Presto table results in an error

Applies to: 5.2.0

When you try to collect statistics for a virtualized Presto table by using the COLLECT_STATISTICS stored procedure in Run SQL, you might encounter the error No statistics returned for table.

Workaround: Wait for an interval of 2-3 minutes and then attempt to collect the statistics again.
Note: You can apply a permanent fix on the data source side by setting the parameter http-server.max-request-header-size=<value up to 10TB> in the Presto coordinator configuration file. For more information, see Update Presto engine.
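For example, the entry in the coordinator configuration file might look like the following sketch. The value shown is only an illustrative placeholder; any size up to 10TB is accepted.
    # Illustrative placeholder value; set a size that suits your workload (up to 10TB).
    http-server.max-request-header-size=5MB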

Unable to connect to an SSL-enabled data source in Data Virtualization by using a remote connector if you use custom certificates

Applies to: 5.2.0

After you create your own custom certificates to use when Data Virtualization connects to the IBM® Software Hub platform, you are not able to connect to a remote SSL-enabled data source.

For more information about remote data sources and connectors in Data Virtualization, see Accessing data sources by using remote connectors in Data Virtualization.
Workaround:
Complete the following steps to resolve this issue and connect to remote data sources:
  1. Download the CA certificate files from /etc/pki/ca-trust/source/anchors in the Data Virtualization head pod c-db2u-dv-db2u-0.
  2. Upload the CA certificate files that you downloaded in step 1 to each remote connector computer.
  3. If a file has more than one CA certificate, split the certificates into individual files. For an illustration of this step, see the sketch after these steps.
  4. Run the following command to set AGENT_JAVA_HOME to the Java home that is used by the remote connectors:
    AGENT_JAVA_HOME=$(tac <Remote connector install path>/datavirtualization.env | grep -i java_home | grep -v "#" -m 1 | cut -d '=' -f2)
  5. For each file that you created in step 3, run the following command to add the certificate to the Java cacerts truststore in AGENT_JAVA_HOME/lib/security/cacerts. Make sure that you provide a unique value for the -alias parameter for each file.
    keytool -storepass changeit -keystore $AGENT_JAVA_HOME/lib/security/cacerts -importcert -alias <unique alias> -rfc -file <absolute path to the certificate file from step 3> -noprompt
  6. Restart the remote connector. For more information, see Managing connectors on remote data sources.
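The following shell sketch illustrates steps 3 and 5. It assumes that the bundle that you downloaded in step 1 is named ca-bundle.pem; the file names and alias prefix are hypothetical examples.
    # Step 3: split a PEM bundle that contains several CA certificates into one file per certificate.
    csplit -z -f ca-cert- -b '%02d.pem' ca-bundle.pem '/-----BEGIN CERTIFICATE-----/' '{*}'

    # Step 5: import each certificate into the Java cacerts truststore with a unique alias.
    for cert in ca-cert-*.pem; do
      keytool -storepass changeit -keystore $AGENT_JAVA_HOME/lib/security/cacerts \
        -importcert -alias "dv-$(basename "$cert" .pem)" -rfc -file "$cert" -noprompt
    done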

Querying a virtualized table in a Presto catalog with a matching schema from a previous catalog might result in an error

Applies to: 5.2.0

When you use a new catalog and query a virtualized table that was created from a previous catalog with an identical schema name, the query might run if the schema also exists in the new catalog, but it might return an error. The error can be caused by differences in the column definitions of tables that share the same schema name across catalogs.

Special characters are not supported in MongoDB database names

Applies to: 5.2.0

You cannot use special characters such as semicolons and single quotes in a MongoDB database name.

Limited file types are supported with the Microsoft Azure Data Lake Storage Gen2 data source connection

Applies to: 5.2.0

You can connect to a Microsoft Azure Data Lake Storage Gen2 data source from Data Virtualization. However, only the following file types are supported with this connection in Data Virtualization:
  • CSV
  • TSV
  • ORC
  • Parquet
  • JSON Lines

Special characters are not preserved in databases or schemas with MongoDB connections after you upgrade to Data Virtualization on IBM Software Hub

Applies to: 5.2.0

After you upgrade from an earlier version of Data Virtualization on IBM Software Hub to version 5.2.0, with a MongoDB connection, special characters in the database or schema are not preserved even though SpecialCharBehavior=Include is set in the data source connection. The updated MongoDB driver (version 6.1.0) that is used in Data Virtualization on IBM Software Hub does not recognize special characters in tables by default. This might cause issues with your results when you query a virtual table that has special characters.
The DECFLOAT data type is not supported in Data Virtualization

Applies to: 5.2.0

The DECFLOAT data type is not supported in Data Virtualization. As a result, the type DECFLOAT is converted to DOUBLE and the special numeric values NaN, INF, and -INF are converted to NULL.
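For example, consider a hypothetical virtual table MYSCHEMA.PRICES whose source column AMOUNT is defined as DECFLOAT. In Data Virtualization, the column is exposed as DOUBLE, and source rows that hold NaN, INF, or -INF are returned as NULL:
    -- MYSCHEMA.PRICES and AMOUNT are hypothetical names.
    SELECT AMOUNT FROM MYSCHEMA.PRICES;                         -- column is typed as DOUBLE
    SELECT COUNT(*) FROM MYSCHEMA.PRICES WHERE AMOUNT IS NULL;  -- includes rows that were NaN, INF, or -INF at the source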

The Data sources page might fail to load data sources when remote connectors are added, edited, or removed

Applies to: 5.2.0

If you remove a connection to a remote connector or the remote connector becomes unavailable because credentials expire, the Data sources page fails to load this connection.

Unable to add a connection to an SAP S/4HANA data source with an SAP OData connection

Applies to: 5.2.0

If you try to connect to an SAP S/4HANA data source that contains many tables, the connection attempt might time out and fail. Increasing timeout parameters has no effect.
Workaround: Run the following commands.
db2 connect to bigsql
db2 "call DVSYS.setRdbcX('SAPS4Hana', '<Data source IP:port>', '0', '', 'CreateSchema=ForceNew', '<username>', '<password>', '0', '0', '', '', '<internal connector instance and port>', ?,?,?);"
db2 terminate

You cannot connect to a MongoDB data source with special characters in a database name

Applies to: 5.2.0

The current MongoDB JDBC Driver does not support connection to database names that contain special characters.

When you virtualize data that contains LOB (CLOB/BLOB) or Long Varchar data types, the preview might show the columns as empty

Applies to: 5.2.0

Although the preview might show these columns as empty, after you virtualize the table, the data for columns that contain LOB or Long Varchar data types is available in Virtualized data.

Remote data sources - Performance issues when you create a data source connection

Applies to: 5.2.0

When you try to create a data source connection by searching a different host, the process can take several minutes to complete. This performance issue occurs only when these two conditions are met:
  • The remote data source is connected to multiple IBM Software Hub clusters.
  • Data Virtualization connects to multiple data sources in different IBM Software Hub clusters by using the remote connectors.

To solve this issue, ensure that your Data Virtualization connections are on a single Cloud Pak for Data cluster.

Query fails due to unexpectedly closed connection to data source

Applies to: 5.2.0

Data Virtualization does not deactivate the connection pool for the data source when your instance runs a continuous workload against virtual tables from a particular data source. Instead, Data Virtualization waits for a period of complete inactivity before it deactivates the connection pool. The waiting period can create stale connections in the connection pool that get closed by the data source service and lead to query failures.

Workaround: Check the properties for persistent connection (keep-alive parameter) for your data sources. You can try two workarounds:

  • Consider disabling the keep-alive parameter inside any data sources that receive continuous workload from Data Virtualization.
  • You can also decrease the settings for corresponding Data Virtualization properties, RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC and RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC, as shown in the following examples: 

    CALL DVSYS.SETCONFIGPROPERTY('RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC', '10', '', ?, ?);    -- default 20s, minimum 5s
    CALL DVSYS.SETCONFIGPROPERTY('RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC', '30', '', ?, ?);    -- default 120s, minimum 5s
    Decreasing the RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC and RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC settings might help if there are small gaps of complete inactivity that were previously too short for the Data Virtualization shrink and deactivate timeouts to take effect.

Schema map refresh in-progress message appears for reloaded connections that do not require a refresh schema map

Applies to: 5.2.0

The Schema map refresh in-progress message appears when you reload connections in Data Virtualization, even when the data source does not require a refresh schema map.
Only connections from data sources such as Google BigQuery, MongoDB, SAP S/4HANA, and Salesforce.com require a refresh schema map to update any changes in tables and columns for existing connections.