Limitations and known issues in Data Virtualization

The following limitations and known issues apply to Data Virtualization.

General issues
The following general limitations and issues apply to Data Virtualization.
Data source issues
The following limitations and issues apply to data sources in Data Virtualization. See also Supported data sources in Data Virtualization.
User and group management issues
The following limitations and issues apply to user and group management in Data Virtualization.
Data governance issues
The following limitations and issues apply to data governance in Data Virtualization.
Caching issues
The following limitations and issues apply to caching in Data Virtualization.
File issues
The following limitations and issues apply to files in Data Virtualization.
Resolved issues
The following limitations and issues are resolved in Data Virtualization 1.7.8.

See also Troubleshooting the Data Virtualization service.

General issues

Virtualizing a large number of tables simultaneously might cause internal server errors

When you add a large number of tables to your cart (for example, more than 120) and try to virtualize them, you might see HTTP status code 500 Internal Server Error and the cart fails to virtualize.

Workaround: Separate the tables to be virtualized into batches of 20 or fewer, and run the batches sequentially until all tables are virtualized.

Persistent volume on Data Virtualization head node becomes full

The persistent volume (PV) on the Data Virtualization head node becomes full because archived transaction logs in the embedded Db2® database are taking up a significant amount of space on the PV.

To work around this issue, see Persistent volume on Data Virtualization head pod becomes full.

Applies to: 4.0.0 and later

Fixed in: 4.0.8 for new installations of Data Virtualization (the issue might still occur when you upgrade Data Virtualization)

You must refresh the SSL certificate that is used by Data Virtualization after the Cloud Pak for Data self-signed certificate is updated

Data Virtualization stops accepting connections when your SSL certificate expires. You might see an error message that indicates The server encountered an error, followed by a Load configured data source failed: HttpError500 message.

When the Cloud Pak for Data self-signed certificate is updated, the SSL certificate that is used by Data Virtualization must be refreshed to maintain connectivity to the service.

For more information, see Refreshing the SSL certificate used by Data Virtualization after the Cloud Pak for Data self-signed certificate is updated.

Applies to: 4.0.0 and later

You cannot use special characters in table names

Not all characters can be used in a table name. For example, the hashtag character (#) cannot be part of a table name, even if you wrap the character in quotation marks.

Applies to: 4.0.0 and later

You cannot use special characters such as semicolons in schema names when you virtualize a table in MongoDB on Cloud

You cannot virtualize a table in a MongoDB on Cloud data source if the schema name contains a semicolon. The MongoDB on Cloud data source removes the semicolon from the schema name, and queries of the virtual table fail with error SQL5105N.

Applies to: 4.0.2 and later

You cannot use special characters in schema names that are used in remote data sources

You can virtualize a table in a remote data source whose schema name contains special characters, but queries of the virtual table fail with an SQL5105N error. See the following example.

[ERROR] SQL error: [IBM][CLI Driver][DB2/LINUXX8664] SQL5105N The statement failed because a Big SQL component encountered an error. Component receiving the error: "DV-FMP". Component returning the error: "Virtualization Connector". Log entry identifier: "GAI-003-NA". SQLSTATE=58040 SQLCODE=-5105 (fvt_sql.py:157)

Applies to: 4.0.0 and later

Preview of virtualized tables is limited by row size and number of columns
Data Virtualization supports virtualization of tables with a row size up to 1 MB, and up to 2048 columns in a table. However, the number of columns that Data Virtualization can preview depends on many factors, such as the data types of the columns. Currently, preview is limited to 200 columns.

Applies to: 4.0.0 and later

Virtualizing large tables might be slow or might fail
When you virtualize multiple large tables from data sources, you might see the following messages in the Virtualize objects dialog box:

Error: CREATE OR REPLACE NICKNAME -- DB2 SQL Error: SQLCODE=-1229, SQLSTATE=40504, SQLERRMC=null, DRIVER=4.29.24

Or

Error: CREATE OR REPLACE NICKNAME -- "<schema>"."<table>" already exists in Server "<server>" (use REPLACE=Y to overwrite)

Or

The table <schema>.<table> already exists. Use REPLACE=Y to overwrite CallStack

This issue occurs even when you do not have tables with the same schema name and table name. The root cause is a timeout and retry sequence that is triggered by HAProxy timeouts. To work around this issue, ensure that your HAProxy settings meet the recommended values. For more information, see Network requirements for Data Virtualization and Load balancer timeout settings. If virtualization of large tables takes longer than 5 minutes, increase the HAProxy timeout values and retry the virtualization.

Applies to: 4.0.0 and later

Downloading the Linux® driver package fails in an air-gapped environment

On the Connection Information page, when you click Download Linux driver package, the drivers are downloaded from an external website. This download is currently not supported in an air-gapped environment.

Applies to: 4.0.0 and later

Using custom SSL certificate can cause verification errors

When you use a custom SSL or TLS certificate, you might encounter SSL verification errors and Data Virtualization might not work. Custom CA-signed certificates are not supported for Data Virtualization; self-signed certificates are supported and do not cause this issue. To work around this issue, revert to the default certificate that was provided with Cloud Pak for Data or use a self-signed certificate.

Applies to: 4.0.0 and later

Query performance issues due to missing table statistics

When you create virtual tables, some table statistics are not collected correctly, which can degrade query performance. To solve this issue, see Table statistics are not collected. For more information, see Collecting statistics in Data Virtualization.

Applies to: 4.0.0 and later

Cannot connect to the service in load-balancing environment

If you get a timeout error when you try to connect to Data Virtualization, increase the timeout values. To increase the timeout values, update the /etc/haproxy/haproxy.cfg file and set the client and server timeouts to 10m. For more information, see Load balancer timeout settings.
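
For example, a minimal sketch of the relevant entries in /etc/haproxy/haproxy.cfg, assuming that your timeouts are defined in the defaults section:

defaults
    timeout client 10m
    timeout server 10m

Typically, you must reload HAProxy after you change the file so that the new timeout values take effect.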

Applies to: 4.0.0 and later

Data source issues

Some columns are missing after you virtualize tables in a Microsoft Azure SQL Database in Data Virtualization
Some columns might be missing after you virtualize a table from a Microsoft Azure SQL Database data source. This issue occurs because the virtualization process in Data Virtualization uses a cache to resolve the remote table structure. If the cached structure is stale and new columns were added to the remote table, the resulting virtual table contains only the old subset of columns.

Workaround: Reload the remote table's cache by using the refresh button in the List view of the Virtualize page.

Applies to: 4.0.0 and later
Connecting to a data source with a secret that uses the type set to generic fails
If you define a generic secret (type=generic) in a vault and use the secret to connect to a data source with the setrdbcx stored procedure, the connection fails with an error message that is similar to the following message.
qpendpoint_5:6419, failed with The exception 'java.sql.SQLException: 
Failed to update connection details in step build new connection details. 
Cause: Failed to update connection details in step retrieve secret details. 
Cause: null' was thrown while evaluating an expression.

Workaround: Use a secret with type set to credentials instead of a secret with type set to generic.

Applies to: 4.0.9 and later

JDBC drivers default to use the TLSv1.3 protocol

Some connections in Cloud Pak for Data 4.0.8 default to use the TLSv1.3 protocol when they initiate SSL connections to remote data sources. If the remote data source does not support TLSv1.3, the SSL handshake fails.

To work around this issue, enter the connection property CryptoProtocolVersion=TLSv1.2 in the Additional properties field when you create a connection to the data source in Data Virtualization.

This change affects the following connection types.

Applies to: 4.0.6 and later

Connections that use a custom SSL certificate that is stored in a vault must be created in Platform connections first

If a connection uses a custom SSL certificate that has been stored in a vault, the connection must first be created in Platform connections by using the connections service REST API before it can be added on the Data sources page in Data Virtualization with Add connection > Existing connection.

The SSL certificate must be entered either as a bare certificate in plain-text format or as a singleton secret reference (not a JSON array), in the following format.

"ssl_certificate": "{\"secretRefID\":\"secret_urn:MySSLCert\",\"key\":\"password\"}"
For example, create the connection in Platform connections with the following API command.
POST "${CPDHOST}/v2/connections?test=false&catalog_id=${CATALOG_ID}" \
-H "content-type: application/json" \
-H "Authorization: Bearer ${CPDTOKEN}" \
-d '{ "datasource_type": "${DATASOURCE_TYPE}",
    "name": "${CONNECTION_NAME}",
    "properties": {
        "host": "mydatabase.mycompany.com",
        "port": "5500",
        "connection_mode": "sid",
        "sid": "mydatabase",
        "username": "{\"secretRefID\":\"secret_urn:MyCreds\",\"key\":\"username\"}",
        "password": "{\"secretRefID\":\"secret_urn:MyCreds\",\"key\":\"password\"}",
        "ssl": "true",
        "properties": "CryptoProtocolVersion=TLSv1.2",
        "ssl_certificate": "{\"secretRefID\":\"secret_urn:MySSLCert\",\"key\":\"password\"}"
    },
    "origin_country": "US",
    "flags": [
        "personal_credentials"
    ] } '

Applies to: 4.0.8 and later

Cannot create connections to Db2 on Cloud data source with API keys that are stored in vaults
When you add a connection to a Db2 on Cloud data source through a remote connector or a local agent, and you use an API key that is stored in a vault as your credentials, the action fails with an Error SQL4302N "Exception occurred during connection" message.
To work around this issue for a local agent, enter your API key directly, not as a secret reference in a vault. There is no workaround for Db2 on Cloud data sources that are accessed through a remote connector.

Applies to: 4.0.7 and later

Unable to add a connection to an SAP S/4HANA data source with an SAP OData connection
If you try to connect to an SAP S/4HANA data source that contains many tables, the connection might time out and fail. Increasing timeout parameters has no effect.
To work around this issue, run the following commands:
db2 connect to bigsql
db2 "call DVSYS.setRdbcX('SAPS4Hana', '<Data source IP:port>', '0', '', 'CreateSchema=ForceNew', '<username>', '<password>', '0', '0', '', '', '<internal connector instance and port>', ?,?,?);"
db2 terminate

Applies to: 4.0.7 and later

Some queries to SAP HANA data sources do not show correct results

When you query virtualized tables in SAP HANA data sources with queries that select certain columns and use the NOT operator as a filter on those columns, you might see incorrect results.

Applies to: 4.0.3 and later

Tables in a MongoDB data source might be missing when you virtualize

When you create a connection to MongoDB, you see only tables that were created in the MongoDB data source before the connection was added.

For example, if you have 10 tables in your MongoDB data source when you create a connection, you see 10 tables when you start to virtualize tables. If a user adds new tables to the MongoDB data source after the connection is added and before you click Virtualize, Data Virtualization does not display the new tables on the Virtualize tab.

Workaround: To see recently added MongoDB tables, delete the connection to MongoDB and re-create the connection.

Applies to: 4.0.0 and later

You cannot connect to a MongoDB data source with special characters in a database name

The current MongoDB JDBC driver does not support connections to databases whose names contain special characters.

Applies to: 4.0.0 and later

Previewing tables returns incorrect data for TINYINT in SAP HANA data sources

If you preview data from SAP HANA data sources that contain columns of the TINYINT data type, you see inaccurate data for some rows of type TINYINT. However, you can still virtualize the data source; only preview is affected.

Applies to: 4.0.3 and later

When you virtualize data that contains LOB (CLOB/BLOB) or Long Varchar data types, the preview might show the columns as empty

After you virtualize the table, the data for the columns that contain LOB or Long Varchar data types is available in Virtualized data.

Applies to: 4.0.0 and later

When you preview tables, LONG VARCHAR and LONG VARCHAR for bit data are mapped to CLOB and BLOB
When you preview tables, LONG VARCHAR and LONG VARCHAR for bit data are mapped in Data Virtualization as follows:
  • Remote LONG VARCHAR is mapped to Data Virtualization CLOB with a fixed length of 32700.
  • Remote LONG VARCHAR for bit data is mapped to Data Virtualization BLOB with a fixed length of 32700.

Applies to: 4.0.0 and later

Errors when you edit an SSL-based connection

When you edit an SSL-based connection, you might see a message such as the following example. DB_WARNING: ENGINE_JDBC_CONN_ATTEMPT_ERROR: Failed JDBC Connection attempt in XXX ms for: YYY;sslTrustStoreLocation=/path/.dsTempTrustStore;, cause: .... Message: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty.

To work around this issue, see Errors when you edit an SSL-based connection in Data Virtualization.

Applies to: 4.0.0 and later

Support of timestamp data type up to nanoseconds

Data Virtualization supports the timestamp data type up to nanosecond precision. When a remote data source's timestamp type has a scale greater than 9 (more precise than nanoseconds), Data Virtualization returns timestamp column values truncated to nanoseconds. Additionally, for timestamp predicates, Data Virtualization compares values only up to nanoseconds.
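
For example, assuming a hypothetical remote column of type TIMESTAMP(12) (picosecond scale), the value 2022-03-15 10:30:00.123456789999 is returned as 2022-03-15 10:30:00.123456789, and a predicate against that column compares only those first 9 fractional digits.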

Applies to: 4.0.0 and later

Cannot preview data from Db2 for i source

When you try to preview data from a Db2 for i data source, an error message is displayed. Db2 for i might be a remote data source on which you don't have the required access permissions. For more information, see the Db2 product documentation.

Applies to: 4.0.0 and later

Limitations for SSL-based data source connections
The following limitations and known issues apply to SSL-based data source connections in Data Virtualization:
Support of only PEM certificate format for built-in data sources
When you add a built-in data source to Data Virtualization and upload an SSL certificate, Data Virtualization supports SSL certificates in PEM format only. If you upload an SSL certificate in a non-PEM format, the connection fails.

You can use openssl to convert your SSL certificates to PEM format before you upload the SSL certificate file to add a data source connection. For more information, see Cannot connect to a data source.
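
For example, a minimal sketch that converts a DER-encoded certificate to PEM format with openssl (the file names are placeholders):

openssl x509 -inform DER -in mycert.der -outform PEM -out mycert.pem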

Issues when you upload SSL certificates for connections that require a third-party driver
When you add a data source connection by uploading a third-party driver and you try to use an SSL certificate, you might encounter an error. For data sources that require third-party drivers, it is recommended that you use non-SSL connections.

Applies to: 4.0.0 and later

Cannot edit additional properties for data source connections

You can specify any additional JDBC properties when you add data source connections to Data Virtualization. However, you cannot change these additional JDBC properties after you add data source connections.

Applies to: 4.0.0 and later

Limitations for adding Db2 on Cloud data source connections

You can add Db2 on Cloud data source connections to Data Virtualization. However, you cannot specify any port information when you add these data source connections; port 50001 is always used. Additionally, you cannot specify whether the port has SSL enabled, so all Db2 on Cloud connections have SSL enabled. If you want to add a Db2 on Cloud connection with a different port or without SSL, specify the Db2 connection type instead.

Applies to: 4.0.0 and later

Limitations for adding Db2 Hosted, Db2 Event Store, and Compose for MySQL data source connections

When you try to add Db2 Hosted, Db2 Event Store, or Compose for MySQL data source connections, you cannot specify whether to enable SSL. This limitation prevents you from adding these data source connections to Data Virtualization. To work around this issue, specify the Db2 connection type for Db2 Hosted and Db2 Event Store data sources, and the MySQL connection type for the Compose for MySQL data source.

Applies to: 4.0.0 and later

Query performance issues against Db2 data sources

When you run a query against Db2 data sources, you might encounter performance issues. For example, these performance issues might occur if your SQL statement uses a string comparison, such as c1 = 'abc' or c2 = c3, where c1, c2, and c3 are string data types such as CHAR or VARCHAR. To avoid these performance issues, modify the collating sequence (COLLATING_SEQUENCE) server option of the data source. For more information about this server option, see Collating sequence.
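
For example, a minimal sketch that sets the server option by using the Db2 CLI, assuming a hypothetical server object named DB2SERVER (use SET instead of ADD if the option is already defined; the value 'Y' assumes that the data source and Data Virtualization use the same collating sequence, so see Collating sequence for the correct value for your environment):

db2 connect to bigsql
db2 "ALTER SERVER \"DB2SERVER\" OPTIONS (ADD COLLATING_SEQUENCE 'Y')"
db2 terminate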

Applies to: 4.0.0 and later

Remote data sources - Performance issues when you create data source connection
You try to create a data source connection by searching a different host, but the process takes several minutes to complete. This performance issue occurs only when both of the following conditions are met:
  • The remote data source is connected to multiple Cloud Pak for Data clusters.
  • Data Virtualization connects to multiple data sources in different Cloud Pak for Data clusters by using remote connectors.

To solve this issue, ensure that your Data Virtualization connections are located on a single Cloud Pak for Data cluster.

Applies to: 4.0.0 and later

Remote data sources - Errors in the remote connector upgrade script

When you run the script to upgrade a remote connector on a Windows data source, you might see error messages such as the following example:

'#' is not recognized as an internal or external command, operable program or batch file. '#export' is not recognized as an internal or external command, operable program or batch file.

You can ignore these error messages because the remote connector is upgraded successfully.

Applies to: 4.0.0 and later

Remote data sources - Cannot use system junction points
Data Virtualization does not support browsing data on remote data sources by using paths that contain system junction points. System junction points provide compatibility with earlier versions of Windows. For example, on Windows 10, C:\Documents and Settings is a system junction point to C:\Users. Therefore, when you browse files on a remote Windows data source, you cannot enter a path that contains a system junction point, such as C:\Documents and Settings. By default, system junction points are hidden from Windows users.
Note: Data Virtualization does support junction points and symbolic links that are created by Windows users.

Applies to: 4.0.0 and later

User and group management issues

User group assignment changes might not take effect

A new session (authentication) must be established for the user group assignments to take effect.

Workaround: See User management in Managing users.

Applies to: 4.0.0 and later

Users and groups must adhere to naming guidelines
Best practices:
  • Group names must be less than or equal to the group name length listed in SQL and XML limits.
  • A username on Windows can contain up to 30 characters.
  • When not using Client authentication, non-Windows 32-bit clients that connect to Windows with a username that is longer than the username length listed in SQL and XML limits are supported when the username and password are specified explicitly.
  • A username must not be USERS, ADMINS, GUESTS, PUBLIC, LOCAL, or any SQL reserved word.
  • A username must not begin with IBM, SQL, or SYS.

Applies to: 4.0.0 and later

Privileges and authorities that are granted to user groups are not considered when you create views

This limitation is a result of a Db2 limitation on groups. For more information, see Restrictions on the use of group privileges when executing DDL statements or binding packages.

You can also grant public access on your objects for all roles or all Data Virtualization users, and then restrict access by using data protection rules that are defined on groups. For more information, see Governing virtual data with data protection rules in Data Virtualization.
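
For example, a minimal sketch of granting public read access to a virtualized object by using the Db2 CLI (the schema and view names are hypothetical):

db2 connect to bigsql
db2 "GRANT SELECT ON TABLE MYSCHEMA.MYVIEW TO PUBLIC"
db2 terminate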

See also Error SQL0727N when you query view results in Data Virtualization.

Applies to: 4.0.0 and later

Data governance issues

You cannot apply business terms when you virtualize files in data sources on the Files tab

When you virtualize files in Data > Data virtualization > Virtualize, the Business terms column is not available for data sources on the Files tab. These data sources do not support business terms.

Applies to: 4.0.0 and later

Profiling of Data Virtualization data assets in Watson™ Knowledge Catalog fails in Cloud Pak for Data versions 4.0.7 and 4.0.8

In Cloud Pak for Data version 4.0.7 or 4.0.8, profiling of Data Virtualization data assets in Watson Knowledge Catalog fails with the following error: The property [auth_method] is not supported. Also, you cannot preview virtual objects in Watson Knowledge Catalog when a data masking rule applies.

Important: Do not upgrade to Cloud Pak for Data version 4.0.7 or 4.0.8 unless you have applied all available patches for Watson Knowledge Catalog.

Applies to: 4.0.7 and 4.0.8

Fixed in: 4.0.9

Automatic publishing of virtual objects to the catalog is limited to certain objects

Only objects that are created in the user interface are automatically published to the catalog. Objects that are created by using SQL are not published automatically; you must publish them to the catalog manually or by using the API.

Applies to: 4.0.0 and later

Data Virtualization always enforces data protection rules

Data Virtualization enforces data protection rules even if the data asset is cataloged in a catalog that does not have the Enforce data protection rules option enabled. This behavior is subject to change in future releases. To ensure a predictable behavior in future releases, add the virtual data assets to the catalogs with the Enforce data protection rules option enabled only.

Applies to: 4.0.0 and later

Access control issues to preview assets with masked data
Tech preview This is a technology preview and is not supported for use in production environments.
When you preview Data Virtualization data assets in Watson services in Cloud Pak for Data (for example, Watson Knowledge Catalog, Watson Studio, and Data Refinery) and data masking applies, the preview is subject to the data protection rules and to catalog or project access control only.

To avoid double masking, access control in Data Virtualization is not applied when you preview a data asset (table or view) that comes from Data Virtualization. This exception applies only when data masking applies to the preview in the Watson service.

You must define your rules to manage access to the catalogs, projects, data assets, or connections for access control in Watson services.

Applies to: 4.0.0 and later

Cannot see list of available tables in the default virtualization mode

In the default virtualization mode (where you can see all tables, irrespective of business term assignments), when you go to the Virtualize page, the console appears to load the table list for a long time if data sources that are added to Data Virtualization have tables with nonstandard types, such as NULL or OTHER. However, you can wait for the loading to complete to see a list of all tables, and you can preview, add to cart, edit columns, and virtualize any of the listed tables. The Refresh button is disabled, but you can refresh the browser page to trigger a reload of the available tables cache.

Applies to: 4.0.0 and later

Cannot see list of available tables in the strict virtualization mode

In the strict virtualization mode (where you can see tables only if they have at least one column with a business term assignment), when you go to the Virtualize page, the console appears to load the table list for a long time without showing any tables. The loading can be much slower than in the default virtualization mode because the console evaluates which tables are eligible to be virtualized, depending on term assignments to data source table and column names.

Applies to: 4.0.0 and later

Access to a table is denied by policies

You cannot access a table even though the data policies and authorizations indicate that you are authorized to access it. This issue occurs only if Watson Knowledge Catalog policy enforcement is enabled in Data Virtualization. To solve this issue, see Access to a table is denied by policies in Data Virtualization.

Applies to: 4.0.0 and later

Do not use duplicate catalog assets for the same table

If the same table has duplicate catalog assets, the policy service cannot decide which of the duplicated assets to use for policy enforcement and does not aggregate the rules. Avoid duplicate assets across governed catalogs because they might lead to unpredictable policy enforcement behavior in Data Virtualization.

Applies to: 4.0.0 and later

Cannot see business term that is assigned to data asset

You are virtualizing a data asset in Watson Knowledge Catalog, but you cannot see a business term that is assigned to this data asset. To solve this issue, see Cannot see business term that is assigned to data asset in Data Virtualization.

Applies to: 4.0.0 and later

A virtualized object cannot be used in Cognos® Dashboards without credentials and an appropriate role
Using a virtualized object in Cognos Dashboards displays an error message if you did not enter credentials for the Data Virtualization connection or you do not have the correct role.

If you did not enter a username and password when you created the connection to the Data Virtualization data source, you see the following error: Missing personal credentials. If you are not assigned the Admin or Steward role for this object, you see the following error: Unable to access this asset.

To work around this issue, see A virtualized object cannot be used in Cognos Dashboards without credentials and an appropriate role in Data Virtualization.

Applies to: 4.0.0 and later

Caching issues

Minute selector of the cache refresh rate can be incremented beyond maximum and cannot be reset

To set a cache refresh rate, you can select an Hourly frequency and then choose the minute of the hour when the cache refresh is run. If you increase this minute beyond 59, the refresh minute becomes blank. If you leave this page with the refresh minute blank and then return to this page, you cannot set the hourly refresh rate for a cache because the minute remains blank.

Workaround: If you increase the minute beyond 59, you must decrement the value until it is valid before you leave the page.

Applies to: 4.0.0 and later

File issues

You cannot preview long string values in headings in CSV, TSV, or Excel files
When you use the first row as column headings, the string values in that row must not exceed the maximum Db2 identifier length of 128 characters and must not be duplicated. If your file has header values that are too long or duplicated, an error message is displayed when you try to preview the file in Data Virtualization.

400: Missing ResultSet:java.sql.SQLSyntaxErrorException: Long column type column or parameter 'COLUMN2' not permitted in declared global temporary tables or procedure definitions.

Column heading names are case-insensitive and converted to uppercase in the API response, which is exposed by the console. Therefore, a column named ABC is considered the same as a column named abc. However, the columns can be renamed to mixed case when you virtualize your data source.
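
For example, a hypothetical CSV header row like the following fails the preview because Id and ID resolve to the same uppercase identifier ID:

Id,Name,ID,Comment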

Workaround: Review the first row in the data file that contains the intended column heading names and make the necessary changes to avoid this limitation.

Applies to: 4.0.0 and later

You might encounter errors when you virtualize large Excel files
You might encounter an error when you preview or virtualize large files in Excel format (XLS):

ERROR 400 'Your InputStream was neither an OLE2 stream nor an OOXML stream' was thrown while evaluating an expression.

Workaround: Complete the following steps:
  1. Load the file data into a table on a supported data source. For more information, see Supported data sources.
  2. Virtualize the table. For more information, see Creating a virtualized table.
  3. Optionally, change the format of the file, for example to CSV format, before you virtualize it again (see the sketch after these steps).
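
For step 3, a minimal sketch that converts an XLS file to CSV from the command line, assuming that LibreOffice is installed (the file name is a placeholder):

libreoffice --headless --convert-to csv myworkbook.xls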

Applies to: 4.0.0 and later

Encoding detection override for file data with Japanese characters

Cloud Pak for Data automatically detects the encoding scheme of flat data files, such as CSV and TSV files, that are exposed by remote connectors. However, to avoid decoding issues, it is recommended that you set the encoding scheme manually for flat data files. For more information, see Setting encoding scheme.

Applies to: 4.0.0 and later

Only UTF-8 character encoding is supported for CSV, TSV, and JSON files in Cloud Object Storage

For text files in Cloud Object Storage data sources with CSV, TSV, and JSON formats, only UTF-8 character encoding is supported in Data Virtualization. Cloud Object Storage binary formats such as Optimized Row Columnar (ORC) or Parquet are unaffected because they transparently encode character types.
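
For example, a minimal sketch that converts a file to UTF-8 with iconv before you upload it to Cloud Object Storage, assuming a Shift_JIS source file (the file names are placeholders):

iconv -f SHIFT_JIS -t UTF-8 input.csv > output.csv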

Applies to: 4.0.3 and later

Resolved issues

Data Virtualization upgrade stuck in pending state when a new Cloud Pak for Data project is added
If you upgrade Data Virtualization after a new Cloud Pak for Data project (namespace) was added, the Data Virtualization operator pod remains stuck in a pending state.

To work around the problem, restart the Data Virtualization operator pod by running the following command.

oc -n <project> delete pod $(oc -n <project> get pods | grep -i dv | cut -d' ' -f 1)

Applies to: 4.0.0 and later

Fixed in: 4.0.8

Speed up loading of tables when you virtualize

Data sources with more than 100,000 tables slow down the loading of tables on the Virtualize > Tables page. You can reduce the scope by setting remote schema filters. For more information, see Speed up loading of tables when you virtualize in Data Virtualization.

Applies to: 4.0.0 to 4.0.6

Fixed in: 4.0.7 (See Filtering data in Data Virtualization.)

You must zoom out in the user interface to add all connection details for an existing Platform connection in Cloud Object Storage

When you add a Data Virtualization connection to an existing platform connection to Cloud Object Storage, you must add all of the required credentials. For more information, see Connecting to Cloud Object Storage in Data Virtualization.

To add credentials, you must zoom out in your browser to see the entire Add required credentials dialog box. The Secret key field is not enabled until you enter an Access key. The Update button does not appear until you zoom out to see the entire dialog box.

Applies to: 4.0.1 and later

Fixed in: 4.0.7

When you edit a grouped table page with a long list, the scroll bar might disappear

After you connect to data sources and virtualize group tables, you can review the grouped tables and clear some entries before you virtualize. On the Edit grouped tables page, if you have a long list and you need to scroll through the page, the scroll bar might disappear after you clear some selections. You cannot scroll up and click Apply and continue.

Workaround: Zoom out your browser to see the entire page without a scroll bar. You can click Apply and continue.

Applies to: 4.0.3 and later

Fixed in: 4.0.7

Caching issues with Cloud Object Storage

On Data > Data virtualization > Virtualization > Cache management > Add new cache, users can create and manage caches. However, if you click Test queries with cache, you see a Failed to test queries error message. You cannot use the matching and recommendation capabilities in caching for Cloud Object Storage.

Applies to: 4.0.0 and later

Fixed in: 4.0.7

Virtualized tables that were created from large Excel files might not have content after upgrading to Cloud Pak for Data version 4.0.5
When you upgrade to Cloud Pak for Data version 4.0.5, you must check all tables that were virtualized from files with more than 250,000 cells or with a file size greater than 3 MB, which can happen when a workbook contains macros. If these virtualized tables are empty after the upgrade, complete the following steps.
  1. On all remote connector hosts, edit the file: <DV endpoint directory>/sysroot/data/gaiandb_config.properties.
  2. Search for and replace all instances of com.ibm.db2j.GExcel with com.ibm.db2j.QExcel (see the sed sketch after these steps). For troubleshooting, see Queries on virtualized flat files fail with incorrect results in Data Virtualization.
  3. Save the file and exit the editor. The changes are automatically loaded after you save the file. Confirm that large Excel files can be queried.
  4. If virtual tables for files on a host still don't show content, try restarting the remote connector agent. For more information, see Managing connectors on remote data sources.
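
For step 2, a minimal sketch that performs the replacement in place by using sed on a Linux host (the <DV endpoint directory> placeholder is the same as in step 1):

sed -i 's/com\.ibm\.db2j\.GExcel/com.ibm.db2j.QExcel/g' "<DV endpoint directory>/sysroot/data/gaiandb_config.properties"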

Fixed in: 4.0.6

Data Virtualization service on Cloud Pak for Data might appear to be unstable on a Red Hat® OpenShift® Kubernetes Service cluster

SQL operations might fail intermittently, and you might also see SQLCODE=-1229 errors:

Error message: The current transaction has been rolled back because of a system error.. SQLCODE=-1229, SQLSTATE=40504, DRIVER=4.27.25

To work around the issue, see Data Virtualization service on Cloud Pak for Data appears to be unstable on a Red Hat OpenShift on Kubernetes cluster.

Applies to: 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4

Fixed in: 4.0.5

You cannot use special characters such as double quotation marks in schema names when you virtualize a table

When you virtualize a table that has a double quotation mark in the schema name, you see an error message and you cannot virtualize the table. See the following example.

Error The assets request failed: CDICO2060E: The metadata for the select statement could not be retrieved Sql error: The statement failed because a Big SQL component encountered an error. Component receiving the error: "SCHEDULER". Component returning the error: "SCHEDULER". Log entry identifier: "[SCL-0-3a08009d4]". Reason: "".. SQLCODE=-5105, SQLSTATE=58040, DRIVER=4.28.11. The statement text is: SELECT * FROM """ABC"""."SHIP_MODE" FOR READ ONLY

Applies to: 4.0.2 to 4.0.4

Fixed in: 4.0.5

A data source that is added with the setrdbcx stored procedure might not appear in the user interface
If you try to add a data source by using the setrdbcx stored procedure, the data source is added successfully, but it doesn't appear on the Data sources page in the user interface. Data Virtualization cannot add the data source to Platform connections.

This issue can occur on Netezza® on Cloud, MongoDB, and Db2 Warehouse data sources.

Workaround: Add a data source by clicking New connection or by selecting an Existing connection in the user interface. This method ensures that the data source is already available in Platform connections, or that the connection is added to Platform connections, before you add the data source to Data Virtualization.

Applies to: 4.0.0 to 4.0.2

Fixed in: 4.0.3

The IBM Informix® driver does not support SSL connections

If you create a connection by using an uploaded IBM Informix JDBC driver, the connection does not support SSL.

Applies to: 4.0.2

Fixed in: 4.0.3

Cannot edit Ceph® connection when incorrect parameters are entered

You have the Data Virtualization Admin or Engineer role and you try to add a Ceph data source connection with incorrect parameters. The error message indicates The data source could not be added. Ensure that the specified parameters are correct and try again. However, when you click Edit in the message, the page hangs and you cannot edit your parameters.

Workaround: Close the error message and retry with corrected parameters.

Applies to: 4.0.2

Fixed in: 4.0.3

Virtualization might fail when assigning to a data request that was created by another user in the group
This issue occurs in the following scenario:
  • User A is a member of a user group.
  • User A does not explicitly have a Data Virtualization role but the group has a Data Virtualization role.
  • User A logs in and creates a data request and assigns it to another user, user B.
  • User B logs in and accepts the data request. User B can't virtualize objects from the Data requests page.
  • User B can virtualize objects successfully from the Virtualize page but assigning them to the data request fails.
Workaround: Explicitly assign user A a Data Virtualization role. Then, user A can log in and create a data request, and user B can log in and accept it. User B can now virtualize objects and assign them to the data request in Data Virtualization.

Applies to: 4.0.0 to 4.0.2

Fixed in: 4.0.3

Caching pod does not initialize when Db2 Data Management Console is not available

In Cloud Pak for Data version 4.0.1 or earlier, the Data Virtualization caching pod does not initialize successfully if Db2 Data Management Console does not initialize successfully.

Workaround: See Caching pod does not initialize when Db2 Data Management Console is not available in Data Virtualization.

Applies to: 4.0.0 to 4.0.1

Fixed in: 4.0.2

Some values are empty in the Services settings page
When you view the Service settings page, the following values are empty:
  • Number of worker nodes
  • Cores per worker node
  • Memory per worker node
  • Storage summary
Workaround: You can use the following commands to retrieve missing information:
  • To find the number of worker nodes, run the following command.
    oc get bigsql db2u-dv -o yaml | grep -i 'size:'
  • To find the cores per worker node and the memory per worker node, run the following command. Look for the resources section, which shows the CPU and memory that are allocated to each worker node.
    oc get statefulset c-db2u-dv-db2u -o yaml
  • To find a storage summary that shows the storage class that is used to create persistent volumes and sizes of persistent volumes, run the following command. Look for the volumeClaimTemplates section where you can see the storage class and the persistent volume size.
    oc get statefulset c-db2u-dv-db2u -o yaml

Applies to: 4.0.1

Fixed in: 4.0.2

A user might not be able to access the cache dashboard page

In Cloud Pak for Data 4.0.1, a user who does not explicitly have the Data Virtualization Admin role, but is assigned to a group that has the Data Virtualization Admin role, cannot access the cache dashboard page.

Workaround: Grant the user a Data Virtualization Admin role explicitly, instead of or in addition to adding the user to a group that has the Data Virtualization Admin role.

Applies to: 4.0.1

Fixed in: 4.0.2