Known issues and limitations for Data Virtualization

The following known issues and limitations apply to Data Virtualization.

For additional solutions to problems that you might encounter with Data Virtualization, see the troubleshooting topic Troubleshooting the Data Virtualization service.

Administration issues
There are no current issues that apply to administration tasks in Data Virtualization. For more information, see Administering Data Virtualization.
Audit issues
The following issues and limitations apply to auditing in Data Virtualization.
Data source and connections issues
The following issues and limitations apply to data sources in Data Virtualization. For more information about data sources, see Supported data sources in Data Virtualization. For additional solutions to problems that you might encounter with data source connections, see the troubleshooting topic Troubleshooting data source connections in Data Virtualization.
Installation and upgrade issues
The following issue applies to installing and upgrading Data Virtualization. For more information, see Installing Data Virtualization and Upgrading Data Virtualization.
Data virtualization issues
The following issues and limitations apply to virtualization in Data Virtualization. For additional solutions to problems that you might encounter with data virtualization, see the troubleshooting topic Troubleshooting virtualization issues in Data Virtualization.
User and group management issues
The following issues and limitations apply to user and group management in Data Virtualization. For more information about user and group management, see Managing roles for users and groups in Data Virtualization.
Data governance issues
The following issues and limitations apply to data governance in Data Virtualization. For additional solutions to problems that you might encounter with data governance, see the troubleshooting topic Troubleshooting governance issues in Data Virtualization.
Caching issues
The following issues and limitations apply to caching in Data Virtualization. For additional solutions to problems that you might encounter with caching, see the troubleshooting topic Troubleshooting data caches and queries in Data Virtualization.
File issues
The following issues and limitations apply to files in Data Virtualization.
Resolved issues
The following issues and limitations are fixed in Data Virtualization 5.0.0:

Administration issues

Audit issues

Audit records with transaction IDs containing special characters might be missing

Applies to: 5.0.0 and later

Some audit records might be missing from the IBM Cloud Pak for Data audit service because the Db2U Audit service is unable to process some randomly generated transaction IDs that contain special characters. There is currently no workaround.

Audit log entries might not be forwarded to the zen-audit service

Applies to: 5.0.0 and later

You might lose log entries if the log-streaming pod crashes or the audit log PVC experiences issues, which prevents logs from being forwarded to the zen-audit service.
Audit log records might display incomplete information

Applies to: 5.0.0 and later

An audit log record might display only some attributes, such as the timestamp and action user ID, when unprintable characters occur in the audit record.

SQL statement text is not logged

Applies to: 5.0.0 and later

SQL statement text is not logged for events related to SQL executions within the CONTEXT and EXECUTE categories. To view Db2 audit events, see Db2 audit events for Data Virtualization.

Data source and connections issues

VIRTUALIZETABLE procedure fails on Presto if user only has partial column access

Applies to: 5.0.0 and later

For Presto data sources, the VIRTUALIZETABLE stored procedure fails with a Query failed error when a user has access to only a subset of columns in the underlying table.

Workaround: Create a view in the source that includes only the columns the user can access, and then virtualize that view in Data Virtualization.
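For example, the following sketch shows the workaround. The catalog, schema, table, and column names are hypothetical; run the CREATE VIEW statement in your Presto SQL client before you virtualize the view.

-- Run in the Presto SQL client: expose only the columns that the user can access.
CREATE VIEW hive.sales.orders_limited AS
SELECT order_id, order_date, status
FROM hive.sales.orders;

You can then select the view on the Virtualize page, or virtualize it with the DVSYS.VIRTUALIZETABLE stored procedure as shown in other workarounds in this topic.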
Connecting as cpadmin fails after IAM is disabled
After IAM is disabled, some platform admin users fail to connect to the Db2 deployments as cpadmin.
Applies to: 5.0.0 and later
Symptoms: Initially, the IAM integration for Cloud Pak for Data is enabled and Db2 connections work normally. You then disable the IAM integration by using the steps in Configuring Cloud Pak for Data to use the embedded LDAP integration. After you disable the IAM integration, you cannot connect to Db2 instances, including deployments that existed before IAM was disabled and deployments that were created after IAM was disabled.

The Db2 database connection error message shows USERNAME AND/OR PASSWORD INVALID, as shown in the following example:

$ db2 connect to BLUDB user cpadmin using <password>
SQL30082N  Security processing failed with reason "24" ("USERNAME AND/OR
PASSWORD INVALID").  SQLSTATE=08001
Cause: Starting in Cloud Pak for Data version 4.8.0, the flag that indicates that IAM integration is disabled and that the embedded LDAP integration is used was changed. The security plugin in Db2 instances looks at the old flag instead of the new flag and continues to try to authenticate through the Cloud Pak for Data IAM API layer, which results in connection failures.
Diagnosing the problem: In the Db2 diagnostic log file, db2diag.N.log, the following error appears for the function IBMIAMauth::verify_cp4d_auth_iam:
2024-03-01-02.24.45.239952+000 I4283174E595          LEVEL: Error
PID     : 14632                TID : 140006895118080 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000            DB   : BLUDB
APPHDL  : 0-7714
HOSTNAME: c-db2oltp-1707901157071494-db2u-0
EDUID   : 52                   EDUNAME: db2agent (BLUDB) 0
FUNCTION: DB2 UDB, bsu security, sqlexLogPluginMessage, probe:20
DATA #1 : String with size, 170 bytes
IBMIAMauth::verify_cp4d_auth_iam: Fail to get response code, retval = 6, http_code = 400, jwt = Failed to get access token, invalid token request parameters server_error
Workaround:
  1. Complete the following workaround steps first to confirm that they resolve the problem. This workaround is temporary; if the steps resolve the issue, proceed to step 2 to apply a patch for a more permanent solution.
    1. Open a shell in your Db2 deployment pod by running the following command:
      oc -n ${PROJECT_CPD_INST_OPERANDS} exec -it ${db2_podname} -- bash
    2. Switch to the db2inst1 profile:
      su - db2inst1
    3. Unset the env value DB2_BEDROCK_ROUTE:
      unset DB2_BEDROCK_ROUTE
    4. Stop wolverine:
      sudo sv stop wolverine
    5. Refresh the db2u security plug-in and restart Db2:
      db2stop force && ipclean -a
      /bin/bash -c "source /db2u/scripts/include/db2_functions.sh && refresh_db2_sec_plugin"
      db2start
      You can now connect to your Db2 database as cpadmin.
      Note:

      You must apply this change again if the pod restarts. For a more permanent solution, complete step 2.

  2. If the workaround from step 1 does resolve the issue and you want the changes to persist, complete the following steps to apply a patch:
    1. Copy the db2u_entrypoint.sh file from the Db2 engine pod, where <db2u-podname> is the name of the Db2 engine pod. For example, the engine pod might be c-db2oltp-1234567-db2u-0, c-db2u-dv-db2u-0, or c-bigsql-<instance id>-0.

      oc cp <db2u-podname>:/db2u/db2u_entrypoint.sh ./db2u_entrypoint.sh
    2. Open the file in a text editor, such as vi editor:

      vi db2u_entrypoint.sh
    3. The following patch might vary because of differences between Cloud Pak for Data versions. If you are uncertain about the change, contact the Db2 Containerized team.
      1. Find the following line (around line 97):
        echo "export DB2_BEDROCK_ROUTE=$(${GLOBAL_SUDO_CMD} cat ${cm_key_cpd_cluster_address})" >>${sqllib_profile}
      2. Change the line to the following:
        echo "#export DB2_BEDROCK_ROUTE=$(${GLOBAL_SUDO_CMD} cat ${cm_key_cpd_cluster_address})" >>${sqllib_profile}
      3. Add the following line:
        unset DB2_BEDROCK_ROUTE
      The updated code snippet looks like the following example:
      # Add BEDROCK_ROUTE from external ConfigMap VolumeSource
      cm_key_cpd_cluster_address="${DB2U_CONFIGMAP_ROOT}/external/management-ingress-ibmcloud-cluster-info/cluster_address"
      if [[ -e ${cm_key_cpd_cluster_address} ]]; then
      grep -qE "^export DB2_BEDROCK_ROUTE=" ${sqllib_profile} && sed -i "/^export DB2_BEDROCK_ROUTE=.*$/d" ${sqllib_profile}
      echo "#export DB2_BEDROCK_ROUTE=$(${GLOBAL_SUDO_CMD} cat ${cm_key_cpd_cluster_address})" >>${sqllib_profile}
      fi
    4. Save the file.

    5. Create a configmap that contains the updated script, where <CPD_INSTANCE_NAMESPACE> is the Cloud Pak for Data instance namespace:
       oc create configmap db2u-entrypoint-sh -n <CPD_INSTANCE_NAMESPACE> --from-file=db2u_entrypoint.sh
    6. Find the StatefulSet that is associated with the Db2 engine pod, for example c-db2u-dv-db2u.

    7. Patch the volume:

       oc set volume statefulset/<db2-StatefulSet> -n <CPD_INSTANCE_NAMESPACE> --add --name=db2u-entrypoint-sh --type=configmap --containers=db2u --mount-path="/db2u/db2u_entrypoint.sh" --sub-path="db2u_entrypoint.sh" --configmap-name=db2u-entrypoint-sh --default-mode="755" --read-only=false --overwrite

      After you patch the volume, the Db2 engine pod restarts. When the pod reaches the Ready state, Db2 connections succeed.
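Optionally, you can verify that the patched entrypoint is in place. The following sketch uses the <db2u-podname> placeholder from the earlier steps; the export line for DB2_BEDROCK_ROUTE should appear commented out.

# Confirm that the patched script is mounted in the engine pod; the output should
# include the commented-out line "#export DB2_BEDROCK_ROUTE=...".
oc -n <CPD_INSTANCE_NAMESPACE> exec <db2u-podname> -- grep "DB2_BEDROCK_ROUTE" /db2u/db2u_entrypoint.sh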

Unable to add a data source from an existing connection

Applies to: 5.0.3

When you try to add a data source in Data Virtualization by using an existing platform connection, the UI hangs after you select the radio button beside the platform connection name and click Add, and you cannot complete the process.
Workaround:

To select the platform connection, click the platform connection name instead of the radio button beside the name, and then proceed to add the connection.

Collecting statistics for a virtualized Presto table results in an error

Applies to: 5.0.0 and later

When you try to collect the statistics of a virtualized Presto table using the COLLECT_STATISTICS stored procedure in Run SQL, you might encounter the error No statistics returned for table.

Workaround: Wait 2-3 minutes and then try to collect the statistics again.
Note: You can apply a permanent fix on the data source side by setting the parameter http-server.max-request-header-size=<value up to 10TB> in the Presto coordinator configuration file. For more information, see Update Presto engine.
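For example, a minimal sketch of the data source-side change, assuming that you manage the Presto coordinator configuration file directly and that it is at /opt/presto/etc/config.properties; the path and the size value are examples only:

# Append the property to the Presto coordinator configuration, then restart the
# coordinator for the change to take effect.
echo "http-server.max-request-header-size=32MB" >> /opt/presto/etc/config.properties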
Accessing Data Virtualization using a URL displays an error

Applies to: 5.0.0

Fixed in: 5.0.1

When you attempt to access Data Virtualization by using a URL rather than through the Cloud Pak for Data home page, the resulting page displays the error message The data cannot be displayed.
Workaround: Log in to Cloud Pak for Data and then navigate to Data > Data virtualization.
Unable to connect to an SSL-enabled data source in Data Virtualization by using a remote connector if you use custom certificates

Applies to: 5.0.0 and later

After you create your own custom certificates to use when Data Virtualization connects to the Cloud Pak for Data platform, you are not able to connect to a remote SSL-enabled data source.

For more information about creating custom certificates, see manage gen-platform-ca-certs in the "cpd-cli command reference."

For more information about remote data sources and connectors in Data Virtualization, see Accessing data sources by using remote connectors in Data Virtualization.
Workaround:
Complete the following steps to resolve this issue and connect to remote data sources:
  1. Download the CA certificate files from /etc/pki/ca-trust/source/anchors in the Data Virtualization head pod c-db2u-dv-db2u-0.
  2. Upload the CA certificate files that you downloaded in step 1 to each remote connector computer.
  3. If a file has more than one CA certificate, split the certificates into individual files (see the sketch after these steps).
  4. Run the following command to set AGENT_JAVA_HOME to the Java home that is used by the remote connectors:
    AGENT_JAVA_HOME=$(tac <Remote connector install path>/datavirtualization.env | grep -i java_home | grep -v "#" -m 1)
  5. For each file that you created in step 3, run the following command to add the certificates to the Java cacerts truststore in $AGENT_JAVA_HOME/lib/security/cacerts. Make sure that you provide a unique alias value for the -alias command parameter for each file.
    keytool -storepass changeit -keystore $AGENT_JAVA_HOME/lib/security/cacerts -importcert -alias <pick an alias> -rfc -file <absolute path to cert generated in step 3> -noprompt
  6. Restart the remote connector. For more information, see Managing connectors on remote data sources.
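The following sketch shows one way to split a certificate bundle (step 3) and import each resulting file with a unique alias (step 5). The bundle file name and alias prefix are hypothetical, GNU csplit is assumed to be available on the remote connector computer, and AGENT_JAVA_HOME is set as in step 4.

# Split a PEM bundle into individual certificate files (cert-00, cert-01, ...).
csplit -z -f cert- bundle.pem '/-----BEGIN CERTIFICATE-----/' '{*}'

# Import each certificate into the Java cacerts truststore with a unique alias.
i=0
for f in cert-*; do
  keytool -storepass changeit -keystore "$AGENT_JAVA_HOME/lib/security/cacerts" \
    -importcert -alias "dv-remote-ca-$i" -rfc -file "$f" -noprompt
  i=$((i+1))
done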
Querying a virtualized table in a Presto catalog with a matching schema from a previous catalog might result in an error

Applies to: 5.0.0 and later

When you use a new catalog and then query a virtualized table from a previous catalog with an identical schema name, the query might run if the schema exists in the new catalog, but it might fail with an error. The error can be caused by differences in the column definitions of tables under the same schema name across catalogs.

Special characters are not supported in MongoDB database names

Applies to: 5.0.0 and later

You cannot use special characters such as semicolons and single quotes in a MongoDB database name.
The List view on the Virtualize page does not display certain Hive schemas correctly

Applies to: 5.0.0 and later

Fixed in: 5.0.3

For Hive connections, on the Virtualize page, the List view might not show data assets due to restricted column access.

Workaround:
  1. In the Data Virtualization web client, navigate to the Data sources page, then select your Hive connection.
  2. Select Edit connection from the overflow menu (...), then select Connection details > Additional properties.
  3. Add the following parameter in the text box.
    CatalogMode=query
  4. Reload the data assets in the List view.
Limited file types are supported with the Microsoft Azure Data Lake Storage Gen2 data source connection

Applies to: 5.0.0 and later

You can connect to a Microsoft Azure Data Lake Storage Gen2 data source from Data Virtualization. However, only the following file types are supported with this connection in Data Virtualization:
  • CSV
  • TSV
  • ORC
  • Parquet
  • JSON Lines
Special characters are not preserved in databases or schemas with MongoDB connections after you upgrade to Data Virtualization on Cloud Pak for Data

Applies to: 5.0.0 and later

After you upgrade from an earlier version of Data Virtualization on Cloud Pak for Data to version 5.0.0, with a MongoDB connection, special characters in the database or schema are not preserved even though SpecialCharBehavior=Include is set in the data source connection. The updated MongoDB driver (version 6.1.0) that is used in Data Virtualization on Cloud Pak for Data does not recognize special characters in tables by default. This might cause issues with your results when you query a virtual table that has special characters.
The DECFLOAT data type is not supported in Data Virtualization
Applies to: 5.0.0 and later

The DECFLOAT data type is not supported in Data Virtualization. As a result, the type DECFLOAT is converted to DOUBLE and the special numeric values NaN, INF, and -INF are converted to NULL.
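For example, the following sketch (with a hypothetical schema and table name) shows how to confirm the data type that a virtualized column was assigned:

db2 "connect to bigsql"
# The source column is defined as DECFLOAT, but the virtual table exposes it as DOUBLE;
# rows whose source value is NaN, INF, or -INF return NULL for this column.
db2 "SELECT COLNAME, TYPENAME FROM SYSCAT.COLUMNS WHERE TABSCHEMA = 'MYSCHEMA' AND TABNAME = 'MY_VIRTUAL_TABLE'"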

The Data sources page might fail to load data sources when remote connectors are added, edited, or removed

Applies to: 5.0.0 and later

If you remove a connection to a remote connector or the remote connector becomes unavailable because credentials expire, the Data sources page fails to load this connection.

Unable to add a connection to an SAP S/4HANA data source with an SAP OData connection

Applies to: 5.0.0 and later

If you try to connect to an SAP S/4HANA data source that contains many tables, the connection might time out and fail. Increasing timeout parameters has no effect.
To work around this issue, run the following commands.
db2 connect to bigsql
db2 "call DVSYS.setRdbcX('SAPS4Hana', '<Data source IP:port>', '0', '', 'CreateSchema=ForceNew', '<username>', '<password>', '0', '0', '', '', '<internal connector instance and port>', ?,?,?);"
db2 terminate
You cannot connect to a MongoDB data source with special characters in a database name

Applies to: 5.0.0 and later

The current MongoDB JDBC driver does not support connections to databases whose names contain special characters.

When you virtualize data that contains LOB (CLOB/BLOB) or Long Varchar data types, the preview might show the columns as empty

Applies to: 5.0.0 and later

After you virtualize the table, the data for columns that contain LOB or Long Varchar data types is available in Virtualized data.

Support of timestamp data type up to nanoseconds

Applies to: 5.0.0 and later

Data Virtualization supports the timestamp data type up to nanosecond precision. When a remote data source's timestamp type has a scale greater than 9 (nanoseconds), Data Virtualization returns timestamp column values truncated to nanoseconds. Additionally, for timestamp predicates, Data Virtualization compares values only up to nanoseconds.
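For example, a hypothetical sketch of the truncation behavior for a source column that is defined with more than nanosecond precision:

# If the source column is TIMESTAMP(12), a source value such as
#   2024-01-15-10.30.00.123456789123
# is returned by Data Virtualization as
#   2024-01-15-10.30.00.123456789
# and predicates on the column compare only the first 9 fractional digits.
db2 "SELECT EVENT_TS FROM MYSCHEMA.MY_VIRTUAL_TABLE FETCH FIRST 1 ROWS ONLY"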

Remote data sources - Performance issues when you create data source connection

Applies to: 5.0.0 and later

When you try to create a data source by searching on a different host, the process can take several minutes to complete. This performance issue occurs only when these two conditions are met:
  • The remote data source is connected to multiple Cloud Pak for Data clusters.
  • Data Virtualization connects to multiple data sources in different Cloud Pak for Data clusters by using the remote connectors.

To solve this issue, ensure that your Data Virtualization connections are on a single Cloud Pak for Data cluster.

Remote data sources - Cannot use system junction points

Applies to: 5.0.0 and later

Data Virtualization does not support browsing data on remote data sources by using paths that contain system junction points. System junction points provide compatibility with earlier versions of Windows. For example, on Windows 10, C:\Documents and Settings is a system junction point to C:\Users. Therefore, when you browse files on a remote Windows data source, you cannot enter a path that contains system junction points, such as C:\Documents and Settings. By default, system junction points are hidden from Windows users.
Note: Data Virtualization does support junction points and symbolic links that are created by Windows users.
Query fails due to unexpectedly closed connection to data source

Applies to: 5.0.0 and later

Data Virtualization does not deactivate the connection pool for the data source when your instance runs a continuous workload against virtual tables from a particular data source. Instead, Data Virtualization waits for a period of complete inactivity before it deactivates the connection pool. The waiting period can create stale connections in the connection pool that get closed by the data source service and lead to query failures.

Workaround: Check the properties for persistent connection (keep-alive parameter) for your data sources. You can try two workarounds:
  • Consider disabling the keep-alive parameter inside any data sources that receive continuous workload from Data Virtualization.
  • You can also decrease the settings for corresponding Data Virtualization properties, RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC and RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC, as shown in the following examples:
    CALL DVSYS.SETCONFIGPROPERTY('RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC', '10', '', ?, ?);    -- default 20s, minimum 5s
    CALL DVSYS.SETCONFIGPROPERTY('RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC', '30', '', ?, ?);    -- default 120s, minimum 5s
    Decreasing the RDB_CONNECTION_IDLE_SHRINK_TIMEOUT_SEC and RDB_CONNECTION_IDLE_DEACTIVATE_TIMEOUT_SEC settings might help if there are small gaps of complete inactivity that were previously too short for the Data Virtualization shrink and deactivate timeouts to take effect.
Schema map refresh in-progress message appears for reloaded connections that do not require a schema map refresh

Applies to: 5.0.0 and later

The Schema map refresh in-progress message appears when you reload connections in Data Virtualization, even when the data source does not require a schema map refresh.
Only connections from data sources such as Google BigQuery, MongoDB, SAP S/4HANA, and Salesforce.com require a schema map refresh to update any changes in tables and columns for existing connections.

Data virtualization issues

Publishing an object that uses double quotation marks (") in its name fails with SQL0010N error

Applies to: 5.0.0 and later

When you publish an object that uses double quotation marks (") in its name, the action fails and this error displays:
SQLExecDirectW: {42603} [IBM][CLI Driver][DB2/LINUXX8664] SQL0010N The string constant beginning with "" WHERE 0=1 FOR READ ONLY WITH UR" does not have an ending string delimiter. SQLSTATE=42603
Workaround: When you are virtualizing the objects, remove the double quotation marks from the object names.
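For example, a sketch that is modeled on the DVSYS.virtualizetable call shown later in this topic; the connection ID, schema, and table names are hypothetical. The remote table name keeps its double quotation mark, but the local virtual table name omits it:

db2 "connect to bigsql"
# Virtualize the remote table MY"TABLE under the local name MY_TABLE (no quotation marks).
db2 "call DVSYS.virtualizetable('MY\"TABLE', '', 'MYSCHEMA', 'MY_TABLE', '', '<connection_id>:<remote_schema>', 'Y', 'N', -1, '', ?)"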
Previewing Cloud Object Storage (COS) data might cause a "This flat file is invalid" error

Applies to: 5.0.0 and later

When you navigate to the Virtualize page and then select Files to preview COS data, you might encounter this error: This flat file is invalid. Try refreshing or remove the flat file before continuing. This error occurs when a database connection becomes inactive for an extended time without COS connectivity, which causes the database connection to time out and terminate.

Workaround: Activate the database connection by navigating to the Virtualize page and then virtualizing COS assets.

Leaving the page during a bulk deletion of tables causes errors
Applies to: 5.0.0 and later

If you delete tables in bulk, for example more than 30 at the same time, don't leave the page until the deletion is complete.

The web client does not display fetch phase warnings with the result set
Applies to: 5.0.0
When you run a query from the Run SQL page, fetch phase warning messages for queries with warnings or errors are not displayed as part of the Result set.
Workaround: Use an external SQL client such as DbVisualizer or DBeaver and turn on SQL tracing to display the fetch phase warnings in the SQL trace logs.
The Reload in progress bar in the web client remains visible even after reaching 100%
Applies to: 5.0.0
Fixed in: 5.0.1
When you finish reloading your tables in the Virtualize > Tables > List view, the Reload in progress bar remains visible for a significant amount of time even after reaching 100%.
Fetch phase errors that might occur on grouped virtualized tables return a null value
Applies to: 5.0.0, 5.0.1, and 5.0.2
Fixed in: 5.0.3
When you virtualize two similar tables from different data sources and then query the data from these virtualized tables, any fetch phase error that occurs while Data Virtualization is reading a table is not displayed. Instead, Data Virtualization marks the affected row or rows as empty and returns a null value.
Data Virtualization publishes, virtualizes, and deletes only the objects that are displayed on the current page of the web client

Applies to: 5.0.0 and later

When you publish, virtualize, or delete objects that you selected in bulk in the web client (for example, when you select 300 or more objects to publish or 30 or more objects to virtualize or delete), Data Virtualization completes the action only for the objects that are displayed on the current page. You must repeat the action for each page.
Workaround: To work around this issue, create and run scripts to virtualize, publish, and delete objects in bulk by completing the following tasks. You must be a Data Virtualization administrator to complete these tasks.

  1. Run the following command to log in to the Data Virtualization head pod:
    oc rsh c-db2u-dv-db2u-0
  2. Run the following command to switch to the db2inst1 user:
    su - db2inst1
  3. Complete any of the following tasks.
    • Virtualize objects in bulk by completing these steps:
      1. Create a file with the following information, and then save the file as bulk-virtualize.sh.
        Note: Skip this step if you already have this script.
        #!/bin/bash
        db2 "connect to bigsql"
        
        # Substitute values below as needed
        cid=<connection_id>;
        remote_schema=<remote_schema_name>;   # remote schema name
        local_schema=<local_schema_name>;  # local schema name
        
        # Pull out the full list of tables for the schema
        db2 "export to tables.out of del modified by coldel| select TNAME from DVSYS.LISTTABLES where CID='${cid}' and FILTER='null,${remote_schema},null,null'"
        
        # Run virtualize for each table discovered
        for nick in `cat tables.out | sed -e 's/"//g'`
        do 
          db2 "call DVSYS.virtualizetable('${nick}', '' ,'${local_schema}', '${nick}', '' ,'${cid}:${remote_schema}', 'Y', 'N', -1, '' , ?)"
        done
      2. Run the following command to open the script in an editor:
        vi bulk-virtualize.sh
      3. Edit the following values in the script: cid, remote_schema, and local_schema.
      4. Run the following command to add run permissions to the script:
        chmod +x ./bulk-virtualize.sh
      5. Run the following command to run the script:
        ./bulk-virtualize.sh
      6. Run the following command to connect to the database:
        db2 connect to bigsql
      7. Run the following command to grant all permissions:
        db2 "GRANT ALL ON SCHEMA <schema_name> TO USER CPADMIN"
    • Publish objects in bulk by completing these steps:
      1. Create a file with the following information, and then save the file as pub-obj.sh.
        Note: Skip this step if you already have this script.
        # Common settings specific to your cluster, substitute values as needed
        #set -x
        cpd_url=<cluster_url>
        dmc_inst=<dmc_instance>
        ns="cpd-instance"
        cpd_login="cpadmin"
        cpd_passwd=<cpd_password>
        dv_inst=<dv_instance>
        
        # Get bearer token
        token=`curl -k -s -X POST ${cpd_url}/icp4d-api/v1/authorize -H 'Content-Type: application/json' -d '{"username":"'"$cpd_login"'", "password":"'"$cpd_passwd"'"}' | jq -r ".token"`
        
        #echo "Token set to $token"
        
        tbl_name=$1;
        schema=<schema_name>
        cat_name=<catalog_name>
        cat_id=<catalog_id>
        
        # Publish object
        curl -k -X POST "${cpd_url}/icp4data-addons/${dmc_inst}/${ns}/dbapi/v4/dv/catalog/publish" -H "Authorization: Bearer ${token}" -H "x-db-profile: ${dv_inst}" -H "X-Namespace: ${ns}" -H "Content-Type: application/json" -d "{\"assets\":[{\"table\":\"${tbl_name}\",\"schema\":\"${schema}\"}],\"allow_duplicates\":false,\"catalog_id\":\"${cat_id}\",\"catalog_name\":\"${cat_name}\"}"
        
        echo "\n Done"
        
        #set -x
        # Obtain results
        curl -k -X GET -H "Authorization: Bearer ${token}" -H "x-db-profile: ${dv_inst}" -H "X-Namespace: ${ns}" "${cpd_url}/icp4data-addons/${dmc_inst}/${ns}/dbapi/v4/dv/published_assignment_status?schema=${schema}&name=${tbl_name}&type=Publish&limit=3"
        echo "\n Done"
      2. Run the following command to open the script in an editor:
        vi pub-obj.sh
      3. Edit the following values in the pub-obj.sh script as needed: cpd_url, dmc_inst, ns, cpd_login, cpd_passwd, dv_inst, schema, cat_name, and cat_id.
      4. Create a file with the following information, and then save the file as bulk-publish.sh.
        Note: Skip this step if you already have this script.
        #!/bin/bash
        
        # This script calls pub-obj.sh in a loop through the list of tables provided.
        # The list of tables is expected in 'tables.out'. You need to set the schema, catalog name, and catalog ID in the pub-obj.sh script.
        
        for tasset in `cat tables.out`
        do
        				echo "Publishing table $tasset"
        				./pub-obj.sh $tasset
        done
      5. Run the following command to add run permissions to the script:
        chmod +x ./bulk-publish.sh
      6. Run the following command to run the script:
        ./bulk-publish.sh
    • Delete objects in bulk by completing these steps:
      1. Create a file with the following information, and then save the file as bulk-delete.sh.
        Note: Skip this step if you already have this script.
        # Common settings specific to your cluster, substitute values below as needed
        #set -x
        cpd_url=<cluster_url>
        dmc_inst=<dmc_instance>
        ns="cpd-instance"
        cpd_login="cpadmin"
        cpd_passwd=<cpd_password>
        dv_inst=<dv_instance>
        schema_name=<schema_name>
        virtual_name=<virtual_name>
        
        # Get bearer token
        token=`curl -k -s -X POST ${cpd_url}/icp4d-api/v1/authorize -H 'Content-Type: application/json' -d '{"username":"'"$cpd_login"'", "password":"'"$cpd_passwd"'"}' | jq -r ".token"`
        
        echo "Token set to $token"
        
        curl -k -X POST "${cpd_url}/icp4data-addons/${dmc_inst}/${ns}/dbapi/v4/dv/delete_virtualize_task" -H "Authorization: Bearer ${token}" -H "x-db-profile: ${dv_inst}" -H "X-Namespace: ${ns}" -H "Content-Type: application/json" -d "{\"delete_assets\": [{\"virtual_schema\": \""$schema_name"\",\"virtual_name\": \""$virtual_name"\",\"type\": \"table\"}]}"
      2. Run the following command to open the script in an editor:
        vi bulk-delete.sh
      3. Edit the following values in the script: cpd_url, dmc_inst, ns, cpd_login, cpd_passwd, dv_inst, schema_name, and virtual_name.
      4. Run the following command to add run permissions to the script:
        chmod +x ./bulk-delete.sh
      5. Run the following command to run the script:
        ./bulk-delete.sh 
Recreating virtualized tables after using the DROP NICKNAME SQL statement results in object already exists error

Applies to: 5.0.0 and later

When you use the DROP NICKNAME SQL statement to drop tables that have a defined nickname, attempting to recreate and virtualize the tables in the Review cart and virtualize tables page results in an Object already exists error.
Workaround: Use the delete API to remove all the data that is associated with the virtualized tables, and then attempt to virtualize the tables again by using the same nickname. The delete API does not indicate whether it ran successfully, and you might encounter an error. Regardless, attempt to virtualize the tables again. To use the delete API, you need an authorization token. For more information, see Delete virtualized data and Get authorization token.
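For example, a sketch of a delete API call, modeled on the bulk-delete script earlier in this topic; the URL, instance, and object names are placeholders, and the token request follows Get authorization token:

# Obtain a bearer token, then delete the data that is associated with the virtualized table.
token=$(curl -k -s -X POST <cluster_url>/icp4d-api/v1/authorize -H 'Content-Type: application/json' \
  -d '{"username":"cpadmin","password":"<cpd_password>"}' | jq -r ".token")
curl -k -X POST "<cluster_url>/icp4data-addons/<dmc_instance>/<namespace>/dbapi/v4/dv/delete_virtualize_task" \
  -H "Authorization: Bearer ${token}" -H "x-db-profile: <dv_instance>" -H "X-Namespace: <namespace>" \
  -H "Content-Type: application/json" \
  -d '{"delete_assets": [{"virtual_schema": "<schema_name>", "virtual_name": "<virtual_name>", "type": "table"}]}'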
Data that is virtualized with double-byte coded character sets and special characters might appear truncated

Applies to: 5.0.0 and later

When you virtualize a table with a double-byte coded character set and special characters, such as Swedish characters, data might appear to be cut off.

Workaround: When virtualizing a table, use the Edit columns menu to extend the field lengths for those columns.

You cannot use special characters such as semicolons in schema names when you virtualize a table in MongoDB on Cloud

Applies to: 5.0.0 and later

You cannot virtualize a table in a MongoDB on Cloud data source that has a semicolon in the schema name. The MongoDB on Cloud data source removes the semicolon from the schema name, and queries of this virtual table fail with error SQL5105N.

Virtualizing large tables might be slow or might fail

Applies to: 5.0.0 and later

When you virtualize multiple large tables from data sources, you might see the following messages in the Virtualize objects dialog box.
Error: CREATE OR REPLACE NICKNAME -- DB2 SQL Error: SQLCODE=-1229, SQLSTATE=40504, SQLERRMC=null, DRIVER=4.29.24
Or
Error: CREATE OR REPLACE NICKNAME -- "<schema>"."<table>" already exists in Server "<server>" (use REPLACE=Y to overwrite)
Or
The table <schema>.<table> already exists. Use REPLACE=Y to overwrite CallStack

This issue occurs even when you do not have tables with the same schema name and table name. The root cause is a timeout and retry sequence that is triggered by HAProxy timeouts. To work around this issue, ensure that your HAProxy settings meet the recommended values. For more information, see Configuring network requirements for Data Virtualization and Changing load balancer timeout settings. If virtualization of large files takes longer than 5 minutes, increase the HAProxy settings and retry the virtualization.
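For example, the following sketch shows one way to increase the HAProxy timeout by annotating the OpenShift route for Cloud Pak for Data. The route name cpd and the 300s value are examples only; confirm the recommended values in Changing load balancer timeout settings.

# Increase the HAProxy timeout on the Cloud Pak for Data route (example value).
oc annotate route cpd -n <CPD_INSTANCE_NAMESPACE> --overwrite haproxy.router.openshift.io/timeout=300s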

Deleting a virtual table causes a dependent object error

Applies to: 5.0.0 and later

When you create a virtual table from a Db2 source and then create a cache that is based on the virtual table, you might get a dependent object error when you delete the cache, and then attempt to delete the virtual table.

Workaround:

Choose one of the following workarounds:

  • Re-create and test the cache, and then delete the virtual table:
    1. Click Add Cache.
    2. Enter a valid cache definition and then select Test Queries > Input SQL queries.
    3. Provide one or more valid SQL queries for matching.
    4. Select Add Queries > Test cache.
    5. Exit the cache creation steps to return to the virtual objects that need to be deleted.
    6. Attempt to delete the virtual table again.
  • Delete the temporary objects from the Run SQL editor by running the command: drop table CACHESYS.DVTMPMxxxx
    Note: Only delete objects that begin with DVTMPxxx, and not other objects such as DVxxxx or DVxxx_B.
Virtualizing a table with many columns or long column names might fail

Applies to: 5.0.0 and later

Virtualizing some tables in Data Virtualization might fail with a Virtual table creation failed message. This can happen for tables that contain columns with large STRING and CLOB data. Virtualizing the table fails with an error message even if you reduce the column width or choose just one of the columns.

Workaround: See Virtualizing a table with many columns or long column names fails in Data Virtualization.

Preview of tables to be virtualized might not be correct when the table contains a column that is named id

Applies to: 5.0.0 and later

If you try to virtualize a table that contains a column that is named id, preview shows only the column value of the first row in the table. This issue occurs because the keyword id is reserved for the user interface that displays the table.

In addition, if a table contains a column that is named id and multiple rows have identical values in that column, those rows display only the values of the first row that has the same id value.

For example, see the following table.
id   a2           a3
1    C1-sjis-ok   A0_*&
1    abc          123
This table appears as follows because the second row is replaced with the data from the first row.
id   a2           a3
1    C1-sjis-ok   A0_*&
1    C1-sjis-ok   A0_*&

Workaround: Rename the column of the table to a unique name that is not id. If you cannot rename the column name, you must ensure that values in the column are unique.
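If you cannot rename the column, the following sketch (with hypothetical schema and table names) checks whether the id column contains duplicate values:

db2 "connect to bigsql"
# List id values that occur more than once; rows with these values display incorrectly in the preview.
# Quote the column name (for example, "id") if it is defined in lowercase.
db2 "SELECT ID, COUNT(*) AS OCCURRENCES FROM MYSCHEMA.MY_VIRTUAL_TABLE GROUP BY ID HAVING COUNT(*) > 1"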

Installation and upgrade issues

Instance upgrades from 4.8.3 to 4.8.7 fail with the error message Error updating/upgrading database objects

Applies to: 5.0.0, 5.0.1, and 5.0.2

Fixed in: 5.0.3

When you upgrade a Data Virtualization service instance from 4.8.3 to 4.8.7, the upgrade fails with the error message Error updating/upgrading database objects in the head pod c-db2u-dv-db2u-0.

Before you proceed to the workaround, you must check the file /mnt/blumeta0/home/db2inst1/ibm/bigsql/hive-site.xml in the head pod for jdbc:mysql. If you do not see jdbc:mysql, then this specific issue does not apply to you.

Workaround:
Complete the following steps to resolve this issue:
  1. Scale down the c-db2u-dv-db2u StatefulSets by running this command:
    oc scale sts c-db2u-dv-db2u -n ${PROJECT_CPD_INST_OPERANDS} --replicas=0
    The command returns:
    statefulset.apps/c-db2u-dv-db2u scaled
  2. Wait for the pods to be deleted.

    Check whether there are any pods by running this command:

    oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | grep c-db2u-dv-db2u

    If the pods were deleted, the command returns nothing.

  3. Scale up the c-db2u-dv-db2u StatefulSets by running this command:
    oc scale sts c-db2u-dv-db2u -n ${PROJECT_CPD_INST_OPERANDS} --replicas=2
Existing Db2 for z/OS source server connections upgraded from Cloud Pak for Data version 4.7 or 4.8 are displayed as Invalid

Applies to: 5.0.0, 5.0.1, 5.0.2 and 5.0.3

Connections to Db2 for z/OS servers that accept only cleartext usernames and passwords might show as Invalid on the Data sources page after the connection is upgraded from Cloud Pak for Data version 4.7 or 4.8 to 5.0.0 or 5.0.3. The issue is caused by the latest Db2 driver update, which now enforces encrypted credentials by default.
Workaround: The primary solution is to configure the source of your data source to accept encrypted credentials. If you're unable to do that, then revert to the old driver or try the alternative workaround to keep the new driver.
  • Revert to the old driver in the upgraded cluster
    Revert to the old driver in the upgraded cluster by completing these steps.
    1. Download the db2jcc4-4.29.24.jar file from Fix Central.
    2. Save this text into a new .txt file named patch.txt. Ensure you preserve the indentation.
            if [[ -e /mnt/blumeta0/db2jcc4-4.29.24.jar ]]; then
              log_info "Replacing db2jcc4-4.33.31.jar with db2jcc4-4.29.24.jar in /opt/ibm/qpendpoint/sysroot/lib/"
              rm -f /opt/ibm/qpendpoint/sysroot/lib/db2jcc4-4.33.31.jar
              cp /mnt/blumeta0/db2jcc4-4.29.24.jar /opt/ibm/qpendpoint/sysroot/lib/
            fi
    3. Log in to the OpenShift® Container Platform (OCP) cluster and copy the db2u-dv-setup.sh from the Data Virtualization head pod by running this command. Replace <DV_NAMESPACE> with the namespace of each Data Virtualization instance.
      Note: You must repeat this step for each Data Virtualization instance running in a tethered namespace.
      oc -n <DV_NAMESPACE> cp c-db2u-dv-db2u-0:/opt/dv/current/db2u-dv-setup.sh db2u-dv-setup.sh
    4. Insert the patch.txt file contents into the script by running this command:
      sed -i '/Restart BigSQL before starting QueryPlex/r patch.txt' db2u-dv-setup.sh
    5. Create the ConfigMap by running this command. Replace <DV_NAMESPACE> with the namespace of each Data Virtualization instance.
      Note: You must repeat this step for each Data Virtualization instance running in a tethered namespace.
      oc create configmap db2u-dv-setup-sh -n <DV_NAMESPACE> --from-file=db2u-dv-setup.sh=db2u-dv-setup.sh -o yaml --dry-run=client | oc apply -f -
    6. Apply the patch by mounting the ConfigMap to the StatefulSet by running this command. Replace <DV_NAMESPACE> with the namespace of each Data Virtualization instance.
      Note: You must repeat this step for each Data Virtualization instance running in a tethered namespace.
      oc set volume statefulset/c-db2u-dv-db2u -n <DV_NAMESPACE> --add --name=db2u-dv-setup-sh --type=configmap --containers=db2u --mount-path=/opt/dv/current/db2u-dv-setup.sh --sub-path=db2u-dv-setup.sh --configmap-name=db2u-dv-setup-sh --default-mode=755 --read-only=false --overwrite

      The head and worker pods restart.

    7. Optional: In the case you need to undo the patch, remove the ConfigMap mount from the StatefulSet by running this command. Replace <DV_NAMESPACE> with the namespace of each Data Virtualization instance.
      Note: You must repeat this step for each Data Virtualization instance running in a tethered namespace.
      oc set volume statefulset/c-db2u-dv-db2u -n <DV_NAMESPACE> --remove --name=db2u-dv-setup-sh
  • Use the new driver
    If you do not want to revert to the old version of the driver, you can retain the new driver and use the connections by completing these steps.
    1. Run the following command to find the endpoint that the connection created:
      select * from dvsys.listrdbc

      The endpoint displays as one entry.

    2. Open the corresponding folder with the file path /mnt/dv_data/qpendpoints/data<x>. Replace <x> with the qpendpoint number.

      For example: If the connection is created at qpendpoint_1, then the folder you should select is /mnt/dv_data/qpendpoints/data1.

    3. Edit gaiandb_config.properties and add the following property. Replace <Connection_ID> with the Db2 connection ID.
      <Connection_ID>_JDBC_PROP=securityMechanism=3
      For example:
      DB210027_JDBC_PROP=securityMechanism=3
    4. Save the file.

      The engine automatically applies these changes when the configuration file updates.

Db2REST pod in CrashLoopBackOff state

Applies to: 5.0.0 and later

You cannot connect to the Db2 REST server. After you enable REST, the REST pod enters the CrashLoopBackOff state, restarts repeatedly, and never becomes ready.
You see the following log:
chmod: cannot access '/db2u/tmp/os_envar_configmap': No such file or directory
/db2u/scripts/include/common_functions.sh: line 23: /db2u/tmp/os_envar_configmap: No such file or directory
ls: cannot access '/opt/ibm/dbrest/auth': No such file or directory
......+.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*............+..+.+.....+.......+...........+....+..+....+..............+...+...+.......+...+............+...+....................+.+...+..+.........+....+.......................+............+...+....+........+.+.....+.+.....+....+...............+.....+.+.....+......+....+..................+......+..+....+...+..+....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
...........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*..........+.........+.....+......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----
Starting up IBM Db2 REST container...
Starting the Db2 REST server...
Starting server with https
/opt/ibm/dbrest/scripts/../bin/db2rest-server: error while loading shared libraries: libcrypt.so.1: cannot open shared object file: No such file or directory
SERVER STATUS: INACTIVE
#############################################################
###  IBM Db2 REST container was started successfully   ###
#############################################################
* If you used docker/oc/kubectl logs to monitor progress, detach from the
console by pressing Ctrl+C.
* To get a command line prompt, issue the following command from the host:
    [ docker | oc | kubectl -n <NS> ] exec -it <rest container name> bash
#############################################################
Workaround: Complete the following steps to resolve this issue:
  1. Find your affected db2ucluster resource by running the following command:
    oc get db2ucluster
    Refer to the following example output:
    NAME      STATE      AGE
    db2u-dv   Ready      34m
  2. Disable REST for your db2ucluster.
    1. Run the following command to edit your db2ucluster:
      oc edit db2ucluster deployment_ID
    2. Set rest to false in your CR. Refer to the following example:
      spec:
        ...
        addOns:
          rest:
            enabled: false
  3. Save your db2u-release ConfigMap:
    oc get cm db2u-release -n ${PROJECT_CPD_INST_OPERATORS} -o yaml > db2u-release.yaml
  4. Extract, then replace your REST images.
    1. Extract your Db2 11.5.9.0 REST image:
      REST_IMAGE_1159=$(oc get cm db2u-release -n ${PROJECT_CPD_INST_OPERATORS} -o=jsonpath='{.data.json}' | jq '.databases.db2u["11.5.9.0"].images.rest')
    2. Extract your Db2 12.1.0.0 REST image:
      REST_IMAGE_CN1=$(oc get cm db2u-release -n ${PROJECT_CPD_INST_OPERATORS} -o=jsonpath='{.data.json}' | jq '.databases.db2u["12.1.0.0"].images.rest')
    3. Replace your Db2 12.1.0.0 REST image with your 11.5.9.0 REST image:
      sed -i.bak "s|\"rest\": $REST_IMAGE_CN1|\"rest\": $REST_IMAGE_1159|g" db2u-release.yaml
  5. Apply your changes to your ConfigMap:
    oc apply -f db2u-release.yaml -n ${PROJECT_CPD_INST_OPERATORS}
  6. Delete your ConfigMap YAML file:
    rm db2u-release.yaml
  7. Enable REST for your db2ucluster.
    1. Run the following command to edit your db2ucluster:
      oc edit db2ucluster deployment_ID
    2. Set rest to true in your CR. Refer to the following example:
      spec:
        ...
        addOns:
          rest:
            enabled: true
Cannot provision Data Virtualization instance when the instance short name has underscore characters

Applies to: 5.0.0 to 5.0.2

Fixed in: 5.0.3

You cannot provision a Data Virtualization instance when the instance short name has underscore characters. However, you receive the following error message, which indicates that the problem is that the namespace already has a running instance:
"Service instance exist" Namespace <name> already has a Data Virtualization instance running. Select a different namespace
This issue applies only when you provision new instances. It does not impact existing instances.
Workaround: Although the error message is misleading, the workaround is to avoid using underscores in the instance short name.
Resolution: Starting in Data Virtualization on Cloud Pak for Data 5.0.3, the Short name field prohibits the use of underscore characters so that you cannot create an instance short name with an underscore.
Data Virtualization instance upgrade does not complete due to a timing issue

Applies to: 5.0.0 to 5.0.2

Fixed in: 5.0.3

Due to a timing issue, upgrades of your Data Virtualization instances might not complete properly.
Complete the following steps to diagnose this issue:
  1. Run the following command to upgrade the instance:
    cpd-cli service-instance upgrade --all --service-type dv --profile <dv profile>
    Look for the following message that indicates that the upgrade is not complete:
    ###INFO: waiting for dv instance to be Upgraded ...
    # INSTANCE Not ready yet - check if INSTANCE status indicates completion, max retries left 45 times w/ 120s delay...
  2. List the service instances by running the cpd-cli service-instance list command. The service instance status is Failed.

  3. Complete the following steps to determine the reason that the upgrade did not complete:
    1. In the Data Virtualization head pod log, look for the following error message that is displayed repeatedly as the last section of the log:
      + bigsql_getPyhsicalWorkerCount db2inst1 + [[ 0 -gt 0 ]] + log_warn 'Must have at least one worker. Wait for 5 seconds then recheck' + log_entry WARN 'Must have at least one worker. Wait for 5 seconds then recheck'
    2. In the Big SQL upgrade log from the worker pod /var/log/bigsql/cli/bigsql*upgrade*.log, look for the following error:
      2024-04-10.04:22:28.841 @ Line(00242) @ Fun update_xml_property: Failed to update property dfs.encrypt.data.transfer to value true in file /etc/hadoop/conf/hdfs-site.xml 2024-04-10.04:22:29.521 @ Line(00242) @ Fun update_xml_property: Failed to update property dfs.block.access.token.enable to value true in file /etc/hadoop/conf/hdfs-site.xml
      
    3. In the Data Virtualization worker pod /etc/hadoop/conf directory, confirm that the hdfs-site.xml file is missing and that the worker pod log file indicates that the file is not found, as shown in the following example:
      2024-04-10T04:22:29.481599083Z ERROR:root:[Errno 2] No such file or directory: '/etc/hadoop/conf/hdfs-site.xml'
      2024-04-10T04:22:29.481599083Z ERROR:root:Failed to execute 'set-property'
      2024-04-10T04:22:29.578758181Z Failed to update property dfs.block.access.token.enable to value true in file /etc/hadoop/conf/hdfs-site.xml
      
If your instance meets all of the criteria in the preceding diagnostic steps, complete the workaround.
Workaround: Complete the following steps to resolve this issue:
  1. Run the following commands to log in to the worker pod and download a copy of the configuration:
    oc -n <dv instance namespace> rsh c-db2u-dv-db2u-1 bash
    su - db2inst1
    db2uctl copy HEADNODE:/etc/hadoop/conf/ /etc/hadoop/conf/
  2. Run the following command to restart the worker pod:
    oc -n <dv instance namespace> delete pod c-db2u-dv-db2u-1
Some data source definition (DSD) assignments do not work properly after upgrading to 5.0.0

Applies to: 5.0.0 and later

When you have Data Virtualization installed on a previous version of Cloud Pak for Data, and then you install IBM Knowledge Catalog on Cloud Pak for Data on 5.0.0, the Data Virtualization data source definition is not automatically associated with the Data Virtualization connections in catalogs.

Workaround: Delete existing endpoints of the Data Virtualization data source definition in Platform Connections and re-add them. For more details, see Editing, deactivating, activating, or deleting data source definitions.

User and group management issues

Privileges and authorities that are granted to user groups are not considered when you create views

Applies to: 5.0.0 and later

Privileges and authorities that are granted to user groups are not considered when you create views. This limitation is a result of a Db2 limitation on groups.

For more information, see Restrictions on the use of group privileges when executing DDL statements or binding packages.

You can also grant public access on your objects for all roles or all Data Virtualization users and then restrict access by using data protection rules that are defined on groups. For more information, see Governing virtual data with data protection rules in Data Virtualization.

See also SQL0727N error when you query view results in Data Virtualization.

Workaround: Instead of granting access to specific groups from the Manage access page, you should grant access to the public by selecting All data virtualization users. Then create a new data protection rule to define Allow or Deny rules based on groups.

See Governing virtual data with data protection rules in Data Virtualization.
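If you prefer SQL to the Manage access page, the following sketch shows a standard Db2 grant that makes a view available to all users (the schema and view names are hypothetical); you can then restrict access with data protection rules that are defined on groups:

db2 "connect to bigsql"
# Grant read access on the view to all users; group-based restrictions are then enforced
# through data protection rules rather than Db2 group privileges.
db2 "GRANT SELECT ON MYSCHEMA.MY_VIEW TO PUBLIC"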

Previously selected users get unchecked when you add multiple users in the Add users and groups window

Applies to: 5.0.0 and later

You cannot add multiple users or groups in one attempt by using the search bar and selecting users in the Add users and groups window. After you use the search bar to find and add users, previously selected users get unchecked as you search for and add more users.

Workaround: Select and add users directly by using the table or by using the browser's search function.

Data governance issues

Objects with the slash (/), backslash (\), or both characters in their object or schema name cannot be governed
Applies to: 5.0.0 and later
Publishing an object or schema to a catalog or project fails if the object or schema name contains the slash (/), backslash (\), or both characters.
Note: Data protection rules do not apply to objects with a slash (/) or backslash (\) character in their object or schema name.
The failed action displays this error:
The assets request failed: CDICO2034E: The path (/CPADMIN/D/EMO] contains more path elements than defined for the connection [2).

Workaround: Avoid using the slash (/), backslash (\), or both characters when you name a Data Virtualization schema or object.
The List view on the Virtualize page does not consistently display tables or views
Applies to: 5.0.0 and later
For data sources that have personal credentials enabled, the List view does not accurately display the tables or views that the user has access to. This issue occurs because the List view displays tables or views that are relevant to the CCDEFINER user.
Workaround: Use the Explore view to view the tables and views that you have access to.
You cannot apply masking rules with business terms when strict term enforcement is enabled
Applies to: 5.0.0 and later
After you enable strict term enforcement and publish assets to the default catalog, the term relationships are preserved, but the assets and column masking details do not appear under the Related content tab of the business term.
Workaround: Unassign and reassign the terms manually on the published virtual objects.
Profiling results can be viewed in projects for ungoverned objects
Applies to: 5.0.3
To prevent unexpected exposure to value distributions through the profiling results of a view, all users are denied access to profiling results in Data Virtualization views in all catalogs and projects. However, if a view is ungoverned (not published or added to a governed catalog), its profiling results can be viewed by all project collaborators who are authorized to access the catalog asset, even if they are not authorized to query the view.
Workaround: Publish the view to a governed catalog to make it subject to governance.
Partial redaction and length-preserving full redaction of empty strings returns a string of “X” characters
Applies to: 5.0.0
Fixed in: 5.0.3
When you mask empty strings by using partial redaction or length-preserving full redaction data protection rules, Data Virtualization returns a string of "X"s (or another character, depending on the rule definition) instead of returning an empty string. For more information on masking, see Masking virtual data in Data Virtualization.
RCAC data protection rules are not applied to a view if both Deny access and masking IBM Knowledge Catalog rules are applied to the table referenced by the view
Applies to: 5.0.0 and later

If a table has both the Deny access rule and external RCAC (column and row filters) IBM Knowledge Catalog data protection rules applied to it, then the RCAC rules are not applied to any views that reference that table, leaving the view unmasked.

Workaround:

Use Data Virtualization Db2 authorizations to deny access and define IBM Knowledge Catalog data protection rules to mask data and filter rows instead of doing both through IBM Knowledge Catalog data protection rules. For more information, see Managing access to virtual objects in Data Virtualization.

Business terms that are automatically assigned as a result of the publish action in Data Virtualization cannot be used to enforce IBM Knowledge Catalog data protection rules
Applies to: 5.0.0 and later

When you virtualize an object in Data Virtualization and publish it at the same time, Data Virtualization assigns the business terms from the original data source to the published data asset. The automatically propagated business terms cannot be used to enforce IBM Knowledge Catalog data protection rules.

Workaround: Unassign and reassign the business terms manually on the published virtual data assets.
You cannot apply business terms when you virtualize files in data sources on the Files tab

Applies to: 5.0.0 and later

When you virtualize files in Data > Data virtualization > Virtualize, the Business terms column is not available for data sources on the Files tab. These data sources do not support business terms.

You cannot see business term assignments or grouped tables in the Explore view

Applies to: 5.0.0 and later

On the Virtualize page, you cannot see business term assignments and you cannot see grouped tables.

Workaround: Switch to the List view to see business term assignments and grouped tables.

Cannot see list of available tables in the strict virtualization mode

Applies to: 5.0.0 and later

In the strict virtualization mode (where you can see tables only if they have at least one column with a business term assignment), when you navigate to the Virtualize page, the console appears to be loading the table list for a while without showing any tables. Loading can be much slower than in the default virtualization mode because the console evaluates the list of eligible tables that can be virtualized based on term assignments to data source table and column names.

A virtualized object cannot be used in Cognos Dashboards without credentials and an appropriate role

Applies to: 5.0.0 and later

Using a virtualized object in Cognos Dashboards displays an error message if you do not enter credentials for the Data Virtualization connection or you do not have the correct role.

If you did not enter a username and password when you created the connection to the Data Virtualization data source, you see the following error: Missing personal credentials. If you are not assigned the Admin or Steward role for this object, you see the following error: Unable to access this asset.

To work around this issue, see A virtualized object cannot be used in Cognos Dashboards without credentials and an appropriate role in Data Virtualization.

Caching issues

The cache dashboard web client displays errors related to Invalid Operation: No data is retrieved for OUT parameter
Applies to: 5.0.0, 5.0.1, and 5.0.2
Fixed in: 5.0.3

You might encounter one or more error messages related to Invalid Operation: No data is retrieved for OUT parameter in the cache dashboard web client. This error occurs when you generate cache recommendations by using a past query workload that contains invalid queries, for example, queries on virtualized tables that no longer exist or queries that use a data source that is no longer accessible. The error might also cascade to other caching APIs, leading to intermittent API failures that can cause the caching pod to restart.
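
As a hypothetical illustration (schema and table names are made up), a query becomes invalid when the virtualized table that it references is later removed:

-- SALES.ORDERS_V existed when this query ran, but the virtual table has since been dropped.
SELECT REGION, SUM(AMOUNT) AS TOTAL
  FROM SALES.ORDERS_V
  GROUP BY REGION;
-- The query can no longer be analyzed, and workloads that contain queries like this one
-- can trigger the error that is described above.
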
Workaround:
  1. Complete one of the following options:
    • If autocaching is enabled, then temporarily narrow the query workload window by opening the autocaching configuration settings and entering a smaller value in Time period of queries (last x days).

      For example: If you narrow the query workload window to one day, then autocaching only analyzes queries from the past day, reducing the likelihood of encountering invalid queries. This might resolve the error as you run valid queries and older invalid queries are pushed out of the window each day, but if invalid queries exist in the selected time period, then this error might occur again.

    • If autocaching is disabled, then proceed to the next step.
  2. Run the following commands to restart the caching pod:
    Note: You must restart the caching pod every time you encounter this error.
    oc get pods -n <namespace> | grep dv
    oc delete pod <caching-pod> -n <namespace>
  3. Optional: If you want to generate cache recommendations, then temporarily narrow the query workload window by opening the engine recommendation configuration settings and entering a smaller value in Time period of queries to include in recommendation engine (last X days).

    For example: If you narrow the query workload window to one day, then the cache recommendation engine only analyzes queries from the past day, reducing the likelihood of encountering invalid queries. This might resolve the error as you run valid queries and older invalid queries are pushed out of the window each day, but if invalid queries exist in the selected time period, then this error might occur again.

  4. After the invalid queries are pushed out of the window, you can expand the query workload window again; it does not need to remain narrow permanently. However, this error might recur if invalid queries exist within the newly selected time period.

Online backup of Data Virtualization can affect cache refreshes

Applies to: 5.0.0 and later

If you schedule an online backup in Data Virtualization and a cache refresh operation runs at the same time, the cache write operation can fail.

Online backup in Data Virtualization works by quiescing database write and update operations. Cache refresh is a write operation that updates Data Virtualization caches with fresh data from remote data sources. Both online backup and cache refresh operations can be scheduled to run periodically, and these schedules can conflict: when a cache tries to refresh, an online backup might already be in progress, or one might start while the cache refresh is running. In these scenarios, the cache refresh operation can fail with an error similar to the following message, and the cache remains active but contains stale data.

The statement failed because a Big SQL component encountered an error. 
Component receiving the error: "BigSQL IO". Component returning the error: "UNKNOWN". 
Log entry identifier: "[BSL-1-b1bf428f]". Reason: "Cannot add block to /warehouse".. 
SQLCODE=-5105, SQLSTATE=58040, DRIVER=4.27.25

Cache auto-refresh might also fail; you can either refresh the cache manually or wait for the next auto-refresh. However, if the online backup is scheduled such that it conflicts with a cache's refresh schedule, every auto-refresh of that cache will fail.

Workaround: Reschedule the cache's refresh or the online backup schedule to avoid conflicts. You can also refresh the cache manually at any time.

File issues

You cannot preview long string values in headings in CSV, TSV, or Excel files

Applies to: 5.0.0 and later

When you use the first row as column headings, the values in that row must not exceed the maximum Db2 identifier length of 128 characters and must not be duplicated. If the header row contains values that are too long or duplicated, an error message is displayed when you try to preview the file in Data Virtualization.
400: Missing ResultSet:java.sql.SQLSyntaxErrorException: Long column type column or parameter 'COLUMN2' not permitted in declared global temporary tables or procedure definitions.

Column heading names are case-insensitive and converted to uppercase in the API response, which is exposed by the console. Therefore, a column that is named ABC is considered the same as a column that is named abc. However, the columns can be renamed to mixed case when you virtualize your data source.

Workaround: Review the first row in the data file that contains the intended column heading names and make the necessary changes to avoid this limitation, as in the following example.
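
A hypothetical example: the header row

customer_id,customer_name,CUSTOMER_NAME,notes

fails the preview because customer_name and CUSTOMER_NAME resolve to the same uppercase identifier, while a header row such as

customer_id,customer_name,customer_name_alt,notes

previews successfully because all names are unique and within the 128-character limit.
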
You might encounter errors when you virtualize large Excel files

Applies to: 5.0.0 and later

You might experience an error when you preview or virtualize large files with Excel format (XLS):
ERROR 400 'Your InputStream was neither an OLE2 stream nor an OOXML stream' was thrown while evaluating an expression.
Workaround: Complete the following steps:
  1. Load the file data into a table on a supported data source, as shown in the sketch after these steps. For more information, see Supported data sources.
  2. Virtualize the table. For more information, see Creating a virtualized table.
  3. Optional: Change the format of the file, for example to CSV format, before you virtualize it again.
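
A minimal sketch of step 1, assuming the supported data source is a Db2 database and that the spreadsheet was exported to a hypothetical CSV file named sales_2024.csv:

-- Create a table that matches the spreadsheet columns.
CREATE TABLE SALES.ORDERS (ORDER_ID INTEGER, REGION VARCHAR(64), AMOUNT DECIMAL(12,2));
-- Load the exported CSV file into the table, for example from the Db2 command line
-- (remove or skip the header row as needed):
-- db2 "IMPORT FROM sales_2024.csv OF DEL INSERT INTO SALES.ORDERS"
-- Then virtualize SALES.ORDERS as described in step 2.
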
Encoding detection override for file data with Japanese characters

Applies to: 5.0.0 and later

For text files exposed by remote connectors, Cloud Pak for Data automatically detects the encoding scheme of flat data files, such as CSV and TSV files. However, to avoid decoding issues, it is recommended that you set the encoding scheme manually for flat data files. For more information, see Setting encoding scheme.

Resolved issues

The Explore view might not display schemas for Hive data sources due to DVSYS.LISTSCHEMAS API timeout
Resolved in: 5.0.0
When you run the DVSYS.LISTSCHEMAS API, it might take longer to complete for Hive data sources. By default, the timeout for the DVSYS.LISTSCHEMAS API is set to 30 seconds. If the API does not finish within this time, you cannot view the list of schemas in the Explore view. To resolve this issue, adjust the default timeout value to allow the API more time to complete.
Workaround:
  1. In Run SQL, run the following query:
    SELECT * FROM DVSYS.LISTRDBC
  2. From the results, find the row that matches the connection that you would use to run the LISTSCHEMAS API, and note the value in the NODE_NAME column.
  3. Replace the values in the following stored procedure and then run it.
    • Replace <ms> with the new timeout value in milliseconds.
    • Replace <node_name> with the value in the NODE_NAME column from the previous step.
    CALL DVSYS.SETCONFIGPROPERTY('SOURCE_CATALOG_API_FOREGROUND_TIMEOUT_MS', <ms>,'<node_name>', ?, ?)
    Example: This stored procedure allows a 2-minute timeout on the LISTSCHEMAS API when the remote source is connected to the qpendpoint_1:6415 agent.
    call dvsys.SETCONFIGPROPERTY('SOURCE_CATALOG_API_FOREGROUND_TIMEOUT_MS',
          120000,'qpendpoint_1:6415', ?, ?)
Upgrades from Data Virtualization on Cloud Pak for Data fail because EXPLAIN tables are not migrated

Resolved in: 5.0.0

If you are upgrading Data Virtualization on Cloud Pak for Data, and you created EXPLAIN tables in the earlier version, the upgrade fails because the EXPLAIN tables are not migrated.

Access control issues when you preview assets with masked or filtered data

Resolved in: 5.0.0

When you preview, download, or refine Data Virtualization data assets in Cloud Pak for Data services other than Data Virtualization (for example, IBM Knowledge Catalog, Watson Studio, or Data Refinery) and data masking or row-level filtering applies, the preview is subject only to the data protection rules and to catalog or project access control. The Data Virtualization internal access controls, which you manage by using Manage access in the Data Virtualization UI, do not apply to these previews.
Upgrades to Data Virtualization on Cloud Pak for Data fail with the message "Must have at least one worker. Wait for 5 seconds"

Resolved in: 5.0.0

If you are upgrading from an earlier version of Data Virtualization on Cloud Pak for Data, the upgrade might fail due to a timing issue. The Data Virtualization head pod c-db2u-dv-db2u-0 log repeatedly prints the "Must have at least one worker. Wait for 5 seconds" message.

Data Virtualization connections to Microsoft Azure Data Lake Storage Gen 2 might experience delays, timeouts, or hangups when using TLSv1.3 with specific versions of JDK

Resolved in: 5.0.0

Microsoft Azure Data Lake Storage supports TLSv1.3, which is not compatible with the version of Java (Java 8) that Data Virtualization is configured to use. Running queries might fail with an InvalidAbfsRestOperationException error.