Known issues and limitations for Data Refinery

The following issues and limitations apply to Data Refinery.

Known issues

See also Known issues for Common core services for issues in other Cloud Pak for Data services that might affect Data Refinery.

Tokenize GUI operation does not work for Data Refinery on R 4.3 with Spark 3.4 or on Default Spark 3.4 + R 4.3 environment on IBM Power

Applies to: 5.1.2 and later

Data Refinery flow jobs that include the Tokenize GUI operation do not work in these Spark and R environments. For small data sets, you can use the Default Data Refinery XS environment instead.

Data Refinery might fail when upgrading IBM® Software Hub to version 5.1.2

Applies to: 5.1.2 and later

When upgrading Watson Studio or IBM Knowledge Catalog to IBM Software Hub version 5.1.2 from an earlier version, Data Refinery might fail to complete.

Workaround: Apply this patch to ensure that Data Refinery completes the upgrade successfully:
oc patch pvc volumes-datarefinerylibvol-pvc -n <CPD_INSTANCE_NAMESPACE> -p '{"metadata":{"finalizers":null}}' --type=merge
or in Segregation of Duty (SoD) mode:
oc patch pvc volumes-datarefinerylibvol-pvc -n <DATAPLANE_NAMESPACE> -p '{"metadata":{"finalizers":null}}' --type=merge
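After you apply the patch, you can optionally confirm that the finalizers were cleared. This check is a suggestion, not part of the documented workaround, and it assumes cluster access:

```shell
# Optional check: an empty result means the finalizers were removed
# and the PVC is no longer blocked from completing the upgrade.
oc get pvc volumes-datarefinerylibvol-pvc -n <CPD_INSTANCE_NAMESPACE> \
  -o jsonpath='{.metadata.finalizers}'
```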
Error opening a Data Refinery flow

Applies to: 5.1.0 and later

When you open the Data Refinery user interface, you might see the error The selected data set wasn't loaded. Error occurred while launching the container (retry attempts exceeded).

Workaround: Delete the existing interactive RuntimeAssembly (RTA) as follows:
oc -n <CPD_INSTANCE_NAMESPACE> delete rta -l type=service,component=shaper
or in Segregation of Duty (SoD) mode:
oc -n <DATAPLANE_NAMESPACE> delete rta -l type=service,component=shaper
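After the delete command completes, you can optionally verify that no interactive shaper runtime remains. This check is a suggestion, not part of the documented workaround, and it assumes cluster access:

```shell
# Optional check: "No resources found" means the stale interactive runtime
# is gone; reopening the Data Refinery flow starts a fresh one.
oc -n <CPD_INSTANCE_NAMESPACE> get rta -l type=service,component=shaper
```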
Target table loss and job failure when you use the Update option in a Data Refinery flow

Applies to: 5.1.0 and later

Using the Update option for the Write mode target property with relational data sources (for example, Db2) replaces the original target table, and the Data Refinery job might fail.

Workaround: Use the Merge option as the Write mode and Append as the Table action.

Concatenate operation does not allow you to put the new column next to the original column

Applies to: 5.1.0 and later

When you add a step with the Concatenate operation to your Data Refinery flow, and you select Keep original columns and also select Next to original column for the new column position, the step fails with an error.

You can, however, select Right-most column in the data set.

Google BigQuery connection: TRUNCATE TABLE statement fails in Data Refinery flow jobs

Applies to: 5.1.0 and later

If you run a Data Refinery flow job with data from a Google BigQuery connection and the DDL includes a TRUNCATE TABLE statement, the job fails.

Limitations

Error opening a Data Refinery flow with connection with personal credentials

When you open a Data Refinery flow that uses a data asset that is based on a connection with personal credentials, you might see an error.

Workaround: To open a Data Refinery flow that has assets that use connections with personal credentials, you must first unlock the connection. You can unlock the connection either by editing the connection and entering your personal credentials, or by previewing the asset in the project, where you are prompted to enter your personal credentials. After you unlock the connection, you can open the Data Refinery flow.

Data Refinery does not support the Satellite Connector

You cannot use a Satellite Connector to connect to a database with Data Refinery.

Data column headers cannot contain special characters
Data with column headers that contain special characters might cause Data Refinery jobs to fail with the error Supplied values don't match positional vars to interpolate.

Workaround: Remove the special characters from the column headers.
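If you prepare your files outside Data Refinery, one way to remove the special characters is to rewrite the CSV header row before you upload the file. The following sketch uses standard shell tools; the file name and column names are illustrative only, not taken from the product:

```shell
# Create a sample CSV whose header contains special characters (illustrative data).
printf 'order id#,amount ($),region%%\nA-1,10,east\n' > sample.csv

# Replace every header character that is not alphanumeric, an underscore,
# or the field separator with an underscore; keep the data rows unchanged.
header=$(head -n 1 sample.csv | sed 's/[^A-Za-z0-9_,]/_/g')
{ echo "$header"; tail -n +2 sample.csv; } > cleaned.csv

head -n 1 cleaned.csv   # order_id_,amount____,region_
```

Note that this rewrites only the header row, so the values in the data rows are not affected.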

Unable to use masked data in visualizations from data assets imported from version 4.8 or earlier

Applies to: 5.1.0 and later

If you import data assets with masked data from version 4.8 or earlier into your project, you cannot use those assets to create visualizations. If you attempt to generate a chart on the Visualizations tab of a Data Refinery data source from an imported asset that has masked data, you receive the following error message: Bad Request: Failed to retrieve data from server. Masked data is not supported.

Workaround: To use masked data from imported data assets in visualizations, you must configure your platform with Data Virtualization as a protection solution. For more information, see Data Virtualization as a protective solution in the IBM Cloud Pak® for Data documentation.

Tokenize GUI operation might not work on large data assets

Data Refinery flow jobs that include the Tokenize GUI operation might fail for large data assets.

Data Refinery does not support Kerberos authentication
Data Refinery does not support connecting to data sources that use Kerberos authentication.