Known issues and limitations for Data Refinery

The following issues and limitations apply to Data Refinery.

Known issues

Target table loss and job failure when you use the Update option in a Data Refinery flow
Google BigQuery connection: TRUNCATE TABLE statement fails in Data Refinery flow jobs

Limitations

Folder location doesn't automatically update when changing target asset in Data Refinery flow settings
Changes to interactive code templates
Special characters in column header names cause Data Refinery job errors
Data column headers cannot contain special characters
Data Refinery flows with connections that use vault secrets only work if the user has permission to access the vault secrets
Tokenize GUI operation does not work for Data Refinery on Spark and R environments
Tokenize GUI operation might not work on large data assets
Error opening a Data Refinery flow
Error opening a Data Refinery flow with connection with personal credentials
Unable to use masked data in visualizations from data assets that are imported from version 4.8 or earlier
Data Refinery cannot connect with the Satellite Connector
Data Refinery cannot use Kerberos authentication to connect to data

Known issues

See also Known issues for Common core services for issues in other Cloud Pak for Data services that might affect Data Refinery.

Target table loss and job failure when you use the Update option in a Data Refinery flow

Applies to: 5.4.0 and later

If you select Update as the Write mode target property for a relational data source (for example, Db2), the original target table is replaced and the Data Refinery job might fail.

Workaround: Use the Merge option as the Write mode and Append as the Table action.

Google BigQuery connection: TRUNCATE TABLE statement fails in Data Refinery flow jobs

Applies to: 5.4.0 and later

If you run a Data Refinery flow job with data from a Google BigQuery connection and the DDL includes a TRUNCATE TABLE statement, the job fails.

Limitations

Folder location doesn't automatically update when changing target asset in Data Refinery flow settings

Applies to: 5.4.0 and later

When you change the target asset in the Data Refinery Flow settings, the folder location doesn't automatically update in the View Info panel if the new target asset is located in a different folder.

Workaround: Manually update the folder location in the Flow settings to match the location of the new target asset.

Changes to interactive code templates

You can no longer use the following interactive code templates in Data Refinery:

mutate_if
select_if
summarize_if

If you used these templates in previous releases, they no longer work and might produce errors.

Special characters in column header names cause Data Refinery job errors

You can no longer use periods, brackets, or other special characters in data column header names. Data with column header names that contain special characters might cause Data Refinery jobs to fail.

Workaround: Remove special characters from header names. Alternatively, for .xls files, if you do not want to change the header row, you can edit the format that is used to read the file so that the header row is skipped. Then, use the rename operation to change each column name to the name that you want.
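For comma-delimited text files, the removal of special characters from header names can be scripted before you add the file to your project. The following is a minimal sketch, assuming a comma-delimited file whose header row contains no quoted fields with embedded commas. The helper name `sanitize_header` is hypothetical and is not part of Data Refinery, and treating only letters, digits, and underscores as safe header characters is an assumption, not a documented rule.

```shell
#!/bin/sh
# Hypothetical helper, not part of Data Refinery: rewrites the header (first)
# line of a comma-delimited file so that every character other than a letter,
# digit, underscore, or the comma delimiter becomes an underscore.
# Assumptions: the header row has no quoted fields that contain commas, and
# the data rows are copied unchanged.
sanitize_header() {
  in_file="$1"
  out_file="$2"

  # Clean the header row. Any Windows carriage return is removed first so
  # that it is not turned into a trailing underscore.
  head -n 1 "$in_file" | tr -d '\r' | sed 's/[^[:alnum:]_,]/_/g' > "$out_file"

  # Append every remaining row unchanged.
  tail -n +2 "$in_file" >> "$out_file"
}
```

For example, `sanitize_header sales.csv sales_clean.csv` writes a copy of sales.csv in which a header such as `price.usd` becomes `price_usd`. Because different headers can map to the same name (for example, `a.b` and `a[b` both become `a_b`), check the result for duplicate column names before you use the file in a Data Refinery flow.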

Data column headers cannot contain special characters

Data with column headers that contain special characters might cause Data Refinery jobs to fail with the following error: Supplied values don't match positional vars to interpolate.

Workaround: Remove the special characters from the column headers.

Data Refinery flows with connections that use vault secrets only work if the user has permission to access the vault secrets

If the source or target data of a Data Refinery flow uses connections that reference vault secrets, the user who runs the Data Refinery flow job must have permission to access the vault secrets. Otherwise, the following error occurs: authorization_failed no access to secret.

Tokenize GUI operation does not work for Data Refinery on Spark and R environments

Data Refinery flow jobs that include the Tokenize GUI operation do not work on the following Spark and R environments:

R 4.3 with Spark 3.5
Default Spark 3.5 + R 4.3

Workaround: For small data sets, use the Default Data Refinery XS environment.

Tokenize GUI operation might not work on large data assets

Data Refinery flow jobs that include the Tokenize GUI operation might fail with a sparklyr worker rscript failure when they run on large data assets.

Workaround: Increase the CPU and memory resources for the environment that is used for your job by creating a custom Spark 3.5 + R 4.3 environment.

Error opening a Data Refinery flow

When you open the Data Refinery user interface, you might see the following error: The selected data set wasn't loaded. Error occurred while launching the container (retry attempts exceeded).

Workaround: Delete the existing interactive RuntimeAssembly (RTA) by running the following command:

oc -n <CPD_INSTANCE_NAMESPACE> delete rta -l type=service,component=shaper

In Segregation of Duty (SoD) mode, run the following command instead:

oc -n <DATAPLANE_NAMESPACE> delete rta -l type=service,component=shaper

Error opening a Data Refinery flow with connection with personal credentials

When you open a Data Refinery flow that uses a data asset that is based on a connection with personal credentials, you might see an error.

Workaround: To open a Data Refinery flow that has assets that use connections with personal credentials, you must unlock the connection. You can unlock the connection either by editing the connection and entering your personal credentials, or by previewing the asset in the project, where you are prompted to enter your personal credentials. After you unlock the connection, you can open the Data Refinery flow.

Unable to use masked data in visualizations from data assets that are imported from version 4.8 or earlier

Applies to: 5.4.0 and later

If you import data assets with masked data from version 4.8 or earlier into your project, you cannot use these assets to create visualizations. If you try to generate a chart in the Visualizations tab of a Data Refinery data source from an imported asset that contains masked data, you receive the following error message: Bad Request: Failed to retrieve data from server. Masked data is not supported.

Workaround: To mask data correctly in visualizations that use imported data assets, you must configure your platform with Data Virtualization as a protection solution. For more information, see Data Virtualization as a protective solution in the IBM Cloud Pak for Data documentation.

Data Refinery cannot connect with the Satellite Connector

You cannot use a Satellite Connector to connect to a database with Data Refinery.

Data Refinery cannot use Kerberos authentication to connect to data

You cannot use Kerberos authentication to connect to data in Data Refinery.