Known issues and limitations for Watson Studio and supplemental services
The following known issues and limitations apply to Watson Studio.
Known issues
- Known issues for Anaconda Repository for IBM Cloud Pak for Data
- Known issues for Data Refinery
- Target table loss and job failure when you use the Update option in a Data Refinery flow
- Error opening Data Refinery after shutdown and restart
- Cannot use masked assets in Data Refinery
- Dremio Cloud: Connections fail for Data Refinery flows
- Concatenate operation does not allow you to put the new column next to the original column
- Dremio connection fails with java.lang.NoClassDefFoundError...
- Google BigQuery connection: TRUNCATE TABLE statement fails in Data Refinery flow jobs
- Cannot run a Data Refinery flow job in a Git-based project with a custom Spark environment
- Logs are not available in a Git-based project for Data Refinery flow jobs run with Default Spark & R environments
- Data Refinery flow job fails with an Excel target file when running the job with the Default Data Refinery XS environment
- Data Refinery cannot retrieve a large data set from the Presto connection
- Known issues for Hadoop integration
- Known issues for notebooks
- Known issues for projects
- Known issues for deployment spaces
Limitations
- Limitations for assets
- Limitations for Data Refinery
- Limitations for Hadoop integration
- Limitations for jobs
- Limitations for projects
- Cannot open connected data asset imported with parquet and partitioned parquet files
- Unable to sync deprecated Git projects when all assets have been deleted
- In git-based projects, you cannot preview assets with managed attachments imported from catalogs
- Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration
- Import of a project larger than 1 GB in Watson Studio fails
- Export of a large project in Watson Studio can time out
- Scheduling jobs is unsupported in git-based projects
- Can't include a Cognos dashboard when exporting a project to desktop
- Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster
- Limitations for notebooks
- Limitations for data visualizations
Known issues for Anaconda Repository for IBM Cloud Pak for Data
Channel names for Anaconda Repository for IBM Cloud Pak for Data don't support double-byte characters
When you create a channel in Anaconda Team Edition, you can't use double-byte characters or most special characters. You can use only these characters: a-z 0-9 - _
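For illustration only (this check is not part of the product), a short Python sketch that validates a proposed channel name against the allowed character set before you create the channel:
import re

def is_valid_channel_name(name: str) -> bool:
    # Allowed characters: a-z, 0-9, hyphen, and underscore
    return re.fullmatch(r"[a-z0-9_-]+", name) is not None

print(is_valid_channel_name("my-channel_01"))  # True
print(is_valid_channel_name("チャンネル"))  # False: double-byte characters are not allowed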
Known issues for Data Refinery
Target table loss and job failure when you use the Update option in a Data Refinery flow
Applies to: 4.8.0 and later
Using the Update option for the Write mode target property with relational data sources (for example, Db2) deletes the original target table, and the Data Refinery job might fail.
Workaround: Use the Merge option as the Write mode and Append as the Table action.
Error opening Data Refinery after shutdown and restart
Applies to: 4.8.7
After shutdown and restart, you might encounter a 504 error when you try to re-open Data Refinery.
Workaround: The cluster administrator can restart the ibm-nginx pods:
oc delete pod -n ${PROJECT_CPD_INST_OPERANDS} -l component=ibm-nginx
Cannot use masked assets in Data Refinery
Applies to: 4.8.4 and later
For more information, see Cannot use masked assets in Data Refinery in the Known issues and limitations for IBM Knowledge Catalog topic.
Dremio Cloud: Connections fail for Data Refinery flows
Applies to: 4.8.3 and 4.8.4
Fixed in: 4.8.5
If you connect to a Dremio Cloud instance and you run a Data Refinery flow or job, the flow will fail.
Concatenate operation does not allow you to put the new column next to the original column
Applies to: 4.8.3 and later
When you add a step with the Concatenate operation to your Data Refinery flow, and you select Keep original columns and also select Next to original column for the new column position, the step will fail with an error.
You can, however, select Right-most column in the data set.
Dremio connection fails with java.lang.NoClassDefFoundError...
Applies to: 4.8.3
Fixed in: 4.8.4
When you run a Data Refinery flow job with data from a Dremio connection, the job will fail with the error message: java.lang.NoClassDefFoundError: org.apache.arrow.flight.sql.impl.FlightSql$SqlInfo
Google BigQuery connection: TRUNCATE TABLE statement fails in Data Refinery flow jobs
Applies to: 4.8.3 and later
If you run a Data Refinery flow job with data from a Google BigQuery connection and the DDL includes a TRUNCATE TABLE statement, the job will fail.
Cannot run a Data Refinery flow job in a Git-based project with a custom Spark environment
Applies to: 4.8.3
Fixed in: 4.8.4
If you run a Data Refinery flow job in a Git-based project and you use a custom Spark environment, the job will fail.
Workaround: Use one of the default environments: Default Spark 3.4 & R 4.2 or Default Data Refinery XS.
Logs are not available in a Git-based project for Data Refinery flow jobs run with Default Spark & R environments
Applies to: 4.8.3
Fixed in: 4.8.4
In a Git-based project, if you run a Data Refinery flow job with the Default Spark 3.3 & R 4.2 or Default Spark 3.4 & R 4.2 environment, the log file will be unavailable to download.
Workaround: Use the Default Data Refinery XS environment.
Data Refinery flow job fails with an Excel target file when running the job with the Default Data Refinery XS environment
Applies to: 4.8.3
Fixed in: 4.8.4
You might receive an error when you run a Data Refinery flow job with a target Excel file and you use the Default Data Refinery XS environment.
Workaround: Change the target to a different file type or run the job with a Spark & R environment.
Data Refinery cannot retrieve a large data set from the Presto connection
Applies to: 4.8.0, 4.8.1, 4.8.2, 4.8.3, and 4.8.4
Fixed in: 4.8.5
In Data Refinery, when you attempt to retrieve a large data set (for example, 25 MB or greater) with the Presto connection, you might encounter this time-out error message:
Cannot retrieve the data from Flight service
Workaround: Use a different connection to retrieve the data from the data source. For example, if the Presto connector has data from an IBM Db2 database that is connected to the Presto server, use the IBM Db2 connection to retrieve the Db2 data.
Known issues for Hadoop integration
Error while executing one of the methods from hi_core_utils in notebooks
Applies to: 4.8.5
After creating a Livy Spark session using the pushed imageId in notebooks, you may see the following error while executing one of the methods from hi_core_utils:
An error was encountered:
<class 'RuntimeError'> -- OpenSSL 3.0's legacy provider failed to load. This is a fatal error by default, but cryptography supports running without legacy algorithms by setting the environment variable CRYPTOGRAPHY_OPENSSL_NO_LEGACY. If you did not expect this error, you have likely made a mistake with your OpenSSL configuration.
Workaround: Insert the following code in the same cell to set the environment variable CRYPTOGRAPHY_OPENSSL_NO_LEGACY in notebooks:
import os
os.environ["CRYPTOGRAPHY_OPENSSL_NO_LEGACY"] = "1"
Support for Spark versions
Applies to: 4.8.0 and later
- Apache Spark 3.1 for Power is not supported.
- To run Jupyter Enterprise Gateway (JEG) on Cloud Pak for Data 4.8.0 and later, you must run the following commands as the first cell after the kernel starts:
from pyspark.sql import SparkSession
from pyspark import SparkContext
spark = SparkSession.builder.getOrCreate()
sc = SparkContext.getOrCreate()
Failure to connect to Impala via Execution Engine for Hadoop
Applies to: 4.8.0 and later
On CDP version 7.1.8, the JDBC client fails and you receive the following SQL error message when you try to connect to Impala via Execution Engine for Hadoop:
SQL error: [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: Socket is closed by peer. ExecuteStatement for query "SHOW DATABASES".
Workaround: Set the -idle_client_poll_period_s property to 0 and restart Impala:
- Go to Cloudera Manager.
- From the home page, click the Status tab.
- Select Impala.
- Click the Configuration tab.
- In the Impala Command Line Argument Advanced Configuration Snippet (impalad_cmd_args_safety_valve), add the property -idle_client_poll_period_s=0.
- Restart Impala.
Known issues for notebooks
Failure to export a notebook to HTML in the Jupyter Notebook editor
When you are working with a Jupyter Notebook created in a tool other than Watson Studio, you might not be able to export the notebook to HTML. This issue occurs when the cell output is exposed.
Workaround
- In the Jupyter Notebook UI, go to Edit and click Edit Notebook Metadata.
- Remove the following metadata:
"widgets": { "state": {}, "version": "1.1.2" }
- Click Edit.
- Save the notebook.
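If you prefer to edit the metadata outside the notebook UI, the following Python sketch uses the nbformat library to remove the widgets metadata. The file name is a placeholder, and this approach is an alternative rather than the documented workaround:
import nbformat

path = "my_notebook.ipynb"  # placeholder file name
nb = nbformat.read(path, as_version=4)

# Drop the widget state metadata that blocks the HTML export
nb.metadata.pop("widgets", None)

nbformat.write(nb, path)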
Error when trying to access data in an Oracle database
If you try to access data in an Oracle database, you might get a DatabaseError if the schema or table name contains special characters such as the period (.) character. The reason is that Oracle uses periods as separators between schemas, tables, and columns. If this issue occurs, consider removing any periods from the table name or schema of your database, or adapt your code to surround the table name or schema identifier with double quotes, for example my_schema."table.with.dots".
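As a minimal sketch of the quoting approach (the connection URL, schema, and table names are placeholders, and your code might use a different client library):
import pandas as pd
from sqlalchemy import create_engine

# Placeholder Oracle connection; replace with your own credentials and host
engine = create_engine("oracle+cx_oracle://user:password@host:1521/?service_name=ORCLPDB1")

schema = "my_schema"
table = "table.with.dots"

# Double quotes keep Oracle from treating the embedded periods as separators
query = f'SELECT * FROM {schema}."{table}"'
df = pd.read_sql(query, con=engine)
print(df.head())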
Known issues for projects
Project export fails
Applies to: 4.8.3 and later
If you are no longer able to use the export project functionality, you must restart the rabbitmq pod with the following steps:
- To delete asset-files-api:
oc get po -n ${PROJECT_CPD_INST_OPERANDS} | grep asset-files-api | awk '{print $1}' | xargs oc delete po -n ${PROJECT_CPD_INST_OPERANDS}
- To delete RMQ:
oc delete po -n ${PROJECT_CPD_INST_OPERANDS} rabbitmq-ha-{0,1,2}
Connections that require uploaded JAR files might not work in a Git-based project after upgrade
Applies to: 4.8.0 and later
These connections require that you upload one or more JAR files:
- IBM Db2 for i
- IBM Db2 for z/OS (Unless you have an IBM Db2 Connect Unlimited Edition license certificate file on the Db2 for z/OS server)
- Generic JDBC
- SAP Bulk Extract
- SAP Delta Extract
- SAP HANA
- SAP IDoc
If you upgrade from a version of Cloud Pak for Data that is earlier than 4.7.0, the connections that use associated JAR files might not work.
Workaround: Edit the connection to use the JAR files from their new locations.
Known issues for deployment spaces
Error managing a deployment space when Watson Machine Learning is not installed
Applies to: 4.8.0 and later
Fixed in: 4.8.5
If you navigate to the Manage tab of a deployment space and the Watson Machine Learning service is not installed, you see an error indicating that deployments failed to load:
Error loading deployments. Unexpected response code: 404
You can disregard the error and continue to use the deployment space.
Limitations for assets
Security for file uploads
Applies to: 4.8.0 and later
Files you upload through the Watson Studio or Watson Machine Learning UI are not validated or scanned for potentially malicious content. It is strongly recommended that you run security software, such as an anti-virus application, on all files before you upload them to ensure the security of your content.
Can't load CSV files larger than 20 GB to projects
You can't load a CSV file that is larger than 20 GB to a project in Cloud Pak for Data.
Limitations for previews of assets
You can't see previews of these types of assets:
- Folder assets associated with a connection with personal credentials. You are prompted to enter your personal credentials to start the preview or profiling of the connection asset.
- Connected data assets for image files in projects.
- Connected data assets for text and JSON files that use shared credentials. Previews of these assets are incorrectly displayed in a grid.
- Connected data assets for PDF files in projects.
Limitations for Data Refinery
Data column headers cannot contain special characters
Data with column headers that contain special characters might cause Data Refinery jobs to fail, and give the error Supplied values don't match positional vars to interpolate.
Workaround: Remove the special characters from the column headers.
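As an illustrative sketch of this workaround for data that you prepare with pandas before refining (the column names are hypothetical):
import re
import pandas as pd

df = pd.DataFrame({"total $ (sales)": [100, 200], "region#id": [1, 2]})

# Replace any run of characters that are not letters, digits, or underscores
df.columns = [re.sub(r"[^0-9a-zA-Z_]+", "_", col).strip("_") for col in df.columns]

print(df.columns.tolist())  # ['total_sales', 'region_id']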
Data protection rules do not always mask data in Data Refinery visualizations
If you set up data protection rules for an asset, data protection rules are not always enforced. As a result, in some circumstances the data can be seen in Data Refinery Visualizations charts.
Tokenize GUI operation might not work on large data assets
Data Refinery flow jobs that include the Tokenize GUI operation might fail for large data assets.
Limitations for Hadoop integration
The Cloud Pak for Data cluster and the Hadoop cluster have to be co-located within the same network
For the connection between Cloud Pak for Data and the Hadoop cluster to work, they must be located within the same network setup.
The Livy service does not restart when a cluster is rebooted
The Livy service does not automatically restart after a system reboot if the HDFS Namenode is not in an active state.
Workaround: Restart the Livy service.
Limitations for jobs
Jobs scheduled on repeat also run at the :00 minute
Jobs scheduled on repeat run at the scheduled time and again at the start of the next minute (:00).
Job run has wrong environment variable values if special characters are used
Environment variables defined in the job configuration are not passed correctly to the job runs if the variable values contain special characters. This might lead to job run failures, or the incorrect behavior of job runs. To resolve the problem, see Job run has wrong environment variable values if special characters are used.
Job runs fail after environments are deleted or Cloud Pak for Data has been upgraded
Job runs in deployment spaces or projects fail if the job is using an environment that has been deleted or is no longer supported after a Cloud Pak for Data version upgrade. To get the job running again, edit the job to point to an alternative environment.
To prevent job runs from failing due to an upgrade, create custom environments based on custom runtime images. Jobs associated with these environments will still run after an upgrade. For details, see Building custom images.
Excluding days when scheduling a job causes unexpected results
If you select to schedule a job to run every day of the week excluding given days, you might notice that the scheduled job does not run as you expect. The reason might be a discrepancy between the time zone of the user who creates the schedule and the time zone of the master node where the job runs.
This issue exists only if you exclude days of the week when you schedule a job.
Limitations for projects
Cannot open connected data asset imported with parquet and partitioned parquet files
Applies to: 4.8.0
After you import a connected data asset with parquet and partitioned_parquet assets selected, the resulting partitioned_parquet asset is corrupt and can't be opened from the project's Assets page.
Workaround: Importing bulk selections of assets that include partitioned assets is not supported. You need to manually select and import the assets one by one.
Unable to sync deprecated Git projects when all assets have been deleted
If you delete all assets from a deprecated Git project, the project can no longer sync with the Git repository.
Workaround: Retain at least one asset in the deprecated Git project.
In git-based projects, you cannot preview assets with managed attachments that are imported from catalogs
In git-based projects, you receive an error when you attempt to preview assets with managed attachments that are imported from catalogs. Previewing these assets in git-based projects is not supported.
Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration
You shouldn't use the Git repository from a project with deprecated Git integration in a project with default Git integration as this can result in an error. For example, in Bitbucket, you will see an error stating that the repository contains content from a deprecated Git project although the selected branch contains default Git project content.
In a project with default Git integration, you can either use a new clean Git repository or link to one that was used in a project with default Git integration.
Import of a project larger than 1 GB in Watson Studio fails
If you create an empty project in Watson Studio and then try to import a project that is larger than 1 GB in size, the operation might fail depending on the size and compute power of the Cloud Pak for Data cluster.
Export of a large project in Watson Studio fails with a time-out
If you are trying to export a project with a large number of assets (for example, more than 7000), the export process can time out and fail. In that case, although you could export assets in subsets, the recommended solution is to export by using the APIs available from the CPDCTL command-line interface tool.
Scheduling jobs is unsupported in Git-based projects
In Git-based projects, you must run all jobs manually. Job scheduling is not supported.
Can't include a Cognos dashboard when exporting a project to desktop
Currently, you cannot select a Cognos dashboard when you export a project to desktop.
Workaround:
Although you cannot add a dashboard to your project export, you can move a dashboard from one project to another.
To move a dashboard to another project:
- Download the dashboard JSON file from the original project.
- Export the original project to desktop from the project toolbar.
- Create a new project by importing the project ZIP with the required data sources.
- Create a new dashboard by clicking the From file tab and adding the JSON file you downloaded from the original project.
- A dialog box will pop up asking you if you want to re-link each of your data sources. Click the re-link button and select the asset in the new project that corresponds to the data source.
Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster
If your project is associated with a Git repository that was used in a project in another cluster and contains connections that require a JDBC driver, the connections will not work in your project. If you upload the required JDBC JAR file, you will see an error stating that the JDBC driver could not be initialized.
This error is caused by the JDBC JAR file that is added to the connection as a presigned URI. This URI is not valid in a project in another cluster. The JAR file can no longer be located even if it exists in the cluster, and the connection will not work.
Workaround
To use any of these connections, you need to create new connections in the project. The following connections require a JDBC driver and are affected by this error situation:
- Db2 for i
- Db2 for z/OS
- Generic JDBC
- Hive via Execution Engine for Apache Hadoop
- Impala via Execution Engine for Apache Hadoop
- SAP HANA
- Exasol
Limitations for notebooks
Unable to open terminal windows in JupyterLab within a Spark environment
All Terminal options are disabled when JupyterLab is used within a Spark environment.
Limitations for data visualizations
Masked data is not supported in data visualizations
Masked data is not supported in data visualizations. If you attempt to work with masked data while generating a chart in the Visualizations tab of a data asset in a project, you receive the following error message: Bad Request: Failed to retrieve data from server. Masked data is not supported.
Parent topic: Limitations and known issues in IBM Cloud Pak for Data