Known issues and limitations for Watson Studio and supplemental services
The following known issues and limitations apply to Watson Studio.
Known issues
- Known issues for Anaconda Repository for IBM Cloud Pak for Data
- Known issues for Hadoop integration
- Known issues for notebooks
- Known issues for projects
- Known issues for Visualizations
Limitations
- Limitations for assets
- Limitations for Hadoop integration
- Limitations for jobs
- Limitations for projects
  - Assets from Git integrated projects don’t show up in intelligent search
  - Cannot run multiple RStudio sessions in one project at the same time
  - Unable to sync deprecated Git projects when all assets have been deleted
  - In git-based projects, you cannot preview assets with managed attachments imported from catalogs
  - Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration
  - Import of a project larger than 1 GB in Watson Studio fails
  - Export of a large project in Watson Studio can time out
  - Cannot export Tuning Studio experiments or prompt sessions
  - Can't include a Cognos dashboard when exporting a project to desktop
  - Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster
  - The maximum number of displayed projects is 10000
- Limitations for notebooks
- Limitations for visualizations
Known issues for Anaconda Repository for IBM Cloud Pak for Data
Channel names for Anaconda Repository for IBM Cloud Pak for Data don't support double-byte characters
When you create a channel in Anaconda Team Edition, you can't use double-byte characters or most special characters. You can use only these characters: a-z 0-9 - _
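If you create channels through scripts, you can pre-check a proposed name against the allowed character set before submitting it. The following helper is illustrative only and is not part of the product.
import re

# Allowed characters for channel names: a-z, 0-9, hyphen, and underscore
ALLOWED_NAME = re.compile(r"[a-z0-9_-]+")

def is_valid_channel_name(name: str) -> bool:
    """Return True if the proposed channel name uses only accepted characters."""
    return bool(ALLOWED_NAME.fullmatch(name))

print(is_valid_channel_name("my-channel_01"))   # True
print(is_valid_channel_name("チャンネル01"))      # False: double-byte characters are not allowed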
Known issues for Hadoop integration
Cloudera Data Platform (CDP) 7.3.1 may return a 500 error for the DSXHI REST status check
Applies to: 5.1.3
When you run the ./status.py command on CDP 7.3.1, it might return a 500 error for the DSXHI REST status check. This error does not affect the functionality of Execution Engine for Apache Hadoop.
Error while executing one of the methods from hi_core_utils in notebooks
Applies to: 5.2.0 and later
After creating a Livy Spark session using the pushed imageId in notebooks, you may see the following error while executing one of the methods from hi_core_utils:
An error was encountered:
<class 'RuntimeError'> -- OpenSSL 3.0's legacy provider failed to load. This is a fatal error by default, but cryptography supports running without legacy algorithms by setting the environment variable CRYPTOGRAPHY_OPENSSL_NO_LEGACY. If you did not expect this error, you have likely made a mistake with your OpenSSL configuration.
Workaround: Insert the following code in the same cell to set the environment variable CRYPTOGRAPHY_OPENSSL_NO_LEGACY in notebooks:
import os
# Set the environment variable in the same cell, before calling hi_core_utils methods
os.environ["CRYPTOGRAPHY_OPENSSL_NO_LEGACY"] = "1"
Support for Spark versions
Applies to: 5.1.0 and later
- Apache Spark 3.1 for Power is not supported.
- To run Jupyter Enterprise Gateway (JEG) on Cloud Pak for Data 5.1.0, you must run the following commands as the first cell after the kernel starts:
from pyspark.sql import SparkSession
from pyspark import SparkContext
# Create or reuse the Spark session and context for the JEG kernel
spark = SparkSession.builder.getOrCreate()
sc = SparkContext.getOrCreate()
Known issues for notebooks
Failure to export a notebook to HTML in the Jupyter Notebook editor
Applies to: 5.2.0 and later
When you are working with a Jupyter notebook created in a tool other than Watson Studio, you might not be able to export the notebook to HTML. This issue occurs when the cell output is exposed.
Workaround
- In the Jupyter Notebook UI, go to Edit and click Edit Notebook Metadata.
- Remove the following metadata:
"widgets": { "state": {}, "version": "1.1.2" }
- Click Edit.
- Save the notebook.
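If you prefer to remove the metadata outside of the notebook UI, a minimal sketch that uses the open-source nbformat library might look like the following. The file name is a placeholder, and this approach is an alternative rather than the documented workaround.
import nbformat

# Load the notebook, drop the "widgets" metadata block, and save the file back
nb = nbformat.read("my_notebook.ipynb", as_version=4)  # placeholder file name
nb.metadata.pop("widgets", None)                       # removes the widget state if it is present
nbformat.write(nb, "my_notebook.ipynb")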
Error when trying to access data in an Oracle database
If you try to access data in an Oracle database, you might get a DatabaseError if the schema or table name contains special characters such as the period (.) character. The reason is that Oracle uses periods as separators between schemas, tables, and columns. If this issue occurs, consider removing any periods from the table or schema name of your database, or adapt your code to surround the table or schema identifier with double quotation marks, for example my_schema."table.with.dots".
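For example, a minimal sketch in Python that quotes such an identifier might look like the following. The python-oracledb driver, credentials, and names are assumptions; use whatever driver and connection details your notebook already relies on.
import oracledb

# Placeholder credentials and service name; replace with your own
conn = oracledb.connect(user="my_user", password="my_password", dsn="my_host:1521/my_service")
query = 'SELECT * FROM my_schema."table.with.dots"'  # double quotes keep the periods as part of the name

with conn.cursor() as cursor:
    cursor.execute(query)
    rows = cursor.fetchall()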
Notebooks that run on the 25.1 default runtime environment on a Power cluster and use the numpy or itc_utils library stop responding
Applies to: 5.2.0
Fixed in: 5.2.1
Data Science Notebooks that run on the 25.1 default runtime environment on a Power cluster and use the numpy or itc_utils library might stop responding.
Workaround
Option 1:
- Create a custom environment that is based on a standard 25.1 environment template and then increase its RAM reserve to at least 3 GB.
- Run the notebook by using the new environment.
Option 2:
Insert a new cell in your notebook with this code and then run it before importing other libraries:
import os
# Limit OpenMP to a single thread before numpy or itc_utils is imported
os.environ["OMP_NUM_THREADS"] = "1"
This setting limits OpenMP to a single thread, which disables OpenMP multithreading.
Known issues for projects
Deleting some assets from the Git repository does not clean up all associated files
Applies to: 5.2.0 and later
For example, deleting an AutoAI experiment might leave files for pipelines in the repository. If you identify residual files, you can delete them manually from the Git repository.
Publishing asset to a catalog results in an unclear error
Applies to: 5.2.0
Fixed in: 5.2.1
When publishing an asset from a project to a catalog, this error is generated:
Something went wrong publishing the asset.
To understand the cause of the error, review the cluster logs for detailed error information.
Error when adding data from a storage volume connection in an imported project using AutoAI
Applies to: 5.2.0 and later
When you are using the AutoAI model builder in an imported project, adding a data asset from the storage volume connection might result in an unexpected error. During project import, a storage volume connection asset is automatically created. The issue occurs if the imported project connects to a storage volume that already exists in the environment and has the same name as the storage connection it is connected to.
Workaround:
After importing the connection asset into the project:
- Delete the storage volume connection that was created automatically during project import.
- Create a new storage volume connection.
Creating a project activity report generates an incomplete file
Applies to: 5.2.0 and later
When creating an activity report for a project with logging enabled, the generated file omits expected events.
Workaround:
A cluster administrator must restart the event logger API pods. After the restart, create the report again. The new report will include new events, but previously missed events cannot be recovered.
Known issues for Visualizations
The column-level profile information for a connected data asset with a column of type DATE does not show rows
Applies to: 5.2.0 and later
In the column-level profile information for a connected data asset with a column of type DATE, no rows are displayed when you click Show rows on the Data Classes, Format, or Types tabs.
Limitations for assets
Security for file uploads
Applies to: 5.2.0 and later
Files that you upload through the Watson Studio or Watson Machine Learning UI are not validated or scanned for potentially malicious content. It is strongly recommended that you run security software, such as an anti-virus application, on all files before uploading them to ensure the security of your content.
Can't load CSV files larger than 20 GB to projects
You can't load a CSV file that is larger than 20 GB to a project in Cloud Pak for Data.
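One possible way to work within this limit is to split the file into smaller pieces and upload each piece as a separate data asset. The following sketch is a generic suggestion rather than a documented workaround; the file names and chunk size are placeholders.
import pandas as pd

# Read the oversized CSV in chunks and write each chunk to its own file
reader = pd.read_csv("large_file.csv", chunksize=5_000_000)  # rows per output file; tune as needed
for i, chunk in enumerate(reader):
    chunk.to_csv(f"large_file_part_{i}.csv", index=False)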
Limitations for previews of assets
You can't see previews of these types of assets:
- Folder assets associated with a connection with personal credentials. You are prompted to enter your personal credentials to start the preview or profiling of the connection asset.
- Connected data assets for image files in projects.
- Connected assets with shared credentials of text and JSON files are incorrectly displayed in a grid.
- Connected data assets for PDF files in projects.
Limitations for Hadoop integration
The Cloud Pak for Data cluster and the Hadoop Cluster have to be co-located within the same network
For the connection between Cloud Pak for Data and the Hadoop cluster to work, they must be located within the same network setup.
The Livy service does not restart when a cluster is rebooted
The Livy service does not automatically restart after a system reboot if the HDFS Namenode is not in an active state.
Workaround: Restart the Livy service.
Limitations for jobs
Jobs scheduled on repeat also run at the :00 minute
Jobs scheduled on repeat run at the scheduled time and again at the start of the next minute (:00).
Job run has wrong environment variable values if special characters are used
Environment variables defined in the job configuration are not passed correctly to the job runs if the variable values contain special characters. This might lead to job run failures or incorrect job run behavior. To resolve the problem, see Job run has wrong environment variable values if special characters are used.
Job runs fail when environments are deleted after a Cloud Pak for Data version upgrade
Job runs in deployment spaces or projects fail if the job is using an environment that is no longer secure and has been deleted after a Cloud Pak for Data version upgrade.
Workaround: To prevent job runs from failing due to an upgrade:
- Check which environments will be removed before the upgrade.
- Edit the job to point to an alternative environment that is not being removed.
- If your job can't work with any of the alternative environments, create a custom environment based on the existing environment and point the job to that custom environment. For details, see Customizing environments.
Excluding days when scheduling a job causes unexpected results
If you schedule a job to run every day of the week while excluding certain days, you might notice that the scheduled job does not run as you would expect. The reason might be a discrepancy between the timezone of the user who creates the schedule and the timezone of the master node where the job runs.
This issue occurs only if you exclude days of the week when you schedule a job.
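As a minimal illustration of that discrepancy (the date and timezones are examples only, not product behavior), the same scheduled time can fall on different days of the week depending on the timezone in which it is evaluated, so an excluded day can still be hit:
from datetime import datetime
from zoneinfo import ZoneInfo

# 23:30 on a Monday in the timezone of the user who creates the schedule ...
scheduled = datetime(2025, 1, 6, 23, 30, tzinfo=ZoneInfo("America/New_York"))
# ... is already Tuesday in the timezone of the node where the job runs
on_master_node = scheduled.astimezone(ZoneInfo("UTC"))

print(scheduled.strftime("%A %H:%M"))        # Monday 23:30
print(on_master_node.strftime("%A %H:%M"))   # Tuesday 04:30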
Limitations for projects
Assets from Git integrated projects don’t show up in intelligent search
Applies to: 5.2.0 and later
When using the intelligent search bar to search across workspaces, data assets from projects that use Git integration don’t show up in the results.
Workaround:
Go to the Git integrated project and look for the asset under Assets.
Cannot run multiple RStudio sessions in one project at the same time
Applies to: 5.2.0 and later
In a project, only one RStudio session can be active at any given moment. If you want to open multiple sessions at the same time, you must set up a separate project.
Unable to sync deprecated Git projects when all assets have been deleted
If you delete all assets from a deprecated Git project, the project can no longer sync with the Git repository.
Workaround: Retain at least one asset in the deprecated Git project.
In git-based projects, you cannot preview assets with managed attachments that are imported from catalogs
In git-based projects, you receive an error when you attempt to preview assets with managed attachments that are imported from catalogs. Previewing these assets in git-based projects is not supported.
Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration
You shouldn't use the Git repository from a project with deprecated Git integration in a project with default Git integration as this can result in an error. For example, in Bitbucket, you will see an error stating that the repository contains content from a deprecated Git project although the selected branch contains default Git project content.
In a project with default Git integration, you can either use a new clean Git repository or link to one that was used in a project with default Git integration.
Import of a project larger than 1 GB in Watson Studio fails
If you create an empty project in Watson Studio and then try to import a project that is larger than 1 GB in size, the operation might fail depending on the size and compute power of the Cloud Pak for Data cluster.
Export of a large project in Watson Studio fails with a time-out
If you try to export a project with a large number of assets (for example, more than 7000), the export process can time out and fail. In that case, although you could export assets in subsets, the recommended solution is to export the project by using the CPDCTL command-line interface.
Cannot export Tuning Studio experiments or prompt sessions
If you are running Tuning Studio experiments in a project, you cannot export them or the prompt sessions. They will not show up in the candidate list when exporting a project.
Can't include a Cognos dashboard when exporting a project to desktop
Currently, you cannot select a Cognos dashboard when you export a project to desktop.
Workaround:
Although you cannot add a dashboard to your project export, you can move a dashboard from one project to another.
To move a dashboard to another project:
- Download the dashboard JSON file from the original project.
- Export the original project to desktop by clicking the Export to desktop icon from the project toolbar.
- Create a new project by importing the project ZIP with the required data sources.
- Create a new dashboard by clicking the From file tab and adding the JSON file you downloaded from the original project.
- A dialog box will pop up asking you if you want to re-link each of your data sources. Click the re-link button and select the asset in the new project that corresponds to the data source.
Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster
If your project is associated with a Git repository that was used in a project on another cluster and contains connections that require a JDBC driver, the connections will not work in your project. If you upload the required JDBC JAR file, you will see an error stating that the JDBC driver could not be initialized.
This error occurs because the JDBC JAR file is added to the connection as a presigned URI, which is not valid in a project on another cluster. The JAR file can no longer be located even if it exists in the cluster, and the connection will not work.
Workaround
To use any of these connections, you need to create new connections in the project. The following connections require a JDBC driver and are affected by this error situation:
- Db2 for i
- Db2 for z/OS
- Generic JDBC
- Hive via Execution Engine for Apache Hadoop
- Impala via Execution Engine for Apache Hadoop
- SAP HANA
- Exasol
The maximum number of displayed projects is 10000
Applies to: 5.2.0 and later
For performance-related reasons, the maximum number of projects shown in the All active projects list is 10000.
Limitations for notebooks
Unable to open terminal windows in JupyterLab within a Spark environment
Applies to: 5.2.0 and later
All Terminal options are disabled when JupyterLab is used within a Spark environment.
Limitations for visualizations
Unable to use masked data in visualizations from data assets imported from version 4.8 or earlier
Applies to: 5.2.0 and later
If you import data assets with masked data from version 4.8 or earlier into your project, you cannot use these assets to create visualizations.
If you attempt to generate a chart in the Visualization tab of a data asset from an imported asset that has masked data, the following error message is received: Bad Request: Failed to retrieve data from server. Masked data is not supported.
Workaround: To properly mask data with imported data assets in visualization, you must configure your platform with Data Virtualization as a protection solution. For more information, see the Data Virtualization as a protection solution section of the Protection solutions for data source definitions topic.