Known issues and limitations for Watson Studio and supplemental services

The following known issues and limitations apply to Watson Studio.

Known issues

Known issues for Anaconda Repository for IBM Cloud Pak for Data

Channel names for Anaconda Repository for IBM Cloud Pak for Data don't support double-byte characters

When you create a channel in Anaconda Team Edition, you can't use double-byte characters or most special characters. You can use only lowercase letters (a-z), digits (0-9), hyphens (-), and underscores (_).
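
For example, a quick way to check whether a proposed channel name uses only the supported characters is a regular expression over that character set. This is an illustrative sketch, not part of the product:

    import re

    # Channel names may contain only lowercase letters, digits, hyphens, and underscores.
    VALID_CHANNEL_NAME = re.compile(r"^[a-z0-9_-]+$")

    def is_valid_channel_name(name: str) -> bool:
        return bool(VALID_CHANNEL_NAME.match(name))

    print(is_valid_channel_name("my-channel_01"))  # True
    print(is_valid_channel_name("チャネル01"))       # False: double-byte characters are not supported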

Known issues for Data Refinery

Data protection rules do not always mask data in Data Refinery visualizations

Applies to: 4.7.0 and later

If you set up data protection rules for an asset, those rules are not always enforced. As a result, in some circumstances the data can be seen in Visualizations charts in Data Refinery.

Personal credentials are not supported for connected data assets in Data Refinery

Applies to: 4.7.0 and later

If you create a connected data asset with personal credentials, other users must use the following workaround to use the connected data asset in Data Refinery.

Workaround:

  1. Go to the project page, and click the link for the connected data asset to open the preview.
  2. Enter credentials.
  3. Open Data Refinery and use the authenticated connected data asset for a source or target.

Known issues for Hadoop integration

Support for Spark versions

Applies to: 4.7.0

  • Apache Spark 3.1 for Power is not supported.

  • To run Jupyter Enterprise Gateway (JEG) on Cloud Pak for Data 4.7.0, 4.7.1, or 4.7.2, you must run the following commands as the first cell after the kernel starts:

    # Create (or reuse) the SparkSession and SparkContext before you run any other code.
    from pyspark.sql import SparkSession
    from pyspark import SparkContext
    spark = SparkSession.builder.getOrCreate()
    sc = SparkContext.getOrCreate()
    

Failure to connect to Impala via Execution Engine for Hadoop

On CDP version 7.1.8, the JDBC client fails and you receive the following SQL error message when you try to connect to Impala via Execution Engine for Hadoop:

SQL error: [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: Socket is closed by peer. ExecuteStatement for query "SHOW DATABASES".

Workaround: Set the property -idle_client_poll_period_s to 0 and restart Impala:

  1. Go to Cloudera Manager.
  2. From the home page, click the Status tab.
  3. Select Impala.
  4. Click the Configuration tab.
  5. In the Impala Command Line Argument Advanced Configuration Snippet (impalad_cmd_args_safety_valve), add the property: -idle_client_poll_period_s=0.
  6. Restart Impala.

Known issues for jobs

Excluding days when scheduling a job causes unexpected results

If you schedule a job to run every day of the week but exclude certain days, you might notice that the scheduled job does not run as you would expect. The cause might be a discrepancy between the time zone of the user who creates the schedule and the time zone of the master node where the job runs.

This issue occurs only if you exclude days of the week when you schedule a job.

Can't delete notebook job stuck in starting or running state

If a notebook job is stuck in the starting or running state and won't stop, even though you canceled the job and stopped the active environment runtime, you can try deleting the job by removing the job-run asset manually by using the API. A Python sketch that scripts these steps follows the procedure.

  1. Retrieve a bearer token from the user management service using an API call:

    curl -k -X POST <PLATFORM_CLUSTER_URL>/icp4d-api/v1/authorize -H 'cache-control: no-cache' -H 'content-type: application/json' -d '{"username":"your_username","password":"your_password"}'
    
  2. (Optional) Get the job-run asset and test the API call. Replace ${token}, ${asset_id}, and ${project_id} accordingly.

    curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X GET "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
    
  3. Delete the job-run asset. Again replace ${token}, ${asset_id}, and ${project_id} accordingly.

    curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X DELETE "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
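
The following is a minimal Python sketch of the same steps. It assumes the requests library is available, that the authorization response contains a token field, and that the placeholder values are replaced with your own; it is not a supported script.

    import requests

    # Placeholder values; replace with your cluster URL, credentials, and IDs.
    platform_cluster_url = "https://PLATFORM_CLUSTER_URL"
    asset_id = "JOB_RUN_ASSET_ID"
    project_id = "PROJECT_ID"

    # Step 1: retrieve a bearer token from the user management service.
    response = requests.post(
        f"{platform_cluster_url}/icp4d-api/v1/authorize",
        json={"username": "your_username", "password": "your_password"},
        verify=False,  # equivalent to curl -k; use a CA bundle in production
    )
    token = response.json()["token"]

    headers = {"Authorization": f"Bearer {token}", "accept": "application/json"}
    asset_url = f"{platform_cluster_url}/v2/assets/{asset_id}?project_id={project_id}"

    # Step 2 (optional): get the job-run asset to confirm that the IDs are correct.
    print(requests.get(asset_url, headers=headers, verify=False).status_code)

    # Step 3: delete the job-run asset.
    print(requests.delete(asset_url, headers=headers, verify=False).status_code)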
    

Known issues for notebooks

Failure to export a notebook to HTML in the Jupyter Notebook editor

When you are working with a Jupyter Notebook created in a tool other than Watson Studio, you might not be able to export the notebook to HTML. This issue occurs when the cell output is exposed.

Workaround

  1. In the Jupyter Notebook UI, go to Edit and click Edit Notebook Metadata.

  2. Remove the following metadata:

    "widgets": {
       "state": {},
       "version": "1.1.2"
    }
    
  3. Click Edit.

  4. Save the notebook.
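
If many notebooks are affected, the metadata can also be removed programmatically. The following is a minimal sketch that assumes the nbformat library is installed and that notebook.ipynb is the affected file; it deletes the widgets entry from the notebook metadata and saves the file in place.

    import nbformat

    # Path to the notebook that fails to export to HTML; adjust as needed.
    path = "notebook.ipynb"

    nb = nbformat.read(path, as_version=4)

    # Remove the "widgets" metadata block that prevents the export.
    nb.metadata.pop("widgets", None)

    nbformat.write(nb, path)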

Passing the value "None" as the schema argument in the "do_put" PyArrow library method stops the kernel

When you run the FlightClient do_put method and you pass the value "None" as the schema argument, the kernel crashes.

Workaround

Ensure that a valid value of type Schema is passed as the schema argument to the FlightClient do_put method. Do not use the value None for the schema argument or for any other required argument.

For example, do not use:

    schema = None
    flight_client.do_put(flight_descriptor, schema)
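
Instead, pass a pyarrow Schema that describes the columns you intend to write. The following sketch assumes a Flight server at grpc://flight-server:8815 and illustrative column names and descriptor path; adapt them to your environment.

    import pyarrow as pa
    import pyarrow.flight as flight

    # Hypothetical Flight server location; replace with your own.
    flight_client = flight.FlightClient("grpc://flight-server:8815")

    # Build a valid Schema instead of passing None.
    schema = pa.schema([("id", pa.int64()), ("name", pa.string())])
    flight_descriptor = flight.FlightDescriptor.for_path("example_dataset")

    # do_put returns a writer and a metadata reader; write the data through the writer.
    writer, _ = flight_client.do_put(flight_descriptor, schema)
    writer.write_table(pa.table({"id": [1, 2], "name": ["a", "b"]}, schema=schema))
    writer.close()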

Known issues for projects

Connections that require uploaded JAR files might not work in a Git-based project after upgrade

These connections require that you upload one or more JAR files:

  • IBM Db2 for i
  • IBM Db2 for z/OS (Unless you have an IBM Db2 Connect Unlimited Edition license certificate file on the Db2 for z/OS server)
  • Generic JDBC
  • SAP Bulk Extract
  • SAP Delta Extract
  • SAP HANA
  • SAP IDoc

If you upgrade from a version of Cloud Pak for Data that is earlier than 4.7.0, the connections that use associated JAR files might not work.

Workaround: Edit the connection to use the JAR files from their new locations.

Limitations

Limitations for assets

Security for file uploads

Applies to: 4.7.0 and later

Files that you upload through the Watson Studio or Watson Machine Learning UI are not validated or scanned for potentially malicious content. It is strongly recommended that you run security software, such as an anti-virus application, on all files before you upload them to ensure the security of your content.

Can't load CSV files larger than 20 GB to projects

You can't load a CSV file that is larger than 20 GB to a project in Cloud Pak for Data.

Limitations for previews of assets

You can't see previews of these types of assets:

  • Folder assets associated with a connection with personal credentials. You are prompted to enter your personal credentials to start the preview or profiling of the connection asset.
  • Connected data assets for image files in projects.
  • Connected data assets for text and JSON files with shared credentials. These assets are displayed incorrectly in a grid.
  • Connected data assets for PDF files in projects.

Limitations for Data Refinery

Tokenize GUI operation might not work on large data assets

Data Refinery flow jobs that include the Tokenize GUI operation might fail for large data assets.

Limitations for Hadoop integration

The Livy service does not restart when a cluster is rebooted

The Livy service does not automatically restart after a system reboot if the HDFS Namenode is not in an active state.

Workaround: Restart the Livy service.

Limitations for jobs

Jobs scheduled on repeat also run at the :00 minute

Jobs scheduled on repeat run at the scheduled time and again at the start of the next minute (:00).

Job run has wrong environment variable values if special characters are used

Environment variables that are defined in the job configuration are not passed correctly to the job runs if the variable values contain special characters. This might lead to job run failures or incorrect job run behavior. To resolve the problem, see Job run has wrong environment variable values if special characters are used.

Job runs fail after environments are deleted or Cloud Pak for Data has been upgraded

Job runs in deployment spaces or projects fail if the job is using an environment that has been deleted or is no longer supported after a Cloud Pak for Data version upgrade. To get the job running again, edit the job to point to an alternative environment.

To prevent job runs from failing due to an upgrade, create custom environments based on custom runtime images. Jobs associated with these environments will still run after an upgrade. For details, see Building custom images.

Limitations for projects

Unable to sync deprecated Git projects when all assets have been deleted

If you delete all assets from a deprecated Git project, the project can no longer sync with the Git repository.

Workaround: Retain at least one asset in the deprecated Git project.

In Git-based projects, you cannot preview assets with managed attachments that are imported from catalogs

In Git-based projects, you receive an error when you attempt to preview assets with managed attachments that are imported from catalogs. Previewing these assets in Git-based projects is not supported.

Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration

You shouldn't use the Git repository from a project with deprecated Git integration in a project with default Git integration, because this can result in an error. For example, in Bitbucket, you will see an error stating that the repository contains content from a deprecated Git project even though the selected branch contains default Git project content.

In a project with default Git integration, you can either use a new clean Git repository or link to one that was used in a project with default Git integration.

Import of a project larger than 1 GB in Watson Studio fails

If you create an empty project in Watson Studio and then try to import a project that is larger than 1 GB in size, the operation might fail depending on the size and compute power of the Cloud Pak for Data cluster.

Export of a large project in Watson Studio fails with a time-out

If you try to export a project with a large number of assets (for example, more than 7000), the export process can time out and fail. Although you can export assets in subsets, the recommended solution is to export the project by using the APIs available from the CPDCTL command-line interface tool.

Scheduling jobs is unsupported in Git-based projects

In Git-based projects, you must run all jobs manually. Job scheduling is not supported.

Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster

If your project is associated with a Git repository that was used in a project in another cluster and contains connections that require a JDBC driver, the connections will not work in your project. If you upload the required JDBC JAR file, you will see an error stating that the JDBC driver could not be initialized.

This error is caused by the JDBC JAR file that is added to the connection as a presigned URI. This URI is not valid in a project in another cluster. The JAR file can no longer be located even if it exists in the cluster, and the connection will not work.

Workaround

To use any of these connections, you must create new connections in the project. The following connections require a JDBC driver and are affected by this error:

  • Db2 for i
  • Db2 for z/OS
  • Generic JDBC
  • Hive via Execution Engine for Apache Hadoop
  • Impala via Execution Engine for Apache Hadoop
  • SAP HANA
  • Exasol

Limitations for data visualizations

Masked data is not supported in data visualizations

Masked data is not supported in data visualizations. If you attempt to work with masked data while generating a chart in the Visualizations tab of a data asset in a project, you receive the following error message: Bad Request: Failed to retrieve data from server. Masked data is not supported.

Parent topic: Limitations and known issues in IBM Cloud Pak for Data