Known issues for Watson Studio and supplemental services
These known issues apply to Watson Studio and the services that require Watson Studio.
- General issues
- Process times for change-pvc-permissions job increase based on data size while upgrading from Cloud Pak for Data version 3.0.1 to 3.5
- Deployments view (Operations view/dashboard) has some limitations
- Backup and restore limitations
- Active runtimes cannot be stopped after upgrading from Cloud Pak for Data version 3.5 to 3.5.1 or from version 3.5 to 3.5.2
- UI might not display properly
- Projects
- Connections
- Cannot access or create connected data assets or open connections in Data Refinery
- Connection’s shared credentials are exported to a Git repository
- Db2 on Cloud: Connection fails after migrating to the Db2 on Cloud Lite plan
- Unable to change the password for a platform connection in a Watson Studio project
- The “Jar uri” list incorrectly contains a “jdbc” selection for two connections
- Restrictions for Cloud Pak for Data credentials
- Planning Analytics for Data Refinery: Target connections and target connected data assets are not supported
- Personal credentials are not supported for connected data assets in Data Refinery
- Assets
- Hadoop integration
- Unable to install pip packages using install_packages() on a Power machine
- On certain HDP clusters, installing Execution Engine for Apache Hadoop fails
- Cannot stop jobs for a registered Hadoop target host
- The Livy service does not restart when a cluster is rebooted
- Pushing the Python 3.7 image to a registered Execution Engine for Apache Hadoop system hangs on Power
- Pushing a Spectrum Conductor image again after initial push fails
- Pushing an image to a Spectrum Conductor environment fails
- Notebooks using a Hadoop environment that don’t have the Jupyter Enterprise Gateway service enabled can’t be used
- Cannot use HadoopLibUtils to connect through Livy in RStudio
- Hadoop Refinery: Cannot output to a connected parquet asset that is a directory
- Hadoop notebook jobs don’t support using environment variables containing spaces
- Notebooks
- An error 500 is received after opening an existing notebook
- Issues with Insert-To-Code feature for notebooks
- Cloudera Distribution for Hadoop
- Notebook loading considerations
- Kernel not found when opening a notebook imported from a Git repository
- Environment runtime can’t be started because the software customization failed
- Notebook and GPU runtimes not stopped after idle timeout
- Incorrect software version used in the Default Python 3.7 (legacy) environment
- Notebook returns UTC time and not local time
- Python notebook language version changes when moved from a catalog to a project if notebook name is changed
- Anaconda Repository for IBM Cloud Pak for Data
- RStudio
- RStudio Sparklyr package 1.4.0 can’t connect with Spark 3.0 kernel
- Running job for R script and selected RStudio environment results in an error
- Git integration broken when RStudio crashes
- RStudio doesn’t open although you were added as project collaborator
- Unable to start RStudio in a Chrome browser web page
- Data Refinery
- Unable to save a new Data Refinery flow more than once in a session
- Data Refinery flow job fails when writing double-byte characters to an Avro file
- Data Refinery flow job fails with a large data asset
- Duplicate connections in a space resulting from promoting a Data Refinery flow to a space
- Data Refinery flow fails with “The selected data set wasn’t loaded” message
- Unable to promote a Data Refinery flow to a space after upgrade
- Cannot specify format options after you change the Data Refinery flow source
- System-level schemas aren’t filtered out (Db2 Warehouse on Cloud)
- Jobs
- Spark jobs are supported only by API
- Excluding days when scheduling a job causes unexpected results
- Error occurs when jobs are edited
- Can’t delete notebook job stuck in starting or running state
- Notebook runs successfully in notebook editor but fails when run as job
- Can’t add environment variables that have white spaces or special characters in the value to an existing job
- Watson Machine Learning
- Deployments might require software specification updates following upgrade
- AutoAI requirement for AVX2
- Do not import/export models between clusters running on different architectures
- Watson Machine Learning might require a manual rescale
- Deleting model definitions used in Deep Learning experiments
- Resetting stalled batch deployment jobs
General issues
Process times for change-pvc-permissions job increase based on data size while upgrading from Cloud Pak for Data version 3.0.1 to 3.5
While upgrading from version 3.0.1 to 3.5, the change-pvc-permissions job takes longer to complete as the amount of data in storage increases.
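You can follow the job from the command line while the upgrade runs. A minimal sketch, assuming the job runs in your Cloud Pak for Data namespace and keeps the name used in this issue’s title:
# Find the job and stream its logs until it completes.
oc get jobs -n <project> | grep change-pvc-permissions
oc logs -f -n <project> job/change-pvc-permissions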
Backup and restore limitations
Offline quiesce is supported only at the OpenShift project level, and restore is supported only to the same machine and the same namespace.
Deployments view (Operations view/dashboard) has some limitations
The Deployments view has the following limitation:
- When long names are used, they’re not fully truncated and can be obscured on the screen.
Active runtimes cannot be stopped after upgrading from Cloud Pak for Data version 3.5 to 3.5.1 or from version 3.5 to 3.5.2
Before you upgrade from Cloud Pak for Data version 3.5 to 3.5.1 or from 3.5 to 3.5.2, you must ensure that all active Watson Studio runtimes are stopped on the Projects > Active runtimes tab.
Workaround if runtimes were not stopped
In case you upgraded from Cloud Pak for Data version 3.5 to 3.5.1 or from version 3.5 to 3.5.2 without first stopping all active runtimes, and you are seeing runtimes which you cannot delete, perform the following steps in the given order.
Required role: You must be a Cloud Pak for Data administrator to perform these steps.
Execute the following commands from the command line using the OpenShift CLI:
- Log in to the Red Hat OpenShift cluster as a project administrator:
  oc login OpenShift_URL:port
- Run the following command to delete all services for invalid deployments:
  oc get deploy -l created-by=spawner,dsxScopeType!=project,dsxScopeType!=space | grep -v NAME | awk '{print $1}' | xargs -I {} oc delete svc {}-svc
- Then run this command to delete all invalid deployments:
  oc delete deploy -l created-by=spawner,dsxScopeType!=project,dsxScopeType!=space
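To confirm that the cleanup worked, list deployments with the same label selector that the delete command used:
# Empty output (or a "No resources found" message) means all invalid deployments are gone.
oc get deploy -l created-by=spawner,dsxScopeType!=project,dsxScopeType!=space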
Applies to: 3.5.1 and 3.5.2 when upgrading from 3.5
Important: Does not apply when upgrading from 3.5.1 to 3.5.2
Fixed in: 3.5.3
UI might not display properly
If the UI does not load properly, the Watson Studio administrator must restart redis.
Workaround
If the client behaves unexpectedly, for example, enters a redirection loop or parts of the user interface fail to load, then complete the following steps to restart redis:
- Log in to the OpenShift cluster.
- Restart the pods for redis using the following command:
  oc delete po -n <project> $(oc get po -n <project> -l component=redis -o jsonpath="{.items[*].metadata.name}")
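After the pods are deleted, OpenShift recreates them automatically. You can check that the new redis pods are back in the Running state:
# List the redis pods and their status.
oc get po -n <project> -l component=redis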
Projects
Import of a project larger than 1 GB in Watson Studio fails
If you create an empty project in Watson Studio and then try to import a project that is larger than 1 GB in size, the operation might fail depending on the size and compute power of the Cloud Pak for Data cluster.
Cannot synchronize project assets after upgrading from Cloud Pak for Data 3.0.1 to 3.5
If you created a project with Git integration in Cloud Pak for Data 3.0.1 and you upgrade to Cloud Pak for Data 3.5, you will not be able to synchronize data assets between the project that was upgraded and the Git repository. The reason is that your old Git token from your Cloud Pak for Data 3.0.1 project is not recognized after you upgraded.
Workaround:
To synchronize assets, add a new Git token to the project using the same personal access token you had to access the repository before you upgraded:
- Click your avatar in the IBM Cloud Pak for Data banner and then click Profile and settings.
- Click the Git integrations tab, select your platform, enter your personal access token, and give the token a name.
- In your project, click the Git actions icon on the project toolbar and select Pull and push.
- Select the new Git token you just created and choose the assets to synchronize.
Connections
Cannot access or create connected data assets or open connections in Data Refinery
The following scenario can prevent you from accessing or creating connected data assets in a project and from opening connections in Data Refinery:
- Create a connection in a catalog. Add that same connection to a project as a platform connection. In the project, create a connected data asset that references the connection. Delete the connection from the catalog.
As a workaround for this scenario, delete any orphaned referenced connections that are still in the project.
Applies to: 3.5.10
Fixed in: 4.0.5
Connection’s shared credentials are exported to a Git repository
If you have a project that includes a connection with shared credentials, and you export the project to an external Git repository, the shared credentials are exposed as plain text in the connection’s metadata.
To prevent shared credentials from being exported, remove connections that have shared credentials before you export the project to Git.
Applies to: 3.5.0 and later
Db2 on Cloud: Connection fails after migrating to the Db2 on Cloud Lite plan
If you are connecting to a Db2 on Cloud data source and you have migrated to the Db2 on Cloud Lite plan, the connection from Cloud Pak for Data fails because the Db2 on Cloud hostname and the port number are changed. Use one of the following workarounds to connect to Db2 on Cloud:
- Existing connection: Use the REST API to set the hostname and port number (a hedged sketch follows this list).
- New connection: Use the Db2 connection to connect to Db2 on Cloud.
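The following sketch shows such a REST API call for an existing connection. The endpoint path and the properties payload shape are assumptions based on the Watson Data API connections resource, so verify them against the API reference for your release:
# Hypothetical sketch: update the host and port of an existing connection asset.
curl -k -X PATCH "https://<CloudPakforData_URL>/v2/connections/<connection_id>?project_id=<project_id>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"properties": {"host": "<new_hostname>", "port": "<new_port>"}}'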
Applies to: 3.5.0
Fixed in: 3.5.10
Unable to change the password for a platform connection in a Watson Studio project
If you add a new connection from Platform connections in a Watson Studio project, you cannot change the password for that connection.
Workaround: Edit the password from the Platform connections page.
Applies to: 3.5.2 and later
The “Jar uri” list incorrectly contains a “jdbc” selection for two connections
The following connections contain an erroneous jdbc selection in the Jar uris section of the New connection page:
- Generic JDBC connection
- SAP HANA connection
If you want to create a Generic JDBC connection or a SAP HANA connection, you must select a different JDBC driver from the Jar uris list. Do not select the jdbc item. Instead, if the desired driver is not in the list, upload a new JAR file from the Upload new file hyperlink. Alternatively, upload a JAR file from the JDBC drivers tab from Data > Platform connections. You must have the Administer platform permission to upload a JAR file in either place.
Applies to: 3.5.0 - 3.5.1
Fixed in: 3.5.2
Restrictions for Cloud Pak for Data credentials
SPSS Modeler does not support Cloud Pak for Data credentials for the following connections:
- IBM Data Virtualization
- IBM Db2
Workaround: Use your username and password to connect to the data source.
Applies to: 3.5.0 and later
Data Refinery does not support Cloud Pak for Data credentials when you run the Data Refinery flow in a Spark runtime environment for these connections:
| Connection | Workaround |
|---|---|
| IBM Cognos Analytics | Use your username and password to connect to the data source or use the Default Data Refinery XS runtime environment |
| IBM Data Virtualization | Use your username and password to connect to the data source or use the Default Data Refinery XS runtime environment |
| IBM Db2 | Use your username and password to connect to the data source or use the Default Data Refinery XS runtime environment |
| Storage volume | Use the Default Data Refinery XS runtime environment |
Applies to: 3.5.0 and later
Planning Analytics for Data Refinery: Target connections and target connected data assets are not supported
Jobs for Data Refinery flows that use a Planning Analytics target connection or a target connected data asset will fail.
Applies to: 3.5.0 and later
Personal credentials are not supported for connected data assets in Data Refinery
If you create a connected data asset with personal credentials, other users must use the following workaround in order to use the connected data asset in Data Refinery.
Workaround:
- Go to the project page, and click the link for the connected data asset to open the preview.
- Enter credentials.
- Open Data Refinery and use the authenticated connected data asset for a source or target.
Applies to: 3.5.0 and later
Assets
Limitations for previews of assets
You can’t see previews of these types of assets:
- Folder assets associated with a connection with personal credentials. You are prompted to enter your personal credentials to start the preview or profiling of the connection asset.
- Connected data assets for image files in projects.
- Connected assets for text and JSON files with shared credentials are incorrectly displayed in a grid.
- Connected data assets for PDF files in projects.
Can’t load files to projects that have #, %, or ? characters in the name
You can’t create a data asset in a project by loading a file that contains a hash character (#), percent sign (%), or a question mark (?) in the file name.
Applies to: 3.5.0 and later
Can’t load CSV files larger than 20 GB to projects
You can’t load a CSV file that is larger than 20 GB to an analytics project in Cloud Pak for Data.
Hadoop integration
Unable to install pip packages using install_packages() on a Power machine
If you are using a Power cluster, you might see the following error when attempting to install pip packages with hi_core_utils.install_packages():
ModuleNotFoundError: No module named '_sysconfigdata_ppc64le_conda_cos6_linux_gnu'
To work around this known limitation of hi_core_utils.install_packages() on Power, export the following environment variable before calling install_packages():
import os

# For Power machines, set this env var to work around a known issue in
# hi_core_utils.install_packages().
os.environ['_CONDA_PYTHON_SYSCONFIGDATA_NAME'] = "_sysconfigdata_powerpc64le_conda_cos7_linux_gnu"
On certain HDP clusters, the Execution Engine for Apache Hadoop service installation fails
The installation fails during the Knox Gateway Configuration step because the Knox gateway fails to start on some nodes. See the Apache JIRA database for more information about this issue.
The following errors occur:
- Failed to configure gateway keystore
  Exception in thread "main" Caused by: java.lang.NoSuchFieldError: DEFAULT_XML_TYPE_ATTRIBUTE
- Exception in thread "main" java.lang.reflect.InvocationTargetException
  Caused by: java.lang.NoSuchMethodError: org.eclipse.persistence.internal.oxm.mappings.Field.setNestedArray(Z)V
The workaround is to remove the org.eclipse.persistence.core-2.7.2.jar file from the installation directory by using the following command:
mv /opt/ibm/dsxhi/gateway/dep/org.eclipse.persistence.core-2.7.2.jar /tmp/
Cannot stop jobs for a registered Hadoop target host
When a registered Hadoop cluster is selected as the Target Host for a job run, the job cannot be stopped. As a workaround, view the Watson Studio Local job logs to find the Yarn applicationId; then, use the ID to manually stop the Hadoop job on the remote system. When the remote job is stopped, the Watson Studio Local job will stop on its own with a “Failed” status. Similarly, jobs that are started for registered Hadoop image push operations cannot be stopped either.
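For example, after you find the Yarn applicationId in the job logs, you can stop the job on the remote Hadoop system with the standard YARN CLI:
# Kill the remote Hadoop job by its applicationId.
yarn application -kill <applicationId>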
The Livy service does not restart when a cluster is rebooted
The Livy service will not automatically restart after a system reboot if the HDFS Namenode is not in an active state.
Applies to: 3.5.0 and later
Pushing the Python 3.7 image to a registered Execution Engine for Apache Hadoop system hangs on Power
When the Python 3.7 Jupyter image is pushed to an Execution Engine for Apache Hadoop registered system using platform configurations, installing dist-keras package into the image (using pip) hangs. The job runs for hours but never completes, and the console output for the job ends with:
Attempting to install HI addon libs to active environment ...
==> Target env: /opt/conda/envs/Python-3.7-main ...
====> Installing conda packages ...
====> Installing pip packages ...
This hang is caused by a pip regression in dependency resolution, as described in https://github.com/pypa/pip/issues/9215.
To work around this problem, complete the following steps:
- Stop the image push job that is hung. To find the job that is hung, use the following command:
  oc get job -l headless-type=img-saver
- Delete the job:
  oc delete job <jobid>
- Edit the Execution Engine for Apache Hadoop image push script located at /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh.
- Add the flag --use-deprecated=legacy-resolver to the pip install command, as follows:
  @@ -391,7 +391,7 @@
   conda-env=2.6.0 opt_einsum=3.1.0 > /tmp/hiaddons.out 2>&1 ; then
       echo " OK"
       echo -n " ====> Installing pip packages ..."
  -    if pip install dist-keras==0.2.1 >> /tmp/hiaddons.out 2>&1 ; then
  +    if pip install --use-deprecated=legacy-resolver dist-keras==0.2.1 >> /tmp/hiaddons.out 2>&1 ; then
           echo " OK"
           hiaddonsok=true
- Restart your image push operation by clicking Replace image on the Execution Engine for Apache Hadoop registration page.
To edit the Execution Engine for Apache Hadoop image push script:
- Access the pod running utils-api, which has the /cc-home/.scripts directory mounted:
  oc get pod | grep utils-api
- Extract the existing /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh file:
  oc cp <utils_pod_id>:/cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh dsx-hi-save-image.sh
- Modify the file per the workaround steps:
  vi dsx-hi-save-image.sh
- Copy the new file into the pod, in the /tmp directory:
  oc cp dsx-hi-save-image.sh <utils_pod_id>:/tmp
- Exec into the pod:
  oc rsh <utils_pod_id>
- Make a backup:
  cp -up /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh.bak
- Dump the content of /tmp/dsx-hi-save-image.sh into /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh:
  cat /tmp/dsx-hi-save-image.sh > /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh
- Run a diff to make sure you have the changes:
  diff /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh.bak
- Exit the utils-api pod:
  exit
Pushing a Spectrum Conductor image again after initial push fails
If you push the Spectrum Conductor image again after the initial push for a Spectrum Conductor instance that is configured for Anaconda, an error can occur. The image push log shows the following error stack:
create_environment_from_yamls ...
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/cc-home/.scripts/proxy-pods/dsx-hi/pushToDSXHI.py", line 389, in create_environment_from_yamls
response = createAndCheckEnvStatus(yams_list, instanceUUID)
File "/cc-home/.scripts/proxy-pods/dsx-hi/pushToDSXHI.py", line 412, in createAndCheckEnvStatus
raise Exception('something went wrong while pushing image to conductor cluster. Please check logs for conductor cluster')
Exception: something went wrong while pushing image to conductor cluster. Please check logs for conductor cluster
Locate the Anaconda instance and the name of the Anaconda environment that the image push is trying to create. The name of the environment is in the following format: <topology_name>__jupyter-py37.
The error from Spectrum Conductor should look like the following message:
The following specifications were found to be incompatible with each other:
<a long list of packages>
To work around the issue:
- Click Clear Error on the Spectrum Conductor environment that failed.
- Select the environment, and then click Remove.
- On Cloud Pak for Data, log in as an admin.
- From the Platform configuration page, click the system where the image push failed.
- Run the image push again for the failed image.
Applies to: 3.5.1 and later
Pushing an image to a Spectrum Conductor environment fails
If you push an image to a Spectrum Conductor environment, the image push fails.
Applies to: 3.5.1 and later
Notebooks using a Hadoop environment that don’t have the Jupyter Enterprise Gateway service enabled can’t be used
When a notebook is opened using a Hadoop environment that does not have the Jupyter Enterprise Gateway (JEG) service enabled, the kernel remains disconnected. This notebook does not display any error message, and it cannot be used.
To address this issue, confirm that the Hadoop system that the environment is defined to use has the JEG service enabled.
Cannot use HadoopLibUtils to connect through Livy in RStudio
RStudio includes a new version of Sparklyr that doesn’t work with the HadoopLibUtils library that’s provided to connect to remote Hadoop clusters through Livy. The following error occurs: Error: Livy connections now require the Spark version to be specified
You must install sparklyr 1.0.2 using the following steps:
require(remotes)
library(remotes)
install_version("sparklyr", version = "1.0.2", repos = "http://cran.us.r-project.org")
Note: While you’re installing the older Sparklyr package, press Enter when you’re asked about updating packages (“Enter one or more numbers, or an empty line to skip updates:”). You don’t need to update the dependent packages.
After the package is installed, load sparklyr 1.0.2 when you are using HadoopLibUtils in R.
Applies to: 3.5.0
Fixed in: 3.5.1
Hadoop Refinery: Cannot output to a connected parquet asset that is a directory
This problem occurs when you run a Data Refinery data shaping that uses the Execution Engine for Apache Hadoop connector for HDFS. When you select an output target directory that contains a set of parquet files, the Data Refinery layer cannot determine the file format for this directory. Instead, it leaves the File Format field blank, which causes a failure when writing the output because File Format ‘ ‘ is not supported.
To work around this issue, select an output target that is empty or not a directory.
Hadoop notebook jobs don’t support using environment variables containing spaces
When you’re setting up a notebook job using a remote Hadoop environment, there is an option to specify environment variables that can be accessed by the notebook. If an environment variable is defined with a value that contains spaces, the JEG job fails. The remote Hadoop jeg.log displays:
[E 2020-05-13 10:28:54.542 EnterpriseGatewayApp] Error occurred during launch of KernelID
To work around this issue, do not define an environment variable with a value that contains spaces. If necessary, encode the value and decode it when the notebook uses the content of the environment variable.
Notebooks
An error 500 is received after opening an existing notebook
An error 500 is received after opening an existing notebook that has a large number of data assets. The notebook takes a long time to load, and the notebook-UI pod treats it as a failure.
Workaround: Use the notebook URL directly instead of clicking through the project to the notebook.
Issues with Insert-To-Code feature for notebooks
If you use the Insert-To-Code feature for a connection to an Informix database, and if Informix is configured for case-sensitive identifiers, the inserted code throws a runtime error if you try to query a table with an upper-case name. In the cell output, there’s an error message similar to the following:
DatabaseError: Execution failed on sql: SELECT * FROM informix.FVT_EMPLOYEE
java.sql.SQLException: The specified table (informix.fvt_employee) is not in the database.
unable to rollback
Workaround
In your notebook, edit the inserted code. Add connection property 'DELIMIDENT=Y' to the connection, and surround the upper-case identifier with double-quotes "".
For example, replace the following lines:
informix_connection = jaydebeapi.connect('com.informix.jdbc.IfxDriver',
'{}://{}:{}/{}:user={};password={};'.format('jdbc:informix-sqli',
informix_credentials['host'],
informix_credentials['port'],
informix_credentials['database'],
informix_credentials['username'],
informix_credentials['password']), [informix_credentials['username'],
informix_credentials['password']])
query = 'SELECT * FROM informix.FVT_EMPLOYEE'
With this version:
informix_connection = jaydebeapi.connect('com.informix.jdbc.IfxDriver',
'{}://{}:{}/{}:user={};password={};'.format('jdbc:informix-sqli',
informix_credentials['host'],
informix_credentials['port'],
informix_credentials['database'],
informix_credentials['username'],
informix_credentials['password']),
{
'user': informix_credentials['username'],
'password': informix_credentials['password'],
'DELIMIDENT': 'Y'
})
query = 'SELECT * FROM informix."FVT_EMPLOYEE"'
Error in notebooks when rendering data from Cloudera Distribution for Hadoop
When running Jupyter notebooks against Cloudera Distribution for Hadoop 5.16 or 6.0.1 with Spark 2.2, the first dataframe operation for rendering cell output from the Spark driver results in a JSON encoding error.
Workaround:
To work around this error, use one of the following procedures:
-
For interactive notebook sessions
Manually re-run the first cell that renders data.
-
For non-interactive notebook sessions
Add the following non-intrusive code after establishing the Spark connection to trigger the first failure:
%%spark
import sys, warnings

def python_major_version():
    return sys.version_info[0]

with warnings.catch_warnings(record=True):
    print(sc.parallelize([1]).map(lambda x: python_major_version()).collect())
Notebook loading considerations
The time that it takes to create a new notebook or to open an existing one for editing purposes might vary. If no runtime container is available, a container needs to be created, and the Jupyter notebook user interface can be loaded only after the container is available. The time it takes to create a container depends on the cluster load and size. Once a runtime container exists, subsequent calls to open notebooks are significantly faster.
Kernel not found when opening a notebook imported from a Git repository
If you import a project from a Git repository that contains notebooks that were created in JupyterLab, and try opening the notebooks from the project Assets page, you will see a message stating that the required notebook kernel can’t be found.
The reason is that you are trying to open the notebook in an environment that doesn’t support the kernel required by the notebook, for example in an environment without Spark for a notebook that uses Spark APIs. The information about the environment dependency of a notebook that was created in JupyterLab and exported in a project is currently not available when this project is imported again from Git.
Workaround:
You need to associate the notebook with the correct environment definition. You can do this:
- From the notebook opened in edit mode by:
  - Clicking the Notebook Info icon from the notebook toolbar and then clicking Environment.
  - Selecting the correct environment definition for your notebook from the list under Environments.
- Before you open the notebook, from the project Assets page by:
  - Selecting the notebook and unlocking it if it is locked. You can only change the environment of a notebook if the notebook is unlocked.
  - Clicking Actions > Change Environment and selecting the correct environment definition for your notebook.
Environment runtime can’t be started because the software customization failed
If your Jupyter notebook runtime can’t be started and a 47 killed error is logged, the software customization process could not be completed because of a lack of memory.
You can customize the software configuration of a Jupyter notebook environment by adding conda and pip packages. However, be aware that conda does dependency checking when installing packages, which can be memory intensive if you add many packages to a customization.
To complete a customization successfully, you must make sure that you select an environment with sufficient RAM to enable dependency checking at the time the runtime is started.
If you only want packages from one conda channel, you can prevent unnecessary dependency checking by excluding the default channels. To do this, remove defaults from the channels list in the customization template and add nodefaults.
Notebook and GPU runtimes not stopped after idle timeout
In the Cloud Pak for Data Version 3.5 January refresh, the Jupyter notebook runtimes, including those started for JupyterLab, and the GPU runtimes are not shut down automatically, although the default idle timeout or the user-configured idle timeout was reached.
Workaround:
You need to manually stop all active notebook (including the JupyterLab) and GPU environment runtimes after you have completed your work. You can do this from the project’s Environments tab. See Stopping active runtimes.
Project administrators can monitor and stop active environment runtimes across all projects. See Viewing active runtimes across projects.
Applies to: 3.5.1 only.
Fixed in: 3.5.2
Incorrect software version used in the Default Python 3.7 (legacy) environment
The software version for the Default Python 3.7 (legacy) environment was incorrectly named Default Python 3.7 OpenCE (jupyter-py37-legacy). It should be named Default Python 3.7 (legacy).
Applies to: 3.5.3 only
Notebook returns UTC time and not local time
The Python datetime functions return the date and time for the UTC time zone and not the local time zone where a user is located. The reason is that the default environment runtimes use the time zone in which they were created, which is UTC.
Workaround:
If you want to use your local time, you need to download and modify the runtime configuration file to use your time zone. You don’t need to make any changes to the runtime image. After you upload the configuration file again, the runtime will use the time you set in the configuration file.
Required role: You must be a Cloud Pak for Data cluster administrator to change the configuration file of a runtime.
To change the time zone:
- Download the configuration file of the runtime you are using. Follow the steps in Downloading the runtime configuration.
- Update the runtime definition JSON file and extend the environment variables section to include your time zone. For example, for Europe/Vienna use:
  { "name": "TZ", "value": "Europe/Vienna" }
- Upload the changed JSON file to the Cloud Pak for Data cluster. You can use the Cloud Pak for Data API.
  - Get the required platform access token. The command returns the bearer token in the accessToken field:
    curl <CloudPakforData_URL>/v1/preauth/validateAuth -u <username>:<password>
  - Upload the JSON file:
    curl -X PUT \
      'https://<CloudPakforData_URL>/zen-data/v1/volumes/files/%2F_global_%2Fconfig%2F.runtime-definitions%2Fibm' \
      -H 'Authorization: Bearer <platform-access-token>' \
      -H 'content-type: multipart/form-data' \
      -F upFile=@/path/to/runtime/def/<custom-def-name>-server.json
    Important: Change the name of the modified JSON file. The file name must end with server.json and the same file name must be used across all clusters to enable exporting and importing analytics projects across cluster boundaries.
    If the changed JSON file was uploaded successfully, you will see the following response:
    { "_messageCode_": "Success", "message": "Successfully uploaded file and created the necessary directory structure" }
- Restart the notebook runtime.
Python notebook language version changes when moved from a catalog to a project if notebook name is changed
When you create a Python 3.7 notebook in a project, add it to the catalog, change its name in the catalog and then add it from the catalog back to a project, the code language of the notebook is changed from Python 3.7 to Python 3.6.
Workaround:
When you add a Python 3.7 notebook from the catalog to a project and you changed the name of the notebook, you need to change the environment of the notebook back to a Python 3.7 environment.
You can change the environment of a notebook:
- From the notebook opened in edit mode:
  - Click the Notebook Info icon from the notebook toolbar and then click Environment. A short description of the environment is displayed.
  - Select a Python 3.7 runtime from the list under Environments.
  - Select Change environment. This stops the active runtime and starts the newly selected environment.
- From the Assets page of your project:
  - Select the notebook in the Notebooks section, click Actions > Change Environment, and select a Python 3.7 environment. The notebook kernel must be stopped before you can change the environment. The new runtime environment is instantiated the next time the notebook is opened for editing.
Applies to: 3.5.0
Fixed in: 3.5.10
Anaconda Repository for IBM Cloud Pak for Data
Channel names for Anaconda Repository for IBM Cloud Pak for Data don’t support double-byte characters
When you create a channel in Anaconda Team Edition, you can’t use double-byte characters or most special characters. You can use only these characters: a-z 0-9 - _
RStudio
RStudio Sparklyr package 1.4.0 can’t connect with Spark 3.0 kernel
When users try to connect the Sparklyr R package in RStudio with a remote Spark 3.0 kernel, the connection fails because of Sparklyr R package connection issues. The connection issues are due to recent changes to Sparklyr R package version 1.4.0. This will be addressed in future releases. The workaround is to use the Spark 2.4 kernel.
Running job for R script and selected RStudio environment results in an error
When you’re running a job for an R script and a custom RStudio environment was selected, the following error occurs if the custom RStudio environment was created with a previous release of Cloud Pak for Data:
The job uses an environment that is not supported. Edit your job to select an alternative environment.
To work around this issue, delete and re-create the custom RStudio environment with the same settings.
Git integration broken when RStudio crashes
If RStudio crashes while working on a script and you restart RStudio, integration to the associated Git repository is broken. The reason is that the RStudio session workspace is in an incorrect state.
Workaround
If Git integration is broken after RStudio crashed, complete the following steps to reset the RStudio session workspace:
- Click the Terminal tab next to the Console tab to create a terminal session.
- Navigate to the working folder /home/wsuser and rename the .rstudio folder to .rstudio.1.
- From the File menu, click Quit Session… to end the R session.
- Click Start New Session when prompted. A new R project with Git integration is created.
RStudio doesn’t open although you were added as project collaborator
If RStudio will not open and all you see is an endless spinner, the reason is that, although you were added as a collaborator to the project, you have not created your own personal access token for the Git repository associated with the project. To open RStudio with Git integration, you must select your own access token.
To create your own personal access token, see Collaboration in RStudio.
Unable to start RStudio in a Chrome browser web page
The RStudio IDE cannot be launched in a Chrome browser after upgrading from IBM Cloud Pak for Data version 3.0.1 to version 3.5 or installing version 3.5.2. Sign-in authentication fails because the redirected URL is taken from the Chrome cache that is never cleared.
Workaround: Disable the browser cache in Developer Tools. Bear in mind that disabling the cache affects every web page you browse in Chrome, so you might want to re-enable the cache when you close the RStudio IDE.
Applies to: 3.5.2, after upgrading from 3.0.1 or when installing 3.5.2
Data Refinery
Unable to save a new Data Refinery flow more than once in a session
If you create a new Data Refinery flow and save it, and then try to save it again, you might receive the following error:
Unable to save Data Refinery flow: Failed to apply patch to JSON object: Cannot construct instance of com.github.fge.jsonpatch.ReplaceOperation, problem: java.lang.NullPointerException at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: java.util.ArrayList[1])
Workaround: Edit the Data Refinery flow name: Open the Information pane’s Details tab. In the DATA REFINERY FLOW NAME section, click the Edit icon, and then click Apply. (You don’t need to change the name.) Thereafter, you can save the Data Refinery flow multiple times.
Applies to: 3.5.14 and later
Data Refinery flow job fails when writing double-byte characters to an Avro file
If you run a job for a Data Refinery flow that uses a double-byte character set (for example, the Japanese or Chinese languages), and the output file is in the Avro file format, the job will fail.
Applies to: 3.5.0 and later
Data Refinery flow job fails with a large data asset
If your Data Refinery flow job fails with a large data asset, try these troubleshooting tips to fix the problem:
- Instead of using a project data asset as the target of the Data Refinery flow (default), use cloud storage for the target. For example, IBM Cloud Object Storage, Amazon S3, or Google Cloud Storage.
- Select a Spark & R 3.6 environment for the Data Refinery flow job or create a new Spark & R 3.6 environment definition.
- Be aware that certain Data Refinery flow operations might not work on large data assets:
- Convert column type to Date or to Timestamp (Also applies to the Convert column type operation as the automatic first step in a Data Refinery flow)
- Convert column type Manual convert to an Integer with a grouping symbol other than the default comma.
- Convert column type to Decimal with a decimal marker other than the default dot.
- Remove stop words
- Replace substring
- Split column
- Text > Trim quotes
- Text > Pad characters
- Text > Substring
- Tokenize
- Increase the load balancer timeout on the cluster. For instructions, see Watson Knowledge Catalog processes time out before completing. A hedged sketch of one way to do this follows this list.
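On OpenShift, one way to increase the timeout is a route annotation; a minimal sketch, where the route name and the 360s value are placeholders to adapt to your cluster:
# Raise the HAProxy timeout for the affected route.
oc annotate route <route-name> -n <project> --overwrite haproxy.router.openshift.io/timeout=360s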
Applies to: 3.5.0 and later
Duplicate connections in a space resulting from promoting a Data Refinery flow to a space
When you promote a Data Refinery flow to a space, all dependent data is promoted as well. If the Data Refinery flow that is being promoted has a dependent connection asset and a dependent connected data asset that references the same connection asset, the connection asset will be duplicated in the space.
The Data Refinery flow will still work. Do not delete the duplicate connections.
Applies to: 3.5.0 and later
Data Refinery flow fails with “The selected data set wasn’t loaded” message
The Data Refinery flow might fail if there are insufficient resources. The administrator can monitor the resources and then add resources by scaling the Data Refinery service or by adding nodes to the Cloud Pak for Data cluster.
Applies to: 3.5.0 and later
Unable to promote a Data Refinery flow to a space after upgrade
If you created a Data Refinery flow in Cloud Pak for Data 3.0.1, and then upgraded to Cloud Pak for Data 3.5.0 or later, you cannot promote the Data Refinery flow to a space.
Workaround: Open the Data Refinery flow, and save it.
If opening and saving the Data Refinery flow does not fix the problem, open the Data Refinery flow, change its name, and then save it:
- Click the Data Refinery flow name in the Assets page.
- In the Information pane, click Edit.
- Change the name in the DATA REFINERY FLOW NAME field.
- Click Done, and then save the Data Refinery flow.
You will then be able to promote the Data Refinery flow to a space.
Applies to: 3.5.0 and later
Cannot specify format options after you change the Data Refinery flow source
When data is read into Data Refinery, you can scroll down to the SOURCE FILE information at the bottom of the page and click the “Specify data format” icon to specify format options for CSV or delimited files. However, if you changed the source of a Data Refinery flow, this feature is not available.
Applies to: 3.5.0 - 3.5.1
Fixed in: 3.5.2
System-level schemas aren’t filtered out (Db2 Warehouse on Cloud)
When creating a connection to IBM Db2 Warehouse on Cloud (previously named IBM dashDB), system-level schemas aren’t filtered out.
Applies to: 3.5.0 and later
Target connection limitations (Compose for MySQL)
When you save Data Refinery flow output (target data sets) to connections, Data Refinery flows that have a target on a Compose for MySQL connection or connected data asset are not supported.
Applies to: 3.5.0 and later
Jobs
Spark jobs are supported only by API
If you want to run analytical and machine learning applications on your Cloud Pak for Data cluster without installing Watson Studio, you must use the Spark jobs REST APIs of Analytics Engine powered by Apache Spark. See Getting started with Spark applications.
Excluding days when scheduling a job causes unexpected results
If you schedule a job to run every day of the week excluding given days, you might notice that the scheduled job does not run as you would expect. The reason might be a discrepancy between the time zone of the user who creates the schedule and the time zone of the master node where the job runs.
This issue only exists if you exclude days of a week when you schedule to run a job.
Error occurs when jobs are edited
You cannot edit jobs that were created prior to upgrading to Cloud Pak for Data version 3.0 or later. An error occurs when you edit those jobs. Create new jobs after upgrading to Cloud Pak for Data version 3.0 or later.
Errors can also occur if the user who is trying to edit the job or schedule is different from the user who started or created the job. For example, if a Project Editor attempts to edit a schedule that was created by another user in the project, an error occurs.
Can’t delete notebook job stuck in starting or running state
If a notebook job is stuck in starting or running state and won’t stop, although you tried to cancel the job and stopped the active environment runtime, you can try deleting the job by removing the job-run asset manually using the API.
- Retrieve a bearer token from the user management service using an API call (a scripted variant follows this list):
  curl -k -X POST https://PLATFORM_CLUSTER_URL/icp4d-api/v1/authorize -H 'cache-control: no-cache' -H 'content-type: application/json' -d '{"username":"your_username","password":"your_password"}'
- (Optional) Get the job-run asset and test the API call. Replace ${token}, ${asset_id}, and ${project_id} accordingly.
  curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X GET "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
- Delete the job-run asset. Again replace ${token}, ${asset_id}, and ${project_id} accordingly.
  curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X DELETE "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
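If you script these calls, you can capture the bearer token from the first response; a sketch that assumes the response contains a token field and that jq is installed:
# Extract the token for reuse in the GET and DELETE calls.
token=$(curl -k -s -X POST https://PLATFORM_CLUSTER_URL/icp4d-api/v1/authorize -H 'content-type: application/json' -d '{"username":"your_username","password":"your_password"}' | jq -r .token)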
Notebook runs successfully in notebook editor but fails when run as job
Some libraries require a kernel restart after a version change. If you need to work with a library version that isn’t pre-installed in the environment in which you start the notebook, and you install this library version through the notebook, the notebook only runs successfully after you restart the kernel. However, when you run the notebook non-interactively, for example as a notebook job, it fails because the kernel can’t be restarted. To avoid this, create an environment definition and add the library version you require as a software customization. See Creating environment definitions.
Can’t add environment variables that have white spaces or special characters in the value to an existing job
You cannot edit an existing Python script or notebook job and add environment variables that include characters other than [A-Z a-z 0-9 ; , / ? : @ & = + $ - _ . ! ~ * ‘ ( ) #]. For example, if you add VAR1=with spaces, you will get an error saying that the job could not be updated because the environment variables were not entered in the correct format.
To work around this issue, delete the existing job and create a new one that includes the environment variables. Each variable declaration must define a single variable in the format VAR_NAME=foo and appear on its own line.
Fixed in: 3.5.3
Watson Machine Learning
Deployments might require software specification updates following upgrade
After upgrading from the 3.0.1 release to 3.5.14 or later, predictions on deployments of a Python function or model that are associated with the ai-function_0.2-py3.7 or scikit-learn_0.23-py3.7 software specifications might fail with this type of error:
Software specification pytorch-onnx_1.3-py3.7-edt for function is not supported. Supported
software specification for function are ai-function_0.2-py3.6, default_py3.6 and default_py3.7, default_py3.7_opence.
To recover from this error, users can patch the affected Python functions or models with the supported software specification corresponding to the model/python function and retry the prediction. For details, see Specifying a model type and software specification.
AutoAI requirement for AVX2
The AVX2 instruction set is not required to run AutoAI experiments; however, it does improve performance. AutoAI experiments run more slowly without AVX2.
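One way to verify AVX2 support on a worker node is a standard Linux check, independent of Cloud Pak for Data:
# Prints avx2 if the CPU advertises the instruction set; prints nothing otherwise.
grep -o -m1 'avx2' /proc/cpuinfo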
Watson Machine Learning might require manual rescaling
By default, the small installation of Watson Machine Learning comes up with one pod. When the load on the service increases, you might experience these symptoms, which indicate the need to manually scale the wmlrepository service:
- The wmlrepository service pod restarts with an Out Of Memory error.
- A wmlrepository service request fails with this error:
  Generic exception of type HttpError with message: akka.stream.BufferOverflowException: Exceeded configured max-open-requests value of [256]. This means that the request queue of this pool has completely filled up because the pool currently does not process requests fast enough to handle the incoming request load. Please retry the request later. See http://doc.akka.io/docs/akka-http/current/scala/http/client-side/pool-overflow.html for more information.
Use this command to scale the repository:
./cpd-linux scale -a wml --config medium -s server.yaml -n <namespace>
where medium.yaml contains:
commands:
  - scale --replicas=2 deployment wmlrepository
Do not import/export models between clusters running on different architectures
When you export a project or space, the contents, including model assets, are included in the export package. You can then import the project or space to another server cluster. Note that the underlying architecture must be the same or you might encounter failures with the deployment of your machine learning models. For example, if you export a space from a cluster running the Power platform, then import to a cluster running x86-64, you may be unable to deploy your machine learning models.
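You can confirm the architecture of each cluster before exporting; a quick check with the OpenShift CLI:
# Lists each node's name and CPU architecture (for example, amd64 or ppc64le).
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'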
Deleting model definitions used in Deep Learning experiments
Currently, users can create model definition assets from the Deep Learning Experiment Builder but cannot delete them there. They must use REST APIs to delete model definition assets.
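A hedged sketch of such a deletion, reusing the /v2/assets pattern shown in the Jobs section; treating model definitions as assets addressable by this endpoint is an assumption, so check the REST API reference for your release:
# Hypothetical: delete a model definition asset by its ID within a space.
curl -k -X DELETE "https://PLATFORM_CLUSTER_URL/v2/assets/${asset_id}?space_id=${space_id}" -H "Authorization: Bearer ${token}" -H 'accept: application/json'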
Resetting stalled batch deployment jobs
Some failed batch deployment jobs can cause all subsequent jobs to stall (they stay in pending or starting state indefinitely). This happens when the hardware_spec.num_nodes parameter is not properly updated.
To fix this issue, perform one of these tasks:
- Delete the existing batch-deployment and create a new batch deployment.
- Use the REST API to patch the existing batch deployment (increment the value of hardware_spec.num_nodes by 1):
  - Run a GET query on the existing batch deployment by its ID. In this example, entity.hardware_spec.num_nodes has the value of 1:
curl -ik -X GET --header 'Content-Type: application/json' --header 'Authorization: Bearer token' "<deployment-url>"
HTTP/1.1 200 OK
Date: Thu, 19 May 2022 11:06:32 GMT
Content-Type: application/json
Content-Length: 704
Connection: keep-alive
x-envoy-upstream-service-time: 312
Server: ---
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=31536000; includeSubDomains
{
"entity": {
"asset": {
"id": "<id>"
},
"batch": {
},
"custom": {
},
"deployed_asset_type": "model",
"hardware_spec": {
"id": "<id>",
"name": "S",
"num_nodes": 1
},
"name": "scikit-batch1",
"space_id": "<space-id>",
"status": {
"state": "ready"
}
},
"metadata": {
"created_at": "<date>",
"id": "<id>",
"modified_at": "<date>",
"name": "scikit-batch1",
"owner": "<owner>",
"space_id": "<space_id>"
}
}
- Increment the entity.hardware_spec.num_nodes value by 1:
curl -ik -X PATCH --header 'Content-Type: application/json' --header 'Authorization: Bearer token' "<deployment-url>" -d @v4patch.json
where v4patch.json contains:
[
{
"op": "replace",
"path": "/hardware_spec",
"value": {
"name": "S",
"num_nodes": 2
}
}
]