Known issues for Watson Studio and supplemental services
These known issues apply to Watson Studio and the services that require Watson Studio.
- General issues
- Common Core Services operator status is hung on infinite reconcile loop during upgrade with lingering cronjobs
- Internal service error occurs when you upload files that already exist on a server to a remote NFS volume server
- Deployments view (Operations view/dashboard) has some limitations
- Spark environments can be selected although Spark is not installed
- Backup and restore limitations
- UI might not display properly
- Can’t stop an active runtime for notebooks, JupyterLab, Data Refinery and SPSS Modeler
- Projects
- Option to log all project activities is enabled but project logs don't contain activities and return an empty list
- Job scheduling does not work consistently with default git projects
- Import of a project larger than 1 GB in Watson Studio fails
- Export of a large project in Watson Studio can timeout
- Cannot stop job runs in a project associated with a Git repository
- Cannot work with all assets pulled from a Git repository
- Can't include a Cognos dashboard when exporting a project to desktop
- Can't create a project although the project name is unique
- Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration
- Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster
- Importing a project that contains a Data Refinery job that uses Spark 2.4 fails
- Connections
- Cannot create a connection to Oracle when credentials must be stored in a secret in a vault
- Authentication fields unavailable in the Cloud Object Storage (COS) connection
- FTP connection: Use the SSH authentication mode only with the SSH connection method
- Cannot retrieve data from a Greenplum connection
- Cannot access or create connected data assets or open connections in Data Refinery
- Cannot access an Excel file from a connection in cloud storage
- Cannot create an SQL Query connected data asset that has personal credentials
- Personal credentials are not supported for connected data assets in Data Refinery
- Assets
- Hadoop integration
- Unable to install pip packages using install_packages() on a Power machine
- On certain HDP clusters, installing Execution Engine for Apache Hadoop fails
- Cannot stop jobs for a registered Hadoop target host
- Apache Livy session on RStudio and Data Refinery has known issues with curl packages
- Support for Spark versions
- Software version list disappears when defining a Cloud Pak for Data environment
- Support for specific Python versions with Execution Engine for Apache Hadoop
- Code that is added to a Python 3.7 or Python 3.8 notebook for an HDFS-connected asset fails
- Jupyter notebook with Python 3.8 is not supported by Execution Engine for Apache Hadoop
- The Livy service does not restart when a cluster is rebooted
- Pushing the Python 3.7 image to a registered Execution Engine for Apache Hadoop system hangs on Power
- Pushing an image to a Spectrum Conductor environment fails
- Notebooks using a Hadoop environment that don't have the Jupyter Enterprise Gateway service enabled can't be used
- Cannot use HadoopLibUtils to connect through Livy in RStudio
- Hadoop Refinery: Cannot output to a connected parquet asset that is a directory
- Hadoop notebook jobs don't support using environment variables containing spaces
- Notebooks
- An error 500 is received after opening an existing notebook
- Notebook fails to load in default Spark 3.0 and R 3.6 environments
- No indication that notebooks reference deprecated or unsupported Python versions in deployment spaces
- Insert to code function does not work on SSL-enabled Db2 on Cloud connections from 3.5.x imported projects or 3.5.x git repos
- Issues with Insert to code function for an Informix connection
- Cloudera Distribution for Hadoop
- Insert to code function on IBM Z can cause kernel failure
- R 3.6 notebook kernel won't start because of slow kernel connection
- Notebook loading considerations
- Kernel not found when opening a notebook imported from a Git repository
- Environment runtime can't be started because the software customization failed
- Notebook returns UTC time and not local time
- Can't promote existing notebook because no version exists
- JupyterLab hangs if notebook code cell has more than 1000 lines of code
- Errors when mixing Insert to code function options in R notebooks with Spark
- SparkSession fails to start in Spark 3.0 & R 3.6 notebooks after upgrading to version 4.0.5
- Insert to code function in notebooks on Spark 2.4 with Scala or R doesn't support Flight Service
- Code inserted by the Insert to code function for Mongo DB connections in Scala 2.12 notebooks with Spark 3.0 sometimes returns errors
- Python 3.8 notebook kernel dies while running generated code from Insert to code function
- Save data and upload file size limitation in project-lib and ibm-watson-studio-lib for Python
- Notebooks fail to start even after custom environment definition is fixed
- Notebook or JupyterLab runtimes might not be accessible after running for more than 12 hours
- Insert to code fails when the Flight service load is very high
- Error when trying to access data in an Oracle database
- Anaconda Repository for IBM Cloud Pak for Data
- RStudio
- RStudio Sparklyr package 1.4.0 can't connect with Spark 3.0 kernel
- Running job for R script and selected RStudio environment results in an error
- Git integration broken when RStudio crashes
- No Git tab although RStudio is launched with Git integration
- RStudio doesn't open although you were added as project collaborator
- Data in persistent storage volume not mounted when RStudio is launched
- Can't connect to Hadoop Livy in RStudio
- Runtime pod fails when runtime is started
- Data Refinery
- Cannot run a Data Refinery flow job with certain unsigned data types
- Cannot view visualization charts in Data Refinery after upgrade
- Cannot run a Data Refinery flow job with data from a Hadoop cluster
- Option to open saved visualization assets is disabled in Data Refinery
- Cannot refine data that uses commas in the source data and a target that uses a delimited file format
- Data Refinery flow job fails when writing double-byte characters to an Avro file
- Data Refinery flow job fails with a large data asset
- Certain Data Refinery flow operations might not work on large data assets
- Data Refinery flows with large data sets need updating when using certain GUI operations
- Data Refinery flow job fails for large Excel files
- Cannot run a Data Refinery flow job with data from an Amazon RDS for MySQL connection
- Duplicate connections in a space resulting from promoting a Data Refinery flow to a space
- Data Refinery flow fails with "The selected data set wasn't loaded" message
- Jobs
- Spark jobs are supported only by API
- UI displays job run started by Scheduler and not by a specific user
- Excluding days when scheduling a job causes unexpected results
- Error occurs when jobs are edited
- Can't delete notebook job stuck in starting or running state
- Notebook runs successfully in notebook editor but fails when run as job
- Can't change the schedule in existing jobs after upgrading to Cloud Pak for Data 4.0.7
- Can't run a Scala 2.12 with Spark 3.0 notebook job in a deployment space
- Federated Learning
- Watson Machine Learning
- Deployments fail for Keras models published to catalog then promoted from project to space
- Predictions API in Watson Machine Learning service can timeout too soon
- Deployment of AutoAI can fail when training input and deployment input don't match
- Deploying SPSS Modeler flows with Data Asset Import node inside supernode fails
- Deploying some SPSS model types saved as PMML fails
- Deployments can fail with framework mismatch between training and WMLA
- SPSS deployment jobs with no schema ID fail
- Deployment unusable because deployment owner left the space
- Duplicate deployment serving names need updating
- Upgrade from Cloud Pak for Data 3.5 looks like it fails
- Spark and PMML models are not supported on FIPS-enabled clusters
- Restrictions for IBM Z and IBM LinuxONE users
- Deployments might fail after restore from backup
- Job run retention not working as expected
- Cannot use deployment if owner ID is removed
- AutoAI requirement for AVX2 support
- Do not import/export models between clusters running on different architectures
- Watson Machine Learning might require a manual rescale
- Deleting model definitions used in Deep Learning experiments
- RShiny app might load an empty page if user application sources many libraries from an external network
- Python function or Python script deployments may fail if itc_utils library and flight service is used to access data
- Automatic mounting of storage volumes not supported by online and batch deployments
General issues
Common Core Services operator status is hung on infinite reconcile loop during upgrade with lingering cronjobs
The status of the Common Core Services operator is hung on an infinite reconcile loop during an upgrade with lingering cronjobs.
Workaround
- Check if there are any cronjobs in suspended state using the following labels:
  oc get cronjobs -n <cpd_instance_namespace> -l 'created-by=spawner,ccs.cpd.ibm.com/upgradedTo4x!=4.0.6'
  Example response:
  NAME                                              SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
  038207f9-6e91-4df4-9860-e8ef6c30aca0-1000330999   13 21 4 2 *   True      0        <none>          4d23h
  0954d225-2600-44de-8d60-f29ae31aa96c-1000330999   51 18 4 2 *   True      0        <none>          5d2h
  14ce1614-7059-40c9-9a7e-5f2fa33d65a8-1000330999   12 23 4 2 *   False     0        <none>          4d21h
  184b04fe-d893-481c-89c9-7ce96c7fb4d4-1000330999   0 * * * *     False     0        100s            6d14h
  253b988a-31a8-46d0-ba9c-239d83746dc3-1000330999   28 * * * *    False     0        33m             6d14h
- Delete each of the returned suspended cronjobs and its associated secret by running the following commands. Only the jobs that are in suspended state (the third column, SUSPEND, is True) should be deleted. Leave the rest untouched.
  oc delete cronjob -n <cpd_instance_namespace> <cronjob_name>
  oc delete secret -n <cpd_instance_namespace> <cronjob_name>-sct
  Note: If an error is received stating that the secret could not be found, the error can be ignored.
- After completing the previous step, the Common Core Services operator should stop running reconcile phases and mark the installation as complete.
Applies to: 4.0.6.
Fixed in: 4.0.7.
Internal service error occurs when you upload files that already exist on a server to a remote NFS volume server
If you are uploading files that already exist on a remote NFS volume server, you must update the permissions of the existing files on the remote server, or create a new directory and upload all files to that directory. Otherwise, an internal service error occurs.
Only users who have access to the NFS server can change the permission of the files and create new directories.
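For example, a rough sketch of both options, run directly on the NFS server (the export path /nfs/export/data and the new directory name are assumptions; use your actual paths):
# Option 1: give the existing files write permission for the uploading user
chmod -R u+w /nfs/export/data
# Option 2: create a fresh directory and upload all files there instead
mkdir /nfs/export/data/new-upload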
Backup and restore limitations
Offline quiesce is supported only at the OpenShift project level and it restores only to the same machine and to the same namespace.
Deployments view (Operations view/dashboard) has some limitations
The Deployments view has the following limitation:
- When long names are used, they're not fully truncated and can be obscured on the screen.
Spark environments can be selected although Spark is not installed
When you create a job for a notebook or a Data Refinery flow that you promoted to a deployment space, and select a Spark environment, you might see the following error message:
Error submitting job. Unable to fetch environment access info from envId spark30****. Error: [{"statusCode":500,"message":"{\"trace\":\"39e52bbb-2816-4bf6-9dad-5aede584ac7a\",\"errors\":[{\"code\":\"default_spark_create_failed\",\"message\":\"Could not create Hummingbird instance, because a wrong status code was returned: 404.\"}]}"}]
The reason for this error is that Spark was not installed on the cluster on which you created the deployment space. Contact your administrator to install the Spark service on that cluster.
Applies to: 4.0.0 and 4.0.1.
Fixed in: 4.0.2
UI might not display properly
If the UI does not load properly, the Watson Studio administrator must restart redis.
Workaround
If the client behaves unexpectedly, for example, enters a redirection loop or parts of the user interface fail to load, then complete the following steps to restart redis:
- Log in to the OpenShift cluster.
- Restart the redis pods by running the following command:
  oc delete po -n <project> $(oc get po -n <project> -l component=redis -o jsonpath="{.items[*].metadata.name}")
Can’t stop an active runtime for notebooks, JupyterLab, Data Refinery and SPSS Modeler
If you try stopping an active runtime for notebooks, JupyterLab, Data Refinery or SPSS Modeler from the environments page in a project, the runtime is removed from the list. However, when you reload the page, the runtime appears again.
The reason is that the runtime couldn't be deleted properly. To delete the runtime entirely, you must have Cloud Pak for Data administrator rights to view the logs and OpenShift administrator rights to delete the runtime.
Workaround
To stop the runtime and remove it from the list of active runtimes:
- As Cloud Pak for Data administrator, check the log file for information you need before you can delete the runtime.
  - Navigate to Administration > Monitoring.
  - Select Pods.
  - In Find Pods, enter spawner. This should return one pod named spawner-api-<id>.
  - Click the menu icon on the right of the entry and select View Log. You might have to download the log file to see its full contents. The log file shows entries such as:
    WARN : Cannot find service for type=<type>,dsxProjectId=<project>,dsxUserId=<user>,runtimeEnvId=<id>
    For example:
    WARN : Cannot find service for type=jupyter-py37,dsxProjectId=1d35b7f8-cef1-4432-92ae-aa08afe4c8c6,dsxUserId=1000330999,runtimeEnvId=jupconda37oce-1d35b7f8-cef1-4432-92ae-aa08afe4c8c6
- Log in to the OpenShift cluster that Cloud Pak for Data is installed on as an OpenShift administrator.
- Run the following command using the values from the log entry:
  oc get deployment -l type=<type>,dsxProjectId=<project>,dsxUserId=<user>,runtimeEnvId=<id>
  For example:
  oc get deployment -l type=jupyter-py37,dsxProjectId=1d35b7f8-cef1-4432-92ae-aa08afe4c8c6,dsxUserId=1000330999,runtimeEnvId=jupconda37oce-1d35b7f8-cef1-4432-92ae-aa08afe4c8c6
  This command returns a deployment with a particular name.
- Run the following command with the deployment name:
  oc delete deployment <name>
- Then run:
  oc get secret -l type=<type>,dsxProjectId=<project>,dsxUserId=<user>,runtimeEnvId=<id>
  For example:
  oc get secret -l type=jupyter-py37,dsxProjectId=1d35b7f8-cef1-4432-92ae-aa08afe4c8c6,dsxUserId=1000330999,runtimeEnvId=jupconda37oce-1d35b7f8-cef1-4432-92ae-aa08afe4c8c6
  This command returns a secret with a particular name.
- Finally, run the following command with the secret name:
  oc delete secret <name>
Fixed in: 4.0.3
Projects
Option to log all project activities is enabled but project logs don't contain activities and return an empty list
Log all project activities is enabled, but the project logs don't contain activities and return an empty list.
Workaround: If the project logs are empty after 30 minutes or more, restart the rabbitmq pod by completing the following steps:
- Search for all the pods of the rabbitmq-ha stateful set by running:
  oc get pods | grep rabbitmq-ha
  This will return 3 pods, for example:
  rabbitmq-ha-0   1/1   Running   0   4d6h
  rabbitmq-ha-1   1/1   Running   0   4d6h
  rabbitmq-ha-2   1/1   Running   0   4d7h
- Restart each pod by running:
  oc delete pod rabbitmq-ha-0 rabbitmq-ha-1 rabbitmq-ha-2
Applies to: 4.0.7 and later.
Job scheduling does not work consistently with default git projects
Creating a new default Git project or changing branches in your local clone of an existing Git-based project corrupts your existing job schedules for that project.
Import of a project larger than 1 GB in Watson Studio fails
If you create an empty project in Watson Studio and then try to import a project that is larger than 1 GB in size, the operation might fail depending on the size and compute power of the Cloud Pak for Data cluster.
Export of a large project in Watson Studio fails with a time-out
If you are trying to export a project with a large number of assets (for example, more than 7000), the export process can time out and fail. In that case, although you could export assets in subsets, the recommended solution is to export by using the APIs available from the CPDCTL command-line interface tool.
Git operations fail with invalid token error
If you are performing a git action and the associated git token of the project has become invalid, the operation will fail. You will be unable to use a valid token to complete the action. To resolve the issue, use this command from the project terminal to add a valid token.
git remote set-url origin https://[USERNAME]:[NEW TOKEN]@github.com/[USERNAME]/[REPO].git
Applies to: 4.0.2 and 4.0.3
Cannot switch checkout branch in project with default Git integration after changing project assets or files
If the local Git repository has untracked changes, sometimes checkout would fail with unexpected response code: 500. This is caused by files in the new branch that would overwrite your local changes.
Workaround:
Before checking out a different branch, first commit all your changes. Alternatively, use the project terminal to stash or revert any untracked changes.
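For example, a minimal sketch of the terminal commands you might use before switching branches (git stash -u also stashes untracked files; git clean -fd permanently removes untracked files and directories):
git stash -u
# or, to revert tracked changes and drop untracked files entirely:
git checkout -- .
git clean -fd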
Fixed in: 4.0.6
Cannot stop job runs in a project associated with a Git repository
Files containing information about job runs are not, by default, included in the files that are pushed to the Git repository associated with a project. They are excluded in the .gitignore file. However, if the .gitignore is updated to include such files and a user commits and pushes those files while a job run is active, then users working with that Git repository (in the same project or in a separate project) will see those job runs as active after pulling the
changes. They will get an error if they try to stop any of these job runs.
Workaround:
To remove job runs that are marked as active but cannot be stopped, ask the user who pushed the files for those active job runs to push the files again after the job runs have completed.
Cannot work with all assets pulled from a Git repository
If you work in a project with default Git integration, the Git repository might contain assets added from another project that uses the same Git repository. If this project is on a different Cloud Pak for Data cluster where other services are installed, you will not be able to work with any assets if the necessary service is not available on your cluster.
For example, if the Git repository contains SPSS Modeler flows from another project on a different cluster and SPSS Modeler is not installed on your cluster, when you try to open a SPSS Modeler flow, the Web page will be blank.
Workaround:
To work with all assets pulled from a Git repository used in projects on another cluster, you need to ask a system administrator to install the missing services on your cluster.
Can't include a Cognos dashboard when exporting a project to desktop
Currently, you cannot select a Cognos dashboard when you export a project to desktop.
Workaround:
Although you cannot add a dashboard to your project export, you can move a dashboard from one project to another.
To move a dashboard to another project:
- Download the dashboard JSON file from the original project.
- Export the original project to desktop by clicking the export icon on the project toolbar.
- Create a new project by importing the project ZIP with the required data sources.
- Create a new dashboard by clicking the From file tab and adding the JSON file you downloaded from the original project.
- A dialog box will pop up asking you if you want to re-link each of your data sources. Click the re-link button and select the asset in the new project that corresponds to the data source.
Can't create a project although the project name is unique
If you want to create a project and you get an error message stating that the project name already exists, although the name is unique, the reason might be that your role has the "manage projects" or "monitor project workloads" permission. There is currently a defect that prevents users with these permissions from creating projects.
Workaround
Ask a Cloud Pak for Data administrator to remove the permissions "manage projects" or "monitor project workloads" that were assigned to your role to enable you to create a project.
Should these permissions be needed, they may be assigned to a dedicated monitoring role that does not need to create projects.
Fixed in: 4.0.5
Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration
You shouldn't use the Git repository from a project with deprecated Git integration in a project with default Git integration as this can result in an error. For example, in Bitbucket, you will see an error stating that the repository contains content from a deprecated Git project although the selected branch contains default Git project content.
In a project with default Git integration, you can either use a new, clean Git repository or link to one that was used in another project with default Git integration.
Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster
If your project is associated with a Git repository that was used in a project in another cluster and contains connections that require a JDBC driver, the connections will not work in your project. If you upload the required JDBC JAR file, you will see an error stating that the JDBC driver could not be initialized.
This error is caused by the JDBC JAR file that is added to the connection as a presigned URI. This URI is not valid in a project in another cluster. The JAR file can no longer be located even if it exists in the cluster, and the connection will not work.
Workaround
To use any of these connections, you need to create new connections in the project. The following connections require a JDBC driver and are affected by this error situation:
- Db2 for i
- Db2 for z/OS
- Generic JDBC
- Hive via Execution Engine for Apache Hadoop
- Impala via Execution Engine for Apache Hadoop
- SAP HANA
- Exasol
Importing a project that contains a Data Refinery job that uses Spark 2.4 fails
If you try to import a project that contains a Data Refinery job that runs in a Spark 2.4 environment, the import will fail. The reason for this is that Spark 2.4 was removed in Cloud Pak for Data 4.0.7.
Workaround
Create a new Data Refinery job and select a Spark 3.0 environment.
Applies to: 4.0.7
Fixed in: 4.0.8
Connections
Cannot create a connection to Oracle when credentials must be stored in a secret in a vault
If you try to create a connection to Oracle and the administrator has configured Cloud Pak for Data to enforce using secrets from a vault, the connection will fail.
Workaround: Disable the vault enforcement temporarily. Users will then be able to create a connection to Oracle that uses a vault for credentials. However, while enforcement is disabled, users can also create connections (including connections other than Oracle) without using credentials that are stored in a secret in a vault. After the connection to Oracle is created, you can enforce using secrets from a vault again.
Applies to: 4.0.8
Fixed in: 4.0.9
Authentication fields unavailable in the Cloud Object Storage (COS) connection
When you create or edit a Cloud Object Storage connection, the authentication fields are not available in the user interface if you change the authentication method.
Workaround: If you change the authentication method, clear any fields where you have already entered values.
Applies to: 4.0.7 and later
FTP connection: Use the SSH authentication mode only with the SSH connection method
If you create an FTP connection with the Anonymous, Basic, or SSL connection mode and you specify the SSH authentication mode, the Test connection will fail.
Workaround: Specify the SSH authentication mode only when you specify the SSH connection mode.
Applies to: 4.0.7
Fixed in: 4.0.8
Cannot retrieve data from a Greenplum connection
After you create a connection to Greenplum, you might not be able to select its tables or assets.
Workaround: In the asset browser, click the Refresh button to access the table or asset. You might need to refresh several times.
Applies to: 4.0.6
Fixed in: 4.0.9
Cannot access or create connected data assets or open connections in Data Refinery
The following scenario can prevent you from accessing or creating connected data assets in a project and from opening connections in Data Refinery:
- Create a connection in a catalog. Add that same connection to a project as a platform connection. In the project, create a connected data asset that references the connection. Delete the connection from the catalog.
As a workaround for this scenario, delete any orphaned referenced connections that are still in the project.
Applies to: 3.5.10
Fixed in: 4.0.5
Cannot access an Excel file from a connection in cloud storage
This problem can occur when you create a connected data asset for an Excel file in a space, catalog, or project. The data source can be any cloud storage connection. For example, IBM Cloud Object Storage, Amazon S3, or Google Cloud Storage.
Workaround: When you create a connected data asset, select which spreadsheet to add.
Applies to: 4.0.4
Fixed in: 4.0.5
Cannot create an SQL Query connected data asset that has personal credentials
If you want to create a connected data asset for an SQL Query connection that has personal credentials, the Select connection source page might stop responding when you click the SQL Query connection.
Workaround: Edit the connection from the Edit connection page.
- Go to the project's Assets page and click the link for the SQL Query connection to open the Edit connection page.
- Enter the credentials and click Save.
- Return to Add to project > Connected data > Select source, and select data from the SQL Query connection.
Applies to: 4.0.3
Fixed in: 4.0.4
Personal credentials are not supported for connected data assets in Data Refinery
If you create a connected data asset with personal credentials, other users must use the following workaround in order to use the connected data asset in Data Refinery.
Workaround:
- Go to the project page, and click the link for the connected data asset to open the preview.
- Enter credentials.
- Open Data Refinery and use the authenticated connected data asset for a source or target.
Applies to: 3.5.0 and later
Assets
Limitations for previews of assets
You can't see previews of these types of assets:
- Folder assets associated with a connection with personal credentials. You are prompted to enter your personal credentials to start the preview or profiling of the connection asset.
- Connected data assets for image files in projects.
- Connected data assets for text and JSON files with shared credentials are incorrectly displayed in a grid.
- Connected data assets for PDF files in projects.
Can't load files to projects that have #, %, or ? characters in the name
You can't create a data asset in a project by loading a file that contains a hash character (#), percent sign (%), or a question mark (?) in the file name.
Applies to: 3.5.0
Fixed in: 4.0.6
Can't load CSV files larger than 20 GB to projects
You can't load a CSV file to an analytics project in Cloud Pak for Data that is larger than 20 GB.
Hadoop integration
Unable to install pip packages using install_packages() on a Power machine
If you are using a Power cluster, you might see the following error when attempting to install pip packages with hi_core_utils.install_packages():
ModuleNotFoundError: No module named '_sysconfigdata_ppc64le_conda_cos6_linux_gnu'
To work around this known limitation of hi_core_utils.install_packages() on Power, set the following environment variable in your notebook before calling install_packages():
import os

# For Power machines, set this env var to work around a known issue in
# hi_core_utils.install_packages()
os.environ['_CONDA_PYTHON_SYSCONFIGDATA_NAME'] = "_sysconfigdata_powerpc64le_conda_cos7_linux_gnu"
On certain HDP clusters, the Execution Engine for Apache Hadoop service installation fails
The installation fails during the Knox Gateway Configuration step. The failure occurs on some nodes because the Knox gateway fails to start; see the Knox gateway issue for more information.
The following errors occur:
Failed to configure gateway keystore
Exception in thread "main" Caused by: java.lang.NoSuchFieldError: DEFAULT_XML_TYPE_ATTRIBUTE
Exception in thread "main" java.lang.reflect.InvocationTargetException Caused by: java.lang.NoSuchMethodError: org.eclipse.persistence.internal.oxm.mappings.Field.setNestedArray(Z)V
Exception in thread "main" java.lang.reflect.InvocationTargetException
The workaround is to remove the org.eclipse.persistence.core-2.7.2.jar file from the installation directory by using the following command:
mv /opt/ibm/dsxhi/gateway/dep/org.eclipse.persistence.core-2.7.2.jar /tmp/
Cannot stop jobs for a registered Hadoop target host
When a registered Hadoop cluster is selected as the Target Host for a job run, the job cannot be stopped. As a workaround, view the Watson Studio Local job logs to find the Yarn applicationId; then, use the ID to manually stop the Hadoop job on the remote system. When the remote job is stopped, the Watson Studio Local job will stop on its own with a "Failed" status. Similarly, jobs that are started for registered Hadoop image push operations cannot be stopped either.
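For example, assuming you can run the YARN CLI on the remote Hadoop system and the job log shows the applicationId, a command along these lines stops the remote job (the application ID shown is a placeholder):
yarn application -kill application_1234567890123_0042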
Apache Livy session on RStudio and Data Refinery has known issues with curl packages
The workaround for the curl package issue is to downgrade the curl package by using the following command:
install.packages("https://cran.r-project.org/src/contrib/Archive/curl/curl_3.3.tar.gz", repos=NULL)
Related to: 4.0.7 and later
Code that is added to a Python 3.7 or Python 3.8 notebook for an HDFS-connected asset fails
The HDFS-connected asset fails because the Set Home as Root setting is selected for the HDFS connection. To work around this issue, create the connected asset using the HDFS connection without selecting Set Home as Root.
Support for Spark versions
Apache Spark 3.1 for Power is not supported.
Applies to: 4.0.6
Software version list disappears when defining a Cloud Pak for Data environment
When you're defining a Cloud Pak for Data environment, the Software version list disappears in the following situations as you're choosing the system configuration for a Hadoop cluster edge node:
- The Spark service isn't running on the Hadoop cluster.
- Jupyter Enterprise Gateway (JEG) is not available.
Applies to: 4.0.6
Support for specific Python versions with Execution Engine for Apache Hadoop
- Python 3.7 is not available as an option on Cloud Pak for Data and it cannot be pushed to a Hadoop cluster.
- Python 3.8 supports only Spark 3.0 or higher. For more information about Spark and Python 3.8, see Apache's JIRA tracker.
Note: Execution Engine for Apache Hadoop only supports Spark 3.0.x. You can manually install Spark 3.0.x on your CDH cluster by following the Cloudera procedure for Installing CDS 3.1 Powered by Apache Spark.
- Python 3.9 is not supported.
Applies to: 4.0.6
Jupyter notebook with Python 3.8 is not supported by Execution Engine for Apache Hadoop
The following issues are the result of Python 3.8 not being supported by Execution Engine for Apache Hadoop:
- When a Jupyter Enterprise Gateway Python 3.8 notebook is running on Spark 2.4, the notebook cannot be launched. The error occurs because Python 3.8 is not supported on Spark 2.4.
- A Livy session fails to be established with a Jupyter notebook with a Python 3.8 runtime pushed image when the registered Hadoop cluster has Spark 2.4.
For more information about Spark and Python 3.8, see Apache's JIRA tracker.
Applies to: 4.0.1
Fixed in: 4.0.6
The Livy service does not restart when a cluster is rebooted
The Livy service does not automatically restart after a system reboot if the HDFS Namenode is not in an active state.
Applies to: 3.5.0 and later
Pushing the Python 3.7 image to a registered Execution Engine for Apache Hadoop system hangs on Power
When the Python 3.7 Jupyter image is pushed to an Execution Engine for Apache Hadoop registered system using platform configurations, installing dist-keras package into the image (using pip) hangs. The job runs for hours but never completes,
and the console output for the job ends with:
Attempting to install HI addon libs to active environment ...
==> Target env: /opt/conda/envs/Python-3.7-main ...
====> Installing conda packages ...
====> Installing pip packages ...
This hang is caused by a pip regression in dependency resolution, as described in the New resolver downloads hundreds of different package versions, without giving reason issue.
To work around this problem, do the following steps:
- Stop the image push job that is hung. To find the job that is hung, use the following command:
  oc get job -l headless-type=img-saver
- Delete the job:
  oc delete job <jobid>
- Edit the Execution Engine for Apache Hadoop image push script located at /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh.
- Add the flag --use-deprecated=legacy-resolver to the pip install command, as follows:
  ```
  @@ -391,7 +391,7 @@
       conda-env=2.6.0 opt_einsum=3.1.0 > /tmp/hiaddons.out 2>&1 ; then
       echo " OK"
       echo -n "  ====> Installing pip packages ..."
  -    if pip install dist-keras==0.2.1 >> /tmp/hiaddons.out 2>&1 ; then
  +    if pip install --use-deprecated=legacy-resolver dist-keras==0.2.1 >> /tmp/hiaddons.out 2>&1 ; then
       echo " OK"
       hiaddonsok=true
  ```
- Restart your image push operation by clicking Replace image on the Execution Engine for Apache Hadoop registration page.
To edit the Execution Engine for Apache Hadoop image push script:
- Access the pod running utils-api. This pod has the /cc-home/.scripts directory mounted.
  oc get pod | grep utils-api
- Extract the existing /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh file:
  oc cp <utils_pod_id>:/cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh dsx-hi-save-image.sh
- Modify the file per the workaround steps:
  vi dsx-hi-save-image.sh
- Copy the new file into the pod, in the /tmp directory:
  oc cp dsx-hi-save-image.sh <utils_pod_id>:/tmp
- Exec into the pod:
  oc rsh <utils_pod_id>
- Make a backup:
  cp -up /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh.bak
- Dump the content of /tmp/dsx-hi-save-image.sh into /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh:
  cat /tmp/dsx-hi-save-image.sh > /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh
- Do a diff to make sure you have the changes:
  diff /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh.bak
- Exit the utils-api pod:
  exit
Pushing an image to a Spectrum Conductor environment fails
If you push an image to a Spectrum Conductor environment, the image push fails.
Applies to: 4.0.0 and later
Fixed in: 4.0.2
Notebooks using a Hadoop environment that don't have the Jupyter Enterprise Gateway service enabled can't be used
When a notebook is opened using a Hadoop environment that does not have the Jupyter Enterprise Gateway (JEG) service enabled, the kernel remains disconnected. This notebook does not display any error message, and it cannot be used.
To address this issue, confirm that the Hadoop system that the environment is defined to use does have the JEG service enabled.
Cannot use HadoopLibUtils to connect through Livy in RStudio
RStudio includes a new version of sparklyr. sparklyr doesn't work with the HadoopLibUtils library that's provided to connect to remote Hadoop clusters using Livy. The following error occurs: Error: Livy connections now require the Spark version to be specified.
As a result, you cannot create tables using Hadoop.
If you don't want to roll back the sparklyr 1.6.3 changes, you can install sparklyr 1.6.2 instead. Restart an R session and use sparklyr 1.6.2 for Hadoop.
Install sparklyr 1.6.2 by running the following code:
require(remotes)
library(remotes)
install_version("sparklyr", version = "1.6.2", repos = "http://cran.us.r-project.org")
packageVersion("sparklyr")
After the package is installed, load sparklyr 1.6.2 when you are using HadoopLibUtilsR.
Applies to: 4.0.0
Hadoop Refinery: Cannot output to a connected parquet asset that is a directory
This problem occurs when you run a Data Refinery data shaping job that uses the Execution Engine for Apache Hadoop connector for HDFS. When you select an output target directory that contains a set of parquet files, the Data Refinery layer cannot determine the file format for this directory.
Instead, it leaves the File Format field blank. This causes a failure when writing the output because File Format ‘ ‘ is not supported.
To work around this issue, select an output target that is empty or not a directory.
Hadoop notebook jobs don't support using environment variables containing spaces
When you're setting up a notebook job that uses a remote Hadoop environment, you can specify environment variables that the notebook can access. If an environment variable value contains spaces, the JEG job fails. The remote Hadoop jeg.log displays:
[E 2020-05-13 10:28:54.542 EnterpriseGatewayApp] Error occurred during launch of KernelID
To work around this issue, do not define environment variables whose values contain spaces. If necessary, encode the value before you set it and decode it when the notebook reads the content of the environment variable, as in the sketch below.
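For example, a minimal sketch of that encode/decode approach in Python (the variable name MY_CONFIG and its value are hypothetical):
import base64, os

# Encode the value once, outside the job definition, and paste the printed result
# as the value of the MY_CONFIG environment variable in the job settings.
encoded = base64.b64encode("value with spaces".encode("utf-8")).decode("ascii")
print(encoded)

# Inside the notebook, read and decode the environment variable before using it.
decoded = base64.b64decode(os.environ["MY_CONFIG"]).decode("utf-8")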
Notebooks
An error 500 is received after opening an existing notebook
An error 500 is received after opening an existing notebook that has a large number of data assets. The notebook takes a long time to load, and the notebook-UI pod treats the delay as a failure.
Workaround: Open the notebook URL directly instead of navigating to it by clicking the project and then the notebook.
Notebook fails to load in default Spark 3.0 and R 3.6 environments
Your notebook fails to load in default Spark 3.0 and R 3.6 environments and you receive the Failed to load notebook error message.
To resolve this issue:
- Go to Active runtimes.
- Delete the runtime that is in Starting status.
- Restart the runtime.
Applies to: 4.0.8.
No indication that notebooks reference deprecated or unsupported Python versions in deployment spaces
The user is not notified that their deployment space contains notebooks that are referencing deprecated or unsupported Python versions.
Applies to: 4.0.7
Insert to code function can cause kernel to fail on IBM Z
For JupyterLab notebooks running on IBM Z and LinuxONE platforms, using the Insert to Code feature or utility in a notebook to load data can result in a kernel failure.
Important: These changes apply only to newly created runtimes. If there are still active runtimes when you apply the change, make sure to stop and restart them.
To resolve this issue:
- Log in to Cloud Pak for Data as administrator and paste the following URL into the browser. Replace <CloudPakforData_URL> with the URL of your Cloud Pak for Data system.
  <CloudPakforData_URL>/zen-data/v1/volumes/files/%2F_global_%2Fconfig%2F.runtime-definitions%2Fibm%2Fjupyter-lab-py38-server.json
- Open the file in your favorite editor. The last 2 lines of the file look like this:
    }
  }
  Append the following code between the second to last and the last curly brace (}):
  , "env": [ { "name": "APP_ENV_ENABLE_MEM_LIMIT_KERNEL_MANAGER", "value": "false" } ]
  The end of the file should then look like this:
    },
    "env": [ { "name": "APP_ENV_ENABLE_MEM_LIMIT_KERNEL_MANAGER", "value": "false" } ]
  }
- Save the file.
- Get the required platform access token. This command returns the bearer token in the accessToken field:
  curl <CloudPakforData_URL>/v1/preauth/validateAuth -u <username>:<password>
- Upload the JSON file that you edited in the previous steps:
  curl -k -X PUT \
    '<CloudPakforData_URL>/zen-data/v1/volumes/files/%2F_global_%2Fconfig%2F.runtime-definitions%2Fibm' \
    -H 'Authorization: Bearer <platform-access-token>' \
    -H 'content-type: multipart/form-data' \
    -F upFile=@/path_to_runtime_def/jupyter-lab-py38-server.json
  If the changed JSON file uploads successfully, you will see the following response:
  { "_messageCode_": "Success", "message": "Successfully uploaded file and created the necessary directory structure" }
  Note: If your cluster is using a self-signed certificate that you did not add to your client, use the -k option to avoid certificate issues.
Applies to: 4.0.2
Fixed in: 4.0.3
R 3.6 notebook kernel won't start because of slow kernel connection
For R Jupyter notebooks running on specific IBM Power platforms, the R kernel will not start and a slow kernel connection message is displayed.
To resolve this issue:
- Log in to Cloud Pak for Data as administrator and paste the following URL into the browser. Replace <CloudPakforData_URL> with the URL of your Cloud Pak for Data system.
  <CloudPakforData_URL>/zen-data/v1/volumes/files/%2F_global_%2Fconfig%2F.runtime-definitions%2Fibm%2Fjupyter-r36-server.json
- Open the file in your favorite editor. The last 2 lines of the file look like this:
    }
  }
  Append the following code between the second to last and the last curly brace (}):
  , "env": [ { "name": "APP_ENV_ENABLE_MEM_LIMIT_KERNEL_MANAGER", "value": "false" } ]
  The end of the file should then look like this:
    },
    "env": [ { "name": "APP_ENV_ENABLE_MEM_LIMIT_KERNEL_MANAGER", "value": "false" } ]
  }
- Save the file.
- Get the required platform access token. This command returns the bearer token in the accessToken field:
  curl <CloudPakforData_URL>/v1/preauth/validateAuth -u <username>:<password>
- Upload the JSON file that you edited in the previous steps:
  curl -k -X PUT \
    '<CloudPakforData_URL>/zen-data/v1/volumes/files/%2F_global_%2Fconfig%2F.runtime-definitions%2Fibm' \
    -H 'Authorization: Bearer <platform-access-token>' \
    -H 'content-type: multipart/form-data' \
    -F upFile=@/path_to_runtime_def/jupyter-r36-server.json
  If the changed JSON file uploads successfully, you will see the following response:
  { "_messageCode_": "Success", "message": "Successfully uploaded file and created the necessary directory structure" }
  Note: If your cluster is using a self-signed certificate that you did not add to your client, use the -k option to avoid certificate issues.
Insert to code function does not work on SSL-enabled Db2 on Cloud connections from 3.5.x imported projects or 3.5.x git repos
If you import a project that was created in Cloud Pak for Data 3.5.x and the project contains a Db2 on Cloud connection with SSL enabled, the Notebooks "Insert to code" feature will not work. The problem also occurs if you synchronize with a Git project from Cloud Pak for Data version 3.5.x.
To fix this problem, edit the connection: Click the connection on the project Data assets page. Clear and re-select the Port is SSL-enabled checkbox in the Edit connection page.
Applies to: 4.0.0 and later
Issues with Insert to code function for an Informix connection
If you use the Insert to code function for a connection to an Informix database, and if Informix is configured for case-sensitive identifiers, the inserted code throws a runtime error if you try to query a table with an upper-case name.
In the cell output, there's an error message similar to the following:
DatabaseError: Execution failed on sql: SELECT * FROM informix.FVT_EMPLOYEE
java.sql.SQLException: The specified table (informix.fvt_employee) is not in the database.
unable to rollback
Workaround
In your notebook, edit the inserted code.
For example:
- For Python, add the connection property 'DELIMIDENT=Y' to the connection and surround the upper-case identifier with double quotes (""). Replace the following lines:
  informix_connection = jaydebeapi.connect('com.informix.jdbc.IfxDriver',
      '{}://{}:{}/{}:user={};password={};'.format('jdbc:informix-sqli',
      informix_credentials['host'], informix_credentials['port'],
      informix_credentials['database'], informix_credentials['username'],
      informix_credentials['password']),
      [informix_credentials['username'], informix_credentials['password']])
  query = 'SELECT * FROM informix.FVT_EMPLOYEE'
  With:
  informix_connection = jaydebeapi.connect('com.informix.jdbc.IfxDriver',
      '{}://{}:{}/{}:user={};password={};'.format('jdbc:informix-sqli',
      informix_credentials['host'], informix_credentials['port'],
      informix_credentials['database'], informix_credentials['username'],
      informix_credentials['password']),
      {'user': informix_credentials['username'],
       'password': informix_credentials['password'],
       'DELIMIDENT': 'Y'})
  query = 'SELECT * FROM informix."FVT_EMPLOYEE"'
- For R, add the connection property 'DELIMIDENT=Y' to the connection and surround all upper-case names with double quotes (""). Replace the following lines:
  paste("jdbc:informix-sqli://", Informix_credentials[][["host"]], ":", Informix_credentials[][["port"]], "/",
        Informix_credentials[][["database"]], ":user=", Informix_credentials[][["username"]],
        ";password=", Informix_credentials[][["password"]], ";", sep=""),
  ...
  query <- "SELECT * FROM myschema.MY_TABLE"
  With:
  paste("jdbc:informix-sqli://", Informix_credentials[][["host"]], ":", Informix_credentials[][["port"]], "/",
        Informix_credentials[][["database"]], ":user=", Informix_credentials[][["username"]],
        ";password=", Informix_credentials[][["password"]], ";DELIMIDENT=Y", ";", sep=""),
  ...
  query <- "SELECT * FROM myschema.\"MY_TABLE\""
- For Scala, add the connection property 'DELIMIDENT=Y' to the connection and, in the query, surround all upper-case names with double quotes (""). Replace the following lines:
  lazy val Informix_properties = Map("url" -> "jdbc:informix-sqli://myserver.mycompany.com:12345/mydb",
      "user" -> Informix_credentials("username").asInstanceOf[String],
      "password" -> Informix_credentials("password").asInstanceOf[String])
  val data_df_0 = spark.read
      .format("jdbc")
      .options(Informix_properties)
      .option("driver", "com.informix.jdbc.IfxDriver")
      .option("dbtable", "myschema.MY_TABLE")
      .load()
  data_df_0.show(5)
  With:
  lazy val Informix_properties = Map("url" -> "jdbc:informix-sqli://myserver.mycompany.com:12345/mydb",
      "user" -> Informix_credentials("username").asInstanceOf[String],
      "password" -> Informix_credentials("password").asInstanceOf[String],
      "DELIMIDENT" -> "Y")
  val data_df_0 = spark.read
      .format("jdbc")
      .options(Informix_properties)
      .option("driver", "com.informix.jdbc.IfxDriver")
      .option("dbtable", "myschema.\"MY_TABLE\"")
      .load()
  data_df_0.show(5)
Error in notebooks when rendering data from Cloudera Distribution for Hadoop
When running Jupyter notebooks against Cloudera Distribution for Hadoop 5.16 or 6.0.1 with Spark 2.2, the first dataframe operation for rendering cell output from the Spark driver results in a JSON encoding error.
Workaround:
To work around this error, do one of the following procedures:
-
For interactive notebook sessions
Manually re-run the first cell that renders data.
-
For non-interactive notebook sessions
Add the following non-intrusive code after establishing the Spark connection to trigger the first failure:
%%spark
import sys, warnings
def python_major_version ():
    return(sys.version_info[0])
with warnings.catch_warnings(record=True):
    print(sc.parallelize([1]).map(lambda x: python_major_version()).collect())
Notebook loading considerations
The time that it takes to create a new notebook or to open an existing one for editing purposes might vary. If no runtime container is available, a container needs to be created and only after it is available, the Jupyter notebook user interface can be loaded. The time it takes to create a container depends on the cluster load and size. Once a runtime container exists, subsequent calls to open notebooks will be significantly faster.
Kernel not found when opening a notebook imported from a Git repository
If you import a project from a Git repository that contains notebooks that were created in JupyterLab, and try opening the notebooks from the project Assets page, you will see a message stating that the required notebook kernel can't be found.
The reason is that you are trying to open the notebook in an environment that doesn't support the kernel required by the notebook, for example in an environment without Spark for a notebook that uses Spark APIs. The information about the environment dependency of a notebook that was created in JupyterLab and exported in a project is currently not available when this project is imported again from Git.
Workaround:
You need to associate the notebook with the correct environment definition. You can do this:
- From the notebook opened in edit mode by:
  - Clicking the Notebook Info icon from the notebook toolbar and then clicking Environment.
  - Selecting the correct environment definition for your notebook from the list under Environments.
- Before you open the notebook, from the project Assets page by:
- Selecting the notebook and unlocking it if it is locked. You can only change the environment of a notebook if the notebook is unlocked.
- Clicking Actions > Change Environment and selecting the correct environment definition for your notebook.
Environment runtime can't be started because the software customization failed
If your Jupyter notebook runtime can't be started and a 47 killed error is logged, the software customization process could not be completed because of lack of memory.
You can customize the software configuration of a Jupyter notebook environment by adding conda and pip packages. However, be aware that conda does dependency checking when installing packages which can be memory intensive if you add many packages to a customization.
To complete a customization successfully, you must make sure that you select an environment with sufficient RAM to enable dependency checking at the time the runtime is started.
If you only want packages from one conda channel, you can prevent unnecessary dependency checking by excluding the default channels. To do this, remove defaults from the channels list in the customization template and add nodefaults.
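As a rough sketch, assuming the customization template follows the usual conda environment.yml layout, the channels section might then look like this (conda-forge and the package names are placeholders for the channel and packages you actually need):
channels:
  - conda-forge
  - nodefaults
dependencies:
  - numpy
  - pip:
    - your-pip-package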
Notebook returns UTC time and not local time
The Python function datetime returns the date and time for the UTC time zone and not the local time zone where a user is located. The reason is that the default environment runtimes use the time zone where they were created, which is
UTC time.
Workaround:
If you want to use your local time, you need to download and modify the runtime configuration file to use your time zone. You don't need to make any changes to the runtime image. After you upload the configuration file again, the runtime will use the time you set in the configuration file.
Required role: You must be a Cloud Pak for Data cluster administrator to change the configuration file of a runtime.
To change the time zone:
- Download the configuration file of the runtime you are using. Follow the steps in Downloading the runtime configuration.
- Update the runtime definition JSON file and extend the environment variable section to include your time zone, for example, for Europe/Vienna use:
{ "name": "TZ", "value": "Europe/Vienna" } -
Upload the changed JSON file to the Cloud Pak for Data cluster. You can use the Cloud Pak for Data API.
- Get the required platform access token. The command returns the bearer token in the accessToken field:
curl <CloudPakforData_URL>/v1/preauth/validateAuth -u <username>:<password> -
Upload the JSON file:
curl -X PUT \ 'https://<CloudPakforData_URL>/zen-data/v1/volumes/files/%2F_global_%2Fconfig%2F.runtime-definitions%2Fibm' \ -H 'Authorization: Bearer <platform-access-token>' \ -H 'content-type: multipart/form-data' \ -F upFile=@/path/to/runtime/def/<custom-def-name>-server.jsonImportant: Change the name of the modified JSON file. The file name must end with
server.jsonand the same file name must be used across all clusters to enable exporting and importing analytics projects across cluster boundaries.If the changed JSON file was uploaded successfully, you will see the following response:
{ "_messageCode_": "Success", "message": "Successfully uploaded file and created the necessary directory structure" }
- Get the required platform access token. The command returns the bearer token in the accessToken field:
- Restart the notebook runtime.
Can't promote existing notebook because no version exists
If you are working with a notebook that you created prior to IBM Cloud Pak for Data 4.0.0, and you want to promote this notebook to a deployment space, you will get an error message stating that no version exists for this notebook.
If this notebook also has a job definition, in addition to saving a new version, you need to edit the job settings.
To enable promoting existing notebooks and edit job settings, see Promoting notebooks.
Applies to: 4.0.0 and later when upgrading from 3.5
JupyterLab hangs if notebook code cell has more than 1000 lines of code
When you insert more than 1000 lines of code into a notebook code cell in JupyterLab, you will notice that JupyterLab hangs. You are asked to either close the browser tab or to continue waiting.
Workaround
Until an open source fix is available for this error, make sure that your notebook code cells in JupyterLab have fewer than 1000 lines of code. If you need to paste many lines of data into a code cell, store this data in a file instead and load the data into the notebook or script, for example as shown below.
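A minimal sketch of that approach, assuming the data was saved as a file named large_data.csv that is available in the runtime's working directory (the file name and format are placeholders) and that pandas is installed in the environment:
import pandas as pd

# Load the data from a file instead of pasting thousands of lines into a code cell.
df = pd.read_csv("large_data.csv")
print(df.shape)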
Errors when mixing Insert to code function options in R notebooks with Spark
The Insert to code function leverages Apache Arrow and Apache Flight in some function options for faster data access. This means that you can select Insert to code function options that use the old method to access data as well as options that use Arrow and Flight. To avoid errors when running R notebooks with Spark, you must be careful to not mix options that use different data access methods in the same notebook.
The R Insert-to-code options include:
- R DataFrame: this option leverages Apache Arrow and Apache Flight and is available for the majority of the supported connection types. This is the recommended option for accessing data.
- R DataFrame (Deprecated): this option uses traditional technologies, like RJDBC, and is available for a subset of the available connections only.
- SparkSession DataFrame: this option is for notebook code that uses Spark methods for loading data. This option is available for a subset of the available connections only.
- Credentials: this option returns connection metadata, such as the hostname of a database connection.
Wherever possible, do not mix the R DataFrame option with the R DataFrame (Deprecated), SparkSession DataFrame, or Credentials options within the same notebook, even for different connection types.
For example, if you run code that was inserted by options other than R DataFrame first, followed by code from the R DataFrame option, you will see the following error:
error in py_module_import(module, convert = convert): ImportError: /opt/ibm/conda/miniconda/lib/python/site-packages/pyarrow/../../.././libbrotlienc.so.1: undefined symbol: BrotliDefaultAllocFunc
Workaround
If you can't avoid mixing Insert to code options that use different access methods in the same notebook, add the following code at the top of your notebook, and run this cell first:
library("reticulate")
pa <- import("pyarrow")
library(ibmWatsonStudioLib)
wslib <- access_project_or_space()
SparkSession fails to start in Spark 3.0 & R 3.6 notebooks after upgrading to version 4.0.5
When you upgrade from Cloud Pak for Data 4.0.3 to 4.0.5 or from Cloud Pak for Data 3.5 Refresh 9 (October 2021) to 4.0.5 and you open a notebook in a Spark 3.0 & R 3.6 environment, an error is displayed stating that a SparkSession cannot be started.
The reason for this occurring is that JDBC drivers are not compatible with the latest log4j versions.
Workaround
While upgrading, also update the custom JDBC drivers that you have deployed and make sure that you use the latest versions of these drivers that are compatible with log4j 2.17.0.
Insert to code function in notebooks on Spark 2.4 with Scala or R doesn't support Flight Service
The insert to code function in notebooks that run in environments that use Spark 2.4 with Scala or R doesn't support the Flight Service, which is based on Apache Arrow Flight, to communicate with database connections or connected data assets
when loading data into a data structure.
Workaround
You can continue using the old Insert to code function. Although the old function doesn't support as many data sources, it does provide the data source credentials. With the credentials, you can write your own code to access the asset and load data into data structures of your choice in your notebooks.
Note that code generated by the Insert to code function for Scala or R in a notebook that runs in Spark 3.0 cannot be used in a notebook that runs in Spark 2.4.
Code inserted by the Insert to code function for Mongo DB connections in Scala 2.12 notebooks with Spark 3.0 sometimes returns errors
Sometimes the insert to code function for Mongo DB connections in notebooks that run in Spark 3.0 and Scala 2.12 returns an error when the inserted code is run.
Workaround
If the inserted code returns an error when accessing data from a Mongo DB connection, you need to write your own code to access and load data into the data structures of your choice in your notebook.
Fixed in: 4.0.7
Python 3.8 notebook kernel dies while running generated code from Insert to code function
If you run the code generated by the Insert to code function to load data from a database connection to your notebook and you encounter an error stating that the kernel appears to have died, the reason might be that the data set that you are trying to load is too large and the kernel has run out of memory.
Workaround
If you know that your data set is large (possibly larger than the memory allocated to the kernel), you should edit the generated code before you run it in your notebook.
If the generated code contains a select_statement interaction property like in the following Db2 example:
Db2_data_request = {
'connection_name': """Db2""",
'interaction_properties': {
'select_statement': 'SELECT * FROM "USER"."TABLE"'
}
}
Modify the Db2 select_statement to only retrieve the first 5000 rows as follows:
Db2_data_request = {
'connection_name': """Db2""",
'interaction_properties': {
'select_statement': 'SELECT * FROM "USER"."TABLE" FETCH FIRST 5000 ROWS ONLY'
}
}
For the select_statements in all other database connections, use the corresponding SQL expressions.
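For illustration only (the connection, schema, and table names are placeholders, not generated output), the equivalent row cap for a PostgreSQL connection uses LIMIT instead of the Db2 FETCH FIRST syntax:
PostgreSQL_data_request = {
    'connection_name': """PostgreSQL""",
    'interaction_properties': {
        # Illustration only: LIMIT caps the result at the first 5000 rows
        'select_statement': 'SELECT * FROM "schema"."table" LIMIT 5000'
    }
}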
If the generated code contains schema_name and table_name like in the following example:
PostgreSQL_data_request = {
'connection_name': """PostgreSQL""",
'interaction_properties': {
#'row_limit': 500,
'schema_name': 'schema',
'table_name': 'table'
}
}
Remove the comment on row_limit to retrieve only the first 500 rows as follows:
PostgreSQL_data_request = {
'connection_name': """PostgreSQL""",
'interaction_properties': {
'row_limit': 500,
'schema_name': 'schema',
'table_name': 'table'
}
}
Fixed in: 4.0.8
Save data and upload file size limitation in project-lib and ibm-watson-studio-lib for Python
If you use the save_data or upload_file functions in ibm-watson-studio-lib for Python, or the save_data function in project-lib for Python, the data or file size cannot exceed 2 GB.
In project-lib for Python, 2 GB is a hard limit. In ibm-watson-studio-lib for Python, you can follow the steps described in the workaround to save data or upload a file that is larger than 2 GB.
Workaround:
To work with data or a file that is larger than 2 GB in size in ibm-watson-studio-lib for Python, you need to move the file or the data to the storage associated with the project.
- If you upload a file in a notebook using the upload_file function, the file is already available in the file system of your environment runtime and you can skip the next step. If you upload data in a notebook using the save_data function, you need to save the data to a file in the local file system of your environment runtime, for example:
  with open("my_asset.csv", "wb") as file:
      file.write(data)
- Retrieve the path to the mounted project storage:
  wslib.mount.get_base_dir()
  Make a note of the path, for example, /project_data/data_asset/.
- Copy the files to the mounted project storage, for example:
  !cp my_asset.csv /project_data/data_asset/
- Register the files in the mounted project storage as data assets in your project:
  wslib.mount.register_asset("/project_data/data_asset/my_asset.csv", asset_name="my_asset.csv")
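Putting the steps together, the following is a minimal sketch for a notebook cell. The file name and the data object are placeholders, and wslib is assumed to be an initialized ibm-watson-studio-lib handle as in the steps above:
# Minimal sketch of the workaround for data or files larger than 2 GB.
# "my_asset.csv" and data are placeholders; wslib is an initialized
# ibm-watson-studio-lib handle.
import os
import shutil

with open("my_asset.csv", "wb") as f:      # write the data to the runtime's local file system
    f.write(data)

base_dir = wslib.mount.get_base_dir()      # path to the mounted project storage
target = os.path.join(base_dir, "my_asset.csv")

shutil.copy("my_asset.csv", target)        # copy the file into the mounted project storage

wslib.mount.register_asset(target, asset_name="my_asset.csv")  # register it as a data asset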
Notebooks fail to start even after custom environment definition is fixed
If your custom environment definition has a problem, for example it references a custom image that is no longer available or requests too many CPU or memory resources, the associated notebook or JupyterLab runtime will not start. This is the expected behavior.
However, even if you update the environment definition to fix the issue, the notebook or JupyterLab runtime will still not start with this environment.
Workaround
Create a new environment definition and associate this environment definition with your notebook or select it when you launch JupyterLab.
Applies to: 4.0.7
Notebook or JupyterLab runtimes might not be accessible after running for more than 12 hours
If you run a notebook or a JupyterLab session for more than 12 hours, you might get an error stating that the runtime can no longer be accessed.
Workaround
To access the runtime again:
- Stop the runtime from the Environments tab of the project, or under Projects > Active Runtimes.
- Start the notebook or JupyterLab session again.
Applies to: 4.0.7
Fixed in: 4.0.8
Insert to code fails when the Flight service load is very high
When working with the Insert to code function in a notebook after upgrading from Cloud Pak for Data 4.0.7 to 4.0.8, you might see an error stating that the Flight service is unavailable because the server concurrency limit was reached.
The reason for this error occurring is the overall high load on the Flight service and its inability to process any further requests. This error does not mean that there is a problem with the code in your notebook.
Workaround
If you see the error when running a notebook interactively, try running the cell again. If you see the error in the log of a job, try running the job again, if possible at a time when the system is less busy.
Applies to: 4.0.7
Error when trying to access data in an Oracle database
If you try to access data in an Oracle database, you might get a DatabaseError if the schema or table name contains special characters such as the period (.) character. Oracle uses periods as separators between schemas, tables, and columns. If this issue occurs, consider removing any periods from the table or schema name in your database, or adapt your code to surround the table or schema identifier with double quotation marks, for example my_schema."table.with.dots".
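As an illustration only (the schema and table names are hypothetical), quoting the identifier prevents Oracle from interpreting the embedded periods as separators:
# Hypothetical names: the quoted identifier keeps the dots from being read as separators.
query = 'SELECT * FROM MY_SCHEMA."TABLE.WITH.DOTS"'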
Anaconda Repository for IBM Cloud Pak for Data
Channel names for Anaconda Repository for IBM Cloud Pak for Data don't support double-byte characters
When you create a channel in Anaconda Team Edition, you can't use double-byte characters or most special characters. You can use only these characters: a-z 0-9 - _
RStudio
RStudio Sparklyr package 1.4.0 can't connect with Spark 3.0 kernel
When users try to connect the Sparklyr R package in RStudio to a remote Spark 3.0 kernel, the connection fails because of recent changes in Sparklyr R package version 1.4.0. This will be addressed in a future release. The workaround is to use the Spark 2.4 kernel.
Applies to: 4.0.0 only.
Fixed in: 4.0.1
Sparklyr R package version 1.7.0 is now used in Spark 3.0 kernels.
Running job for R script and selected RStudio environment results in an error
When you run a job for an R script with a custom RStudio environment that was created with a previous release of Cloud Pak for Data, the following error occurs:
The job uses an environment that is not supported. Edit your job to select an alternative environment.
To work around this issue, delete and re-create the custom RStudio environment with the same settings.
Applies to: 4.0.0 only.
Fixed in: 4.0.1
Git integration broken when RStudio crashes
If RStudio crashes while you are working on a script and you restart RStudio, the integration with the associated Git repository is broken. The reason is that the RStudio session workspace is in an incorrect state.
Workaround
If Git integration is broken after RStudio crashed, complete the following steps to reset the RStudio session workspace:
- Click on the Terminal tab next to the Console tab to create a terminal session.
- Navigate to the working folder /home/wsuser and rename the .rstudio folder to .rstudio.1.
- From the File menu, click Quit Session... to end the R session.
- Click Start New Session when prompted. A new R project with Git integration is created.
No Git tab although RStudio is launched with Git integration
When you launch RStudio in a project with Git integration, the Git tab might not be visible on the main RStudio window. The reason for this is that if the RStudio runtime needs longer than usual to start, the .Rprofile file that enables integration to the associated Git repository cannot run.
Workaround
To add the Git tab to RStudio:
- Run the following commands from the RStudio terminal:
  cp $R_HOME/etc/.Rprofile $HOME/.Rprofile
  echo "JAVA_HOME='/opt/conda/envs/R-3.6'" >> $HOME/.Renviron
- From the Session menu, select Quit Session... to quit the session.
- If you are asked whether you want to save the workspace image to ~/.RData, select Don't save.
- Then click Start New Session.
Applies to: 4.0.0 only.
Fixed in: 4.0.1
RStudio doesn't open although you were added as project collaborator
If RStudio will not open and all you see is an endless spinner, the reason is that, although you were added as a collaborator to the project, you have not created your own personal access token for the Git repository associated with the project. To open RStudio with Git integration, you must select your own access token.
To create your own personal access token, see Collaboration in RStudio.
Data in persistent storage volume not mounted when RStudio is launched
If you use a PersistentVolumeClaim (PVC) on the Cloud Pak for Data cluster to store large data sets, the storage volume is not automatically mounted when RStudio is launched in a project with default Git integration.
Workaround:
If you want to work with data in a persistent storage volume in RStudio, you must either:
- Work in a project with deprecated Git integration
- Or work in a project with no Git integration
In both of these types of projects, the persistent storage volume is automatically mounted when RStudio is launched and can be viewed and accessed in the /mnts/ folder.
Applies to: 4.0.1 through 4.0.3
Fixed in: 4.0.4
Can't connect to Hadoop Livy in RStudio
If you are working in RStudio and you try to connect to Hadoop Livy, you will see an error stating that a Livy session can't be started. The reason for this error is a version incompatibility between the installed cURL 4.3 R package and the Livy connection through Sparklyr.
Workaround
To successfully connect to Livy using Sparklyr:
- Downgrade the cURL R package to the 3.3 version. As CRAN installs the latest cURL version by default, use the following command to downgrade to version 3.3:
  install.packages("https://cran.r-project.org/src/contrib/Archive/curl/curl_3.3.tar.gz", repos=NULL)
Runtime pod fails when runtime is started
Often, when you start an RStudio runtime, the associated runtime pod fails, changing from the Running state to the Terminating state. This behavior can also occur when you start SPSS, Jupyter notebook, or Data Refinery runtimes.
These pods fail because the runtime manager waits for the runtime operator to update its status in a POST operation that times out, which results in the deletion of the runtime operator.
Applies to: 4.0.7
Fixed in: 4.0.8
Data Refinery
Cannot run a Data Refinery flow job with certain unsigned data types
If the source table contains one of the following data types or equivalents, the Data Refinery flow job will fail with a ClassCastException error:
- UNSIGNED TINYINT
- UNSIGNED SMALLINT
- UNSIGNED INTEGER
Applies to: 4.0.8 and later
Cannot view visualization charts in Data Refinery after upgrade
After you upgrade to Cloud Pak for Data 4.0.8, the visualization charts do not open in Data Refinery.
Workaround
Restart the Data Refinery pods:
- Find the names of Data Refinery pods.
oc get pod -l release=ibm-data-refinery-prod
For example:
oc get po -l release=ibm-data-refinery-prod
NAME READY STATUS RESTARTS AGE
wdp-dataprep-5444f79b5d-xhlww 1/1 Running 0 39h
wdp-shaper-5fc5d87674-c8lq9 1/1 Running 0 39h
- Delete the pods by name.
oc delete pod <pod names>
For example:
oc delete pod wdp-dataprep-5444f79b5d-xhlww wdp-shaper-5fc5d87674-c8lq9
pod "wdp-dataprep-5444f79b5d-xhlww" deleted
pod "wdp-shaper-5fc5d87674-c8lq9" deleted
The two pods will restart after the old pods are terminated.
Applies to: 4.0.8
Cannot run a Data Refinery flow job with data from a Hadoop cluster
If you run a Data Refinery flow job with data from one of the following connections, the job will fail:
- HDFS via Execution Engine for Hadoop
- Hive via Execution Engine for Hadoop
- Impala via Execution Engine for Hadoop
Applies to: 4.0.6 - 4.0.7
Fixed in: 4.0.8
Option to open saved visualization assets is disabled in Data Refinery
After creating a visualization in Data Refinery and clicking Save to project, the option to open the saved visualization asset is disabled.
Applies to: 4.0.7.
Fixed in: 4.0.8.
Cannot refine data that uses commas in the source data and a target that uses a delimited file format
If the source file uses commas in the data (the commas are part of the data, not the delimiters), and you specify the Delimited file format for the target, the job will fail.
Workaround: Choose the CSV file format for the target.
Applies to: 4.0.6 and later
Data Refinery flow job fails when writing double-byte characters to an Avro file
If you run a job for a Data Refinery flow that uses a double-byte character set (for example, the Japanese or Chinese languages), and the output file is in the Avro file format, the job will fail.
Applies to: 3.5.0
Fixed in: 4.0.6
Data Refinery flow job fails with a large data asset
If your Data Refinery flow job fails with a large data asset, try these troubleshooting tips to fix the problem:
- Instead of using a project data asset as the target of the Data Refinery flow (default), use cloud storage for the target. For example, IBM Cloud Object Storage, Amazon S3, or Google Cloud Storage.
- Select a Spark & R 3.6 environment for the Data Refinery flow job or create a new Spark & R 3.6 environment definition.
- Increase the load balancer timeout on the cluster. For instructions, see Watson Knowledge Catalog processes time out before completing.
Applies to: 3.5.0 and later
Certain Data Refinery flow GUI operations might not work on large data assets
Data Refinery flow jobs that include these GUI operations might fail for large data assets.
Applies to: 3.5.0 and later. These operations are not fixed yet:
- Split column
- Tokenize
These operations are fixed in 4.0.3:
- Convert column type to Date or to Timestamp (Also applies to the Convert column type operation as the automatic first step in a Data Refinery flow)
- Remove stop words
- Replace substring
- Text > Pad characters
- Text > Substring
See Data Refinery flows with large data sets need updating when using certain GUI operations.
These operations are fixed in 4.0.6:
- Convert column type to Integer when you specify a thousands grouping symbol (comma, dot, or custom)
- Convert column type to Decimal with a comma decimal marker or when you specify a thousands grouping symbol (comma, dot, or custom)
- Text > Trim quotes
See Data Refinery flows with large data sets need updating when using certain GUI operations.
Data Refinery flows with large data sets need updating when using certain GUI operations
For running Data Refinery jobs with large data assets, the following GUI operations have performance enhancements that require you to update any Data Refinery flows that use them:
Applies to: 4.0.3 and later:
- Convert column type to Date or to Timestamp (Also applies to the Convert column type operation as the automatic first step in a Data Refinery flow)
- Remove stop words
- Replace substring
- Text > Pad characters
- Text > Substring
Applies to: 4.0.6 and later:
- Convert column type to Integer when you specify a thousands grouping symbol (comma, dot, or custom)
- Convert column type to Decimal with a comma decimal marker or when you specify a thousands grouping symbol (comma, dot, or custom)
- Text > Trim quotes
To improve the job performance of a Data Refinery flow that uses these operations, update the Data Refinery flow by opening it and saving it, and then running a job for it. New Data Refinery flows automatically have the performance enhancements. For instructions, see Managing Data Refinery flows.
Data Refinery flow job fails for large Excel files
This problem is due to insufficient memory. To solve this problem, the cluster admin can add a new server to the cluster or add more physical memory. Alternatively, the cluster admin can use the OpenShift console to increase the memory allocated
to the wdp-connect-connector service, but be aware that doing so might decrease the memory available to other services.
Applies to: 4.0.2
Fixed in: 4.0.3
Cannot run a Data Refinery flow job with data from an Amazon RDS for MySQL connection
If you create a Data Refinery flow with data from an Amazon RDS for MySQL connection, the job will fail.
Applies to: 4.0.0
Fixed in: 4.0.1
Duplicate connections in a space resulting from promoting a Data Refinery flow to a space
When you promote a Data Refinery flow to a space, all dependent data is promoted as well. If the Data Refinery flow that is being promoted has a dependent connection asset and a dependent connected data asset that references the same connection asset, the connection asset will be duplicated in the space.
The Data Refinery flow will still work. Do not delete the duplicate connections.
Applies to: 3.5.0 and later
Data Refinery flow fails with "The selected data set wasn't loaded" message
The Data Refinery flow might fail if there are insufficient resources. The administrator can monitor the resources and then add resources by scaling the Data Refinery service or by adding nodes to the Cloud Pak for Data cluster.
Applies to: 3.5.0 and later
Jobs
Spark jobs are supported only by API
If you want to run analytical and machine learning applications on your Cloud Pak for Data cluster without installing Watson Studio, you must use the Spark jobs REST APIs of Analytics Engine powered by Apache Spark. See Getting started with Spark applications.
UI displays job run started by Scheduler and not by a specific user
If you manually trigger a run for a job that has a schedule defined, the UI will show that the job run was started by Scheduler, not that it was started by a particular user.
Applies to: 4.0.5
Excluding days when scheduling a job causes unexpected results
If you select to schedule a job to run every day of the week excluding given days, you might notice that the scheduled job does not run as you would expect. The reason might be a discrepancy between the timezone of the user who creates the schedule and the timezone of the master node where the job runs.
This issue only exists if you exclude days of a week when you schedule to run a job.
Error occurs when jobs are edited
You cannot edit jobs that were created prior to upgrading to Cloud Pak for Data version 3.0 or later. An error occurs when you edit those jobs. Create new jobs after upgrading to Cloud Pak for Data version 3.0 or later.
Errors can also occur if the user who is trying to edit the job or schedule is different from the user who started or created the job. For example, if a Project Editor attempts to edit a schedule that was created by another user in the project, an error occurs.
Can't delete notebook job stuck in starting or running state
If a notebook job is stuck in starting or running state and won't stop, although you tried to cancel the job and stopped the active environment runtime, you can try deleting the job by removing the job-run asset manually using the API.
- Retrieve a bearer token from the user management service using an API call:
  curl -k -X POST https://PLATFORM_CLUSTER_URL/icp4d-api/v1/authorize -H 'cache-control: no-cache' -H 'content-type: application/json' -d '{"username":"your_username","password":"your_password"}'
- (Optional) Get the job-run asset and test the API call. Replace ${token}, ${asset_id}, and ${project_id} accordingly.
  curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X GET "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
- Delete the job-run asset. Again, replace ${token}, ${asset_id}, and ${project_id} accordingly.
  curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X DELETE "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
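If you prefer to script this cleanup, the following minimal Python sketch mirrors the curl calls above. The cluster URL, credentials, asset ID, and project ID are placeholders, and it assumes the authorize response returns the bearer token in a token field:
# Minimal sketch of the same cleanup flow with the requests library.
# All values in angle brackets are placeholders.
import requests

PLATFORM_CLUSTER_URL = "https://<cluster-host>"

# Retrieve a bearer token (assumes the response contains a "token" field)
auth = requests.post(
    f"{PLATFORM_CLUSTER_URL}/icp4d-api/v1/authorize",
    json={"username": "your_username", "password": "your_password"},
    verify=False,  # equivalent of curl -k; use proper certificates where possible
)
token = auth.json()["token"]

# Delete the job-run asset
resp = requests.delete(
    f"{PLATFORM_CLUSTER_URL}/v2/assets/<asset_id>",
    params={"project_id": "<project_id>"},
    headers={"Authorization": f"Bearer {token}", "accept": "application/json"},
    verify=False,
)
print(resp.status_code)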
Notebook runs successfully in notebook editor but fails when run as job
Some libraries require a kernel restart after a version change. If you need to work with a library version that isn't pre-installed in the environment in which you start the notebook, and you install this library version through the notebook, the notebook only runs successfully after you restart the kernel. However, when you run the notebook non-interactively, for example as a notebook job, it fails because the kernel can't be restarted. To avoid this, create an environment definition and add the library version you require as a software customization. See Creating environment definitions.
Can't change the schedule in existing jobs after upgrading to Cloud Pak for Data 4.0.7
If you created scheduled jobs in earlier versions of Cloud Pak for Data and are upgrading to Cloud Pak for Data version 4.0.7, you can't change or remove the schedule from these existing jobs.
Workaround
If you need to change the schedule in an existing job after upgrading to Cloud Pak for Data version 4.0.7:
- Delete the existing job.
- Create a new scheduled job.
For details, see Creating and managing jobs in an analytics project.
Can't run a Scala 2.12 with Spark 3.0 notebook job in a deployment space
If you use code generated by the Insert to code function in a Scala 2.12 with Spark 3.0 notebook and you want to run this code in a job in a deployment space, you must use the code generated by the deprecated Insert to code function.
If you run code that was generated by an Insert to code function that uses the Flight service, your job will fail.
Federated Learning
Authentication failures for Federated Learning training jobs when allowed IPs are specified in the Remote Training System
Currently, the OpenShift Ingress Controller does not set the X-Forwarded-For header to the client's IP address, regardless of the forwardedHeaderPolicy setting. This causes authentication failures for Federated Learning training jobs when allowed_ips are specified in the Remote Training System, even though the client IP address is correct.
To use the Federated Learning Remote Training System IP restriction feature in Cloud Pak for Data 4.0.3, configure an external proxy to inject the X-Forwarded-For header. For more information, see this article on configuring ingress.
Applies to: 4.0.3 or later
Data module not found in IBM Federated Learning
The data handler for IBM Federated Learning is trying to extract a data module from the FL library but is unable to find it. You might see the following error message:
ModuleNotFoundError: No module named 'ibmfl.util.datasets'
This issue can result from using an outdated DataHandler. Review and update your DataHandler to conform to the latest specification, using the most recent MNIST data handler as a reference, or ensure that your gallery sample versions are up to date.
Applies to: 4.0.3 or later
Unable to save Federated Learning experiment following upgrade
If you train your Federated Learning model in a previous version of Cloud Pak for Data and then upgrade, you might get this error when you try to save the model following the upgrade: "Unexpected error occurred creating model." The issue results from training with a framework that is not supported in the upgraded version of Cloud Pak for Data. To resolve the issue, retrain the model with a supported framework, and then save it.
Applies to: 4.0.2 or later
Watson Machine Learning
Deployments fail for Keras models published to catalog then promoted from project to space
If you publish a Keras model with custom layers to a catalog, and then copy it back to a project, deployments for the model will fail after promotion to a space. The flow is as follows:
- Create a model with custom layers of type tensorflow_2.7 or tensorflow_rt22.1 with software specification runtime-22.1-py3.9 or tensorflow_rt22.1-py3.9.
- Publish the model to a Watson Knowledge Catalog.
- From the catalog, add the model to a project.
- Promote the model to a space.
At this point, the custom layer information is lost, so deployments of the model will fail. To resolve the issue, save the model to a project without publishing to a catalog.
Predictions API in Watson Machine Learning service can timeout too soon
If the predictions API (POST /ml/v4/deployments/{deployment_id}/predictions) in the Watson Machine Learning deployment service is timing out too soon, follow these steps to manually update the timeout interval.
Applies to: 4.0 and higher
Updating the Prediction service in Envoy
- Capture the wmlenvoyconfig configmap content in a yaml file:
  oc get cm wmlenvoyconfig -o yaml > wmlenvoyconfig.yaml
- Search for the property timeout_ms in the yaml file wmlenvoyconfig.yaml and update the value to the required timeout (in milliseconds):
  "timeout_ms": <REQUIRED_TIMEOUT_IN_MS>
  For example, to update the timeout to 600000 milliseconds:
  "timeout_ms": 600000
- To apply the timeout changes, first delete the configmap:
  oc delete -f wmlenvoyconfig.yaml
- Recreate the configmap:
  oc create -f wmlenvoyconfig.yaml
- Edit the wml-cr file to add the line ignoreForMaintenance: true. This sets the operator into maintenance mode, which stops automatic reconciliation. Otherwise, automatic reconciliation would undo the configmap changes that you applied.
  oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n <namespace>
- Restart the Envoy pod:
  oc rollout restart deployment wml-deployment-envoy
- Wait for the Envoy pod to come up:
  oc get pods | grep wml-deployment-envoy
Updating the Prediction service in NGINX
- Make a backup of the wml-base-routes configmap:
  oc get cm wml-base-routes -o yaml > wml-base-routes.yaml.bkp
- Edit the configmap:
  oc edit cm wml-base-routes
- Search for the keyword predictions in the configmap wml-base-routes and update the properties proxy_send_timeout, proxy_read_timeout, and send_timeout to the required timeout (in seconds) for each location. For example, to update these properties to a timeout of 600 seconds, update these locations:
  location ~ ^/ml/v4/deployments/([a-z0-9_]+)/predictions {
      rewrite ^\/ml/v4/deployments/([a-z0-9_]+)/(.*) /ml/v4/deployments/$1/$2 break;
      proxy_pass https://wml-envoy-upstream;
      proxy_http_version 1.1;
      proxy_set_header Connection "";
      proxy_set_header Host $http_host;
      proxy_set_header x-global-transaction-id $x_global_transaction_id;
      proxy_set_header v4-deployment-id $1;
      proxy_pass_request_headers on;
      proxy_connect_timeout 30;
      proxy_send_timeout 600;
      proxy_read_timeout 600;
      send_timeout 600;
      proxy_next_upstream error timeout;
  }
  location ~ ^/ml/v4/deployments/([a-z0-9-]+)/predictions {
      rewrite ^\/ml/v4/deployments/([a-z0-9-]+)/(.*) /ml/v4/deployments/$1/$2 break;
      proxy_pass https://wml-envoy-upstream;
      proxy_http_version 1.1;
      proxy_set_header Connection "";
      proxy_set_header Host $http_host;
      proxy_set_header x-global-transaction-id $x_global_transaction_id;
      proxy_set_header v4-deployment-id $1;
      proxy_pass_request_headers on;
      proxy_connect_timeout 30;
      proxy_send_timeout 600;
      proxy_read_timeout 600;
      send_timeout 600;
      proxy_next_upstream error timeout;
  }
- Search for the keyword icpdata_addon_version in the labels section and update its value by appending a number to it, in the form <old value>-<any_random_number>. For example, if the value is 4.0.3, change it to 4.0.3-100:
  labels:
    app: wml-base-routes
    app.kubernetes.io/instance: ibm-wml-cpd
    app.kubernetes.io/managed-by: ansible
    app.kubernetes.io/name: ibm-wml-cpd
    app.kubernetes.io/version: 4.0.3
    icpdata_addon: "true"
    icpdata_addon_version: 4.0.3-100
    release: ibm-wml
- Update the WML configuration file in the NGINX pod:
  List the NGINX pods:
  oc get pods | grep "ibm-nginx"
  Get into any one of the ibm-nginx pods:
  oc exec -it <pod> bash
  Search for the keyword predictions in the file /user-home/_global_/nginx-conf.d/wml-base-routes.conf and update the properties proxy_send_timeout, proxy_read_timeout, and send_timeout to the required timeout (in seconds). For example, to update these properties to a timeout of 600 seconds, update the same two location blocks shown above.
- Restart the NGINX pods:
  oc rollout restart deployment ibm-nginx
- Check that the NGINX pods have come up:
  oc get pods | grep "ibm-nginx"
Updating the HAProxy timeout
To update the HAProxy timeout, see HAProxy timeout settings for the load balancer.
Deployment of AutoAI model can fail when training input and deployment input don't match
If you train an AutoAI experiment using a database as the training data source, then deploy the model using a CSV file as the deployment input data, the deployment might fail with an error stating cannot resolve... and eventually times
out.
To resolve the error, use the same type of data source for training and deploying the model.
Applies to: 4.0.6
Deploying SPSS Modeler flows with Data Asset Import node inside supernode fails
If your SPSS Modeler flow includes a supernode containing a Data Asset Import node, the deployment of the flow will fail. To resolve the issue, move the Data Asset Import node outside of the supernode.
Fixed in: 4.0.8
Deploying some SPSS model types saved as PMML fails
If you save an SPSS model of type Carma, Sequence, or Apriori as PMML, and then promote or import the PMML file to a deployment space, online or batch deployments created from the PMML file will fail with this error:
Deploy operation failed with message: Invalid PMML or modelSequenceModel not supported by Scoring Engine.
To resolve this issue, use the Save Branch as a Model option to save the SPSS flow as a model, promote it to a space, and create the deployments.
Applies to: 4.0.6 and earlier
Fixed in: 4.0.7 for Apriori and Carma models.
Deployments can fail with framework mismatch between training and WMLA
When you are creating a deployment that relies on WMLA, make sure your framework is supported for both training and deployment or the deployment might fail. For example, if you train a model on a PyTorch framework based on Python 3.7 and deploy on a version of WMLA that supports Python 3.9, you will get this error: your model ir_version is higher than the checker's, indicating a mismatch. In that case, retrain your model using a framework that is supported for both training and deployment.
SPSS deployment jobs with no schema ID fail
When creating a batch deployment job for SPSS models without input schemas defined, you can manually define the schema id and associated data asset. If you select a data asset but don't provide an associated schema id, you are not prompted to correct the error and the job is created without any input data references.
Applies to: 4.0.3 Fixed in: 4.0.7
Deployment unusable because deployment owner left the space
If the deployment owner leaves the space, the deployment becomes unusable. You can verify this in the following ways:
- UI: In the deployment details pane, a warning icon shows next to the Deployment owner section.
- REST API: The get_deployment_details REST call returns a warning message in entity.system.warnings, saying that the deployment owner left the space.
- Watson Machine Learning Python client: The client.deployments.get_details function returns a warning message in entity.system.warnings, saying that the deployment owner left the space.
If this happens, a space administrator can assign a new deployment owner. The new owner must be either a space administrator or an editor.
If you are updating the current deployment owner, only a replace operation is allowed, with the value path /metadata/owner.
For information on how to update the deployment owner, refer to the "Updating deployment details using the Patch API command" section in Updating a deployment.
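As an illustration only, the following minimal Python sketch shows what such a replace patch might look like. The endpoint shape, query parameters, and all IDs are assumptions based on the Updating a deployment documentation, so check that topic for the exact call:
# Hypothetical sketch: replace the deployment owner with a JSON Patch operation.
# Host, token, IDs, and the PATCH endpoint shape are assumptions.
import requests

HOST = "https://<cluster-host>"
token = "<bearer-token>"
deployment_id = "<deployment-id>"

patch = [{"op": "replace", "path": "/metadata/owner", "value": "<new-owner-id>"}]

resp = requests.patch(
    f"{HOST}/ml/v4/deployments/{deployment_id}",
    params={"space_id": "<space-id>"},
    json=patch,
    headers={"Authorization": f"Bearer {token}"},
    verify=False,  # use proper certificates where possible
)
print(resp.status_code, resp.text)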
Applies to: 4.0.2 and 4.0.3
Duplicate deployment serving names need updating
Starting in 4.0.3, serving names that users assign to deployments must be unique per cluster. Users can check whether an existing serving name is unique by using the API call GET /ml/v4/deployments?serving_name&conflict=true. If the call returns a status code of 204, the name is unique and no further change is required. If the call returns a status code of 409, the user can update the serving name by using the PATCH API. Deployments with invalid serving names will fail with an error requiring the user to update the name. For details on serving names, see Creating an online deployment. For details on using the PATCH command, see Update the deployment metadata.
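As a rough illustration of that check (host, token, and serving name are placeholders), a 204 response means the name is unique and a 409 means it conflicts:
# Minimal sketch of the serving-name uniqueness check described above.
import requests

HOST = "https://<cluster-host>"
token = "<bearer-token>"

resp = requests.get(
    f"{HOST}/ml/v4/deployments",
    params={"serving_name": "<serving-name>", "conflict": "true"},
    headers={"Authorization": f"Bearer {token}"},
    verify=False,
)

if resp.status_code == 204:
    print("Serving name is unique; no change required.")
elif resp.status_code == 409:
    print("Serving name conflicts with an existing deployment; update it with the PATCH API.")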
Applies to: 4.0.2
Upgrade from Cloud Pak for Data 3.5 appears to fail before resolving
While Watson Machine Learning installation is in progress, the Resource Creation status may temporarily show Failed. The Watson Machine Learning resource will attempt reconciliation and the issue should automatically resolve. If the Watson Machine Learning CR status does not change to Complete after an extended period of time, contact IBM Support.
Applies to: 4.0.3
Restrictions for IBM Z and IBM LinuxONE users
When Cloud Pak for Data is installed on the IBM Z and LinuxONE platforms, Watson Studio and Watson Machine Learning users will not be able to use, run, or deploy the following types of assets:
- Data processed using Data Refinery
- Assets trained using AutoAI, Federated Learning, Decision Optimization, SPSS Modeler, Watson Machine Learning, or Hadoop
- Assets trained using RStudio, such as RShiny apps or assets based on the R framework
- Assets based on these runtimes: Spark, Python 3.7 or ONNX 1.8.1
- Deep Learning assets built with TensorFlow or PyTorch 1.8.0 frameworks
Additionally, note the following:
- Attempting to use, train, or deploy unsupported assets on Cloud Pak for Data running on an IBM Z or LinuxONE platform will fail with an error.
- Backup and restore is not currently available on IBM Z and LinuxONE platform.
- With the default runtimes, models trained on other platforms and deployed on IBM Z and LinuxONE might not work as expected. A potential solution is to deploy the model on a custom Python runtime.
- Insert to code function on IBM Z can cause kernel failure
Applies to: 4.0.2 and greater
Spark and PMML models are not supported on FIPS-enabled clusters
Spark and PMML models deployed on FIPS-enabled clusters can fail with the error "Model deployment failed because a pod instance is missing."
Applies to: 4.0.3 Fixed in: 4.0.6
Deployments might fail after restore from backup
After restoring from a backup, users might be unable to deploy new models and score existing models. To resolve this issue, after the restore operation, wait until operator reconciliation completes. You can check the status of the operator with this command:
kubectl describe WmlBase wml-cr -n <namespace_of_wml> | grep "Wml Status" | awk '{print $3}'
Applies to: 4.0.2
Job run retention not working as expected
If you override the default retention settings for preserving job runs and specify an amount, you might find that the number retained does not match what you specified.
Applies to: 4.0.2 and 4.0.3
Deployment unusable when owner ID is removed
If the ID belonging to the owner of the deployment is removed from the organization or the space, then deployments associated with that ID become unusable.
AutoAI requirement for AVX2
The AVX2 instruction set is not required to run AutoAI experiments; however, it does improve performance. AutoAI experiments run more slowly without AVX2.
AutoAI AVX2 limitation
AutoAI experiments that use SnapML algorithms will not work if the CPU used to train the AutoAI experiment does not support AVX2. The training will fail with an error.
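To check whether a node's CPU advertises AVX2, a minimal sketch (assuming a Linux node, which is where Cloud Pak for Data runtimes run) is:
# Minimal sketch: check for the avx2 CPU flag on Linux.
with open("/proc/cpuinfo") as f:
    print("AVX2 supported:", "avx2" in f.read())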
Applies to: 4.0.2 and 4.0.3 Fixed in: 4.0.4
Watson Machine Learning might require manual rescaling
By default, the small installation of Watson Machine Learning comes up with one pod. When the load on the service increases, you may experience these symptoms, indicating the need to manually scale the wmlrepository service:
- The wmlrepository service pod restarts with an Out Of Memory error
- A wmlrepository service request fails with this error:
  Generic exception of type HttpError with message: akka.stream.BufferOverflowException: Exceeded configured max-open-requests value of [256]. This means that the request queue of this pool has completely filled up because the pool currently does not process requests fast enough to handle the incoming request load. Please retry the request later. See http://doc.akka.io/docs/akka-http/current/scala/http/client-side/pool-overflow.html for more information.
Use this command to scale the repository:
  ./cpd-linux scale -a wml --config medium -s server.yaml -n
  medium.yaml commands:
  - scale --replicas=2 deployment wmlrepository
Do not import/export models between clusters running on different architectures
When you export a project or space, the contents, including model assets, are included in the export package. You can then import the project or space to another server cluster. Note that the underlying architecture must be the same or you might encounter failures with the deployment of your machine learning models. For example, if you export a space from a cluster running the Power platform, then import to a cluster running x86-64, you may be unable to deploy your machine learning models.
Deleting model definitions used in Deep Learning experiments
Currently, users can create model definition assets from the Deep Learning Experiment Builder but cannot delete a model definition there. They must use the REST APIs to delete model definition assets.
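The exact REST call is not shown here; as an illustration only, and assuming that model definitions can be deleted through the same /v2/assets endpoint used for job-run assets earlier in this document, a deletion might look like the following sketch (check the Watson Data API reference for the exact call):
# Hypothetical sketch: delete a model definition asset through the assets API.
# The endpoint, IDs, and token are assumptions/placeholders.
import requests

HOST = "https://<cluster-host>"
token = "<bearer-token>"

resp = requests.delete(
    f"{HOST}/v2/assets/<model_definition_asset_id>",
    params={"space_id": "<space-id>"},   # or project_id, depending on where the asset lives
    headers={"Authorization": f"Bearer {token}"},
    verify=False,
)
print(resp.status_code)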
RShiny app might load an empty page if user application sources many libraries from an external network
The initial load of an RShiny application might result in an empty page if the user application sources many dependent libraries from an external network. If this happens, try refreshing the app after a while. As a general best practice, source the dependent libraries locally by bundling them into the RShiny app.
Python function or Python script deployments may fail if the itc_utils library and the Flight service are used to access data
Python function or Python script deployments may fail if the Python function or script uses the itc_utils library to access data through the Flight service. As a workaround, make these changes in your code (a combined sketch follows the steps):
- Remove the RUNTIME_FLIGHT_SERVICE_URL environment variable. This must be done before the itcfs.get_flight_client API is invoked:
  os.environ.pop("RUNTIME_FLIGHT_SERVICE_URL")
- Initialize the itc client by using this line of code:
  read_client = itcfs.get_flight_client(host="wdp-connect-flight.cpd-instance.svc.cluster.local", port=443)
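A combined sketch of the two changes is shown below. The import line is an assumption about what the Insert to code function generated in your script; keep whatever import and data request your generated code already uses:
# Combined sketch of the workaround steps above.
import os
import itc_utils.flight_service as itcfs  # assumed import alias from the generated code

# Remove the environment variable before creating the flight client
# (the None default avoids a KeyError if the variable is not set).
os.environ.pop("RUNTIME_FLIGHT_SERVICE_URL", None)

# Initialize the itc client against the in-cluster Flight service
read_client = itcfs.get_flight_client(
    host="wdp-connect-flight.cpd-instance.svc.cluster.local",
    port=443,
)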
Applies to: 4.5.x releases
Automatic mounting of storage volumes not supported by online and batch deployments
You cannot use automatic mounts for storage volumes with Watson Machine Learning online and batch deployments. Watson Machine Learning does not support this feature for Python-based runtimes, including R-script, SPSS Modeler, Spark, and Decision Optimization. You can only use automatic mounts for storage volumes with Watson Machine Learning shiny app deployments and notebook runtimes.
As a workaround, you can use the download method from the Data assets library, which is part of the ibm-watson-machine-learning Python client.
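A minimal sketch of that workaround is shown below, assuming a data asset already exists in the space; the credentials, space ID, and asset ID are placeholders, and the exact credential fields depend on your cluster setup (check the ibm-watson-machine-learning client documentation):
# Minimal sketch: download a data asset to the runtime's local file system
# instead of relying on an automatically mounted storage volume.
from ibm_watson_machine_learning import APIClient

wml_credentials = {
    "url": "https://<cluster-host>",
    "token": "<bearer-token>",      # or username/password/apikey, depending on your setup
    "instance_id": "openshift",
    "version": "4.0",
}

client = APIClient(wml_credentials)
client.set.default_space("<space-id>")

# Download the data asset and work with the local copy
client.data_assets.download("<data-asset-id>", "my_data.csv")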
Applies to: 4.0 and later
Parent topic: IBM Watson Studio