Table of contents

Known issues for Watson Studio and supplemental services

These known issues apply to Watson Studio and the services that require Watson Studio.

Also see Cluster imbalance overwhelms worker nodes.

General issues

Internal service error occurs when you upload files that already exist on a server to a remote NFS volume server

If you are uploading files that already exist on a remote NFS volume server, you must update the permissions of the existing files on the remote server, or create a new directory and upload all the files to that directory. Otherwise, an internal service error occurs.

Only users who have access to the NFS server can change the permission of the files and create new directories.
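The following Python sketch shows one way to apply the workaround on the NFS server; the export path, permission mode, and directory name are assumptions that you must adapt to your environment.

  # Run on the NFS server by a user who has access to the volume.
  import os
  import stat

  export_root = "/exports/wsvolume"   # hypothetical path of the NFS export

  # Option 1: relax the permissions of the existing files so that re-uploads succeed.
  for dirpath, dirnames, filenames in os.walk(export_root):
      for name in filenames:
          os.chmod(os.path.join(dirpath, name),
                   stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP)

  # Option 2: create a new directory and upload all files to it instead.
  os.makedirs(os.path.join(export_root, "new-upload-dir"), exist_ok=True)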

Backup and restore limitations

Offline quiesce is supported only at the OpenShift project level and it restores only to the same machine and to the same namespace.

Deployments view (Operations view/dashboard) has some limitations

The Deployments view has the following limitation:

  • When long names are used, they’re not fully truncated and can be obscured on the screen.

Spark environments can be selected although Spark is not installed

When you create a job for a notebook or a Data Refinery flow that you promoted to a deployment space, and select a Spark environment, you might see the following error message:

Error submitting job. Unable to fetch environment access info from envId spark30****. Error: [{"statusCode":500,"message":"{\"trace\":\"39e52bbb-2816-4bf6-9dad-5aede584ac7a\",\"errors\":[{\"code\":\"default_spark_create_failed\",\"message\":\"Could not create Hummingbird instance, because a wrong status code was returned: 404.\"}]}"}]

The reason for this error is that Spark was not installed on the cluster on which you created the deployment space. Contact your administrator to install the Spark service on that cluster.

UI display might not load properly

If the UI does not load properly, the Watson Studio administrator must restart redis.

Workaround

If the client behaves unexpectedly, for example, enters a redirection loop or parts of the user interface fail to load, then complete the following steps to restart redis:

  1. Log in to the OpenShift cluster.
  2. Restart the pods for redis using the following command:
     oc delete po -n <project> $(oc get po -n <project> -l component=redis -o jsonpath="{.items[*].metadata.name}")
    
    

Projects

Import of a project larger than 1 GB in Watson Studio fails

If you create an empty project in Watson Studio and then try to import a project that is larger than 1 GB in size, the operation might fail depending on the size and compute power of the Cloud Pak for Data cluster.

Connections

Personal credentials are not supported for connected data assets in Data Refinery

If you create a connected data asset with personal credentials, other users must use the following workaround in order to use the connected data asset in Data Refinery.

Workaround:

  1. Go to the project page, and click the link for the connected data asset to open the preview.
  2. Enter credentials.
  3. Open Data Refinery and use the authenticated connected data asset for a source or target.

Applies to: 3.5.0 and later

Assets

Limitations for previews of assets

You can’t see previews of these types of assets:

  • Folder assets associated with a connection with personal credentials. You are prompted to enter your personal credentials to start the preview or profiling of the connection asset.
  • Connected data assets for image files in projects.
  • Connected data assets for text and JSON files with shared credentials are incorrectly displayed in a grid.
  • Connected data assets for PDF files in projects.

Can’t load files to projects that have #, %, or ? characters in the name

You can’t create a data asset in a project by loading a file that contains a hash character (#), percent sign (%), or a question mark (?) in the file name.

Applies to: 3.5.0 and later

Can’t load CSV files larger than 20 GB to projects

You can’t load a CSV file that is larger than 20 GB to an analytics project in Cloud Pak for Data.

Hadoop integration

Unable to install pip packages using install_packages() on a Power machine

If you are using a Power cluster, you might see the following error when attempting to install pip packages with hi_core_utils.install_packages():

ModuleNotFoundError: No module named '_sysconfigdata_ppc64le_conda_cos6_linux_gnu'

To work around this known limitation of hi_core_utils.install_packages() on Power, export the following environment variable before calling install_packages():

# For Power machines, export this env var to work around a known issue in
# hi_core_utils.install_packages()...
import os
os.environ['_CONDA_PYTHON_SYSCONFIGDATA_NAME'] = "_sysconfigdata_powerpc64le_conda_cos7_linux_gnu"

On certain HDP Clusters, the Execution Engine for Apache Hadoop service installation fails

The installation fails during the Knox Gateway Configuration step. The issue occurs because the Knox gateway fails to start on some nodes.

The following errors occur:

  •   Failed to configure gateway keystore Exception in thread "main" Caused by: java.lang.NoSuchFieldError: DEFAULT_XML_TYPE_ATTRIBUTE
    
  •   Exception in thread "main" java.lang.reflect.InvocationTargetException Caused by: java.lang.NoSuchMethodError: org.eclipse.persistence.internal.oxm.mappings.Field.setNestedArray(Z)VException in thread "main" java.lang.reflect.InvocationTargetException
    

The workaround is to remove the org.eclipse.persistence.core-2.7.2.jar file from the installation directory by using the following command:

mv /opt/ibm/dsxhi/gateway/dep/org.eclipse.persistence.core-2.7.2.jar /tmp/

Cannot stop jobs for a registered Hadoop target host

When a registered Hadoop cluster is selected as the Target Host for a job run, the job cannot be stopped. As a workaround, view the Watson Studio Local job logs to find the Yarn applicationId; then, use the ID to manually stop the Hadoop job on the remote system. When the remote job is stopped, the Watson Studio Local job will stop on its own with a “Failed” status. Similarly, jobs that are started for registered Hadoop image push operations cannot be stopped either.
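If you prefer to script the manual stop, the following sketch wraps the standard YARN CLI call; run it (or the yarn command directly) on a node of the registered Hadoop cluster, and replace the placeholder applicationId with the value from the job logs.

  # Stop the remote YARN application that belongs to the stuck job.
  import subprocess

  application_id = "application_0000000000000_0001"   # placeholder from the job logs
  subprocess.run(["yarn", "application", "-kill", application_id], check=True)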

Code that is added to a Python 3.7 or Python 3.8 notebook for an HDFS-connected asset fails

The HDFS-connected asset fails because the Set Home as Root setting is selected for the HDFS connection. To work around this issue, create the connected asset using the HDFS connection without selecting Set Home as Root.

Jupyter notebook with Python 3.8 is not supported by Execution Engine for Apache Hadoop

The following issues are the result of Python 3.8 not being supported by Execution Engine for Apache Hadoop:

  • When a Jupyter Enterprise Gateway Python 3.8 notebook is running on Spark 2.4, the notebook is unable to be launched. The error occurs because Python 3.8 is not supported on Spark 2.4.
  • A Livy session fails to be established with a Jupyter notebook with Python 3.8 runtime pushed image when the registered Hadoop cluster has Spark 2.4.

For more information about Spark and Python 3.8, see Apache’s JIRA tracker.

Applies to: 4.0.1

The Livy service does not restart when a cluster is rebooted

The Livy service does not automatically restart after a system reboot if the HDFS Namenode is not in an active state.

Applies to: 3.5.0 and later

Pushing the Python 3.7 image to a registered Execution Engine for Apache Hadoop system hangs on Power

When the Python 3.7 Jupyter image is pushed to a registered Execution Engine for Apache Hadoop system by using platform configurations, installing the dist-keras package into the image (using pip) hangs. The job runs for hours but never completes, and the console output for the job ends with:

  Attempting to install HI addon libs to active environment ...
    ==> Target env: /opt/conda/envs/Python-3.7-main ...
    ====> Installing conda packages ...
    ====> Installing pip packages ...

This hang is caused by a pip regression in dependency resolution, as described in the New resolver downloads hundreds of different package versions, without giving reason issue.

To work around this problem, do the following steps:

  1. Stop the image push job that is hung. To find the job that is hung, use the following command:

     oc get job -l headless-type=img-saver
    
  2. Delete the job:

     oc delete job <jobid>
    
  3. Edit the Execution Engine for Apache Hadoop image push script located at /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh.

  4. Add the flag --use-deprecated=legacy-resolver to the pip install command, as follows:

    @@ -391,7 +391,7 @@
          conda-env=2.6.0 opt_einsum=3.1.0 > /tmp/hiaddons.out 2>&1 ; then
            echo " OK"
            echo -n "  ====> Installing pip packages ..."
           -          if pip install dist-keras==0.2.1 >> /tmp/hiaddons.out 2>&1 ; then
           +          if pip install --use-deprecated=legacy-resolver dist-keras==0.2.1 >> /tmp/hiaddons.out 2>&1 ; then
     
              echo " OK"
              hiaddonsok=true
    
  5. Restart your image push operation by clicking Replace image on the Execution Engine for Apache Hadoop registration page.

To edit the Execution Engine for Apache Hadoop image push script:

  1. Access the pod running utils-api. This has the /cc-home/.scripts directory mounted.

     oc get pod | grep utils-api
    
  2. Extract the existing /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh file:

     oc cp <utils_pod_id>:/cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh dsx-hi-save-image.sh
    
  3. Modify the file per the workaround steps:

     vi dsx-hi-save-image.sh
    
  4. Copy the new file into the pod, in the /tmp dir:

     oc cp dsx-hi-save-image.sh <utils_pod_id>:/tmp
    
  5. Open a shell in the pod:

     oc rsh <utils_pod_id>
    
  6. Make a backup:

     cp -up /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh.bak
    
  7. Dump the content in /tmp/dsx-hi-save-image.sh to /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh:

     cat /tmp/dsx-hi-save-image.sh > /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh
    
  8. Do a diff to make sure you have the changes:

     diff /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh /cc-home/.scripts/proxy-pods/dsx-hi/dsx-hi-save-image.sh.bak
    
  9. Exit the utils-api pod:

     exit
    

Pushing a Spectrum Conductor image again after initial push fails

If you push the Spectrum Conductor image again after the initial push for a Spectrum Conductor system that is configured with an Anaconda instance, an error can occur. In the image push log, you will see the following error stack:

create_environment_from_yamls ...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cc-home/.scripts/proxy-pods/dsx-hi/pushToDSXHI.py", line 389, in create_environment_from_yamls
    response = createAndCheckEnvStatus(yams_list, instanceUUID)
  File "/cc-home/.scripts/proxy-pods/dsx-hi/pushToDSXHI.py", line 412, in createAndCheckEnvStatus
    raise Exception('something went wrong while pushing image to conductor cluster. Please check logs for conductor cluster')
Exception: something went wrong while pushing image to conductor cluster. Please check logs for conductor cluster

Locate the Anaconda instance and the name of the Anaconda environment that the image push is trying to create. The name of the environment is in the following format: <topology_name>__jupyter-py37.

The error from Spectrum Conductor should look like the following message:

The following specifications were found to be incompatible with each other:
 <a long list of packages> 

To work around the issue:

  1. Click Clear Error on the Spectrum Conductor environment that failed.
  2. Select the environment, and then click Remove.
  3. On Cloud Pak for Data, log in as an admin.
  4. From the Platform configuration page, click the system where the image push failed.
  5. Run the image push again for the failed image.

Applies to: 4.0.0 and later

Pushing an image to a Spectrum Conductor environment fails

If you push an image to a Spectrum Conductor environment, the image push fails.

Applies to: 4.0.0 and later

Notebooks using a Hadoop environment that doesn’t have the Jupyter Enterprise Gateway service enabled can’t be used

When a notebook is opened using a Hadoop environment that does not have the Jupyter Enterprise Gateway (JEG) service enabled, the kernel remains disconnected. This notebook does not display any error message, and it cannot be used.

To address this issue, confirm that JEG is enabled for the Hadoop environment that the notebook is defined to use.

Cannot use HadoopLibUtils to connect through Livy in RStudio

RStudio includes a new version of sparklyr. sparklyr doesn’t work with the HadoopLibUtils library that’s provided to connect to remote Hadoop clusters using Livy. The following error occurs: Error: Livy connections now require the Spark version to be specified. As a result, you cannot create tables using Hadoop.

If you don’t want to roll back the sparklyr 1.6.3 changes, you can install sparklyr 1.6.2 instead. Restart the R session and use sparklyr 1.6.2 for Hadoop.

Install sparklyr 1.6.2 by running the following commands:

  require(remotes)
  library(remotes)
  install_version("sparklyr", version = "1.6.2", repos = "http://cran.us.r-project.org")
  packageVersion("sparklyr")

After the package is installed, load sparklyr 1.6.2 when you are using HadoopLibUtilsR.

Applies to: 4.0.0

Hadoop Refinery: Cannot output to a connected parquet asset that is a directory

This problem occurs when you run a Data Refinery data shaping job that uses the Execution Engine for Apache Hadoop connector for HDFS. When you select an output target directory that contains a set of parquet files, the Data Refinery layer cannot determine the file format for the directory.

Instead, it leaves the File Format field blank. This causes a failure when writing the output because File Format ‘ ‘ is not supported.

To work around this issue, select an output target that is empty or not a directory.

Hadoop notebook jobs don’t support using environment variables containing spaces

When you set up a notebook job that uses a remote Hadoop environment, you can specify environment variables that the notebook can access. If an environment variable is defined with a value that contains spaces, the JEG job fails. The remote Hadoop jeg.log displays:

  [E 2020-05-13 10:28:54.542 EnterpriseGatewayApp] Error occurred during launch of KernelID

To work around this issue, do not define environment variables whose values contain spaces. If necessary, encode the value and decode it in the notebook when you use the content of the environment variable (see the sketch that follows).
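A minimal sketch of the encode/decode approach, assuming a hypothetical variable named MY_SETTING:

  import base64
  import os

  # When you define the job environment variable, set it to the encoded string.
  original_value = "value with spaces"
  encoded = base64.b64encode(original_value.encode("utf-8")).decode("ascii")
  # For example: MY_SETTING=dmFsdWUgd2l0aCBzcGFjZXM=

  # Inside the notebook, decode the value before using it.
  decoded = base64.b64decode(os.environ["MY_SETTING"]).decode("utf-8")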

Notebooks

Spark Scala notebooks using the Insert to code function to load local data assets can fail in a space

Spark Scala notebooks that use the Insert to code function to load a local data asset can fail when running the notebook in a job in a deployment space.

To fix this issue, use the following workaround:

  import org.apache.spark.sql.SQLContext
  import com.ibm.watsonstudiolib.Agent
  val wslib = Agent.accessProjectOrSpace()


  val sqlContext = new SQLContext(sc)

  var path_prefix = "/project_data/"
  if (!(sys.env.get("SPACE_ID")).filterNot(s => s == null || s.trim.isEmpty).isEmpty) {path_prefix = "/space_data/"}


  val data_df_0 = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").option("mode", "DROPMALFORMED").csv(path_prefix + wslib.assets.listAttachments("cars.csv", raw = true).head("object_key"))
  data_df_0.show(5)

Insert to code function does not work on SSL-enabled Db2 on Cloud connections from 3.5.x imported projects or 3.5.x git repos

If you import a project that was created in Cloud Pak for Data 3.5.x and the project contains a Db2 on Cloud connection with SSL enabled, the Notebooks “Insert to code” feature will not work. The problem also occurs if you synchronize with a Git project from Cloud Pak for Data version 3.5.x.

To fix this problem, edit the connection: Click the connection on the project Data assets page. Clear and re-select the Port is SSL-enabled checkbox in the Edit connection page.

Applies to: 4.0.0 and later

Issues with Insert to code function for an Informix connection

If you use the Insert to code function for a connection to an Informix database, and if Informix is configured for case-sensitive identifiers, the inserted code throws a runtime error if you try to query a table with an upper-case name. In the cell output, there’s an error message similar to the following:

    DatabaseError: Execution failed on sql: SELECT * FROM informix.FVT_EMPLOYEE
    java.sql.SQLException: The specified table (informix.fvt_employee) is not in the database.
    unable to rollback

Workaround

In your notebook, edit the inserted code.

For example:

  • For Python, add the connection property 'DELIMIDENT=Y' to the connection and surround the upper-case identifier with double-quotes (""). Replace the following lines:
      informix_connection = jaydebeapi.connect('com.informix.jdbc.IfxDriver',
      '{}://{}:{}/{}:user={};password={};'.format('jdbc:informix-sqli',
      informix_credentials['host'],
      informix_credentials['port'],
      informix_credentials['database'],
      informix_credentials['username'],
      informix_credentials['password']),
      [informix_credentials['username'],
      informix_credentials['password']])
      query = 'SELECT * FROM informix.FVT_EMPLOYEE'
    

    With:

      informix_connection = jaydebeapi.connect('com.informix.jdbc.IfxDriver',
      '{}://{}:{}/{}:user={};password={};'.format('jdbc:informix-sqli',
      informix_credentials['host'],
      informix_credentials['port'],
      informix_credentials['database'],
      informix_credentials['username'],
      informix_credentials['password']),
      {
          'user': informix_credentials['username'],
          'password': informix_credentials['password'],
          'DELIMIDENT': 'Y'
      })
      query = 'SELECT * FROM informix."FVT_EMPLOYEE"'
    
  • For R, add the connection property 'DELIMIDENT=Y' to the connection and surround all upper case names with double-quotes (""). Replace the following lines:
    paste("jdbc:informix-sqli://", Informix_credentials[][["host"]], ":", Informix_credentials[][["port"]],
     "/", Informix_credentials[][["database"]], 
     ":user=", Informix_credentials[][["username"]], 
     ";password=", Informix_credentials[][["password"]], ";", sep=""),
     ...
    query <- "SELECT * FROM myschema.MY_TABLE"
    

    With:

    paste("jdbc:informix-sqli://", Informix_credentials[][["host"]], ":", Informix_credentials[][["port"]],
     "/", Informix_credentials[][["database"]], 
     ":user=", Informix_credentials[][["username"]], 
     ";password=", Informix_credentials[][["password"]],";DELIMIDENT=Y", ";", sep=""),
     ...
    query <- "SELECT * FROM myschema.\"MY_TABLE\""
    
  • For Scala, add the connection property 'DELIMIDENT=Y' to the connection and in the query surround all upper case names with double-quotes (""). Replace the following lines:
    lazy val Informix_properties = Map("url" -> "jdbc:informix-sqli://myserver.mycompany.com:12345/mydb",
      "user" -> Informix_credentials ("username").asInstanceOf[String],
      "password" -> Informix_credentials ("password").asInstanceOf[String])
    
    val data_df_0 = spark.read
      .format("jdbc")
      .options(Informix_properties)
      .option("driver" , "com.informix.jdbc.IfxDriver")
      .option("dbtable", "myschema.MY_TABLE")
      .load()
    data_df_0.show(5)
    

    With:

    lazy val Informix_properties = Map("url" -> "jdbc:informix-sqli://myserver.mycompany.com:12345/mydb",
      "user" -> Informix_credentials ("username").asInstanceOf[String],
      "password" -> Informix_credentials ("password").asInstanceOf[String],
      "DELIMIDENT" -> "Y")
    val data_df_0 = spark.read
      .format("jdbc")
      .options(Informix_properties)
      .option("driver" , "com.informix.jdbc.IfxDriver")
      .option("dbtable", "myschema.\"MY_TABLE\"")
      .load()
    data_df_0.show(5)  
    

Error in notebooks when rendering data from Cloudera Distribution for Hadoop

When running Jupyter notebooks against Cloudera Distribution for Hadoop 5.16 or 6.0.1 with Spark 2.2, the first dataframe operation for rendering cell output from the Spark driver results in a JSON encoding error.

Workaround:

To work around this error, do one of the following procedures:

  • For interactive notebook sessions

    Manually re-run the first cell that renders data.

  • For non-interactive notebook sessions

    Add the following non-intrusive code after establishing the Spark connection to trigger the first failure:

      %%spark
      import sys, warnings
      def python_major_version ():
          return(sys.version_info[0])
      with warnings.catch_warnings(record=True):
          print(sc.parallelize([1]).map(lambda x: python_major_version()).collect())

Notebook loading considerations

The time that it takes to create a new notebook or to open an existing one for editing might vary. If no runtime container is available, a container must be created, and the Jupyter notebook user interface can load only after the container is available. The time it takes to create a container depends on the cluster load and size. After a runtime container exists, subsequent calls to open notebooks are significantly faster.

Kernel not found when opening a notebook imported from a Git repository

If you import a project from a Git repository that contains notebooks that were created in JupyterLab, and try opening the notebooks from the project Assets page, you will see a message stating that the required notebook kernel can’t be found.

The reason is that you are trying to open the notebook in an environment that doesn’t support the kernel required by the notebook, for example in an environment without Spark for a notebook that uses Spark APIs. The information about the environment dependency of a notebook that was created in JupyterLab and exported in a project is currently not available when this project is imported again from Git.

Workaround:

You need to associate the notebook with the correct environment definition. You can do this:

  • From the notebook opened in edit mode by:

    1. Clicking the Notebook Info icon from the notebook toolbar and then clicking Environment.
    2. Selecting the correct environment definition for your notebook from the list under Environments.
  • Before you open the notebook, from the project Assets page by:

    1. Selecting the notebook and unlocking it if it is locked. You can only change the environment of a notebook if the notebook is unlocked.
    2. Clicking Actions > Change Environment and selecting the correct environment definition for your notebook.

Environment runtime can’t be started because the software customization failed

If your Jupyter notebook runtime can’t be started and a 47 killed error is logged, the software customization process could not be completed because of lack of memory.

You can customize the software configuration of a Jupyter notebook environment by adding conda and pip packages. However, be aware that conda does dependency checking when installing packages, which can be memory intensive if you add many packages to a customization.

To complete a customization successfully, you must make sure that you select an environment with sufficient RAM to enable dependency checking at the time the runtime is started.

If you only want packages from one conda channel, you can prevent unnecessary dependency checking by excluding the default channels. To do this, remove defaults from the channels list in the customization template and add nodefaults.

Notebook returns UTC time and not local time

The Python datetime functions return the date and time in the UTC time zone, not in the local time zone where the user is located. The reason is that the default environment runtimes use the time zone in which they were created, which is UTC.

Workaround:

If you want to use your local time, you need to download and modify the runtime configuration file to use your time zone. You don’t need to make any changes to the runtime image. After you upload the configuration file again, the runtime will use the time you set in the configuration file.

Required role: You must be a Cloud Pak for Data cluster administrator to change the configuration file of a runtime.

To change the time zone:

  1. Download the configuration file of the runtime you are using. Follow the steps in Downloading the runtime configuration.
  2. Update the runtime definition JSON file and extend the environment variable section to include your time zone, for example, for Europe/Vienna use:
    {
      "name": "TZ",
      "value": "Europe/Vienna"
    }
    
  3. Upload the changed JSON file to the Cloud Pak for Data cluster. You can use the Cloud Pak for Data API, as in the following substeps (a Python sketch of the same calls follows these steps).

    1. Get the required platform access token. The command returns the bearer token in the accessToken field:
       curl <CloudPakforData_URL>/v1/preauth/validateAuth -u <username>:<password>
      
    2. Upload the JSON file:
       curl -X PUT \
         'https://<CloudPakforData_URL>/zen-data/v1/volumes/files/%2F_global_%2Fconfig%2F.runtime-definitions%2Fibm' \
         -H 'Authorization: Bearer <platform-access-token>' \
         -H 'content-type: multipart/form-data' \
         -F upFile=@/path/to/runtime/def/<custom-def-name>-server.json
      

      Important: Change the name of the modified JSON file. The file name must end with server.json and the same file name must be used across all clusters to enable exporting and importing analytics projects across cluster boundaries.

      If the changed JSON file was uploaded successfully, you will see the following response:

       {
           "_messageCode_": "Success",
           "message": "Successfully uploaded file and created the necessary directory structure"
       }
      
  4. Restart the notebook runtime.
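If you prefer to script steps 3.1 and 3.2, the following Python sketch mirrors the curl calls with the requests library; the URL, credentials, and file path are placeholders, and TLS verification is disabled only on the assumption that the cluster uses a self-signed certificate.

  import requests

  cpd_url = "https://<CloudPakforData_URL>"
  username, password = "<username>", "<password>"

  # Step 3.1: get the platform access token from the accessToken field.
  auth = requests.get(f"{cpd_url}/v1/preauth/validateAuth",
                      auth=(username, password), verify=False)
  token = auth.json()["accessToken"]

  # Step 3.2: upload the modified runtime definition. The file name must end with server.json.
  path = "/path/to/runtime/def/<custom-def-name>-server.json"
  with open(path, "rb") as f:
      upload = requests.put(
          f"{cpd_url}/zen-data/v1/volumes/files/%2F_global_%2Fconfig%2F.runtime-definitions%2Fibm",
          headers={"Authorization": f"Bearer {token}"},
          files={"upFile": f},
          verify=False,
      )
  print(upload.json())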

Can’t promote existing notebook because no version exists

If you are working with a notebook that you created prior to IBM Cloud Pak for Data 4.0.0, and you want to promote this notebook to a deployment space, you will get an error message stating that no version exists for this notebook.

If this notebook also has a job definition, in addition to saving a new version, you need to edit the job settings.

To enable promoting existing notebooks and edit job settings, see Promoting notebooks.

Applies to: 4.0.0 and later when upgrading from 3.5

Anaconda Repository for IBM Cloud Pak for Data

Channel names for Anaconda Repository for IBM Cloud Pak for Data don’t support double-byte characters

When you create a channel in Anaconda Team Edition, you can’t use double-byte characters or most special characters. You can use only these characters: a-z 0-9 - _

RStudio

RStudio Sparklyr package 1.4.0 can’t connect with Spark 3.0 kernel

When users try to connect the Sparklyr R package in RStudio with a remote Spark 3.0 kernel, the connection fails because of Sparklyr R package connection issues. The connection issues are due to recent changes to Sparklyr R package version 1.4.0. This will be addressed in future releases. The workaround is to use the Spark 2.4 kernel.

Applies to: 4.0.0 only.

Fixed in: 4.0.1

Sparklyr R package version 1.7.0 is now used in Spark 3.0 kernels.

Running job for R script and selected RStudio environment results in an error

When you’re running a job for an R Script and a custom RStudio environment was selected, the following error occurs if the custom RStudio environment was created with a previous release of Cloud Pak for Data: The job uses an environment that is not supported. Edit your job to select an alternative environment.

To work around this issue, delete and re-create the custom RStudio environment with the same settings.

Applies to: 4.0.0 only.

Fixed in: 4.0.1

Git integration broken when RStudio crashes

If RStudio crashes while working on a script and you restart RStudio, integration to the associated Git repository is broken. The reason is that the RStudio session workspace is in an incorrect state.

Workaround

If Git integration is broken after RStudio crashed, complete the following steps to reset the RStudio session workspace:

  1. Click on the Terminal tab next to the Console tab to create a terminal session.
  2. Navigate to the working folder /home/wsuser and rename the .rstudio folder to .rstudio.1.
  3. From the File menu, click Quit Session… to end the R session.
  4. Click Start New Session when prompted. A new R project with Git integration is created.

No Git tab although RStudio is launched with Git integration

When you launch RStudio in a project with Git integration, the Git tab might not be visible on the main RStudio window. The reason for this is that if the RStudio runtime needs longer than usual to start, the .Rprofile file that enables integration to the associated Git repository cannot run.

Workaround

To add the Git tab to RStudio:

  1. Run the following command from the RStudio terminal:
    cp $R_HOME/etc/.Rprofile  $HOME/.Rprofile
    echo "JAVA_HOME='/opt/conda/envs/R-3.6'" >> $HOME/.Renviron
    
  2. From the Session menu, select Quit Session… to quit the session.
  3. If you are asked whether you want to save the workspace image to ~/.RData, select Don’t save.
  4. Then click Start New Session.

Applies to: 4.0.0 only.

Fixed in: 4.0.1

RStudio doesn’t open although you were added as project collaborator

If RStudio will not open and all you see is the endless spinner, the reason is that, although you were added as collaborator to the project, you have not created your own personal access token to the Git repository associated with the project. To open RStudio with Git integration, you must select your own access token.

To create your own personal access token, see Collaboration in RStudio.

Data Refinery

Cannot run a Data Refinery flow job with data from an Amazon RDS for MySQL connection

If you create a Data Refinery flow with data from an Amazon RDS for MySQL connection, the job will fail.

Applies to: 4.0.0
Fixed in: 4.0.1

Duplicate connections in a space resulting from promoting a Data Refinery flow to a space 

When you promote a Data Refinery flow to a space, all dependent data is promoted as well. If the Data Refinery flow that is being promoted has a dependent connection asset and a dependent connected data asset that references the same connection asset, the connection asset will be duplicated in the space.

The Data Refinery flow will still work. Do not delete the duplicate connections.

Applies to: 3.5.0 and later

Data Refinery flow fails with “The selected data set wasn’t loaded” message

The Data Refinery flow might fail if there are insufficient resources. The administrator can monitor the resources and then add resources by scaling the Data Refinery service or by adding nodes to the Cloud Pak for Data cluster.

Applies to: 3.5.0 and later

Jobs

Spark jobs are supported only by API

If you want to run analytical and machine learning applications on your Cloud Pak for Data cluster without installing Watson Studio, you must use the Spark jobs REST APIs of Analytics Engine powered by Apache Spark. See Getting started with Spark applications.
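If you are scripting job submission, the pattern is to authenticate with the platform and then POST the job payload to the Spark jobs endpoint of your Analytics Engine instance. The sketch below only illustrates that pattern; the endpoint and payload fields are placeholders that you must take from Getting started with Spark applications.

  import requests

  cpd_url = "https://<CloudPakforData_URL>"
  token = "<platform-access-token>"            # obtained from the platform authorization API
  jobs_endpoint = "<SPARK_JOBS_API_ENDPOINT>"  # placeholder; see the Spark applications documentation

  payload = {
      # Fill in the application file, arguments, and Spark configuration
      # as described in the Spark jobs API documentation.
  }

  response = requests.post(f"{cpd_url}{jobs_endpoint}",
                           headers={"Authorization": f"Bearer {token}"},
                           json=payload, verify=False)
  print(response.status_code, response.text)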

Excluding days when scheduling a job causes unexpected results

If you select to schedule a job to run every day of the week excluding given days, you might notice that the scheduled job does not run as you would expect. The reason might be a discrepancy between the time zone of the user who creates the schedule and the time zone of the master node where the job runs.

This issue only exists if you exclude days of a week when you schedule to run a job.
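The following illustration (not a fix) shows how the same instant can fall on different weekdays in your time zone and in the time zone of the master node, which is why an excluded day can appear to be ignored; it assumes Python 3.9+ for zoneinfo and uses example time zones.

  from datetime import datetime
  from zoneinfo import ZoneInfo

  # Sunday 00:30 in Tokyo is still Saturday 15:30 in UTC.
  local = datetime(2021, 6, 6, 0, 30, tzinfo=ZoneInfo("Asia/Tokyo"))
  on_master = local.astimezone(ZoneInfo("UTC"))

  print(local.strftime("%A"), "locally, but", on_master.strftime("%A"), "on the master node")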

Error occurs when jobs are edited

You cannot edit jobs that were created prior to upgrading to Cloud Pak for Data version 3.0 or later. An error occurs when you edit those jobs. Create new jobs after upgrading to Cloud Pak for Data version 3.0 or later.

Errors can also occur if the user who is trying to edit the job or schedule is different from the user who started or created the job. For example, if a Project Editor attempts to edit a schedule that was created by another user in the project, an error occurs.

Can’t delete notebook job stuck in starting or running state

If a notebook job is stuck in starting or running state and won’t stop, although you tried to cancel the job and stopped the active environment runtime, you can try deleting the job by removing the job-run asset manually by using the API, as in the following steps (a Python equivalent follows them).

  1. Retrieve a bearer token from the user management service using an API call:
    curl -k -X POST https://PLATFORM_CLUSTER_URL/icp4d-api/v1/authorize -H 'cache-control: no-cache' -H 'content-type: application/json' -d '{"username":"your_username","password":"your_password"}'
    
  2. (Optional) Get the job-run asset and test the API call. Replace ${token}, ${asset_id}, and ${project_id} accordingly.
    curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X GET "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
    
  3. Delete the job-run asset. Again replace ${token}, ${asset_id}, and ${project_id} accordingly.
    curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X DELETE "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
    
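The same calls can be scripted in Python; this sketch mirrors the curl commands above with the requests library, assumes the bearer token is returned in the token field of the authorize response, and disables TLS verification only for self-signed certificates.

  import requests

  cpd_url = "https://PLATFORM_CLUSTER_URL"
  auth = requests.post(f"{cpd_url}/icp4d-api/v1/authorize",
                       json={"username": "your_username", "password": "your_password"},
                       verify=False)
  token = auth.json()["token"]   # bearer token returned by the authorize endpoint

  asset_id, project_id = "<asset_id>", "<project_id>"
  headers = {"Authorization": f"Bearer {token}", "accept": "application/json"}

  # Optional: confirm that the job-run asset exists.
  requests.get(f"{cpd_url}/v2/assets/{asset_id}",
               params={"project_id": project_id}, headers=headers, verify=False)

  # Delete the job-run asset.
  requests.delete(f"{cpd_url}/v2/assets/{asset_id}",
                  params={"project_id": project_id}, headers=headers, verify=False)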

Notebook runs successfully in notebook editor but fails when run as job

Some libraries require a kernel restart after a version change. If you need to work with a library version that isn’t pre-installed in the environment in which you start the notebook, and you install this library version through the notebook, the notebook only runs successfully after you restart the kernel. However, when you run the notebook non-interactively, for example as a notebook job, it fails because the kernel can’t be restarted. To avoid this, create an environment definition and add the library version that you require as a software customization. See Creating environment definitions.

Watson Machine Learning

Deployments might fail after restore from backup

After restoring from a backup, users might be unable to deploy new models and score existing models. To resolve this issue:

  1. After the restore operation, wait until operator reconciliation completes. You can check the status of the operator with this command:
      kubectl describe WmlBase wml-cr -n <namespace_of_wml> | grep "Wml Status" | awk '{print $3}'
    
  2. After the operator reconciliation status shows as Completed, restart the Runtime Manager Pods using this command:
      kubectl delete pod wml-deployment-manager-0 -n <namespace_of_wml>
    

Unable to save Federated Learning experiment following upgrade

If you train your Federated Learning model in a previous version of Cloud Pak for Data and then upgrade, you might get this error when you try to save the model following the upgrade: "Unexpected error occurred creating model". The issue results from training with a framework that is not supported in the upgraded version of Cloud Pak for Data. To resolve the issue, retrain the model with a supported framework, then save it.

Promotion of AutoAI time series model from catalog fails

If you save an AutoAI time series experiment as a model, publish the model to a catalog, promote the model from the catalog to a project, and then promote the model from the project to a space, you will get this error: An unexpected response was returned when patching asset attribute: wml_model. To resolve this, save the model directly to a project and then promote to a space.

Job run retention not working as expected

If you override the default retention settings for preserving job runs and specify an amount, you might find that the number retained does not match what you specified.

Deployment unusable when owner ID is removed

If the ID belonging to the owner of the deployment is removed from the organization or the space, then deployments associated with that ID become unusable.

AutoAI requirement for AVX2

The AVX2 instruction set is not required to run AutoAI experiments; however, it does improve performance. AutoAI experiments run more slowly without AVX2.

Watson Machine Learning might require manual rescaling

By default, the small installation of Watson Machine Learning comes up with one pod. When the load on the service increases, you may experience these symptoms, indicating the need to manually scale the wmlrepository service:

  1. wmlrepository service pod restarts with an Out Of Memory error
  2. wmlrepository service request fails with this error:
    Generic exception of type HttpError with message: akka.stream.BufferOverflowException: Exceeded configured max-open-requests value of [256]. This means that the request queue of this pool  has completely filled up because the pool currently does not process requests fast enough to handle the incoming request load. Please retry the request later. See http://doc.akka.io/docs/akka-http/current/scala/http/client-side/pool-overflow.html for more information.

Use this command to scale the repository:

  ./cpd-linux scale -a wml --config medium -s server.yaml  -n <namespace>

  medium.yaml
  commands:
  - scale --replicas=2 deployment wmlrepository

Do not import/export models between clusters running on different architectures

When you export a project or space, the contents, including model assets, are included in the export package. You can then import the project or space to another server cluster. Note that the underlying architecture must be the same or you might encounter failures with the deployment of your machine learning models. For example, if you export a space from a cluster running the Power platform, then import to a cluster running x86-64, you may be unable to deploy your machine learning models.

Deleting model definitions used in Deep Learning experiments

Currently, users can create model definition assets from the Deep Learning Experiment Builder but cannot delete a model definition. They must use REST APIs to delete model definition assets.

Decision optimization

Decision Optimization batch job creation

While the existing batch deployment interface shows details about existing Decision Optimization batch deployment job runs, there is no interface for creating new Decision Optimization batch deployment job runs. Users must use APIs or other clients to create Decision Optimization batch deployment job runs, as in the sketch that follows.
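One option is the ibm-watson-machine-learning Python client; the following hedged sketch shows the general shape of creating a Decision Optimization batch job run, but the credential format, metadata names, and input payload are assumptions to verify against the client documentation for your release.

  from ibm_watson_machine_learning import APIClient

  wml_credentials = {
      "url": "https://<CloudPakforData_URL>",
      "username": "<username>",
      "password": "<password>",
      "instance_id": "openshift",
      "version": "4.0",
  }
  client = APIClient(wml_credentials)
  client.set.default_space("<space_id>")

  # Placeholder input and output definitions for the Decision Optimization job run.
  meta_props = {
      client.deployments.DecisionOptimizationMetaNames.INPUT_DATA: [
          {"id": "input.csv", "fields": ["col1", "col2"], "values": [[1, 2]]}
      ],
      client.deployments.DecisionOptimizationMetaNames.OUTPUT_DATA: [
          {"id": ".*\\.csv"}
      ],
  }
  job = client.deployments.create_job("<deployment_id>", meta_props)
  print(job)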

Manually remove deployments to avoid resource consumption

Decision Optimization deployments use long running pods to execute jobs submitted to them. Over time, if the long running pods are not cleaned up, they will continue to use resources as long as they are running. This can result in insufficient resources being available for other deployments. To avoid this issue, delete Decision Optimization deployments that are no longer needed to clean up the long running pods and free up resources.