Known issues and limitations for Watson Studio and supplemental services

These known issues and limitations apply to Watson Studio and the services that require Watson Studio.

Known issues

Known issues for Anaconda Repository for IBM Cloud Pak for Data

Channel names for Anaconda Repository for IBM Cloud Pak for Data don't support double-byte characters

When you create a channel in Anaconda Team Edition, you can't use double-byte characters or most special characters. You can use only these characters: a-z 0-9 - _
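
For example, a quick local check of a candidate channel name against this restriction (a sketch only, not part of the Anaconda tooling; the function name is illustrative):

import re

# Channel names may contain only lowercase letters, digits, hyphens, and underscores
VALID_CHANNEL_NAME = re.compile(r"^[a-z0-9_-]+$")

def is_valid_channel_name(name: str) -> bool:
    return bool(VALID_CHANNEL_NAME.match(name))

print(is_valid_channel_name("my-channel_01"))   # True
print(is_valid_channel_name("チャンネル"))        # False: double-byte characters are not allowed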

Known issues for Data Refinery

Data protection rules do not always mask data in Data Refinery visualizations

If you set up data protection rules for an asset, those rules are not always enforced. As a result, in some circumstances the data can be seen in Data Refinery Visualizations charts.

"View asset activity" does not work on a shaped data asset

When you run a job for a Data Refinery flow, by default the shaped data asset, source-file-name_shaped.csv, is added to your project assets. If you open the asset from the All assets page, the View asset activity action does not work.

Applies to: 4.6.4 and later

Restriction for refining data from an Exasol connection

You cannot refine data from an Exasol data source if the table name includes spaces or special characters.

Applies to: 4.6.0

Fixed in: 4.6.1

Tokenize GUI operation might not work on large data assets

Data Refinery flow jobs that include the Tokenize GUI operation might fail for large data assets.

Applies to: 4.6.0 and later

Duplicate connections in a space resulting from promoting a Data Refinery flow to a space

When you promote a Data Refinery flow to a space, all dependent data is promoted as well. If the Data Refinery flow that is being promoted has a dependent connection asset and a dependent connected data asset that references the same connection asset, the connection asset will be duplicated in the space.

The Data Refinery flow will still work. Do not delete the duplicate connections.

Applies to: 4.6.0 and later

Personal credentials are not supported for connected data assets in Data Refinery

If you create a connected data asset with personal credentials, other users must use the following workaround to use the connected data asset in Data Refinery.

Workaround:

  1. Go to the project page, and click the link for the connected data asset to open the preview.
  2. Enter credentials.
  3. Open Data Refinery and use the authenticated connected data asset for a source or target.

Applies to: 4.6.0 and later

Known issues for Federated Learning

Authentication failures for Federated Learning training jobs when allowed IPs are specified in the Remote Training System

Currently, the OpenShift Ingress Controller does not set the X-Forwarded-For header with the client's IP address, regardless of the forwardedHeaderPolicy setting. This causes authentication failures for Federated Learning training jobs when allowed_ips are specified in the Remote Training System, even though the client IP address is correct.

To use the Federated Learning Remote Training System IP restriction feature in Cloud Pak for Data 4.0.3, configure an external proxy to inject the X-Forwarded-For header. For more information, see this article on configuring ingress.

Applies to: 4.6.0 and later

Federated Learning fails to create a job

Your Federated Learning job might fail with a message that contains "WML API response 'attachments'".

The issue might result from using an older model with a newer experiment in projects that use Git storage. Review and update your model to conform to the latest specification. See the most recent Frameworks, fusion methods, and Python versions.

Applies to: 4.6.0 and later

Unsupported software spec from upgrading might cause experiment to fail

After upgrading to Cloud Pak for Data 4.6, rerunning or reconfiguring a Federated Learning experiment from Cloud Pak for Data 3.5 that has unsupported software specifications might fail. In that case, create a new Federated Learning experiment that uses supported software specifications. For more details, see Frameworks, fusion methods, and Python versions.

Applies to: 4.6.0 and later

Known issues for Hadoop integration

Support for Spark versions

  • Apache Spark 3.1 for Power is not supported.

    Applies to: 4.6.0 and later

  • To run Jupyter Enterprise Gateway (JEG) on Cloud Pak for Data 4.6.3, you must run the following commands as the first cell after the kernel starts:

    from pyspark.sql import SparkSession
    from pyspark import SparkContext
    spark = SparkSession.builder.getOrCreate()
    sc = SparkContext.getOrCreate()
    

    Applies to: 4.6.3

Failure to connect to Impala via Execution Engine for Hadoop

On CDP version 7.1.8, the JDBC client fails and you receive the following SQL error message when you try to connect to Impala via Execution Engine for Hadoop:

SQL error: [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: Socket is closed by peer. ExecuteStatement for query "SHOW DATABASES".

Workaround: Set the property -idle_client_poll_period_s to 0 and restart Impala:

  1. Go to Cloudera Manager.
  2. From the home page, click the Status tab.
  3. Select Impala.
  4. Click the Configuration tab.
  5. In the Impala Command Line Argument Advanced Configuration Snippet (impalad_cmd_args_safety_valve), add the property: -idle_client_poll_period_s=0.
  6. Restart Impala.

Known issues for jobs

Scheduled jobs don't run after upgrading from Cloud Pak for Data 4.0.9

After you have upgraded from Cloud Pak for Data 4.0.9 to 4.5, the schedules in your existing jobs will not run. The last recorded runs for those jobs are from before you upgraded.

Workaround

To run your existing jobs on a schedule again:

  • Either delete the existing schedule and add a new one to the job.
  • Or change the existing schedule in some way, for example by changing the start date.

Excluding days when scheduling a job causes unexpected results

If you schedule a job to run every day of the week while excluding given days, you might notice that the scheduled job does not run as expected. The reason might be a discrepancy between the time zone of the user who creates the schedule and the time zone of the master node where the job runs.

This issue exists only if you exclude days of the week when you schedule a job.

Can't delete notebook job stuck in starting or running state

If a notebook job is stuck in the starting or running state and won't stop, even though you canceled the job and stopped the active environment runtime, you can try deleting the job by removing the job-run asset manually through the API, as shown in the following steps and in the Python sketch after them.

  1. Retrieve a bearer token from the user management service using an API call:

    curl -k -X POST https://PLATFORM_CLUSTER_URL/icp4d-api/v1/authorize -H 'cache-control: no-cache' -H 'content-type: application/json' -d '{"username":"your_username","password":"your_password"}'
    
  2. (Optional) Get the job-run asset and test the API call. Replace ${token}, ${asset_id}, and ${project_id} accordingly.

    curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X GET "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
    
  3. Delete the job-run asset. Again replace ${token}, ${asset_id}, and ${project_id} accordingly.

    curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X DELETE "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
    
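The same sequence can also be scripted, for example with the Python requests library (a sketch only; the cluster URL, credentials, and IDs are placeholders, and the token field in the authorize response is assumed to be named token):

import requests

PLATFORM_CLUSTER_URL = "https://<platform-cluster-url>"
asset_id = "<job-run-asset-id>"
project_id = "<project-id>"

# 1. Retrieve a bearer token (equivalent of the curl -k call in step 1)
auth = requests.post(
    f"{PLATFORM_CLUSTER_URL}/icp4d-api/v1/authorize",
    json={"username": "your_username", "password": "your_password"},
    verify=False,
)
token = auth.json()["token"]
headers = {"Authorization": f"Bearer {token}", "accept": "application/json"}

# 2. (Optional) Check that the job-run asset exists
requests.get(f"{PLATFORM_CLUSTER_URL}/v2/assets/{asset_id}",
             params={"project_id": project_id}, headers=headers, verify=False)

# 3. Delete the job-run asset
response = requests.delete(f"{PLATFORM_CLUSTER_URL}/v2/assets/{asset_id}",
                           params={"project_id": project_id}, headers=headers, verify=False)
print(response.status_code)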

Can't change the schedule in existing jobs after upgrading to Cloud Pak for Data 4.0.7

If you created scheduled jobs in earlier versions of Cloud Pak for Data and are upgrading from a version before Cloud Pak for Data 4.0.7, you can't change or remove the schedule from these existing jobs.

Workaround

If you need to change the schedule in an existing job after upgrading from a version earlier than Cloud Pak for Data 4.0.7:

  1. Delete the existing job.
  2. Create a new scheduled job.

For details, see Creating and managing jobs in a project.

Known issues for notebooks

Failure to export a notebook to HTML in the Jupyter Notebook editor

When you are working with a Jupyter Notebook created in a tool other than Watson Studio, you might not be able to export the notebook to HTML. This issue occurs when the cell output is exposed.

Workaround

  1. In the Jupyter Notebook UI, go to Edit and click Edit Notebook Metadata.

  2. Remove the following metadata:

    "widgets": {
       "state": {},
       "version": "1.1.2"
    }
    
  3. Click Edit.

  4. Save the notebook.
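
Alternatively, if the notebook file is available on disk, the same metadata can be removed programmatically (a sketch only; the file name is a placeholder):

import json

path = "my_notebook.ipynb"

with open(path) as f:
    nb = json.load(f)

# Remove the widget state metadata that blocks the HTML export
nb.get("metadata", {}).pop("widgets", None)

with open(path, "w") as f:
    json.dump(nb, f, indent=1)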

Passing the value "None" as the schema argument in the "do_put" PyArrow library method stops the kernel

When you run the FlightClient do_put method and you pass the value "None" as the schema argument, the kernel crashes.

Workaround

Ensure that a valid value of type "Schema" is passed as the schema argument to the FlightClient do_put method. The "None" value should not be used for the schema argument or any other required argument.

For example, do not use:

schema = None
flight_client.do_put(flight_descriptor, schema)
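
In contrast, a minimal sketch of a valid call, assuming flight_client and flight_descriptor are already defined (the column names are illustrative):

import pyarrow as pa

# Build a valid Schema instead of passing None
schema = pa.schema([("id", pa.int64()), ("name", pa.string())])
writer, metadata_reader = flight_client.do_put(flight_descriptor, schema)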

Ignore fontconfig errors when importing matplotlib.pyplot to a notebook

The first time you import matplotlib.pyplot into a notebook, you might see fontconfig-related errors, such as "invalid doctype fontconfig", for some font types. You can ignore these errors.

Applies to: 4.6.0 and later

Fixed in: 4.6.3

Jupyter notebooks and JupyterLab freeze when using mamba to install a larger package

If you use the !mamba install <package-name> command directly in a Jupyter notebook or in JupyterLab, and the package size is large, the notebook or JupyterLab will freeze.

Workaround

Instead of using ! to install mamba packages, use the command:

%system mamba install -c conda-forge vaex

or use the quiet flag:

!mamba install -q -c conda-forge vaex

Applies to: 4.6.0

Fixed in: 4.6.1

Error stack trace is missing first line after the "Insert to code" function fails in a Spark Scala notebook

When the "Insert to code" function fails in a Spark Scala notebook, the first line of the error stack trace might be missing.

Applies to: 4.6.0

Fixed in: 4.6.1

Insert to code fails when the Flight service load is very high

When working with the "Insert to code" function in a notebook after upgrading from Cloud Pak for Data 4.5, you might see an error stating that the Flight service is unavailable because the server concurrency limit was reached. This error occurs because of the overall high load on the Flight service, which cannot process any further requests. It does not mean that there is a problem with the code in your notebook.

Workaround

If you see the error when running a notebook interactively, try running the cell again. If you see the error in the log of a job, try running the job again, if possible at a time when the system is less busy.

Applies to: 4.6.0

Fixed in: 4.6.1

No access to MS-SQL assets when using the deprecated Insert to code

When you try to access an MS-SQL connected asset by using the "Insert to code" function with the deprecated Pandas dataframe code, you might get a Login failed error. This happens in environments that use JDBC to access MS-SQL connected assets that have Active Directory set up.

Workaround

Use the non-deprecated version of the Pandas dataframe insertion code or add this snippet to the generated code that fails:

encrypt=false;integratedSecurity=true;authenticationScheme=ntlm

For example:

MicrosoftSQL_connection <- dbConnect(drv,
    paste("jdbc:sqlserver://", MicrosoftSQL_credentials[][["host"]], ":", MicrosoftSQL_credentials[][["port"]], ";databaseName=", MicrosoftSQL_credentials[][["database"]],";encrypt=false;integratedSecurity=true;authenticationScheme=ntlm", sep=""),
    MicrosoftSQL_credentials[][["username"]],
    MicrosoftSQL_credentials[][["password"]])

Applies to: 4.6.x

Can't authenticate when adding data from a locked connection by using the Insert to code function

When you use the Insert to code feature to add data from a locked connection to your notebook, the authentication fields are not displayed properly.

Workaround

Open the connection in the project UI and add your credentials there. For details on how to do that, refer to Adding connections to projects.

Applies to: 4.6.4, 4.6.5

Error when trying to access data in an Oracle database

If you try to access data in an Oracle database, you might get a DatabaseError if the schema or table name contains special characters, such as the period (.) character. Oracle uses periods as separators between schemas, tables, and columns. If this issue occurs, consider removing any periods from the table or schema name of your database, or adapt your code to surround the table name or schema identifier with double quotes, for example my_schema."table.with.dots".
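
For example, a sketch of quoting such an identifier in a query string (the schema and table names are illustrative):

# Quoting keeps Oracle from treating the periods as schema/table/column separators
table = 'my_schema."table.with.dots"'
query = f'SELECT * FROM {table}'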

Known issues for projects

Importing connected data assets into projects with Git integration fails with error

You receive an Error adding connected data message when you attempt to import connected data assets into projects with Git integration.

Fixed in: This error no longer appears in Cloud Pak for Data version 4.6.4 and later. Projects with Git integration do not support connected folder assets.

Collaborators added through user groups do not receive notifications in their personal inbox

When a notification is sent to a project, collaborators who were added through user groups do not receive the notification in their personal inbox.

Applies to: 4.6.0

Fixed in: 4.6.3

Downloading a data asset from a Cloud Object Storage connection can result in a timeout

Downloading a data asset from a Cloud Object Storage connection times out if the source connection was created without specifying a bucket, secret key, and access key.

Workarounds

  • Create a new connection, or edit the existing connection, to include a secret key and access key by using the credentials drop-down list.
  • Create a new connection where the bucket is specified, and import the file that you want to download from there.

Applies to: 4.6.0 and later

Importing with an incorrect password succeeds with falsely decrypted properties and no error message

If the exported file that you select to import was encrypted, you must enter the password that was used for encryption to enable decrypting sensitive connection properties.

If you enter an incorrect password when you import a local file, no error message is displayed and the file imports successfully, but the sensitive connection properties are decrypted incorrectly.

Applies to: 4.6.0 and later

Can't use imported platform connections in a space created from a project using Git archive from a different cluster

If you export assets from a project with default Git integration by creating a Git archive file (a ZIP file) and then create a deployment space by importing this ZIP file, the space is created successfully. However, if the project contains platform connections that were imported from a different cluster, those connections fail to import.

Workaround

Recreate the platform connections as local connections in your project.

Option to log all project activities is enabled but project logs don't contain activities and return an empty list

Although Log all project activities is enabled, the project logs don't contain activities and return an empty list.

Workaround: If the project logs are still empty after 30 minutes or more, restart the rabbitmq pods by completing the following steps:

  1. Search for all the pods of the rabbitmq stateful set by running oc get pods | grep rabbitmq-ha.

    This will return 3 pods:

    [root@api.xen-ss-ocs-408-409.cp.fyre.ibm.com ~]# oc get pods | grep rabbitmq-ha
    rabbitmq-ha-0                                                     1/1     Running     0             4d6h
    rabbitmq-ha-1                                                     1/1     Running     0             4d6h
    rabbitmq-ha-2                                                     1/1     Running     0             4d7h
    
  2. Restart each pod by running oc delete pod rabbitmq-ha-0 rabbitmq-ha-1 rabbitmq-ha-2.

Applies to: 4.6.0 and later

Known issues for data visualizations

Masked data is not supported in data visualizations

Masked data is not supported in data visualizations. If you attempt to work with masked data while generating a chart on the Visualizations tab of a data asset in a project, you receive the following error message: Bad Request: Failed to retrieve data from server. Masked data is not supported.

Applies to: 4.6.4

Known issues for Watson Machine Learning

The Flight service returns "Received RST_STREAM with error code 3" when reading large datasets

If you use the Flight service and pyarrow to read large datasets in an AutoAI experiment in a notebook, the Flight service might return the following message:

Received RST_STREAM with error code 3

When this error occurs, the AutoAI experiment receives incomplete data, which can affect training of the model candidate pipelines.

If this error occurs, add the following code to your notebook:

import os

os.environ['GRPC_EXPERIMENTAL_AUTOFLOWCONTROL'] = 'false'

Then, re-run the experiment.

Applies to: 4.6 and later

Predictions API in Watson Machine Learning service can timeout too soon

If the predictions API (POST /ml/v4/deployments/{deployment_id}/predictions) in the Watson Machine Learning deployment service is timing out too soon, follow these steps to manually update the timeout interval.

  1. Update the API timeout parameter in Watson Machine Learning CR:

    oc patch wmlbase wml-cr -p '{"spec":{"wml_api_timeout": <REQUIRED_TIMEOUT_IN_SECONDS>, "wml_envoy_pods": 1}}'  --type=merge -n <NAMESPACE>
    

    For example, to update the timeout to 600 seconds:

    oc patch wmlbase wml-cr -p '{"spec":{"wml_api_timeout": 600, "wml_envoy_pods": 1}}'  --type=merge -n zen
    

    Note: If you need to support a higher throughput of Watson Machine Learning prediction API requests, you can increase the number of Watson Machine Learning envoy pods by using the wml_envoy_pods parameter in the previous command. One envoy pod can support up to 1500 requests per second.

  2. Restart the NGINX pods:

    oc rollout restart deployment ibm-nginx
    
  3. Check that the NGINX pods have come up:

    oc get pods | grep "ibm-nginx"
    

Applies to: 4.6.0 and later

Decision Optimization deployment job fails with error: "Add deployment failed with deployment not finished within time"

If your Decision Optimization deployment job fails with the following error, complete the following steps to extend the time-out window.

"status": {
     "completed_at": "2022-09-02T02:35:31.711Z",
     "failure": {
         "trace": "0c4c4308935a3c4f2d9987b22139c61c",
         "errors": [{
              "code": "add_deployment_failed_in_runtime",
              "message": "Add deployment failed with deployment not finished within time"
         }]
     },
     "state": "failed"
   }

To update the DO deployment timeout in the deployment manager:

  1. Edit the wmlbase wml-cr and add this line: ignoreForMaintenance: true. This setting puts the WML operator into maintenance mode, which stops automatic reconciliation. Otherwise, automatic reconciliation would undo any configmap changes that you apply.

    oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n <namespace>
    

    For example:

    oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n zen
    
  2. Capture the contents of the wmlruntimemanager configmap in a YAML file.

    oc get cm wmlruntimemanager -n <namespace> -o yaml > wmlruntimemanager.yaml
    

    For example:

    oc get cm wmlruntimemanager -n zen -o yaml > wmlruntimemanager.yaml
    
  3. Create a backup of the wmlruntimemanager YAML file.

    cp wmlruntimemanager.yaml wmlruntimemanager.yaml.bkp
    
  4. Open the wmlruntimemanager.yaml.

    vi wmlruntimemanager.yaml
    
  5. Navigate to the runtimeManager.conf file and search for the service property.

    Release 4.6.0:

    Add the relevant section. For example, add a jobs section after the section service:

    service {

        // Add the "jobs" section after the section "service"
        jobs {
            do {
                check_deployment_status {
                    retry_count = 420
                    retry_delay = 1000  // In milliseconds
                }
            }
        }

        // ... the rest of the existing "service" block remains unchanged ...
    }

    Release 4.6.1 and later:

    Increase the number of retries in the retry_count field to extend the timeout window:

    service {

        jobs {

            do {
                check_deployment_status {
                    retry_count = 420   // Increase the number of retries to extend the timeout window
                    retry_delay = 1000
                }
            }
        }

        // ... the rest of the existing "service" block remains unchanged ...
    }

    Where:

    • Field retry_count = Number of retries
    • Field retry_delay = Delay between each retry in milliseconds

    In the example, the timeout is configured as 7 minutes (retry_count * retry_delay = 420 * 1000 milliseconds = 420 seconds = 7 minutes). If you want to increase the timeout further, increase the number of retries in the retry_count field.

  6. Apply the deployment manager configmap changes:

    oc delete -f wmlruntimemanager.yaml
    oc create -f wmlruntimemanager.yaml
    
    
  7. Restart the deployment manager pods:

    oc get pods -n <namespace> | grep wml-deployment-manager
    
    oc delete pod <podname> -n <namespace>
    
    
  8. Wait for the deployment manager pod to come up:

    oc get pods -n <namespace> | grep wml-deployment-manager
    

Note: If you plan to upgrade the Cloud Pak for Data cluster, you must bring the WML operator out of maintenance mode by setting the field ignoreForMaintenance to false in wml-cr.

Applies to: 4.6 and later

Online deployment of a custom library-based model fails with error: deployment id <deployment_id> already in use

If your online deployment of a custom library-based model fails with the error deployment id <deployment_id> already in use, complete these steps to extend the time-out window:

  1. Edit the wmlbase wml-cr and add this line: ignoreForMaintenance: true. This setting puts the WML operator into maintenance mode, which stops automatic reconciliation. Otherwise, automatic reconciliation would undo any configmap changes that you apply.

    oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n <namespace>
    

    For example:

    oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n zen
    
  2. Capture the contents of the wmlruntimesidecar configmap in a YAML file.

    oc get cm wmlruntimesidecar -n <namespace> -o yaml > wmlruntimesidecar.yaml
    
  3. Create a backup of the wmlruntimesidecar YAML file.

    cp wmlruntimesidecar.yaml wmlruntimesidecar.yaml.bkp
    
  4. In the wmlruntimesidecar.yaml file, search for the service property and increase the write_timeout value.

    [service]
    port = "16500"
    read_timeout = 120
    read_header_timeout = 120
    write_timeout = 7200     // increase the write_timeout value
    
  5. In the wmlruntimesidecar.yaml file, search for the http_client.localhost property and increase the time_out value.

    [http_client.localhost]
    max_idle_conns = 100
    max_idle_conns_per_host = 100
    max_conns_per_host = 100
    idle_conn_timeout = 10
    time_out = 7200   // increase the time_out value
    
  6. Apply the sidecar configmap changes:

    oc delete -f wmlruntimesidecar.yaml
    oc create -f wmlruntimesidecar.yaml
    
  7. Restart the deployment manager pods:

    oc get pods -n <namespace> | grep wml-deployment-manager
    
    oc delete pod <podname> -n <namespace>
    
    
  8. Wait for the deployment manager pod to come up:

    oc get pods -n <namespace> | grep wml-deployment-manager
    

Applies to: 4.6.0 and 4.6.1

Previewing masked data assets is blocked in deployment space

A data asset preview might fail with this message: This asset contains masked data and is not supported for preview in the Deployment Space. Deployment spaces currently don't support masking data, so the preview for masked assets is blocked to prevent data leaks.

Applies to: 4.6 and later

Search input must be all lowercase when adding collaborators from the Deployments page

If you are adding collaborators from the Deployments page of a space, you must enter the search input in all lowercase, or the search will fail.

Applies to: 4.7 and later

Limitations

These limitations apply to Watson Studio and the services that require Watson Studio.

Limitations for assets

Can't load CSV files larger than 20 GB to projects

You can't load a CSV file that is larger than 20 GB to a project in Cloud Pak for Data.

Limitations for previews of assets

You can't see previews of these types of assets:

  • Folder assets that are associated with a connection that uses personal credentials. You are prompted to enter your personal credentials to start the preview or profiling of the connection asset.
  • Connected data assets for image files in projects.
  • Connected data assets for text and JSON files with shared credentials, which are incorrectly displayed in a grid.
  • Connected data assets for PDF files in projects.

Limitations for Hadoop integration

The Livy service does not restart when a cluster is rebooted

The Livy service does not automatically restart after a system reboot if the HDFS Namenode is not in an active state.

Workaround: Restart the Livy service.

Limitations for jobs

Job run has wrong environment variable values if special characters are used

Environment variables defined in the job configuration are not passed correctly to the job runs if the variable values contain special characters. This might lead to job run failures, or the incorrect behavior of job runs. To resolve the problem, see Job run has wrong environment variable values if special characters are used.

Job runs fail after environments are deleted or Cloud Pak for Data has been upgraded

Job runs in deployment spaces or projects fail if the job is using an environment that has been deleted or is no longer supported after a Cloud Pak for Data version upgrade. To get the job running again, edit the job to point to an alternative environment.

To prevent job runs from failing due to an upgrade, you can use either of the following methods:

Migrate your environments before upgrading Cloud Pak for Data. For details, see:

Limitations for projects

Unable to sync deprecated Git projects when all assets have been deleted

If you delete all assets from a deprecated Git project, the project can no longer sync with the Git repository.

Workaround: Retain at least one asset in the deprecated Git project.

Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration

You shouldn't use the Git repository from a project with deprecated Git integration in a project with default Git integration because this can result in an error. For example, in Bitbucket, you will see an error stating that the repository contains content from a deprecated Git project, although the selected branch contains default Git project content.

In a project with default Git integration, you can either use a new clean Git repository or link to one that was used in a project with default Git integration.

Import of a project larger than 1 GB in Watson Studio fails

If you create an empty project in Watson Studio and then try to import a project that is larger than 1 GB in size, the operation might fail depending on the size and compute power of the Cloud Pak for Data cluster.

Export of a large project in Watson Studio fails with a time-out

If you are trying to export a project with a large number of assets (for example, more than 7000), the export process can time-out and fail. In that case, although you could export assets in subsets, the recommended solution is to export using the APIs available from the CPDCTL command line interface tool.

Scheduling jobs is unsupported in Git-based projects

In Git-based projects, you must run all jobs manually. Job scheduling is not supported.

Can't include a Cognos dashboard when exporting a project to desktop

Currently, you cannot select a Cognos dashboard when you export a project to desktop.

Workaround:

Although you cannot add a dashboard to your project export, you can move a dashboard from one project to another.

To move a dashboard to another project:

  1. Download the dashboard JSON file from the original project.
  2. Export the original project to desktop by clicking the Export to desktop icon on the project toolbar.
  3. Create a new project by importing the project ZIP file with the required data sources.
  4. Create a new dashboard by clicking the From file tab and adding the JSON file that you downloaded from the original project.
  5. A dialog box opens and asks whether you want to re-link each of your data sources. Click the re-link button and select the asset in the new project that corresponds to each data source.

Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster

If your project is associated with a Git repository that was used in a project in another cluster and contains connections that require a JDBC driver, the connections will not work in your project. If you upload the required JDBC JAR file, you will see an error stating that the JDBC driver could not be initialized.

This error is caused by the JDBC JAR file that is added to the connection as a presigned URI. This URI is not valid in a project in another cluster. The JAR file can no longer be located even if it exists in the cluster, and the connection will not work.

Workaround

To use any of these connections, you need to create new connections in the project. The following connections require a JDBC driver and are affected by this error situation:

  • Db2 for i
  • Db2 for z/OS
  • Generic JDBC
  • Hive via Execution Engine for Apache Hadoop
  • Impala via Execution Engine for Apache Hadoop
  • SAP HANA
  • Exasol

Limitations for Watson Machine Learning

Restrictions for IBM Z and IBM LinuxONE users

When Cloud Pak for Data is installed on the IBM Z and LinuxONE platforms, Watson Studio and Watson Machine Learning users will not be able to use, run, or deploy the following types of assets:

  • Data processed using Data Refinery
  • Assets trained using AutoAI, Federated Learning, Decision Optimization, SPSS Modeler, Watson Machine Learning, or Hadoop
  • Assets trained using RStudio, such as RShiny apps or assets based on the R framework
  • Assets based on these runtimes: Spark, Python 3.7, or ONNX 1.8.1
  • Deep Learning assets built with TensorFlow or PyTorch 1.8.0 frameworks

Additionally, note the following:

  • Attempts to use, train, or deploy unsupported assets on Cloud Pak for Data running on an IBM Z or LinuxONE platform fail with an error.
  • Backup and restore is not currently available on the IBM Z and LinuxONE platforms.
  • With the default runtimes, models trained on other platforms and deployed on IBM Z and LinuxONE might not work as expected. A potential solution is to deploy the model on a custom Python runtime.
  • The Insert to code function on IBM Z can cause kernel failures.

Applies to: 4.6.0 and later

Deploying a model on an s390x cluster might require retraining

Training an AI model on a different platform, such as x86 or ppc, and deploying the model on s390x by using Watson Machine Learning might fail because of an endianness issue. In such cases, retrain and deploy the AI model on the s390x platform to resolve the problem.

Applies to: 4.6.0 and later

Limits on size of model deployments

Limits on the size of models you deploy with Watson Machine Learning depend on factors such as the model framework and type. In some instances, when you exceed a threshold, you will be notified with an error when you try to store a model in the Watson Machine Learning repository, for example: OverflowError: string longer than 2147483647 bytes. In other cases, the failure might be indicated by a more general error message, such as The service is experiencing some downstream errors, please re-try the request or There's no available attachment for the targeted asset. Any of these results indicate that you have exceeded the allowable size limits for that type of deployment.

Applies to: 4.6.0 and later

Maximum number of feature columns in AutoAI experiments

The maximum number of feature columns for a classification or regression experiment is 5000.

No support for Cloud Pak for Data authentication with storage volume connection

You cannot use a storage volume connection with the 'Cloud Pak for Data authentication' option enabled as a data source in an AutoAI experiment. AutoAI does not currently support the user authentication token. Instead, disable the 'Cloud Pak for Data authentication' option in the storage volume connection to use the connection as a data source in an AutoAI experiment.

Applies to: 4.6.5 and later

Automatic mounting of storage volumes not supported by online and batch deployments

You cannot use automatic mounts for storage volumes with Watson Machine Learning online and batch deployments. Watson Machine Learning does not support this feature for Python-based runtimes, or for R-script, SPSS Modeler, Spark, and Decision Optimization. You can use automatic mounts for storage volumes only with Watson Machine Learning Shiny app deployments and notebook runtimes.

As a workaround, you can use the download method from the Data assets library, which is part of the ibm-watson-machine-learning Python client.
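
For example, a minimal sketch of that workaround (all credential values and IDs are placeholders):

from ibm_watson_machine_learning import APIClient

client = APIClient({
    "url": "https://<cluster-url>",
    "username": "<username>",
    "password": "<password>",
    "instance_id": "openshift",
    "version": "4.6",
})
client.set.default_space("<space-id>")

# Download the data asset to a local file in the runtime instead of relying on an automatic volume mount
client.data_assets.download("<data-asset-id>", "training_data.csv")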

Applies to: 4.6 and later

AutoAI time series notebook error requires update of import-tracker library

Following an upgrade, running a pipeline notebook for an AutoAI time series forecasting experiment might result in an error loading the import-tracker library. Do one of the following to resolve the error:

  • Run the cell twice to dismiss the error.
  • Update the import-tracker library by adding !pip install -U import-tracker to a cell at the beginning of the notebook.

Applies to: 4.6.0 and later

Batch deployments that use large data volumes as input might fail

Applies to: 4.6.0 and later

If you are scoring a batch job that uses a large volume of data as the input source, the job might fail because of internal timeout settings. A symptom of this problem might be an error message similar to the following example:

Incorrect input data: Flight returned internal error, with message: CDICO9999E: Internal error occurred: Snowflake sQL logged error: JDBC driver internal error: Timeout waiting for the download of #chunk49(Total chunks: 186) retry=0.

If the timeout occurs when you score your batch deployment, you must configure the data source query level timeout limitation to handle long-running jobs.

Query-level timeout information for data sources is as follows:

Information about query-level time limitation for data sources

| Data source | Query-level time limitation | Default time limit | Modify the default time limit |
| --- | --- | --- | --- |
| Apache Cassandra | Yes | 10 seconds | Set the read_timeout_in_ms and write_timeout_in_ms parameters in the Apache Cassandra configuration file or in the Apache Cassandra connection URL to change the default time limit. |
| Cloud Object Storage | No | N/A | N/A |
| Db2 | Yes | N/A | Set the QueryTimeout parameter to specify the amount of time (in seconds) that a client waits for a query execution to complete before the client attempts to cancel the execution and return control to the application. |
| Hive via Execution Engine for Hadoop | Yes | 60 minutes (3600 seconds) | Set the hive.session.query.timeout property in the connection URL to change the default time limit. |
| Microsoft SQL Server | Yes | 30 seconds | Set the QUERY_TIMEOUT server configuration option to change the default time limit. |
| MongoDB | Yes | 30 seconds | Set the maxTimeMS parameter in the query options to change the default time limit. |
| MySQL | Yes | 0 seconds (no default time limit) | Set the timeout property in the connection URL or in the JDBC driver properties to specify a time limit for your query. |
| Oracle | Yes | 30 seconds | Set the QUERY_TIMEOUT parameter in the Oracle JDBC driver to specify the maximum amount of time that a query can run before it is automatically canceled. |
| PostgreSQL | No | N/A | Set the queryTimeout property to specify the maximum amount of time that a query can run. The default value of the queryTimeout property is 0. |
| Snowflake | Yes | 6 hours | Set the queryTimeout parameter to change the default time limit. |

To prevent your batch deployments from failing, partition your data set or decrease its size.

Batch deployment jobs that use large inline payload might get stuck in starting or running state

Applies to: 4.6.0 and later

If you provide a large asynchronous payload for your inline batch deployment, the runtime manager process can run out of heap memory.

In the following example, 92 MB of payload was passed inline to the batch deployment, which caused the heap to run out of memory.

Uncaught error from thread [scoring-runtime-manager-akka.scoring-jobs-dispatcher-35] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[scoring-runtime-manager]
java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
	at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
	at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:538)
	at java.base/java.lang.StringBuilder.append(StringBuilder.java:174)
   ...

This can result in concurrent jobs getting stuck in the starting or running state. The starting state can be cleared only after the deployment is deleted and a new deployment is created. The running state can be cleared without deleting the deployment.

As a workaround, use data references instead of inline payloads when you provide large inputs to batch deployments.
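
For example, a rough sketch of submitting a batch scoring job with a data asset reference instead of an inline payload (all credential values and IDs are placeholders, and the exact reference format depends on your data source type):

from ibm_watson_machine_learning import APIClient

client = APIClient({
    "url": "https://<cluster-url>",
    "username": "<username>",
    "password": "<password>",
    "instance_id": "openshift",
    "version": "4.6",
})
client.set.default_space("<space-id>")

# Reference the input data asset instead of embedding the payload inline
job_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
        "type": "data_asset",
        "location": {"href": "/v2/assets/<input-asset-id>?space_id=<space-id>"},
    }],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "location": {"name": "batch_output.csv"},
    },
}

job = client.deployments.create_job("<deployment-id>", meta_props=job_payload)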

Parent topic: Limitations and known issues in IBM Cloud Pak for Data