Known issues and limitations for Watson Studio and supplemental services
These known issues and limitations apply to Watson Studio and the services that require Watson Studio.
Known issues
- Known issues for Anaconda Repository for IBM Cloud Pak for Data
- Known issues for Data Refinery
  - Data protection rules do not always mask data in Data Refinery visualizations
  - "View asset activity" does not work on a shaped data asset
  - Restriction for refining data from an Exasol connection
  - Tokenize GUI operation might not work on large data assets
  - Duplicate connections in a space resulting from promoting a Data Refinery flow to a space
  - Personal credentials are not supported for connected data assets in Data Refinery
- Known issues for Federated Learning
- Known issues for Hadoop integration
- Known issues for jobs
- Known issues for notebooks
  - Failure to export a notebook to HTML in the Jupyter Notebook editor
  - Passing the value "None" as the schema argument in the "do_put" PyArrow library method stops the kernel
  - Ignore fontconfig errors when importing matplotlib.pyplot to a notebook
  - Jupyter notebooks and JupyterLab freeze when using mamba to install a larger package
  - Error stack trace is missing first line after the "Insert to code" function fails in a Spark Scala notebook
  - Insert to code fails when the Flight service load is very high
  - No access to MS-SQL assets when using the deprecated Insert to code
  - Can't authenticate when adding data from a locked connection by using the Insert to code function
  - Error when trying to access data in an Oracle database
- Known issues for projects
  - Importing connected data assets into projects with Git integration fails with error
  - Collaborators added via users group do not receive notifications in personal inbox
  - Downloading a data asset from a Cloud Object Storage connection can result in a timeout
  - Incorrect password imports project successfully with falsely decrypted properties, and error message is not received
  - Can't use imported platform connections in a space created from a project using Git archive from a different cluster
  - Option to log all project activities is enabled but project logs don't contain activities and return an empty list
- Known issues for data visualizations
- Known issues for Watson Machine Learning
  - The Flight service returns "Received RST_STREAM with error code 3" when reading large datasets
  - Predictions API in Watson Machine Learning service can timeout too soon
  - Decision Optimization deployment job fails with error: "Add deployment failed with deployment not finished within time"
  - Online deployment pertaining to a custom library based model fails with error: "deployment id <deployment_id> already in use"
  - Previewing masked data assets is blocked in deployment space
  - Search to add collaborators for a deployment requires lowercase
Known issues for Anaconda Repository for IBM Cloud Pak for Data
Channel names for Anaconda Repository for IBM Cloud Pak for Data don't support double-byte characters
When you create a channel in Anaconda Team Edition, you can't use double-byte characters or most special characters. You can use only these characters: a-z 0-9 - _
Known issues for Data Refinery
Data protection rules do not always mask data in Data Refinery visualizations
If you set up data protection rules for an asset, data protection rules are not always enforced. As a result, in some circumstances the data can be seen in Data Refinery Visualizations charts.
"View asset activity" does not work on a shaped data asset
When you run a job for a Data Refinery flow, by default the shaped data asset, source-file-name_shaped.csv, is added to your project assets. If you open the asset from the All assets page, the View asset activity action does not work.
Applies to: 4.6.4 and later
Restriction for refining data from an Exasol connection
You cannot refine data from an Exasol data source if the table name includes spaces or special characters.
Applies to: 4.6.0
Fixed in: 4.6.1
Tokenize GUI operation might not work on large data assets
Data Refinery flow jobs that include the Tokenize GUI operation might fail for large data assets.
Applies to: 4.6.0 and later
Duplicate connections in a space resulting from promoting a Data Refinery flow to a space
When you promote a Data Refinery flow to a space, all dependent data is promoted as well. If the Data Refinery flow that is being promoted has a dependent connection asset and a dependent connected data asset that references the same connection asset, the connection asset will be duplicated in the space.
The Data Refinery flow will still work. Do not delete the duplicate connections.
Applies to: 4.6.0 and later
Personal credentials are not supported for connected data assets in Data Refinery
If you create a connected data asset with personal credentials, other users must use the following workaround in order to use the connected data asset in Data Refinery.
Workaround:
- Go to the project page, and click the link for the connected data asset to open the preview.
- Enter credentials.
- Open Data Refinery and use the authenticated connected data asset for a source or target.
Applies to: 4.6.0 and later
Known issues for Federated Learning
Authentication failures for Federated Learning training jobs when allowed IPs are specified in the Remote Training System
Currently, the OpenShift Ingress Controller does not set the X-Forwarded-For header with the client's IP address, regardless of the forwardedHeaderPolicy setting. This causes authentication failures for Federated Learning training jobs when allowed_ips are specified in the Remote Training System, even though the client IP address is correct.
To use the Federated Learning Remote Training System IP restriction feature in Cloud Pak for Data 4.0.3, configure an external proxy to inject the X-Forwarded-For header. For more information, see this article on configuring ingress.
Applies to: 4.6.0 and later
Federated Learning fails to create FL job
Your Federated Learning job might fail with a message that contains "WML API response 'attachments'".
The issue can result from using an older model with a newer experiment in projects that use Git storage. Review and update your model to conform to the latest specification. See the most recent Frameworks, fusion methods, and Python versions.
Applies to: 4.6.0 and later
Unsupported software spec from upgrading might cause experiment to fail
After upgrading to Cloud Pak for Data 4.6, rerunning or reconfiguring a Federated Learning experiment from Cloud Pak for Data 3.5 with unsupported software specifications may fail. After upgrading to 4.6.x, create a new Federated Learning experiment using supported software specifications. For more details, see Frameworks, fusion methods, and Python versions.
Applies to: 4.6.0 and later
Known issues for Hadoop integration
Support for Spark versions
- Apache Spark 3.1 for Power is not supported.
  Applies to: 4.6.0 and later
- To run Jupyter Enterprise Gateway (JEG) on Cloud Pak for Data 4.6.3, you must run the following commands as the first cell after the kernel starts:
  from pyspark.sql import SparkSession
  from pyspark import SparkContext
  spark = SparkSession.builder.getOrCreate()
  sc = SparkContext.getOrCreate()
  Applies to: 4.6.3
Failure to connect to Impala via Execution Engine for Hadoop
On CDP version 7.1.8, the JDBC client fails and you receive the following SQL error message when you try to connect to Impala via Execution Engine for Hadoop:
SQL error: [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: Socket is closed by peer. ExecuteStatement for query "SHOW DATABASES".
Workaround: Set the -idle_client_poll_period_s property to 0 and restart Impala:
- Go to Cloudera Manager.
- From the home page, click the Status tab.
- Select Impala.
- Click the Configuration tab.
- In the Impala Command Line Argument Advanced Configuration Snippet (impalad_cmd_args_safety_valve), add the property -idle_client_poll_period_s=0.
- Restart Impala.
Known issues for jobs
Scheduled jobs don't run after upgrading from Cloud Pak for Data 4.0.9
After you have upgraded from Cloud Pak for Data 4.0.9 to 4.5, the schedules in your existing jobs will not run. The last recorded runs for those jobs are from before you upgraded.
Workaround
To run your existing jobs on a schedule again:
- Either delete the existing schedule and add a new one to the job.
- Or change the existing schedule in some way, for example by changing the start date.
Excluding days when scheduling a job causes unexpected results
If you select to schedule a job to run every day of the week excluding given days, you might notice that the scheduled job does not run as you would expect. The reason might be due to a discrepancy between the timezone of the user who creates the schedule, and the timezone of the master node where the job runs.
This issue only exists if you exclude days of a week when you schedule to run a job.
Can't delete notebook job stuck in starting or running state
If a notebook job is stuck in the starting or running state and does not stop, even after you cancel the job and stop the active environment runtime, you can delete the job by removing the job-run asset manually through the API. A Python sketch of the same calls follows the steps.
- Retrieve a bearer token from the user management service by using an API call:
  curl -k -X POST https://PLATFORM_CLUSTER_URL/icp4d-api/v1/authorize -H 'cache-control: no-cache' -H 'content-type: application/json' -d '{"username":"your_username","password":"your_password"}'
- (Optional) Get the job-run asset and test the API call. Replace ${token}, ${asset_id}, and ${project_id} accordingly.
  curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X GET "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
- Delete the job-run asset. Again, replace ${token}, ${asset_id}, and ${project_id} accordingly.
  curl -H 'accept: application/json' -H 'Content-Type: application/json' -H "Authorization: Bearer ${token}" -X DELETE "<PLATFORM_CLUSTER_URL>/v2/assets/${asset_id}?project_id=${project_id}"
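The following Python sketch wraps the same three API calls with the requests library. It is a minimal example rather than an official client: PLATFORM_CLUSTER_URL, the user name, password, asset_id, and project_id are placeholders that you must replace, and it assumes the authorize response contains a token field as shown in the curl output.
import requests

PLATFORM_CLUSTER_URL = "https://<PLATFORM_CLUSTER_URL>"  # placeholder
asset_id = "<asset_id>"        # ID of the stuck job-run asset
project_id = "<project_id>"    # ID of the project that contains the job run

# 1. Retrieve a bearer token from the user management service
auth = requests.post(
    f"{PLATFORM_CLUSTER_URL}/icp4d-api/v1/authorize",
    json={"username": "your_username", "password": "your_password"},
    verify=False,  # equivalent of curl -k; use a CA bundle in production
)
token = auth.json()["token"]
headers = {"Authorization": f"Bearer {token}", "accept": "application/json"}

# 2. (Optional) Get the job-run asset to test the API call
get_resp = requests.get(
    f"{PLATFORM_CLUSTER_URL}/v2/assets/{asset_id}",
    params={"project_id": project_id},
    headers=headers,
    verify=False,
)
print(get_resp.status_code)

# 3. Delete the job-run asset
del_resp = requests.delete(
    f"{PLATFORM_CLUSTER_URL}/v2/assets/{asset_id}",
    params={"project_id": project_id},
    headers=headers,
    verify=False,
)
print(del_resp.status_code)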
Can't change the schedule in existing jobs after upgrading to Cloud Pak for Data 4.0.7
If you created scheduled jobs in earlier versions of Cloud Pak for Data and are upgrading from a version before Cloud Pak for Data 4.0.7, you can't change or remove the schedule from these existing jobs.
Workaround
If you need to change the schedule in an existing job after upgrading from a version from before Cloud Pak for Data version 4.0.7:
- Delete the existing job.
- Create a new scheduled job.
For details, see Creating and managing jobs in a project.
Known issues for notebooks
Failure to export a notebook to HTML in the Jupyter Notebook editor
When you are working with a Jupyter Notebook created in a tool other than Watson Studio, you might not be able to export the notebook to HTML. This issue occurs when the cell output is exposed.
Workaround
- In the Jupyter Notebook UI, go to Edit and click Edit Notebook Metadata.
- Remove the following metadata:
  "widgets": { "state": {}, "version": "1.1.2" }
- Click Edit.
- Save the notebook.
Passing the value "None" as the schema argument in the "do_put" PyArrow library method stops the kernel
When you run the FlightClient do_put method and you pass the value "None" as the schema argument, the kernel crashes.
Workaround
Ensure that a valid value of type "Schema" is passed as the schema argument to the FlightClient do_put method. The "None" value should not be used for the schema argument or any other required argument.
For example, do not use:
schema = None
flight_client.do_put(flight_descriptor, schema)
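For contrast, the following sketch passes a real pyarrow Schema object to do_put. It is an illustration only and assumes a hypothetical Flight server at grpc://localhost:8815 and a hypothetical descriptor path; adapt the location, descriptor, and columns to your own Flight setup.
import pyarrow as pa
import pyarrow.flight as flight

# Hypothetical Flight server location; replace with your service endpoint
flight_client = flight.FlightClient("grpc://localhost:8815")

# Build a valid Schema object instead of passing None
schema = pa.schema([("id", pa.int64()), ("name", pa.string())])
flight_descriptor = flight.FlightDescriptor.for_path("example-dataset")

# do_put returns a writer and a metadata reader; write a small table and close the writer
writer, _ = flight_client.do_put(flight_descriptor, schema)
table = pa.table({"id": [1, 2], "name": ["a", "b"]})
writer.write_table(table)
writer.close()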
Ignore fontconfig errors when importing matplotlib.pyplot to a notebook
The first time you import matplotlib.pyplot into a notebook, you might see fontconfig-related errors such as "invalid doctype fontconfig" for some font types. You can ignore these errors.
Applies to: 4.6.0 and later
Fixed in: 4.6.3
Jupyter notebooks and JupyterLab freeze when using mamba to install a larger package
If you use the !mamba install <package-name> command directly in a Jupyter notebook or in JupyterLab, and the package size is large, the notebook or JupyterLab will freeze.
Workaround
Instead of using ! to install mamba packages, use the command:
%system mamba install -c conda-forge vaex
or use the quiet flag:
mamba install -q -c conda-forge vaex
Applies to: 4.6.0
Fixed in: 4.6.1
Error stack trace is missing first line after the "Insert to code" function fails in a Spark Scala notebook
When the "Insert to code" function fails in a Spark Scala notebook, the first line of the error stack trace might be missing.
Applies to: 4.6.0
Fixed in: 4.6.1
Insert to code fails when the Flight service load is very high
When working with the "Insert to code" function in a notebook after upgrading from Cloud Pak for Data 4.5, you might see an error stating that the Flight service is unavailable because the server concurrency limit was reached. This error occurs because of the overall high load on the Flight service, which cannot process any further requests. It does not mean that there is a problem with the code in your notebook.
Workaround
If you see the error when running a notebook interactively, try running the cell again. If you see the error in the log of a job, try running the job again, if possible at a time when the system is less busy.
Applies to: 4.6.0
Fixed in: 4.6.1
No access to MS-SQL assets when using the deprecated Insert to code
When you try to access an MS-SQL connected asset by using the "Insert to code" function with the deprecated Pandas dataframe code, you might get a Login failed error. This happens in environments that use JDBC to access MS-SQL connected assets that have Active Directory set up.
Workaround
Use the non-deprecated version of the Pandas dataframe insertion code or add this snippet to the generated code that fails:
encrypt=false;integratedSecurity=true;authenticationScheme=ntlm
For example:
MicrosoftSQL_connection <- dbConnect(drv,
paste("jdbc:sqlserver://", MicrosoftSQL_credentials[][["host"]], ":", MicrosoftSQL_credentials[][["port"]], ";databaseName=", MicrosoftSQL_credentials[][["database"]],";encrypt=false;integratedSecurity=true;authenticationScheme=ntlm", sep=""),
MicrosoftSQL_credentials[][["username"]],
MicrosoftSQL_credentials[][["password"]])
Applies to: 4.6.x
Can't authenticate when adding data from a locked connection by using the Insert to code function
When you use the Insert to code feature to add data from a locked connection to your notebook, the authentication fields are not displayed properly.
Workaround
Open the connection in the project UI and add your credentials there. For details on how to do that, refer to Adding connections to projects.
Applies to: 4.6.4, 4.6.5
Error when trying to access data in an Oracle database
If you try to access data in an Oracle database, you might get a DatabaseError if the schema or table name contains special characters such as the period (.) character. Oracle uses periods as separators between schemas, tables, and columns. If this issue occurs, consider removing any periods from the table name or schema of your database, or adapt your code to surround the table name or schema identifier with double quotes, for example my_schema."table.with.dots".
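As an illustration of the quoting workaround, the following sketch queries a table whose name contains periods. The cx_Oracle driver, connection details, and identifiers are assumptions for the example; apply the same double quoting to whatever SQL your notebook generates.
import cx_Oracle  # assumption: the notebook connects with the cx_Oracle driver

# Hypothetical connection details; replace with your own
conn = cx_Oracle.connect("user", "password", "host:1521/service_name")
cur = conn.cursor()

# Without quotes, Oracle treats the periods as schema.table.column separators
# and raises a DatabaseError:
#   cur.execute('SELECT * FROM my_schema.table.with.dots')

# Double-quote the identifier so the periods are kept as part of the name
cur.execute('SELECT * FROM my_schema."table.with.dots"')
rows = cur.fetchall()
print(len(rows))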
Known issues for projects
Importing connected data assets into projects with Git integration fails with error
The error message Error adding connected data is received when you attempt to import connected data assets into projects with Git integration.
Fixed in: This error no longer appears in Cloud Pak for Data version 4.6.4 and later. Projects with Git integration do not support connected folder assets.
Collaborators added via users group do not receive notifications in personal inbox
When a notification is sent to a project, users who are collaborators added via users groups will not receive those notifications in their personal inbox.
Applies to: 4.6.0
Fixed in: 4.6.3
Downloading a data asset from a Cloud Object Storage connection can result in a timeout
Downloading a data asset from a Cloud Object Storage connection times out if the source connection was created without specifying a bucket, secret key, and access key.
Workarounds
- Create a new connection or edit the existing connection to include a secret key and access key by using the credentials drop-down.
- Create a new connection where the bucket is specified, and import the file that you want to download from there.
Applies to: 4.6.0 and later
Incorrect password imports project successfully with falsely decrypted properties, and error message is not received
If the exported file that you select to import was encrypted, you must enter the password that was used for encryption to enable decrypting sensitive connection properties.
If you enter the incorrect password to import a local file, an error message is not received and the file imports successfully with falsely decrypted sensitive connection properties.
Applies to: 4.6.0 and later
Can't use imported platform connections in a space created from a project using Git archive from a different cluster
If you export assets from a project with default Git integration by creating a Git archive file (a ZIP file) and then create a deployment space by importing this ZIP file, the space is created successfully. However, if the project contains platform connections imported from a different cluster, these will fail to be imported.
Workaround
Recreate the platform connections as local connections in your project.
Option to log all project activities is enabled but project logs don't contain activities and return an empty list
The Log all project activities option is enabled, but the project logs don't contain activities and return an empty list.
Workaround: If the project logs are empty after 30 minutes or more, restart the rabbitmq pod by completing the following steps:
- Find the rabbitmq pods of the rabbitmq-ha stateful set by running oc get pods | grep rabbitmq-ha. This command returns 3 pods:
  oc get pods | grep rabbitmq-ha
  rabbitmq-ha-0   1/1   Running   0   4d6h
  rabbitmq-ha-1   1/1   Running   0   4d6h
  rabbitmq-ha-2   1/1   Running   0   4d7h
- Restart each pod by running oc delete pod rabbitmq-ha-0 rabbitmq-ha-1 rabbitmq-ha-2.
Applies to: 4.6.0 and later
Known issues for data visualizations
Masked data is not supported in data visualizations
Masked data is not supported in data visualizations. If you attempt to work with masked data while generating a chart on the Visualizations tab of a data asset in a project, the following error message is received: Bad Request: Failed to retrieve data from server. Masked data is not supported.
Applies to: 4.6.4
Known issues for Watson Machine Learning
The Flight service returns "Received RST_STREAM with error code 3" when reading large datasets
If you use the Flight service and pyarrow to read large datasets in an AutoAI experiment in a notebook, the Flight service might return the following message:
Received RST_STREAM with error code 3
When this error occurs, the AutoAI experiment receives incomplete data, which can affect training of the model candidate pipelines.
If this error occurs, add the following code to your notebook:
import os
os.environ['GRPC_EXPERIMENTAL_AUTOFLOWCONTROL'] = 'false'
Then, re-run the experiment.
Applies to: 4.6 and later
Predictions API in Watson Machine Learning service can timeout too soon
If the predictions API (POST /ml/v4/deployments/{deployment_id}/predictions) in the Watson Machine Learning deployment service is timing out too soon, follow these steps to manually update the timeout interval.
- Update the API timeout parameter in the Watson Machine Learning CR:
  oc patch wmlbase wml-cr -p '{"spec":{"wml_api_timeout": <REQUIRED_TIMEOUT_IN_SECONDS>, "wml_envoy_pods": 1}}' --type=merge -n <NAMESPACE>
  For example, to update the timeout to 600 seconds:
  oc patch wmlbase wml-cr -p '{"spec":{"wml_api_timeout": 600, "wml_envoy_pods": 1}}' --type=merge -n zen
  Note: If you need to support a higher throughput of Watson Machine Learning prediction API requests, you can increase the number of Watson Machine Learning envoy pods by using the wml_envoy_pods parameter in the previous command. One envoy pod can support up to 1500 requests per second.
- Restart the NGINX pods:
  oc rollout restart deployment ibm-nginx
- Check that the NGINX pods have come up:
  oc get pods | grep "ibm-nginx"
Applies to: 4.6.0 and later
Decision Optimization deployment job fails with error: "Add deployment failed with deployment not finished within time"
If your Decision Optimization deployment job fails with the following error, complete these steps to extend the time-out window.
"status": {
"completed_at": "2022-09-02T02:35:31.711Z",
"failure": {
"trace": "0c4c4308935a3c4f2d9987b22139c61c",
"errors": [{
"code": "add_deployment_failed_in_runtime",
"message": "Add deployment failed with deployment not finished within time"
}]
},
"state": "failed"
}
To update the DO deployment timeout in the deployment manager:
- Edit the wmlbase wml-cr and add the line ignoreForMaintenance: true. This setting puts the WML operator into maintenance mode, which stops automatic reconciliation. Otherwise, automatic reconciliation would undo any configmap changes that you apply.
  oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n <namespace>
  For example:
  oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n zen
- Capture the contents of the wmlruntimemanager configmap in a YAML file:
  oc get cm wmlruntimemanager -n <namespace> -o yaml > wmlruntimemanager.yaml
  For example:
  oc get cm wmlruntimemanager -n zen -o yaml > wmlruntimemanager.yaml
- Create a backup of the wmlruntimemanager YAML file:
  cp wmlruntimemanager.yaml wmlruntimemanager.yaml.bkp
- Open wmlruntimemanager.yaml:
  vi wmlruntimemanager.yaml
- Navigate to the runtimeManager.conf file and search for the service property.
  Release 4.6.0: Add the relevant section. For example, add a jobs section after the service section:
  service {
    // Add "jobs" section after the "service" section
    jobs {
      do {
        check_deployment_status {
          retry_count = 420
          retry_delay = 1000  // In milliseconds
        }
      }
    }
  }
  Release 4.6.1 and later: Increase the number of retries in the retry_count field to extend the timeout window:
  service {
    jobs {
      do {
        check_deployment_status {
          retry_count = 420  // Increase the number of retries to extend the timeout window
          retry_delay = 1000
        }
      }
    }
  }
  Where:
  - retry_count = number of retries
  - retry_delay = delay between each retry, in milliseconds
  In the example, the timeout is configured as 7 minutes (retry_count * retry_delay = 420 * 1000 milliseconds = 7 minutes). If you want to increase the timeout further, increase the number of retries in the retry_count field.
- Apply the deployment manager configmap changes:
  oc delete -f wmlruntimemanager.yaml
  oc create -f wmlruntimemanager.yaml
- Restart the deployment manager pods:
  oc get pods -n <namespace> | grep wml-deployment-manager
  oc delete pod <podname> -n <namespace>
- Wait for the deployment manager pod to come up:
  oc get pods -n <namespace> | grep wml-deployment-manager
Note: If you plan to upgrade the Cloud Pak for Data cluster, you must bring the WML operator out of maintenance mode by setting the ignoreForMaintenance field to false in wml-cr.
Applies to: 4.6 and later
Online deployment pertaining to a custom library based model fails with error: deployment id <deployment_id> already in use
If your online deployment pertaining to a custom library based model fails with the error deployment id <deployment_id> already in use, complete these steps to extend the time-out window:
- Edit the wmlbase wml-cr and add the line ignoreForMaintenance: true. This setting puts the WML operator into maintenance mode, which stops automatic reconciliation. Otherwise, automatic reconciliation would undo any configmap changes that you apply.
  oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n <namespace>
  For example:
  oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n zen
- Capture the contents of the wmlruntimesidecar configmap in a YAML file:
  oc get cm wmlruntimesidecar -n <namespace> -o yaml > wmlruntimesidecar.yaml
- Create a backup of the wmlruntimesidecar YAML file:
  cp wmlruntimesidecar.yaml wmlruntimesidecar.yaml.bkp
- In the wmlruntimesidecar.yaml file, search for the service property and increase the write_timeout value:
  [service]
  port = "16500"
  read_timeout = 120
  read_header_timeout = 120
  write_timeout = 7200  // increase the write_timeout value
- In the wmlruntimesidecar.yaml file, search for the http_client.localhost property and increase the time_out value:
  [http_client.localhost]
  max_idle_conns = 100
  max_idle_conns_per_host = 100
  max_conns_per_host = 100
  idle_conn_timeout = 10
  time_out = 7200  // increase the time_out value
- Apply the sidecar configmap changes:
  oc delete -f wmlruntimesidecar.yaml
  oc create -f wmlruntimesidecar.yaml
- Restart the deployment manager pods:
  oc get pods -n <namespace> | grep wml-deployment-manager
  oc delete pod <podname> -n <namespace>
- Wait for the deployment manager pod to come up:
  oc get pods -n <namespace> | grep wml-deployment-manager
Applies to: 4.6.0 and 4.6.1
Previewing masked data assets is blocked in deployment space
A data asset preview might fail with this message:
This asset contains masked data and is not supported for preview in the Deployment Space
Deployment spaces currently don't support masking data, so the preview for masked assets is blocked to prevent data leaks.
Applies to: 4.6 and later
Search to add collaborators for a deployment requires lowercase
If you are adding collaborators from the Deployments page of a space, you must enter search input as all lowercase, or the search will fail.
Applies to: 4.7 and later
Limitations
These limitations apply to Watson Studio and the services that require Watson Studio.
- Limitations for assets
- Limitations for Hadoop integration
- Limitations for jobs
- Limitations for projects
  - Unable to sync deprecated Git projects when all assets have been deleted
  - Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration
  - Import of a project larger than 1 GB in Watson Studio fails
  - Export of a large project in Watson Studio can timeout
  - Scheduling jobs is unsupported in Git-based projects
  - Can't include a Cognos dashboard when exporting a project to desktop
  - Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster
- Limitations for Watson Machine Learning
  - Restrictions for IBM Z and IBM LinuxONE users
  - Deploying a model on S90X cluster might require retraining
  - Limits on size of model deployments
  - Automatic mounting of storage volumes not supported by online and batch deployments
  - Batch deployments that use large data volumes as input might fail
  - Batch deployment jobs that use large inline payload might get stuck in starting or running state
- Limitations for AutoAI experiments
Limitations for assets
Can't load CSV files larger than 20 GB to projects
You can't load a CSV file that is larger than 20 GB to a project in Cloud Pak for Data.
Limitations for previews of assets
You can't see previews of these types of assets:
- Folder assets associated with a connection with personal credentials. You are prompted to enter your personal credentials to start the preview or profiling of the connection asset.
- Connected data assets for image files in projects.
- Connected assets for text and JSON files with shared credentials. These previews are incorrectly displayed in a grid.
- Connected data assets for PDF files in projects.
Limitations for Hadoop integration
The Livy service does not restart when a cluster is rebooted
The Livy service does not automatically restart after a system reboot if the HDFS Namenode is not in an active state.
Workaround: Restart the Livy service.
Limitations for jobs
Job run has wrong environment variable values if special characters are used
Environment variables defined in the job configuration are not passed correctly to the job runs if the variable values contain special characters. This might lead to job run failures, or the incorrect behavior of job runs. To resolve the problem, see Job run has wrong environment variable values if special characters are used.
Job runs fail after environments are deleted or Cloud Pak for Data has been upgraded
Job runs in deployment spaces or projects fail if the job is using an environment that has been deleted or is no longer supported after a Cloud Pak for Data version upgrade. To get the job running again, edit the job to point to an alternative environment.
To prevent job runs from failing due to an upgrade, you can use either of the following methods:
- Migrate your environments before upgrading Cloud Pak for Data. For details, see:
  - Migrating Python 3.7 and Python 3.8 environments from Cloud Pak for Data 4.0
  - Migrating Python 3.6 and Python 3.7 environments from Cloud Pak for Data 3.5
- Create custom environments based on custom runtime images. Jobs associated with these environments will still run after an upgrade. For details, see Building custom images.
Limitations for projects
Unable to sync deprecated Git projects when all assets have been deleted
If you delete all assets from a deprecated Git project, the project can no longer sync with the Git repository.
Workaround: Retain at least one asset in the deprecated Git project.
Don't use the Git repository from projects with deprecated Git integration in projects with default Git integration
You shouldn't use the Git repository from a project with deprecated Git integration in a project with default Git integration as this can result in an error. For example, in Bitbucket, you will see an error stating that the repository contains content from a deprecated Git project although the selected branch contains default Git project content.
In a project with default Git integration, you can either use a new clean Git repository or link to one that was used in a project with default Git integration.
Import of a project larger than 1 GB in Watson Studio fails
If you create an empty project in Watson Studio and then try to import a project that is larger than 1 GB in size, the operation might fail depending on the size and compute power of the Cloud Pak for Data cluster.
Export of a large project in Watson Studio fails with a time-out
If you are trying to export a project with a large number of assets (for example, more than 7000), the export process can time-out and fail. In that case, although you could export assets in subsets, the recommended solution is to export using the APIs available from the CPDCTL command line interface tool.
Scheduling jobs is unsupported in Git-based projects
In Git-based projects, you must run all jobs manually. Job scheduling is not supported.
Can't include a Cognos dashboard when exporting a project to desktop
Currently, you cannot select a Cognos dashboard when you export a project to desktop.
Workaround:
Although you cannot add a dashboard to your project export, you can move a dashboard from one project to another.
To move a dashboard to another project:
- Download the dashboard JSON file from the original project.
- Export the original project to desktop from the project toolbar.
- Create a new project by importing the project ZIP with the required data sources.
- Create a new dashboard by clicking the From file tab and adding the JSON file you downloaded from the original project.
- A dialog box will pop up asking you if you want to re-link each of your data sources. Click the re-link button and select the asset in the new project that corresponds to the data source.
Can't use connections in a Git repository that require a JDBC driver and were created in a project on another cluster
If your project is associated with a Git repository that was used in a project in another cluster and contains connections that require a JDBC driver, the connections will not work in your project. If you upload the required JDBC JAR file, you will see an error stating that the JDBC driver could not be initialized.
This error is caused by the JDBC JAR file that is added to the connection as a presigned URI. This URI is not valid in a project in another cluster. The JAR file can no longer be located even if it exists in the cluster, and the connection will not work.
Workaround
To use any of these connections, you need to create new connections in the project. The following connections require a JDBC driver and are affected by this error situation:
- Db2 for i
- Db2 for z/OS
- Generic JDBC
- Hive via Execution Engine for Apache Hadoop
- Impala via Execution Engine for Apache Hadoop
- SAP HANA
- Exasol
Limitations for Watson Machine Learning
Restrictions for IBM Z and IBM LinuxONE users
When Cloud Pak for Data is installed on the IBM Z and LinuxONE platforms, Watson Studio and Watson Machine Learning users will not be able to use, run, or deploy the following types of assets:
- Data processed using Data Refinery
- Assets trained using AutoAI, Federated Learning, Decision Optimization, SPSS Modeler, Watson Machine Learning, or Hadoop
- Assets trained using RStudio, such as RShiny apps or assets based on the R framework
- Assets based on these runtimes: Spark, Python 3.7 or ONNX 1.8.1
- Deep Learning assets built with TensorFlow or PyTorch 1.8.0 frameworks
Additionally, note the following:
- Attempting to use, train, or deploy unsupported assets on Cloud Pak for Data running on an IBM Z or LinuxONE platform will fail with an error.
- Backup and restore is not currently available on the IBM Z and LinuxONE platforms.
- With the default runtimes, models trained on other platforms and deployed on IBM Z and LinuxONE might not work as expected. A potential solution is to deploy the model on a custom Python runtime.
- Insert to code function on IBM Z can cause kernel failure
Applies to: 4.6.0 and later
Deploying a model on S90X cluster might require retraining
Training an AI model on a different platform, such as x86 or ppc, and deploying it on s390x by using Watson Machine Learning might fail because of an endianness issue. In such cases, retrain and deploy the AI model on the s390x platform to resolve the problem.
Applies to: 4.6.0 and later
Limits on size of model deployments
Limits on the size of models you deploy with Watson Machine Learning depend on factors such as the model framework and type. In some instances, when you exceed a threshold, you are notified with an error when you try to store a model in the Watson Machine Learning repository, for example: OverflowError: string longer than 2147483647 bytes. In other cases, the failure might be indicated by a more general error message, such as The service is experiencing some downstream errors, please re-try the request or There's no available attachment for the targeted asset. Any of these results indicates that you have exceeded the allowable size limits for that type of deployment.
Applies to: 4.6.0 and later
Maximum number of feature columns in AutoAI experiments
The maximum number of feature columns for a classification or regression experiment is 5000.
No support for Cloud Pak for Data authentication with storage volume connection
You cannot use a storage volume connection with the 'Cloud Pak for Data authentication' option enabled as a data source in an AutoAI experiment. AutoAI does not currently support the user authentication token. Instead, disable the 'Cloud Pak for Data authentication' option in the storage volume connection to use the connection as a data source in an AutoAI experiment.
Applies to: 4.6.5 and later
Automatic mounting of storage volumes not supported by online and batch deployments
You cannot use automatic mounts for storage volumes with Watson Machine Learning online and batch deployments. Watson Machine Learning does not support this feature for Python-based runtimes, including R-script, SPSS Modeler, Spark, and Decision Optimization. You can only use automatic mounts for storage volumes with Watson Machine Learning shiny app deployments and notebook runtimes.
As a workaround, you can use the download method from the Data assets library, which is part of the ibm-watson-machine-learning Python client.
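The following sketch shows one way to call that download method. It is an example only: the credential fields, space ID, and asset ID are placeholders, and the exact credential dictionary depends on how your cluster is configured.
from ibm_watson_machine_learning import APIClient

# Placeholder credentials for a Cloud Pak for Data cluster
wml_credentials = {
    "url": "https://<PLATFORM_CLUSTER_URL>",
    "username": "<username>",
    "apikey": "<api_key>",
    "instance_id": "openshift",
    "version": "4.6",
}
client = APIClient(wml_credentials)
client.set.default_space("<space_id>")

# Download the data asset to the local file system instead of relying on an
# automatically mounted storage volume
client.data_assets.download("<data_asset_id>", "input_data.csv")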
Applies to: 4.6 and later
AutoAI time series notebook error requires update of the import-tracker library
Following an upgrade, running a pipeline notebook for an AutoAI time series forecasting experiment might result in an error loading the import-tracker library. Do one of the following to resolve the error:
- Run the cell twice to dismiss the error.
- Update the import-tracker library by adding !pip install -U import-tracker to a cell at the beginning of the notebook.
Applies to: 4.6.0 and later
Batch deployments that use large data volumes as input might fail
Applies to: 4.6.0 and later
If you are scoring a batch job that uses a large volume of data as the input source, the job might fail because of internal timeout settings. A symptom of this problem might be an error message similar to the following example:
Incorrect input data: Flight returned internal error, with message: CDICO9999E: Internal error occurred: Snowflake sQL logged error: JDBC driver internal error: Timeout waiting for the download of #chunk49(Total chunks: 186) retry=0.
If the timeout occurs when you score your batch deployment, you must configure the data source query level timeout limitation to handle long-running jobs.
Query-level timeout information for data sources is as follows:
Data source | Query-level time limit | Default time limit | How to modify the default time limit |
---|---|---|---|
Apache Cassandra | Yes | 10 seconds | Set the read_timeout_in_ms and write_timeout_in_ms parameters in the Apache Cassandra configuration file or in the Apache Cassandra connection URL to change the default time limit. |
Cloud Object Storage | No | N/A | N/A |
Db2 | Yes | N/A | Set the QueryTimeout parameter to specify the amount of time (in seconds) that a client waits for a query execution to complete before a client attempts to cancel the execution and return control to the application. |
Hive via Execution Engine for Hadoop | Yes | 60 minutes (3600 seconds) | Set the hive.session.query.timeout property in the connection URL to change the default time limit. |
Microsoft SQL Server | Yes | 30 seconds | Set the QUERY_TIMEOUT server configuration option to change the default time limit. |
MongoDB | Yes | 30 seconds | Set the maxTimeMS parameter in the query options to change the default time limit. |
MySQL | Yes | 0 seconds (No default time limit) | Set the timeout property in the connection URL or in the JDBC driver properties to specify a time limit for your query. |
Oracle | Yes | 30 seconds | Set the QUERY_TIMEOUT parameter in the Oracle JDBC driver to specify the maximum amount of time a query can run before it is automatically cancelled. |
PostgreSQL | No | N/A | Set the queryTimeout property to specify the maximum amount of time that a query can run. The default value of the queryTimeout property is 0 . |
Snowflake | Yes | 6 hours | Set the queryTimeout parameter to change the default time limit. |
To prevent your batch deployments from failing, partition your data set or decrease its size.
Batch deployment jobs that use large inline payload might get stuck in starting or running state
Applies to: 4.6.0 and later
If you provide a large asynchronous payload for your inline batch deployment, it can cause the runtime manager process to run out of heap memory.
In the following example, 92 MB of payload was passed inline to the batch deployment, which caused the heap to run out of memory.
Uncaught error from thread [scoring-runtime-manager-akka.scoring-jobs-dispatcher-35] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[scoring-runtime-manager]
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:538)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:174)
...
This behavior can result in concurrent jobs getting stuck in the starting or running state. The starting state can be cleared only after the deployment is deleted and a new deployment is created. The running state can be cleared without deleting the deployment.
As a workaround, use data references instead of inline data for large payloads that are provided to batch deployments.
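As an illustration of this workaround, the following sketch creates a batch deployment job whose input and output are passed as data asset references rather than inline values. It is a sketch only: the client setup, the IDs, and the href format are placeholders that you must adapt to your space.
from ibm_watson_machine_learning import APIClient

# Placeholder client setup; replace the credentials and IDs with your own
client = APIClient({
    "url": "https://<PLATFORM_CLUSTER_URL>",
    "username": "<username>",
    "apikey": "<api_key>",
    "instance_id": "openshift",
    "version": "4.6",
})
client.set.default_space("<space_id>")

# Reference the input and output as data assets instead of an inline payload
job_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
        "type": "data_asset",
        "location": {"href": "/v2/assets/<input_asset_id>?space_id=<space_id>"},
    }],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "location": {"name": "batch_output.csv"},
    },
}
job = client.deployments.create_job("<deployment_id>", meta_props=job_payload)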
Parent topic: Limitations and known issues in IBM Cloud Pak for Data