Known issues and limitations

The following limitations and known issues apply to Cloud Pak for Data as a Service.

Watson Knowledge Catalog

If you use the Watson Knowledge Catalog, you might encounter these known issues and restrictions when you use catalogs.

Business glossary term associations

If you edit a policy or rule that was created before March 16, 2018, associations between the policy or rule and business terms that are used in the policy or rule are duplicated. When you view the business term, you’ll see the policy or rule listed twice on the Related Content page.

Refresh your browser after adding the Watson Knowledge Catalog service

If you add the Watson Knowledge Catalog service to your account using the Try-Out link on the landing page, you must refresh your browser to see catalog items.

Add collaborators with lowercase email addresses

When you add collaborators to the catalog, enter email addresses with all lowercase letters. Mixed-case email addresses are not supported.

Object Storage connection restrictions

When you look at a Cloud Object Storage (S3 API) or Cloudant connection, the folder itself is listed as a child asset.

Multiple concurrent connection operations might fail

You might encounter an error when multiple users run connection operations concurrently. The error message can vary.

Can't enable policies after catalog creation

You cannot enable the enforcement of policies after you create a catalog. To apply policies to the assets in a catalog, you must enable enforcement during catalog creation.

Data is not masked in some project tools

When you add a connected data asset that has masked columns from a catalog to a project, the columns remain masked when you view the data and when you refine the data in the Data Refinery tool. Other tools in projects, however, do not preserve masking when accessing data through a connection. For example, when you load connected data in a notebook, you access the data through a direct connection and bypass masking.

Workaround: To retain masking of connected data, create a new asset with Data Refinery:

  1. Open the connected data asset and click Refine. Data Refinery automatically includes the data masking steps in the Data Refinery flow that transforms the full data set into the new target asset.
  2. If necessary, adjust the target name and location.
  3. Click the Run button, and then click Save and Run. The new connected data asset is ready to use.
  4. Remove the original connected data asset from your project.

Business glossary terms need manual refresh

In Business Glossary, updates to business terms are not automatically refreshed when you navigate back to the Business Glossary page by using breadcrumbs. To refresh the Business Glossary page and view all updates, click the same button twice in the alphabet index.

Assets are blocked if evaluation fails

The following restrictions apply to data assets in a catalog with enforced policies: file-based data assets that have a header can't have duplicate column names, or a period (.) or single quotation mark (') in a column name.

If evaluation fails, the asset is blocked to all users except the asset owner. All other users see an error message that the data asset cannot be viewed because evaluation failed and the asset is blocked.
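If you prepare file-based data assets programmatically, a quick header check can help you avoid blocked assets. The following is a minimal sketch that uses pandas and a hypothetical CSV file name; it tests only the column-name rules listed above.

    import pandas as pd

    def find_header_problems(csv_path):
        # Read only the header row to inspect the column names.
        columns = list(pd.read_csv(csv_path, nrows=0).columns)
        problems = []
        seen = set()
        for name in columns:
            if name in seen:
                problems.append(f"duplicate column name: {name}")
            seen.add(name)
            if "." in name:
                problems.append(f"period in column name: {name}")
            if "'" in name:
                problems.append(f"single quotation mark in column name: {name}")
        return problems

    # Report any problems before you add the file to a catalog with enforced policies.
    for problem in find_header_problems("sales_data.csv"):  # hypothetical file
        print(problem)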

Predefined governance artifacts are not available in the new governance artifact experience

If you don't see any predefined classifications or data classes, you need to reinitialize your tenant by using the following API call:

curl -X POST "https://api.dataplatform.cloud.ibm.com/v3/glossary_terms/admin/initialize_content" -H "Authorization: Bearer $BEARER_TOKEN" -k
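The call requires an IAM bearer token for an administrator of the account. As a minimal sketch, and assuming you have an IBM Cloud API key in an environment variable, you can exchange the key for a token at the IBM Cloud IAM token endpoint and then call the initialization endpoint from Python:

    import os
    import requests

    # Exchange an IBM Cloud API key (assumed to be in IBMCLOUD_API_KEY) for an IAM bearer token.
    iam_response = requests.post(
        "https://iam.cloud.ibm.com/identity/token",
        data={
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": os.environ["IBMCLOUD_API_KEY"],
        },
    )
    bearer_token = iam_response.json()["access_token"]

    # Reinitialize the predefined governance artifacts.
    response = requests.post(
        "https://api.dataplatform.cloud.ibm.com/v3/glossary_terms/admin/initialize_content",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    print(response.status_code)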

Upgrading to the new governance artifact experience deletes existing governance artifacts

When you upgrade your governance artifacts experience, any existing business terms, policies, and rules are permanently deleted. Therefore, all term and classification assignments become invalid because the referenced artifacts no longer exist. The same applies to any data masking you might have configured. Profiles are upgraded so that the classification results use the new version of the data classes.

To avoid mixing old data assets that have invalid term and classification assignments with newly added data assets, create and populate a new catalog after the upgrade.

Misleading message while the list of enriched assets is updated in the metadata enrichment asset

When the list of assets is updated, for example, after you edit the metadata enrichment asset, the message Assets are being enriched is shown even though no new enrichment is running.

The list of enriched assets is not automatically refreshed

After you edit a metadata enrichment asset, the list of data assets in the scope is rebuilt. However, the page is not automatically refreshed. Refresh the web browser to see the updated list.

Data Privacy

If you use Data Privacy, you might encounter these known issues and restrictions when you are privatizing data.

Masking flow jobs might fail

During a masking flow job, Spark might attempt to read an entire data source into memory. Errors might occur when there isn't enough memory to support the job. The largest volume of data that can fit into the largest deployed Spark processing node is approximately 12 GB.

The Rules menu option isn't available

If you open the main menu and choose Governance but the Rules option isn't available, complete the following steps to resolve the issue:

  1. The account owner must upgrade the tenant to the Business Glossary 3.0 version.
  2. Wait 3-10 minutes, and then log out of Cloud Pak for Data as a Service.
  3. Log in to Cloud Pak for Data as a Service.

Data Refinery

If you use Data Refinery, you might encounter these known issues and restrictions when you refine data.

Personal credentials are not supported for connected data assets in Data Refinery

If you create a connected data asset with personal credentials, other users must use the following workaround to use the connected data asset in Data Refinery.

Workaround:

  1. Go to the project page, and click the link for the connected data asset to open the preview.
  2. Enter credentials.
  3. Open Data Refinery and use the authenticated connected data asset for a source or target.

Watson Studio issues

You might encounter some of these issues when getting started with and using Watson Studio.

Can't create assets in older accounts

If you're working in an instance of Watson Studio that was activated before November 2017, you might not be able to create operational assets, such as notebooks. If the Create button stays gray and disabled, you must add the Watson Studio service to your account from the Services catalog.

500 internal server error received when launching Watson Studio

Rarely, you may receive an HTTP internal server error (500) when launching Watson Studio. This might be caused by an expired cookie stored for the browser. To confirm the error was caused by a stale cookie, try launching Watson Studio in a private browsing session (incognito) or by using a different browser. If you can successfully launch in the new browser, the error was caused by an expired cookie. You have a choice of resolutions:

  1. Exit the browser application completely to reset the cookie. You must close and restart the application, not just close the browser window. Restart the browser application and launch Watson Studio to reset the session cookie.
  2. Clear the IBM cookies from the browsing data and launch Watson Studio. Look in the browsing data or security options in the browser to clear cookies. Note that clearing all IBM cookies may affect other IBM applications.

If the 500 error persists after performing one of these resolutions, check the status page for IBM Cloud incidents affecting Watson Studio. Additionally, you may open a support case at the IBM Cloud support portal.

Error during login

You might get this error message while trying to log in to Watson Studio: "Access Manager WebSEAL could not complete your request due to an unexpected error." Return to dataplatform.cloud.ibm.com and log in again. Usually the second login attempt works.

Manual installation of some tensor libraries is not supported

Some TensorFlow libraries are preinstalled, but if you try to install additional TensorFlow libraries yourself, you get an error.

Connection to notebook kernel is taking longer than expected after running a code cell

If you try to reconnect to the kernel and immediately run a code cell (or if the kernel reconnection happened during code execution), the notebook doesn't reconnect to the kernel and no output is displayed for the code cell. You need to manually reconnect to the kernel by clicking Kernel > Reconnect. When the kernel is ready, you can try running the code cell again.

Using the predefined sqlContext object in multiple notebooks causes an error

You might receive an Apache Spark error if you use the predefined sqlContext object in multiple notebooks. Create a new sqlContext object for each notebook. See this Stack Overflow explanation.
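As a minimal sketch of this workaround in a Python notebook on Spark 2.x, and assuming the predefined SparkContext is available as sc, create a notebook-specific SQLContext instead of sharing the predefined sqlContext:

    from pyspark.sql import SQLContext

    # Create a SQLContext for this notebook only, instead of using the predefined sqlContext.
    sqlContext = SQLContext(sc)

    # Quick check that the new context works.
    df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.show()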

Spark tasks might fail with missing AWS keys error

Respawning a failed executor during a job that reads or writes Parquet on S3 causes subsequent tasks to fail because of missing AWS keys.

Notebook kernel not started when opening a Scala notebook

You might notice that the notebook kernel is not running when you open a Scala notebook that uses Spark and custom Scala libraries. This situation arises when you use Scala libraries that are not compatible with the Spark version you are using, for example, if you use a Scala 2.10 jar file in a notebook with Spark 2.1.

To avoid this situation:

  1. Ensure that you use Scala 2.11 libraries with Spark 2.1.
  2. Run the following code in a Python notebook to remove the existing Scala libraries:
    !rm -rvf ~/data/libs/*
    
  3. Reload the libraries you need.

Connection failed message

If your kernel stops, your notebook is no longer automatically saved. To save it manually, click File > Save. You should get a Notebook saved message in the kernel information area, which appears before the Spark version. If you get a message that the kernel failed, click Kernel > Reconnect to reconnect your notebook to the kernel. If nothing restarts the kernel and you can't save the notebook, you can download the notebook to save your changes by clicking File > Download as > Notebook (.ipynb). Then create a new notebook based on the downloaded notebook file.

Connection to notebook kernel on Amazon EMR failed

If the notebook language, for example Python 3.7 with Spark, isn't displayed for the notebook, the notebook kernel couldn't be started.

To verify that the Kernel Gateway to Amazon Elastic Map Reduce is started and its endpoints are accessible over the internet, run:

curl https://<KG_EMR_URL>:<PORT>/api/kernelspecs -H "Authorization: token <Personal_access_token>"

The Kernel Gateway is accessible if a JSON list of the available kernels is returned. If not, you must reinstall Jupyter Kernel Gateway on Amazon EMR. For details, see Add an Amazon EMR Spark service.

Connecting to the notebook kernel on Amazon EMR is taking longer than expected

If your notebook kernel will not start, your Amazon Elastic Map Reduce service might have run out of Spark resources. You can free Spark resources by stopping the kernels of notebooks you aren't using. Alternatively, you can stop all kernels by restarting the Kernel Gateway to the EMR cluster:

  1. Open the Amazon EMR console and log into the master node of the cluster.
  2. Enter wget https://raw.githubusercontent.com/IBMDataScience/kernelgateway-setup/master/install_kg_emr_bootstrap_script.sh to download the Kernel Gateway setup.
  3. Enter chmod +x install_kg_emr_bootstrap_script.sh to make the script executable.
  4. Enter ./install_kg_emr_bootstrap_script.sh --restart to restart the Kernel Gateway. You will be prompted for the port number.

Connection to Amazon EMR not available

If you keep running into problems connecting to Amazon Elastic Map Reduce, it is best to uninstall the Kernel Gateway and install it again:

  1. Open the Amazon EMR console and log into the master node of the cluster.
  2. Enter wget https://raw.githubusercontent.com/IBMDataScience/kernelgateway-setup/master/install_kg_emr_bootstrap_script.sh to download the Kernel Gateway setup.
  3. Enter chmod +x install_kg_emr_bootstrap_script.sh to make the script executable.
  4. Enter ./install_kg_emr_bootstrap_script.sh --uninstall to remove the Kernel Gateway.
  5. Enter ./install_kg_emr_bootstrap_script.sh to install the Kernel Gateway again.

Connection to IBM Analytics Engine service not available

The IBM Analytics Engine service instance that you selected to use for your notebook in Watson Studio might have been deleted or might not be running. Check if the service instance exists and is provisioned on the IBM Cloud Dashboard by clicking the navigation menu in Watson Studio and selecting Dashboard.

You can add a new IBM Analytics Engine service from your project's Settings page in the associated services section.

No Insert to code support for notebooks running in Spark 3.0 & Scala 2.12 environments

You can't access data from project assets in Scala notebooks that run in Spark 3.0 & Scala 2.12 environments. An error is returned when you click the Insert to code link below the asset name and select to load data into a SparkSession DataFrame. A workaround is to switch back to using a Spark 2.4 & Scala 2.11 environment.

Links to notebook sections don't work in view-only mode in Firefox

If your notebook contains sections that you link to from an introductory section at the top of the notebook, for example, the links to these sections do not work if the notebook was opened in view-only mode in Firefox. However, if you open the notebook in edit mode, these links work.

Launching an environment with a software customization in a Satellite location takes much longer than in Cloud Pak for Data as a Service

If you added a software customization to an environment for a Satellite location, launching the environment with this customization in the Satellite location to run a notebook or a notebook job takes much longer than launching the same environment for a notebook or job in Cloud Pak for Data as a Service.

Can't connect to notebook kernel

If you try to run a notebook and you see the message Connecting to Kernel, followed by Connection failed. Reconnecting and finally by a connection failed error message, the reason might be that your firewall is blocking the notebook from running.

If Watson Studio is installed behind a firewall, you must add the WebSocket connection wss://dataplatform.cloud.ibm.com to the firewall settings. Enabling this WebSocket connection is required when you're using notebooks and RStudio.

Watson Machine Learning issues

You might encounter some of these issues when working with IBM Watson Machine Learning components, including the Model Builder and Flow Editor.

Security vulnerability when passing credentials in Watson Machine Learning

Currently, when you specify data references to input and output data for training, the reference to a Cloud Object Storage bucket includes the path and credentials for the bucket, which introduces a possible security vulnerability. This issue will be addressed, but there is currently no workaround.

Region requirements

You can only associate a Watson Machine Learning service instance with your Watson Studio project when the Watson Machine Learning service instance and the Watson Studio instance are located in the same region.

Flow Editor runtime restrictions

Watson Studio does not include SPSS functionality in Peru, Ecuador, Colombia and Venezuela.

Deployment issues

AutoAI known limitations

Cognos Dashboard Embedded issues

You might encounter some of these issues when working with Cognos Dashboard Embedded.

Incorrect data type shown for refined data assets

After you import a CSV file, if you click the imported file on the data asset overview page, the types of some columns might not be shown correctly. For example, in a data set of a company report, a column called Revenue that contains the company's revenue might be shown as type String instead of a more logical number-oriented data type.

Unsupported special characters in CSV files

The source CSV file name can contain non-alphanumeric characters. However, the CSV file name can't contain the special characters / : & < . \ ". If the file name contains these characters, they are removed from the table name.

Important: Table column names in the source CSV file can't contain any of the unsupported special characters. These characters can't be removed automatically because the name in the data module must match the name of the column in the source file. In this case, remove the special characters from your column names so that you can use your data in a dashboard.
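If you generate the CSV files yourself, you can strip the unsupported characters from the column names before you import the file. The following is a minimal sketch that uses pandas; the file names are hypothetical.

    import re
    import pandas as pd

    # Characters that aren't supported in names: / : & < . \ "
    UNSUPPORTED = re.compile(r'[/:&<.\\"]')

    df = pd.read_csv("quarterly_report.csv")  # hypothetical source file

    # Remove the unsupported characters from every column name and
    # write a cleaned copy of the file to import into the dashboard.
    df.columns = [UNSUPPORTED.sub("", name) for name in df.columns]
    df.to_csv("quarterly_report_clean.csv", index=False)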

String values in CSV files are limited to 128 characters

String values in a column in your source CSV file can be at most 128 characters long. If your CSV file has string columns with longer values, an error message is displayed.
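If you're not sure whether a CSV file stays within this limit, a check like the following sketch (pandas, hypothetical file name) reports the offending columns before you import the file:

    import pandas as pd

    df = pd.read_csv("survey_results.csv", dtype=str)  # hypothetical source file

    # Report every column whose longest string value exceeds the 128-character limit.
    for column in df.columns:
        max_length = df[column].dropna().str.len().max()
        if pd.notna(max_length) and max_length > 128:
            print(f"{column}: longest value is {int(max_length)} characters")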

Date format limitations in CSV files

There are date format limitations for CSV files used in visualizations. For details, see Resolving problems when using data from CSV files in Cognos Dashboard Embedded.

Can't replace a data table in a visualization

When you add a visualization to a dashboard, you cannot add a data table to the visualization if you previously added (and then removed) data fields from another data table. This restriction applies to Db2, CSV tables, and other data sources.

Cognos Analytics features that are not supported

The following functionality from IBM Cognos Analytics is not supported in dashboards:

SPSS Modeler issues

You might encounter some of these issues when working in SPSS Modeler.

Error when trying to stop a running flow

When running an SPSS Modeler flow, you might encounter an error if you try to stop the flow from the Environments tab. To completely stop the SPSS Modeler runtime and CUH consumption, close the browser tab(s) where you have the flow open.

Imported Data Asset Export nodes sometimes fail to run

When you create a new flow by importing an SPSS Modeler stream (.str file), migrate the export node, and then run the resulting Data Asset Export node, the run might fail. To work around this issue, rerun the node, change the output name, change the If the data set already exists option in the node properties, and then run the node again.

Data preview may fail if table metadata has changed

In some cases, when using the Data Asset import node to import data from a connection, data preview may return an error if the underlying table metadata (data model) has changed. Recreate the Data Asset node to resolve the issue.

Unable to view output after running an Extension Output node

When running an Extension Output node with the Output to file option selected, the resulting output file returns an error when you try to open it from the Outputs panel.

Unable to preview Excel data from COS connections

Currently, you can't preview .xls or .xlsx data from a COS connection.

Numbers interpreted as a string

Any number with a precision greater than or equal to 32 and a scale equal to 0 is interpreted as a string. If you need to change this behavior, use a Filler node to cast the field to a real number by using the expression to_real(@FIELD).

SuperNode containing Import nodes

If your flow has a SuperNode that contains an Import node, the input schema may not be set correctly when you save the model with the Scoring branch option. To work around this issue, expand the SuperNode before saving.

KDE nodes with unsupported Python version

If your flow contains an old KDE node, you may receive an error when you run it about the model using a Python package that's no longer supported. In such a case, remove the old KDE node and add a new one.

Exporting to a SAV file

When you use the Data Asset Export node to export to an SPSS Statistics SAV file (.sav), the Replace data asset option doesn't work if the input schema doesn't match the output schema. The schema of the existing file that you want to replace must match the input schema.

Delimiter and decimal options

You can only set field delimiter and decimal options for Data Asset nodes (.csv). These options aren't available for connections at this time.

Running flows on a Watson Machine Learning Server

When you run flows on a Watson Machine Learning Server, you might encounter the following issues:

Migrating Import nodes

If you import a stream (.str) to your flow that was created in SPSS Modeler desktop and contains one or more unsupported Import nodes, you'll be prompted to migrate the Import nodes to data assets. If the stream contains multiple Import nodes that use the same data file, then you must first add that file to your project as a data asset before migrating because the migration can't upload the same file to more than one Import node. After adding the data asset to your project, reopen the flow and proceed with the migration using the new data asset.

Text Analytics

The Text Analytics nodes have the following issues: