The following limitations and known issues apply to Cloud Pak for Data as a Service.
Watson Knowledge Catalog
Data Privacy
Data Refinery
Watson Studio
Watson Machine Learning
Cognos Dashboard Embedded
Watson OpenScale
SPSS Modeler
If you use the Watson Knowledge Catalog, you might encounter these known issues and restrictions when you use catalogs.
If you edit a policy or rule that was created before March 16, 2018, associations between the policy or rule and business terms that are used in the policy or rule are duplicated. When you view the business term, you’ll see the policy or rule listed twice on the Related Content page.
If you add the Watson Knowledge Catalog service to your account using the Try-Out link on the landing page, you must refresh your browser to see catalog items.
When you add collaborators to the catalog, enter email addresses with all lowercase letters. Mixed-case email addresses are not supported.
When you look at a Cloud Object Storage (S3 API) or Cloudant connection, the folder itself is listed as a child asset.
You might encounter an error when multiple users run connection operations concurrently. The error message can vary.
You cannot enable the enforcement of policies after you create a catalog. To apply policies to the assets in a catalog, you must enable enforcement during catalog creation.
When you add a connected data asset that has masked columns from a catalog to a project, the columns remain masked when you view the data and when you refine the data in the Data Refinery tool. Other tools in projects, however, do not preserve masking when accessing data through a connection. For example, when you load connected data in a notebook, you access the data through a direct connection and bypass masking.
Workaround: To retain masking of connected data, create a new asset with Data Refinery.
In Business Glossary, updates to business terms are not automatically refreshed when you navigate back to the Business Glossary page by using breadcrumbs. To refresh the Business Glossary page and view all updates, click the same button in the alphabet index twice.
The following restrictions apply to data assets in a catalog with policies enforced: File-based data assets that have a header can't have duplicate column names, a period (.), or single quotation mark (') in a column name.
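For example, the following sketch (an unofficial illustration that uses pandas, with placeholder file names) shows one way to rename the columns of a CSV file so that it meets these restrictions before you add it to a catalog:

import pandas as pd

df = pd.read_csv("asset.csv")  # hypothetical file name
# Replace periods and remove single quotation marks in column names.
cleaned = [name.replace(".", "_").replace("'", "") for name in df.columns]
# Make any duplicate names unique by appending a numeric suffix.
seen = {}
unique_names = []
for name in cleaned:
    count = seen.get(name, 0)
    unique_names.append(name if count == 0 else f"{name}_{count}")
    seen[name] = count + 1
df.columns = unique_names
df.to_csv("asset_clean.csv", index=False)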
If evaluation fails, the asset is blocked for all users except the asset owner. All other users see an error message that says the data asset can't be viewed because evaluation failed and the asset is blocked.
If you don't see any predefined classifications or data classes, you need to reinitialize your tenant by using the following API call:
curl -X POST "https://api.dataplatform.cloud.ibm.com/v3/glossary_terms/admin/initialize_content" -H "Authorization: Bearer $BEARER_TOKEN" -k
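If you prefer to make the same call from Python, the following sketch is equivalent to the curl command; it assumes that your bearer token is available in the BEARER_TOKEN environment variable:

import os
import requests

url = "https://api.dataplatform.cloud.ibm.com/v3/glossary_terms/admin/initialize_content"
headers = {"Authorization": "Bearer " + os.environ["BEARER_TOKEN"]}
# verify=False mirrors the -k flag in the curl command; remove it if your
# environment trusts the service certificate.
response = requests.post(url, headers=headers, verify=False)
print(response.status_code, response.text)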
When you upgrade your governance artifacts experience, any existing business terms, policies, and rules are permanently deleted. Therefore, all term and classification assignments become invalid because the referenced artifacts no longer exist. The same applies to any data masking you might have configured. Profiles are upgraded so that the classification results use the new version of the data classes.
To avoid mixing old data assets that have invalid term and classification assignments with newly added data assets, create and populate a new catalog after the upgrade.
When the list of assets is updated, for example, after you edit the metadata enrichment asset, the message Assets are being enriched is shown.
After you edit a metadata enrichment asset, the list of data assets in the scope is rebuilt. However, the page is not automatically refreshed. Refresh the web browser to see the updated list.
If you use Data Privacy, you might encounter these known issues and restrictions when you are privatizing data.
During a masking flow job, Spark might attempt to read an entire data source into memory. Errors might occur when there isn't enough memory to support the job. The largest volume of data that can fit into the largest deployed Spark processing node is approximately 12 GB.
If you open the main menu and choose Governance but the Rules option isn't available, complete the following procedure to resolve the issue:
If you use Data Refinery, you might encounter these known issues and restrictions when you refine data.
If you create a connected data asset with personal credentials, other users must use the following workaround to use the connected data asset in Data Refinery.
Workaround:
You might encounter some of these issues when getting started with and using Watson Studio.
If you're working in an instance of Watson Studio that was activated before November, 2017, you might not be able to create operational assets, like notebooks. If the Create button stays gray and disabled, you must add the Watson Studio service to your account from the Services catalog.
Rarely, you might receive an HTTP internal server error (500) when you launch Watson Studio. This error might be caused by an expired cookie that is stored in the browser. To confirm that the error was caused by a stale cookie, try launching Watson Studio in a private browsing (incognito) session or by using a different browser. If you can successfully launch Watson Studio in the new browser, the error was caused by an expired cookie. You have a choice of resolutions:
If the 500 error persists after you perform one of these resolutions, check the status page for IBM Cloud incidents that affect Watson Studio. Additionally, you can open a support case at the IBM Cloud support portal.
You might get this error message while trying to log in to Watson Studio: "Access Manager WebSEAL could not complete your request due to an unexpected error." Return to dataplatform.cloud.ibm.com and log in again. Usually the second login attempt works.
Some TensorFlow libraries are preinstalled, but if you try to install additional TensorFlow libraries yourself, you get an error.
If you try to reconnect to the kernel and immediately run a code cell (or if the kernel reconnection happened during code execution), the notebook doesn't reconnect to the kernel and no output is displayed for the code cell. You need to manually reconnect to the kernel by clicking Kernel > Reconnect. When the kernel is ready, you can try running the code cell again.
You might receive an Apache Spark error if you use the predefined sqlContext object in multiple notebooks. Create a new sqlContext object for each notebook. See this Stack Overflow explanation.
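For example, in a Python notebook you can create a notebook-specific sqlContext object. This is a minimal sketch that assumes a SparkContext is already available as sc, as it typically is in Spark notebook environments, and uses a hypothetical sample file:

from pyspark.sql import SQLContext

# Create a SQLContext that is local to this notebook instead of reusing the
# predefined sqlContext object across notebooks.
sqlContext = SQLContext(sc)
df = sqlContext.read.csv("example.csv", header=True)  # hypothetical file name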
Respawning a failed executor during a job that reads or writes Parquet on S3 causes subsequent tasks to fail because of missing AWS keys.
You might notice that the notebook kernel is not running when you open a Scala notebook that uses Spark and custom Scala libraries. This situation arises when you use Scala libraries that are not compatible with the Spark version you are using, for example, if you use a Scala 2.10 jar file in a notebook with Spark 2.1.
To avoid this situation, remove the incompatible libraries from your environment by running the following command in a notebook cell:
!rm -rvf ~/data/libs/*
If your kernel stops, your notebook is no longer automatically saved. To save it manually, click File > Save; a Notebook saved message appears in the kernel information area, which is shown before the Spark version. If you get a message that the kernel failed, click Kernel > Reconnect to reconnect your notebook to the kernel. If nothing restarts the kernel and you can't save the notebook, you can download it to save your changes by clicking File > Download as > Notebook (.ipynb). Then create a new notebook based on your downloaded notebook file.
If the notebook language (for example, Python 3.7 with Spark) isn't displayed for the notebook, the notebook kernel couldn't be started.
To verify that the Kernel Gateway to Amazon Elastic Map Reduce is started and its endpoints are accessible via the internet, run: curl https://<KG_EMR_URL>:<PORT>/api/kernelspecs -H "Authorization: token <Personal_access_token>"
The Kernel Gateway is accessible if a JSON list of the available kernels is returned. If not, you must reinstall Jupyter Kernel Gateway on Amazon EMR. For details, see Add an Amazon EMR Spark service.
If your notebook kernel will not start, your Amazon Elastic Map Reduce service might have run out of Spark resources. You can free Spark resources by stopping the kernels of notebooks you aren't using. Alternatively, you can stop all kernels by restarting the Kernel Gateway to the EMR cluster:
wget https://raw.githubusercontent.com/IBMDataScience/kernelgateway-setup/master/install_kg_emr_bootstrap_script.sh to download the Kernel Gateway setup script.
chmod +x install_kg_emr_bootstrap_script.sh to make the script executable.
./install_kg_emr_bootstrap_script.sh --restart to restart the Kernel Gateway. You will be prompted for the port number.

If you keep running into problems connecting to Amazon Elastic Map Reduce, it is best to uninstall the Kernel Gateway and install it again:

wget https://raw.githubusercontent.com/IBMDataScience/kernelgateway-setup/master/install_kg_emr_bootstrap_script.sh to download the Kernel Gateway setup script.
chmod +x install_kg_emr_bootstrap_script.sh to make the script executable.
./install_kg_emr_bootstrap_script.sh --uninstall to remove the Kernel Gateway.
./install_kg_emr_bootstrap_script.sh to install the Kernel Gateway again.

The IBM Analytics Engine service instance that you selected to use for your notebook in Watson Studio might have been deleted or might not be running. Check whether the service instance exists and is provisioned on the IBM Cloud Dashboard by clicking the navigation menu in Watson Studio and selecting Dashboard.
You can add a new IBM Analytics Engine service from your project's Settings page in the associated services section.
You can't access data from project assets in Scala notebooks that run in Spark 3.0 & Scala 2.12 environments. An error is returned when you click the Insert to code link below the asset name and select to load data into a SparkSession DataFrame. A workaround is to switch back to using a Spark 2.4 & Scala 2.11 environment.
If your notebook contains sections that you link to from an introductory section at the top of the notebook, for example, the links to these sections don't work if the notebook was opened in view-only mode in Firefox. However, if you open the notebook in edit mode, these links work.
If you added a software customization to an environment for a Satellite location, launching the environment with this customization in the Satellite location to run a notebook or a notebook job takes much longer than it takes to launch the same environment for a notebook or job in Cloud Pak for Data as a Service.
If you try to run a notebook and you see the message Connecting to Kernel, followed by Connection failed. Reconnecting, and finally by a connection failed error message, the reason might be that your firewall is blocking the notebook from running.
If Watson Studio is installed behind a firewall, you must add the WebSocket connection wss://dataplatform.cloud.ibm.com to the firewall settings. Enabling this WebSocket connection is required when you're using notebooks and RStudio.
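As a rough, unofficial check of outbound connectivity from inside your network, the following sketch verifies only that TLS traffic to the host on port 443 is not blocked; a firewall or proxy that blocks the WebSocket upgrade specifically can still cause notebook connection failures even if this check succeeds:

import socket
import ssl

host = "dataplatform.cloud.ibm.com"
context = ssl.create_default_context()
# Open a TCP connection on port 443 and complete the TLS handshake.
with socket.create_connection((host, 443), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        print("TLS connection established:", tls.version())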
You might encounter some of these issues when working with IBM Watson Machine Learning components, including the Model Builder and Flow Editor.
Currently, when you specify data references to input and output data for training, a data reference to a Cloud Object Storage bucket includes the path and credentials for the bucket, which introduces a possible security vulnerability. This issue will be addressed, but there is currently no workaround.
You can only associate a Watson Machine Learning service instance with your Watson Studio project when the Watson Machine Learning service instance and the Watson Studio instance are located in the same region.
Watson Studio does not include SPSS functionality in Peru, Ecuador, Colombia, and Venezuela.
Currently, AutoAI experiments do not support double-byte character sets. AutoAI only supports CSV files with ASCII characters. Users must convert any non-ASCII characters in the file name or content, and provide input data as a CSV as defined in this CSV standard.
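For example, one possible way (a sketch with placeholder file names, not an official procedure) to transliterate or drop non-ASCII characters in a CSV file before you use it with AutoAI:

import unicodedata

def to_ascii(text):
    # Transliterate accented characters where possible and drop anything
    # that has no ASCII equivalent.
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")

with open("training_data.csv", encoding="utf-8") as src, \
        open("training_data_ascii.csv", "w", encoding="ascii") as dst:
    for line in src:
        dst.write(to_ascii(line))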
To interact programmatically with an AutoAI model, use the REST API instead of the Python client. The APIs for the Python client required to support AutoAI are not generally available at this time.
You might encounter some of these issues when working with Cognos Dashboard Embedded.
After you import a CSV file, if you click the imported file on the data asset overview page, the data types of some columns might not be displayed correctly. For example, in a data set for a company report, a column called Revenue that contains the revenue of the company might be shown as type String instead of a more logical number-oriented data type.
The source CSV file name can contain non-alphanumeric characters. However, the CSV file name can't contain the special characters / : & < . \ ". If the file name contains these characters, they are removed from the table name.
Important: Table column names in the source CSV file can't contain any of the unsupported special characters. Those characters can't be removed because the name in the data module must match the name of the column in the source file. In this case, remove the special characters from your column names to enable using your data in a dashboard.
String values in a column in your source CSV file can be a maximum of 128 characters. If your CSV file has string columns with longer values, an error message is displayed.
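For the previous two limitations, the following sketch (placeholder file names; adjust for your data) shows one way to clean column names and truncate long string values with pandas before you import the file:

import re
import pandas as pd

df = pd.read_csv("report.csv")  # hypothetical file name
# Strip the unsupported special characters / : & < . \ " from column names.
df.columns = [re.sub(r'[/:&<.\\"]', "", name) for name in df.columns]
# Truncate string columns to the 128-character limit.
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].str.slice(0, 128)
df.to_csv("report_clean.csv", index=False)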
There are date format limitations for CSV files used in visualizations. For details, see Resolving problems when using data from CSV files in Cognos Dashboard Embedded.
When you add a visualization to a dashboard, you cannot add a data table to the visualization if you previously added (and then removed) data fields from another data table. This restriction applies to Db2, CSV tables, and other data sources.
The following functionality from IBM Cognos Analytics is not supported in dashboards:
You might encounter some of these issues when working in SPSS Modeler.
When you run an SPSS Modeler flow, you might encounter an error if you try to stop the flow from the Environments tab. To completely stop the SPSS Modeler runtime and CUH consumption, close the browser tabs where the flow is open.
When you create a new flow by importing an SPSS Modeler stream (.str file), migrate the export node, and then run the resulting Data Asset Export node, the run might fail. To work around this issue, rerun the node, change the output name, change the If the data set already exists option in the node properties, and then run the node again.
In some cases, when you use the Data Asset import node to import data from a connection, data preview might return an error if the underlying table metadata (data model) changed. Re-create the Data Asset node to resolve the issue.
When running an Extension Output node with the Output to file option selected, the resulting output file returns an error when you try to open it from the Outputs panel.
Currently, you can't preview .xls or .xlsx data from a COS connection.
Any number with a precision greater than or equal to 32 and a scale equal to 0 is interpreted as a string. If you need to change this behavior, use a Filler node to cast the field to a real number by using the expression to_real(@FIELD).
If your flow has a SuperNode that contains an Import node, the input schema might not be set correctly when you save the model with the Scoring branch option. To work around this issue, expand the SuperNode before you save.
If your flow contains an old KDE node, you might receive an error when you run it, stating that the model uses a Python package that's no longer supported. In that case, remove the old KDE node and add a new one.
When you use the Data Asset Export node to export to an SPSS Statistics SAV file (.sav), the Replace data asset option doesn't work if the input schema doesn't match the schema of the existing file that you want to replace.
You can only set field delimiter and decimal options for Data Asset nodes (.csv). These options aren't available for connections at this time.
When you run flows on a Watson Machine Learning Server, you might encounter the following issues:
If you import a stream (.str) to your flow that was created in SPSS Modeler desktop and contains one or more unsupported Import nodes, you'll be prompted to migrate the Import nodes to data assets. If the stream contains multiple Import nodes that use the same data file, then you must first add that file to your project as a data asset before migrating because the migration can't upload the same file to more than one Import node. After adding the data asset to your project, reopen the flow and proceed with the migration using the new data asset.
The Text Analytics nodes have the following issues:
In the Interactive Workbench, when you click Generate new model, a new model nugget is created in your flow. If you generate multiple models, they all have the same name, so it may be difficult to differentiate them. One recommendation is to use annotations to help identify them (double-click a model nugget to open its properties, then go to Annotations).