What's new in IBM Cloud Pak for Data?

See what new features and improvements are available in the latest release of IBM® Cloud Pak for Data.

Refresh 7 of Version 3.5

Released: July 2021

This refresh primarily includes defect fixes. However, several services, such as Analytics Engine Powered by Apache Spark, Open Data for Industries, and Watson™ Knowledge Catalog, include new or updated features.

The following list describes what is new and changed for the software and services that were refreshed in July 2021.
Cloud Pak for Data control plane 3.5.3
Version 3.5.3 of the control plane includes various fixes. For details, see What's new and changed in the control plane.
Important: To install Version 3.5.3, you must use Version 3.5.6 of the Cloud Pak for Data command-line interface (cpd-cli).
(x86_64 clusters only.) If you are upgrading from Cloud Pak for Data Version 3.0.1, you must install cpd-3.0.1-lite-patch-8 before you upgrade to Version 3.5.3 of the Cloud Pak for Data control plane. For details on installing this patch, see:
Related documentation:
Cloud Pak for Data command-line interface (cpd-cli) 3.5.6
The 3.5.6 release of the Cloud Pak for Data command-line interface includes the following features and updates:
Refreshing the Cloud Pak for Data Operator
You can now refresh the Cloud Pak for Data Operator so that you can use newer versions of the cpd-cli to patch and scale Cloud Pak for Data. Previously, if you encountered an error because of an incompatibility between the cpd-cli and the operator, you needed to upgrade Cloud Pak for Data. For details, see Cannot use the cpd-cli to patch or scale.
Cloud Pak for Data common core services 3.5.5
Version 3.5.5 of the common core services includes various fixes. For details, see What's new and changed in the common core services.
Analytics Engine Powered by Apache Spark 3.5.5
The 3.5.5 release of Analytics Engine Powered by Apache Spark includes the following features and updates:
Latest sparklyr library version installed
You can now analyze data in Spark using the latest version of the sparklyr package.
Version 3.5.5 of the Analytics Engine Powered by Apache Spark includes various fixes. For details, see What's new and changed in Analytics Engine Powered by Apache Spark.
Related documentation:
Analytics Engine Powered by Apache Spark
Data Refinery 3.5.5
The 3.5.5 release of Data Refinery includes the following features and updates:
Short videos showcase the Data Refinery GUI operations

The Data Refinery GUI operations topic now includes a short video for each operation to help you learn by example.

If you have feedback on the videos, you can submit it through the Watson Studio and Machine Learning community. (You must sign in to leave comments.)

Version 3.5.5 of the Data Refinery service includes various fixes. For details, see What's new and changed in Data Refinery.
Related documentation:
Data Refinery
DataStage® 3.5.6
Version 3.5.6 of the DataStage service includes various fixes. For details, see What's new and changed in DataStage.
Related documentation:
Decision Optimization 3.5.6
Version 3.5.6 of the Decision Optimization service includes various fixes.
Related documentation:
Decision Optimization
Execution Engine for Apache Hadoop 3.5.3
Version 3.5.3 of the Execution Engine for Apache Hadoop service includes various security fixes.
Related documentation:
Execution Engine for Apache Hadoop
Jupyter Notebooks with Python 3.7 for GPU 3.5.4
Version 3.5.4 of the Jupyter Notebooks with Python 3.7 for GPU service includes various fixes. For details, see What's new and changed in Jupyter Notebooks with Python 3.7 for GPU.
Related documentation:
Jupyter Notebooks with Python 3.7 for GPU
Jupyter Notebooks with R 3.6 3.5.4
Version 3.5.4 of the Jupyter Notebooks with R 3.6 service includes various fixes. For details, see What's new and changed in Jupyter Notebooks with R 3.6.
Related documentation:
Jupyter Notebooks with R 3.6
Open Data for Industries 2.0.2
The 2.0.2 release of Open Data for Industries includes the following features and updates:
Ingesting and governing oil and gas data with Open Data for Industries
Use the Workflow API to ingest oil and gas data files into the Open Data for Industries metadata repositories so that the data can be managed. For details, see Ingesting and governing oil and gas data with Open Data for Industries.
The following topics include steps to ingest data from sample files:
Domain data management services
Manage the data lifecycle, satisfy mandatory data access requirements, and make the data globally discoverable and retrievable through Open Data for Industries. For details, see Domain data management services APIs.

Open Data for Industries provides the following domain data management services:

Wellbore service
Manage well data with the Wellbore DDMS API. The service can search and correlate data logs based on the domain context. For details, see Wellbore API.
Seismic service
Access and store seismic data objects with the Seismic Store API. Convert data to different formats and accumulate large data sets that use standard formats. For details, see Seismic Store service.
Setting up a test environment to validate the API endpoints
Verify that your installation is working properly or run a build verification test (BVT) of your environment. For details, see Validating the Open Data for Industries API endpoints.
Security considerations for Open Data for Industries
Learn about the security mechanisms that are implemented at the Cloud Pak for Data platform level and the security mechanisms that are specific to Open Data for Industries. For details, see Getting started with Open Data for Industries.
Version 2.0.2 of the Open Data for Industries service includes various fixes.
Related documentation:
Open Data for Industries
RStudio® Server with R 3.6 3.5.5
Version 3.5.5 of the RStudio Server with R 3.6 service includes various fixes. For details, see What's new and changed in RStudio Server with R 3.6.
Related documentation:
RStudio Server with R 3.6
SPSS® Modeler 3.5.5
Version 3.5.5 of the SPSS Modeler service includes various fixes. For details, see What's new and changed in SPSS Modeler.
Related documentation:
SPSS Modeler
Watson Knowledge Catalog 3.5.6
The 3.5.6 release of Watson Knowledge Catalog includes the following features and updates:
Data discovery: quick scan results
This release includes the following changes to data discovery:
  • The View data quality permission now grants access to quick scan results.
  • You can reanalyze individual tables.
  • You can set the status Reviewed for columns.
  • You can enable overwrite of existing term assignments when results are republished. For details, see Reviewing and working with quick scan results.
Version 3.5.6 of the Watson Knowledge Catalog service includes various fixes. For details, see What's new and changed in Watson Knowledge Catalog.
Related documentation:
Watson Knowledge Catalog
Watson Machine Learning 3.5.6
Version 3.5.6 of the Watson Machine Learning service includes various fixes. For details, see What's new and changed in Watson Machine Learning.
Related documentation:
Watson Machine Learning
Watson OpenScale 3.5.4
Version 3.5.4 of the Watson OpenScale service includes various fixes. For details, see What's new and changed in Watson OpenScale.
Related documentation:
Watson OpenScale
Watson Studio 3.5.4
Version 3.5.4 of the Watson Studio service includes various fixes. For details, see What's new and changed in Watson Studio.

Refresh 6 of Version 3.5

Released: May 2021

This refresh primarily includes defect fixes. However, several services, such as Analytics Engine Powered by Apache Spark and Watson Knowledge Catalog, include new or updated features.

In addition, Version 1.1.2 of Watson Knowledge Studio is now supported on Red Hat® OpenShift® 4.6.

The following list describes what is new and changed for the software and services that were refreshed in May 2021.
Cloud Pak for Data scheduling service 1.1.4
Version 1.1.4 of the scheduling service includes various fixes. For details, see What's new and changed in the scheduling service.
Related documentation:
Analytics Engine Powered by Apache Spark 3.5.4
The 3.5.4 release of Analytics Engine Powered by Apache Spark includes the following features and updates:
New data skipping library
Data skipping now uses the open source Xskipper library to improve the performance of SQL queries in your Spark applications. For details, see Data skipping for Spark SQL.
Version 3.5.4 of the Analytics Engine Powered by Apache Spark includes various fixes. For details, see What's new and changed in Analytics Engine Powered by Apache Spark.
Related documentation:
Analytics Engine Powered by Apache Spark
DataStage 3.5.5
Version 3.5.5 of the DataStage service includes various fixes. For details, see What's new and changed in DataStage.
Related documentation:
Decision Optimization 3.5.5
Version 3.5.5 of the Decision Optimization service includes various security fixes.
Related documentation:
Decision Optimization
Guardium® External S-TAP® 11.3.1
The images for the Guardium External S-TAP service are now available on Docker Hub. (The images for the 11.2.2 release of the service are also on Docker Hub.)

For information on specifying the image location in your repo.yaml file, see Obtaining the installation files.

Version 11.3.1 of the Guardium External S-TAP service includes various fixes. For details, see What's new and changed in Guardium External S-TAP.

Related documentation:
Guardium External S-TAP
Watson Knowledge Catalog 3.5.5
The 3.5.5 release of Watson Knowledge Catalog includes the following features and updates:
Data discovery
This release includes the following changes to data discovery:
  • Improved usability and performance when editing term assignments in the discovery results view.
  • In the quick scan results interface, you can now assign business terms to more than one asset at once. See Working with quick scan results.
  • Publishing discovery results can now be restricted to users with admin or edit access to the default catalog. See Restricting publishing of discovery results.
Data quality
When you download a data asset from a data quality project, only assigned terms are included.
Version 3.5.5 of the Watson Knowledge Catalog service includes various fixes. For details, see What's new and changed in Watson Knowledge Catalog.
Related documentation:
Watson Knowledge Catalog

Refresh 5 of Version 3.5

Released: April 2021

This refresh primarily includes defect fixes. However, several services, such as IBM Open Data for Industries and Watson Studio, include new features. In addition, Financial Crimes Insight® is now available on Cloud Pak for Data Version 3.5.

The following list describes what is new and changed for the software and services that were refreshed in April 2021.
Cloud Pak for Data command-line interface (cpd-cli) 3.5.4
The 3.5.4 release of the Cloud Pak for Data command-line interface includes the following features and updates:
Support for a different temporary download directory
When you download files for air-gapped environments, the cpd-cli uses /var/tmp for temporary storage. However, if you don't have sufficient space in your /var/tmp directory, the download can fail.
In the latest release of the cpd-cli, you can specify a different temporary directory with more space. For details, see:
Version 3.5.4 of the Cloud Pak for Data command-line interface includes various fixes. For details, see What's new with the Cloud Pak for Data command-line interface (cpd-cli).
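Before a large download, it can help to check the free space in the default temporary directory and, if needed, stage a directory on a larger filesystem. In this sketch, `$HOME/cpd-tmp` is only a placeholder path; the actual cpd-cli option for pointing at an alternative directory is described in the linked topics.

```shell
# Check free space in the default temporary download directory.
df -h /var/tmp

# If space is tight, create a directory on a larger filesystem to use
# instead ("$HOME/cpd-tmp" is a placeholder, not a cpd-cli default).
mkdir -p "$HOME/cpd-tmp"
```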
Cloud Pak for Data common core services 3.5.4
The 3.5.4 release of the common core services includes changes to support features and updates in Watson Studio and Watson Knowledge Catalog.
In addition, the common core services includes the following features and updates:
Global search to support word or phrase searches
You can use search to find results that match the following elements:
  • A partial word anywhere in the data value (beginning, middle or end)
  • Any keyword that is provided
  • An exact match for all keywords that are provided
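As an illustration, the three matching behaviors can be sketched in a few lines. This is only a model of the described semantics, not the platform's actual search implementation:

```python
import re

def matches(query: str, value: str, mode: str = "any") -> bool:
    """Sketch of the three search behaviors described above.

    mode="partial": the query appears anywhere inside the value
    mode="any":     at least one keyword from the query appears in the value
    mode="exact":   every keyword from the query appears in the value
    """
    value_lower = value.lower()
    if mode == "partial":
        return query.lower() in value_lower
    keywords = query.lower().split()
    words = set(re.findall(r"\w+", value_lower))
    if mode == "any":
        return any(k in words for k in keywords)
    if mode == "exact":
        return all(k in words for k in keywords)
    raise ValueError(f"unknown mode: {mode}")
```

For example, a partial search for "ales" matches the asset name "monthly_sales_2021", while an exact search for "customer revenue" matches only values that contain both keywords.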

If you install or upgrade a service that requires the common core services, the common core services will also be installed or upgraded.

Version 3.5.4 of the common core services includes various fixes. For details, see What's new and changed in the common core services.
Cloud Pak for Data scheduling service 1.1.3
Version 1.1.3 of the scheduling service includes minor fixes.
Related documentation:
Analytics Engine Powered by Apache Spark 3.5.3
Version 3.5.3 of the Analytics Engine Powered by Apache Spark includes various fixes. For details, see What's new and changed in Analytics Engine Powered by Apache Spark.
Related documentation:
Analytics Engine Powered by Apache Spark
Cognos® Analytics 3.5.3
The 3.5.3 release of Cognos Analytics includes the following features and updates:
Audit database option
Starting in the 3.5.3 release, you can optionally log events from Cognos Analytics to a relational database. For details, see Accessing the audit reports.
New plan size for proof of concept deployments
When you provision an instance of Cognos Analytics, you can specify the size of the instance. Starting in the 3.5.3 release, you can choose the Fixed minimum plan for evaluation and proof of concept deployments.

For details, see Provisioning the Cognos Analytics service.

New font for reports
You can now author reports with the IBM Plex font.

Version 3.5.3 of the Cognos Analytics service includes various fixes. For details, see What's new and changed in Cognos Analytics.

Related documentation:
Cognos Analytics
Data Refinery 3.5.4
Version 3.5.4 of the Data Refinery service includes various fixes. For details, see What's new and changed in Data Refinery.
Related documentation:
Data Refinery
Db2® Data Gate 1.1.3
The 1.1.3 release of Db2 Data Gate includes the following features and updates:
Automatic index creation
When you add a table to Db2 Data Gate, the service automatically creates an index for the table. The index enables faster synchronization between source tables and target tables.
Use Cloud Pak for Data credentials for the target database
You can now grant Cloud Pak for Data users access to the target database, which lets you select specific users and control their access permissions. Previously, access was available only through an internal user ID that Db2 Data Gate created, and you could not influence how that user was created.
Version 1.1.3 of the Db2 Data Gate service includes minor bug fixes.
Related documentation:
Db2 Data Gate
Decision Optimization 3.5.4
Version 3.5.4 of the Decision Optimization service includes various fixes. For details, see What's new and changed in Decision Optimization.
Related documentation:
Decision Optimization
Financial Crimes Insight 6.6.0
Financial Crimes Insight is now available on IBM Cloud Pak for Data 3.5.
The Financial Crimes Insight offering includes the following components:
IBM Financial Crimes Insight
Financial Crimes Insight combines AI, big data, and automation with input from regulatory experts to make it easier to detect and mitigate financial crimes.

Install the base offering, Financial Crimes Insight, to proactively detect, intercept, and prevent attempted fraud and financial crimes.

After installing the base offering, you can install one or more of the optional components that support your use case.

This release of Financial Crimes Insight includes added support for related cases in the network graph.

In addition, Financial Crimes Insight has adopted Carbon, IBM's open source design system for digital products and experiences. Carbon makes it easier for you to know exactly how the interface will behave and makes Financial Crimes Insight feel like other services in IBM Cloud Pak for Data.

IBM Financial Crimes Insight for Claims Fraud
This component helps you deter, prevent, and intercept multiple types of insurance fraud. IBM Financial Crimes Insight for Claims Fraud enables you to improve detection processes and decision making, expand the observation space within the institution, speed investigations, and achieve faster claims resolution for customers.
This release of IBM Financial Crimes Insight for Claims Fraud includes:
  • Added support for integration with the National Insurance Crime Bureau (NICB)
  • New user interface functionality to perform watchlist maintenance
IBM Financial Crimes Insight for Entity Research
This component enables your organization to improve the customer experience throughout the customer lifecycle, from onboarding through periodic review, by streamlining and automating Know Your Customer (KYC) and customer due diligence (CDD) compliance processes. By augmenting existing systems instead of replacing them, financial institutions can take advantage of AI, automation, and modern capabilities while realizing a faster time to value.
This release of IBM Financial Crimes Insight for Entity Research includes:
  • A new API for monitoring material data changes from both structured and unstructured data sources, including API Orchestration framework
  • Improved Negative News analytics, including the grouping model
  • Added save and dismiss decision updates for improved audit log enhancements
IBM Financial Crimes Insight for Alert Triage - Transaction List Screening
This component augments existing sanctions screening systems by analyzing alerted transactions using a configurable and extendable API-driven pipeline. Transaction data is cleaned, parsed, and wrangled, then processed through heuristics and cognitive computing techniques. The results are used to score hits, identify false positives, and return informative, customizable insights.
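As a toy illustration of that flow (clean and parse the party name, score it against a watchlist, and flag low-scoring hits as candidate false positives), the sketch below uses a simple string-similarity heuristic. The watchlist entries, threshold, and scoring logic are invented for the example and are not Financial Crimes Insight's actual analytics.

```python
from difflib import SequenceMatcher

# Hypothetical sample watchlist entries (illustration only).
WATCHLIST = ["ACME TRADING LTD", "JOHN Q SMITH"]

def normalize(name: str) -> str:
    # "Clean, parse, and wrangle": uppercase, strip punctuation,
    # and collapse whitespace.
    kept = "".join(c for c in name.upper() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def score_hit(transaction_party: str, threshold: float = 0.85) -> dict:
    # Score the party name against every watchlist entry and keep the best.
    party = normalize(transaction_party)
    best = max(WATCHLIST, key=lambda w: SequenceMatcher(None, party, w).ratio())
    score = SequenceMatcher(None, party, best).ratio()
    # Low-scoring hits are candidate false positives for an analyst to review.
    return {"match": best, "score": round(score, 2),
            "likely_false_positive": score < threshold}
```

A production pipeline would layer many such heuristics and model-driven signals; the point here is only the shape of the scoring step.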
Version 6.6.0 of the Financial Crimes Insight service includes various fixes. For details, see What's new and changed in Financial Crimes Insight.
Related documentation:
Financial Crimes Insight
Open Data for Industries 2.0.0
The 2.0.0 release of Open Data for Industries includes the following features and updates:
New endpoints in the API
The following methods were added to the Open Data for Industries API:
  • Use the File endpoint to fetch records or request file location data. For details, see File API.
  • Use the Indexer endpoint to index records for efficient search. For details, see Indexer API.
  • Use the Workflow endpoint to provide mechanisms to define, manage, and run process definitions. For details, see Workflow API.
Support for logging and monitoring
In addition to the logging and monitoring mechanisms provided by the Cloud Pak for Data platform, you can now view logs and monitor activity at the service level by using the EFK stack.

For details, see Logs and monitoring in Open Data for Industries.

New version of Elasticsearch
Elasticsearch 7.11.1 is now included in the Open Data for Industries software utilities.

For details, see Installing software utilities.

New post-installation script
Open Data for Industries includes a new script, postInstall_script.sh, that you must run after you install the service. This script configures the environment so the service can run properly.

For details, see Configuring and validating the service.

Related documentation:
Open Data for Industries
RStudio Server with R 3.6 3.5.3
Version 3.5.3 of the RStudio Server with R 3.6 service includes various fixes. For details, see What's new and changed in RStudio Server with R 3.6.
Watson Machine Learning 3.5.3
The 3.5.3 release of Watson Machine Learning includes the following features and updates:
Support for new frameworks and software specifications

For details, see Specifying a model type and software specification.

Version 3.5.3 of the Watson Machine Learning service includes various fixes. For details, see What's new and changed in Watson Machine Learning.
Related documentation:
Watson Machine Learning
Watson Studio 3.5.3
The 3.5.3 release of Watson Studio includes the following features and updates:
The Default Python 3.7 environment includes new open source library versions
The Default Python 3.7 environment now includes the latest open source versions of many popular machine learning libraries, such as TensorFlow, XGBoost, and PyTorch.

The old Default Python 3.7 environment is deprecated and has been renamed Default Python 3.7 (legacy). Start using the new Default Python 3.7 environment to run your existing notebooks.
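One way to see exactly which versions an environment gives you is to query installed package metadata from a notebook. The library names below come from the list above, and the sketch simply reports any that are absent:

```python
import importlib.metadata

def library_versions(libs=("tensorflow", "xgboost", "torch")):
    """Report the installed version of each library, or note its absence."""
    versions = {}
    for lib in libs:
        try:
            versions[lib] = importlib.metadata.version(lib)
        except importlib.metadata.PackageNotFoundError:
            versions[lib] = "not installed"
    return versions
```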

Version 3.5.3 of the Watson Studio service includes various fixes. For details, see What's new and changed in Watson Studio.
Related documentation:
Watson Studio

Refresh 4 of Version 3.5

Released: March 2021

This refresh primarily includes defect fixes. However, several services, such as OpenPages®, Planning Analytics, and SPSS Modeler also include new features.

The following list describes what is new and changed for the software and services that were refreshed in March 2021.
Cloud Pak for Data command-line interface (cpd-cli) 3.5.3
Version 3.5.3 of the Cloud Pak for Data command-line interface includes various fixes. For details, see What's new with the Cloud Pak for Data command-line interface (cpd-cli).
Cognos Dashboards 3.5.2

Version 3.5.2 of the Cognos Dashboards service includes minor fixes.

Related documentation:
Cognos Dashboards
DataStage 3.5.4

Version 3.5.4 of the DataStage service includes various fixes. For details, see What's new and changed in DataStage.

Related documentation:
Db2 3.5.4
The 3.5.4 release of Db2 includes the following features and updates:
Support for IBM Spectrum® Scale
Db2 now supports IBM Spectrum Scale storage on the Linux® on IBM Z® platform.

Version 3.5.4 of the Db2 service includes minor fixes.

Related documentation:
Db2
Db2 Event Store 2.0.1.1
The 2.0.1.1 release of Db2 Event Store includes the following features and updates:
Automatic fallback mechanism for determining container IP address
If you installed Db2 Event Store on nodes that use a different network interface than the default Kubernetes network interface, it can be difficult to determine the IP addresses of the containers.

If Db2 Event Store cannot reach the nodes by using the values in the db2nodes.cfg file, the service prints the correct IP addresses to the OpenShift console.
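For reference, db2nodes.cfg follows Db2's partitioned-database format of one line per node: node number, hostname, logical port, and an optional netname for a separate interconnect. The hostnames below are placeholders:

```
0 host1.example.com 0 host1-priv
1 host2.example.com 0 host2-priv
```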

Version 2.0.1.1 of the Db2 Event Store service includes minor bug fixes.

Related documentation:
Db2 Event Store
Db2 Warehouse 3.5.4
The 3.5.4 release of Db2 Warehouse includes the following features and updates:
Support for IBM Spectrum Scale
Db2 Warehouse now supports IBM Spectrum Scale storage on the Linux on IBM Z platform.

Version 3.5.4 of the Db2 Warehouse service includes minor bug fixes.

Related documentation:
Db2 Warehouse
OpenPages 8.2.2
The 8.2.2 release of OpenPages includes the following features and updates:
Support for Portworx 2.6.2
If you are using Red Hat OpenShift Version 4.x, you can install OpenPages with Portworx Essentials for IBM Version 2.6.2.
Provision multiple instances
You can now deploy multiple instances of the OpenPages service from a single installation. For more information, see Installing OpenPages.

For additional information about changes in OpenPages, see New features in version 8.2.0.2 in the OpenPages with Watson product documentation.

Version 8.2.2 of the OpenPages service includes various fixes. For details, see What's new and changed in OpenPages.

Related documentation:
OpenPages
Planning Analytics 3.5.2
The 3.5.2 release of Planning Analytics includes the following features and updates:
Improved look and feel
The updated Planning Analytics Workspace interface makes it easier to accomplish tasks, provides a more consistent experience with other IBM products, and simplifies the transition between Planning Analytics Workspace and Cognos Analytics.

Planning Analytics Workspace has adopted Carbon, IBM's open source design system for digital products and experiences. Carbon makes it easier for you to know exactly how the interface will behave and makes Planning Analytics Workspace feel like other services in IBM Cloud Pak for Data.

To learn more about the interface changes, see Improved look and feel in the Planning Analytics product documentation.

Assign a unique name to the internal TM1 database
In previous versions, the name that you assigned to the Planning Analytics service instance was also assigned to the internal TM1 database.

You can now assign a unique name to the internal database when you create the service instance. For details, see Provisioning the Planning Analytics service.

Version 3.5.2 of the Planning Analytics service includes various fixes. For details, see What's new and changed in Planning Analytics.

Related documentation:
Planning Analytics
SPSS Modeler 3.5.2
The 3.5.2 release of SPSS Modeler includes the following features and updates:
Performance improvements
The performance of reading and writing data has been improved.
SQL pushback for Snowflake
SQL pushback is supported for connections to Snowflake data sources running on x86 Linux.

For details, see SQL optimization.

PostgreSQL support
You can now connect to PostgreSQL data sources.

For details, see Supported data sources for SPSS Modeler.

Extension nodes
You can now load Python or R libraries to extension nodes.

For details, see Extension nodes.

Version 3.5.2 of the SPSS Modeler service includes various fixes. For details, see What's new and changed in SPSS Modeler.

Related documentation:
SPSS Modeler
Watson Knowledge Catalog 3.5.4
The 3.5.4 release of Watson Knowledge Catalog includes the following features and updates:
Search column descriptions
You can now use the global search bar to search for terms in the column descriptions of new data assets.
Publish results at schema level
In quick scan results, you can now publish entire schemas instead of selecting the tables of a schema individually for publishing. For details, see Reviewing and working with the quick scan results.

Version 3.5.4 of the Watson Knowledge Catalog service includes various fixes. For details, see What's new and changed in Watson Knowledge Catalog.

Related documentation:
Watson Knowledge Catalog
Watson Machine Learning 3.5.2

Version 3.5.2 of the Watson Machine Learning service includes various fixes. For details, see What's new and changed in Watson Machine Learning.

Related documentation:
Watson Machine Learning
Watson OpenScale 3.5.3
Version 3.5.3 of the Watson OpenScale service includes minor bug fixes.
Related documentation:
Watson OpenScale
Watson Speech to Text 1.2.1
Version 1.2.1 of the Watson Speech to Text service includes minor bug fixes.
Related documentation:
Watson Speech to Text
Watson Text to Speech 1.2.1
Version 1.2.1 of the Watson Text to Speech service includes minor bug fixes.
Related documentation:
Watson Text to Speech

Refresh 3 of Version 3.5

Released: February 2021

This refresh primarily includes defect fixes. However, several services, such as Informix® and Guardium External S-TAP, also include support for newer software releases.

The following list describes what is new and changed for the software and services that were refreshed in February 2021.
Cloud Pak for Data common core services 3.5.3

Version 3.5.3 of the common core services includes various fixes. For details, see What's new and changed in the common core services.

Cloud Pak for Data scheduling service 1.1.2

Version 1.1.2 of the scheduling service includes various fixes. For details, see What's new and changed in the scheduling service

Related documentation:
Data Refinery 3.5.3

Version 3.5.3 of the Data Refinery service includes minor fixes.

Related documentation:
Data Refinery
DataStage 3.5.3

Version 3.5.3 of the DataStage service includes minor bug fixes.

Db2 Data Management Console 3.5.3

Version 3.5.3 of the Db2 Data Management Console service includes various fixes. For details, see What's new and changed in Db2 Data Management Console.

Related documentation:
Db2 Data Management Console
Guardium External S-TAP 11.3.0
The 11.3.0 release of Guardium External S-TAP includes the following features and updates:
Version upgrade
This version of the Guardium External S-TAP service includes support for Guardium 11.3. For details, see the Guardium 11.3 documentation.

Version 11.3.0 of the Guardium External S-TAP service includes various fixes. For details, see What's new and changed in Guardium External S-TAP.

Related documentation:
Guardium External S-TAP
Informix 3.5.0
The 3.5.0 release of Informix includes the following features and updates:
Version upgrade
This version of the Informix service runs Informix Version 14.10.FC5.
New license upgrade mechanism
The Informix service runs the Developer Edition. After you provision an instance of the service, you can upgrade to a different edition, such as Enterprise Edition.
Storage enhancements
Informix now supports the following storage options:
  • Red Hat OpenShift Container Storage 4.5
  • Portworx
Security enhancements
The Informix service now uses TLS encryption for all the exposed APIs.

Version 3.5.0 of the Informix service includes minor bug fixes.

Related documentation:
Informix
Jupyter Notebooks with Python 3.7 for GPU 3.5.2

Version 3.5.2 of the Jupyter Notebooks with Python 3.7 for GPU service includes minor bug fixes.

Related documentation:
Jupyter Notebooks with Python 3.7 for GPU
Jupyter Notebooks with R 3.6 3.5.2

Version 3.5.2 of the Jupyter Notebooks with R 3.6 service includes minor bug fixes.

Related documentation:
Jupyter Notebooks with R 3.6
Open Data for Industries 1.1.0
The 1.1.0 release of Open Data for Industries includes the following features and updates:
New API to manage schemas
Schema management is now fully decoupled from the Storage API. Use the new Schema API to define and enforce sets of attributes on functional data. For details, see Schema API.
Bulk user creation
A new installation utility lets you create user records in the service's identity provider, including multiple user records in a single operation. For details, see Installing software utilities.
Related documentation:
Open Data for Industries
RStudio Server with R 3.6 3.5.2

Version 3.5.2 of the RStudio Server with R 3.6 service includes minor bug fixes.

Related documentation:
RStudio Server with R 3.6
Watson Discovery 2.2.1
Version 2.2.1 of the Watson Discovery service includes various security fixes.
Related documentation:
Watson Discovery
Watson Knowledge Catalog 3.5.3
The 3.5.3 release of Watson Knowledge Catalog includes the following features and updates:
Additional data source for discovery
Apache Kudu data sources are now supported for automated discovery and quick scan.

Version 3.5.3 of the Watson Knowledge Catalog service includes various fixes. For details, see What's new and changed in Watson Knowledge Catalog.

Related documentation:
Watson Knowledge Catalog
Watson OpenScale 3.5.0, 3.5.1
Deprecation notice
As of 15 March 2021, Watson OpenScale requires a new API version; the V1 API and SDK are deprecated and disabled.

For details on the V2 REST API, see the Watson OpenScale API documentation.

For details on the V2 Python SDK, see the Watson OpenScale Python SDK 3.0.3 documentation.

Related documentation:
Watson OpenScale
Watson Studio 3.5.2

Version 3.5.2 of the Watson Studio service includes various fixes. For details, see What's new and changed in Watson Studio.

Related documentation:
Watson Studio

Refresh 2 of Version 3.5

Released: January 2021

This refresh includes support for Red Hat OpenShift Container Platform 4.6. In many cases, you must install the January 2021 release of the software to run on OpenShift 4.6.

The following services are not currently available on OpenShift 4.6:
  • Watson Assistant
  • Watson Assistant for Voice Interaction
  • Watson Discovery
  • Watson Knowledge Studio
  • Watson Language Translator
  • Watson Speech to Text
  • Watson Text to Speech
The following services have not been refreshed but can be installed on OpenShift 4.6:
  • Analytics Engine Powered by Apache Spark (Version 3.5.0)
  • Data Virtualization (Version 1.5.0)
  • Db2 Big SQL (Version 7.1.1)
  • Db2 Event Store (Version 2.0.1)
  • Db2 for z/OS® Connector (Version 3.2.2)
  • EDB Postgres (Version 2.0.0)
  • Edge Analytics (1.1.0 beta)
  • Guardium External S-TAP (Version 11.2.0)
  • Master Data Connect (Version 1.0.0)
  • MongoDB (Version 3.5.0)
  • Open Data for Industries (Version 1.0.0)
  • Streams Flows (Version 3.5.0)

The following table introduces what is new and changed for the software and services that released a refresh in January 2021.

Software Assembly version What does it mean for me?
Cloud Pak for Data control plane 3.5.2
The following versions of the Cloud Pak for Data control plane can run on Red Hat OpenShift 4.6:
  • Version 3.5.1
  • Version 3.5.2

Version 3.5.2 of the control plane also includes the following features and updates:

Quota enforcement
If you want to programmatically enforce the quotas that you set for Cloud Pak for Data or for various Cloud Pak for Data services, you must install Version 1.1.1 of the scheduling service on your cluster.

For details on quota enforcement, see Managing the platform.

For details on installing the scheduling service, see Setting up the scheduling service.

Support for Portworx 2.6.2
If you use Portworx Essentials for IBM, you must use Portworx Version 2.6.2.

Cloud Pak for Data Version 3.5.2 customers are entitled to download and use Portworx Essentials for IBM Version 2.6.2.

For details, see Setting up Portworx storage.

Version 3.5.2 of the control plane includes various fixes. For details, see What's new and changed in the control plane.

Related documentation:
Cloud Pak for Data command-line interface (cpd-cli) 3.5.2

You must install Version 3.5.2 of the cpd-cli to install Cloud Pak for Data on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

New flag for air-gapped environments
When you run the cpd-cli preload-images command, you can optionally specify the --include-patches flag, which downloads the latest patch, if one is available. This reduces the number of commands that you must run to get the latest version of the software installed on your cluster.

For installations, see Preparing for air-gapped installations.

For upgrades, see Preparing for air-gapped upgrades.

No need to specify the --storageclass flag during upgrade
When you upgrade a service with the cpd-cli, you no longer need to specify the --storageclass flag. The upgrade command uses the storage class that was specified when you installed the service; if you do specify the --storageclass flag during an upgrade, the value is ignored and the existing value is used.
Cloud Pak for Data common core services 3.5.2

This release of the common core services includes changes to support features and updates in Watson Studio and Watson Knowledge Catalog.

If you install or upgrade a service that requires the common core services, the common core services will also be installed or upgraded.

Version 3.5.2 of the common core services includes various fixes. For details, see What's new and changed in the common core services.

Cloud Pak for Data scheduling service 1.1.1

You must install Version 1.1.1 of the scheduling service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

Quota enforcement
If you want to programmatically enforce the quotas that you set for Cloud Pak for Data or for various Cloud Pak for Data services, you must install Version 1.1.1 of the scheduling service on your cluster.

For details on quota enforcement, see Managing the platform.

For details on installing the scheduling service, see Setting up the scheduling service.

Co-scheduling of pods
(For Watson Machine Learning Accelerator.) The ability to co-schedule pods is used by parallel and AI workloads in Watson Machine Learning Accelerator to:
  • Guarantee that all pods can start
  • Remove resource deadlock
  • Enable workloads to grow and shrink
  • Support reclaiming pods in the event of resource contention
Improved GPU sharing
(For Watson Machine Learning Accelerator.) The scheduling service allows GPUs to be shared between competing groups, which improves GPU utilization. Sharing policies govern how to resolve resource contention.
Related documentation:
Cognos Analytics 3.5.2

You must install Version 3.5.2 of the Cognos Analytics service if you want to install the service on Red Hat OpenShift 4.6.

This release provides software version 11.1.7 Fix Pak 2 of Cognos Analytics.

Version 3.5.2 of the Cognos Analytics service includes various fixes. For details, see What's new and changed in Cognos Analytics.

Related documentation:
Cognos Analytics
Cognos Dashboards 3.5.1

You must install Version 3.5.1 of the Cognos Dashboards service if you want to install the service on Red Hat OpenShift 4.6.

Version 3.5.1 of the Cognos Dashboards service includes various fixes. For details, see What's new and changed in Cognos Dashboards.

Related documentation:
Cognos Dashboards
Data Refinery 3.5.2
Version 3.5.2 of the Data Refinery service is installed when you install either of the following services:
  • Watson Studio Version 3.5.1
  • Watson Knowledge Catalog Version 3.5.2

Version 3.5.2 of the Data Refinery service includes various fixes. For details, see What's new and changed in Data Refinery.

Related documentation:
Data Refinery
DataStage 3.5.2

You must install Version 3.5.2 of the DataStage service if you want to install the service on Red Hat OpenShift 4.6.

Version 3.5.2 of the DataStage service includes various fixes. For details, see What's new and changed in DataStage.

Related documentation:
Db2 3.5.2

You must install Version 3.5.2 of the Db2 service if you want to install the service on Red Hat OpenShift 4.6.

Related documentation:
Db2
Db2 Data Gate 1.1.2
The following versions of the Db2 Data Gate service can run on Red Hat OpenShift 4.6:
  • Version 1.1.1
  • Version 1.1.2

Version 1.1.2 of the Db2 Data Gate service includes minor bug fixes.

Related documentation:
Db2 Data Gate
Db2 Data Management Console 3.5.2

You must install Version 3.5.2 of the Db2 Data Management Console service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

Upgrade support
If you have a previous release of the Db2 Data Management Console service installed, you can use the cpd-cli to upgrade to Version 3.5.2.

For details, see Upgrading Db2 Data Management Console.

Version 3.5.2 of the Db2 Data Management Console service includes various fixes. For details, see What's new and changed in Db2 Data Management Console.

Related documentation:
Db2 Data Management Console
Db2 Warehouse 3.5.2

You must install Version 3.5.2 of the Db2 Warehouse service if you want to install the service on Red Hat OpenShift 4.6.

Related documentation:
Db2 Warehouse
Decision Optimization 3.5.1

You must install Version 3.5.1 of the Decision Optimization service if you want to install the service on Red Hat OpenShift 4.6.

Version 3.5.1 of the Decision Optimization service includes various fixes. For details, see What's new and changed in Decision Optimization.

Related documentation:
Decision Optimization
Execution Engine for Apache Hadoop 3.5.1

You must install Version 3.5.1 of the Execution Engine for Apache Hadoop service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

Support for Cloudera 7.1.x
Execution Engine for Apache Hadoop Version 3.5.1 supports the Cloudera 7.1.x platform.

Version 3.5.1 of the Execution Engine for Apache Hadoop service includes various fixes. For details, see What's new and changed in Execution Engine for Apache Hadoop.

Related documentation:
Execution Engine for Apache Hadoop
Jupyter Notebooks with Python 3.7 for GPU 3.5.1

You must install Version 3.5.1 of the Jupyter Notebooks with Python 3.7 for GPU service if you want to install the service on Red Hat OpenShift 4.6.

Related documentation:
Jupyter Notebooks with Python 3.7 for GPU
Jupyter Notebooks with R 3.6 3.5.1

You must install Version 3.5.1 of the Jupyter Notebooks with R 3.6 service if you want to install the service on Red Hat OpenShift 4.6.

Related documentation:
Jupyter Notebooks with R 3.6
OpenPages 8.2.1

You must install Version 8.2.1 of the OpenPages service if you want to install the service on Red Hat OpenShift 4.6.

Version 8.2.1 of the OpenPages service includes various fixes. For details, see What's new and changed in OpenPages.

Related documentation:
OpenPages
Planning Analytics 3.5.1

You must install Version 3.5.1 of the Planning Analytics service if you want to install the service on Red Hat OpenShift 4.6.

Version 3.5.1 of the Planning Analytics service includes various fixes. For details, see What's new and changed in Planning Analytics.

Related documentation:
Planning Analytics
RStudio Server with R 3.6 3.5.1

You must install Version 3.5.1 of the RStudio Server with R 3.6 service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

Support for Spark 3.0
You can now connect to a Spark 3.0 kernel from RStudio.
Related documentation:
RStudio Server with R 3.6
SPSS Modeler 3.5.1

You must install Version 3.5.1 of the SPSS Modeler service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

Support for Python 3.8.6
SPSS Modeler supports Python 3.8.6 for scripting and in Extension nodes.
Importing and exporting flows
When you import or export a project, you can optionally include SPSS Modeler flows.
Connecting to SPSS Analytic Server
You can write data to an SPSS Analytic Server connection. If the data already exists in the connection, you can choose to stop with an error.
New node
The Streaming TMC node enables you to build and score temporal causal models in a single step. For details, see TMC node.
Performance improvements
Performance has been improved when importing data from a TM1 database.
New scripting properties
You can now use scripting in Data Asset Import nodes and Data Asset Export nodes. For details see:

Version 3.5.1 of the SPSS Modeler service includes various fixes. For details, see What's new and changed in SPSS Modeler.

Related documentation:
SPSS Modeler
Streams 5.5.1

You must install Version 5.5.1 of the Streams service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

Default settings for data throughput
You can configure the default application service resource CPU and memory settings to handle your data throughput. You can configure these settings by editing the Streams instance details in the Cloud Pak for Data web client.
Data retrieval endpoint improvements
If you use data retrieval endpoints for one-time use data, you can configure the endpoint operator to remove buffered data as it is read. Removing the buffered data can help with scalability and allow for simpler client API code. To configure the endpoint, you use the new consumingReads parameter on the EndpointSink operator.
REST API improvements
To support greater interoperability with other data sources, the application service REST API now supports the following features:
  • User-configurable list names
  • Non-list data item input
  • Additional options for how data items are returned
Manage access to Streams application services
You can manage access to Streams application services from the Instances list in the Cloud Pak for Data web client.

For details, see Streams application services.

Version 5.5.1 of the Streams service includes various fixes. For details, see What's new and changed in Streams.

Related documentation:
Streams
Watson Knowledge Catalog 3.5.2

You must install Version 3.5.2 of the Watson Knowledge Catalog service if you want to install the service on Red Hat OpenShift 4.6.

Version 3.5.2 of Watson Knowledge Catalog also includes the following features and updates:
Support for Microsoft SQL Server
The synchronization of assets in the default catalog and information assets for Microsoft SQL Server connections is now supported.

Version 3.5.2 of the Watson Knowledge Catalog service includes various fixes. For details, see What's new and changed in Watson Knowledge Catalog.

Related documentation:
Watson Knowledge Catalog
Watson Machine Learning 3.5.1

You must install Version 3.5.1 of the Watson Machine Learning service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

CPDCTL for model lifecycle management
You can now use the CPDCTL command-line interface to manage the lifecycle of models. By using the CLI, you can manage configuration settings and automate an end-to-end flow that includes training a model, saving it, creating a deployment space, and deploying the model.

For details on using the CPDCTL commands in a notebook, see Notebooks.

Related documentation:
Watson Machine Learning
Watson Machine Learning Accelerator 2.2.1

You must install Version 2.2.1 of the Watson Machine Learning Accelerator service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

Upgrade support
If you have a previous release of the Watson Machine Learning Accelerator service installed on Cloud Pak for Data Version 3.5, you can use the cpd-cli to upgrade to Version 2.2.1.

For details, see Upgrading Watson Machine Learning Accelerator.

Version 2.2.1 of the Watson Machine Learning Accelerator service includes various fixes. For details, see What's new and changed in Watson Machine Learning Accelerator.

Related documentation:
Watson Machine Learning Accelerator
Watson OpenScale 3.5.1

You must install Version 3.5.1 of the Watson OpenScale service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

Batch processing
Configure Watson OpenScale to work in batch mode by connecting a custom Watson OpenScale machine learning engine to an Apache Hive database and an Apache Spark analytics engine. Unlike online scoring, where the scoring is done in real-time and the payload data can be logged into the Watson OpenScale data mart, batch processing is done asynchronously. The batch processor reads the scored data and derives various model metrics. This means that millions of transactions can be processed without bringing the data into the data mart.

For details, see Batch processing overview.

This feature was previously released in the cpd-aiopenscale-3.5.0-patch-1 patch.

Version 3.5.1 of the Watson OpenScale service includes various fixes. For details, see What's new and changed in Watson OpenScale.

Related documentation:
Watson OpenScale
Watson Studio 3.5.1

You must install Version 3.5.1 of the Watson Studio service if you want to install the service on Red Hat OpenShift 4.6.

In addition, this release includes the following features and updates:

CPDCTL for notebook lifecycle management
You can now use the CPDCTL command-line interface to manage the lifecycle of notebooks. By using the notebook CLI, you can automate the flow for creating notebooks or scripts and running jobs, moving notebooks between projects in Watson Studio, and adding custom libraries to notebook runtime environments.

For details on using the CPDCTL commands in a notebook, see Notebooks.

Version 3.5.1 of the Watson Studio service includes various fixes. For details, see What's new and changed in Watson Studio.

Related documentation:
Watson Studio

Refresh 1 of Version 3.5

Released: December 2020

Service What's new
Watson OpenScale
Batch processing
Configure Watson OpenScale to work in batch mode by connecting a custom Watson OpenScale machine learning engine to an Apache Hive database and an Apache Spark analytics engine. Unlike online scoring, where the scoring is done in real-time and the payload data can be logged into the Watson OpenScale data mart, batch processing is done asynchronously. The batch processor reads the scored data and derives various model metrics. This means that millions of transactions can be processed without bringing the data into the data mart. To enable batch processing, you must apply the cpd-aiopenscale-3.5.0-patch-1 patch.

For details about installing the patch, see Available patches for Watson OpenScale. For details about using the batch processor, see Batch processing.

What's new in Version 3.5

IBM Cloud Pak for Data 3.5 introduces a home page that is now customizable, simpler navigation, improved platform and production workload management, and broader support for connections from the platform with easier connection management. The release also includes support for zLinux, a vault to store sensitive data, several new services, and numerous updates to existing services.

Platform enhancements

The following table lists the new features that were introduced in Cloud Pak for Data Version 3.5.

What's new What does it mean for me?
Customize the home page
In Cloud Pak for Data Version 3.5, you can customize the home page in two ways:
Platform-level customization
A Cloud Pak for Data administrator can specify which cards and links to display on the home page.
Cards
The cards that are available from the home page are determined by the services that are installed on the platform.

You can disable cards if you don't want users to see them. The changes apply to all users. However, the cards that an individual user sees are determined by their permissions and the services that they have access to.

Resource links
You can customize the links that are displayed in the Resources section of the home page.

For details, see Customizing the home page.

Personal customization
Each user can specify the cards that are displayed on their home page. (However, the list of cards that they can choose from is determined by the Cloud Pak for Data administrator.)

In addition, each user can specify which links to display in the Quick navigation section of their home page.

Home page

These features are offered in addition to the branding features that were introduced in Cloud Pak for Data 3.0.1.

Create user groups
A Cloud Pak for Data administrator can create user groups to make it easier to manage large numbers of users who need similar permissions.

When you create a user group, you specify the roles that all of the members of the group have.

If you configure a connection to an LDAP server, user groups can include:
  • Existing platform users
  • LDAP users
  • LDAP groups
You can assign a user group access to various assets on the platform in the same way that you assign an individual user access. The benefit of a group is that it is easier to:
  • Give many users access to an asset.
  • Remove a user's access to assets by removing them from the user group.
Manage your cluster resources with quotas
Cloud Pak for Data Version 3.5 makes it easier to manage and monitor your Cloud Pak for Data deployment.

The Platform management page gives you a quick overview of the services, service instances, environments, and pods running in your Cloud Pak for Data deployment. The Platform management page also shows any unhealthy or pending pods. If you see an issue, you can use the cards on the page to drill down to get more information about the problem.

The Platform management page

In addition, you can see your current vCPU and memory use. You can optionally set quotas to help you track your actual use against your target use. When you set quotas, you specify alert thresholds for vCPU and memory use. When you reach the alert threshold, the platform sends you an alert so that you aren't surprised by unexpected spikes in resource use.

Manage and monitor production workloads
The Deployment spaces page gives you a dashboard that you can use to monitor and manage production workloads in multiple deployment spaces.
This page makes it easier for Operations Engineers to manage jobs and online deployments, regardless of where they are running. The dashboard helps you assess the status of workloads, identify issues, and manage workloads. You can use this page to:
  • Compare jobs.
  • Identify issues as they surface.
  • Accelerate problem resolution.

Deployment spaces page

Common core services This feature is available only when the Cloud Pak for Data common core services are installed. The common core services are automatically installed by services that rely on them. If you don't see the Deployment spaces page, it's because none of the services that are installed on your environment rely on the common core services.

Store secrets in a secure vault
Cloud Pak for Data introduces a new set of APIs that you can use to protect access to sensitive data. You can create a vault that you can use to store:
  • Tokens
  • Database credentials
  • API keys
  • Passwords
  • Certificates

For more information, see Credentials and secrets API.
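As a sketch of how such a vault might be used programmatically, a secret payload could be assembled as shown below. The endpoint path, field names, and values in this example are illustrative assumptions, not the documented API surface; see the Credentials and secrets API documentation for the actual contract.

```python
import json

# Illustrative only: the payload field names and endpoint path here are
# assumptions for the sake of the sketch, not the documented API.
def build_secret_payload(name, secret_type, value):
    """Assemble a JSON payload for creating a secret in a vault."""
    return {
        "secret_name": name,    # hypothetical field name
        "type": secret_type,    # e.g. "credentials", "key", "certificate"
        "secret": value,        # the sensitive value itself
    }

payload = build_secret_payload(
    "db2-warehouse-creds",
    "credentials",
    {"username": "dbuser", "password": "not-a-real-password"},
)
body = json.dumps(payload)

# In a live deployment you would POST this to the platform with a bearer
# token, for example (hypothetical path):
# requests.post(f"{host}/zen-data/v2/secrets", headers=auth_headers, data=body)
```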

Improved navigation
The Cloud Pak for Data navigation menu is organized to focus on the objects that you need to access, such as:
  • Projects
  • Catalogs
  • Data
  • Services
  • Your task inbox

The items in the navigation depend on the services that are installed.

Manage connections more easily
The Connections page makes it easier for administrators to define and manage connections and for users to find connections.

The Connections page is a catalog of connections that can be used by various services across the platform. Any user who has access to the platform can see the connections on this page. However, only users with the credentials for the underlying data source can use a connection.

Example of connections users might choose from

Users who have the Admin role on the connections catalog can create and manage these connections. Unlike in previous releases of Cloud Pak for Data, services can refer to these connections rather than creating local copies. This means that any changes you make on the Connections page automatically cascade to the services that use the connection.

Common core services This feature is available only when the Cloud Pak for Data common core services are installed. The common core services are automatically installed by services that rely on them. If you don't see the Connections page, it's because none of the services that are installed on your environment rely on the common core services.

Workflows for managing business processes
You can use workflows to manage your business processes. For example, when you install Watson Knowledge Catalog, the service includes predefined workflow templates that you can use to control the process of creating, updating, and deleting governance artifacts.

From the Workflow management page, you can define and configure the types of workflows that you need to support your business processes.

You can import and configure BPMN files from Flowable.

Service The feature is available only if Watson Knowledge Catalog is installed.

For details, see Workflows.

Connect to storage volumes
In Cloud Pak for Data Version 3.5, you can connect to storage volumes from the Connections page or from services that support storage volume connections.

The storage volumes can be on external Network File System (NFS) storage or persistent volume claims (PVCs). This feature lets you access the files that are stored in these volumes from Jupyter Notebooks, Spark jobs, projects, and more. For details, see Connecting to data sources.

You can also create and manage volumes from the Storage volumes page. For more information, see Managing storage volumes.

Improved backup and restore process
The backup and restore utility can now call hooks provided by Cloud Pak for Data services to perform the quiesce operation. Quiesce hooks offer optimizations and other enhancements compared to scaling down all Kubernetes resources. For example, services can be quiesced and unquiesced in a specific order, or suspended without bringing down pods, which reduces the time it takes to bring applications down and back up. For more information, see Backing up the file system to a local repository or object store.
Audit service enhancements
The Audit Logging Service in Cloud Pak for Data now supports monitoring an expanded set of events, which you configure in the zen-audit-config configmap.

If you updated the zen-audit-config configmap to forward auditable events to an external security information and event management (SIEM) solution using the Cloud Pak for Data Audit Logging Service, you must update the zen-audit-config configmap to continue forwarding auditable events.

From:

<match export export.**>

To:

<match export export.** records records.** syslog syslog.**>

You can also use the oc patch configmap command to update the zen-audit-config configmap. For more information, see Export IBM Cloud Pak for Data audit records to your security information and event management solution.
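The required edit is a one-line change to the fluentd match directive in the configmap. The following sketch shows only the textual transformation; in practice you apply it to the live configmap with oc edit or oc patch as described above.

```python
# The match directive change described in the documentation: widen the
# fluentd <match> sources so that records and syslog events are forwarded
# in addition to export events.
OLD_DIRECTIVE = "<match export export.**>"
NEW_DIRECTIVE = "<match export export.** records records.** syslog syslog.**>"

def widen_match_directive(config_text):
    """Replace the old match directive with the expanded one."""
    return config_text.replace(OLD_DIRECTIVE, NEW_DIRECTIVE)

# A minimal fragment of what the configmap data might contain.
fragment = "<match export export.**>\n  @type forward\n</match>"
print(widen_match_directive(fragment).splitlines()[0])
```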

Configure the idle web session timeout
A Cloud Pak for Data administrator can configure the idle web session timeout in accordance with your security and compliance requirements. If a user leaves their session idle in a web browser for the specified length of time, the user is automatically logged out of the web client.
Auditing assets with IBM Guardium
The method for integrating with IBM Guardium has changed. IBM Guardium is no longer available as an option from the Connections page. Instead, you can connect to your IBM Guardium appliances from the Platform configuration page.

For details, see Auditing your sensitive data with IBM Guardium.

Common core services
Common core services can be installed once and used by multiple services. The common core services support:
  • Connections
  • Deployment management
  • Job management
  • Notifications
  • Search
  • Projects
  • Metadata repositories

The common core services are automatically installed by services that rely on them. If you don't see these features in the web client, it's because the common core services are not supported by any of the services that are installed on your environment.

New cpd-cli commands
You can use the Cloud Pak for Data command line interface to:
  • Manage service instances
  • Back up and restore the project where Cloud Pak for Data is deployed
  • Export and import Cloud Pak for Data metadata
Use your Cloud Pak for Data credentials to authenticate to a data source
Some data sources now allow you to use your Cloud Pak for Data credentials for authentication. After you log in to Cloud Pak for Data, you don't need to enter separate credentials for the data source connection. If you change your Cloud Pak for Data password, you don't need to change the password for each data source connection. Data sources that support Cloud Pak for Data credentials have the selection Use your Cloud Pak for Data credentials to authenticate to the data source on the data source connection page. When you add a new connection to a project, the selection is available under Personal credentials.

The following data sources support Cloud Pak for Data credentials:

  • HDFS via Execution Engine for Hadoop *
  • Hive via Execution Engine for Hadoop *
  • IBM Cognos Analytics
  • IBM Data Virtualization
  • IBM Db2
  • Storage volume *

* HDFS via Execution Engine for Hadoop, Hive via Execution Engine for Hadoop, and Storage volume support only Cloud Pak for Data credentials. 

Service enhancements

The following table lists the new features that are introduced for existing services in Cloud Pak for Data Version 3.5:

What's new What does it mean for me?
Analytics Engine Powered by Apache Spark
Spark 3.0
Analytics Engine Powered by Apache Spark now supports Spark 3.0. You can select:
  • The Spark 3.0 template to run Spark jobs or applications that run on your Cloud Pak for Data cluster by using the Spark jobs REST APIs.
  • A Spark 3 environment to run analytical assets in Watson Studio analytics projects.
Cognos Analytics
Support for additional databases
Cognos Analytics now supports connections to the following additional databases:
  • Amazon Athena
  • Apache Hive
  • Cloudera Impala
  • Informix
  • MariaDB
  • MongoDB
  • Netezza®
  • Snowflake
Support for SSL data sources
You can copy SSL certificates to support SSL data sources.

For details, see Copying SSL certificates, deployments, and JDBC drivers.

Upgrade from version 3.2.2 by using the instance upgrade
If you are starting with Cognos Analytics on Cloud Pak for Data Version 3.2.2, you can use the instance upgrade to upgrade to version 3.5.

For details, see Upgrading Cognos Analytics.

Data Refinery
Use personal credentials for connections
If you create a connection and select the Personal credentials option, other users can use that connection only if they supply their own credentials for the data source.
Users who have credentials for the underlying data source can:
  • Select the connection to create a Data Refinery flow
  • Edit or change a location when modifying a Data Refinery flow
  • Select a data source for the Join operation

For information about creating a project-level connection with personal credentials, see Adding connections to analytics projects.

Use the Union operation to combine rows from two data sets that share the same schema

Union operation

The Union operation is in the ORGANIZE category. For more information, see GUI operations in Data Refinery.
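As a rough analogy (not Data Refinery itself), the Union operation behaves like SQL's UNION ALL: it stacks the rows of two tables that share the same schema. A stdlib sqlite3 sketch:

```python
import sqlite3

# Two data sets that share the same schema (column names and types).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE q1 (region TEXT, sales INTEGER);
    CREATE TABLE q2 (region TEXT, sales INTEGER);
    INSERT INTO q1 VALUES ('East', 100), ('West', 150);
    INSERT INTO q2 VALUES ('North', 90);
""")

# Union: combine the rows of both data sets into one result.
rows = con.execute("SELECT * FROM q1 UNION ALL SELECT * FROM q2").fetchall()
print(len(rows))  # all rows from both tables
```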

Perform aggregate calculations on multiple columns in Data Refinery
You can now select multiple columns in the Aggregate operation. Previously, all aggregate calculations applied to a single column.

Aggregate operation

The Aggregate operation is in the ORGANIZE category. For more information, see Aggregate in GUI operations in Data Refinery.
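As a plain-SQL analogy (illustrative, not Data Refinery's implementation), selecting multiple columns in an aggregation means computing several aggregate calculations in a single pass:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (region TEXT, units INTEGER, revenue REAL);
    INSERT INTO sales VALUES
        ('East', 10, 100.0),
        ('East', 5, 60.0),
        ('West', 8, 80.0);
""")

# Aggregate calculations on multiple columns at once, grouped by region.
rows = con.execute(
    "SELECT region, SUM(units), AVG(revenue) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rows)
```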

Automatically detect and convert date and timestamp data types
When you open a file in Data Refinery, the Convert column type GUI operation is automatically applied as the first step if it detects any non-string data types in the data. In this release, date and timestamp data are detected and are automatically converted to inferred data types. You can change the automatic conversion for selected columns or undo the step. For information about the supported inferred date and timestamp formats, see the FREQUENTLY USED category in Convert column type in GUI operations in Data Refinery.
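A minimal sketch of the kind of format inference described above, assuming a small illustrative format list (this is not Data Refinery's actual detection logic or its supported format set):

```python
from datetime import datetime

# Illustrative format list; Data Refinery's real list of supported
# date and timestamp formats is in the product documentation.
FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%m/%d/%Y"]

def infer_datetime(value):
    """Convert a string to a datetime if it matches a known format."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return value  # leave non-matching values as strings

print(infer_datetime("2021-01-15").date())
```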
Change the decimal and thousands grouping symbols in all applicable columns
When you use the Convert column type GUI operation to detect and convert the data types for all the columns in a data asset, you can now also choose the decimal symbol and the thousands grouping symbol if the data is converted to an Integer data type or to a Decimal data type. Previously you had to select individual columns to specify the symbols.

For more information, see the FREQUENTLY USED category in Convert column type in GUI operations in Data Refinery.

Filter values in a Boolean column
You can now use the following operators in the Filter GUI operation to filter Boolean (logical) data:
  • Is false
  • Is true

Filter operation

For more information, see the FREQUENTLY USED category in Filter in GUI operations in Data Refinery.

In addition, Data Refinery includes a new template for filtering by Boolean values in the filter coding operation:
filter(`<column>`== <logical>)

For more information about the filter templates, see Interactive code templates in Data Refinery.

Data Refinery flows are supported in deployment spaces
You can now promote a Data Refinery flow from a project to a deployment space. Deployment spaces are used to manage a set of related assets in a separate environment from your projects. You can promote Data Refinery flows from multiple projects to a space. You run a job for the Data Refinery flow in the space and then use the shaped output as input for deployment jobs in Watson Machine Learning.

For instructions, see Promote a Data Refinery flow to a space in Managing Data Refinery flows.

Support for TSV files
You can now refine data in files that use the tab-separated-value (TSV) format. TSV files are read-only.
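For readers who work with TSV outside Data Refinery: the format is ordinary delimited text with a tab separator, as this stdlib Python sketch (with invented sample data) shows:

```python
import csv
import io

# A small in-memory TSV sample; a real file would be opened with open(path).
tsv_data = "name\tcity\nAda\tLondon\nLin\tShanghai\n"

reader = csv.DictReader(io.StringIO(tsv_data), delimiter="\t")
rows = list(reader)
```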
SJIS encoding available for input and output
SJIS (short for Shift JIS or Shift Japanese Industrial Standards) encoding is an encoding for the Japanese language. SJIS encoding is supported only for CSV and delimited files.

You can change the encoding of input files and output files.

To change the encoding of the input file, click the "Specify data format" icon when you open the file in Data Refinery. See Specifying the format of your data in Data Refinery.

To change the encoding of the output (target) file in Data Refinery, open the Information pane and click the Details tab. Click the Edit button. In the DATA REFINERY FLOW OUTPUT pane, click the Edit icon.
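As background on what the encoding option does, this sketch round-trips Japanese text through Shift JIS using Python's built-in codec (the sample text is invented):

```python
# Japanese CSV header and row; every character here is representable in SJIS.
text = "名前,都市\n山田,東京\n"

encoded = text.encode("shift_jis")    # bytes as they would appear in an SJIS file
decoded = encoded.decode("shift_jis") # reading the file back with SJIS declared
```

Declaring the wrong encoding when reading such bytes is what produces garbled columns, which is why Data Refinery lets you set it on both input and output.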

New jobs user interface for running and scheduling flows
For more information, see the What's new entry for Watson Studio.
New visualization charts
For more information, see the What's new entry for Watson Studio.
Data Virtualization
Improve query performance by using cache recommendations
If your queries take a long time to run but your data doesn't change constantly, you can cache the results of queries to make your queries more performant. Data Virtualization analyzes your queries and provides cache recommendations to improve query performance.

For details, see Cache recommendations.
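The payoff of caching slow, stable queries can be sketched in a few lines. `run_query` below is a hypothetical stand-in for an expensive Data Virtualization query, not a real API:

```python
cache = {}
calls = {"count": 0}

def run_query(sql):
    """Hypothetical stand-in for a slow query against a virtualized source."""
    calls["count"] += 1   # pretend this increment is the expensive part
    return f"result of {sql}"

def cached_query(sql):
    # Serve repeat queries from the cache; only the first call pays full cost.
    if sql not in cache:
        cache[sql] = run_query(sql)
    return cache[sql]

first = cached_query("SELECT * FROM sales")
second = cached_query("SELECT * FROM sales")
```

The trade-off the recommendations weigh for you: a cache helps only while the underlying data stays stable, so candidates are queries that are slow but whose data doesn't change constantly.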

Optimize query performance by using distributed processing
Data Virtualization can determine the optimal number of worker nodes required to process a query. The number of worker nodes is determined based on the number of data sources connected to the service, available service resources, and the estimated size of the query result.
Manage your virtual data by using Data Virtualization APIs
With the Data Virtualization REST API, you can manage your virtual data, data sources, and user roles. Additionally, you can use the API to virtualize and publish data to the catalog.

For details, see Data Virtualization REST API.

Governance and security enhancements for virtual objects
When Watson Knowledge Catalog is installed, you can use policies and data protection rules from Watson Knowledge Catalog to govern your virtual data. Data asset owners are now exempt from data protection rules and policy enforcement in Data Virtualization.

You can also publish your virtual objects to the catalog more easily and efficiently. For example, when you create your virtual objects by using the Data Virtualization user interface, your virtual objects are published automatically to the default catalog in Watson Knowledge Catalog.

Optionally, you can now publish your virtual objects by using the Data Virtualization REST APIs.

For details, see Governing virtual data.

Support for single sign-on and JWT authentication
You can now authenticate to Data Virtualization by using the same credentials you use for the Cloud Pak for Data platform. Additionally, Data Virtualization now supports authentication by using a JSON Web Token (JWT).

For details, see User credentials and authentication methods.
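For background on the token format: a JWT is three base64url-encoded segments (`header.payload.signature`). The sketch below builds and decodes a sample, unsigned token with invented claims; real platform tokens are signed and must be verified, which this sketch deliberately omits:

```python
import base64
import json

def b64url_decode(segment):
    # base64url segments are transmitted without padding; restore it to decode.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)

def make_segment(obj):
    raw = base64.urlsafe_b64encode(json.dumps(obj).encode()).decode()
    return raw.rstrip("=")

# Illustrative, unsigned token -- claim values here are invented.
token = ".".join([
    make_segment({"alg": "none", "typ": "JWT"}),
    make_segment({"sub": "cpd-user", "iss": "example-issuer"}),
    "",  # empty signature segment for this sample only
])

payload = json.loads(b64url_decode(token.split(".")[1]))
```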

Support for additional data sources
You can now connect to the following data sources:
  • Greenplum
  • Salesforce.com
  • SAP OData

For details, see Adding data sources.

Scale your deployment
You can use the cpd-cli scale command to adjust the number of worker nodes that the Data Virtualization service is running on. When you scale up the service, you make it highly available and increase its processing capacity.

For details, see Provisioning Data Virtualization.

Monitor the service by using Db2 Data Management Console
You can use the integrated monitoring dashboard to ensure that the Data Virtualization service is working correctly. The monitoring dashboard is powered by Db2 Data Management Console. Additionally, the monitoring dashboard provides useful information about databases connected to Data Virtualization.

For details, see Monitoring Data Virtualization.

DataStage
Support for additional connectors
You can now connect to the following data sources:
  • Microsoft Azure Data Lake Store
  • Amazon Redshift
  • Unstructured Data
  • SAP Packs
    • A license is required to use SAP Packs in Cloud Pak for Data. SAP Packs require the legacy Windows DataStage Client to design the jobs. Jobs can then be run in Cloud Pak for Data.
    • User documentation is provided with the license for SAP Packs.
    • For more information on SAP Packs, see:

For more details, see Supported connectors.

Additional improvements and updates
  • You can now access DataStage from your projects page. You can create a DataStage project by following the path Projects > All Projects, then creating a new project of type Data transform.
  • Project creation and deletion are now asynchronous. Previously, the DataStage UI was blocked during the time that is required to create or delete a project. Now, you see a notification that says that the request to create or delete the project is submitted. The project appears after the creation or deletion process completes successfully.
  • You can now set up an NFS mount in DataStage pods to pass data files such as CSV and XML between DataStage and source or target systems.
  • You can now use dynamic configuration files without enabling PXRuntime. With this support, the nodes or pods that are used in a job are chosen dynamically based on the resources that are available on them when the job runs. Jobs automatically run on the nodes with the most available resources, which increases speed and performance.
  • You can change the resource allocation for the number of CPUs and memory to be used in your jobs.
  • Support is provided for SSL/TLS communication with RPC connections by using Nginx as a proxy server. This support provides greater security for connecting the legacy DataStage Designer client to Cloud Pak for Data. You can then use the Designer client to edit jobs in Cloud Pak for Data.
  • You can create custom images to support third-party drivers. Custom images have the benefits of being unchangeable after they are built and reliably consistent across different environments. You can also scan the images for vulnerability.
  • You can use a PersistentVolume (PV) to support third-party libraries and drivers.
  • The Operations Console is enabled for stand-alone DataStage installation on Cloud Pak for Data.
  • Non en-US language packs are now supported.
  • Notification with mailx is supported. Notifications can be sent out by mailx after an activity completes in a job sequence.
  • The FileConnector heap size setting and the message handler settings are now persistent and will not be lost if pods are restarted.
  • You can now add parameters and parameter sets in the transformer dialog box.
  • LongVarChar lengths of up to 3,000,000 characters are now supported in the Transformer stage.
Db2
Deployment with operator
Db2 is now deployed by using an operator, providing better consistency and predictability and faster deployment times. You can also deploy multiple Db2 databases on the same worker node.
Reduced footprint
Db2 consumes fewer resources than in previous releases. The minimum requirement is now 1.5 VPCs per Db2 database.
Db2 REST and Db2 Graph support
You can set up your Db2 service so that application programmers can create Representational State Transfer (REST) endpoints that can be used to interact with Db2 and run most SQL statements, including DDL and DML. You can also set up the service to use Db2 Graph, so that you can query your Db2 data to perform graph analytics without requiring any changes to the underlying database structure.
Run on zLinux
You can deploy Db2 on Red Hat OpenShift clusters that run on the zLinux (s390x) operating system.
Version upgrade
The Db2 service runs Db2 Version 11.5.5.
Storage enhancements
Db2 now supports the following storage options:
  • IBM Spectrum Scale CSI 2.0
  • Red Hat OpenShift Container Storage 4.5
  • Portworx 2.5.5
More backup and restore options
You can back up or restore by using remote storage such as IBM Cloud Object Storage or Amazon S3. Db2 also now offers the option of restoring an encrypted database.
Security enhancements
You can directly authenticate with the Db2® service by using your Cloud Pak for Data user ID and password. The Db2 service uses Cloud Pak for Data authentication and authorization and supports TLS certificates. You can also authenticate with JWT tokens and API keys, and you can download the Db2 SSL certificate directly from the web console.
Db2 Data Management Console support
You can use the Db2 Data Management Console service on Cloud Pak for Data to administer, monitor, manage, and optimize the performance of Db2 databases.
Db2 Big SQL
Support for Cloudera Hadoop clusters
You can query data that is stored in remote CDP and CDH clusters, in addition to HDP clusters, which were already supported. For details, see Remote Hadoop cluster or public or private object store.
Improved integration with the web client
When logged on as an administrator, you can now use the Cloud Pak for Data web client to complete the following tasks:
  • After you install the Db2 Big SQL service, you can use the web client to provision one or more instances of the service. Each instance can use a different resource configuration, be accessed by different users, or point to a different Hadoop cluster.
  • Update an instance configuration. For each instance, you can optionally:
    • Scale the instance up or down by allocating additional or fewer resources.
    • Scale the instance out or in by adding or removing workers.
  • Track Db2 Big SQL resource usage at the instance level.
  • Gather diagnostic information, such as logs.

For details, see Provisioning Db2 Big SQL, Using the Cloud Pak for Data web client to administer the Db2 Big SQL service, and Gathering diagnostic information.

Monitor the service by using Db2 Data Management Console
You can use the integrated monitoring dashboard to ensure that the Db2 Big SQL service is working correctly. The monitoring dashboard is powered by Db2 Data Management Console.

For details, see Monitoring Db2 Big SQL.

Db2 Data Gate
Improved installation experience
It's now easier to install and configure Db2 Data Gate with simplified security setup and certificate generation on z/OS.

It's also easier to provision instances of Db2 Data Gate with a streamlined process.

Improved performance
The Db2 Data Gate service has increased throughput and lower latency when loading and synchronizing data from Db2 for z/OS to the target database.
Run on zLinux
You can deploy Db2 Data Gate on Red Hat OpenShift clusters that run on the zLinux (s390x) operating system.
Db2 Event Store
Run on larger clusters
Db2 Event Store can run on Red Hat OpenShift clusters with more than 3 worker nodes for increased performance and scalability.
Support for new data types
Db2 Event Store now supports the decimal data type.
Support for Apache Spark 2.4.6
Db2 Event Store supports the Apache Spark 2.4.6 unified analytics engine for big data processing.
Db2 Warehouse
Deployment with operator
Db2 Warehouse is now deployed by using an operator, providing better consistency and predictability and faster deployment times. You can also deploy multiple Db2 Warehouse databases on the same worker node.
Reduced footprint
Db2 Warehouse consumes fewer resources than in previous releases. The minimum requirement is now 1.5 VPCs per Db2 Warehouse database.
Db2 REST and Db2 Graph support
You can set up your Db2 Warehouse service so that application programmers can create Representational State Transfer (REST) endpoints that can be used to interact with Db2 Warehouse and run most SQL statements, including DDL and DML. You can also set up the service to use Db2 Graph, so that you can query your Db2 Warehouse data to perform graph analytics without requiring any changes to the underlying database structure.
Support for object storage providers (MPP only)
The Db2 Warehouse service in a massively parallel processing (MPP) configuration can work with data in external tables in cloud object storage providers such as Amazon S3 and Microsoft Azure Blob Storage, or any other S3-compatible storage such as IBM® Cloud Object Storage or MinIO. This option is available only for Db2 Warehouse MPP deployments.
Db2 Data Management Console support
You can use the Db2 Data Management Console service on Cloud Pak for Data to administer, monitor, manage, and optimize the performance of Db2 Warehouse databases.
Run on zLinux
You can deploy Db2 Warehouse on Red Hat OpenShift clusters that run on the zLinux (s390x) operating system.
Version upgrade
The Db2 Warehouse service runs Db2 Warehouse Version 11.5.5.
Storage enhancements
Db2 Warehouse now supports the following storage options:
  • IBM Spectrum Scale CSI 2.0
  • Microsoft Azure Blob Storage (object storage)
  • Amazon S3 Cloud object storage
  • Red Hat OpenShift Container Storage 4.5
  • Portworx 2.5.5
More backup and restore options
You can back up or restore by using remote storage such as IBM Cloud Object Storage or Amazon S3. Db2 Warehouse also now offers the option of restoring an encrypted database.
Security enhancements
You can directly authenticate with the Db2 Warehouse service by using your Cloud Pak for Data user ID and password. The Db2 Warehouse service uses Cloud Pak for Data authentication and authorization and supports TLS certificates. You can also authenticate with JWT tokens and API keys, and you can download the Db2 Warehouse SSL certificate directly from the web console.
Decision Optimization
Overview pane in the model builder
The overview pane provides you with model, data and solution summary information for all your scenarios at a glance. From this view you can also open an information pane where you can create or choose your deployment space.

For details, see the Overview section in Decision Optimization model builder views and scenarios.

Enhanced Explore solution view in the model builder
The Explore solution view of the model builder shows you more information about the objectives (or KPIs), solution tables, constraint or bounds relaxations or conflicts, engine statistics, and log files.

For details, see the Explore solution view section in Decision Optimization model builder views and scenarios.

Gantt charts available for any type of data
From the Visualization view of the model builder, you can create Gantt charts for any type of data, where it is meaningful. Gantt charts are no longer restricted to scheduling models only.

Visualization view with Gantt chart

For details, see the Gantt chart widget section in Visualization view.

Support for Python 3.7
The Decision Optimization model builder now targets the Python 3.7 runtime when generating notebooks from scenarios. In Watson Machine Learning, the Decision Optimization runtime now runs Python 3.7.
Improved data schema editing in the Modeling Assistant
You can now define data types for table columns and edit data schema when you use the Modeling Assistant.
Delegation of CPLEX engine solve to Watson Machine Learning
You can now delegate the Decision Optimization solve to run on Watson Machine Learning from your Java CPLEX or CPO models.
Language support

The Decision Optimization interface is now translated into multiple languages.

Execution Engine for Apache Hadoop
Integration with IBM Spectrum Conductor with Spark clusters
IBM Spectrum Conductor with Spark is now supported. You can integrate IBM Spectrum Conductor with Spark and Watson Studio by using Jupyter Endpoint Gateway endpoints. Users can open a notebook in Watson Studio to access Jupyter Endpoint Gateway instances that are running on IBM Spectrum Conductor with Spark. For details, see Spectrum environments.
New configurations that allow you to use your own certificates
These configurations enable DSXHI to support the following customizations:
  • Provide a custom Keystore to generate the required .crt.
  • Provide any custom truststore (CACERTS), where DSXHI certificates will be added.
  • Provide options to either add the host certificate to the truststore yourself or have DSXHI add it.

For details, see Installing the Execution Engine for Apache Hadoop service on Apache Hadoop clusters or on Spectrum Conductor clusters.

Support for additional types of security
Execution Engine for Apache Hadoop supports:
  • The JSON Web Tokens to Kerberos delegation token provider, which provides authentication to HiveServer2, HDFS, and HMS resources. For details, see Using delegation token endpoints.
  • The updated versions for Jupyter Endpoint Gateway 2.3 and Knox 1.4.
Improved validation
The system_check.py scripts were introduced to validate your Hadoop configuration.
Guardium External S-TAP
Improved integration with the Cloud Pak for Data web client
You can now create and manage your Guardium External S-TAP instances from the Cloud Pak for Data web client.
Support for new target databases
You can use the Guardium External S-TAP to monitor additional databases. For details, see External S-TAP supported platforms on the IBM Support portal.
Jupyter Notebooks with Python 3.7 for GPU
This service now provides environments for Python 3.7 instead of Python 3.6.
Jupyter Notebooks with R 3.6
Support for loading data from database connections
You can use the insert to code function to load data to a notebook from the following database connections:
  • Cognos Analytics
  • HTTP
  • Apache Cassandra
  • Amazon RDS for PostgreSQL
  • Amazon RDS for MySQL
  • Mounted storage volumes
  • IBM Cloud Object Storage

For details, see Data load support for database connections.

RStudio Server with R 3.6
Configure RStudio idle timeout
A Cloud Pak for Data administrator can disable or change the idle timeout of RStudio runtimes.

For details, see Disabling or changing RStudio idle timeout.

Support for RMySQL library functions
You can connect to a MySQL database and use MySQL library functions in RStudio.

For details, see Using RMySQL library functions.

SPSS Modeler
SPSS Analytic Server
A new SPSS Analytic Server connection type is available for SPSS Modeler. With this connection type, you can import and run SPSS Modeler streams (.str) that were created in SPSS Modeler classic to run on SPSS Analytic Server. See Supported data sources for SPSS Modeler for more information.
Jobs
You can now create and schedule jobs as a way of running SPSS Modeler flows. Click the Jobs icon from the SPSS Modeler toolbar and select Create a job. See Creating and scheduling jobs for more information.
New and changed nodes
SPSS Modeler includes the following new and changed nodes:
  • CPLEX® Optimization node: With this new node, you can use complex mathematical (CPLEX) based optimization via an Optimization Programming Language (OPL) model file.

    CPLEX Optimization node

  • Kernel Density Estimation (KDE) Simulation node: This new node uses the Ball Tree or KD Tree algorithms for efficient queries, and walks the line between unsupervised learning, feature engineering, and data modeling.

    KDE Simulation node

  • Data Asset Export node: This node has been redesigned. Use the node to write to remote data sources using connections, write to a data file on your local computer, or write data to your project.

    Data Asset Export node
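The kernel density estimation idea behind the KDE Simulation node can be sketched in plain Python: each sample contributes a Gaussian bump, and the density at a point is the average of those bumps. (The node uses Ball Tree or KD Tree indexes for efficient queries; this brute-force sketch with invented data only shows the math.)

```python
import math

samples = [1.0, 1.2, 3.0]   # invented 1-D training samples
bandwidth = 0.5             # smoothing parameter

def kde(x):
    """Gaussian-kernel density estimate at x (brute force, for illustration)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-((x - s) / bandwidth) ** 2 / 2) for s in samples)

density_near_cluster = kde(1.1)   # close to two samples
density_far = kde(10.0)           # far from every sample
```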

Support for database functions
You can run SPSS Modeler desktop stream files (STR) that contain database functions.
New visualization charts
For more information, see the What's new entry for Watson Studio.
Deploy Text Analytics models to Watson Machine Learning Server
You can now deploy Text Analytics models to a Watson Machine Learning Server as you can with other model types. Deployment is the final stage of the lifecycle of a model - making a copy of the model available to test and use. For example, you can create a deployment for a model so you can submit new data to it and get a score or prediction back.
Python 3.7
SPSS Modeler now uses Python 3.7.9. Note that the Python schema has changed, so you may need to review and adjust any Python scripts you use in SPSS Modeler.
Streams
Application resource customization
You can customize the resources that are used by your application. You can:
  • Create a custom application image for dependencies, such as software packages or libraries, that are not included in the default application image.
  • Customize the resources, such as CPU or memory, that your Streams applications use by creating custom application resource templates.
For more information, see Customizing the application image.
Production workload management and monitoring
The Deployment spaces page provides a dashboard that you can use to monitor and manage Streams jobs in multiple deployment spaces.
Edge Analytics 1.1.0 beta integration
OpenShift image build integration
You can build a Docker image loaded with your streaming data application, ready for Edge Analytics deployment. For more information, see Packaging an Edge Analytics application or service for deployment.
Enhanced development environments
Use your favorite Streams development environment (streamsx Python API, notebooks, or Visual Studio Code) to build your edge application and image. For more information, see Developing edge applications with IBM Edge Analytics.
Enhanced Streams standalone applications
Application metrics, such as data tuple counters, operator costs, and user-defined metrics, can be exposed and the default threading model can be specified for standalone Streams applications.
Edge-aware samples
Explore new application samples designed for the edge.
Streams jobs as Cloud Pak for Data services
This release introduces the ability to enable a Streams job as a Cloud Pak for Data service. A streams-application service can be used to insert data into and retrieve data from a Streams job. A streams-application service is created by inserting one or more endpoint operators into an application and submitting the application to run as a job. Exchanging data with the job is done by using a REST API. The streams-application service instances are included in the Services > Instances page of the Cloud Pak for Data web client. Selecting a service entry in the list opens the REST API documentation for the service.

For additional information, restrictions, and sample applications, see Resources for Streams developers in the IBM Community.

Streams Flows
  • Support for class style in code operators
  • Support for flat map that allows returning multiple events
  • New window and aggregation operators
Watson Knowledge Catalog
Reference data set enhancements
You can customize your reference data sets in the following ways:
  • Configure hierarchies between reference data sets and between values within a reference data set.
  • Add custom columns.
  • Create values mappings, or crosswalks, between values of multiple reference data sets in 1:1, n:1, and 1:n relationships.

For details, see Reference data sets.

Catalog enhancements
Catalogs are enhanced in the following ways:
  • Additional information is shown on the new Overview page for assets, such as the asset's path and related assets.
  • More activities are shown on the Activities page for assets.
  • COBOL copybook is now a supported asset type. You can preview the contents of copybooks.
  • You can add more types of assets and metadata to catalogs by coding custom attributes for assets and custom asset types with APIs.
New connections
Watson Knowledge Catalog can connect to:
  • Amazon RDS for MySQL
  • Amazon RDS for PostgreSQL
  • Apache Cassandra
  • Apache Derby
  • Box
  • Elasticsearch
  • HTTP
  • IBM Data Virtualization Manager for z/OS
  • IBM Db2 Event Store
  • IBM SPSS Analytic Server
  • MariaDB
  • Microsoft Azure Blob Storage
  • Microsoft Azure Cosmos DB
  • MongoDB
  • SAP HANA
  • Storage volume
In addition, the following connection names have changed:
  • PureData System for Analytics is now called Netezza (PureData® System for Analytics)

    Your previous settings for the connection remain the same. Only the name for the connection type changed.

New SSL encryption support for connections
The following connections now support SSL encryption in Watson Knowledge Catalog:
  • Amazon Redshift
  • Cloudera Impala
  • IBM Db2 for z/OS
  • IBM Db2 Warehouse
  • IBM Informix
  • IBM Netezza (PureData System for Analytics)
  • Microsoft Azure SQL Database
  • Microsoft SQL Server
  • Pivotal Greenplum
  • PostgreSQL
  • Sybase
Category roles control governance artifacts
The permissions to view and manage all types of governance artifacts, except for data protection rules, are now controlled by collaborator roles in the categories that are assigned to the artifacts.
To view or manage governance artifacts, users must meet these conditions:
  • Have a user role with one of the following permissions:
    • Access governance artifacts
    • Manage governance categories
  • Be a collaborator in a category

Category collaborators have roles with permissions that control whether they can view artifacts, manage artifacts, manage categories, and manage category collaborators. Subcategories inherit collaborators from their parent categories. Subcategories can have other collaborators, and their collaborators can accumulate more roles. The predefined collaborator, All users, includes everyone with permission to access governance artifacts.

For details, see Categories.

Changes to user permissions
If you upgraded from Cloud Pak for Data Version 3.0.1, the following user permissions are automatically migrated as part of the upgrade:
  • Users who had the Manage governance categories permission continue to have that permission and also have the Owner role for all top-level categories.
  • Users who had the Manage governance artifacts permission now have the Access governance artifacts permission, the Editor role in all categories, and the new Manage data protection rules permission.
  • All users now have the Access governance artifacts permission. However, when you add new users, the Access governance artifacts permission is not included in all of the predefined roles. It is included in the Administrator, Data Engineer, Data Steward, and Data Quality Analyst roles.
  • All users who were listed as Authors in a governance workflow now have the Access governance artifacts permission and also the Editor role in all categories.
Workflows for governance artifacts support categories
Workflow configurations for governance artifacts now require categories to identify the governance artifacts and users for the workflow:
  • When you create a new workflow configuration for governance artifacts, you must select either one category or all categories as part of the triggering condition for the workflow, along with governance artifact types and events.
  • You no longer specify artifact authors in a workflow configuration. Artifact authors are all users who have permission to edit artifacts in a category that is specified in the workflow configuration.
  • You now specify one or more of these types of assignees to approve and review artifacts: the workflow requestor, users with specified roles in the categories for the workflow, users with the Data Steward role, or selected users.

For details, see Managing workflows for governance artifacts.

Discovery enhancements
Watson Knowledge Catalog includes the following changes for discovering data:
Automated discovery
The sample size is 1,000 records by default. Changes require specific permissions.
Quick scan
With the improved version, you can perform more scalable data discovery with richer analysis results that can be published to one or more catalogs directly from the quick scan results.

For details, see Running a quick scan.

Import metadata from an analytics project
You can use the metadata import asset type to import data assets from a connection so that you can analyze and enrich the assets later.

For details, see Importing metadata.

Import additional artifacts and properties
You can now import reference data sets. When you import a reference data set, you can also import secondary categories, effective dates, and custom attribute values for most artifacts.
For business terms, you can import:
  • Type of terms relationships
  • Assigned data classes
  • Synonyms

For details, see Importing governance artifacts.

Watson Machine Learning
Support for the V4 REST APIs and Python client
Watson Machine Learning supports the generally available releases of the Watson Machine Learning V4 REST APIs and the V4 Watson Machine Learning Python client, which give you programmatic access to all of the current machine learning features.

For details, see Watson Machine Learning APIs and Watson Machine Learning Python library.

Support for Data Refinery flows
You can run Data Refinery flow jobs in a deployment space and use the resulting data as input for deployment jobs.

For details, see Deployment spaces.

Use data from NFS
You can use data from Network File System (NFS) to train models and as input data for deployment jobs. For example, you can use a CSV file from a storage volume as the training data for an AutoAI model, and use a payload file from the volume to deploy and score the trained model.

For details, see Adding data sources.

Support for additional connections
Support for more types of data connections for use in training and deploying models gives you greater flexibility when you create deployment jobs.

For details, see Batch deployment details.

Support for Python 3.7
Train and deploy models and functions using new frameworks and software specifications built with Python 3.7.

For details, see Supported frameworks.

Create batch deployments for R Scripts
In addition to Python scripts, you can now deploy R scripts as a means of working with Watson Machine Learning assets.

For details, see Batch deployment details.

Deployment spaces dashboard
View deployment activity across all spaces you can access in a new deployment spaces dashboard. Use the dashboard to monitor activity for all of your spaces and view visualizations to give you insights into deployments and jobs.

For details, see Deployments dashboard.

Federated learning
Tech preview: This is a technology preview and is not supported for use in production environments.

Use federated learning to train a common model using remote, secure data sets. The data sets are not shared so full data security is maintained, while the resulting model gets the benefit of the expanded training.

For details, see Federated learning.
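The core idea can be sketched with a simple (unweighted) federated average: only locally trained weights travel, never the raw data. The values are invented, and the real aggregation strategy is configured by the service:

```python
# Each party trains on its own private data and shares only its weights.
party_weights = [
    [0.25, 0.50, 0.75],   # party A's locally trained weights (invented)
    [0.75, 0.50, 0.25],   # party B's locally trained weights (invented)
]

def federated_average(weight_sets):
    """Combine per-party weights into a global model by element-wise averaging."""
    n = len(weight_sets)
    return [sum(ws[i] for ws in weight_sets) / n
            for i in range(len(weight_sets[0]))]

global_weights = federated_average(party_weights)
```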

Multiple data sources for AutoAI experiments
Tech preview: This is a technology preview and is not supported for use in production environments.

AutoAI experiments support multiple data sources as input for training an experiment. Use the data join canvas to combine the data sets based on common columns, or keys, to build a unified data set. Deploy a data join model using multiple data sets as input for your jobs.

For details, see Joining data.
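What "combine the data sets based on common columns, or keys" means can be sketched as an inner join in plain Python. The table and column names are invented; the data join canvas does this for you at scale:

```python
# Two invented data sets that share the key column "id".
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
orders = [{"id": 1, "total": 40}, {"id": 2, "total": 65}]

def inner_join(left, right, key):
    """Merge rows from both sets that share the same key value."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

unified = inner_join(customers, orders, "id")   # one row per matched key
```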

Save AutoAI as a Watson Machine Learning notebook
Tech preview: This is a technology preview and is not supported for use in production environments.

Save an AutoAI experiment as a Watson Machine Learning notebook so that you can review the code that develops the pipelines.

For details, see Saving as a notebook.

Watson OpenScale
Enhanced explainability
The updated explainability panel is based on extensive customer feedback and focus groups. It includes the ability to run "what if" scenarios.

For details, see Explaining transactions.

Indirect bias
Watson OpenScale analyzes indirect bias, which occurs when one feature can be used to stand for another. For example, one feature in a model might approximate another feature that is a protected attribute. Although it is illegal to discriminate based on race, race can sometimes correlate closely with postal code, which might be the cause of indirect bias.

For details, see Indirect bias.
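
A rough way to screen for such a proxy is to check how strongly each value of a feature concentrates in a single protected group. The sketch below uses synthetic postal-code data and a simple concentration score; it is an illustration of the concept, not the method Watson OpenScale uses.

```python
from collections import Counter, defaultdict

# Synthetic records: (postal_code, protected_group)
records = [
    ("10001", "A"), ("10001", "A"), ("10001", "A"), ("10001", "B"),
    ("20002", "B"), ("20002", "B"), ("20002", "B"), ("20002", "A"),
]

def proxy_strength(rows):
    """For each feature value, the share of its most common protected group.
    Values near 1.0 suggest the feature can stand in for the protected attribute."""
    by_value = defaultdict(Counter)
    for value, group in rows:
        by_value[value][group] += 1
    return {
        value: max(counts.values()) / sum(counts.values())
        for value, counts in by_value.items()
    }

print(proxy_strength(records))  # both postal codes map 75% to one group
```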

Dashboard filtering for large deployments
For Watson OpenScale dashboards with a large number of deployments, you can use the new controls to filter and sort tiles.
Role-based access control
You can assign users varying levels of permissions based on the actions they need to perform.

For details, see User roles.

Support for multiple instances
You can now deploy multiple instances of the Watson OpenScale service from a single installation.

For details, see Setting up multiple instances.

Auto config
When you use resources that are already part of the Cloud Pak for Data cluster, such as Watson Machine Learning, many of the values are supplied for you when you configure Watson OpenScale.
The Drift Monitor also completes many of the values for you during configuration and setup.
New version of the Python SDK
This release includes a new, more integrated version of the Watson OpenScale Python SDK.

The new Python SDK replaces the Version 1 SDK, eliminates separate APIs for each monitor, and standardizes many of the classes and methods used for monitor configuration and subscription to machine learning providers.

For details, see the Python SDK documentation.

Streamlined Fairness user interface
The Fairness Monitor has undergone an extensive redesign based on your feedback! Now you can use the enhanced charts to see balanced data and perfect equality at a glance. You can even run what-if scenarios with real-time scoring.

Fairness monitor page

Model Risk Management notifications
Model Risk Management includes several enhancements: you can set thresholds for receiving email notifications of violations, the PDF reports are enhanced, and, when Watson OpenScale is integrated with IBM OpenPages, you can set when metrics are sent (immediately, daily, or weekly).
Debiasing support for regression models
Along with classification models, Watson OpenScale now detects bias in regression models. You can both detect and mitigate bias.
Watson Studio
New jobs interface for running and scheduling notebooks and Data Refinery flows
The user interface gives you a unified view of the job information.

You can create the jobs from either of the following locations:

  • The user interface for the service
  • The Assets page of a project

For details, see Jobs in a project.

New visualization charts
You can use the following visualization charts with Data Refinery and SPSS Modeler:
Evaluation charts
Evaluation charts are combination charts that measure the quality of a binary classifier. You need three columns for input: the actual (target) value (0 or 1), the predicted value, and the confidence. Move the slider in the Cutoff chart to dynamically update the other charts. The ROC and other charts are standard measurements of the classifier.

Evaluation chart
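
The underlying computation can be sketched as follows: the cutoff turns confidences into class predictions, and the resulting confusion counts feed the charts. The data here is made up.

```python
# How moving the cutoff changes the confusion counts that evaluation
# charts (cutoff, ROC, and so on) are built from.
actual = [1, 1, 0, 0, 1, 0]                # target values
conf   = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]   # model confidence in class 1

def confusion_at(cutoff, actual, confidence):
    """Return (true pos, false pos, false neg, true neg) at a given cutoff."""
    tp = sum(a == 1 and c >= cutoff for a, c in zip(actual, confidence))
    fp = sum(a == 0 and c >= cutoff for a, c in zip(actual, confidence))
    fn = sum(a == 1 and c < cutoff for a, c in zip(actual, confidence))
    tn = sum(a == 0 and c < cutoff for a, c in zip(actual, confidence))
    return tp, fp, fn, tn

print(confusion_at(0.5, actual, conf))  # (2, 1, 1, 2)
```

Sweeping the cutoff from 0 to 1 and plotting the resulting true-positive and false-positive rates traces out the ROC curve.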

Math curve charts
Math curve charts display a group of curves based on equations that you enter. You do not use a data set with this chart. Instead, you use it to compare the results with the data set in another chart, like the scatter plot chart.

Math curve chart
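
Conceptually, a math curve chart samples an equation over a range and plots the resulting points, as in this sketch:

```python
# Sample an equation over a range so its curve can be overlaid on a
# scatter plot of real data.
def sample_curve(f, start, stop, steps):
    """Return (x, f(x)) pairs at evenly spaced points across [start, stop]."""
    step = (stop - start) / steps
    return [(start + i * step, f(start + i * step)) for i in range(steps + 1)]

points = sample_curve(lambda x: x ** 2, -2.0, 2.0, 4)
print(points)  # [(-2.0, 4.0), (-1.0, 1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
```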

Sunburst charts
Sunburst charts display different depths of hierarchical groups. The Sunburst chart was formerly an option in the Treemap chart.

Sunburst chart

Tree charts
Tree charts represent a hierarchy in a tree-like structure. The Tree chart consists of a root node, line connections called branches that represent the relationships and connections between the members, and leaf nodes that do not have child nodes. The Tree chart was formerly an option in the Treemap chart.

Tree chart
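
The hierarchy a tree chart renders can be sketched as nested nodes, where leaf nodes have no children:

```python
# A root node, branch nodes (with children), and leaf nodes (without).
tree = {
    "name": "root",
    "children": [
        {"name": "branch-1", "children": [{"name": "leaf-a"}, {"name": "leaf-b"}]},
        {"name": "leaf-c"},
    ],
}

def count_leaves(node):
    """Count the nodes that have no child nodes."""
    children = node.get("children", [])
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)

print(count_leaves(tree))  # 3
```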

For the full list of available charts, see Visualizing your data.

New project settings
When you create a project, you can select the following options:
Mark the project as sensitive
Marking a project as sensitive prevents members of a project from moving data assets out of the project.

For details, see Marking a project as sensitive.

Log all project activities
Logging all project activity tracks detailed project activity and creates a full activities log, which you can download to view.

For details, see Logging project activity.

New connections
Watson Studio can connect to:
  • Amazon RDS for MySQL
  • Amazon RDS for PostgreSQL
  • Apache Cassandra
  • Apache Derby
  • Box
  • Elasticsearch
  • HTTP
  • IBM Data Virtualization Manager for z/OS
  • IBM Db2 Event Store
  • IBM SPSS Analytic Server
  • MariaDB
  • Microsoft Azure Blob Storage
  • Microsoft Azure Cosmos DB
  • MongoDB
  • SAP HANA
  • Storage volume
In addition, the following connection names have changed:
  • PureData System for Analytics is now called Netezza (PureData System for Analytics).

    Your previous settings for the connection remain the same. Only the name for the connection type changed.

New SSL encryption support for connections
The following connections now support SSL encryption in Watson Studio:
  • Amazon Redshift
  • Cloudera Impala
  • IBM Db2 for z/OS
  • IBM Db2 Warehouse
  • IBM Informix
  • IBM Netezza (PureData System for Analytics)
  • Microsoft Azure SQL Database
  • Microsoft SQL Server
  • Pivotal Greenplum
  • PostgreSQL
  • Sybase
Support for Python 3.7
The default Python environment version in Watson Studio is now Python 3.7.

Python 3.6 is being deprecated. You can continue to use the Python 3.6 environments; however, you will be notified that you should move to a Python 3.7 environment.

When you switch from Python 3.6 to Python 3.7, you might need to update your code if the versions of open source libraries that you use are different in Python 3.7.
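
One low-tech way to spot such differences is to compare the library versions your notebooks were tested with against what the new environment provides. The version pins below are examples only, not the actual environment contents.

```python
# Example pins -- substitute the versions your Python 3.6 notebooks used.
expected = {"numpy": "1.15.4", "pandas": "0.24.1"}

def check_environment(expected_versions, installed_versions):
    """Return the libraries whose installed version differs from the pin."""
    return {
        name: (pinned, installed_versions.get(name))
        for name, pinned in expected_versions.items()
        if installed_versions.get(name) != pinned
    }

# Simulated listing of what a Python 3.7 environment provides.
installed = {"numpy": "1.18.1", "pandas": "1.0.1"}
drift = check_environment(expected, installed)
for name, (old, new) in drift.items():
    print(f"{name}: tested with {old}, environment has {new}")
```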

Spark 3.0
You can run analytical assets from Watson Studio analytics projects in a Spark 3 environment.

If you use the Spark Jobs REST APIs, provided by Analytics Engine Powered by Apache Spark, to run Spark jobs or applications on your Cloud Pak for Data cluster, you can use the Spark 3.0 template.
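
A request body for such a job might look something like the following sketch. The field names and template identifier are assumptions for illustration; check the Spark jobs REST API reference for the exact schema and endpoint on your cluster.

```python
import json

# Illustrative payload requesting a Spark 3.0 template for a job.
# All values here are placeholders, not the real API schema.
payload = {
    "template_id": "spark-3.0",        # assumed template identifier
    "application": "/myapp/job.py",    # assumed application path
    "arguments": ["--mode", "batch"],
}

body = json.dumps(payload)
print(body)
```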

Notebook execution progress restored
If you accidentally close the browser window while your notebook is still running, or if the system logs you out during a long-running job, the notebook continues running and all output cells are restored when you open the notebook again. Execution progress can be restored only for notebooks that run in a local kernel. If your notebook runs on a Spark or Hadoop cluster and you open the notebook again, any notebook changes that were not saved are lost.
Use a self-signed certificate to authenticate to enterprise Git repositories
If you want to store your analytics project in an enterprise-grade instance of Git, such as GitHub Enterprise, and your instance uses a self-signed certificate for authentication, you can specify the self-signed certificate in PEM format when you add your personal access token to Cloud Pak for Data.
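
Before supplying a certificate, it can help to confirm that the file really is PEM-framed base64. This rudimentary check is an illustration, not part of Cloud Pak for Data.

```python
import base64

def looks_like_pem_certificate(text):
    """Rudimentary check for PEM framing and a base64 body; not a full parser."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) < 3:
        return False
    if lines[0] != "-----BEGIN CERTIFICATE-----" or lines[-1] != "-----END CERTIFICATE-----":
        return False
    try:
        base64.b64decode("".join(lines[1:-1]), validate=True)
        return True
    except ValueError:
        return False

sample = """-----BEGIN CERTIFICATE-----
TUlJQ2VqQ0NBZUc=
-----END CERTIFICATE-----"""
print(looks_like_pem_certificate(sample))  # True
```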

New services

The following table lists the new services that are introduced in Cloud Pak for Data Version 3.5:

Category Service Pricing What does it mean for me?
Data source Db2 Data Management Console Included with Cloud Pak for Data
Use Db2 Data Management Console to administer, monitor, manage, and optimize your integrated Db2 databases, including Db2 Big SQL and Data Virtualization, from a single user interface. The console helps you improve your productivity by providing a simplified process for managing and maintaining your complex database ecosystem across Cloud Pak for Data.

The console home page provides an overview of all of the databases that you are monitoring. The home page includes the status of database connections and monitoring metrics that you can use to analyze and improve the performance of your databases.

Db2 Data Management Console home page

From the console, you can also:
  • Administer databases
  • Work with database objects and utilities
  • Develop and run SQL scripts
  • Move and load large amounts of data into databases for in-depth analysis
  • Monitor the performance of your Db2 databases

Learn more about Db2 Data Management Console.

Industry solutions OpenPages Separately priced
You can use OpenPages to manage risk and regulatory challenges across your organization. OpenPages is an integrated governance, risk, and compliance (GRC) suite that can help your organization identify, manage, monitor, and report on risk and compliance initiatives that span your enterprise. The service provides a powerful, scalable, and dynamic set of tools that can help you with:
  • Business continuity management
  • Financial controls management
  • Internal audit management
  • IT governance
  • Model risk governance
  • Operation risk management
  • Policy management
  • Regulatory compliance management
  • Third-party risk management

OpenPages dashboard

Learn more about OpenPages.

Industry solutions

IBM Open Data for Industries Separately priced
Collect, describe, and provide your data according to Oil & Gas industry standards.

IBM Open Data for Industries provides a toolset that supports an industry-standard methodology for collecting and describing Oil & Gas data and serving that data to various applications and services that consume it.

IBM Open Data for Industries provides a reference implementation for a data platform to integrate silos and simplify access to this data for stakeholders. It standardizes the data schemas and provides a set of unified APIs for bringing data into Cloud Pak for Data, describing, validating, finding, and retrieving data elements and their metadata. Effectively, Open Data for Industries becomes a system of record for subsurface and wells data.

Application developers can use these APIs to create applications that are directly connected to the stakeholder's data sets. After the application is developed, it requires minimal or no customization to deploy it for multiple stakeholders that adhere to the same APIs and data schemas.

In addition, stakeholders can use these APIs to connect their applications with the platform and take advantage of the seamless data lifecycle in Cloud Pak for Data.

Learn more about IBM Open Data for Industries.

AI Watson Machine Learning Accelerator Included with Cloud Pak for Data
Watson Machine Learning Accelerator is a deep learning platform that data scientists can use to optimize training models and monitor deep learning workloads.

Watson Machine Learning Accelerator can be connected to Watson Machine Learning to take advantage of the multi-tenant resource plans that manage resource sharing across Watson Machine Learning projects. With this integration, data scientists can use the Watson Machine Learning Experiment Builder and Watson Machine Learning Accelerator hyperparameter optimization.

Learn more about Watson Machine Learning Accelerator.

Installation enhancements

What's new What does it mean for me?
Red Hat OpenShift support
You can deploy Cloud Pak for Data Version 3.5 on the following versions of Red Hat OpenShift:
  • Version 3.11
  • Version 4.5
Support for zLinux
You can deploy the following Cloud Pak for Data software on zLinux (s390x):
  • The Cloud Pak for Data control plane
  • Db2
  • Db2 Warehouse
  • Db2 for z/OS Connector
  • Db2 Data Gate
Simplified and updated installation commands
The Cloud Pak for Data command-line interface uses a simplified syntax. The cpd-Operating_System command is replaced by the cpd-cli command.

When you download the installation files, you must select the appropriate package for the operating system where you will run the commands. For details, see Obtaining the installation files.

Many of the cpd-cli commands have different syntax. Review the installation documentation carefully to ensure that you use the correct syntax.

For example:
  • On air-gapped clusters, the cpd-Operating_System preloadImages command is now cpd-cli preload-images.
  • When you run the install or upgrade commands, you specify the --latest-dependency flag to ensure that the latest prerequisite components are installed.
Upgrading the Cloud Pak for Data metadata
Before you can upgrade to Cloud Pak for Data Version 3.5, you must upgrade the Cloud Pak for Data metadata by running the cpd-cli operator-upgrade command.

For details, see Preparing to upgrade Cloud Pak for Data.

New service account
The Cloud Pak for Data control plane requires an additional service account: cpd-norbac-sa, which is bound to a restricted security context constraint (SCC).

This service account is specified in the cpd-cli adm command for the control plane.

Simplified storage overrides
If an assembly requires an override for Portworx or OpenShift Container Storage, the assembly includes predefined override files. The instructions for the assembly will include information on how to install the service with the appropriate override file for your environment.
Rolling back patches
Whether a patch succeeded or failed, you can now revert a service to the state before the patch was applied by running the cpd-cli patch rollback command.

For details, see Applying patches.

Operator-based installation on the Red Hat Marketplace
If you want to install Cloud Pak for Data from the Red Hat Marketplace, you can use the Cloud Pak for Data operator. You can use the operator to install, scale, and upgrade the Cloud Pak for Data control plane and services using a custom resource (CR).

The operator will be available through the Red Hat Marketplace and is compatible with the Red Hat Operator Lifecycle Manager.

For details, see Installing Cloud Pak for Data from the OpenShift console.

Deprecated features

What's changed What does it mean for me?
Open Source Management
This service is deprecated and cannot be deployed on Cloud Pak for Data Version 3.5.
Regulatory Accelerator
This service is deprecated and cannot be deployed on Cloud Pak for Data Version 3.5.
Extracting business terms and governance rules from PDF files
This feature was provided as a technology preview in Watson Knowledge Catalog and is no longer supported.
Generating terms from assets
This feature was provided as a technology preview in Watson Knowledge Catalog and is no longer supported.

LDAP group roles

You can no longer map an LDAP group directly to a Cloud Pak for Data role.

Instead, you can create user groups and add an LDAP group to the user group. When you create a user group, you can assign one or more roles to the user group.

Previous releases

Looking for information about what we've done in previous releases? See the following topics in IBM Knowledge Center: