What's new in IBM Cloud Pak for Data?

See what new features and improvements are available in the latest release of IBM® Cloud Pak for Data.

What's new in Version 3.0.1

IBM Cloud Pak for Data Version 3.0.1 introduces support for Red Hat® OpenShift® Container Platform Version 4.5 and Red Hat OpenShift Container Storage so that you can deploy your applications on secure and scalable resources. The release also includes enhanced auditing capabilities, several new services, and numerous updates to existing services.

September 2020 update

Red Hat announced that Red Hat OpenShift Container Platform Version 4.3 will be out of support on 22 October 2020.

Starting 1 September 2020, Cloud Pak for Data is introducing support for Red Hat OpenShift Container Platform Version 4.5 and deprecating support for Red Hat OpenShift Container Platform Version 4.3.

If you already installed Cloud Pak for Data on Red Hat OpenShift Container Platform Version 4.3, work with your IBM Support representative to migrate to Red Hat OpenShift Container Platform Version 4.5.

If you are ready to install Cloud Pak for Data Version 3.0.1 on Red Hat OpenShift Container Platform Version 4.5, ensure that you install the appropriate versions of Cloud Pak for Data services on your cluster. In addition, if you are using Portworx storage, ensure that you install Portworx Version 2.5.5. For details, see Setting up Portworx storage.

Service Minimum version required for 4.5 support Notes
Cloud Pak for Data control plane 3.0.1 The previously released version works on OpenShift 4.5.

For details, see Installing IBM Cloud Pak for Data.

Analytics Engine Powered by Apache Spark cpd-3.0.1-spark-patch-3 Install the patch after you install the 3.0.1 version of the service. For details, see https://www.ibm.com/support/pages/node/5693756.
Cognos® Analytics 3.2.2 A new version of the service was released.

If you have a previous version of the service installed, you must upgrade to the 3.2.2 version of the service before you upgrade your OpenShift installation. For links to installation and upgrade instructions, see Cognos Analytics.

Cognos Dashboards 3.0.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Cognos Dashboards.

Data Refinery See Watson™ Knowledge Catalog or Watson Studio. Data Refinery is installed when you install either Watson Knowledge Catalog or Watson Studio.
Data Virtualization 1.4.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Data Virtualization.

DataStage® 11.7.1.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see DataStage.

Db2® 3.0.2 A new version of the service was released.

If you have a previous version of the service installed, you must upgrade to the 3.0.2 version of the service before you upgrade your OpenShift installation. For links to installation and upgrade instructions, see Db2.

Db2 Big SQL 6.0.0 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Db2 Big SQL.

Db2 Data Gate 1.1.0 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Db2 Data Gate.

Db2 Event Store 3.0.2 A new version of the service was released.

If you have a previous version of the service installed, you must upgrade to the 3.0.2 version of the service before you upgrade your OpenShift installation. For links to installation and upgrade instructions, see Db2 Event Store.

Db2 for z/OS® Connector 3.2.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Db2 for z/OS Connector.

Db2 Warehouse 3.0.2 A new version of the service was released.

If you have a previous version of the service installed, you must upgrade to the 3.0.2 version of the service before you upgrade your OpenShift installation. For links to installation and upgrade instructions, see Db2 Warehouse.

Decision Optimization 3.0.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Decision Optimization.

Execution Engine for Apache Hadoop 3.0.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Execution Engine for Apache Hadoop.

Financial Crimes Insight® 6.5.1 If you are upgrading from OpenShift 4.3 to OpenShift 4.5, and Financial Crimes Insight is already installed, you can use your existing installation of Financial Crimes Insight.

If you are installing Financial Crimes Insight on OpenShift 4.5, you must use the updated images. For details, see Financial Crimes Insight platform installation steps.

Financial Services Workbench 2.2.0 The previously released version works on OpenShift 4.5.
For links to installation and upgrade instructions, see Financial Services Workbench.
Guardium® External S-TAP® 11.2.0 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Guardium External S-TAP.

Jupyter Notebooks with Python 3.6 for GPU cpd-3.0.1-runtime-addon-gpupy36-patch-2 Install the patch after you install the 3.0.1 version of the service. For details, see https://www.ibm.com/support/pages/node/5693522.
Jupyter Notebooks with R 3.6 cpd-3.0.1-runtime-addon-r36-patch-2 Install the patch after you install the 3.0.1 version of the service. For details, see https://www.ibm.com/support/pages/node/5693516.
Master Data Connect 1.0.0.0 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Master Data Connect.

MongoDB 3.0.2 A new version of the service was released.

If you have a previous version of the service installed, you must upgrade to the 3.0.2 version of the service before you upgrade your OpenShift installation. For links to installation and upgrade instructions, see MongoDB.

Open Source Management 1.1.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Open Source Management.

Planning Analytics 3.0.1 A new version of the service was released.

If you have a previous version of the service installed, you must upgrade to the 3.0.1 version of the service before you upgrade your OpenShift installation. For links to installation and upgrade instructions, see Planning Analytics.

Regulatory Accelerator 3.0.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Regulatory Accelerator.

RStudio® Server with R 3.6 3.0.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see RStudio Server with R 3.6.

SPSS® Modeler 3.0.2 A new version of the service was released.

If you have a previous version of the service installed, you must upgrade to the 3.0.2 version of the service before you upgrade your OpenShift installation. For links to installation and upgrade instructions, see SPSS Modeler.

Streams 5.3.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Streams.

Streams Flows 3.0.1 The previously released version works on OpenShift 4.5.
For links to installation and upgrade instructions, see Streams Flows.
Virtual Data Pipeline 8.1 The previously released version works on OpenShift 4.5.
For links to installation and upgrade instructions, see Virtual Data Pipeline.
Watson AIOps 2.0.0 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson AIOps.

Watson Assistant 1.4.2 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson Assistant.

Watson Assistant for Voice Interaction 1.0.6 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson Assistant for Voice Interaction.

Watson Discovery 2.1.3 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson Discovery.

Watson Knowledge Catalog 3.2.0 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson Knowledge Catalog.

Watson Knowledge Studio 1.1.2 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson Knowledge Studio.

Watson Language Translator 1.1.2 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson Language Translator.

Watson Machine Learning
  • For x86-64: cpd-3.0.1-wml-patch-3
  • For POWER: cpd-3.0.1-wml-patch-2
Install the appropriate patch for your environment after you install the 3.0.1 version of the service. For details, see https://www.ibm.com/support/pages/node/5693732.
Watson OpenScale 3.0.1 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson OpenScale.

Watson Speech to Text 1.1.4 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson Speech to Text.

Watson Studio cpd-3.0.1-wsl-patch-2 Install the patch after you install the 3.0.1 version of the service. For details, see https://www.ibm.com/support/pages/node/5693750.
Watson Text to Speech 1.1.4 The previously released version works on OpenShift 4.5.

For links to installation and upgrade instructions, see Watson Text to Speech.

Platform enhancements

The following table lists the new features that were introduced in Cloud Pak for Data Version 3.0.1:

What's new What does it mean for me?
A more useful home page The Cloud Pak for Data home page has been completely redesigned:
  • The getting started experience has been merged with the home page so that you have a single point of entry to the platform.

    The links that you need to get up and running are at the top of the page. The links help you identify common tasks that are associated with your role. If you don't need the links, you can easily hide them so that you can see more of the cards on the home page.

  • The home page includes more useful cards that give you access to your notifications, recent projects, recent analytics dashboards, requests, and more.

    The cards that are displayed depend on the services that are installed on top of Cloud Pak for Data.

  • The links you need, at your fingertips.
    The home page includes quick links to commonly used features, such as projects, catalogs, analytics dashboards, and more.

    The links that are displayed depend on the services that are installed on top of Cloud Pak for Data.

New format for custom KPI cards The custom cards REST API has been updated to provide new and updated templates for custom key performance indicators. The cards blend seamlessly with the new home page design and offer more ways to display your data.

For details, see Creating custom cards.
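As a rough sketch of what a card definition might look like, the snippet below assembles a KPI card payload as JSON. The field names (`template`, `data`, and so on) are illustrative placeholders, not the documented schema; see Creating custom cards for the actual API.

```python
import json

def make_kpi_card(card_id, title, value, unit):
    # Assemble an illustrative KPI card definition. The field
    # names here are placeholders, not the documented schema.
    return {
        "id": card_id,
        "template": "kpi",  # assumed template identifier
        "title": title,
        "data": {"value": value, "unit": unit},
    }

card = make_kpi_card("open-requests", "Open requests", 42, "count")
payload = json.dumps(card)
```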

New look and feel The Cloud Pak for Data platform has adopted Carbon, IBM's open source design system for digital products and experiences.

Carbon makes it easier for you to know exactly how the web client will behave. With Carbon, you can spend less time learning how to use the platform and more time putting the services to use to accelerate your business.

Customized for your company Make Cloud Pak for Data a part of your business. You can add your own branding to the Cloud Pak for Data web client by:
  • Specifying a custom name to display in the web client
  • Adding your own logo to the home page

For details, see Customizing the branding of the web client.

Access to even more data Augment your analytics and AI with external data sets. Cloud Pak for Data includes access to numerous data sets that can help you address common business problems. For example, there are data sets that help you use weather data to improve flight safety and data sets that help you analyze videos to determine whether employees are performing tasks correctly.

Many of the data sets are included with your purchase of Cloud Pak for Data. However, some data sets are separately priced.

You can use the external data to expand the capabilities of your models and applications. For details, see External data sets.

Support for more types of storage Cloud Pak for Data introduces support for the following types of storage:
  • Red Hat OpenShift Container Storage
  • IBM Cloud File Storage

For information on the types of storage that are supported with Cloud Pak for Data, see Storage considerations.

In addition, many services include support for additional storage types. For information about which storage types are supported for each service, see System requirements for services.

More ways to audit your system Cloud Pak for Data offers additional mechanisms for auditing your environment.
In addition to the existing IBM Guardium integration, which enables you to audit sensitive data on remote databases, you can also:
  • Integrate Cloud Pak for Data with your security information and event management (SIEM) software to monitor access to the platform.
  • Deploy the Guardium External S-TAP service to audit containerized databases on your Cloud Pak for Data system.

For details, see Auditing Cloud Pak for Data.
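Many SIEM products ingest events in Common Event Format (CEF). As a hedged illustration of the kind of record an audit forwarder emits, the sketch below builds a CEF:0 string; the vendor, product, and extension keys are examples, not the exact fields that Cloud Pak for Data sends.

```python
def to_cef(signature_id, name, severity, extensions):
    # Build a CEF:0 record: seven pipe-separated header fields,
    # then space-separated key=value extension pairs.
    # (Escaping of special characters is omitted for brevity.)
    header = "|".join([
        "CEF:0",                # CEF version
        "IBM",                  # device vendor (illustrative)
        "Cloud Pak for Data",   # device product (illustrative)
        "3.0.1",                # device version
        str(signature_id),
        name,
        str(severity),
    ])
    ext = " ".join("{}={}".format(k, v) for k, v in extensions.items())
    return header + "|" + ext

event = to_cef(1001, "User login", 3, {"suser": "jdoe", "outcome": "success"})
```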

Integration with IBM Cloud Platform Common Services IBM Cloud Platform Common Services are optional, foundational services that can be shared by multiple products that are installed on the same Red Hat OpenShift cluster.
Cloud Pak for Data Version 3.0.1 includes support for the following common services:
License Service
Use this service to measure your Virtual Processor Core (VPC) usage data so that you can stay within your license terms.
IAM Service
Use this service to create a single point of entry for access management and single sign-on (SSO).

For details, see Integrating with Cloud Platform Common Services.

More precise administrative permissions Cloud Pak for Data includes additional administrative permissions that enable you to give more specific access to administrative users.
You can now give users one or more of the following permissions:
  • Configure authentication
  • Configure platform
  • Manage users
  • Monitor platform

If you give a user all of these permissions, it is equivalent to giving the user the Administer platform permission.

For details, see Permissions.
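The equivalence described above (the four granular permissions together matching Administer platform) can be captured in a small check. This is a sketch of the logic, not a platform API; the permission names come from the list above.

```python
GRANULAR_ADMIN_PERMISSIONS = {
    "Configure authentication",
    "Configure platform",
    "Manage users",
    "Monitor platform",
}

def administers_platform(user_permissions):
    # A user effectively administers the platform if they hold the
    # single "Administer platform" permission, or all four of the
    # granular administrative permissions.
    perms = set(user_permissions)
    return ("Administer platform" in perms
            or GRANULAR_ADMIN_PERMISSIONS <= perms)

administers_platform({"Manage users", "Monitor platform"})  # False
administers_platform(GRANULAR_ADMIN_PERMISSIONS)            # True
```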

Backup and restore utility Ensuring that your IBM® Cloud Pak for Data system is prepared for loss of data or unplanned downtime is one of the most important steps you can take.
The cpdbr utility enables you to back up and restore persistent volumes that are associated with Cloud Pak for Data:
  • You can create offline volume backups on a local disk or an S3 or S3-compatible object store.
  • You can create volume snapshots. (Portworx only.)

For details, see Backup and disaster recovery.

The documentation also includes information on how to plan for disaster recovery.

Translated interfaces Many of the services that are included with Cloud Pak for Data are now available in the following languages:
  • Brazilian Portuguese
  • French
  • German
  • Italian
  • Japanese
  • Simplified Chinese
  • Spanish
  • Traditional Chinese

Several services are also available in Russian and Korean.

For details, see Language support.

Smarter global search When you perform a search in the global search bar, you now see machine-learning infused search suggestions and search results that are based on relevancy.
Simplified packaging
Enterprise Edition
The IBM Cloud Pak for Data Enterprise Edition package includes non-production parts that you can install in your development, test, or quality assurance environments. The non-production Enterprise Edition parts make it easier to transition workloads to your production environments.
Standard Edition
IBM Cloud Pak for Data Cloud Native Edition has been renamed to IBM Cloud Pak for Data Standard Edition. As with Cloud Native Edition, Standard Edition places limits on the number of virtual processor cores (VPCs) that you can have in your cluster.

Service enhancements

The following table lists the new features that are introduced for existing services in Cloud Pak for Data Version 3.0.1:

What's new What does it mean for me?
Data Refinery
New visualization charts in Data Refinery
Data Refinery introduces six new charts. To access the charts, click the Visualizations tab in Data Refinery, and then select the columns to visualize. The chart automatically updates as you refine the data.
  • Bubble charts display each category in the groups as a bubble.
  • Circle packing charts display hierarchical data as a set of nested areas.
  • Multi-charts display up to four combinations of Bar, Line, Pie, and Scatter plot charts.

    You can show the same kind of chart more than once with different data. For example, two pie charts with data from different columns.

  • Radar charts integrate three or more quantitative variables that are represented on axes (radii) into a single radial figure.

    Data is plotted on each axis and joined to adjacent axes by connecting lines. Radar charts are useful to show correlations and compare categorized data.

  • Theme river charts use a specialized flow graph that shows changes over time.
  • Time plot charts illustrate data points at successive intervals of time.

For details, see Visualizing your data.

Data Virtualization
Backup, restore, and disaster recovery
It is a best practice to back up the Data Virtualization service. You can use the cpdbr utility to back up and restore all persistent volumes that are associated with your Data Virtualization installation. The cpdbr utility performs local volume snapshots and restores the service from a local snapshot via Portworx.

For details, see Backing up and restoring Data Virtualization.

Support for additional data sources
You can connect to the following data sources in Data Virtualization:
  • Google BigQuery
  • SAP HANA
  • Amazon Redshift
  • Netezza®

For details, see Adding data sources (Data Virtualization).

Virtualize data from remote TSV files
Virtualization is now available for data that is stored in TSV files on remote data sources.

For details, see Creating virtualized tables from files (Data Virtualization).

Governance enhancements for virtual objects
Data Virtualization uses policies and business terms in Watson Knowledge Catalog to govern your virtual data. These policies consist of data protection rules with conditions that use business terms, tags, data classes, and so on. For example, you can use data protection rules to deny access to a virtual table or to mask data in columns of a virtual table.

Additionally, you can publish your virtual objects to Watson Knowledge Catalog.

For details, see Governing virtual data (Data Virtualization).

Improve query performance by using multiple worker nodes
You can configure the Data Virtualization service to run on multiple worker nodes. Running the service on multiple worker nodes improves query performance and enhances parallelism for query processing.

When you provision Data Virtualization, you can specify the number of worker nodes to allocate to the service.

For details, see Enabling query processing parallelism on multiple worker nodes (Data Virtualization).

Reduced footprint to provision the service
Data Virtualization has a reduced footprint when you initially provision the service. The new footprint reduces the default minimum requirements and enables more efficient use of service resources.

For details, see Preparing to install the service (Data Virtualization).

DataStage
DataStage Enterprise and DataStage Enterprise Plus
You can choose from two different DataStage offerings: DataStage Enterprise and DataStage Enterprise Plus.
Use DataStage Enterprise for all the benefits of classic DataStage, designing and running data flows that move and transform data. Use DataStage Enterprise Plus to get all the capabilities of DataStage Enterprise with additional features for data quality. These features include:
  • Cleansing data by identifying potential anomalies and metadata discrepancies.
  • Identifying duplicates by using data matching and probabilistic matching of data entities between two data sets.
Automatic job generation
You can use a new job generation feature that automates and simplifies data movement, taking data from a source (X) and moving it to a target (Y). Rather than crafting a job to move this data by using the DataStage canvas and palette and dragging connectors and stages to build it, you can use the new job template to follow a simple series of steps and generate a parallel job. You can then use this parallel job to move the data. Also, if you are an administrator, you can define target rule sets that capture best practices and have users apply them to their job templates and generated jobs.

For details, see Generating a job by using a template.

Edit column metadata
You can define and edit column metadata. For example, for a column, you can specify which delimiter character should separate text strings or you can set null field values. With the ability to edit column metadata you get more fine-tuned processing of your jobs and more useful data transformation.

For details, see Running a data transformation job.

Shared containers support
You can create a container that has its own job flow on the canvas and share it with other jobs.

For details, see Running a data transformation job.

Connectors
  • SAP OData
  • z/OS file

For details, see Supported connectors.

Stages
  • You can use the following stages:
    • Complex flat file
    • Slowly changing dimension for warehousing
  • Transformer stage enhanced to support Slowly changing dimension stage
  • Transformer stage supports tabs such as build, surrogate key, and triggers.
  • All stages support property tabs General, Stage Advanced, and Output Advanced.

For details, see Supported stages.

Enhancements for the Hierarchical stage
  • You can use the following steps in the Hierarchical stage:
    • JSON composer step
    • JSON parser step
    • REST step
    • Test assembly
    • Details inspector
    • Create/view contract libraries
    • Administration
  • Support for 10 built-in operator steps.
  • Tree-based view

For details, see Supported stages.

Decision Optimization
Upgraded optimization engines
CPLEX® V12.9 and V12.10 optimization engines are now available. When you click the Run model button in the Decision Optimization model builder, CPLEX V12.10 is used to solve the model.
More data formats
You can now import data in other formats (for example, XLS, JSON, and so on), as well as CSV files or connected files, in the Prepare data view of the Decision Optimization model builder.

For details, see Prepare data view.

Customize the deployment runtime
The Decision Optimization runtime used in deployment can be extended to include other APIs.

For details, see ExtendWMLSoftwareSpec notebooks in the DO-samples repository on the Decision Optimization GitHub site.

Exported notebooks contain all files
When you export a notebook from the Decision Optimization model builder from a model containing multiple files, the additional files are also exported with the model.

For details, see the information on the Generate a notebook from a scenario option in Scenario panel.

Db2
Support for new storage types
The Db2 service supports additional types of storage.

For details, see System requirements for services.

Separate storage types for metadata and user data
The Db2 service now uses different storage classes for metadata and user data.

For details, see Configuring database storage for Db2.

Persistent volume claim templates
You can define a volumeClaimTemplate so that a new persistent volume claim is automatically created for each replica.

For details, see Configuring database storage for Db2.
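In Kubernetes terms, a volumeClaimTemplate on a StatefulSet stamps out one persistent volume claim per replica. The fragment below is a generic sketch of the shape; the template name, storage class, and size are illustrative values, not the settings that the Db2 service requires (see Configuring database storage for Db2 for those).

```yaml
volumeClaimTemplates:
- metadata:
    name: data                              # each replica gets its own PVC, data-<pod>
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: portworx-db2-rwo-sc   # illustrative storage class
    resources:
      requests:
        storage: 100Gi
```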

Automated failover for HADR
The Db2 service includes support for automated failover of Db2 high availability disaster recovery (HADR) by using a mechanism called a governor.

Also, during a new deployment of the primary and standby HADR databases with the web console, you can specify that the service endpoints that are required to transport HADR traffic are automatically created as part of the deployment.

For details, see Db2 high availability disaster recovery (HADR).

Db2 Warehouse
Support for new storage types
The Db2 Warehouse service supports additional types of storage. The types of storage that are available depend on whether you are deploying with the symmetric multiprocessing (SMP) architecture or the massively parallel processing (MPP) architecture.

For details, see System requirements for services.

Separate storage types for metadata and user data
The Db2 Warehouse service now uses different storage classes for metadata and user data.

For details, see Configuring database storage for Db2 Warehouse.

Persistent volume claim templates
You can define a volumeClaimTemplate so that a new persistent volume claim is automatically created for each replica.

For details, see Configuring database storage for Db2 Warehouse.

Execution Engine for Apache Hadoop
New connection type
You can use the new Impala via Execution Engine for Hadoop connection to browse, preview, and refine data in Impala tables.
R scripts
You can schedule jobs for R scripts on remote Hadoop clusters using Execution Engine for Hadoop. For details, see R scripts.
Customizing Spark runtime settings
Users can now fine-tune their Spark sessions by using user-defined session variables. The Hadoop administrator must first determine which options to modify and the default values or ranges that are allowed. After the system is set up, users can create an environment with these options to run their jobs or launch their notebooks. For details, see Hadoop environments and Configuring the Hadoop integration with Cloud Pak for Data.
Updated CDH support
Execution Engine for Hadoop is now supported on CDH versions 6.1, 6.2, and 6.3.
Financial Crimes Insight The latest version of the Financial Crimes Insight service includes:
  • Hadoop 3.0 support.
  • Transaction List Screening multi-pipeline support.
  • New performance visualization.
  • Narrative generation capabilities that allow customizable templates and point-in-time data reports.
  • Enhancements to Policy Admin tool to support nested policies.
  • Voice processing using Watson Speech to Text.
Open Source Management
Understand your open source landscape
The Open Source Management service includes a dashboard that gives you insight into how you use open source packages across your organization. For example, the dashboard lets you see:
  • The number of projects that use open source packages
  • The number of packages that use any given type of open source license
  • The number of packages that are affected by vulnerabilities
  • The number of times users interact with open source packages
  • And so much more

For details, see Open Source Management.

RStudio Server with R 3.6
Git integration to collaborate on R Shiny® apps and R scripts
You can configure RStudio to use a Git repository, which enables you to collaborate on R Shiny apps and R scripts in RStudio. When you are ready to use your apps and scripts, you can pull them from the Git repository and add them to an analytics project. The apps and scripts become project assets, which you can deploy by promoting them to deployment spaces.

For details, see Analyzing data with RStudio.

R scripts
You can create R scripts in RStudio from an analytics project that is integrated with a Git repository. You can schedule jobs for R scripts on local and remote Hadoop clusters and pass parameters for running the jobs. You can also include R scripts in a project export.

For details, see R scripts.

SPSS Modeler
Support for SAV files
You can now import or export SPSS Statistics SAV files from SPSS Modeler.
New cross-validation options
New cross-validation options are available in the Auto Classifier node and the Auto Numeric node.
Streams
Reduced footprint to provision the service
Streams has a reduced footprint when you initially provision the service, resulting in decreased costs because of reduced vCPU, RAM, and disk requirements.

For details, see System requirements for services.

Streams flows code
For users who want to understand and enhance the Python source code of Streams flows, flows are now available for download and for export to a notebook, where they can be run, tweaked, and expanded with advanced API features. Look for the download code and export to notebook buttons on the flow editor toolbar.
Speech To Text Gateway
Streaming data applications can ingest audio data and transcribe it into text by using the Watson Speech to Text service. Audio data is either read from files for batch workloads or streamed through a telephony infrastructure for real-time workloads, such as call center support.
Integration with Watson Machine Learning
Streaming data applications can use the online scoring functionality of Watson Machine Learning.
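Online scoring is ultimately a REST call that sends rows of feature values to a deployed model. The sketch below assembles a payload in the common fields/values shape; the exact request body and endpoint depend on the Watson Machine Learning API version, so treat the structure and field names as assumptions.

```python
import json

def build_scoring_payload(fields, rows):
    # Assemble a fields/values scoring payload of the kind a
    # streaming application could POST to an online deployment
    # endpoint. The "input_data" wrapper is an assumed shape,
    # not verified against a specific WML API version.
    return {"input_data": [{"fields": fields, "values": rows}]}

payload = build_scoring_payload(
    ["temperature", "vibration"],
    [[71.2, 0.4], [69.8, 0.9]],
)
body = json.dumps(payload)
```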
Watson AIOps AI Manager

Watson AIOps 2.0.0 is now composed of the following components: AI Manager, Metric Manager, Event Manager, and Topology.

AI Manager for Watson AIOps 2.0.0 now supports application groups. Application groups are a means of isolating streams of data from one another. You can now group multiple SRE groups alongside multiple internal clients (for example, several business units) within a single cluster that runs Watson AIOps 2.0.0. Not only does this simplify your IT footprint, it also significantly reduces your IT costs because you need to provision only a single instance to share across many application groups.

Event grouping
  • Data validation before log anomaly training ensures that your training is not affected by quality issues in your input data sets.
  • Template-based clustering adds to the accuracy of establishing patterns by using detailed entity extraction information from log contents. Those patterns are then applied when AI Manager provides a holistic view of the problem through a story.
  • Trained models can now be verified and evaluated at training time, and you can save the logs from evaluating your models. By evaluating your models, you can gain a higher degree of confidence in them before you deploy them.
  • Stories can now include anomalies over a range of time.
Topology and localization
  • Legacy topologies are now handled as part of the hybrid approach to topology management. Legacy topologies are integrated with Kubernetes topologies.
  • Localization and blast radius for applications across the hybrid environments are now supported.
  • Localization and blast radius now support using historical topology as it existed at the time of the story and are available from your reported stories directly in Slack. You can now jump straight in to an interactive IBM Netcool Agile Service Manager dashboard to view where events occurred and the scope of their impact right from Slack.
  • Log readability has been improved.
Watson Knowledge Catalog
More accurate automatic term assignments during advanced data curation
Now, if you reject term assignments that are generated by the class-based and linguistic name matching services, they are also excluded when you repeat the same data analysis. When you reject terms that the machine learning service suggests, it learns from your actions and generates better results in the future.

For details, see Automatic term assignment (Watson Knowledge Catalog).

Enhanced data analysis during advanced data curation
  • You can speed up relationship analysis and overlap analysis by filtering out columns by name, column type, or first N candidates.
  • The delta row in the data quality violations table shows how the number of violations changed between the last two analyses, so you can quickly see how data quality has changed over time for each data asset, based on a specified time interval.
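The delta calculation itself is simple to reason about. The following sketch is purely illustrative (the data structure and function name are hypothetical, not Watson Knowledge Catalog internals): it computes the per-asset change in violation counts between the two most recent analyses.

```python
# Illustrative sketch only: computes the per-asset delta in data quality
# violations between the two most recent analyses. The structures here
# are hypothetical, not Watson Knowledge Catalog internals.

def violation_deltas(history):
    """history maps asset name -> list of violation counts, oldest first."""
    deltas = {}
    for asset, counts in history.items():
        if len(counts) >= 2:
            deltas[asset] = counts[-1] - counts[-2]  # latest minus previous
        else:
            deltas[asset] = None  # not enough analyses to compare
    return deltas

history = {
    "customers.csv": [42, 37, 30],  # quality improving
    "orders.csv": [5, 9],           # quality degrading
    "items.csv": [3],               # only one analysis so far
}
print(violation_deltas(history))
# {'customers.csv': -7, 'orders.csv': 4, 'items.csv': None}
```

A negative delta indicates fewer violations than the previous analysis, that is, improving data quality for that asset.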
Supported data sources and connections
  • Teradata connections and connected data assets are now synchronized from the Information assets view to the default catalog.
  • Information assets for files are now synchronized to the default catalog.
  • You can now create connection assets in catalogs to the following data sources:
    • Impala via Execution Engine for Hadoop
    • SAP OData
    • Planning Analytics (formerly known as TM1®)
  • You can now supply personal credentials for connection assets and connected data assets that require them.
Data protection rules are more powerful and flexible
You can now include classifications in criteria when you create data protection rules.

For details, see Managing data protection rules.

Automatic data class creation
You can now quickly create and assign a data class to clusters of similar columns in a data quality project.

For details, see Running a column similarity analysis (Watson Knowledge Catalog).

Migrate assets from IBM InfoSphere Information Server
You can migrate assets from IBM InfoSphere Information Server versions 11.7.1.x to the Watson Knowledge Catalog service on Cloud Pak for Data Version 3.0.1.

For details, see Migrating data from IBM InfoSphere Information Server to IBM Cloud Pak for Data (Watson Knowledge Catalog).

Watson Machine Learning
AutoAI improvements
  • AutoAI includes more powerful visualizations, as well as support for larger data sources and more estimators. Additionally, you can now create a batch deployment for an AutoAI model.

    For details, see AutoAI overview.

  • Tech preview: You can save an AutoAI experiment pipeline as an auto-generated notebook so that you can audit the transformations that created the pipeline. The documentation in the notebook and in the autoai-lib reference helps you interpret the results.

    For details, see Saving an AutoAI generated notebook.

More batch deployment options
  • Batch deployments now support programmatic connections to an expanded set of data sources. For example, connect to Db2 for an SPSS deployment.
  • Create a batch job that accepts multiple inputs from Db2 or DashDB for an SPSS deployment.
  • You can create batch deployment jobs so you can schedule runs or change the runtime environment for a deployment.

For details, see Batch deployment details.

More deployment options
  • New deployment support for a wider variety of frameworks and assets helps you deploy assets your way.

    For details, see Supported frameworks.

  • Update an online deployment at an existing endpoint with a better-performing model.

    For details, see Creating an online deployment.

  • You can promote data connections as well as data assets for use with deployments and scripts.

    For details, see Deploying assets.

  • You can deploy Python scripts and R Shiny apps to further extend your deployment and model management options.

    For details, see Deploying assets.

More import and export options
  • New import and export capabilities make it easier for you to share deployment spaces and deployed content. You can export a space and all of its assets to a file or import a space and all of its assets from a file.
  • Share models more easily with the capability of importing a model in PMML format.

For details, see Analytics deployment spaces.

Watson OpenScale
Context-sensitive help pane
The context-sensitive help pane provides help that is based on what you are doing. The results are displayed inside the user interface for added convenience.
Model risk management
  • Integrates with the IBM OpenPages governance, risk, and compliance platform
  • Supports pre-production model environments
  • Lets you compare two models
Watson Studio
Customization options for accessing custom libraries
You can customize the software configuration of Jupyter notebook environment definitions in Watson Studio by using conda channels and pip to access custom packages and libraries. If access to the public network is not available or desired for security reasons, you can customize the conda and pip configuration to access libraries by alternate methods, or you can build your own custom images.

For details, see Customizing environments.
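As a hedged illustration, a customization that adds an internal conda channel and installs packages through pip might look like the following fragment. The channel URL and package names are placeholders; see Customizing environments for the exact template that the service expects.

```yaml
# Hypothetical customization fragment; replace the channel URL and
# package names with your own.
channels:
  - https://repo.example.com/internal-conda-channel  # internal channel
  - defaults
dependencies:
  - pandas
  - pip:
    - my-internal-library==1.2.0
```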

Python Scripts
When working in JupyterLab, you can now share your work in Python scripts as well as in notebooks. You can collaborate through the Git repository associated with your project and later pull changes back to the project to create a script asset. You can schedule jobs for scripts and pass parameters for running the jobs. You can promote scripts to deployment spaces and include them in a project export.

For details, see JupyterLab.
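For example, a script asset can read its job parameters at run time. The sketch below assumes parameters arrive as environment variables, which is one common convention; the script name and parameter name are hypothetical, so check the JupyterLab and jobs documentation for the exact mechanism your jobs use.

```python
# Hypothetical script asset: reads a job parameter from an environment
# variable, falling back to a default when the parameter is not set.
import os

def get_batch_size(env=os.environ):
    """Return the BATCH_SIZE job parameter as an int, defaulting to 100."""
    return int(env.get("BATCH_SIZE", "100"))

if __name__ == "__main__":
    print(f"Running with batch size {get_batch_size()}")
```

Because the parameter lookup is isolated in a function, the same script runs unchanged in JupyterLab, as a scheduled job, or in a deployment space.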

Supported data sources and connections
  • Impala via Execution Engine for Hadoop
  • OData
  • Planning Analytics (formerly known as TM1)
  • SAP OData

Many of the services that supplement Watson Studio also have new features. For details, see the rows for those services.

New services

The following table lists the new services that are introduced in Cloud Pak for Data Version 3.0.1:

Category Service Pricing What does it mean for me?
Analytics Planning Analytics Separately priced Easily create more accurate plans, budgets, and forecasts using data from across your business.

A good plan starts with good data. Ensure that your plans are based on data from across your business with IBM Planning Analytics powered by TM1.

Planning Analytics is an AI-infused solution that pulls data from multiple sources and automates the creation of plans, budgets, and forecasts. Planning Analytics integrates with Microsoft Excel so that you can continue to use a familiar interface while moving beyond the traditional limits of a spreadsheet. Infuse your spreadsheets with more analytical power to build sophisticated, multidimensional models that help you create more reliable plans and forecasts.

The Planning Analytics service includes:
  • Easy-to-use visualization tools
  • Built-in data analytics and reporting capabilities
  • What-if scenario modeling that helps you understand the impact of your decisions
Learn more
Data governance Guardium External S-TAP Included with Cloud Pak for Data

IBM Guardium External S-TAP is a component of Guardium that works with databases that are hosted on Cloud Pak for Data. The service provides compliance monitoring and data security.

You can install and configure the External S-TAP service in high-availability mode to intercept TCP/IP traffic (plain-text or encrypted) between Cloud Pak for Data users and database services. The intercepted traffic is sent to the Guardium collector for parsing, policy enforcement, logging, and reporting.

To use the External S-TAP service, you must be entitled to use IBM Guardium Data Protection.

Learn more
Data governance Master Data Connect Included with Cloud Pak for Data

Power® your business applications with trusted master data.

Master data is the high-value, core information that supports critical business processes across your enterprise. Master data is at the heart of every business transaction, application, and decision.

IBM InfoSphere Master Data Management acts as a central repository to manage, store, and maintain master data across your organization. IBM InfoSphere MDM provides:
  • A consolidated, central view of an organization’s key business facts.
  • The ability to manage master data throughout its lifecycle.

The Master Data Connect service uses a RESTful API to provide geographically distributed users with fast, scalable, and concurrent access to your organization's most trusted master data from IBM InfoSphere MDM. By making your trusted master data available to business applications, you can capitalize on the benefits that master data brings.

Users and systems can use the Master Data Connect API to access and search master data, enabling your mobile and online applications to access trusted master data in a timely and efficient way. For example, you can improve your sales processes by using Master Data Connect as a real-time provider of master data for Salesforce.com.
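As a rough sketch of how an application might construct a search request against a REST API like this one (the host, path, and query parameters below are placeholders, not the documented Master Data Connect endpoints):

```python
# Illustrative only: builds a search request URL for a REST API.
# The host, path, and parameter names are hypothetical placeholders,
# not the actual Master Data Connect API.
from urllib.parse import urlencode, urljoin

def build_search_url(base_url, entity_type, query):
    """Assemble a hypothetical search endpoint URL with encoded parameters."""
    params = urlencode({"type": entity_type, "q": query})
    return urljoin(base_url, "search") + "?" + params

url = build_search_url("https://mdm-connect.example.com/api/v1/", "person", "Jane Doe")
print(url)
# https://mdm-connect.example.com/api/v1/search?type=person&q=Jane+Doe
```

An application such as a Salesforce.com integration would issue GET requests against URLs like this to retrieve trusted master data records in real time.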

Learn more
Data source Db2 Big SQL Included with Cloud Pak for Data

Use standard SQL to query your data on Hadoop or in object stores with Db2 Big SQL.

Db2 Big SQL is an advanced query service that makes it easy to analyze data in object stores or Hadoop by using ANSI SQL that is optimized for advanced analytics in big data environments. Db2 and powerful open source technologies drive machine learning, interactive, ad hoc, and batch analytics use cases on open source file formats that are stored on Hadoop and in object stores.

The Db2 Big SQL service offers the following features:
Quick access to data
Query the data in object stores or Hadoop without having to set up and manage any servers or data warehouses. Querying is simple: just point the service at your data where it resides.
The power of Db2
The enterprise-grade Db2 SQL engine is optimized for a variety of open source data formats, including ORC, Parquet, Avro, and CSV.

You can use Db2 Big SQL for ad hoc queries or complex queries, such as large joins and window functions.

Fast and scalable
With its advanced SQL processing capabilities, Db2 Big SQL returns results quickly. In addition, the service can balance requests when multiple users run queries simultaneously.

In addition, you can easily scale the compute resources allocated to Db2 Big SQL based on your workloads.

Simplicity
With Db2 Big SQL, you don't need to move data or rewrite applications. If the data is already in big data stores, simply point the service to the data. You can reuse your existing applications and tools.

With data sizes ranging from gigabytes to petabytes, business analysts or data scientists run interactive queries to explore and understand data before building models or charts. With its robust scalability and performance, Db2 Big SQL empowers users and applications to unlock insights from data with the analytics tools of your choice, while achieving high concurrency for business intelligence workloads by running complex queries more efficiently.
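The SQL that Db2 Big SQL runs is standard ANSI SQL. The local sketch below uses Python's built-in sqlite3 module purely to illustrate the join-plus-aggregation style of query described above; it does not connect to Db2 Big SQL, and the table names and data are invented for the example.

```python
# Local illustration of an ANSI-style join + aggregation query.
# Uses an in-memory sqlite3 database; against Db2 Big SQL, the same
# SQL would run through a Db2 driver over Hadoop or object-store data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store_id INTEGER, amount REAL);
    CREATE TABLE stores (store_id INTEGER, region TEXT);
    INSERT INTO sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
    INSERT INTO stores VALUES (1, 'East'), (2, 'West');
""")

# Total sales per region: a join followed by an aggregation.
rows = conn.execute("""
    SELECT st.region, SUM(s.amount) AS total
    FROM sales s
    JOIN stores st ON s.store_id = st.store_id
    GROUP BY st.region
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('East', 150.0), ('West', 75.0)]
```

Because the query is portable ANSI SQL, existing analytics tools and applications can issue it unchanged, which is the "no rewriting applications" point above.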

Learn more
Data source Db2 Data Gate Included with Cloud Pak for Data Extract, load, and synchronize mission-critical Db2 for z/OS data for high volume transactional or analytic applications.
The service propagates your Db2 for z/OS data to a Db2 Warehouse or Db2 database on Cloud Pak for Data. Through its high-throughput, low-latency synchronization technology, the service provides:
  • Efficient access to your enterprise data for high volume transactional or analytic workloads.
  • Near real-time data synchronization without degrading the performance of your core business transaction engine.
  • Reduced cost and complexity for developing and operating applications that access data in the cloud.
  • Accelerated journey to the cloud and AI.
Learn more
Data source Virtual Data Pipeline Separately priced Access all the data you need for analytics and application testing without impacting production databases.
Your production databases are critical for running your business, so you don’t want to overload them with too many requests. At the same time, your users need access to that data to drive business results. With IBM InfoSphere Virtual Data Pipeline, your users can instantly provision virtual database copies that they can use to work with near real-time data for:
  • Data analytics
  • Application testing
  • AI model training and testing
  • Data virtualization

Each virtual database copy can be refreshed to any point in time in a matter of minutes and can be masked to protect sensitive data. In addition, virtual database copies use almost no storage, so you save on storage costs.

Give your users access to production data without impacting priority workloads or compromising data security and privacy. Get started with the Virtual Data Pipeline service to accelerate your analytics and modernize your applications.

Developer tools Anaconda Repository with IBM Cloud Pak® for Data Separately priced Control and administer the software packages that data scientists can use in Jupyter notebooks and JupyterLab in Watson Studio analytics projects.

Data scientists in analytics projects can create custom environment definitions that include the conda channels and packages from the repository and then use those environments to run Jupyter notebooks and scripts.

With the Anaconda Repository with IBM Cloud Pak for Data service, you can access more than 7,500 open-source packages (Conda-Forge, CRAN, PyPI) from your central enterprise repository and add your own proprietary packages. Get Conda package updates in real time, as they are released.

Block, exclude, and include packages according to your enterprise standards. Control which packages your team can download and who can access them. Keep vulnerabilities and unreliable software out of your data science and machine learning pipeline and manage dependent packages.

Industry accelerators

Industry accelerators are available from the Cloud Pak for Data community. Each industry accelerator includes a set of artifacts that help you address common business issues. The following accelerators were recently released for Cloud Pak for Data:

What's new What does it mean for me?
Demand Planning Manage thermal systems to produce accurate energy volumes based on anticipated demand and energy generation.

For details, see Demand planning on the Cloud Pak for Data Community.

Manufacturing Analytics with Weather (using SPSS and Cognos) Use machine-learning models and The Weather Company data to understand the impact that weather has on failure rate. Identify actions that you can take to save time and money.

For details, see Manufacturing Analytics with Weather on the Cloud Pak for Data Community.

Retail Predictive Analytics with Weather (using SPSS and Cognos) Use machine-learning models and The Weather Company data to help retail inventory managers, marketers, and sales planners quickly determine the optimal combination of store, product, and weather conditions to maximize revenue uplift, decide what to keep in inventory, target marketing offers, and provide a future financial outlook.

For details, see Retail Predictive Analytics with Weather (using SPSS and Cognos) on the Cloud Pak for Data Community.

Sales Prediction using The Weather Company Data Use machine-learning models and The Weather Company data to predict how weather conditions impact business performance, such as prospective sales.

For details, see Sales Prediction using The Weather Company Data on the Cloud Pak for Data Community.

Telco Churn Predict a given customer's propensity to cancel their membership or subscription and recommend promotions and offers that may help retain the customer.

For details, see Telco churn on the Cloud Pak for Data Community.

Utilities Customer Attrition Prediction Discover why your customers are leaving.

For details, see Utilities Customer Attrition Prediction on the Cloud Pak for Data Community.

Utilities Customer Micro Segmentation Divide a company's customers into small groups based on their lifestyle and engagement behaviors.

For details, see Utilities Customer Micro Segmentation on the Cloud Pak for Data Community.

Utilities Demand Response Program Propensity Identify which customers should be targeted for enrollment in the Demand Response Program.

For details, see Utilities Demand Response Program Propensity on the Cloud Pak for Data Community.

Utilities Payment Risk Prediction Identify which customers are most likely to miss their payment this billing cycle.

For details, see Utilities Payment Risk Prediction on the Cloud Pak for Data Community.

Offering packages

Cloud Pak for Data Version 3.0.1 introduces new offering packages, each of which includes the Cloud Pak for Data entitlements that are required to run the service. The following packages are available in this release:

What's new What does it mean for me?
IBM Cloud Pak for Data Planning Analytics The Planning Analytics service is included in the IBM Cloud Pak for Data Planning Analytics bundle.

For more information, see the description of the new Planning Analytics service.

IBM Cloud Pak for Data Virtual Data Pipeline The Virtual Data Pipeline service is included in the IBM Cloud Pak for Data Virtual Data Pipeline bundle.

For more information, see the description of the new Virtual Data Pipeline service.

Installation enhancements

What's new What does it mean for me?
Support for Red Hat OpenShift Version 4.3 Cloud Pak for Data Version 3.0.1 can run on either:
  • Red Hat OpenShift Container Platform Version 3.11
  • Red Hat OpenShift Container Platform Version 4.3

For more information about Red Hat OpenShift Container Platform, see System requirements for IBM Cloud Pak for Data.

If you are currently running Cloud Pak for Data Version 2.5 on Red Hat OpenShift Container Platform Version 3.11 and want to move Cloud Pak for Data Version 3.0.1 to a supported Version 4.x release, see Migrating Cloud Pak for Data data from Red Hat OpenShift Version 3.11 to Version 4.5.

Support for IBM POWER hardware You can install the Cloud Pak for Data control plane and some services on IBM POWER hardware.

For a list of the services that you can install on IBM POWER hardware, see hardware requirements in the System requirements for services topic.

Upgrade You can run the cpd command-line interface to upgrade the Cloud Pak for Data control plane and many of the services that support the cpd command-line interface.

Deprecated features

What's changed What does it mean for me?
Monthly virtual core usage and targets You can no longer use the Manage platform page in the web client to track the number of virtual cores that you use each month. This feature was deprecated because it did not give an accurate count of the number of virtual cores used each month.

Because this feature was deprecated, the related feature that allowed you to set your target usage was also deprecated.

Object Storage Open Stack Swift (Infrastructure) connections
Impacted services:
  • Watson Studio
  • Watson Knowledge Catalog

You can no longer create connections to Object Storage Open Stack Swift (Infrastructure) from Watson Studio or Watson Knowledge Catalog. This type of connection is deprecated in Cloud Pak for Data Version 3.0.1.

If your project contains a connection to Object Storage Open Stack Swift (Infrastructure), the connection will no longer work.

Python and SPSS operators
Impacted services:
  • Streams Flows

The Python and SPSS operators are no longer supported in Streams Flows. The WML Deployment operator replaces both of these operators.

For details on fixing flows that contain these operators, see Troubleshooting a streams flow.

Previous releases

Looking for information about what we've done in previous releases? See the following topics in IBM Knowledge Center: