What's new in IBM Cloud Pak for Data?

See what new features and improvements are available in the latest release of IBM® Cloud Pak for Data.

What's new in Version 2.5

IBM Cloud Pak for Data Version 2.5 introduces some major changes. Previously, when you installed Cloud Pak for Data, several services were installed by default. However, starting with Cloud Pak for Data Version 2.5, you can choose the services that you want to run on Cloud Pak for Data. This means you are running only the services that are important for your line of business. In addition, starting with Version 2.5, Cloud Pak for Data can be deployed only on Red Hat OpenShift.

You might notice that the term add-ons has been replaced by the term services. The list of services in the catalog has grown, and the pricing of some services has changed. For example, services such as Watson™ OpenScale and Regulatory Accelerator are now included in your purchase of Cloud Pak for Data.

Platform enhancements

The following table lists the new features that were introduced in Cloud Pak for Data Version 2.5:

What's new What does it mean for me?
More modular
Don't waste precious resources running services that you don't need. Install only the services that support your line of business.

Previously, when you installed Cloud Pak for Data, the following features were also installed: data governance, data transformation, data science, and analytics dashboards. However, starting with Cloud Pak for Data Version 2.5, you can choose the services that you want to run on Cloud Pak for Data.

For information about the platform, see Architecture for IBM Cloud Pak for Data.

For a list of the services that you can install on Cloud Pak for Data, see Services in the catalog.

New insights on your home page
Keep an eye on what's important.
DataStage® Edition users
When you install the DataStage Edition service, the Transformation job status card is added to your home page. This card lets you quickly determine whether things are running smoothly or whether you need to intervene with failing data transformation jobs to get business critical processes back on track.

If you bookmark your most important transformation jobs, you can optionally set the card to display your bookmarked jobs.

Customizable key performance indicator cards
Turn your Cloud Pak for Data home page into a one-stop shop for your users. Create analytics dashboards that are customized to specific types of users in your organization, from data scientists who want to monitor the health of their models to executives who need to make data-driven decisions.

See Custom KPI cards.

Integrated monitoring, metering, and serviceability
Manage the Cloud Pak for Data platform
A Cloud Pak for Data administrator can see the resource usage for the platform and for installed services. The redesigned interface provides a clear picture of your current resource usage. As an administrator, you can:
  • See which services are deployed
  • Quickly determine how much CPU and memory each service is currently using compared to resources that the service has reserved
  • Track the number of virtual cores you are using each month
  • Specify your target virtual core usage to keep your costs under control
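
The comparisons in the list above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the service names, the figures, and the function names are hypothetical and are not part of any Cloud Pak for Data API.

```python
from dataclasses import dataclass


@dataclass
class ServiceUsage:
    name: str
    cpu_used: float      # vCPU currently in use (hypothetical figure)
    cpu_reserved: float  # vCPU reserved by the service (hypothetical figure)


def utilization(usage: ServiceUsage) -> float:
    """Return used CPU as a fraction of reserved CPU."""
    if usage.cpu_reserved == 0:
        return 0.0
    return usage.cpu_used / usage.cpu_reserved


def over_target(monthly_vcores: float, target_vcores: float) -> bool:
    """Return True when monthly virtual core usage exceeds the target."""
    return monthly_vcores > target_vcores


services = [
    ServiceUsage("service-a", cpu_used=1.5, cpu_reserved=4.0),
    ServiceUsage("service-b", cpu_used=3.0, cpu_reserved=2.0),
]
for svc in services:
    print(f"{svc.name}: {utilization(svc):.0%} of reserved CPU")
```

A value above 100% means that the service is using more CPU than it reserved, which is the kind of gap the interface is meant to make visible.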

From the interface, you can also:

  • Download usage reports
  • Diagnose problems
  • Delete resources
  • Restart pods

See Managing deployments.

Resolve problems more efficiently
If you encounter a problem that you can't resolve yourself, use the Gather diagnostics interface to collect log files. You can collect the logs for all of the services that are running on the platform or for specific services. Whether the problem just started or has been ongoing for the past month, you can gather the correct set of logs that you can download and send to IBM Support when you open a case.

See Gathering diagnostic information.

Published and extensible APIs
An open and extensible platform
Cloud Pak for Data and several of the services that run on the platform provide open and extensible APIs that enable you to collect, organize, and analyze your data and to infuse your applications with AI. With this collection of REST APIs, you can:
  • Automate and govern your AI lifecycle in business applications so that you can operationalize AI
  • Implement data-driven processes and operations that feed your AI and ML applications
Platform APIs
The platform APIs are provided by the Cloud Pak for Data control plane. These APIs enable you to connect to your Cloud Pak for Data deployment, manage the users who have access to the deployment, and manage user roles.
Service APIs

Depending on the services that you install, you can also use the following APIs:

  • Collect and Organize APIs - When you install Watson Knowledge Catalog, you can access APIs that enable you to manage connections in analytics projects, search for data, and govern your data.
  • Analyze APIs - When you install Watson Studio, you can access APIs that enable you to build, train, and deploy analytical models and neural networks.
  • Infuse APIs - When you install Watson OpenScale, you can access APIs that help you track and measure the outcomes of your AI models to ensure that they remain fair, explainable, and compliant.

In addition, some separately priced services, such as Watson Assistant, Watson Discovery, and Watson API Kit, provide a rich set of APIs.

See Available APIs.

Developer code patterns
An IBM Developer learning path is available to help you get more out of the Cloud Pak for Data APIs. As a developer, you can use the patterns, tutorials, and code to learn how to complete tasks such as:
  • Virtualizing Db2® Warehouse data with Data Virtualization (Tutorial)
  • Visualizing data with Data Refinery (Tutorial)
  • Using notebooks to analyze data and build and deploy models with Watson Machine Learning (Pattern)
  • Monitoring models with Watson OpenScale (Pattern)

For details, see https://developer.ibm.com/series/cloud-pak-for-data-learning-path/

Service enhancements

The following table lists the new features that are introduced for existing services (add-ons) in Cloud Pak for Data Version 2.5:

What's new What does it mean for me?
Cognos® Analytics
Sophisticated time series forecasting
When you create a visualization, select the forecasting option and let Cognos Analytics do the heavy lifting. Cognos Analytics automatically evaluates multiple algorithms to select the optimal model based on your data and then displays the resulting forecast. You can examine the input parameters and model to ensure that you have a clear understanding of how Cognos Analytics generated the statistically accurate forecast. (Available in select visualizations in Dashboard and Explore.)
Additional visualizations
  • Add custom visualizations to the library of visualizations that are available in Dashboard and Reporting. Create your own visualizations or use existing visualizations that are provided by open source libraries, such as D3 and Highcharts. (Available in Dashboard and Reporting.)
  • Understand the cumulative effect of sequentially introduced positive or negative values with a waterfall visualization. (Available in Dashboard.)
  • Track and examine your results compared to your targets with the KPI widget. (Available in Dashboard.)
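
To make the waterfall idea concrete, here is a minimal sketch of the running totals that such a chart plots. The values are invented for illustration, and the code is not how Cognos Analytics renders the visualization.

```python
from itertools import accumulate

# Hypothetical sequence: a starting value followed by positive and negative changes.
changes = [100, 30, -45, 20, -10]

# Each running total is the level that a waterfall chart reaches after one change.
running_totals = list(accumulate(changes))
print(running_totals)  # [100, 130, 85, 105, 95]
```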
Data Virtualization
Improve query performance with data caches
Your users want their data as quickly as possible. If your queries take a long time to run but your data doesn't change constantly, you can cache the results of queries to make your queries more performant.

See Managing data caches and queries (Data Virtualization).
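
The general idea behind result caching can be sketched as follows. This is a generic time-to-live cache written for illustration, not the Data Virtualization implementation. The class name `QueryCache`, the `slow_query` helper, and the sample SQL are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Keep query results for a fixed time-to-live (TTL), in seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, sql: str, run_query: Callable[[str], Any]) -> Any:
        """Return a fresh cached result, or run the query and cache it."""
        now = self._clock()
        entry = self._entries.get(sql)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: the slow query is skipped
        result = run_query(sql)
        self._entries[sql] = (now, result)
        return result


executed = []


def slow_query(sql: str):
    executed.append(sql)  # record each real execution
    return [("east", 15)]


cache = QueryCache(ttl_seconds=60)
cache.get("SELECT region, total FROM sales", slow_query)
cache.get("SELECT region, total FROM sales", slow_query)
print(len(executed))  # 1: the second call was served from the cache
```

The trade-off is the one described above: a cache helps most when queries are slow and the underlying data changes rarely, because cached results can be stale until the entry expires.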

Discover remote data sources automatically
Use automated discovery to simplify the process of connecting your Data Virtualization service to your data sources, so that you can spend more time creating virtual views and tables.

See Discovering remote data sources (Data Virtualization).

Deployment on multiple worker nodes
When you provision the Data Virtualization service, you can specify the number of nodes on which you want to run the service.

See Provisioning the service (Data Virtualization).

Backup and restore
It is a best practice to back up the Data Virtualization service. To this end, the Data Virtualization service provides a Kubernetes job called dv-backup-job.yaml that automatically performs all the required tasks to back up the service. To restore the service from a backup, use the dv-restore-versioned-job.yaml Kubernetes job.

See Backing up and restoring Data Virtualization.

Support for additional data sources
You can connect to the following data sources in Data Virtualization:
  • Snowflake
  • Z data sources support through Data Virtualization Manager:
    • VSAM
    • IMS
    • CICS
    • Adabas
Data Refinery
Starting in Cloud Pak for Data Version 2.5, this service is no longer separately installable. The Data Refinery service is installed when you install either the Watson Knowledge Catalog service or the Watson Studio service.

The latest version of Data Refinery includes features such as:

Over 20 new GUI operations
Data Refinery has new operations to shape data, including Aggregate, Conditional replace, Join, Remove stop words, and Tokenize.

See GUI operations in Data Refinery.

Automatic detection and conversion of data types
Now, the first step in each Data Refinery flow uses the Convert column type GUI operation to automatically detect and convert data types to inferred data types (for example, to Integer, Date, or Boolean) as needed. This enhancement saves you time, particularly when your data has many columns. It is easy to undo the automatic conversion or to edit the operation for selected columns.
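
As a rough illustration of type inference, the sketch below guesses a column type from string values. It is not the Convert column type operation itself. The function name, the accepted Boolean spellings, and the single date format are assumptions made for the example.

```python
from datetime import datetime


def infer_type(values):
    """Guess a column type from string values: Boolean, Integer, Date, or String."""

    def all_parse(parse):
        # A type fits only if every value in the column parses cleanly.
        try:
            for value in values:
                parse(value)
            return True
        except ValueError:
            return False

    def parse_bool(value):
        if value.lower() not in ("true", "false"):
            raise ValueError(value)

    if not values:
        return "String"
    if all_parse(parse_bool):
        return "Boolean"
    if all_parse(int):
        return "Integer"
    if all_parse(lambda value: datetime.strptime(value, "%Y-%m-%d")):
        return "Date"
    return "String"


print(infer_type(["12", "7", "-3"]))             # Integer
print(infer_type(["2019-11-01", "2019-12-15"]))  # Date
```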
New and improved Visualization charts
Data Refinery introduces new visualization charts with a new user interface that gives you more views of your data to better help you explore your data as you refine it. No syntax required.

To access the charts in Data Refinery, click the Visualizations tab, then select the columns to visualize.

See Visualizing your data in Data Refinery.

Improved flow execution with jobs
Now you can use the jobs user interface to run or schedule a Data Refinery flow. You can view all the jobs in a project and their run details. You can also create multiple jobs for the same asset, for example a job with different runtimes or different schedules. In addition, you can now create a job from Data Refinery.
More runtime choices
You can run Data Refinery flows with three different kinds of runtimes: Default Data Refinery XS runtimes for small data sets; Spark runtimes for larger data sets; and Hadoop cluster runtimes for data on Hadoop Distributed File System (HDFS) or data stored in tables in a Hive warehouse.
DataStage Edition
Starting in Cloud Pak for Data Version 2.5, this service is separately priced and is available only in the IBM Cloud Pak for Data DataStage package.

The latest version of DataStage Edition includes features such as:

New connectors and operators
You can use the following connectors and operators:
  • Classic Federation
  • Distributed Transactions
  • Informix Load
  • ISD Input
  • ISD Output
  • Pivot
  • Sybase 12 Load
  • Sybase IQ

See Supported connectors.

Multi-cloud data integration
If your data exists in multiple cloud environments, you can integrate the data with better performance by running the transformation job in the cloud where the data exists.

You can create and test a job on-premises, then run it in a cloud environment, such as an Azure instance, making use of the on-cloud Azure data lake. The job parameters and their values are passed to a remote instance of IBM® InfoSphere® Information Server by way of a Kafka message.

See Multi-cloud data integration.

Local containers to simplify complex data integration jobs
A local container enables you to visually simplify complex jobs by grouping logically related steps together. For example, you might use a local container to describe several stages in a process, such as a Sort stage, followed by a Filter stage, followed by a Remove Duplicates stage.

See Running a data transformation job.
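
The grouping idea behind a local container can be sketched in Python as function composition. This is a conceptual analogy only, not DataStage code. The stage functions, the `local_container` helper, and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Stage = Callable[[List[Row]], List[Row]]


def sort_stage(rows: List[Row]) -> List[Row]:
    return sorted(rows, key=lambda r: r["id"])


def filter_stage(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r["amount"] > 0]


def remove_duplicates_stage(rows: List[Row]) -> List[Row]:
    seen, unique = set(), []
    for r in rows:
        if r["id"] not in seen:  # keep the first row for each id
            seen.add(r["id"])
            unique.append(r)
    return unique


def local_container(*stages: Stage) -> Stage:
    """Group several stages so that they can be used as one step."""
    def run(rows: List[Row]) -> List[Row]:
        for stage in stages:
            rows = stage(rows)
        return rows
    return run


cleanup = local_container(sort_stage, filter_stage, remove_duplicates_stage)
rows = [{"id": 2, "amount": 5}, {"id": 1, "amount": 0}, {"id": 2, "amount": 7}]
print(cleanup(rows))  # [{'id': 2, 'amount': 5}]
```

The job that uses `cleanup` sees a single step, while the three stages inside it stay individually readable.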

Decision Optimization
Import for models with multiple files
You are no longer limited to models that contain a single file. For example, if your model contains multiple OPL model files, you can now import the model in the Run model view.
Ability to create and edit OPL models
Previously, you could only import an OPL model. Now you have the option to edit OPL model files or create them from scratch in the model builder.
Support for connected data
In previous releases, you could use only CSV files when you prepared data. Now you can also prepare data from connected data sources, making it easier to work with the data you need.
SPSS® Modeler
More ways to analyze your data
SPSS Modeler includes new nodes that you can use to analyze your data:
  • Text Analytics nodes
  • Charts node
  • Gaussian Mixture node
  • GLMM node
  • KDE node
  • Sim Gen node
  • Reproject node
  • Space-Time-Boxes node
  • Spatio-Temporal Prediction (STP) node

See Nodes palette.

Better performance when previewing nodes
Previously, when you right-clicked a node and selected Preview, multiple tabs opened to enable you to examine the data in your flow in multiple ways. Now, when you select Preview you get a snapshot of your data that loads more quickly. If you need to work with all of the features, such as the Visualizations tab, use the new Profile option.
Better performance during data preparation and data mining
You can use SQL pushback to run data preparation and data mining operations in the database where the data resides.

See SQL optimization.
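
The benefit of pushing work to the database can be shown with a small, self-contained sketch that uses Python's built-in `sqlite3` module. It is not SPSS Modeler code, and the table and values are invented. It only contrasts aggregating in the client with letting the database do the work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 20), ("east", 5)],
)

# Without pushback: fetch every row, then filter and aggregate in the client.
client_total = sum(
    amount
    for region, amount in conn.execute("SELECT region, amount FROM sales")
    if region == "east"
)

# With pushback: the database filters and aggregates, and returns one row.
(pushed_total,) = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("east",)
).fetchone()

print(client_total, pushed_total)  # 15 15
```

Both approaches give the same answer, but the second one moves far less data out of the database, which is where the performance gain comes from on large tables.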

Export node conversion
Previously, when you imported a stream (.str) that was created in SPSS Modeler Subscription or SPSS Modeler client, only your import nodes were converted. Now your export nodes are also converted, making it easier to work with your existing stream in SPSS Modeler.

See Migrating import and export nodes.

Customize operations in a flow
If you need to customize the operations in a flow or a SuperNode, you can set the parameters by providing a script or by entering the parameters as properties. For example, you can use this feature to specify that terminal nodes in a flow run in a specific order. If you specify parameters for a SuperNode, the parameters are visible only to the nodes that are encapsulated within that SuperNode.
Streams
Simplified installation
The Streams service no longer has a dependency on Apache Zookeeper, so there are fewer steps needed to install the Streams service.
Simplified provisioning
When you provision a Streams service instance, you can choose to have the persistent volume claims dynamically provisioned. You simply specify the type of storage for the persistent volumes. (This option requires that you enabled dynamic provisioning on the cluster.)

Learn more: Prerequisites

Streams flows
Streams Flows is an easy-to-use tool to build real-time streaming applications in a drag-and-drop experience. In a streams flow, you can access and analyze massive amounts of changing data as it is created. Regardless of whether the data is structured or unstructured, you can leverage data at scale to drive real-time analytics for up-to-the-minute business decisions.

Learn more: Developing apps with Streams Flows

Enhanced Streams problem determination graph

The problem determination graph now shows connected Streams jobs and provides warnings for unconnected import operators. It also adds watches on threaded ports, additional watch commands (all, branch, upstream, downstream), and additional job commands (delete, restart all processing elements).

Watson Assistant
Webhooks
With the new webhooks feature, you can easily configure your dialog nodes to trigger calls to an external API that you can set up to personalize answers, post transactions to internal systems, or check other APIs for answers to questions.

Learn more: It's Now Way Easier to Personalize Your Assistant

Dialog skill versions
Now it's easier to promote from development to production. With dialog skill versions, you can take a snapshot of your skill and simply have your assistant point to the new version.
Support for call center in a box
Now you can use Watson Assistant in your all-in-one call center solution.
Watson Discovery
Advanced capability for mining content
Now you can visually slice and dice large corpora of data to uncover new insights and unseen correlations and to improve business processes.
Domain knowledge for governing documents
Save your knowledge workers thousands of hours by using the out-of-the-box domain knowledge on governing documents like contracts, invoices, and purchase orders. For example, find all contracts in your system that contain a certain supplier in minutes.
Reduced development time
New components are now available for the most common user interfaces. After you ingest your data to Watson Discovery, you'll now have the pieces to have a productive application up and running in significantly less time, with less (or maybe even no) extra development.
Table extraction from unstructured documents
Now you can extract tables from unstructured documents. You can also use natural language to query those documents to find the tables.
Dynamic faceting
Get the exact table you need from a large document. Dynamic faceting automatically creates contextual facets based on your query. As you search for broad topics, you get the keys to narrow down your search by topic, without additional training.
Rapid model dictionary creation
Now, you can rapidly create a dictionary for faceting, querying or using in a business process by providing only a handful of examples. This feature cuts significant training time, and there's no need for advanced NLP experts to get involved.
Updated interface
The completely revised interface will allow you to organize applications into projects, query across multiple collections in a project, and more.
Watson Knowledge Catalog
Watson Knowledge Catalog replaces the data governance features that were previously included in Cloud Pak for Data. All of the features of Watson Knowledge Catalog are included with your Cloud Pak for Data entitlement.

Watson Knowledge Catalog provides an end-to-end solution for maintaining business-ready data that your users can easily consume in analytics projects.

  • Define business terms to build a consistent language across your business
  • Create custom data classes to make automatic data profiling more accurate
  • Use classification to help your catalog users understand how data must be handled
  • Define policies and rules to control access to data based on the content of the data

The latest version of Watson Knowledge Catalog includes features such as:

Two ways to discover your data
Watson Knowledge Catalog supports the following types of discovery:
  • Automated discovery, which enables you to run an in-depth analysis on all of the assets in a data source and automatically import the assets to the catalog. You can choose whether you also want to import the analysis results. (This is the discovery you're used to from previous releases of Cloud Pak for Data.)
  • Quick scan, which enables you to quickly get a sense of your data. This scan runs on a sample of the data and the assets are imported only if you approve the results.

See Discovering assets (Watson Knowledge Catalog).

Enhanced search
Don't remember where you saw it? You can search everything you have access to. Or, you can reduce erroneous results by scoping your search to a specific project or catalog. You can also narrow your search further by specifying the type of governance artifact that you want to search for.

See Searching for assets across projects and catalogs.

Governance workflow
Ensure that your governance artifacts are thoroughly reviewed before they are published by configuring a governance workflow. A Watson Knowledge Catalog administrator can optionally configure a governance workflow that specifies the level of oversight that your organization needs to ensure that the process is responsible, accountable, consulted, and informed.

See Managing workflow definitions.

Reference data sets
Make it easier to centrally manage standards for consistency and quality with reference data sets, which define the list of permissible values within a given context. You can also include a reference data set in the definition of a data class as part of the data matching criteria.

See Reference data sets.
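
A reference data set is, in essence, a list of permissible values to check against. The sketch below illustrates that idea and how such a list could feed a data class match. The country codes, function names, and the 0.8 threshold are assumptions for the example, not Watson Knowledge Catalog behavior.

```python
# Hypothetical reference data set: permissible country codes.
COUNTRY_CODES = frozenset({"US", "DE", "JP"})


def invalid_values(column, reference_set):
    """Return the values in a column that are not in the reference data set."""
    return [value for value in column if value not in reference_set]


def matches_data_class(column, reference_set, threshold=0.8):
    """Treat a column as matching when enough of its values are permissible."""
    if not column:
        return False
    valid = sum(1 for value in column if value in reference_set)
    return valid / len(column) >= threshold


column = ["US", "DE", "XX", "JP"]
print(invalid_values(column, COUNTRY_CODES))  # ['XX']
```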

Watson Knowledge Studio
Dictionary suggestions
Create dictionaries faster in Watson Knowledge Studio. Even if you have only a few terms specified, Watson Knowledge Studio can analyze your uploaded documents and suggest other terms in the same domain. Use these dictionaries as machine learning preannotators or within rules to create effective and useful models.
Advanced rules editor
You can create rules-based text extractors with the Advanced Rules editor. The editor provides a graphical interface that creates Annotation Query Language rules so you can build and manage complex rule sets.
Watson Machine Learning
Watson Machine Learning is enhanced with new tools and capabilities. In addition to building machine learning models using the Python client, you can now:
  • Automate model building using the AutoAI Experiment Builder.
  • Model and solve complex problems using Decision Optimization Builder.
  • Use the Deep Learning Experiment Builder to run hundreds of training runs for neural network models.
  • Create online, batch, and virtual deployments and manage them from analytic deployment spaces.
Watson Natural Language Understanding
Text enrichment features
New text enrichment features include built-in categories, entity extraction, keyword identification, and sentiment classification. With these features you can build a wide range of applications including content recommendation, question answering, data mining, and advertisement optimization systems.
Language support
Build cognitive apps natively in these supported languages:
  • Entities, Sentiment and Keywords: English, Spanish, Italian, Korean, Japanese, Portuguese, French, Dutch, Chinese Simplified, German, Arabic
  • Categories: English, German
Watson OpenScale
Watson OpenScale is now included with Cloud Pak for Data.

New features for Cloud Pak for Data Version 2.5 include:

Drift detection
(This feature is now generally available.) When the accuracy of your model or the consistency of your data drops, it can have a negative impact on the business outcomes that are associated with the model. Use the drift detection feature to determine when it's time to retrain your model. For more information, see Watson OpenScale.
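
The decision that drift detection supports can be illustrated with a deliberately simple check: compare current accuracy with a baseline and flag a drop beyond a tolerance. This is a simplified stand-in, not the technique that Watson OpenScale uses. The function names and the 0.05 tolerance are assumptions.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that match the labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def has_drifted(baseline_accuracy, current_accuracy, tolerance=0.05):
    """Flag the model for retraining when accuracy drops by more than the tolerance."""
    return baseline_accuracy - current_accuracy > tolerance


current = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(current)                    # 0.75
print(has_drifted(0.9, current))  # True
```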
Key performance indicator monitoring
(This feature is now in beta.) You can use an application monitor to keep an eye on your key performance indicators (KPIs) and to understand the impact that model metrics, such as model drift, have on the KPIs in your application. Run-on-demand options for evaluation and correlation enable better business impact measurement. For more information, see Watson OpenScale.
Updated Python SDK
A new version of the Watson OpenScale Python SDK (ibm_ai_openscale-2.1.1.16) is now available. The newest version of the SDK includes bug fixes and provides support for:
  • Drift monitoring (including status messaging)
  • Getting a count of payload records
Additional Machine Learning Frameworks
Watson OpenScale now supports the following Watson Machine Learning framework enhancements. For more information, see Watson OpenScale.
  • XGBoost frameworks: XGBoost is an open-source library that provides scalable, portable and distributed gradient boosting for decision tree algorithms.
  • AutoAI experiments: AutoAI automatically prepares data, applies algorithms (or estimators), and builds the model pipelines that are best suited for your data and use case.

New services

Category Service Pricing What does it mean for me?
Category: AI
Service: Watson Studio
Pricing: Included with Cloud Pak for Data

Watson Studio replaces the data science features that were installed by default when you installed previous versions of Cloud Pak for Data.

Unleash the power of your data. Build custom models and infuse your business with AI and machine learning.

Watson Studio is a collaborative environment for data scientists, developers, and domain experts to prepare, analyze, and model data. Whether you’re an expert or a novice, you can find a tool to suit your needs from the wide range of open source, graphical canvas, and automatic builder tools.

With Watson Studio you can:

  • Prepare and visualize data from many connected data sources
  • Build models to make predictions or to classify tabular, textual, or image data
  • Run workloads in configurable and elastic runtime environments or on remote clusters

Watson Studio seamlessly integrates with the Watson Knowledge Catalog service, which acts as a source of governed, curated assets, and with the Watson Machine Learning service, which enables you to deploy and evaluate models. You can also add other services that provide additional tools in Watson Studio, such as Analytics Dashboards, SPSS Modeler, and Decision Optimization.

New features for Cloud Pak for Data Version 2.5 include:

AutoAI tool for building and deploying machine learning models without coding
When you install the Watson Machine Learning service with Watson Studio, you can use the AutoAI tool to build and deploy machine learning models without any coding. The AutoAI tool:
  • Automatically analyzes data and generates candidate model pipelines customized for your predictive modeling problem
  • Provides a leaderboard that shows the automatically generated model pipelines ranked according to the problem optimization objective
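
The leaderboard described above amounts to ranking candidate pipelines by the chosen optimization metric. The following sketch shows that ranking step only. The pipeline names and scores are invented, and the code does not call AutoAI.

```python
from typing import Dict, List


def leaderboard(pipelines: List[Dict], metric: str,
                higher_is_better: bool = True) -> List[Dict]:
    """Return the pipelines ordered from best to worst on the given metric."""
    return sorted(pipelines, key=lambda p: p[metric], reverse=higher_is_better)


# Hypothetical candidate pipelines and scores.
candidates = [
    {"name": "pipeline-1", "accuracy": 0.81},
    {"name": "pipeline-2", "accuracy": 0.88},
    {"name": "pipeline-3", "accuracy": 0.79},
]
for rank, pipeline in enumerate(leaderboard(candidates, "accuracy"), start=1):
    print(rank, pipeline["name"], pipeline["accuracy"])
```

For a metric where lower values are better, such as an error rate, pass `higher_is_better=False`.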
JupyterLab
More than just notebooks, JupyterLab enables you to work with your Jupyter notebooks, documents, files, text editors, and custom components in an IDE-like development framework that can be integrated with GitHub.
Category: Analytics
Service: Analytics Engine Powered by Apache Spark
Pricing: Included with Cloud Pak for Data

Automatically spin up lightweight, dedicated Apache Spark clusters to run a wide range of workloads.

You can use Analytics Engine Powered by Apache Spark to run a variety of workloads on your Cloud Pak for Data cluster:

  • Watson Studio notebooks that call Apache Spark APIs
  • Spark applications that run Spark SQL
  • Data transformation jobs
  • Data science jobs
  • Machine learning jobs

Each time you submit a job, a dedicated Spark cluster is created for the job. You can specify the size of the Spark driver, the size of the executor, and the number of executors for the job. This enables you to achieve predictable and consistent performance.
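
Because each job gets its own cluster, its footprint follows directly from the three settings named above. The sketch below works through that arithmetic. The class and field names are hypothetical and do not reflect the service's job submission format.

```python
from dataclasses import dataclass


@dataclass
class SparkJobSize:
    driver_cores: int
    driver_memory_gb: int
    executor_cores: int
    executor_memory_gb: int
    num_executors: int

    def total_cores(self) -> int:
        """Cores for the driver plus cores for all executors."""
        return self.driver_cores + self.executor_cores * self.num_executors

    def total_memory_gb(self) -> int:
        """Memory for the driver plus memory for all executors."""
        return self.driver_memory_gb + self.executor_memory_gb * self.num_executors


job = SparkJobSize(driver_cores=1, driver_memory_gb=4,
                   executor_cores=2, executor_memory_gb=8, num_executors=3)
print(job.total_cores(), job.total_memory_gb())  # 7 28
```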

When a job completes, the cluster is automatically cleaned up so that the resources are available for other jobs. The service also includes interfaces that enable you to analyze the performance of your Spark applications and debug problems.

Category: Analytics
Service: Execution Engine for Apache Hadoop
Pricing: Separately priced

This service replaces the Hadoop integration software.

Explore data or build and deploy models on your Apache Hadoop cluster.

Use the Execution Engine for Apache Hadoop service to integrate the Watson Studio service with your remote Apache Hadoop cluster.

When the Execution Engine for Apache Hadoop service is installed, data scientists can:

  • Use familiar tools, such as Data Refinery, Jupyter Notebooks, and RStudio, to build models and then train and deploy models at scale
  • Leverage the distributed computing power of a Hadoop cluster
  • Analyze data in place
This service introduces the following improvements over the Hadoop integration software:
Build and train models faster and experience better integration
The Jupyter Enterprise Gateway service enables you to build and train models faster on Hadoop. You also experience better integration between Jupyter notebooks and Hadoop.
Access Hive data on secure Hadoop clusters
Use Data Refinery to visualize, profile and transform data in Hive tables.
Support for newer versions of Hadoop distributions
CDH 6.1, 6.2, and 6.3 are now supported.
Improvements in usability
Integration with systemctl is available to automatically start the service after a reboot.
Scripts are available to automate self-healing of the service.
Category: Developer tools
Service: Jupyter with Python 3.6 for GPU
Pricing: Included with Cloud Pak for Data

An optional development environment for Watson Studio that enables you to create Jupyter Notebooks that use GPU-accelerated Python 3.6 libraries.
Category: Developer tools
Service: Jupyter with R 3.6
Pricing: Included with Cloud Pak for Data

An optional development environment for Watson Studio that enables you to create Jupyter Notebooks that use R 3.6 libraries.
Category: Developer tools
Service: Open Source Management
Pricing: Included with Cloud Pak for Data

Make it easy for developers and data scientists to find and access approved open source packages.

Open source enables businesses to modernize their offerings quickly and with lower costs. But if your enterprise relies on open source software packages, you know how difficult it can be to ensure that users are working with approved packages.

With the Open Source Management service, you can manage access to open source software packages at the scale of your enterprise, so that you can optimize the benefits of open source while minimizing potential risks.

With the Open Source Management service:

  • Developers and data scientists can easily locate and access approved packages
  • Developers and data scientists can submit requests for additional packages, and other users can upvote requests
  • Project managers can review vulnerabilities to assess risk
  • Users can help each other identify useful packages with ratings and reviews

Maximize your return on your open source investment.

Category: Industry solutions
Service: Financial Crimes Insight®
Pricing: Separately priced

Simplify the process of detecting and mitigating financial crimes with AI and regulatory expertise.

Financial Crimes Insight combines AI, big data, and automation with input from regulatory experts to make it easier to detect and mitigate financial crimes.

Install the base offering, Financial Crimes Insight, to proactively detect, intercept, and prevent attempted fraud and financial crimes. Then, install one or more of the following optional packages depending on your use case:
Financial Crimes Insight for Alert Triage
Enable analysts to quickly assess alerts using Watson analytics and cognitive capabilities to determine which alerts warrant further investigation.
Financial Crimes Insight for Claims Fraud
Uncover suspicious behavior early in the insurance claims process before fraudulent claims are paid.
Financial Crimes Insight for Conduct Surveillance
Identify, profile, and prioritize employee misconduct using holistic surveillance, behavioral analysis, and cognitive insights.

Industry accelerators

Industry accelerators are available from the Cloud Pak for Data community. Each industry accelerator includes a set of artifacts that help you address common business issues. The following accelerator was released in Cloud Pak for Data Version 2.5:

What's new What does it mean for me?

Healthcare Location Services Optimization

How far will patients travel to access quality health care? Use the Healthcare Location Services Optimization accelerator to jump-start your analysis.

For details, see the Healthcare Location Services Optimization accelerator on the Cloud Pak for Data Community.

Offering packages

Cloud Pak for Data Version 2.5 introduces new offering packages, each of which includes the Cloud Pak for Data entitlements that are required to run the service. The following packages are available in this release:

What's new What does it mean for me?
IBM Cloud Pak for Data DataStage
This package provides enhanced data integration through IBM DataStage Edition, which enables users to move and transform data between operational, transactional, and analytical target systems.

Starting in Cloud Pak for Data Version 2.5, the DataStage Edition service is separately priced and is available only in the IBM Cloud Pak for Data DataStage package.

IBM Cloud Pak for Data Watson Studio Premium
This package provides a suite of tools for data scientists:
  • SPSS Modeler, which provides visual modeling
  • Decision Optimization, which enables you to make optimal business decisions by evaluating millions of possibilities
  • Execution Engine for Apache Hadoop, which enables you to explore data or build and deploy models on your Apache Hadoop cluster
IBM Cloud Pak for Data Db2
This package provides a Db2 relational database with advanced data management and analytics capabilities for enterprises that require actionable insights paired with performance, reliability, and data availability.

Installation enhancements

What's new What does it mean for me?
Simplified installation
The commands to install Cloud Pak for Data are simplified in the new cpd command. The new modular architecture enables you to install the exact services you need. Additionally, the installation uses a registry to manage the images that are deployed to your Red Hat OpenShift cluster.

Unless you are installing in an air-gapped environment, you can run the installation from a Linux or macOS workstation that is outside of the cluster.

For details, see Installing IBM Cloud Pak for Data.

Exclusive support for Red Hat OpenShift Version 3.11
Red Hat OpenShift Version 3.11 is required to use Cloud Pak for Data. IBM Cloud Private is no longer supported.

Cloud Pak for Data includes entitlement to both the Red Hat OpenShift Container Platform and Red Hat Enterprise Linux. You can download both Red Hat OpenShift and Cloud Pak for Data from IBM Passport Advantage®. Or, you can download Red Hat OpenShift directly from the Red Hat Customer Portal. See the Cloud Pak for Data readme on IBM Passport Advantage for details.

Deprecated features

What's changed What does it mean for me?
Zeppelin Notebook Server
Zeppelin Notebook Server is no longer provided as a development environment. You must use another development environment.

For a list of available development environments, see the list of Developer tools.

Accelerite ShareInsights
This service is no longer offered for Cloud Pak for Data.

For a list of available services, see Services in the catalog.

IBM Cloud Private support
If you deployed a previous version of Cloud Pak for Data on an IBM Cloud Private cluster, work with your IBM Support representative to migrate your system to Red Hat OpenShift.
Governance features
The data governance features that were previously included in Cloud Pak for Data are replaced by the Watson Knowledge Catalog service, which supports import of governance artifacts only in the CSV format.

Previous releases

Looking for information about what we've done in previous releases? See the following topics in IBM Knowledge Center: