What's new in IBM Cloud Pak for Data?

See what new features and improvements are available in the latest release of IBM® Cloud Pak for Data.

Version 4.5.3

Released: October 2022

This release of Cloud Pak for Data is primarily focused on defect and security fixes. However, this release includes new features for services such as Data Virtualization, DataStage®, Watson™ Knowledge Catalog, and Watson Machine Learning.

This release also includes:
New services
Guardium® External S-TAP® is now available on Cloud Pak for Data Version 4.5.
New data fabric tutorials
Learn how to use Cloud Pak for Data to implement a data fabric with the new data fabric tutorials. For an overview of data fabric, see the data fabric use cases.
Software Version What does it mean for me?
Cloud Pak for Data platform 4.5.3
The 4.5.3 release of the Cloud Pak for Data platform includes the following features and updates:
Disaster recovery with IBM Spectrum® Fusion
Cloud Pak for Data is now integrated with IBM Spectrum Fusion, which enables you to create online backups and to restore them to the same cluster or to a different cluster. For details, see Cloud Pak for Data online backup and restore.

Refresh 3 of the Version 4.5 platform includes various fixes.

For details, see What's new and changed in the platform.

Related documentation:
Cloud Pak for Data command-line interface (cpd-cli) 11.3.0

The 11.3.0 release of the Cloud Pak for Data command-line interface includes the following features and updates:

New and updated commands for online backups with IBM Spectrum Protect Plus

cpd-cli oadp install
The command now supports the installation of cpdbr-hooks, which enable you to back up and restore Cloud Pak for Data deployments with IBM Spectrum Fusion Data Protection.

For details, see oadp install

cpd-cli oadp uninstall
The command now supports the removal of cpdbr-hooks.

For details, see oadp uninstall

Version 11.3.0 of the Cloud Pak for Data command-line interface includes various fixes.

For details, see What's new and changed in the common core services.

Cloud Pak for Data common core services 4.5.3

Version 4.5.3 of the common core services includes various fixes.

For details, see What's new and changed in the common core services.

If you install or upgrade a service that requires the common core services, the common core services will also be installed or upgraded.

Cloud Pak for Data scheduling service 1.6.0

Version 1.6.0 of the scheduling service includes various fixes.

For details, see What's new and changed in the scheduling service.

Related documentation:
Analytics Engine Powered by Apache Spark 4.5.3

Version 4.5.3 of the Analytics Engine Powered by Apache Spark includes various fixes.

For details, see What's new and changed in Analytics Engine Powered by Apache Spark.

Related documentation:
Analytics Engine Powered by Apache Spark
Cognos® Analytics 22.3.0

Version 22.3.0 of the Cognos Analytics service includes various fixes.

Related documentation:
Cognos Analytics
Data Privacy 4.5.3

Version 4.5.3 of the Data Privacy service includes various fixes.

For details, see What's new and changed in Data Privacy.

Related documentation:
Data Privacy
Data Refinery 4.5.3

Version 4.5.3 of the Data Refinery service includes various fixes.

For details, see What's new and changed in Data Refinery.

Related documentation:
Data Refinery
DataStage 4.5.3

The 4.5.3 release of DataStage includes the following features and updates:

Stored procedures for additional connectors
You can now use stored procedures in the following connectors:
  • IBM Db2® for z/OS®
  • IBM Db2 for i
For more information, see Using stored procedures.
Connect to storage volumes as data sources in DataStage flows
You can now include data from a volume on an NFS server or a persistent volume claim in your DataStage flows. For more information, see Adding connections to analytics projects and Asset browser.
Before-job and after-job subroutines support KSH scripts and bash
You can now use before-job and after-job subroutines to run both bash and KSH scripts. For more information, see Setting up before-job and after-job subroutines.
Share files with Watson Studio Pipelines
You can now use a shared storage volume to share files between Watson Studio Pipelines and DataStage. For details, see Sharing storage volumes.
New stage
You can now use the Slowly Changing Dimension stage in your DataStage flows to store and manage current and historical data over time.

For more information, see Slowly Changing Dimension stage.
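To illustrate what keeping current and historical data side by side involves, the following minimal sketch applies one common approach (often called a Type 2 change) to an in-memory dimension table in plain Python. It is not how the DataStage stage is implemented or configured. The function name `apply_scd2`, the column names, and the sample values are all hypothetical.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, today):
    """Apply a Type 2 change: expire the current row and append a new version.

    dimension: list of dicts that carry 'valid_from', 'valid_to', 'is_current'.
    incoming:  dict with the business key and the latest attribute values.
    """
    for row in dimension:
        if row[key] == incoming[key] and row["is_current"]:
            unchanged = all(row[col] == val for col, val in incoming.items())
            if unchanged:
                return dimension  # no change detected, keep history as-is
            # Close the old version instead of overwriting it.
            row["valid_to"] = today
            row["is_current"] = False
            break
    # Append the new current version (also covers keys seen for the first time).
    dimension.append(
        {**incoming, "valid_from": today, "valid_to": None, "is_current": True}
    )
    return dimension


# Hypothetical sample data: one customer moves from Austin to Boston.
customers = [
    {"customer_id": 1, "city": "Austin",
     "valid_from": date(2021, 1, 1), "valid_to": None, "is_current": True},
]
apply_scd2(customers, {"customer_id": 1, "city": "Boston"},
           key="customer_id", today=date(2022, 10, 1))
```

After the call, the Austin row is retained as history with an end date, and the Boston row is the current version.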

Simplify setup with the DSStageName server macro
You can add the DSStageName macro to stage properties or use it in transformer functions. The macro simplifies the setup of DataStage jobs and flows because it acts as a DataStage function and outputs data without the need for arguments. When the job compiles, "DSStageName" is replaced with the name of the stage.

For more information, see Macros.

New parameter types
In DataStage jobs you can use two new parameter types:
Boolean
Use the Boolean parameter type to specify a true or false value.
List
Use the List parameter type to specify a list of values that are available for selection in a job.

For more information about parameters, see Creating and using parameters and parameter sets.

Add and edit column metadata for data definitions
In data definitions, you can now add and edit metadata properties at the column level. For example, you can set properties such as field level, delimiter, quotation mark, and string type.

For more information about data definitions, see Defining data definitions.

Copy and paste subflows
You can copy and paste shared subflows within a DataStage flow or between different DataStage flows in the same project. You can copy and paste a subflow as part of a larger flow or as just the subflow itself.

For more information about subflows, see Subflows.

Oracle, Snowflake, and Teradata connectors can now have multiple input links
Previously the Oracle, Snowflake, and Teradata connectors had only one input link, and you specified the link's properties in the Stage properties.

Now, the connectors can have multiple input links, and each link can have a different property. Therefore, each link can have an individual action, such as read, write, or append. You can view the properties by switching the links in the Input tab.

Version 4.5.3 of the DataStage service includes various fixes.

For details, see What's new and changed in DataStage.

Related documentation:
DataStage
Data Virtualization 1.8.3

The 1.8.3 release of Data Virtualization includes the following features and updates:

Sharing your virtualized objects is quicker and easier

When you virtualize objects, you can assign the objects to multiple projects or data requests, and you can publish the objects to a catalog, all in one step.

A Data Virtualization connection is now available in the Platform assets catalog by default

You can add a Data Virtualization connection to projects without manually populating the connection details.

Version 1.8.3 of the Data Virtualization service includes various fixes.

For details, see What's new and changed in Data Virtualization.

Related documentation:
Data Virtualization
Db2 4.5.3

The 4.5.3 release of Db2 includes the following features and updates:

HADR deployments available during upgrade
Now you can upgrade Db2 to Cloud Pak for Data 4.5.3 without disrupting connections to HADR deployments. During an upgrade, the primary and standby databases are updated asynchronously, so only a momentary pause occurs when processing is switched from one database to the other. See Upgrading Db2 deployments with an HADR configuration for details.

Version 4.5.3 of the Db2 service includes various fixes. For details, see What's new and changed in Db2.

Related documentation:
Db2
Db2 Big SQL 7.3.3

Version 7.3.3 of the Db2 Big SQL service includes various fixes.

Related documentation:
Db2 Big SQL
Db2 Data Management Console 4.5.3

Version 4.5.3 of the Db2 Data Management Console service includes various fixes.

For details, see What's new and changed in Db2 Data Management Console.

Related documentation:
Db2 Data Management Console
Db2 Warehouse 4.5.3

The 4.5.3 release of Db2 Warehouse includes the following features and updates:

HADR deployments available during upgrade
Now you can upgrade Db2 Warehouse to Cloud Pak for Data 4.5.3 without disrupting connections to HADR deployments. During an upgrade, the primary and standby databases are updated asynchronously, so only a momentary pause occurs when processing is switched from one database to the other. See Upgrading Db2 Warehouse with an HADR configuration for details.

Version 4.5.3 of the Db2 Warehouse service includes various fixes. For details, see What's new and changed in Db2 Warehouse.

Related documentation:
Db2 Warehouse
Decision Optimization 5.3.0

Version 5.3.0 of the Decision Optimization service includes various fixes.

For details, see What's new and changed in Decision Optimization.

Related documentation:
Decision Optimization
EDB Postgres 13.7 and 12.11

Versions 13.7 and 12.11 of the EDB Postgres service include various fixes.

Related documentation:
EDB Postgres
Execution Engine for Apache Hadoop 4.5.3

The 4.5.3 release of Execution Engine for Apache Hadoop includes the following features and updates:

Updated version of Jupyter Environments Gateway
Jupyter Environments Gateway (JEG) is upgraded from Version 2.3.0 to 2.6.0.

Version 4.5.3 of the Execution Engine for Apache Hadoop service includes various fixes.

For details, see What's new and changed in Execution Engine for Apache Hadoop.

Related documentation:
Execution Engine for Apache Hadoop
Guardium External S-TAP 1.1.0

The Guardium External S-TAP service is now available on IBM Cloud Pak for Data Version 4.5.

Use the Guardium External S-TAP service to monitor compliance and secure data in your databases.

You can install and configure the Guardium External S-TAP service in high-availability mode to intercept TCP/IP traffic (plain-text or encrypted) between Cloud Pak for Data users and database services. The intercepted traffic is sent to the Guardium collector for parsing, policy enforcement, logging, and reporting.

For more information about the service, see Guardium External S-TAP.

Version 1.1.0 of the Guardium External S-TAP service includes various fixes.

For details, see What's new and changed in Guardium External S-TAP.

Related documentation:
Guardium External S-TAP
IBM Match 360 1.5.26

Version 1.5.26 of the IBM Match 360 service includes various fixes.

For details, see What's new and changed in IBM Match 360.

Related documentation:
IBM Match 360 with Watson
MongoDB 4.2.6 and 4.4.0

Versions 4.2.6 and 4.4.0 of the MongoDB service include various fixes.

Related documentation:
MongoDB
Planning Analytics 4.5.3

The 4.5.3 release of Planning Analytics includes the following features and updates:

Updated versions of Planning Analytics Workspace and Planning Analytics Spreadsheet Services
The 4.5.3 release provides the following software versions:
Related documentation:
Planning Analytics
Product Master 2.2.0

Version 2.2.0 of the Product Master service includes various fixes.

Related documentation:
Product Master
RStudio® Server with R 3.6 4.5.3

Version 4.5.3 of the RStudio Server with R 3.6 service includes various fixes.

For details, see What's new and changed in RStudio Server with R 3.6.

Related documentation:
RStudio Server with R 3.6
SPSS® Modeler 4.5.3

Version 4.5.3 of the SPSS Modeler service includes various fixes.

For details, see What's new and changed in SPSS Modeler.

Related documentation:
SPSS Modeler
Voice Gateway 1.0.8

Version 1.0.8 of the Voice Gateway service includes various fixes.

Related documentation:
Voice Gateway
Watson Assistant 4.5.3

Version 4.5.3 of the Watson Assistant service includes various security fixes.

Related documentation:
Watson Assistant
Watson Discovery 4.5.3

Version 4.5.3 of the Watson Discovery service includes various fixes.

Related documentation:
Watson Discovery
Watson Knowledge Catalog 4.5.3

The 4.5.3 release of Watson Knowledge Catalog includes the following features and updates:

Enhancements for AI Factsheets
Automatic term assignment enhancements to metadata enrichment
The following enhancements were made to automatic term assignment in metadata enrichment:
A global model for ML-based term assignment is now the default
Instead of having individual ML models per project, the default configuration now uses a global model that is trained on metadata in the default catalog. For details, see Automatic term assignment.

You can switch to project-specific models if you prefer. For details, see Changing the scope of the built-in ML models used for term assignment.

Automatic term assignment now considers removed terms
In metadata enrichment results, users can remove terms from a column if they think those terms are inaccurate. A new machine learning model that is trained on such negative feedback now contributes to the overall confidence score for automatic term assignment to reduce inaccuracies. For details, see Automatic term assignment.
Enhancements to metadata import
Metadata import now includes the following enhancements:
Import physical data models
In addition to the logical data model support introduced in 4.5.2, you can now also import physical data models into Watson Knowledge Catalog from the following data modeling tools:
  • ER/Studio
  • erwin Data Modeler

After you import the data model assets and their relationships to a catalog, you can enhance the assets with business terms, reference data, and rules, and you can add tags.

For more information about adding data models, see Metadata import.

For more information about the new asset types, see Asset types and properties.

Fine-grained scope selection for lineage imports
When you configure lineage imports from relational data sources, you can now narrow the import scope to specific schemas in the data source instead of importing all data from the connection. For details, see Capturing lineage.
Enhancements to data quality assets in projects
Data quality assets now have the following enhancements:
Assign business terms to data quality assets
You can now assign business terms to data quality definitions and data quality rules in projects. When you publish a data quality definition, the assigned terms are also published. For details, see Managing data quality definitions and Managing data quality rules.
Test data quality rules
You can now test a rule before you add it to the project. For details, see Managing data quality rules.
Additional databases for output of data quality rules
You can now create tables for the output of data quality rules in databases on Microsoft SQL Server and Hive data sources.

For details, see Managing data quality rules.

User interface visual enhancements for data quality rules
Variables in expressions in several sections of the data quality rules user interface are now highlighted, and you can see binding information when you hover over a variable.
Deprecated connections
The following connections are deprecated:
  • The IBM Cloud® Compose for MySQL connection is deprecated by IBM Cloud. All instances on IBM Cloud will be removed after 1 March 2023.
  • The IBM Db2 Event Store connection is deprecated and will be removed in a future release of IBM Cloud Pak for Data.

Version 4.5.3 of the Watson Knowledge Catalog service includes various fixes.

For details, see What's new and changed in Watson Knowledge Catalog.

Related documentation:
Watson Knowledge Catalog
Watson Knowledge Studio 4.5.3

Version 4.5.3 of the Watson Knowledge Studio service includes various fixes.

Related documentation:
Watson Knowledge Studio
Watson Machine Learning 4.5.3

The 4.5.3 release of Watson Machine Learning includes the following features and updates:

Improve AutoAI Time Series forecasts with supporting features
When you configure your time series experiment, you can optionally specify supporting features, which influence, or add context to, the prediction target. For example, if you are forecasting ice cream sales, daily temperature would be a logical supporting feature that would make the forecast more accurate. For details, see Building a time series experiment.
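The ice cream example can be sketched in a few lines of plain Python. This is only an illustration of why a supporting feature can help. It is not the algorithm that AutoAI uses, and the helper `fit_line` and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: daily temperature (supporting feature) and sales (target).
temps = [18, 22, 25, 30, 33]
sales = [120, 160, 190, 240, 270]

slope, intercept = fit_line(temps, sales)

# Forecast for a day with a known temperature of 28 degrees.
with_feature = intercept + slope * 28

# Baseline that ignores temperature: the historical average.
without_feature = sum(sales) / len(sales)
```

In this toy data set, sales rise steadily with temperature, so the forecast that uses the temperature differs from the flat historical average.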
AutoAI experiments with joined data are deprecated
The AutoAI experiment feature for joining multiple data sources to create a single training data set is deprecated. Support for joining data in an AutoAI experiment will be removed in a future release. After support ends, AutoAI experiments with joined data and deployments of resulting models will no longer run.

To join multiple data sources, use a data preparation tool such as Data Refinery or DataStage to join and prepare data. Then, use the resulting data set to train an AutoAI experiment before you redeploy the resulting model. For details, see AutoAI overview.

Expanded support for AutoAI deployment input data
You can now use XLSX and Parquet connected data assets as input for an AutoAI model deployment. Now the data sources that you can use to train and deploy AutoAI models are the same.
Updated input form for online deployments
An updated entry form makes it simpler for you to provide input values for an online deployment. You can enter test values in a spreadsheet or upload structured input values from a file. For details, see Creating an online deployment.
Patch deployed RShiny apps from the user interface
You can now use the user interface to update deployed RShiny apps that were created from code packages. You can update:
  • The app’s underlying code package asset
  • The app's folder path

For details, see Updating a deployment.

Version 4.5.3 of the Watson Machine Learning service includes various fixes.

For details, see What's new and changed in Watson Machine Learning.

Related documentation:
Watson Machine Learning
Watson Machine Learning Accelerator 2.6.0

The 2.6.0 release of Watson Machine Learning Accelerator includes the following features and updates:

Watson Machine Learning Accelerator supports connections that store credentials in a vault
Watson Machine Learning Accelerator can now use training data from platform connections that use secrets in vaults to store credentials.

Previously, Watson Machine Learning Accelerator supported only connections where users manually entered credentials.

Version 2.6.0 of the Watson Machine Learning Accelerator service includes various fixes.

Related documentation:
Watson Machine Learning Accelerator
Watson OpenScale 4.5.3

Version 4.5.3 of the Watson OpenScale service includes various fixes.

For details, see What's new and changed in Watson OpenScale.

Related documentation:
Watson OpenScale
Watson Speech services 4.5.3

The 4.5.3 release of Watson Speech services includes new features and updates.

For a list of new features in Watson Speech services, see:

Version 4.5.3 of the Watson Speech services service includes various fixes.

For details, see What's new and changed in Watson Speech services.

Related documentation:
Watson Speech services
Watson Studio 4.5.3

The 4.5.3 release of Watson Studio includes the following features and updates:

Insert data from Excel sheets into Jupyter notebooks
You can now use the Insert to code function in Jupyter notebooks to generate code that accesses data in single sheets in Excel file assets.
Add catalog assets from within a project

You can now add catalog assets directly from within a project. You no longer have to go to the catalog to add catalog assets to a project. For details, see Adding catalog assets to a project.

Removal of the prefix "IBM" from notebook environment templates
The prefix "IBM" has been removed from all IBM Runtime 22.1 environment templates. For example, the IBM Runtime 22.1 on Python 3.9 template is now called Runtime 22.1 on Python 3.9.
Obtaining your access token by using the USER_ACCESS_TOKEN environment variable is now deprecated
The option to obtain your access token with the USER_ACCESS_TOKEN environment variable is deprecated and will be removed in a future release. This token was used in notebooks, scripts, Python functions, and R Shiny Apps to interact with the Watson Data APIs.

Now, the recommended method for obtaining your access token is to use the get_current_token() function in the ibm-watson-studio-lib library.

For details, see Stop using the environment variable USER_ACCESS_TOKEN.
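A minimal sketch of the migration, assuming that the ibm-watson-studio-lib package is available in the notebook runtime. The helper name `get_access_token` and its fallback behavior are illustrative and are not part of the library. The commented lines show how the library object is typically obtained; verify the calls against the library reference for your release.

```python
import os


def get_access_token(wslib=None, env=None):
    """Return an access token, preferring ibm-watson-studio-lib.

    wslib: object returned by ibm_watson_studio_lib.access_project_or_space(),
           or None when the library is not available.
    env:   mapping used for the deprecated fallback (defaults to os.environ).
    """
    if wslib is not None:
        # Recommended path: ask the library for the current token.
        return wslib.auth.get_current_token()
    # Deprecated path: this variable will be removed in a future release.
    env = os.environ if env is None else env
    return env.get("USER_ACCESS_TOKEN")


# In a Watson Studio notebook (not runnable outside the runtime):
#   from ibm_watson_studio_lib import access_project_or_space
#   wslib = access_project_or_space()
#   token = get_access_token(wslib)
```

Keeping the deprecated variable only as a fallback lets existing notebooks continue to run while you move them to the library call.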

Deprecated connections
The following connections are deprecated:
  • The IBM Cloud Compose for MySQL connection is deprecated by IBM Cloud. All instances on IBM Cloud will be removed after 1 March 2023.
  • The IBM Db2 Event Store connection is deprecated and will be removed in a future release of IBM Cloud Pak for Data.
New instructions for visualizations in projects
Now you can learn more about how to create and edit visualizations in your project. See Visualizing your data.

Version 4.5.3 of the Watson Studio service includes various fixes.

For details, see What's new and changed in Watson Studio.

Related documentation:
Watson Studio
Watson Studio Runtimes 5.3.0

The 5.3.0 release of Watson Studio Runtimes includes the following features and updates:

Removal of the prefix "IBM" from notebook environment templates
The prefix "IBM" has been removed from all IBM Runtime 22.1 environment templates. For example, the IBM Runtime 22.1 on Python 3.9 template is now called Runtime 22.1 on Python 3.9.

Version 5.3.0 of the Watson Studio Runtimes service includes various fixes.

For details, see What's new and changed in Watson Studio Runtimes.

Related documentation:
Jupyter Notebook runtimes for Watson Studio

Version 4.5.2

Released: August 2022

This release of Cloud Pak for Data is primarily focused on defect and security fixes. However, this release includes new features for services such as DataStage, Db2 Data Gate, and Watson Knowledge Catalog.

This release also includes the initial release of Voice Gateway on Cloud Pak for Data Version 4.5.

Software Version What does it mean for me?
Cloud Pak for Data platform 4.5.2
The 4.5.2 release of the Cloud Pak for Data platform includes fixes for the following components:
  • Cloud Pak for Data control plane

For details, see What's new and changed in the platform.

Related documentation:
Cloud Pak for Data command-line interface (cpd-cli) 11.2.0

The 11.2.0 release of the Cloud Pak for Data command-line interface includes the following features and updates:

New cpd-cli plug-in for LDAP migration
If you have an existing Cloud Pak for Data LDAP configuration but you want to start using the Identity and Access Management Service (IAM Service), you can use the cpd-cli migrate-ldap command to migrate your Cloud Pak for Data LDAP configuration to the IAM Service.
Important: The IAM Service might not be right for your environment. Before you start using the IAM Service, review the benefits and drawbacks in Integrating with the IAM Service.

For more information, see Migrating your LDAP configuration to the IAM Service.

New command reference

The migrate-ldap command reference documents the full syntax, arguments, and options. The command reference also provides examples.

Version 11.2.0 of the Cloud Pak for Data command-line interface includes various fixes.

Cloud Pak for Data common core services 4.5.2
The 4.5.2 release of the common core services includes changes to support features and updates in Watson Studio and Watson Knowledge Catalog.
Version 4.5.2 of the common core services includes the following features and updates:
Configure custom encryption for access tokens for projects that support Git integration
If you have projects that are integrated with Git, you can optionally use custom encryption keys to encrypt the Git access tokens that are associated with the projects. Using custom encryption allows you to manage the encryption process and control how your tokens are securely stored. For details, see Configuring custom encryption for access tokens.

Version 4.5.2 of the common core services includes various fixes.

For details, see What's new and changed in the common core services.

If you install or upgrade a service that requires the common core services, the common core services will also be installed or upgraded.

Cloud Pak for Data scheduling service 1.5.0

Version 1.5.0 of the scheduling service includes various fixes.

For details, see What's new and changed in the scheduling service.

Related documentation:
You can install and upgrade the scheduling service along with the Cloud Pak for Data platform. For details, see:
Analytics Engine Powered by Apache Spark 4.5.2

Version 4.5.2 of the Analytics Engine Powered by Apache Spark includes various fixes.

For details, see What's new and changed in Analytics Engine Powered by Apache Spark.

Related documentation:
Analytics Engine Powered by Apache Spark
Cognos Analytics 22.2.0

The 22.2.0 release of Cognos Analytics includes the following features and updates:

Removal of database drivers
The following JDBC drivers are removed from Cognos Analytics on Cloud Pak for Data 4.5.2:
  • Athena
  • Impala
Updated software version for Cognos Analytics
This release provides software version 11.2.2 IF 1008 of Cognos Analytics.
Related documentation:
Cognos Analytics
Cognos Dashboards 4.5.2

The 4.5.2 release of Cognos Dashboards includes the following features and updates:

Exasol data in Cognos Dashboards
You can now connect to Exasol as a data source within Cognos Dashboards. For details on the connection, see Exasol connection.

Version 4.5.2 of the Cognos Dashboards service includes various fixes.

For details, see What's new and changed in Cognos Dashboards.

Related documentation:
Cognos Dashboards
Data Privacy 4.5.2

Version 4.5.2 of the Data Privacy service includes various fixes.

For details, see What's new and changed in Data Privacy.

Related documentation:
Data Privacy
Data Refinery 4.5.2

Version 4.5.2 of the Data Refinery service includes various fixes.

For details, see What's new and changed in Data Refinery.

Related documentation:
Data Refinery
DataStage 4.5.2

The 4.5.2 release of DataStage includes the following features and updates:

Send reports by using before/after-job subroutines
You can now send full job reports by using the Send mail function of before/after-job subroutines.
Support for migrating Db2 server-type data connection objects from traditional DataStage
Traditional DataStage supports data connection objects of the type Db2 server. When you migrate these data connection objects to modern DataStage, they are automatically converted to Db2 connector objects so that you can still use them in your DataStage flows and jobs.
New Custom stage
You can now use the Custom stage in your DataStage flows. Use the Custom stage to process your data according to your own specifications.

For details on the Custom stage, see Defining custom stages in DataStage.

Use function libraries in the Transformer stage
You can now define your own functions for use in the Transformer stage's expression editor by creating a function library asset from an SO file. For details on function libraries, see Function libraries in DataStage.
Use new functions in the Transformer stage
You can now use the ConvertDatum and NextValidDate functions in the Transformer stage as part of your DataStage flows.

For the full list of available functions, see Parallel transform functions.

Migrate nested sequence jobs to Watson Pipelines
A traditional DataStage sequence job can contain a "Job Activity" type that can invoke another sequence job, making the second sequence job a nested sequence job.

You can now migrate these nested sequence jobs into modern DataStage. When migrated, a sequence job in the ISX import file is converted to a Watson pipeline.

Connect to a new data source
You can now include data from Apache HBase in your DataStage flows.

For the full list of DataStage connectors, see DataStage connectors.

Kerberos authentication supported for Apache Hive connections
You can now use the Kerberos authentication protocol to connect to an Apache Hive data source when you create the connection from the DataStage service. For information, see Apache Hive connection.
The Db2 (optimized) connector can now have multiple input links, each with an individual action
Previously, the Db2 (optimized) connector had only one input link, and you specified the link's properties in the Stage properties. Now the connector can have multiple input links, and each link can have a different property. This enhancement means that each link can have an individual action, such as read, write, or append. You can view the properties by switching the links in the Input tab.

Version 4.5.2 of the DataStage service includes various fixes.

For details, see What's new and changed in DataStage.

Related documentation:
DataStage
Db2 Data Gate 2.6.0

The 2.6.0 release of Db2 Data Gate includes the following features and updates:

Early notice for certificate renewals
When certificates are renewed, Db2 Data Gate is unavailable for 10 to 20 minutes. The Db2 Data Gate user interface now includes notifications about upcoming renewals. You can optionally renew the certificates manually before the scheduled renewal to avoid an unplanned or inconveniently timed outage. For more information on manually renewing the certificates, see Replacing certificates for Db2 Data Gate 2.6.0 or higher.

Version 2.6.0 of the Db2 Data Gate service includes various fixes.

For details, see What's new and changed in Db2 Data Gate.

EDB Postgres 13.7 and 12.11

Versions 13.7 and 12.11 of the EDB Postgres service include various fixes.

Related documentation:
EDB Postgres
Execution Engine for Apache Hadoop 4.5.2

Version 4.5.2 of the Execution Engine for Apache Hadoop service includes various fixes.

For details, see What's new and changed in Execution Engine for Apache Hadoop.

Related documentation:
Execution Engine for Apache Hadoop
IBM Match 360 1.4.38

Version 1.4.38 of the IBM Match 360 service includes various fixes.

For details, see What's new and changed in IBM Match 360.

Related documentation:
IBM Match 360 with Watson
Informix® 4.6.0

Version 4.6.0 of the Informix service includes various fixes.

Related documentation:
Informix
MongoDB 4.2.6 and 4.4.0

Versions 4.2.6 and 4.4.0 of the MongoDB service include various fixes.

Related documentation:
MongoDB
OpenPages 8.300.2

Version 8.300.2 of the OpenPages service includes various fixes.

Related documentation:
OpenPages
Planning Analytics 4.5.2

Version 4.5.2 of the Planning Analytics service includes the following features and updates:

Updated versions of Planning Analytics Workspace, TM1®, and Planning Analytics Spreadsheet Services
The 4.5.2 release provides the following software versions:

For details, see What's new and changed in Planning Analytics.

Related documentation:
Planning Analytics
Product Master 2.1.0

Version 2.1.0 of the Product Master service includes various fixes.

For details, see What's new and changed in Product Master.

Related documentation:
Product Master
RStudio Server with R 3.6 4.5.2

Version 4.5.2 of the RStudio Server with R 3.6 service includes various fixes.

For details, see What's new and changed in RStudio Server with R 3.6.

Related documentation:
RStudio Server with R 3.6
SPSS Modeler 4.5.2

Version 4.5.2 of the SPSS Modeler service includes various fixes.

For details, see What's new and changed in SPSS Modeler.

Related documentation:
SPSS Modeler
Voice Gateway 1.0.8

Version 1.0.8 of the Voice Gateway service includes various fixes.

For details, see What's new and changed in Voice Gateway.

Related documentation:
Voice Gateway
Watson Knowledge Catalog 4.5.2

The 4.5.2 release of Watson Knowledge Catalog includes the following features and updates:

Import logical data models
To provide a single collection point for all business knowledge that is related to your data management landscape, you can now import logical data models into Watson Knowledge Catalog from the following data modeling tools:
  • ER/Studio
  • erwin Data Modeler

After you import the data model assets and their relationships to a catalog, you can enhance the assets with business terms, reference data, and rules, and add tags.

For more information about adding data models, see Metadata import.

For more information about the new asset types, see Asset types and properties.

Version 4.5.2 of the Watson Knowledge Catalog service includes various fixes.

For details, see What's new and changed in Watson Knowledge Catalog.

Related documentation:
Watson Knowledge Catalog
Watson Machine Learning 4.5.2

The 4.5.2 release of Watson Machine Learning includes the following features and updates:

Automate complex flows by using nested pipelines
Watson Studio Pipelines (beta) now supports nested pipelines. Nested pipelines give you more flexibility for reusing components between pipelines. For details, see Configuring pipeline components.
Expanded input options for AutoAI model deployments
This release includes support for DataStax and Exasol as input data sources for deploying an AutoAI model. See Batch deployment input details for AutoAI models.

Version 4.5.2 of the Watson Machine Learning service includes various fixes.

For details, see What's new and changed in Watson Machine Learning.

Related documentation:
Watson Machine Learning
Watson Machine Learning Accelerator 2.5.0

The 2.5.0 release of Watson Machine Learning Accelerator includes the following features and updates:

Migrate Watson Machine Learning Accelerator metadata between Cloud Pak for Data installations
You can now use the cpd-cli export-import command to export Watson Machine Learning Accelerator metadata from one Cloud Pak for Data installation and import it to another Cloud Pak for Data installation. For more information, see Migrating metadata between Cloud Pak for Data installations.

Version 2.5.0 of the Watson Machine Learning Accelerator service includes various fixes.

For details, see What's new and changed in Watson Machine Learning Accelerator.

Related documentation:
Watson Machine Learning Accelerator
Watson OpenScale 4.5.2

Version 4.5.2 of the Watson OpenScale service includes various fixes.

For details, see What's new and changed in Watson OpenScale.

Related documentation:
Watson OpenScale
Watson Studio 4.5.2

Version 4.5.2 of the Watson Studio service includes various fixes.

For details, see What's new and changed in Watson Studio.

Related documentation:
Watson Studio
Watson Studio Runtimes 5.2.0

Version 5.2.0 of the Watson Studio Runtimes service includes various fixes.

For details, see What's new and changed in Watson Studio Runtimes.

Related documentation:
Jupyter Notebook runtimes for Watson Studio

Version 4.5.1

Released: July 2022

This release of Cloud Pak for Data is primarily focused on defect and security fixes. However, this release includes new features for services such as Data Virtualization, DataStage, Db2 Data Management Console, Planning Analytics, and Watson Knowledge Catalog.

Software Version What does it mean for me?
Cloud Pak for Data platform 4.5.1
The 4.5.1 release of the Cloud Pak for Data platform includes fixes for the following components:
  • IBM Cloud Pak® for Data platform operator
Related documentation:
Cloud Pak for Data command-line interface (cpd-cli) 11.1.0

Version 11.1.0 of the Cloud Pak for Data command-line interface includes various fixes.

For details, see What's new and changed in the Cloud Pak for Data command-line interface.

Cloud Pak for Data common core services 4.5.1

Version 4.5.1 of the common core services includes various fixes.

For details, see What's new and changed in the common core services.

If you install or upgrade a service that requires the common core services, the common core services will also be installed or upgraded.

Cloud Pak for Data scheduling service 1.4.0

Version 1.4.0 of the scheduling service includes various fixes.

For details, see What's new and changed in the scheduling service.

Related documentation:
You can install and upgrade the scheduling service along with the Cloud Pak for Data platform. For details, see:
Analytics Engine Powered by Apache Spark 4.5.1

The 4.5.1 release of Analytics Engine Powered by Apache Spark includes the following features and updates:

Running applications on a remote Hadoop cluster is deprecated
With the remote Hadoop feature, you can access an HDFS or HMS service that is running on a Kerberos-enabled HDP 2.6.5 or HDP 3.1 Hadoop cluster. Because HDP is reaching the end of service, the remote Hadoop feature is deprecated.
End of support for Spark 3.0
You can no longer run your Spark applications using Spark 3.0. You must now use Spark 3.2.

Version 4.5.1 of the Analytics Engine Powered by Apache Spark service includes various fixes.

For details, see What's new and changed in Analytics Engine Powered by Apache Spark.

Related documentation:
Analytics Engine Powered by Apache Spark
Cognos Analytics 22.1.0

The 22.1.0 release of Cognos Analytics includes the following features and updates:

Updated software version for Cognos Analytics
This release provides software version 11.2.2 IF 1 of Cognos Analytics.

Version 22.1.0 of the Cognos Analytics service includes various fixes.

Related documentation:
Cognos Analytics
Cognos Dashboards 4.5.1

Version 4.5.1 of the Cognos Dashboards service includes various fixes.

Related documentation:
Cognos Dashboards
Data Privacy 4.5.1

The 4.5.1 release of Data Privacy includes the following features and updates:

Support for Apache Hive
Data Privacy now supports masking data through Apache Hive. Use Apache Hive to read, write, and manage masked data that is copied to Apache Hadoop Distributed File System (HDFS). Hive can store petabytes of data, and because it stores copied data in HDFS, it’s a more scalable solution than a traditional database.

For more information, see Advanced data masking.

Version 4.5.1 of the Data Privacy service includes various fixes.

For details, see What's new and changed in Data Privacy.

Related documentation:
Data Privacy
Data Refinery 4.5.1

The 4.5.1 release of Data Refinery includes the following features and updates:

The Default Spark 3.0 & R 3.6 environment is discontinued
If you have any Data Refinery flow jobs set up with the Default Spark 3.0 & R 3.6 environment or a custom environment that uses Spark 3.0, the jobs will fail. Change the environment to one of the following environments:
  • Default Spark 3.2 & R 3.6
  • Default Data Refinery XS
  • A custom environment that does not use Spark 3.0

For information, see Data Refinery environments.

Version 4.5.1 of the Data Refinery service includes various fixes.

For details, see What's new and changed in Data Refinery.

Related documentation:
Data Refinery
DataStage 4.5.1

The 4.5.1 release of DataStage includes the following features and updates:

Connect to more data sources in DataStage
You can now include data from these data sources in your DataStage flows:
  • Cognos Analytics
  • SAP IQ

For the full list of DataStage connectors, see DataStage connectors.

Use new functions and features in the DataStage Transformer stage
  • You can now use the Fold, Fmt, and Rmunprint functions in the Transformer stage as part of your DataStage flows.
  • The Transformer stage now supports partitions.
  • You can now use type-ahead search in the Transformer stage for functions, columns, and variables.

For the full list of available functions, see Parallel transform functions.

Version 4.5.1 of the DataStage service includes various fixes.

For details, see What's new and changed in DataStage.

Related documentation:
DataStage
Data Virtualization 1.8.1
The 1.8.1 release of Data Virtualization includes the following features and updates.
Use Cognos Authentication Method (CAM) credentials to connect to Planning Analytics data sources
You can now use CAM credentials as an authentication method when you create a connection to a Planning Analytics data source in Data Virtualization. For more information, see IBM Planning Analytics connection.
Use Watson Knowledge Catalog policies to filter rows in virtualized tables

You might have a data source that has tables with government, enterprise, and retail client data combined. For example, a billing table might have data for all the customers, where some of the rows are for government clients and some are for nongovernment clients. The type of the client is not indicated in the billing table. Now, you can filter the list of client records by using one of the following techniques.

  • You can use a separate table to identify customers that are government clients. The IDs from this table can be used to filter out rows from the billing table. When you filter out rows, the masked table does not contain the rows with data of government clients.
  • You can use a table of blocked customer identifiers as a reference table. Any rows in the billing table whose customer identifier is included in the blocked customer set are filtered out of the result set.

For more information, see Filtering rows in data protection rules.
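
The row-filtering idea described above can be sketched in a few lines of Python. This is only an illustration of the reference-table technique, not how Data Virtualization or Watson Knowledge Catalog implements it; the function name, the `customer_id` field, and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable, List


def filter_blocked_rows(billing_rows: Iterable[Dict], blocked_ids: Iterable[str]) -> List[Dict]:
    """Return only the billing rows whose customer ID is NOT in the blocked set.

    `blocked_ids` plays the role of the reference table (for example, the IDs
    of government clients). The field name `customer_id` is an assumption.
    """
    blocked = set(blocked_ids)  # set lookup keeps the filter O(1) per row
    return [row for row in billing_rows if row["customer_id"] not in blocked]


# Hypothetical sample data: one combined billing table and one reference table.
billing = [
    {"customer_id": "C001", "amount": 120.0},
    {"customer_id": "G100", "amount": 900.0},
    {"customer_id": "C002", "amount": 75.5},
]
government_clients = ["G100"]

visible = filter_blocked_rows(billing, government_clients)
```

In this sketch the billing table itself never records the client type; the filter depends entirely on the separate list of identifiers, which mirrors both techniques in the list above.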

Version 1.8.1 of the Data Virtualization service includes various fixes.

For details, see What's new and changed in Data Virtualization.

Related documentation:
Data Virtualization
Db2 4.5.1

Version 4.5.1 of the Db2 service includes various fixes.

For details, see What's new and changed in Db2.

Related documentation:
Db2
Db2 Big SQL 7.3.1

Version 7.3.1 of the Db2 Big SQL service includes various fixes.

Related documentation:
Db2 Big SQL
Db2 Data Management Console 4.5.1

The 4.5.1 release of Db2 Data Management Console includes the following features and updates:

Manage databases that use vaults to store secrets
You can now manage Db2 and Db2 Warehouse databases that use vaults and secrets to securely store and access credentials in connections.

Version 4.5.1 of the Db2 Data Management Console service includes various fixes.

For details, see What's new and changed in Db2 Data Management Console.

Related documentation:
Db2 Data Management Console
Db2 Warehouse 4.5.1

Version 4.5.1 of the Db2 Warehouse service includes various fixes.

For details, see What's new and changed in Db2 Warehouse.

Related documentation:
Db2 Warehouse
Decision Optimization 5.1.0

Version 5.1.0 of the Decision Optimization service includes various fixes.

For details, see What's new and changed in Decision Optimization.

Related documentation:
Decision Optimization
Execution Engine for Apache Hadoop 4.5.1

Version 4.5.1 of the Execution Engine for Apache Hadoop service includes various fixes.

For details, see What's new and changed in Execution Engine for Apache Hadoop.

Related documentation:
Execution Engine for Apache Hadoop
IBM Match 360 1.3.31

Version 1.3.31 of the IBM Match 360 service includes various fixes.

For details, see What's new and changed in IBM Match 360.

Related documentation:
IBM Match 360 with Watson
OpenPages 8.300.1

Version 8.300.1 of the OpenPages service includes various fixes.

Related documentation:
OpenPages
Planning Analytics 4.5.1

The 4.5.1 release of Planning Analytics includes the following features and updates:

Planning Analytics technical preview
The Planning Analytics technical preview includes technical previews of Planning Analytics Engine and Planning Analytics Spreadsheet Services.

To use these components, you must enable the Planning Analytics technical preview when you install the Planning Analytics add-on.

For details, see:

Version 4.5.1 of the Planning Analytics service includes various fixes.

Related documentation:
Planning Analytics
RStudio Server with R 3.6 4.5.1

Version 4.5.1 of the RStudio Server with R 3.6 service includes various fixes.

For details, see What's new and changed in RStudio Server with R 3.6.

Related documentation:
RStudio Server with R 3.6
SPSS Modeler 4.5.1

Version 4.5.1 of the SPSS Modeler service includes various fixes.

For details, see What's new and changed in SPSS Modeler.

Related documentation:
SPSS Modeler
Watson Assistant 4.5.1

Version 4.5.1 of the Watson Assistant service includes various security fixes.

Related documentation:
Watson Assistant
Watson Discovery 4.5.1

Version 4.5.1 of the Watson Discovery service includes various fixes.

Related documentation:
Watson Discovery
Watson Knowledge Catalog 4.5.1

The 4.5.1 release of Watson Knowledge Catalog includes the following features and updates:

Support for advanced data masking
Data masking mitigates data risks by minimizing unauthorized exposure of sensitive data. Advanced data masking extends the capability of data protection rules by protecting sensitive data with advanced de-identification techniques. The techniques provide users with fictitious, yet statistically equivalent data instead of real and sensitive data. Users can use this statistically equivalent data to run business processes.

You can now transform data by using redact, substitute, and obfuscate when you create data protection rules. The transformations use the advanced data masking techniques to mask data in Data Refinery, in masking flows, and when the asset is previewed or downloaded.

For more information, see Advanced data masking.

Name change for the IBM SQL Query connection
The IBM SQL Query connection has been renamed to IBM Cloud Data Engine. Your previous settings for the connection remain the same. Only the connection name has changed.
Active Directory supported for the Microsoft SQL Server connection
You can now select Active Directory for Microsoft SQL Server authentication. With this enhancement, you can take advantage of credentials that are stored in an NTLM account database instead of on the Microsoft SQL Server. For more information, see Microsoft SQL Server connection.

Version 4.5.1 of the Watson Knowledge Catalog service includes various fixes.

For details, see What's new and changed in Watson Knowledge Catalog.

Related documentation:
Watson Knowledge Catalog
Watson Knowledge Studio 4.5.1

Starting in Version 4.5.1, Watson Knowledge Studio is available on IBM Cloud Pak for Data Version 4.5.

Version 4.5.1 of the Watson Knowledge Studio service includes various fixes.

Related documentation:
Watson Knowledge Studio
Watson Machine Learning 4.5.1

The 4.5.1 release of Watson Machine Learning includes the following features and updates:

End of support for Spark 3.0
You can no longer use machine learning models based on the Spark 3.0 framework. If you need Spark, you must use Spark 3.2.

For more information, see Supported frameworks and software specifications.

Version 4.5.1 of the Watson Machine Learning service includes various fixes.

For details, see What's new and changed in Watson Machine Learning.

Related documentation:
Watson Machine Learning
Watson Machine Learning Accelerator 2.4.1

The 2.4.1 release of Watson Machine Learning Accelerator includes the following features and updates:

Support for Python 3.9.12
When you install Watson Machine Learning Accelerator, the service uses Python 3.9.12 for running deep learning frameworks.

If you are upgrading to Watson Machine Learning Accelerator Version 2.4.1, update and test your models to use the latest supported frameworks for Python 3.9.12.

Access models in storage volumes
When you create an inference service, you can configure the service to read data from a Cloud Pak for Data storage volume. For more information, see:
  • Managing storage volumes, for setting up storage volumes in Cloud Pak for Data.
  • Create an inference service in the Watson Machine Learning Accelerator documentation, for creating an inference service.
Restriction: Watson Machine Learning Accelerator supports only PVC-based storage volumes. You cannot read data from NFS storage volumes or External SMB storage volumes.

Version 2.4.1 of the Watson Machine Learning Accelerator service includes various fixes.

Related documentation:
Watson Machine Learning Accelerator
Watson OpenScale 4.5.1

Version 4.5.1 of the Watson OpenScale service includes various fixes.

For details, see What's new and changed in Watson OpenScale.

Related documentation:
Watson OpenScale
Watson Speech services 4.5.1

The 4.5.1 release of Watson Speech services includes features and enhancements, such as language updates.

For a list of new features in Watson Speech services, see:

Version 4.5.1 of the Watson Speech services service includes various fixes.

Related documentation:
Watson Speech services
Watson Studio 4.5.1

The 4.5.1 release of Watson Studio includes the following features and updates:

Use the mamba package manager to customize your Jupyter environment templates and custom images
You can now use the mamba package manager when you add a software configuration to Jupyter environment templates or when you create a custom environment image. For details, see Customizing environment templates.
End of support for Spark 3.0
You can no longer use Jupyter environments with Spark 3.0. If you need Spark, you must use Spark 3.2. For details, see Spark environments.
Name change for the IBM SQL Query connection
The IBM SQL Query connection has been renamed to IBM Cloud Data Engine. Your previous settings for the connection remain the same. Only the connection name has changed.
Active Directory supported for the Microsoft SQL Server connection
You can now select Active Directory for Microsoft SQL Server authentication. With this enhancement, you can take advantage of credentials that are stored in an NTLM account database instead of on the Microsoft SQL Server. For information, see Microsoft SQL Server connection.

Version 4.5.1 of the Watson Studio service includes various fixes.

For details, see What's new and changed in Watson Studio.

Related documentation:
Watson Studio
Watson Studio Runtimes 5.1.0

The 5.1.0 release of Watson Studio Runtimes includes the following features and updates:

Use the mamba package manager to customize your Jupyter environment templates and custom images
You can now use the mamba package manager when you add a software configuration to Jupyter environment templates or when you create a custom environment image. For details, see Customizing environment templates.
End of support for Spark 3.0
You can no longer use Jupyter environments with Spark 3.0. If you need Spark, you must use Spark 3.2. For details, see Spark environments.

Version 5.1.0 of the Watson Studio Runtimes service includes various fixes.

For details, see What's new and changed in Watson Studio Runtimes.

Related documentation:
Jupyter Notebook runtimes for Watson Studio

What's new in Version 4.5

IBM Cloud Pak for Data 4.5 introduces a new command-line interface to simplify installations and upgrades, support for online, disruption-free backups, support on Red Hat® OpenShift® Container Platform Version 4.10, and more.

In addition, the release includes enhancements to existing services, such as Data Privacy, Data Virtualization, IBM Match 360 with Watson, and Watson Knowledge Catalog.

Platform enhancements

The following table lists the new features that were introduced in Cloud Pak for Data Version 4.5.

What's new What does it mean for me?
Disruption-free online backups
Administrators can now create frequent backups of Cloud Pak for Data without taking Cloud Pak for Data offline. The new backup capability enables you to efficiently protect your data without sacrificing productivity.

The online backup capability relies on Container Storage Interface (CSI) snapshots, which do not require any downtime. For details, see Backing up and restoring Cloud Pak for Data.

Users with the Manage projects permission can join any project
Users with the Manage projects permission can join any project as an administrator so that they can:
  • Delete unused projects
  • Ensure that active projects have at least one owner
To join projects as an administrator, go to the Projects page, and under Your role, click Join as admin.
Monitor workflow tasks
Workflow administrators can now view metrics for active tasks. The Task status page includes a graphic overview of the ownership status and due dates for all active tasks. You can also filter the task list and set multiple tasks back to unclaimed at once.

For details, see Monitoring workflow tasks.

Find what you need quickly with the new search experience

Find what you need quickly and evaluate results more easily with the new search experience. You'll see the new search experience when you search for assets or governance artifacts in the global search field.

You can now get better search results that are based on your intent. Common phrases are prioritized and unimportant words are discarded. When you search for phrases in English, natural language analysis optimizes your search.

You can now quickly evaluate results. The new search results experience shows the context for your search term and provides many filters based on more asset and artifact properties.

For details, see Searching for assets and artifacts across the platform.

Set quotas on projects
In addition to setting quotas on the platform and on individual services, you can now set and enforce quotas on projects (collaborative workspaces where you work with data and other assets to accomplish a particular goal). You can set quotas to help you track your actual use against your target use. When you set quotas on a project, you specify:
  • How much vCPU and memory you want the project to use.
  • The alert threshold for vCPU and memory use.

    When you reach the alert threshold, the platform sends you an alert so that you aren't surprised by unexpected spikes in resource use.

For details, see Monitoring the platform.
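
As a rough illustration of how a quota and an alert threshold relate, the following Python sketch checks hypothetical usage figures against a project quota. The class, the field names, and the use of percentage thresholds are assumptions made for the example; they are not the platform's API or data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectQuota:
    """Hypothetical project quota: target limits plus alert thresholds in percent."""
    vcpu_limit: float
    memory_gib_limit: float
    vcpu_alert_pct: float
    memory_alert_pct: float


def check_usage(quota: ProjectQuota, vcpu_used: float, memory_gib_used: float) -> List[str]:
    """Return the names of the resources whose use has reached the alert threshold."""
    alerts = []
    if vcpu_used >= quota.vcpu_limit * quota.vcpu_alert_pct / 100:
        alerts.append("vcpu")
    if memory_gib_used >= quota.memory_gib_limit * quota.memory_alert_pct / 100:
        alerts.append("memory")
    return alerts


# Example: alert at 80% of a 10 vCPU / 64 GiB quota.
quota = ProjectQuota(vcpu_limit=10, memory_gib_limit=64, vcpu_alert_pct=80, memory_alert_pct=80)
print(check_usage(quota, vcpu_used=8.5, memory_gib_used=20))  # ['vcpu']
```

The point of the threshold is that the alert fires before the quota itself is reached, which leaves time to react to a spike.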

Manage resources more effectively
Cloud Pak for Data Version 4.5 introduces new ways to manage your cluster resources:
  • Several services support Red Hat OpenShift Horizontal Pod Autoscaler (HPA). You can enable these services to automatically scale up and down based on user workloads. This enables you to automatically free up resources when the services are not using them to run workloads. For details, see Automatically scaling resources for services.
  • Some services can be shut down when they are not in use to prevent the services from using cluster resources. You can also shut down the services to perform maintenance. For details, see Shutting down and restarting services.
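
For readers unfamiliar with horizontal pod autoscaling, the following Python sketch shows the general scaling rule that the Kubernetes Horizontal Pod Autoscaler documents: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up and kept within the configured bounds. The function name and the sample numbers are illustrative only, and the sketch leaves out details such as the tolerance band and stabilization windows; it says nothing about how individual Cloud Pak for Data services configure HPA.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA rule: ceil(current * observed / target), clamped to [min, max]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# Load above target: 4 replicas at 90% CPU with a 60% target scale up to 6.
print(desired_replicas(4, 90.0, 60.0, min_replicas=1, max_replicas=10))  # 6

# Load well below target: the result is clamped to the minimum of 2 replicas.
print(desired_replicas(4, 15.0, 60.0, min_replicas=2, max_replicas=10))  # 2
```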
Create dynamic, attribute-based user groups
If you integrate Cloud Pak for Data with the Identity and Access Management Service (IAM Service), you can use the following attributes to define dynamic user groups:
  • Location
  • Nationality
  • Organization
  • User type

Users are automatically added and removed from the group based on the attributes that are assigned to them on the identity provider. For example, you create a user group for people managers (user type) in the finance group (organization) in Canada (location). If Annette is hired as a people manager for the finance group in Canada, she will automatically become a member of the group. Similarly, if Rajesh is transferred to Spain, he will automatically be removed from the group.

Dynamic user groups simplify the process of managing user groups in a large organization. For details, see Managing user groups.
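
The membership logic in the Annette and Rajesh example can be sketched as follows. This is a minimal Python illustration of attribute-based matching, not the IAM Service implementation; the `User` class, the `is_member` function, the attribute values, and the nationality values are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class User:
    """Hypothetical user record that carries the four attributes listed above."""
    name: str
    location: str
    nationality: str
    organization: str
    user_type: str


def is_member(user: User, rule: Dict[str, str]) -> bool:
    """A user belongs to the group when every attribute in the rule matches."""
    return all(getattr(user, attr) == value for attr, value in rule.items())


# Group rule: people managers in the finance organization in Canada.
finance_managers_canada = {
    "user_type": "people manager",
    "organization": "finance",
    "location": "Canada",
}

annette = User("Annette", "Canada", "Canadian", "finance", "people manager")
rajesh = User("Rajesh", "Canada", "Canadian", "finance", "people manager")
rajesh_after_transfer = replace(rajesh, location="Spain")

print(is_member(annette, finance_managers_canada))                # True
print(is_member(rajesh_after_transfer, finance_managers_canada))  # False
```

Because membership is computed from the attributes rather than stored as a list, no one has to add or remove users by hand when their attributes change.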

New Cloud Pak for Data CLI commands and reference
Starting in Cloud Pak for Data Version 4.5, the cpd-cli includes new commands and a new command reference:
Manage command
You can use the cpd-cli manage command to install and manage the Cloud Pak for Data software on your Red Hat OpenShift Container Platform cluster. You can also use the cpd-cli manage command to install all of the components that you need at the same time. For details, see Installing the Cloud Pak for Data platform and services.
Command reference
The cpd-cli commands use a combination of arguments and options. Each command has a syntax that designates both the required and optional arguments and options. The cpd-cli command reference documents the full syntax, arguments, and options, and provides syntax examples for each supported command.
Command plug-ins
The cpd-cli commands are divided into the following command plug-ins.
backup-restore
Back up and restore the Cloud Pak for Data software in a Red Hat OpenShift Container Platform cluster.
config
Create and configure user profiles, which are required to run cpd-cli commands.
diag
Gather diagnostic information and check the health of Cloud Pak for Data services.
export-import
Migrate data, including metadata, between Cloud Pak for Data clusters.
manage
Install and manage the Cloud Pak for Data software in a Red Hat OpenShift Container Platform cluster.
oadp
Run backup and restore operations by calling Velero client APIs (similar to the Velero CLI).
service-instance
Manage Cloud Pak for Data service instances.
user-mgmt
Import and manage Cloud Pak for Data users and user groups.
For details, see cpd-cli command reference.
Expanded support for IBM Spectrum storage
All of the services that are available in Cloud Pak for Data Version 4.5.0 support IBM Spectrum Fusion and IBM Spectrum Scale Container Native storage. For details, see Persistent storage requirements for services.
Support for Amazon Elastic File System and Amazon Elastic Block Store storage
Cloud Pak for Data Version 4.5.0 includes support for Amazon Elastic File System and Amazon Elastic Block Store:
  • Many of the services that are available in Cloud Pak for Data Version 4.5.0 support Amazon Elastic File System storage.
  • Some services support a combination of Amazon Elastic File System and Amazon Elastic Block Store storage.

For details, see Persistent storage requirements for services.

Support for Microsoft Edge
You can now use the Cloud Pak for Data web client on Microsoft Edge Version 95 and higher.

Service enhancements

The following table lists the new features that are introduced for existing services in Cloud Pak for Data Version 4.5:

Software Version What does it mean for me?
Cloud Pak for Data common core services 4.5.0
The 4.5.0 release of the common core services includes changes to support features and updates in Watson Studio and Watson Knowledge Catalog.
Version 4.5.0 of the common core services includes the following features and updates:
Boost your productivity with the new projects experience
The projects interface has a new design that makes working and collaborating in a project easier and more efficient. Check out the enhanced asset organization, asset relations, improved navigation, and built-in guidance.

With the new projects UI, you can:

  • View the project and resource usage summary on the Overview tab.
  • Add, filter, and browse assets on the Assets tab using the Add asset and New asset buttons.
  • Conduct all administrative tasks under the Manage tab.

Version 4.5.0 of the common core services includes various fixes.

For details, see What's new and changed in the common core services.

If you install or upgrade a service that requires the common core services, the common core services will also be installed or upgraded.

Cloud Pak for Data scheduling service 1.3.6

Version 1.3.6 of the scheduling service includes various fixes.

For details, see What's new and changed in the scheduling service.

Related documentation:
You can install and upgrade the scheduling service along with the Cloud Pak for Data platform. For details, see:
Analytics Engine Powered by Apache Spark 4.5.0

The 4.5.0 release of Analytics Engine Powered by Apache Spark includes the following features and updates:

Support for Spark 3.2
You can use Spark 3.2 to run your applications on Analytics Engine Powered by Apache Spark instances.
Deprecation of Spark 3.0
You can still use Spark 3.0 in your applications. However, you should consider moving to Spark 3.2.
Spark history server
New Spark history server endpoints are now available to start, stop, and view the history server. Additionally, you can customize the Spark history server properties, including the resource allocation for the history server. For details, see Accessing and customizing the Spark history server.

Version 4.5.0 of the Analytics Engine Powered by Apache Spark service includes various fixes.

For details, see What's new and changed in Analytics Engine Powered by Apache Spark.

Related documentation:
Analytics Engine Powered by Apache Spark
Cognos Analytics 22.0.0

The 22.0.0 release of Cognos Analytics includes the following features and updates:

Optionally create service instances in tethered projects
You can create one Cognos Analytics service instance in the project where the Cloud Pak for Data control plane is installed.

If you want to create multiple service instances, or if you want to isolate the service instance from other workloads that are associated with Cloud Pak for Data, you can create instances of Cognos Analytics in tethered projects. (You can deploy one instance in each tethered project.)

For details, see:

New image management API
Now you can use the image management API commands to manage images that are used in Cognos Analytics reports, dashboards, and so on. The image management API also makes it easy to get information about all your image files and to upload multiple images.

For more information, see Cognos Analytics artifacts and images APIs.

Updated software version for Cognos Analytics
This release provides software version 11.2.2 of Cognos Analytics.

Version 22.0.0 of the Cognos Analytics service includes various fixes.

For details, see What's new and changed in Cognos Analytics.

Related documentation:
Cognos Analytics
Cognos Dashboards 4.5.0

Version 4.5.0 of the Cognos Dashboards service includes various fixes.

Related documentation:
Cognos Dashboards
Data Privacy 4.5.0

The 4.5.0 release of Data Privacy includes the following features and updates:

New options for masking dates with the obfuscate and redact methods
When you create rules by using advanced masking, you now have new expanded options for masking dates. The obfuscate method and redact method include the following options to help you transform dates into similar date formats:
Obfuscate method
When you select the obfuscate method for masking dates, the following options for altering the dates are available:
  • Basic date masking: Specify a date range, and the dates in that range are used to mask your dates.
  • Shift date by fixed amount: Specify an interval of a set number of days to mask your dates.
  • Mask date to same time period: Specify the same week, month, quarter, or year of the dates you want to mask. For example, if the original passport expiration date is 2022-03-11 and you select Same week, the date that is used to mask the expiration date might be 2022-03-09, which falls in the same week.

For details, see Preserve format method.
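
To make two of the obfuscate options more concrete, the following Python sketch shifts a date by a fixed number of days and picks a replacement date from the same week. It is an illustration only and does not reproduce the masking algorithm that Data Privacy uses. The function names are hypothetical, and the sketch assumes that a week runs from Monday to Sunday, which the product documentation may define differently.

```python
import random
from datetime import date, timedelta
from typing import Tuple


def shift_date(original: date, days: int) -> date:
    """Shift date by fixed amount: move the date by a set number of days."""
    return original + timedelta(days=days)


def same_week_range(original: date) -> Tuple[date, date]:
    """Return the Monday and Sunday of the week that contains `original`.

    Assumption: weeks run Monday to Sunday.
    """
    monday = original - timedelta(days=original.weekday())
    return monday, monday + timedelta(days=6)


def mask_to_same_week(original: date, rng: random.Random) -> date:
    """Mask date to same time period (week): pick a date from the same week."""
    monday, _ = same_week_range(original)
    return monday + timedelta(days=rng.randrange(7))


expiration = date(2022, 3, 11)
week_start, week_end = same_week_range(expiration)
print(week_start, week_end)        # 2022-03-07 2022-03-13
print(shift_date(expiration, 30))  # 2022-04-10
```

Under the Monday-to-Sunday assumption, the masked value 2022-03-09 from the example above falls inside the computed range for 2022-03-11.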

Redact method
When you select the redact method for masking dates, you can specify the date values you want to use to alter the original dates. For details, see Redacting data method.

Version 4.5.0 of the Data Privacy service includes various fixes.

For details, see What's new and changed in Data Privacy.

Related documentation:
Data Privacy
Data Refinery 4.5.0

The 4.5.0 release of Data Refinery includes the following features and updates:

Review and change the data types assigned in the first step of the flow
Data Refinery automatically inserts a first step in the flow to convert any non-string data types that it finds in the data to inferred types. Now you can view these inferred data types and change them, if appropriate.

For information about the Convert column type operation, see GUI operations.

View your Data Refinery data in a CSV file without running a Data Refinery flow job
You can now export the data at the current step in your Data Refinery flow to a CSV file without saving or running a Data Refinery flow job. With this enhancement, you can now quickly save and view data that is in progress. For information, see Managing Data Refinery flows.
Control the placement of a new column in a Data Refinery flow
When you use an operation that can create a new column in the Data Refinery flow and you select Create a new column for results, you can now optionally place the new column to the right of the original column.

This option is available for the following operations:

  • Calculate
  • Conditional replace
  • Convert column type
  • Convert column value to missing
  • Extract date or time value
  • Math
  • Replace missing values
  • Replace substring
  • Text
  • Tokenize
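
The effect of the placement option on column order can be sketched in Python. This is an illustration only: the function and column names are hypothetical, and the sketch assumes that a new column is otherwise appended at the end of the data set.

```python
def place_new_column(columns, original, new_column, next_to_original=False):
    """Return the column order after an operation adds new_column.

    Assumption: without the option, the new column is appended at the end.
    """
    if not next_to_original:
        return columns + [new_column]
    position = columns.index(original) + 1
    return columns[:position] + [new_column] + columns[position:]


columns = ["id", "price", "city"]
print(place_new_column(columns, "price", "price_rounded"))
print(place_new_column(columns, "price", "price_rounded", next_to_original=True))
```
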
New step options give you more control of your Data Refinery flow
Data Refinery introduces new options for the steps that give you greater flexibility and control of the Data Refinery flow:
  • Duplicate
  • Insert step before
  • Insert step after

You can access these options from the Steps pane. For information about the actions you can do with steps, see Managing Data Refinery flows.

New Spark 3.2 environment for running Data Refinery flow jobs
You can now select Default Spark 3.2 & R 3.6 when you select an environment for a Data Refinery flow job. The environment includes enhancements from Spark.

The Default Spark 3.0 & R 3.6 environment is deprecated.

For details, see Data Refinery environments.

Version 4.5.0 of the Data Refinery service includes various fixes.

For details, see What's new and changed in Data Refinery.

Related documentation:
Data Refinery
DataStage 4.5.0

The 4.5.0 release of DataStage includes the following features and updates:

Connect to more data sources in DataStage
You can now include data from these data sources in your DataStage flows:
  • IBM Match 360
  • SAP Bulk Extract
  • SAP Delta Extract

For the full list of connectors, see DataStage connectors.

Preview and add metadata faster for Apache Kafka and Netezza® Performance Server (optimized) connectors
After you create the connection, you can drag the Asset browser to the DataStage canvas, select a connection and drill down to add or preview the data for these connectors:
  • Apache Kafka (Preview is available only if the connection has a schema registry configured.)
  • Netezza Performance Server (optimized)
Use new stages in DataStage
You can now use the following stages in your DataStage flows:
Two-source Match
Compare two sources of input data (reference records and data records) for matches.
Wrapper
Define a Wrapped stage to specify a UNIX command that is run by another DataStage stage.

For the full list of stages, see DataStage stages.

Orchestrate flows with Watson Studio Pipelines
Beta feature: You can now create a pipeline to run a sequence of DataStage flows. You can add conditions, loops, expressions, and scripts to a pipeline. For details, see Orchestrating flows.

This component is offered as a beta feature and must be installed separately. For details, see Watson Studio Pipelines.

Version 4.5.0 of the DataStage service includes various fixes.

For details, see What's new and changed in DataStage.

Related documentation:
DataStage
Data Virtualization 1.8.0
Version 1.8.0 of the Data Virtualization service includes the following features and updates.
Upgrade to Cloud Pak for Data version 4.5 (Data Virtualization 1.8.0)
You can upgrade Data Virtualization from the following Cloud Pak for Data versions to Cloud Pak for Data version 4.5.
Back up and restore Data Virtualization
You can use the Cloud Pak for Data backup and restore utilities to take frequent online backups of Data Virtualization without sacrificing productivity. Or you can put Cloud Pak for Data in quiesce mode to consistently back up Data Virtualization while your cluster is offline.

For more information, see Backing up and restoring Cloud Pak for Data.

Quickly find and virtualize tables with the Explore tab
You can now quickly find the tables that you want to virtualize. On the Virtualize page, you can use the Explore tab to browse through databases, schemas, and available tables in a connected data source. The List tab displays all of the available tables in all of your connected data sources. On the Data sources page, you can filter your data sources to quickly load the reduced list of available tables in the List tab.

For more information, see Creating virtual objects in Data Virtualization.

Improve statistics collection for virtualized tables by using data sampling

Data sampling improves statistics collection by reducing the resources that you need to collect statistics. When you collect statistics by selecting the Remote query collection method in the web client, a default sampling rate of 20% is used. To optimize statistics collection, select Enable table sampling and choose a sampling rate between 1% and 99%.

If you collect statistics by using the DVSYS.COLLECT_STATISTICS procedure, you can use the TABLESAMPLE option with the remote-query statistics collection type to sample data when you collect statistics. For tips, see Usage notes.

You can also use the DVSYS.COLLECT_STATISTICS procedure to collect statistics for virtualized tables over flat files. For more information, see the COLLECT_STATISTICS stored procedure in Data Virtualization.
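
Why sampling reduces the cost of statistics collection can be shown with a generic Python sketch. It does not call or reproduce the DVSYS.COLLECT_STATISTICS procedure; the function names are hypothetical, and row-level Bernoulli sampling is assumed purely for illustration.

```python
import random


def sample_rows(rows, rate_percent, rng):
    """Bernoulli sampling: keep each row independently with the given probability."""
    if not 1 <= rate_percent <= 99:
        raise ValueError("sampling rate must be between 1 and 99 percent")
    probability = rate_percent / 100
    return [row for row in rows if rng.random() < probability]


def estimate_statistics(sample, rate_percent):
    """Derive table-level figures from the sample alone."""
    return {
        # Scale the sample size back up to estimate the full row count.
        "estimated_row_count": round(len(sample) * 100 / rate_percent),
        # Observed bounds; the true minimum and maximum can lie outside them.
        "min": min(sample),
        "max": max(sample),
    }


rng = random.Random(42)  # fixed seed only so that the sketch is repeatable
table = list(range(100_000))  # stand-in for one numeric column
sample = sample_rows(table, 20, rng)
stats = estimate_statistics(sample, 20)
print(len(sample), stats)
```

At a 20% rate only about one row in five is read, and the row-count estimate stays close to the true value. Lower rates read less data but give coarser estimates.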

Virtualize files with column headers in cloud object storage
You can now virtualize flat files in cloud object storage that contain column headers.

For more information, see Creating a virtualized table from files in cloud object storage in Data Virtualization.

Manage access for multiple groups if you are an Admin
As a Data Virtualization Admin, you can now grant and revoke access for multiple users, groups, and roles at the same time.

For more information, see Managing access to virtual objects in Data Virtualization.

Filter rows in virtualized data based on data protection rules in Watson Knowledge Catalog
Data Virtualization supports masking columns in virtualized data based on data protection rules that are defined in Watson Knowledge Catalog. Now, you can create data protection rules to include or exclude rows in your virtualized data to avoid exposing sensitive data.

For more information, see Governing virtual data with data protection rules in Data Virtualization and Designing data protection rules.

Improve query performance and enforcement of data protection rules
Data Virtualization now stores and caches data protection rules from Watson Knowledge Catalog in a policy enforcement point cache to avoid evaluating rules every time an object is queried. This cache improves the performance of previously executed queries by reducing the number of calls to Watson Knowledge Catalog to fetch the rules. However, you might notice a delay of up to 10 minutes before newly added or updated data protection rules are applied to queries.

For more information, see Enabling enforcement of data protection rules in Data Virtualization.
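
The trade-off described above, fewer calls in exchange for a bounded delay, is typical of a time-based cache. The following Python sketch is a minimal illustration and not the Data Virtualization implementation: the class, the fetch function, and the assumption that entries simply expire after 600 seconds are all hypothetical.

```python
import time


class RuleCache:
    """Cache rules per object and refetch them after a time-to-live (TTL)."""

    def __init__(self, fetch_rules, ttl_seconds=600, clock=time.monotonic):
        self._fetch_rules = fetch_rules  # hypothetical call to the rule service
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # object name -> (expiry time, rules)

    def rules_for(self, object_name):
        now = self._clock()
        entry = self._entries.get(object_name)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no call to the rule service
        rules = self._fetch_rules(object_name)
        self._entries[object_name] = (now + self._ttl, rules)
        return rules


calls = []


def fetch_from_catalog(object_name):
    """Stand-in for fetching rules; records each call so they can be counted."""
    calls.append(object_name)
    return [f"mask-columns-v{len(calls)}"]


fake_now = [0.0]  # controllable clock so that the sketch needs no real waiting
cache = RuleCache(fetch_from_catalog, ttl_seconds=600, clock=lambda: fake_now[0])

cache.rules_for("SALES.ORDERS")  # first query: rules are fetched
fake_now[0] = 300
cache.rules_for("SALES.ORDERS")  # within the TTL: served from the cache
fake_now[0] = 601
cache.rules_for("SALES.ORDERS")  # TTL elapsed: rules are fetched again
print(len(calls))  # 2 fetches for 3 queries
```

In this model, a rule that changes just after it was cached is not seen until the entry expires, which corresponds to the delay of up to 10 minutes.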

Manage metadata for Data Virtualization assets with metadata enrichment
Metadata enrichment helps you find data faster, trust your data, and protect your data. Metadata includes terms that define the meaning of the data, rules that document ownership, and quality standards.

For more information, see Managing metadata enrichment.

Support for predicate pushdown on more data sources
Predicate pushdown is an optimization that reduces query times and memory usage. The following data sources now support pushdown of predicates: MySQL (MySQL Community Edition and MySQL Enterprise Edition), Cloudera Impala, and Data Virtualization Manager for z/OS.

The following enhanced pushdown capabilities have also been implemented on more SQL patterns to improve query performance.

  • SQL statements with LIKE predicates are now pushed down for: Db2, SAP HANA, Oracle, PostgreSQL, Apache Hive, MySQL, Microsoft SQL Server, Snowflake, Netezza Performance Server, and Teradata.
  • SQL statements with Fetch clauses are now pushed down for: Db2, Db2 for z/OS, Apache Derby, Oracle, Amazon Redshift, Google BigQuery, and Salesforce.com data sources.
  • SQL statements with a string comparison filter are now pushed down for: Db2, Microsoft SQL Server, Teradata, Netezza Performance Server, and Apache Derby data sources.
  • SQL statements with OLAP functions are now pushed down for: Db2 and Netezza Performance Server data sources.
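
The benefit of pushing a LIKE predicate down to the data source can be sketched with Python's built-in sqlite3 module. The in-memory database stands in for a remote source, and the table, data, and function names are hypothetical; the sketch shows the general idea, not how Data Virtualization plans queries.

```python
import sqlite3

# In-memory database that plays the role of the remote data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Alina"), (4, "Carlos")],
)


def query_without_pushdown(conn, prefix):
    """Fetch every row from the source, then filter locally."""
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    matches = [row for row in rows if row[1].startswith(prefix)]
    return matches, len(rows)  # result rows, rows transferred


def query_with_pushdown(conn, prefix):
    """Send the LIKE predicate to the source so that it filters the rows."""
    rows = conn.execute(
        "SELECT id, name FROM customers WHERE name LIKE ? ORDER BY id",
        (prefix + "%",),
    ).fetchall()
    return rows, len(rows)  # result rows, rows transferred


local_result, local_transferred = query_without_pushdown(source, "Al")
pushed_result, pushed_transferred = query_with_pushdown(source, "Al")
print(local_result == pushed_result)          # same answer either way
print(local_transferred, pushed_transferred)  # 4 rows moved versus 2
```

Both approaches return the same rows for this data, but with pushdown only the matching rows leave the source, which is where the savings in query time and memory come from.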

Version 1.8.0 of the Data Virtualization service includes various fixes.

For details, see What's new and changed in Data Virtualization.

Related documentation:
Data Virtualization
Db2 4.5.0

The 4.5.0 release of Db2 includes the following features and updates:

Support for external vaults to store secrets
You can use vaults and secrets to securely store and access credentials to connect to your Db2 data sources. For details, see Managing secrets and vaults.

Version 4.5.0 of the Db2 service includes various fixes.

Related documentation:
Db2
Db2 Big SQL 7.3.0

The 7.3.0 release of Db2 Big SQL includes the following features and updates:

Refresh credentials while the Db2 Big SQL instance is running
You can now refresh object store connection credentials (access and secret HMAC keys) without having to restart the Db2 Big SQL instance. For more information, see Configuring, monitoring, and managing access to Db2 Big SQL instances.

Version 7.3.0 of the Db2 Big SQL service includes various fixes.

Related documentation:
Db2 Big SQL
Db2 Data Gate 2.5.0
Version 2.5.0 of the Db2 Data Gate service includes the following features and updates:
Metadata for Db2 Data Gate tables can now be published to Watson Knowledge Catalog
You can now make metadata about tables and their sources available on the Cloud Pak for Data platform. This metadata is now easier for team members to find, access, and evaluate for analyses.

To learn more, see Publishing table metadata to Watson Knowledge Catalog.

Query acceleration feature
Now you can route your analytical Db2 for z/OS queries to a Db2 Warehouse target database on Cloud Pak for Data and at the same time accelerate these queries. Routing the queries via Db2 Data Gate to Db2 Warehouse shifts the query workload and saves z/OS processing resources. And in most cases, the acceleration feature returns the query results much faster than Db2 for z/OS.

To learn more, see Query acceleration.

Unified stored procedures for Db2 Data Gate and IBM Db2 Analytics Accelerator for z/OS
Both products now share and use the same set of stored procedures. Now you can maintain the environments where the two products coexist more easily.

To learn more, see Db2 Data Gate stored procedures.

Version 2.5.0 of the Db2 Data Gate service includes various fixes.

For details, see What's new and changed in Db2 Data Gate.

Related documentation:
Db2 Data Gate
Db2 Data Management Console 4.5.0

The 4.5.0 release of Db2 Data Management Console includes the following features and updates:

Share jobs with other users and groups
Users who own a job with the job owner privilege can share the job with other users or groups that have either the job owner or job viewer privilege based on the credential type that is defined in the job.
  • If you have job owner privilege, you can copy, edit, delete, update, control access, run, and view the history of the job.
  • If you have the job viewer privilege, you can copy, run, and view the history of the job.
Data pruning
Db2 Data Management Console now supports data pruning. You can now retrieve or modify the pruning rule settings for the job and report history data in the respective pages. In the data pruning setting page, you can select one of the following pruning rules for data pruning:
Prune by status
Prune based on success and failed execution status of a job or report.
Prune by schedule
Prune based on schedule and on-demand types of a job or report.
Prune by the number of records
Prune based on the number of records and the runs of each job.
Disable pruning
Disable data pruning.
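
As a rough illustration of two of these rules, the following Python sketch prunes a small in-memory job history. The record layout, the function names, and the exact semantics (for example, keeping the newest runs of each job) are assumptions made for the example and are not taken from Db2 Data Management Console.

```python
from collections import defaultdict


def prune_by_record_count(history, keep_per_job):
    """Keep only the newest keep_per_job runs of each job."""
    runs_by_job = defaultdict(list)
    for run in history:
        runs_by_job[run["job"]].append(run)
    kept = []
    for runs in runs_by_job.values():
        runs.sort(key=lambda run: run["finished"], reverse=True)
        kept.extend(runs[:keep_per_job])
    return sorted(kept, key=lambda run: (run["job"], run["finished"]))


def prune_by_status(history, statuses_to_prune):
    """Drop every run whose execution status is in statuses_to_prune."""
    return [run for run in history if run["status"] not in statuses_to_prune]


history = [
    {"job": "backup", "finished": 1, "status": "success"},
    {"job": "backup", "finished": 2, "status": "failed"},
    {"job": "backup", "finished": 3, "status": "success"},
    {"job": "report", "finished": 1, "status": "success"},
]

print(prune_by_record_count(history, keep_per_job=2))
print(prune_by_status(history, {"failed"}))
```
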
Query tuning support
Db2 Data Management Console now supports query tuning for Db2 and Db2 Warehouse databases.
Query tuning provides the following advisors:
Statistics advisor
Provides recommendations to use the RUNSTATS command to collect statistics.
Index advisor
Provides recommendations to create valuable indexes for improving the query performance.

With the query tuning feature, you can create tuning tasks for a single query or a workload. When the tuning task is completed, you can review and apply the recommendations from the advisor to improve the query performance. You can also view the results to check the access path graph of the query and analyze the query performance data.

For more information, see Tuning.

Manage table spaces and buffer pools
You can now create and manage table spaces and buffer pools to organize storage data and improve service performance.
SQL editor enhancements
Db2 Data Management Console now supports a beta version of the SQL editor that includes a database object tree view. When you write a query, the SQL editor helps to quickly find a target table and other table-like objects. It can also generate DDL or get column details to help you complete the query. You can switch to the classic version of SQL editor as needed.
Enhancements to performance reports
Database performance reports now include the following key performance indicators (KPIs):
  • Transaction commits per minute
  • Transaction rollback per minute
  • Rows read per minute
  • Rows returned per minute
  • Rows modified per minute
  • Rows read per fetched row (rows read / rows returned)
  • Logical reads per minute
  • Direct reads per minute
  • Direct writes per minute
  • Lock wait time
  • Lock timeouts
  • Deadlocks
  • Lock escalations
  • Other wait time breakdown
  • I/O time breakdown
  • Other processing time breakdown
  • Sort information
    • Sorts per minute
    • Sorts per transaction
    • Sort time (milliseconds)
  • Log information
    • Buffer wait time (milliseconds)
    • Disk wait time (milliseconds)
  • Buffer pool information
  • Table space information
  • SQL execution-time breakdown
  • Operating system time breakdown

Version 4.5.0 of the Db2 Data Management Console service includes various fixes.

For details, see What's new and changed in Db2 Data Management Console.

Related documentation:
Db2 Data Management Console
Db2 Warehouse 4.5.0

The 4.5.0 release of Db2 Warehouse includes the following features and updates:

Support for external vaults to store secrets
You can use vaults and secrets to securely store and access credentials to connect to your Db2 Warehouse data sources. For details, see Managing secrets and vaults.

Version 4.5.0 of the Db2 Warehouse service includes various fixes.

Related documentation:
Db2 Warehouse
Decision Optimization 5.0.0

The 5.0.0 release of Decision Optimization includes the following features and updates:

Use improved runtimes
CPLEX 22.1 with its new Decision Optimization runtime do_22.1 is now available. CPLEX 20.1 remains the default runtime.

CPLEX 12.9 and 12.10 are now removed and their equivalent do_12.9 and do_12.10 runtimes are no longer supported.

Important: Before you upgrade to Decision Optimization 5.0.0, ensure that all your existing deployments use do_20.1 by using one of the following options:

Any outdated deployments will be removed during the upgrade.

Easily configure the environment for your Decision Optimization experiment
When you build models in an experiment in the Build model view, the Run parameters pane now contains an Environment tab. From the Environment tab, you can see the default run environment that is used for the solve.

You can also create environments by using the Environment tab in the Information pane in the Overview.

For details, see Configuring environments.

Run more than one scenario at a time
You can now run and delete multiple scenarios in a Decision Optimization experiment from the Overview. For details, see Decision Optimization views and scenarios.
Create custom constraints in the Decision Optimization Modeling Assistant
You can customize constraint suggestions in the Modeling Assistant if you want to express constraints beyond the predefined constraints for the given problem domains. You can also use more advanced custom constraints that use Python DOcplex. For an example of how to create custom constraints, see Advanced custom constraints.

Version 5.0.0 of the Decision Optimization service includes various fixes.

For details, see What's new and changed in Decision Optimization.

Related documentation:
Decision Optimization
EDB Postgres 13.7 and 12.11

The 13.7 and 12.11 releases of EDB Postgres include the following features and updates:

Continuous backup for EDB Postgres instances
EDB Postgres instances now support continuous backup to S3 buckets. An administrator can configure the service to enable continuous backups. For details, see Setting up continuous backup and restore for EDB Postgres.

Versions 13.7 and 12.11 of the EDB Postgres service include various fixes.

Related documentation:
EDB Postgres
Execution Engine for Apache Hadoop 4.5.0

Version 4.5.0 of the Execution Engine for Apache Hadoop service includes various fixes.

For details, see What's new and changed in Execution Engine for Apache Hadoop.

Related documentation:
Execution Engine for Apache Hadoop
IBM Match 360 1.2.86

The 1.2.86 release of IBM Match 360 includes the following features and updates:

IBM Match 360 is now available as a connected data source or target
Connect your data assets to other Cloud Pak for Data tools to transform, refine, or analyze the data before you bring it into IBM Match 360. Use the new Match 360 connection to share data between IBM Match 360 and the following workspaces and tools:
  • Platform assets catalog
  • Other catalogs (Watson Knowledge Catalog)
  • Data Refinery
  • DataStage

For example, you can now connect master data from IBM Match 360 to DataStage, where you can transform the data and then bring it back into IBM Match 360.

For details, see IBM Match 360 connection.

Improve your matching algorithm by reviewing record pairs
Review pairs of records to train the IBM Match 360 matching algorithm to decide which records get matched into master data entities. During a pair review, a data steward compares records to determine whether they are a match.

When the pair review is complete, IBM Match 360 analyzes the responses and recommends adjustments to your matching algorithm's weights and matching thresholds. The more pairs you review, the better the tuning recommendations will be. A data engineer can then decide whether to apply the recommendations.

For details, see Customizing and strengthening your matching algorithm.

Define and work with relationships between your master data records
Find new connections within your master data by adding relationship information to IBM Match 360. Now you can add relationship types to your data model, and then either bulk load relationship data assets or manually define relationships between records. Explore the relationships between your records to gain new insight about your data.

For details, see Exploring relationship data.

Save and load snapshots of your master data configuration
Use configuration snapshots to create point-in-time versions of your master data configuration settings, including your data model and matching settings. Load a snapshot to return your master data configuration to a previous version, or share snapshots across service instances to ensure consistency.

For details, see Saving and loading master data configuration snapshots.

Version 1.2.86 of the IBM Match 360 service includes various fixes.

For details, see What's new and changed in IBM Match 360.

Related documentation:
IBM Match 360 with Watson
Informix 4.5.0

The 4.5.0 release of Informix includes the following features and updates:

Platform connections for Informix
If the common core services are installed, a platform connection is created automatically when you deploy an instance of Informix. You can use this connection to integrate with other data services.
Support for external vaults to store secrets
You can use vaults and secrets to securely store and access credentials to connect to your Informix data sources. For details, see Managing secrets and vaults.
Gather diagnostic and health data for service instances more easily
You can optionally include the output of the Informix must-gather tool (ifxcollect) in your Cloud Pak for Data diagnostic jobs. For details, see Gathering diagnostic information.
License Service integration
When you use the License Service to generate an audit snapshot of your use, the report includes information about your Informix use.

For more information, see Retrieving an audit snapshot in the License Service APIs.

Version 4.5.0 of the Informix service includes various fixes.

For details, see Fix list for Informix Server 14.10.xC8 release.

Related documentation:
Informix
MongoDB 4.2.6 and 4.4.0
The MongoDB service is available on Cloud Pak for Data Version 4.5. You can use the service to install the following versions of MongoDB:
  • 4.2.6
  • 4.4.0

This release of MongoDB includes various fixes.

Related documentation:
MongoDB
OpenPages 8.300.0

The 8.300.0 release of OpenPages includes the following features and updates:

Integration with AI Factsheets
OpenPages and AI Factsheets can now exchange model information to aid in the model lifecycle. Model risk teams and model validators can use the information from AI Factsheets to easily document key AI technology characteristics so they can facilitate the risk assessment and model validation processes.
New features from OpenPages with Watson 8.3
The OpenPages service includes enhancements that were introduced in OpenPages with Watson 8.300.0. You can read about these enhancements in the OpenPages with Watson new features in the OpenPages documentation.
Integration with Cognos Analytics Version 11.2.2
OpenPages integrates with the Cognos Analytics service. Version 22.0.0 of the Cognos Analytics service bundles Cognos Analytics Version 11.2.2. For more information about OpenPages integration with Cognos Analytics, see IBM Cognos Analytics 11 integration.

Version 8.300.0 of the OpenPages service includes various fixes.

Related documentation:
OpenPages
Planning Analytics 4.5.0

The 4.5.0 release of Planning Analytics includes the following features and updates:

Optionally create service instances in tethered projects
You can create one Planning Analytics service instance in the project where the Cloud Pak for Data control plane is installed.

If you want to create multiple service instances, or if you want to isolate the service instance from other workloads that are associated with Cloud Pak for Data, you can create instances of Planning Analytics in tethered projects. (You can deploy one instance in each tethered project.)

For details, see:

Version 4.5.0 of the Planning Analytics service includes various fixes.

For details, see What's new and changed in Planning Analytics.

Related documentation:
Planning Analytics
Product Master 2.0.0

The 2.0.0 release of Product Master includes the following features and updates:

Easier method for creating the service instance
You can now create the Product Master service instance from the Product Master tile in the Cloud Pak for Data Services catalog. For more information, see Provisioning an instance of Product Master.
Backup and restore
The Product Master service now supports restoring a backed-up instance to a different cluster. For details, see Backing up and restoring Cloud Pak for Data.

Version 2.0.0 of the Product Master service includes various fixes.

Related documentation:
Product Master
RStudio Server with R 3.6 4.5.0

The 4.5.0 release of RStudio Server with R 3.6 includes the following features and updates:

Load and access data from files and connections
Although you can't use the Insert to code function directly in RStudio to load and access data from files or connections, you can now generate code in a sample R notebook and then copy this code to use in your scripts in RStudio. You can run this generated code in projects with and without Git integration. See the section Loading and accessing data for the type of project you are working in under Analyzing data with RStudio.
Support for Spark 3.2
You can use Spark 3.2 in your R scripts and Shiny apps by accessing Spark 3.2 kernels programmatically.

Version 4.5.0 of the RStudio Server with R 3.6 service includes various fixes.

For details, see What's new and changed in RStudio Server with R 3.6.

Related documentation:
RStudio Server with R 3.6
SPSS Modeler 4.5.0

The 4.5.0 release of SPSS Modeler includes the following features and updates:

Updated Text Analytics Workbench
SPSS Modeler provides specialized nodes for handling text. From a Text Mining node, you can open the Text Analytics Workbench (formerly known as the Interactive Workbench). The workbench has a new design that provides more features and better usability. The documentation includes a new video and updated tutorial. For details, see Text Analytics.
See which nodes will have SQL pushback
Previously, you had to run a flow to see which nodes would push back to the database. Now, you can click the SQL preview button to see which nodes will push back to the database.

This enables you to modify the flow before you run it to improve performance by moving the non-pushback operations as far downstream as possible.

Compare outputs simultaneously
You can now quickly compare outputs such as charts, tables, and model metrics to speed up the time to insight during data analysis or while evaluating challenger models.
XGBoost Tree model viewer
The XGBoost Tree model viewer displays evaluation metrics, model information, feature importance, and a confusion matrix so data scientists can easily understand their model after building an XGBoost Tree model or an Auto Modeling node.
Scripting enhancements
The updated scripting panel makes it easier to write scripts that automate processes in the user interface, such as imposing a specific order for running nodes in a flow. For details, see Scripting overview.
Generate new nodes from table output
When you view table output, you can now select one or more fields, click Generate, then select a node to add to your flow.

Version 4.5.0 of the SPSS Modeler service includes various fixes.

For details, see What's new and changed in SPSS Modeler.

Related documentation:
SPSS Modeler
Watson Assistant 4.5.0

The 4.5.0 release of Watson Assistant includes the following features and updates:

Language support improvements for Japanese and Korean
Entity recognition and intent classification for Japanese and Korean languages were updated to improve the reliability of Watson Assistant. You might notice minor differences in how Watson Assistant handles entity recognition and intent classification. The most noticeable changes are in dictionary-based or pattern-based entity matching.

It is recommended that you test your dialog skill with your current test framework to determine whether your workspace is impacted before you update your production workspace. If entity values or synonyms that previously matched no longer match, you can update the entity and add a synonym with white space between the tokens, for example:

  • Japanese: Add “見 た” as a synonym for “見た”
  • Korean: Add “잘 자 요” as a synonym for “잘자요”
Assistant preview link can be disabled
The Assistant preview now includes a toggle to disable the preview link. This toggle allows you to stop access to the preview link if necessary.

Version 4.5.0 of the Watson Assistant service includes various security fixes.

Related documentation:
Watson Assistant
Watson Discovery 4.5.0
Version 4.5.0 of the Watson Discovery service includes the following features and updates:
New home page
A new home page is displayed when you start Watson Discovery. The home page gives you quick access to a product overview video and tours. You can collapse the home page welcome banner to see more projects. The home page also includes a Helpful links tab that has quick links to documentation, a community site, and other resources.
The JSON view is improved to show numbers of elements in objects
The updated JSON view numbers the occurrences of elements in each JSON object, which makes it easier to keep track of information and to read totals at a glance. You can also use your keyboard to tab through elements in the view.
Change to the default deployment type
The default deployment type is now Production instead of Starter (formerly Development).

Version 4.5.0 of the Watson Discovery service includes various fixes.

Related documentation:
Watson Discovery
Watson Knowledge Catalog 4.5.0

The 4.5.0 release of Watson Knowledge Catalog includes the following features and updates:

Install only what you need
The core installation includes basic functionality. To keep your footprint as small as possible, select only the components you want to use.

For details on installing the components, see Installing Watson Knowledge Catalog.

By default, the legacy features for discovery and data quality, such as legacy metadata import, automated discovery, and data quality projects, are installed with the core version of the service. You can choose to install Watson Knowledge Catalog without these features.

You can optionally add the following components to Watson Knowledge Catalog:

Data quality in analytics projects (new feature)
To modernize data quality assessment, data quality rules and underlying definitions are now available as assets in analytics projects:
  • Create data quality rules based on templates. Then, run those rules on your data to evaluate its quality.
  • Run manual checks on demand or automate your quality checks to monitor data quality changes over time.
  • Identify records in your data that do not meet the defined quality criteria and require remediation.

For details, see Managing data quality.

Knowledge graph (new features)
Enable lineage and semantic search features.
Lineage
View business data lineage in the new Lineage interface in catalogs. Quickly see where your assets come from, how they have been transformed, and where they are consumed.
View the flow of data and the activities involved.
Semantic search
You can now get better search results that are based on your intent. Assets that are semantically similar to your search phrase are returned. When your search phrase contains the name or abbreviation of a business term, the search follows business term relationships to return related terms and their associated assets.
Advanced metadata import and MANTA Automated Data Lineage for IBM Cloud Pak for Data (new feature)
Advanced metadata import can be installed with your IBM Cloud Pak for Data base license. With this option, you can import assets and their lineage from data sources such as Microsoft Power BI, Tableau, and Snowflake. For details, see Metadata import.

In addition, you can purchase a separate license for the MANTA Automated Data Lineage for IBM Cloud Pak for Data service. The license entitles you to import lineage information and gives you access to additional lineage details, such as technical data lineage, historical data lineage, and indirect data lineage in MANTA Automated Data Lineage for IBM Cloud Pak for Data.

For details, see the MANTA Automated Data Lineage for IBM Cloud Pak for Data documentation.

AI Factsheets for tracking models in a model inventory (new feature)
Use a model inventory to create model entries that track a model from the request stage to deployment and evaluation. With a model inventory, you can:
  • View details about models and deployments in AI Factsheets.
  • Keep model development and deployment transparent and in compliance with your organizational policies.
  • Integrate with OpenPages for seamless model governance.

For details, see Tracking models in a model inventory.

New data connections are available
You can now create connections for the following data sources:
Boost your productivity with metadata enrichment
As a data steward, you can now more efficiently provide business context to data and ensure the quality, usefulness, and appropriate protection of the data. Metadata enrichment is now available in analytics projects and works with standard connection types supported by metadata import in Watson Knowledge Catalog.

With metadata enrichment, you can:

  • Run profiling with automatic data class assignment.
  • Have business terms suggested or automatically assigned based on data class, name matching, and machine learning (one model per project) and adjust those assignments manually.
  • Evaluate data quality based on a set of predefined quality checks.
  • Analyze and enrich only changed data in subsequent enrichment runs.
  • Schedule and automate enrichment tasks.
  • Publish results to a catalog.

For details, see Metadata enrichment.

Metadata enrichment in conjunction with metadata import in projects replaces the data discovery quick scan. The new workflow for discovering, enriching, and publishing large numbers of data assets is now more aligned with the individual user roles involved in the process. For details, see:

You can use the IBM Watson Data API to integrate metadata enrichment and metadata import with external tools and workflows.

The View information assets and the Manage information assets permissions are not included in predefined roles
The View information assets and Manage information assets permissions provide access to the Information assets page. In new installations, these permissions are no longer included in the permission set of any of the predefined user roles. However, you can still assign them manually.

If you are upgrading to Cloud Pak for Data Version 4.5, the existing permission sets remain unchanged.

For details, see Predefined permissions and roles in Cloud Pak for Data.

Access the Knowledge Accelerators curated glossaries
You can now add Knowledge Accelerators to your governance framework to accelerate your data discovery and build up your business vocabulary.

Knowledge Accelerators help organize data with industry-specific business vocabularies, which provide business context and definitions that help describe your data assets within Watson Knowledge Catalog.

The Knowledge Accelerator for Cross Industry Personal Data Business Scope in Watson Knowledge Catalog.
Each Knowledge Accelerator is an extensive business vocabulary, which covers the full breadth of the following industries:
  • Energy & Utilities
  • Financial Services
  • Healthcare
  • Insurance
The Knowledge Accelerators also include Business Scopes, which are an extracted and focused set of terms to address specific business topics. You can import multiple Business Scopes to gradually build up your vocabulary. For details, see Knowledge Accelerators.
Add relationships between assets more easily
When you add a relationship between assets in a catalog, you can now easily find the target asset:
  • You can filter by the workspace (catalog, project, or deployment space) or by the asset type.
  • You can search for assets by name.

After you add a relationship, it appears in the Related assets section on the asset page.

Creating an asset relationship by searching for an asset using workspaces and asset types
Take control of reporting synchronization
You can pause the synchronization of Watson Knowledge Catalog data into the reporting data mart when interruptions occur instead of stopping the synchronization completely.
New API capabilities and behaviors
The IBM Watson Data assets API for assigning roles includes the following improvements:
  • You can assign user groups as asset members in bulk.
  • You can specify asset editor and asset viewer roles when you assign asset members.
  • You can assign multiple asset owners and an asset creator to an asset.
  • When you add an asset to a project or publish or promote an asset, you become the asset creator and the list of asset owners in the source asset is preserved in the target asset.
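
The last point in the list above can be modeled in a few lines. This is a conceptual sketch only: the `AssetRoles` type and the `copy_roles_on_publish` function are hypothetical and are not part of the IBM Watson Data API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AssetRoles:
    """Hypothetical model of the creator and owner roles on an asset."""
    creator: str
    owners: Tuple[str, ...]


def copy_roles_on_publish(source: AssetRoles, acting_user: str) -> AssetRoles:
    """The acting user becomes the creator; the owner list is carried over."""
    return AssetRoles(creator=acting_user, owners=source.owners)


source = AssetRoles(creator="alice", owners=("alice", "bob"))
target = copy_roles_on_publish(source, acting_user="carol")
```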
New home for asset activities
In catalogs and projects, information about asset activities is now available in a side panel. Open an asset in a catalog or a project, and access its activities by clicking the Activities icon. For details, see Activities.
Expanded reporting experience
You can now get reports on data quality rules, data quality problems, custom attributes in governance artifacts, and custom category roles. For details, see Setting up reporting for Watson Knowledge Catalog.
Lock down your data
You can now specify a new type of data access convention for data protection rules. Previously, the data access convention allowed access to data unless prevented by a rule. Now you can choose to deny access to data by default unless a rule allows access. For details, see Managing rule conventions.
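
The difference between the two conventions can be sketched as follows. This is an illustrative model, not the product's rule engine: the `Rule` type, the convention names `allow_by_default` and `deny_by_default`, and the sample rules are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, str]


@dataclass
class Rule:
    """Hypothetical rule: `applies` decides whether the rule matches a request."""
    name: str
    applies: Callable[[Context], bool]


def is_access_allowed(context: Context, rules: List[Rule], convention: str) -> bool:
    matched = any(rule.applies(context) for rule in rules)
    if convention == "allow_by_default":
        # Rules describe when to deny; access is granted unless one matches.
        return not matched
    if convention == "deny_by_default":
        # Rules describe when to allow; access is denied unless one matches.
        return matched
    raise ValueError(f"unknown convention: {convention}")


deny_rules = [Rule("block_interns", lambda c: c["role"] == "intern")]
allow_rules = [Rule("allow_stewards", lambda c: c["role"] == "data steward")]
request = {"role": "analyst"}

allowed_under_old_convention = is_access_allowed(request, deny_rules, "allow_by_default")
allowed_under_new_convention = is_access_allowed(request, allow_rules, "deny_by_default")
```

With the same request, the analyst gets access under the earlier convention because no rule denies it, and is refused under the new convention because no rule allows it.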
Filter rows with data protection rules
You can now specify that the action for a data protection rule filters rows from the affected data asset. You can include or exclude rows based on values in a specified column in the same asset or in a reference asset. For details, see Filtering rows.
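
A minimal sketch of the include and exclude behavior, assuming that the values to match come from a column in a reference asset. The `filter_rows` function, the column names, and the sample data are hypothetical, and real data protection rules are defined in the product rather than in code like this.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def filter_rows(rows: List[Row], column: str, values: Set[object], mode: str = "include") -> List[Row]:
    """Keep rows whose column value is in `values` (include) or is not (exclude)."""
    if mode not in ("include", "exclude"):
        raise ValueError(f"unknown mode: {mode}")
    keep_matching = mode == "include"
    return [row for row in rows if (row.get(column) in values) == keep_matching]


rows = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": "FR"},
]

# Hypothetical reference asset that lists the permitted country codes.
reference_rows = [{"country_code": "DE"}, {"country_code": "FR"}]
permitted = {r["country_code"] for r in reference_rows}

included = filter_rows(rows, "country", permitted, mode="include")
excluded = filter_rows(rows, "country", permitted, mode="exclude")
```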
Glossary improvements
Performance improvements:
  • The Glossary UI is now faster to use.
  • Exporting glossary artifacts now takes seconds instead of minutes.
  • Importing glossary artifacts is up to 6 times faster for large files and up to 2 times faster for small files.
  • Synchronizing to Global/Semantic Search after you publish and import zip files is now almost 10 times faster.
  • Sending events to OMRS after you publish and import zip files is now 10 times faster.
  • Long-running operations on a large number of artifacts are now faster and more robust.
New features:
  • A new REST API is available to check the status of your glossary import.
  • Two new reference data sets are now available to support Data location rules: Physical Locations and Sovereign Locations. For more information, see Predefined reference data sets.
  • Egeria 3.5 is now supported by Watson Knowledge Catalog for seamless metadata synchronization.
  • You can now export and import custom relationships in CSV and ZIP format.
  • You can now restart publish and discard operations on the same draft when a previous publish operation is in progress.
  • An activities log is now generated after every import so you can audit changes.
  • You can now use semantic search and business data lineage with all artifact types.

Version 4.5.0 of the Watson Knowledge Catalog service includes various fixes.

For details, see What's new and changed in Watson Knowledge Catalog.

Related documentation:
Watson Knowledge Catalog
Watson Machine Learning 4.5.0

The 4.5.0 release of Watson Machine Learning includes the following features and updates:

Track models in Watson Knowledge Catalog
Now you can track models in a Watson Knowledge Catalog model inventory to monitor progress of a model and related assets throughout the AI lifecycle.
Use AutoAI enhancements for more robust model training
New AutoAI enhancements are available to help you train more reliable, trustworthy machine learning models. Now you can:
  • Train and deploy fully supported time series models to forecast future values in a single-variate or multi-variate time series.
  • Save a time series experiment as a Python notebook and run it with the Watson Machine Learning Python client.
  • Save a generated model pipeline in a time series experiment as a Python notebook, and run it inside or outside of Watson Machine Learning.
  • Evaluate fairness as part of model training so you can detect potential bias for attributes such as gender, age, or race.
  • Fill in missing values in your training data by using data imputation methods.
  • Upload a user-defined holdout dataset to evaluate the generated model pipelines.
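
To make the fairness item in the list above more concrete, the sketch below computes the disparate impact ratio, one widely used fairness metric. It does not use or describe the AutoAI implementation, and the release notes do not say which metric AutoAI applies. The `disparate_impact` function, the group labels, and the sample predictions are illustrative assumptions.

```python
from typing import List, Tuple


def disparate_impact(outcomes: List[Tuple[str, int]], monitored: str, reference: str) -> float:
    """Ratio of favorable-outcome rates: monitored group divided by reference group.

    Each item in `outcomes` is (group, label), where label 1 is the favorable outcome.
    """
    def favorable_rate(group: str) -> float:
        labels = [label for g, label in outcomes if g == group]
        if not labels:
            raise ValueError(f"no records for group: {group}")
        return sum(labels) / len(labels)

    reference_rate = favorable_rate(reference)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(monitored) / reference_rate


# Hypothetical model predictions grouped by a monitored attribute.
predictions = [
    ("female", 1), ("female", 0), ("female", 0), ("female", 1),
    ("male", 1), ("male", 1), ("male", 1), ("male", 0),
]

ratio = disparate_impact(predictions, monitored="female", reference="male")
```

A ratio well below 1 suggests that the monitored group receives favorable outcomes less often than the reference group, which is the kind of potential bias that a fairness evaluation is meant to surface.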
Automate model training and deployment with Watson Studio Pipelines
Tech preview: Use Watson Studio Pipelines to automate an end-to-end flow to prepare data, train and deploy a model, and update deployment details. This component is offered as a technical preview and must be installed separately. For details, see Watson Studio Pipelines.
Import Git assets into a deployment space
You can easily move assets from a Git-based project to a deployment space by creating a Git archive file in your Git provider's user interface. (The archive file is a ZIP file that contains the contents of your repository from a particular branch or tag.) You can then import this ZIP file to an existing deployment space. This process creates a Code Package asset with all the associated code files. For details, see Importing spaces and projects into existing deployment spaces.
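
The text above describes creating the archive in the Git provider's user interface. For reference, a ZIP of a branch or tag can also be produced from a local clone with the standard `git archive` command, which the sketch below wraps in Python. Whether a locally produced archive is accepted by the import is an assumption to verify. The function names, the `main` branch, and the `my-project.zip` file name are placeholders.

```python
import subprocess
from typing import List


def build_git_archive_command(ref: str, output_zip: str) -> List[str]:
    """Build a `git archive` command that writes the tree at `ref` to a ZIP file."""
    return ["git", "archive", "--format=zip", f"--output={output_zip}", ref]


def create_git_archive(repo_dir: str, ref: str, output_zip: str) -> None:
    """Run the command inside an existing local clone (requires git on the PATH)."""
    subprocess.run(build_git_archive_command(ref, output_zip), cwd=repo_dir, check=True)


command = build_git_archive_command("main", "my-project.zip")
```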
Deploy Shiny apps more easily
You can now save a Shiny app or a code package that contains your Shiny app to a deployment space. Then you can deploy it as an app and make the URL available to users. For details, see Deploying a Shiny app.
Use the latest frameworks and software specifications
Support is available for frameworks and software specifications based on Spark 3.2 and Python 3.9. For details, see Supported machine learning frameworks.
Improved Federated Learning party connector script
Federated Learning now supports new enhancements to the party connector script. You no longer need to download the script or update the aggregator ID when you re-run an experiment. All parameters except the data set path are retrieved automatically.
TensorFlow 2.7.2 is now supported
Federated Learning now supports TensorFlow 2.7.2.

TensorFlow 2.7.1 is deprecated, and support will be discontinued in a future release.

Version 4.5.0 of the Watson Machine Learning service includes various fixes.

Related documentation:
Watson Machine Learning
Watson Machine Learning Accelerator 2.4.0
Version 2.4.0 of the Watson Machine Learning Accelerator service includes the following features and updates:
Support for new deep learning libraries
Watson Machine Learning Accelerator now includes support for the following deep learning libraries:
  • TensorFlow 2.7.2
  • PyTorch 1.10.2
  • NVIDIA CUDA Toolkit 11.2.2, which supports NVIDIA Ampere 100 GPU
Support for new NVIDIA GPU Operator version
Watson Machine Learning Accelerator now includes support for the following versions of the NVIDIA GPU Operator:
x86-64
  • NVIDIA GPU Operator 1.7.1 on OpenShift 4.8
  • NVIDIA GPU Operator 1.10 on OpenShift 4.10
Power®
  • Rocket Software GPU Operator 1.10 on OpenShift

    (Watson Machine Learning Accelerator is available only on Power 9 hardware.)

Version 2.4.0 of the Watson Machine Learning Accelerator service includes various fixes.

Related documentation:
Watson Machine Learning Accelerator
Watson OpenScale 4.5.0

The 4.5.0 release of Watson OpenScale includes the following features and updates:

Simplified batch deployment configuration
You can use Watson OpenScale to create your model input data tables when you configure batch model deployments. For details, see Configuring the batch processor in Watson OpenScale.
Support for running jobs against a Kerberized Hive
You can run Watson OpenScale Spark jobs with IBM Analytics Engine Powered by Apache Spark against a Kerberized Hive store.
Support for custom monitors
You can configure custom monitors within Watson OpenScale and set up schedules for the monitors. The Watson OpenScale evaluation report includes metrics for the custom monitors. For details, see Creating custom evaluations and metrics.
Integration with AI Factsheets
You can publish Watson OpenScale metrics to AI Factsheets to help track and validate your entire model development lifecycle.
Support for headless batch subscriptions
To increase security, you can configure model evaluations without providing a model endpoint.
Evaluate text and image models
You can now also evaluate text and image models using the model risk management feature in Watson OpenScale.
IBM Analytics Engine Powered by Apache Spark scheduling improvements
When you submit Watson OpenScale Spark jobs to IBM Analytics Engine Powered by Apache Spark, you can optionally schedule the jobs to run later. For example, you might want to run the jobs when you have more resources available.
Display all evaluations during a specific time period
You can now see all of the monitor evaluations that occur during a time period that you specify.

Version 4.5.0 of the Watson OpenScale service includes various fixes.

For details, see What's new and changed in Watson OpenScale.

Related documentation:
Watson OpenScale
Watson Speech services 4.5.0

The 4.5.0 release of the Watson Speech services includes various features and enhancements, such as language updates.

For a list of new features in Watson Speech services, see:

Version 4.5.0 of the Watson Speech services includes various fixes.

Related documentation:
Watson Speech services
Watson Studio 4.5.0

The 4.5.0 release of Watson Studio includes the following features and updates:

New data connections are available
You can now create connections for the following data sources:
Support for Apache Spark 3.2 in Jupyter Notebooks and JupyterLab
You can start using Apache Spark 3.2 to run your notebooks and scripts. Spark 3.2 is supported with Python 3.9, R 3.6 and Scala 2.12.
Access more data sources by using the Insert to code function in notebooks
You can now access and load data in notebooks by using the Insert to code function for the following data sources:
File types
  • Excel .xlsm files
Connection types
  • Amazon Redshift
  • Amazon S3
  • Apache Derby
  • IBM Cloud Compose for MySQL
  • IBM Data Virtualization Manager for z/OS
  • Looker
  • MinIO

The generated code for all of the new file and connection types uses the Flight service API.

Real-time logs when running notebook and script jobs
When you run notebooks or scripts as a job, the log on the Job details page is now updated for each cell after the cell runs. You don't have to wait until the entire job finishes running before you can see the output. Also, you can see the number of the cell that just ran, which gives you feedback on the progress of code execution.
Screen capture of the real-time log showing the progress of the code execution
Import project into existing deployment space
You can move assets from a project with default Git integration to a deployment space by creating a Git archive file and importing this ZIP file to an existing deployment space. For details, see:
Visualize your data with Dataview visualizations

Now you can use Dataview visualizations to explore data from different perspectives so you can identify patterns, connections, and relationships to quickly understand large amounts of information.

To create and work with visualizations in your project, you select a data asset from the Assets tab and click the Visualization tab. Select a chart type and create and save the visualization. Your saved Dataview visualizations are listed as Visualization assets in your project. Graphical charts are generated based on a sample data set of up to 5000 records.

For details, see Visualizing your data in Data Refinery.

Version 4.5.0 of the Watson Studio service includes various fixes.

For details, see What's new and changed in Watson Studio.

Related documentation:
Watson Studio
Watson Studio Runtimes 5.0.0

The 5.0.0 release of Watson Studio Runtimes includes the following features and updates:

Support for Apache Spark 3.2 in Jupyter Notebooks and JupyterLab
You can start using Apache Spark 3.2 to run your notebooks and scripts. Spark 3.2 is supported with Python 3.9, R 3.6 and Scala 2.12.
Access more data sources by using the Insert to code function in notebooks
You can now access and load data in notebooks by using the Insert to code function for the following data sources:
File types
  • Excel .xlsm files
Connection types
  • Amazon Redshift
  • Amazon S3
  • Apache Derby
  • IBM Cloud Compose for MySQL
  • IBM Data Virtualization Manager for z/OS
  • Looker
  • MinIO

The generated code for all of the new file and connection types uses the Flight service API.

Real-time logs when running notebook and script jobs
When you run notebooks or scripts as a job, the log on the Job details page is now updated for each cell after the cell runs. You don't have to wait until the entire job finishes running before you can see the output. Also, you can see the number of the cell that just ran, which gives you feedback on the progress of code execution.
Screen capture of the real-time log showing the progress of the code execution

Version 5.0.0 of the Watson Studio Runtimes service includes various fixes.

For details, see What's new and changed in Watson Studio Runtimes.

Related documentation:
Jupyter Notebook runtimes for Watson Studio

Installation enhancements

What's new What does it mean for me?
Red Hat OpenShift Container Platform support
You can deploy Cloud Pak for Data Version 4.5 on the following versions of Red Hat OpenShift Container Platform:
  • Version 4.6.29 or later fixes
  • Version 4.8.0 or later fixes
  • Version 4.10.0 or later fixes
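
The version list above can be turned into a quick pre-installation check. The sketch reads "or later fixes" as later fix levels within the same minor release, so 4.7.x and 4.9.x are treated as unsupported; that reading is an assumption. The `is_supported_openshift` function is hypothetical and is not part of cpd-cli.

```python
from typing import Dict, Tuple

# Minimum fix level for each supported minor release, taken from the list above.
MINIMUM_FIX_LEVEL: Dict[Tuple[int, int], int] = {(4, 6): 29, (4, 8): 0, (4, 10): 0}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a version such as '4.10.3' into (major, minor, fix)."""
    major, minor, fix = (int(part) for part in version.split("."))
    return major, minor, fix


def is_supported_openshift(version: str) -> bool:
    """Return True when the version is at or above the minimum fix level of a listed release."""
    major, minor, fix = parse_version(version)
    minimum = MINIMUM_FIX_LEVEL.get((major, minor))
    return minimum is not None and fix >= minimum
```

Comparing the parts as integers matters here: a plain string comparison would sort 4.10 before 4.8.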
Installation and upgrade simplification
The new cpd-cli manage commands simplify the process of installing and upgrading Cloud Pak for Data. For example, with the cpd-cli manage commands, you can:
  • Optionally install or upgrade all of the required components and services at the same time.
  • Create the required catalog source and operator subscriptions with a single command.
  • Validate that you have access to the required images in the IBM Entitled Registry.
  • Validate that the images were mirrored to your private container registry.
  • Adjust the CRI-O and kernel parameter settings on your cluster.
  • Update the global image pull secret.
  • Update the route to the platform.
  • Create the recommended storage classes for Portworx and NFS.

Removals and deprecations

What's changed What does it mean for me?
Support for the Cloud Pak for Data volume backup and restore utility
It is recommended that you use the Cloud Pak for Data OADP backup and restore utility to back up and restore Cloud Pak for Data.
Submitting a request for data
The data requests (Data > Data requests) feature is deprecated and will be removed in a future release. You should consider using workflows instead.

New partner services

IBM Cloud Pak for Data partners with third-party vendors to provide additional services that you can use to extract meaningful insights from your mountains of data. You can find these partner services in the IBM Cloud Pak for Data Community: Partners catalog.

The following partner services were added in September 2022:
Partner service | Category | What does it mean for me?
Cloudera DataFlow | Developer tools | Use universal data distribution tools to connect data from any source to any destination.
IBM Data Privacy Risk Assessment for Cloud Pak for Data | Data governance | Automate privacy risk analysis to mitigate risk and extract maximum value from privacy-protected data.

Previous releases

Looking for information about previous releases? See the following topics in IBM Documentation: