Feature differences between Cloud Pak for Data deployments

Cloud Pak for Data as a Service and Cloud Pak for Data software have some differences in features and implementation. Cloud Pak for Data as a Service is a set of IBM Cloud services. Cloud Pak for Data 5.2 is offered as software that you must install and maintain on IBM Software Hub. Services that are available in both deployment environments can also differ in features between Cloud Pak for Data as a Service and Cloud Pak for Data 5.2, 5.1, and 5.0.

Platform differences
Common core functionality across services
watsonx.ai Studio compared to Watson Studio
watsonx.ai Runtime compared to Watson Machine Learning
watsonx.governance
IBM Knowledge Catalog
DataStage
Data Virtualization

Platform differences

Cloud Pak for Data as a Service and Cloud Pak for Data software share a common code base. However, they differ in the following key ways:

Platform differences
Features	As a service	Software
Software, hardware, and installation	Cloud Pak for Data as a Service is fully managed by IBM on IBM Cloud. Software updates are automatic. Scaling of compute resources and storage is automatic. You sign up at Try Cloud Pak for Data as a Service.	You provide and maintain hardware. You install, maintain, and upgrade the software. See Software requirements.
Storage	You provision an IBM Cloud Object Storage service instance to provide storage. See IBM Cloud Object Storage.	You provide persistent storage on a Red Hat OpenShift cluster. See Storage requirements.
Compute resources for running workloads	Users choose the appropriate runtime for their jobs. Compute usage is billed based on the rate for the runtime environment and the duration of the job. See Monitor account resource usage.	You set up the number of Red Hat OpenShift nodes with the appropriate number of vCPUs. See Hardware requirements and Monitoring the platform.
Cost	You buy each service that you need at the appropriate plan level. Many services bill for compute resource consumption. See each service page in the IBM Cloud catalog or in the services catalog on Cloud Pak for Data as a Service, by selecting Services > Services catalog from the navigation menu.	You buy a software license based on the services that you need. For example, the Cloud Pak for Data Enterprise Edition license includes entitlement to services such as watsonx.ai Studio and IBM Knowledge Catalog. See Licenses and entitlements.
Security, compliance, and isolation	The data security, network security, security standards compliance, and isolation of Cloud Pak for Data as a Service are managed by IBM Cloud. You can set up extra security and encryption options. See Security of Cloud Pak for Data as a Service.	Red Hat OpenShift Container Platform provides basic security features. Cloud Pak for Data is assessed for various privacy and compliance regulations and provides features that you can use to prepare for privacy and compliance assessments. You are responsible for additional security features, encryption, and network isolation. See Security considerations.
Available services	Most data fabric services are available in both deployment environments. See Services for Cloud Pak for Data as a Service.	Includes many services that are not available on Cloud Pak for Data as a Service. See Services.
User management	You add users and user groups and manage their account roles and permissions with IBM Cloud Identity and Access Management. See Add users to the account. You can also set up SAML federation on IBM Cloud. See IBM Cloud docs: How IBM Cloud IAM works.	You can add users and create user groups from the Administration menu. You can use the Identity and Access Management Service or use your existing SAML SSO or LDAP provider for identity and password management. You can create dynamic, attribute-based user groups. See User management.

Common core functionality across services

The following core functionality that is provided with the platform is effectively the same for services on Cloud Pak for Data as a Service and Cloud Pak for Data software, versions 5.2, 5.1, and 5.0:

Intelligent search for assets and artifacts across the platform
The Platform assets catalog for sharing connections across the platform
Role-based user management within collaborative workspaces across the platform
Common infrastructure for assets and workspaces
A services catalog for adding services
View compute usage from the Administration menu

The following table describes differences in core functionality across services between Cloud Pak for Data as a Service and Cloud Pak for Data software versions 5.2, 5.1, and 5.0.

Differences in common features across services
Feature	As a service	Software
Manage all projects	Users with the Manage projects permission from the IAM service access Manager role for the IBM Cloud Pak for Data service can join any project with the Admin role and then manage or delete the project.	Users with the Manage projects permission can join any project with the Admin role and then manage or delete the project.
Connections to remote data sources	Most supported data sources are common to both deployment environments. See Connectors.	See Supported data sources.
Connection credentials that are personal or shared	Connections in projects and catalogs can require personal credentials or allow shared credentials. Shared credentials can be disabled at the account level.	Platform connections can require personal credentials or allow shared credentials. Shared credentials can be disabled at the platform level.
Connection credentials from secrets in a vault	Not available	Available
Kerberos authentication	Not available	Available for some services and connections
Sample assets and projects from the Resource hub app	Available	Not available
Custom JDBC connector	Not available	Available
Data source definitions	Available	Available starting in 5.0. See Data protection with data source definitions.

IBM watsonx.ai Studio compared to Watson Studio

The following watsonx.ai Studio features on Cloud Pak for Data as a Service are effectively the same as the Watson Studio features on Cloud Pak for Data software, versions 5.2, 5.1, and 5.0:

Collaboration in projects and deployment spaces
Accessing project assets programmatically
Project import and export by using a project ZIP file
Jupyter notebooks
Job scheduling
Data Refinery
Watson Natural Language Processing for Python
Generative AI capabilities

This table describes the feature differences between the watsonx.ai Studio service on the as-a-service deployment environment and the Watson Studio service on the software deployment environment, the differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see watsonx.ai Studio offering plans.

Differences in watsonx.ai Studio
Feature	As a service	Software
Create project	Create: • An empty project • A project from a sample in the Resource hub • A project from file	Create: • An empty project • A project from file • A project with Git integration
Git integration	• Publish notebooks on GitHub • Publish notebooks as gist	• Integrate a project with Git • Sync assets to a repository in one project and use those assets in another project
Project terminal for advanced Git operations	Not available	Available in projects with default Git integration
JupyterLab	Not available	Available in projects with Git integration
Visual Studio Code integration	Not available	Available in projects with Git integration
RStudio	Cannot integrate with Git	Can integrate with Git. Requires an RStudio Server Runtimes service.
Python scripts	Not available	Work with Python scripts in JupyterLab. Requires a Watson Studio Runtimes service.
Load data to a notebook by using code (Flight service)	Not available	Available
Manage notebook lifecycle	Not available	Use CPDCTL for notebook lifecycle management
Code package assets (set of dependent files in a folder structure)	Not available	Use CPDCTL to create code package assets in a deployment space
Promote notebooks to spaces	Not available	Available manually from the project's Assets page or programmatically by using CPDCTL
Python with GPU	Support available for a single GPU type only	Support available for multiple NVIDIA GPU types. Requires the Watson Studio Runtimes service.
Create and use custom images	Not available	Create custom images for Python (with and without GPU), R, JupyterLab (with and without GPU), RStudio, and SPSS environments. Requires the Watson Studio Runtimes service and other applicable services.
Anaconda Repository	Not available	Use to create custom environments and custom images
Hadoop integration	Not available	Build and train models, and run Data Refinery flows on a Hadoop cluster. Requires the Execution Engine for Apache Hadoop service.
Decision Optimization	Available	Requires the Decision Optimization service.
SPSS Modeler	Available	Requires the SPSS Modeler service.
Orchestration Pipelines	Available	Requires the Orchestration Pipelines service.

watsonx.ai Runtime compared to Watson Machine Learning

The following watsonx.ai Runtime features on Cloud Pak for Data as a Service are effectively the same as the Watson Machine Learning features on Cloud Pak for Data software, versions 5.2, 5.1, and 5.0:

Collaboration in projects and deployment spaces
Deploy models
Deploy functions
watsonx.ai Runtime REST API and Watson Machine Learning REST API
watsonx.ai Python client
Create online deployments
Scale and update deployments
Define and use custom components
Monitor deployments across spaces
Updated forms for testing online deployments
Use nested pipelines
AutoAI data imputation
AutoAI fairness evaluation
AutoAI time series supporting features

This table describes the differences in features between the watsonx.ai Runtime service on the as-a-service deployment environment and the Watson Machine Learning service on the software deployment environment, the differences between offering plans, and whether additional services are required. For details about functionality differences between offering plans on Cloud Pak for Data as a Service, see watsonx.ai Runtime offering plans.

Feature differences between watsonx.ai Runtime deployments
Feature	As a service	Software
AutoAI training input	Current supported data sources	Supported data sources change by release
AutoAI experiment compute configuration	Different sizes available	Different sizes available
AutoAI limits on data size and number of prediction targets	Set limits	Limits differ by compute configuration
AutoAI incremental learning	Not available	Available
Deploy using popular frameworks and software specifications	Check for latest supported versions	Supported versions differ by release
Connect to databases for batch deployments	Check for support by deployment type	Check for support by deployment type and by version
Deploy and score Python scripts	Available via Python client	Create scripts in JupyterLab or Python client, then deploy
Deploy and batch score R Scripts	Not available	Available
Deploy Shiny apps	Not available	Create and deploy Shiny apps. Deploy from a code package.
Evaluate jobs for fairness or drift	Requires the watsonx.governance service	Requires the Watson OpenScale or watsonx.governance service
Evaluate online deployments in a space for fairness, drift, or explainability	Not available	Available. Requires the Watson OpenScale or watsonx.governance service.
Evaluate deployed prompt templates in a space		Available
Evaluate detached prompt templates in a space	Available
Control space creation	No restrictions by role	Use permissions to control who can view and create spaces
Import from Git project to space	Not available	Available
Code package automatically created when importing from Git project to space	Not available	Available
Update R Shiny app from code package	Not available	Available
Create and use custom images	Not available	Create custom images for Python or SPSS
Notify collaborators about Pipeline events	Not available	Use Send Mail to notify collaborators
Deep Learning Experiments	Not available	Requires the IBM Scheduler service
Provision and manage IBM Cloud service instances	Add instances for watsonx.ai Runtime or Watson OpenScale	Services are provisioned on the cluster by the administrator

watsonx.governance

The following governance features are effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data software, versions 5.2, 5.1, and 5.0:

Evaluate deployments for fairness
Evaluate the quality of deployments
Monitor deployments for drift
View and compare model results in an Insights dashboard
Add deployments from the machine learning provider of your choice
Set alerts to trigger when evaluations fall below a specified threshold
Evaluate deployments in a user interface or notebook
Custom evaluations and metrics
View details about evaluations in model factsheets

This table describes the differences in features between the watsonx.governance service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required.

Feature differences between watsonx.governance deployments
Feature	As a service	Software
Evaluate machine learning models	Yes	Yes
Upload pre-scored test data	Not available	Available
IBM SPSS Collaboration and Deployment Services	Not available	Available
Batch processing	Not available	Available
Support access control by user groups	Not available	Available
Free database and Postgres plans	Available	Postgres available

IBM Knowledge Catalog

The following features are effectively the same for IBM Knowledge Catalog on Cloud Pak for Data as a Service and on Cloud Pak for Data software, versions 5.2, 5.1, and 5.0:

Collaboration in projects and catalogs
Project import and export by using a project ZIP file
AI-powered search in catalogs
Rating and reviewing assets in catalogs
Data Refinery tool in projects
Categories with collaborator roles
Predefined and custom classifications
Predefined and custom data classes
Governance rules
Policies
Data protection rules
Data quality SLA rules
Manual profiling of individual relational data assets in a project or a catalog
Automatic profiling of relational data assets added to a governed catalog
Metadata enrichment tool in projects for running profiling, term assignment, quality analysis, and key or relationship analysis on large sets of data assets
Custom asset types, custom properties for assets, and custom relationships between assets in catalogs
Monitor workflow tasks
Deliver masked data sets in projects with masking flows
Detailed data quality information for data assets in projects and catalogs, and as part of metadata enrichment results
Remediation workflows for data quality issues
Create connected data assets and segmented data assets with SQL queries
Run metrics dashboard and execution windows for metadata enrichment jobs

This table describes the differences in features between the IBM Knowledge Catalog service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see IBM Knowledge Catalog offering plans.

Starting in Cloud Pak for Data version 5.0, you can install the IBM Knowledge Catalog Premium Cartridge or the IBM Knowledge Catalog Standard Cartridge instead of the IBM Knowledge Catalog service. IBM Knowledge Catalog Premium provides the same features as the IBM Knowledge Catalog service plus generative AI features. IBM Knowledge Catalog Standard provides a subset of IBM Knowledge Catalog features plus generative AI features.

Differences in IBM Knowledge Catalog
Feature	As a service	Software
Organize assets in projects with folders	Available (beta)	Available
Metadata import tool in projects - discovery	Import data assets into projects or catalogs. Support for a subset of project and catalog connections. See Supported data sources for curation and data quality.	Import different types of assets: • Import data assets into projects or catalogs. Most supported connections are the same in both deployment environments. • Import business intelligence reports, assets with their associated transformation scripts, ETL jobs, or data models into catalogs. Requires installation of MANTA Automated Data Lineage without a license key. Support for a subset of catalog connections. See Supported data sources for curation and data quality.
Metadata import tool in projects - lineage	Available. Data lineage must be enabled. Limits per plan.	• Import lineage of data assets into catalogs. Requires installation of IBM Manta Data Lineage or MANTA Automated Data Lineage with a license key. • Capture and access lineage of ETL jobs in MANTA Automated Data Lineage. Requires installation of MANTA Automated Data Lineage with a license key. Support for a subset of catalog connections. See Supported data sources for curation and data quality.
Enhanced enrichment using generative AI	Available.	Not available. Starting in 5.0, install IBM Knowledge Catalog Premium or IBM Knowledge Catalog Standard instead.
Automatically generate and run data quality checks as part of metadata enrichment	Not available.	Available starting in 5.2.
Data quality rules in projects	Available. Requires the DataStage service.	Available. Requires the DataStage service.
Add multiple assets to a catalog with a file	Not available.	Available.
Asset activities	Requires a paid plan. Available in projects and catalogs.	Available in projects and catalogs.
Business lineage	Not available	Available.
Technical data lineage	Not available.	Available. Requires that a licensed version of MANTA Automated Data Lineage for IBM Cloud Pak for Data is installed. Generated by running the metadata import tool. Can be accessed from catalogs.
Data lineage	Requires enabling.	Requires enabling and the IBM Manta Data Lineage service.
Business terms	Limits for some plans.	Available.
Predefined business terms	Predefined business terms and the Knowledge Accelerator Sample Personal Data category that includes them are available only if you create an IBM Knowledge Catalog service instance with a Lite or Standard plan after 7 October 2022.	Not available.
Reference data sets	Limits per plan.	Available.
Custom relationships for artifacts	Requires a paid plan.	Available.
Knowledge Accelerators	Requires an Enterprise plan. Download from Resource hub.	Provided with the platform.
Custom workflow configurations for governance artifacts and requests	Available for governance artifacts.	Available.
Custom category roles	Limits per plan.	Available.
Export and import data protection rules	To export data protection rules from any system and import the rules into the same system or a different system, you can use APIs. For details, see Migrating data protection rules.	To export data protection rules from any system and import the rules into the same system or a different system, you can use either APIs or cpd-cli commands. For details, see Migrating data protection rules.
Administrative reports	Requires a paid plan.	Available.
Relationship explorer	Available.	Available starting in 5.0. Requires installing the optional knowledge graph component with IBM Knowledge Catalog.

DataStage

The following table describes differences in features between DataStage on Cloud Pak for Data as a Service and DataStage on Cloud Pak for Data software, versions 5.2, 5.1, and 5.0.

Differences in DataStage
Feature	As a service	Software
PX instance management	You can provision instances from a set of pre-defined sizes. Custom sizing is available with DataStage-aaS Anywhere.	You can provision instances more flexibly by using Cloud Pak for Data Instance administration.
Job compilation	OSH is generated during compilation. Transformer is compiled at runtime.	OSH is generated during compilation. Transformer is compiled at compile time and is made available to the `/ds-storage` mount. Compilation is done synchronously.
Job runtime	You can submit as many jobs as you want, subject to queueing.	Concurrent job runs are supported. Concurrency is determined by instance capacity and the settings in the `/px-storage/config/wlm.config.xml` file.
Asset management	For files of type .xls, .xlsx, .xml, and .json, only simple structures are supported. Multi-level/nested schemas may not be parsed.	Full support of files of type .csv, .txt, .xls, .xlsx, .xml, and .json is available.
Storage	POSIX-type file-based real storage is not available. Storage is emulated by the use of a Cloud Object Storage project bucket.	Real storage is available in `/px-storage` and `/ds-storage`. You can mount more storage into the PX-runtime pod. See Setting up an NFS mount in DataStage.
Java Integration stage	Available with DataStage-aaS Anywhere	Available
Java library component	Available with DataStage-aaS Anywhere	Available
Generic JDBC connection	Available with DataStage-aaS Anywhere	Available
Excel	Available with DataStage-aaS Anywhere	Available
AVI	Available with DataStage-aaS Anywhere	Available
External Source stage	Available with DataStage-aaS Anywhere	Available
External Target stage	Available with DataStage-aaS Anywhere	Available
Hierarchical stage	Single file or File set option for XML Parser and JSON Parser is not available. Single file, File set, and Large Object option for XML Composer and JSON Composer are not available.	Available
SMP	The S, M, and L sizes are single-node SMP configurations. Use a remote runtime engine to set up an alternative configuration.	Parallel workloads are managed through logical partitions, which are configured with the APT_CONFIG_FILE option.
SAP Bulk Extract connection	Not available	Available
SAP Delta Extract connection	Not available	Available
Wrapped stage	Available with DataStage-aaS Anywhere	Available
SAP HANA connection	Not available	Available
Text data source in ODBC connection	Not available	Available
Build stage	Available with DataStage-aaS Anywhere	Available
Send reports by using before/after-job subroutines	Available with DataStage-aaS Anywhere	Available
Custom stage	Available with DataStage-aaS Anywhere	Available
Apache HBase connection	Available with DataStage-aaS Anywhere	Available
Kerberos authentication for Apache Hive connections	Not available	Available
User-defined functions	Available with DataStage-aaS Anywhere	Available
User-created APT_CONFIG_FILEs	Available with DataStage-aaS Anywhere	Available
Before/after-job properties	Available with DataStage-aaS Anywhere	Available
Data service connector	Not available	Available
Db2 database sequence in Slowly Changing Dimension stage, Surrogate Key Generator stage, and Transformer stage	Available with DataStage-aaS Anywhere	Available
Use the Apache Hive connection as a target. (Available when Use DataStage properties is selected in the connector.)	Available with DataStage-aaS Anywhere	Available
Parameterize properties with local connections	Not available	Available
Operational Decision Manager stage	Available with DataStage-aaS Anywhere	Available
Deployment spaces	Not available	Available

Data Virtualization

On Cloud Pak for Data as a Service, data virtualization functionality is provided by the Data Virtualization service. The following data virtualization functionality is effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data software, versions 5.2, 5.1, and 5.0.

Connecting to supported data sources
Virtualizing data
Governing virtual data using policies and data protection rules
Monitoring and exploring the service
Using the SQL interface
Caching
Column masking
Explore view and reloading of tables
Data sampling in statistics collection
Metadata enrichment

The following Data virtualization functionality appears to be different in the user interface but provides the same basic functionality:

This table describes the differences in features between Data Virtualization (formerly Watson Query) on Cloud Pak for Data as a Service and Data Virtualization (formerly Watson Query) on Cloud Pak for Data software.

Differences in Data Virtualization
Feature	As a service	Software
Use the Cloud Pak for Data Data Source Definitions (DSD) to enforce IBM Knowledge Catalog data protection rules	Not applicable for SaaS	Available starting in 5.0
Query data in REST API data sources	Not applicable for SaaS	Available starting in 5.0
Query tables from previous Presto and Databricks catalogs with multiple catalog support	Not applicable for SaaS	Available starting in 5.0
Automatically scale service instances	Not applicable for SaaS	Available starting in 5.0
Mask multibyte characters for enhanced privacy of sensitive data	Not applicable for SaaS	Available starting in 5.0
View the data protection rules that are applied to a user	Not applicable for SaaS	Available starting in 5.0
Enhanced security for profiling results in Data Virtualization views	Not applicable for SaaS	Available starting in 5.0
Data Virtualization connections in catalogs now reference the platform connection	Not applicable for SaaS	Available starting in 5.0
Enhanced security for the Admin role: The Admin role does not have default access to all data.	Not applicable for SaaS	Available
IBM Knowledge Catalog data protection rules are always enabled for Watson Query data	Not applicable for SaaS	Available
Secure your ungoverned objects: With IBM Knowledge Catalog data protection rules in Watson Query, virtualized objects that are not published in a governed catalog follow the Default data access convention setting from your rule settings.	Not applicable for SaaS	Available
Query Presto data: You can create a connection to Presto to access and query data in Presto.	Not applicable for SaaS	Available
Audit logging to monitor user activity and data access	Available	Available
Integration with IBM Knowledge Catalog	Required	Optional
Group-based authorization and object-level access for groups	Not available	Available
Support for remote connectors	Not applicable for SaaS	Available
Support for file system based data sources, except in Cloud Object Storage	Not applicable for SaaS	Available
Connecting to data sources that require an uploaded JDBC driver, for example, SAP HANA, Generic JDBC	Not applicable for SaaS	Available
Collecting statistics in the user interface	Not available	Available
Automatic statistics collection during object virtualization	Not available	Available
Access management for multiple groups	Not available	Available
Support for CSV or TSV files in Cloud Object Storage	Not applicable for SaaS	Available
Credentials in vaults for connections in Cloud Object Storage	Not applicable for SaaS	Available
Autocaching of queries	Not available	Available starting in 5.0.3