Feature differences between Cloud Pak for Data deployments

Cloud Pak for Data as a Service and Cloud Pak for Data software have some differences in features and implementation. Cloud Pak for Data as a Service is a set of IBM Cloud services. Cloud Pak for Data 5.0 is offered as software that you must install and maintain. Services that are available in both deployments can also differ in features between Cloud Pak for Data as a Service and Cloud Pak for Data 5.0, 4.8, and 4.7.

Platform differences

Cloud Pak for Data as a Service and Cloud Pak for Data software share a common code base; however, they differ in the following key ways:

Platform differences
Software, hardware, and installation
  • As a service: Cloud Pak for Data as a Service is fully managed by IBM on IBM Cloud. Software updates are automatic. Scaling of compute resources and storage is automatic. You sign up at https://dataplatform.cloud.ibm.com.
  • Software: You provide and maintain hardware. You install, maintain, and upgrade the software. See Software requirements.

Storage
  • As a service: You provision an IBM Cloud Object Storage service instance to provide storage. See IBM Cloud Object Storage.
  • Software: You provide persistent storage on a Red Hat OpenShift cluster. See Storage requirements.

Compute resources for running workloads
  • As a service: Users choose the appropriate runtime for their jobs. Compute usage is billed based on the rate for the runtime environment and the duration of the job. See Monitor account resource usage.
  • Software: You set up the number of Red Hat OpenShift nodes with the appropriate number of vCPUs. See Hardware requirements and Monitoring the platform.

Cost
  • As a service: You buy each service that you need at the appropriate plan level. Many services bill for compute resource consumption. See each service page in the IBM Cloud catalog or in the services catalog on Cloud Pak for Data as a Service, which you open by selecting Services > Services catalog from the navigation menu.
  • Software: You buy a software license based on the services that you need. For example, the Cloud Pak for Data Enterprise Edition license includes entitlement to services such as Watson Studio or IBM Knowledge Catalog. See Cloud Pak for Data.

Security, compliance, and isolation
  • As a service: IBM Cloud manages the data security, network security, security standards compliance, and isolation of Cloud Pak for Data as a Service. You can set up extra security and encryption options. See Security of Cloud Pak for Data as a Service.
  • Software: Red Hat OpenShift Container Platform provides basic security features. Cloud Pak for Data is assessed for various privacy and compliance regulations and provides features that you can use to prepare for privacy and compliance assessments. You are responsible for additional security features, encryption, and network isolation. See Security considerations.

Available services
  • Both environments: Most data fabric services are available in both deployment environments.
  • As a service: See Services for Cloud Pak for Data as a Service.
  • Software: Includes many other services. See Services for Cloud Pak for Data 5.0.

User management
  • As a service: You add users and user groups and manage their account roles and permissions with IBM Cloud Identity and Access Management. See Add users to the account. You can also set up SAML federation on IBM Cloud. See IBM Cloud docs: How IBM Cloud IAM works.
  • Software: You can add users and create user groups from the Administration menu. For identity and password management, you can use the Identity and Access Management Service or your existing SAML SSO or LDAP provider. You can create dynamic, attribute-based user groups. See User management.

Common core functionality across services

The following core functionality that is provided with the platform is effectively the same for services on Cloud Pak for Data as a Service and on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:

  • Global search for assets and artifacts across the platform
  • The Platform assets catalog for sharing connections across the platform
  • Role-based user management within collaborative workspaces across the platform
  • Common infrastructure for assets and workspaces
  • A services catalog for adding services
  • View compute usage from the Administration menu

The following table describes differences in core functionality across services between Cloud Pak for Data as a Service and Cloud Pak for Data software versions 5.0, 4.8, and 4.7.

Differences in common features across services
Manage all projects
  • As a service: Users with the Manage projects permission, which comes from the IAM service access Manager role for the IBM Cloud Pak for Data service, can join any project with the Admin role and then manage or delete the project.
  • Software: Users with the Manage projects permission can join any project with the Admin role and then manage or delete the project.

Connections to remote data sources
  • Both environments: Most supported data sources are common to both deployment environments.
  • As a service: See Supported connections.
  • Software: See Supported data sources.

Connection credentials that are personal or shared
  • As a service: Connections in projects and catalogs can require personal credentials or allow shared credentials. Shared credentials can be disabled at the account level.
  • Software: Platform connections can require personal credentials or allow shared credentials. Shared credentials can be disabled at the platform level.

Connection credentials from secrets in a vault
  • As a service: Not available
  • Software: Available

Kerberos authentication
  • As a service: Not available
  • Software: Available for some services and connections

Sample assets and projects from the Resource hub app
  • As a service: Available
  • Software: Not available

Custom JDBC connector
  • As a service: Not available
  • Software: Available starting in 4.8.0

Data source definitions
  • As a service: Not available
  • Software: Available starting in 5.0. See Data protection with data source definitions.

watsonx.ai Studio compared to Watson Studio

The following watsonx.ai Studio features on Cloud Pak for Data as a Service are effectively the same as the Watson Studio features on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:

  • Collaboration in projects and deployment spaces
  • Accessing project assets programmatically
  • Project import and export by using a project ZIP file
  • Jupyter notebooks
  • Job scheduling
  • Data Refinery
  • Watson Natural Language Processing for Python

This table describes the feature differences between the watsonx.ai Studio service on the as-a-service deployment environment and the Watson Studio service on the software deployment environment, the differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see watsonx.ai Studio offering plans.

Differences in watsonx.ai Studio
Create project
  • As a service: Create an empty project, a project from a sample in the Resource hub, or a project from a file.
  • Software: Create an empty project, a project from a file, or a project with Git integration.

Git integration
  • As a service: Publish notebooks on GitHub or as a gist.
  • Software: Integrate a project with Git. Sync assets to a repository in one project and use those assets in another project.

Project terminal for advanced Git operations
  • As a service: Not available
  • Software: Available in projects with default Git integration

Organize assets in projects with folders
  • As a service: Not available
  • Software: Available starting in 4.8.0

JupyterLab
  • As a service: Not available
  • Software: Available in projects with Git integration

Visual Studio Code integration
  • As a service: Not available
  • Software: Available in projects with Git integration

RStudio
  • As a service: Cannot integrate with Git
  • Software: Can integrate with Git. Requires an RStudio Server Runtimes service.

Python scripts
  • As a service: Not available
  • Software: Work with Python scripts in JupyterLab. Requires a Watson Studio Runtimes service.

Generate code to load data to a notebook by using the Flight service
  • As a service: Not available
  • Software: Available

Manage notebook lifecycle
  • As a service: Not available
  • Software: Use CPDCTL for notebook lifecycle management

Code package assets (sets of dependent files in a folder structure)
  • As a service: Not available
  • Software: Use CPDCTL to create code package assets in a deployment space

Promote notebooks to spaces
  • As a service: Not available
  • Software: Available manually from the project's Assets page or programmatically by using CPDCTL

Python with GPU
  • As a service: Support available for a single GPU type only
  • Software: Support available for multiple NVIDIA GPU types. Requires a Watson Studio Runtimes service.

Create and use custom images
  • As a service: Not available
  • Software: Create custom images for Python (with and without GPU), R, JupyterLab (with and without GPU), RStudio, and SPSS environments. Requires a Watson Studio Runtimes service and other applicable services.

Anaconda Repository
  • As a service: Not available
  • Software: Use to create custom environments and custom images

Hadoop integration
  • As a service: Not available
  • Software: Build and train models, and run Data Refinery flows, on a Hadoop cluster. Requires the Execution Engine for Apache Hadoop service.

Decision Optimization
  • As a service: Available
  • Software: Requires the Decision Optimization service.

SPSS Modeler
  • As a service: Available
  • Software: Requires the SPSS Modeler service.

Orchestration Pipelines
  • As a service: Available
  • Software: Requires the Orchestration Pipelines service.

watsonx.ai Runtime compared to Watson Machine Learning

The following watsonx.ai Runtime features on Cloud Pak for Data as a Service are effectively the same as the Watson Machine Learning features on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:

  • Collaboration in projects and deployment spaces
  • Deploy models
  • Deploy functions
  • watsonx.ai Runtime REST APIs
  • watsonx.ai Runtime Python client
  • Create online deployments
  • Scale and update deployments
  • Define and use custom components
  • Use Federated Learning to train a common model with separate and secure data sources
  • Monitor deployments across spaces
  • Updated forms for testing online deployment
  • Use nested pipelines
  • AutoAI data imputation
  • AutoAI fairness evaluation
  • AutoAI time series supporting features

This table describes the differences in features between the watsonx.ai Runtime service on the as-a-service deployment environment and the Watson Machine Learning service on the software deployment environment, the differences between offering plans, and whether additional services are required. For details about functionality differences between offering plans on Cloud Pak for Data as a Service, see watsonx.ai Runtime offering plans.

Differences in watsonx.ai Runtime

AutoAI training input
  • As a service: Current supported data sources
  • Software: Supported data sources change by release

AutoAI experiment compute configuration
  • As a service: Different sizes available
  • Software: Different sizes available

AutoAI limits on data size and number of prediction targets
  • As a service: Set limits
  • Software: Limits differ by compute configuration

AutoAI incremental learning
  • As a service: Not available
  • Software: Available

Deploy using popular frameworks and software specifications
  • As a service: Check for latest supported versions
  • Software: Supported versions differ by release

Connect to databases for batch deployments
  • As a service: Check for support by deployment type
  • Software: Check for support by deployment type and by version

Deploy and score Python scripts
  • As a service: Available through the Python client
  • Software: Create scripts in JupyterLab or with the Python client, then deploy

Deploy and batch score R scripts
  • As a service: Not available
  • Software: Available

Deploy Shiny apps
  • As a service: Not available
  • Software: Create and deploy Shiny apps. Deploy from a code package.

Evaluate jobs for fairness or drift
  • As a service: Requires the watsonx.governance service
  • Software: Requires the Watson OpenScale or watsonx.governance service

Evaluate online deployments in a space for fairness, drift, or explainability
  • As a service: Not available
  • Software: Available starting in 4.7. Requires the Watson OpenScale or watsonx.governance service.

Evaluate deployed prompt templates in a space
  • Available

Evaluate detached prompt templates in a space
  • As a service: Not available
  • Software: Available starting in 5.0

Control space creation
  • As a service: No restrictions by role
  • Software: Use permissions to control who can view and create spaces

Import from Git project to space
  • As a service: Not available
  • Software: Available

Code package automatically created when importing from Git project to space
  • As a service: Not available
  • Software: Available

Update RShiny app from code package
  • As a service: Not available
  • Software: Available

Create and use custom images
  • As a service: Not available
  • Software: Create custom images for Python or SPSS

Notify collaborators about Pipeline events
  • As a service: Not available
  • Software: Use Send Mail to notify collaborators

Deep Learning Experiments
  • As a service: Not available
  • Software: Requires the IBM Scheduler service

Provision and manage IBM Cloud service instances
  • As a service: Add instances for watsonx.ai Runtime or Watson OpenScale
  • Software: Services are provisioned on the cluster by the administrator

watsonx.governance

The following governance features are effectively the same on IBM watsonx as a Service and watsonx software on Cloud Pak for Data 4.8 and 5.0:

  • Evaluate deployments for fairness
  • Evaluate the quality of deployments
  • Monitor deployments for drift
  • View and compare model results in an Insights dashboard
  • Add deployments from the machine learning provider of your choice
  • Set alerts to trigger when evaluations fall below a specified threshold
  • Evaluate deployments in a user interface or notebook
  • Custom evaluations and metrics
  • View details about evaluations in model factsheets

This table describes the differences in features between the watsonx.governance service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required.

Feature differences between watsonx.governance deployments
Evaluate machine learning models
  • As a service: Yes
  • Software: Yes

Upload pre-scored test data
  • As a service: Not available
  • Software: Available

IBM SPSS Collaboration and Deployment Services
  • As a service: Not available
  • Software: Available

Batch processing
  • As a service: Not available
  • Software: Available

Support access control by user groups
  • As a service: Not available
  • Software: Available

Free database and Postgres plans
  • As a service: Available
  • Software: Postgres available starting in 4.8

IBM Knowledge Catalog

The following features are effectively the same for IBM Knowledge Catalog on Cloud Pak for Data as a Service and on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7:

  • Collaboration in projects and catalogs
  • AI-powered search and recommendations in catalogs
  • Rating and reviewing assets in catalogs
  • Data Refinery tool in projects
  • Categories with collaborator roles
  • Predefined and custom classifications
  • Predefined and custom data classes
  • Governance rules
  • Policies
  • Data protection rules
  • Manual profiling of individual relational data assets in a project or a catalog
  • Automatic profiling of relational data assets added to a governed catalog
  • Custom asset types, custom properties for assets, and custom relationships between assets in catalogs
  • Monitor workflow tasks
  • Deliver masked data sets in projects with masking flows

This table describes the differences in features between the IBM Knowledge Catalog service on the as-a-service and software deployment environments, differences between offering plans, and whether additional services are required. For more information about feature differences between offering plans on Cloud Pak for Data as a Service, see IBM Knowledge Catalog offering plans.

Starting in Cloud Pak for Data version 5.0, you can install the IBM Knowledge Catalog Premium Cartridge or the IBM Knowledge Catalog Standard Cartridge instead of the IBM Knowledge Catalog service. IBM Knowledge Catalog Premium provides the same features as the IBM Knowledge Catalog service plus generative AI features. IBM Knowledge Catalog Standard provides a subset of IBM Knowledge Catalog features plus generative AI features.

Differences in IBM Knowledge Catalog
Metadata import tool in projects: discovery
  • As a service: Import data assets into projects or catalogs. Supports a subset of project and catalog connections. See Supported data sources for curation and data quality.
  • Software: Import different types of assets. You can import data assets into projects or catalogs; most supported connections are the same in both deployment environments. You can also import business intelligence reports, assets with their associated transformation scripts, ETL jobs, or data models into catalogs. This type of import requires installation of MANTA Automated Data Lineage without a license key and supports a subset of catalog connections. See Supported data sources for curation and data quality.

Metadata import tool in projects: lineage
  • As a service: Not available
  • Software: Import lineage of data assets into catalogs. Starting in 4.7, capture and access lineage of ETL jobs in MANTA Automated Data Lineage. Requires installation of MANTA Automated Data Lineage with a license key. Supports a subset of catalog connections. See Supported data sources for curation and data quality.

Legacy UI tools
  • As a service: Not available. Use tools in projects instead.
  • Software: Not available starting in 4.7. Use tools in projects instead.

Metadata enrichment tool in projects
  • As a service: Run profiling, term assignment, quality analysis, and key or relationship analysis on large sets of data assets.
  • Software: Available

Enhanced enrichment using generative AI
  • As a service: Available
  • Software: Not available with the IBM Knowledge Catalog service. Starting in 5.0, install IBM Knowledge Catalog Premium or IBM Knowledge Catalog Standard instead.

Data quality scores
  • As a service: Shown in data quality information for assets in projects and catalogs, and in metadata enrichment results.
  • Software: Shown in data quality information for assets in projects and catalogs, and in metadata enrichment results. In releases before 4.7, also shown in asset profiles in projects and catalogs, in quick scan results with the legacy UI, and in data quality projects with the legacy UI.

Detailed data quality information
  • As a service: Shown on the Data quality page in projects and catalogs, and as part of metadata enrichment results
  • Software: Available starting in 4.7

Data quality rules in projects
  • As a service: Available. Requires the DataStage service.
  • Software: Available. Requires the DataStage service.

Data quality SLA rules
  • As a service: Not available
  • Software: Available starting in 4.7.3. Monitor data quality and report violations. SLA compliance reports are shown on a data asset's Data quality page in projects.

Remediation workflows for data quality issues
  • As a service: Not available
  • Software: Available starting in 4.7.3

Add multiple assets to a catalog with a file
  • As a service: Not available
  • Software: Available starting in 4.7.3

Asset activities
  • As a service: Available in projects and catalogs. Requires a paid plan.
  • Software: Available in projects and catalogs

Business lineage
  • As a service: Not available
  • Software: Available

Technical data lineage
  • As a service: Not available
  • Software: Available. Requires that a licensed version of MANTA Automated Data Lineage for IBM Cloud Pak for Data is installed. Lineage is generated by running the metadata import tool and can be accessed from catalogs.

Data lineage
  • As a service: Requires enabling
  • Software: Not available

Business terms
  • As a service: Limits for some plans
  • Software: Available

Predefined business terms
  • As a service: Predefined business terms, and the Knowledge Accelerator Sample Personal Data category that includes them, are available only if you create an IBM Knowledge Catalog service instance with a Lite or Standard plan after 7 October 2022.
  • Software: Not available

Reference data sets
  • As a service: Limits per plan
  • Software: Available

Custom relationships for artifacts
  • As a service: Requires a paid plan
  • Software: Available

Knowledge Accelerators
  • As a service: Requires an Enterprise plan. Download from the Resource hub.
  • Software: Provided with the platform

Custom workflow configurations for governance artifacts and requests
  • As a service: Available for governance artifacts
  • Software: Available

Custom category roles
  • As a service: Limits per plan
  • Software: Available

Export and import data protection rules
  • As a service: To export data protection rules from any system and import them into the same system or a different system, you can use APIs. See Migrating data protection rules.
  • Software: To export data protection rules from any system and import them into the same system or a different system, you can use either APIs or cpd-cli commands. See Migrating data protection rules.

Administrative reports
  • As a service: Requires a paid plan
  • Software: Available

Migrate data from InfoSphere Information Server
  • As a service: Not available
  • Software: Available starting in 4.8

Relationship explorer
  • As a service: Not available
  • Software: Available starting in 5.0. Requires installing the optional knowledge graph component with IBM Knowledge Catalog.

Run metrics dashboard for metadata enrichment jobs
  • As a service: Available
  • Software: Not available
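
In the software column, several IBM Knowledge Catalog features are gated on a minimum release, and a patch-level gate such as 4.7.3 means that the feature is missing from 4.7.0 through 4.7.2. As a rough illustration of reading these gates, the following Python sketch checks a few rows from the table above against an installed release. The mapping is transcribed by hand, and MIN_RELEASE, parse_version, and is_available are hypothetical names, not part of any IBM tool or API.

```python
from __future__ import annotations

# Hypothetical mapping, transcribed by hand from the IBM Knowledge Catalog
# comparison table: feature name -> minimum Cloud Pak for Data software release.
# This is an illustration only, not an IBM tool or API.
MIN_RELEASE = {
    "Data quality SLA rules": "4.7.3",
    "Remediation workflows for data quality issues": "4.7.3",
    "Add multiple assets to a catalog with a file": "4.7.3",
    "Migrate data from InfoSphere Information Server": "4.8",
    "Relationship explorer": "5.0",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Convert a release string such as '4.8' into a padded tuple (4, 8, 0)."""
    parts = [int(p) for p in text.split(".")]
    # Pad to three components so that '4.8' and '4.8.0' compare as equal.
    return tuple(parts + [0] * (3 - len(parts)))


def is_available(feature: str, installed: str) -> bool:
    """Return True if the installed release meets the feature's minimum release.

    Only the version gate is checked. Extra requirements from the table, such as
    the knowledge graph component for Relationship explorer, are not modeled.
    """
    return parse_version(installed) >= parse_version(MIN_RELEASE[feature])


if __name__ == "__main__":
    # Count how many of the hand-transcribed gated features each release meets.
    for release in ("4.7.2", "4.7.3", "4.8.0", "5.0"):
        enabled = [name for name in MIN_RELEASE if is_available(name, release)]
        print(release, "->", len(enabled), "of", len(MIN_RELEASE), "gated features")
```

Comparing integer tuples rather than raw strings avoids lexicographic surprises, such as the string "4.10" sorting before "4.8".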

DataStage

The following table describes differences in features between DataStage on Cloud Pak for Data as a Service and DataStage on Cloud Pak for Data software, versions 5.0, 4.8, and 4.7.

Differences in DataStage
PX instance management
  • As a service: You can provision instances from a set of predefined sizes.
  • Software: You can provision instances more flexibly by using Cloud Pak for Data Instance administration.

Job compilation
  • As a service: OSH is generated during compilation. Transformer is compiled at runtime.
  • Software: OSH is generated during compilation. Transformer is compiled at compilation time and is made available to the /ds-storage mount. Compilation is done synchronously.

Job runtime
  • As a service: You can submit as many jobs as you want, subject to queueing.
  • Software: Concurrent job runs are supported. Concurrency is determined by instance capacity and the settings in the /px-storage/config/wlm.config.xml file.

Asset management
  • As a service: For files of type .xls, .xlsx, .xml, and .json, only simple structures are supported. Multi-level or nested schemas might not be parsed.
  • Software: Full support for files of type .csv, .txt, .xls, .xlsx, .xml, and .json is available.

Storage
  • As a service: POSIX-type file-based real storage is not available. Storage is emulated by using a Cloud Object Storage project bucket.

Java Integration stage
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Java library component
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Generic JDBC connection
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Excel
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

AVI
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

External Source stage
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

External Target stage
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Hierarchical stage
  • As a service: The Single file and File set options for XML Parser and JSON Parser are not available. The Single file, File set, and Large Object options for XML Composer and JSON Composer are not available.
  • Software: Available

SMP
  • As a service: The S, M, and L sizes are single-node SMP configurations. Use a remote runtime engine to set up an alternative configuration.
  • Software: Parallel workloads are managed through logical partitions, which are configured with the APT_CONFIG_FILE option.

SAP Bulk Extract connection
  • As a service: Not available
  • Software: Available

SAP Delta Extract connection
  • As a service: Not available
  • Software: Available

Wrapped stage
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

SAP HANA connection
  • As a service: Not available
  • Software: Available

Text data source in ODBC connection
  • As a service: Not available
  • Software: Available

Build stage
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Send reports by using before/after-job subroutines
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Custom stage
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Apache HBase connection
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Kerberos authentication for Apache Hive connections
  • As a service: Not available
  • Software: Available

User-defined functions
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

User-created APT_CONFIG_FILEs
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Before/after-job properties
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Data service connector
  • As a service: Not available
  • Software: Available

Db2 database sequence in the Slowly Changing Dimension stage, Surrogate Key Generator stage, and Transformer stage
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Use the Apache Hive connection as a target (available when Use DataStage properties is selected in the connector)
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Parameterize properties with local connections
  • As a service: Not available
  • Software: Available

Operational Decision Manager stage
  • As a service: Available with DataStage-aaS Anywhere
  • Software: Available

Deployment spaces
  • As a service: Not available
  • Software: Available starting in 4.7.0

Data Virtualization

On Cloud Pak for Data as a Service, data virtualization functionality is provided by the Data Virtualization service. The following data virtualization functionality is effectively the same on Cloud Pak for Data as a Service and Cloud Pak for Data 5.0, 4.8, and 4.7:

  • Connecting to supported data sources
  • Virtualizing data
  • Governing virtual data using policies and data protection rules
  • Monitoring and exploring the service
  • Using the SQL interface
  • Caching
  • Column masking
  • Explore view and reloading of tables
  • Data sampling in statistics collection
  • Metadata enrichment

The following Data virtualization functionality appears to be different in the user interface but provides the same basic functionality:

This table describes the differences in features between Data Virtualization (formerly Watson Query) on Cloud Pak for Data as a Service and Data Virtualization (formerly Watson Query) on Cloud Pak for Data software.

Differences in Data Virtualization
Use Cloud Pak for Data data source definitions (DSD) to enforce IBM Knowledge Catalog data protection rules
  • As a service: Not applicable for SaaS
  • Software: Available starting in 5.0

Query data in REST API data sources
  • As a service: Not applicable for SaaS
  • Software: Available starting in 5.0

Query tables from previous Presto and Databricks catalogs with multiple catalog support
  • As a service: Not applicable for SaaS
  • Software: Available starting in 5.0

Automatically scale service instances
  • As a service: Not applicable for SaaS
  • Software: Available starting in 5.0

Mask multibyte characters for enhanced privacy of sensitive data
  • As a service: Not applicable for SaaS
  • Software: Available starting in 5.0

View the data protection rules that are applied to a user
  • As a service: Not applicable for SaaS
  • Software: Available starting in 5.0

Enhanced security for profiling results in Data Virtualization views
  • As a service: Not applicable for SaaS
  • Software: Available starting in 5.0

Data Virtualization connections in catalogs now reference the platform connection
  • As a service: Not applicable for SaaS
  • Software: Available starting in 5.0

Enhanced security for the Admin role: the Admin role does not have default access to all data
  • As a service: Not applicable for SaaS
  • Software: Available starting in 4.8

IBM Knowledge Catalog data protection rules are always enabled for Watson Query data
  • As a service: Not applicable for SaaS
  • Software: Available starting in 4.8

Secure your ungoverned objects: with IBM Knowledge Catalog data protection rules in Watson Query, virtualized objects that are not published in a governed catalog follow the Default data access convention setting from your rule settings
  • As a service: Not applicable for SaaS
  • Software: Available starting in 4.8

Query Presto data: you can create a connection to Presto to access and query data in Presto
  • As a service: Not applicable for SaaS
  • Software: Available starting in 4.8

Audit logging to monitor user activity and data access
  • As a service: Available
  • Software: Available starting in 4.7

Integration with IBM Knowledge Catalog
  • As a service: Required
  • Software: Optional

Group-based authorization and object-level access for groups
  • As a service: Not available
  • Software: Available

Support for remote connectors
  • As a service: Not applicable for SaaS
  • Software: Available

Support for file system based data sources, except in Cloud Object Storage
  • As a service: Not applicable for SaaS
  • Software: Available

Connecting to data sources that require an uploaded JDBC driver, for example, SAP HANA or Generic JDBC
  • As a service: Not applicable for SaaS
  • Software: Available

Collecting statistics in the user interface
  • As a service: Not available
  • Software: Available

Automatic statistics collection during object virtualization
  • As a service: Not available
  • Software: Available

Access management for multiple groups
  • As a service: Not available
  • Software: Available

Support for CSV or TSV files in Cloud Object Storage
  • As a service: Not applicable for SaaS
  • Software: Available

Credentials in vaults for connections in Cloud Object Storage
  • As a service: Not applicable for SaaS
  • Software: Available
