Cloud deployment environments

You can choose to deploy IBM® Cloud Pak for Data in the environment that suits your business needs.

Important: If you don't already have an IBM Cloud Pak for Data license, you can:

- Purchase a software license through IBM Passport Advantage®.
  When you purchase Cloud Pak for Data, the license includes entitlement to both the Red Hat® OpenShift® Container Platform and Red Hat Enterprise Linux®. You can use this entitlement to install Red Hat OpenShift Container Platform on the environment of your choice.
- Register for a 60-day trial license of Cloud Pak for Data.
  The trial is for Cloud Pak for Data software only. The trial does not include entitlement to the Red Hat OpenShift Container Platform.

You can use either license to install Cloud Pak for Data software on the environment of your choice.

Cloud Pak for Data can be deployed in various private cloud and public cloud environments. The following list describes each deployment environment in more detail.

On premises, private cloud
If you want to ensure that your environment is running securely behind your firewall, or you have an existing on-premises Red Hat OpenShift Container Platform cluster, you can deploy Cloud Pak for Data in your own private cloud.
Red Hat OpenShift

You must deploy an instance of Red Hat OpenShift Container Platform on your cluster.
For more information, see System requirements for IBM Cloud Pak for Data.

Cluster architecture

Cloud Pak for Data is deployed on a multi-node cluster. Although you can deploy Cloud Pak for Data on a 3-node cluster for development or proof of concept environments, it is strongly recommended that you deploy your production environment on a larger, highly available cluster with multiple dedicated master and worker nodes. This configuration provides better performance, better cluster stability, and increased ease of scaling the cluster to support workload growth. The specific requirements for a production-level cluster are identified in System requirements for IBM Cloud Pak for Data. For more information, see Architecture for IBM Cloud Pak for Data.

Services
After you install Cloud Pak for Data, you can decide which services you want to install. For a list of available services, see:
The storage that you use determines the services that you can run. For more information, see System requirements for services.
Special requirements for services
- The Watson™ Assistant service is supported on Microsoft Azure machine series that use CPUs that support AVX 2.
- AVX is required for AutoAI experiments in Watson Machine Learning.
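
On a Linux x86 node, one way to confirm that the CPU advertises AVX or AVX2 is to read the flags line of /proc/cpuinfo. The following is a minimal Python sketch under that assumption. The function names (cpu_flags, supports, node_supports) are illustrative and are not part of any IBM or Red Hat tooling.

```python
def cpu_flags(cpuinfo_text):
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Use the first "flags" line that is found.
            return set(value.split())
    return set()


def supports(cpuinfo_text, feature):
    """Check one feature flag, for example "avx" or "avx2" (case-insensitive)."""
    return feature.lower() in cpu_flags(cpuinfo_text)


def node_supports(feature, path="/proc/cpuinfo"):
    """Convenience wrapper that reads the local node's CPU information (Linux only)."""
    with open(path) as handle:
        return supports(handle.read(), feature)
```

For example, calling node_supports("avx2") on a worker node returns True only if that node's CPU reports the avx2 flag.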
Supported storage
You can use one or more of the following types of storage in your cluster:
- Red Hat OpenShift Container Storage
- NFS
- Portworx
Installation documentation

See Installing IBM Cloud Pak for Data.
IBM Cloud Pak for Data System
If you don't have an existing on-premises cluster and you want to get up and running quickly, consider IBM Cloud Pak for Data System. IBM Cloud Pak for Data System is an all-in-one cloud-native data and AI platform in a box that provides a preconfigured, governed, and secure environment to collect, organize, and analyze data.
Red Hat OpenShift

Red Hat OpenShift is already deployed on your IBM Cloud Pak for Data System.

Cluster architecture

The system lets you flexibly expand or reduce storage and compute with plug-and-play nodes. For more information, see Cloud Pak for Data System overview.

Services
After you install Cloud Pak for Data, you can decide which services you want to install. For a list of available services, see:
The storage that you use determines the services that you can run. For more information, see System requirements for services.
Supported storage

Portworx is installed for you. However, you can optionally add other types of storage to your environment.

Installation documentation

Cloud Pak for Data is already installed on your IBM Cloud Pak for Data System hardware. However, you must install the appropriate services for your needs. For more information, see Installing services in the IBM Cloud Pak for Data System documentation.
IBM Cloud
If you already use IBM Cloud to run business-critical applications, or if you don't want to set up and manage your own hardware, you can deploy Cloud Pak for Data on IBM Cloud.
Red Hat OpenShift

Cloud Pak for Data includes entitlement to the Red Hat OpenShift Container Platform. You can optionally apply this entitlement when you create your cluster.
You can deploy Cloud Pak for Data on the following Red Hat OpenShift environments:
Managed OpenShift (Recommended)

To use managed OpenShift, you must deploy an instance of Red Hat OpenShift Container Platform on IBM Cloud.
Your cluster must meet one of the following criteria:

- Classic infrastructure single zone (IBM Cloud File Storage)
- Classic infrastructure single or multi zone (Portworx Enterprise storage)
- Virtual Private Cloud (VPC) Gen2 single or multi zone (Portworx Enterprise storage)

For additional prerequisites, see Getting started with IBM Cloud Pak for Data.

Restriction: Some services are supported only on Bare Metal clusters with extra local storage for software-defined storage (SDS). These services also require Portworx Enterprise storage. If you plan to deploy any of these services, ensure that you choose this infrastructure when you configure your worker pool.
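
The cluster criteria for managed OpenShift can be encoded as a small lookup, which can help you validate a planned configuration before you create the cluster. This is an illustrative Python sketch only. The keys ("classic", "vpc-gen2", "single", "multi") and the function name are hypothetical labels, not IBM Cloud API values.

```python
# Storage options per (infrastructure, zone layout), transcribed from the
# managed OpenShift cluster criteria in this section.
SUPPORTED_STORAGE = {
    ("classic", "single"): {"IBM Cloud File Storage", "Portworx Enterprise"},
    ("classic", "multi"): {"Portworx Enterprise"},
    ("vpc-gen2", "single"): {"Portworx Enterprise"},
    ("vpc-gen2", "multi"): {"Portworx Enterprise"},
}


def is_supported_cluster(infrastructure, zones, storage):
    """Return True if the combination matches one of the listed criteria."""
    return storage in SUPPORTED_STORAGE.get((infrastructure, zones), set())
```

For example, a classic multi-zone cluster with IBM Cloud File Storage does not match any of the criteria, so is_supported_cluster("classic", "multi", "IBM Cloud File Storage") returns False.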

Self-managed OpenShift

To use self-managed OpenShift, contact IBM Software Support.
Cluster architecture

At a minimum, your cluster must contain 3 master nodes and 3 worker nodes. You can deploy additional worker nodes to support your workload. For details, see Adding worker nodes and zones to clusters.

Services
After you install Cloud Pak for Data, you can decide which services you want to install. For a list of available services, see:
The storage that you use determines the services that you can run. For more information, see System requirements for services.
You can optionally automatically deploy the following services as part of your Cloud Pak for Data installation:
- Analytics Engine Powered by Apache Spark
- Cognos® Dashboards
- Data Refinery (installed with Watson Knowledge Catalog or Watson Studio)
- Data Virtualization
- Db2® Data Gate
- Db2 Data Management Console
- Db2 Warehouse
- RStudio® Server with R 3.6
- Streams
- Streams Flows
- Watson Knowledge Catalog
- Watson Machine Learning
- Watson OpenScale
- Watson Studio
Special requirements for services
The following services are supported on IBM Cloud only if your worker pool includes Bare Metal machines with extra storage for SDS. These services also require Portworx Enterprise storage.

- Watson Assistant
- Watson Assistant for Voice Interaction
- Watson Discovery
- Watson Knowledge Studio
- Watson Language Translator
- Watson Speech to Text
- Watson Text to Speech
The Virtual Data Pipeline service is not supported on IBM Cloud.
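
The special requirements for services on IBM Cloud can be expressed as a pre-installation check. The sketch below is hypothetical Python that uses the service names from this section. The function name and its boolean parameters are assumptions made for illustration, and the check covers only the restrictions that are stated here.

```python
# Services that need Bare Metal workers with extra SDS storage and
# Portworx Enterprise storage on IBM Cloud.
NEEDS_BARE_METAL_SDS = {
    "Watson Assistant",
    "Watson Assistant for Voice Interaction",
    "Watson Discovery",
    "Watson Knowledge Studio",
    "Watson Language Translator",
    "Watson Speech to Text",
    "Watson Text to Speech",
}

# Services that are not supported on IBM Cloud at all.
UNSUPPORTED_ON_IBM_CLOUD = {"Virtual Data Pipeline"}


def blocked_services(wanted, bare_metal_sds, portworx_enterprise):
    """Return the services in `wanted` that the stated restrictions rule out, sorted."""
    blocked = set(wanted) & UNSUPPORTED_ON_IBM_CLOUD
    if not (bare_metal_sds and portworx_enterprise):
        blocked |= set(wanted) & NEEDS_BARE_METAL_SDS
    return sorted(blocked)
```

An empty result means that none of the restrictions in this section apply to the requested services. It does not replace the checks in System requirements for services.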
Supported storage
The storage that is supported depends on your Red Hat OpenShift environment:
Managed OpenShift

If you are running managed OpenShift, the following storage is supported:

IBM Cloud File Storage
For information about which services support IBM Cloud File Storage, see System requirements for services.

Portworx Enterprise
For use on the following Red Hat OpenShift Container Platform deployments:

- Classic infrastructure single or multi zone
- Virtual Private Cloud (VPC) Gen2 single or multi zone

Some services require Portworx Enterprise storage to run on IBM Cloud. You can install the Portworx Enterprise service on IBM Cloud.

Self-managed OpenShift

If you are running self-managed OpenShift, the following storage is supported:

NFS
For information about which services support NFS, see System requirements for services.

Portworx
For information about which services support Portworx, see System requirements for services.
Installation documentation

See Getting started with IBM Cloud Pak for Data.
This video provides an overview of the installation on IBM Cloud.

Terraform installation

You can optionally use Terraform to install Cloud Pak for Data on a Virtual Private Cloud (VPC) Gen2 cluster.
See Cloud Pak for Data 3.5 on Red Hat OpenShift on IBM Cloud.

This video provides an overview of using Terraform to install Cloud Pak for Data on IBM Cloud.

Amazon Web Services (AWS) infrastructure as a service

If you already use AWS and you don't want to set up and manage your own hardware, you can deploy Cloud Pak for Data on AWS.

Red Hat OpenShift

When you install Cloud Pak for Data on AWS, a self-managed Red Hat OpenShift cluster is automatically installed on your AWS infrastructure.

Cluster architecture

Cloud Pak for Data is deployed on an AWS Virtual Private Cloud. When you install Cloud Pak for Data, you can specify whether you want a single-zone cluster or a multi-zone cluster.

The installation method that you use determines how much control you have over the configuration of your cluster:

AWS Quick Start: If you use the Cloud Pak for Data AWS Quick Start, your Virtual Private Cloud is configured with the default parameters.
Terraform: If you use the Terraform installation, you can optionally use scripts to override the default Virtual Private Cloud parameters.

Services

After you install Cloud Pak for Data, you can decide which services you want to install. For a list of available services, see:

The storage that you use determines the services that you can run. For more information, see System requirements for services.

You can optionally automatically deploy the following services as part of your Cloud Pak for Data installation:

Service	Terraform	Quick Start
Analytics Engine Powered by Apache Spark	✓	✓
Cognos Analytics	✓
Cognos Dashboards	✓	✓
Data Refinery (Installed with Watson Knowledge Catalog or Watson Studio.)	✓	✓
Data Virtualization	✓	✓
DataStage®	✓
Db2	✓
IBM Db2 Warehouse	✓
Decision Optimization	✓
Execution Engine for Apache Hadoop	✓
SPSS® Modeler	✓
Streams	✓
Watson Knowledge Catalog	✓	✓
Watson Machine Learning	✓	✓
Watson OpenScale	✓	✓
Watson Studio	✓	✓
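
The table can also be read as data, for example to decide which installation method covers the services that you want deployed automatically. The following Python sketch assumes that a row with a single check mark refers to the Terraform column, as the column order suggests. The variable and function names are illustrative.

```python
# Rows with a check mark in both columns.
BOTH_METHODS = {
    "Analytics Engine Powered by Apache Spark",
    "Cognos Dashboards",
    "Data Refinery",
    "Data Virtualization",
    "Watson Knowledge Catalog",
    "Watson Machine Learning",
    "Watson OpenScale",
    "Watson Studio",
}

# Rows with one check mark (assumed here to be the Terraform column).
TERRAFORM_ONLY = {
    "Cognos Analytics",
    "DataStage",
    "Db2",
    "IBM Db2 Warehouse",
    "Decision Optimization",
    "Execution Engine for Apache Hadoop",
    "SPSS Modeler",
    "Streams",
}

AUTO_DEPLOY = {
    "Terraform": BOTH_METHODS | TERRAFORM_ONLY,
    "Quick Start": BOTH_METHODS,
}


def methods_covering(wanted):
    """Return the installation methods that can auto-deploy every service in `wanted`."""
    return [method for method, services in AUTO_DEPLOY.items() if set(wanted) <= services]
```

Under that assumption, a request that includes DataStage is covered only by the Terraform installation, while a request for Watson Studio alone is covered by both methods.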

Supported storage

If you are running self-managed OpenShift, the following storage is supported:

Red Hat OpenShift Container Storage
NFS
Portworx

Installation documentation

Follow the appropriate documentation for your use case:

AWS Quick Start

Use this installation method for a one-click installation with default Virtual Private Cloud parameters.

See:

Cloud Pak for Data on AWS Quick Start guide (overview)
Cloud Pak for Data on the AWS Cloud deployment guide (PDF)

This video provides an overview of the quick start installation on AWS.

Terraform

Use this installation method to deploy a customized Virtual Private Cloud.

See Cloud Pak for Data 3.5 on AWS and Azure.

Microsoft Azure infrastructure as a service

If you already use Microsoft Azure and you don't want to set up and manage your own hardware, you can deploy Cloud Pak for Data on Azure.

Red Hat OpenShift

When you install Cloud Pak for Data on Azure, a self-managed Red Hat OpenShift cluster is automatically installed on your Azure infrastructure.

Restriction: Some services are supported only on machine series that use CPUs that support the AVX 2 instruction set.

See Virtual Machine series for information on the processors that are supported for each type of VM.

Cluster architecture

Cloud Pak for Data is deployed on an Azure Virtual Network (VNet). When you install Cloud Pak for Data, you can specify whether you want a single-zone cluster or a multi-zone cluster.

The installation method that you use determines how much control you have over the configuration of your cluster:

Azure Quick Start: If you use the Cloud Pak for Data Azure Quick Start, your VNet is configured with the default parameters.
Terraform: If you use the Terraform installation, you can optionally use scripts to override the default VNet parameters.

Services

After you install Cloud Pak for Data, you can decide which services you want to install. For a list of available services, see:

The storage that you use determines the services that you can run. For more information, see System requirements for services.

You can optionally automatically deploy the following services as part of your Cloud Pak for Data installation:

Service	Terraform	Quick Start
Analytics Engine Powered by Apache Spark	✓	✓
Cognos Analytics	✓
Cognos Dashboards	✓	✓
Data Refinery (Installed with Watson Knowledge Catalog or Watson Studio.)	✓	✓
Data Virtualization	✓	✓
DataStage	✓
Db2	✓
IBM Db2 Warehouse	✓
Decision Optimization	✓
Execution Engine for Apache Hadoop	✓
SPSS Modeler	✓
Streams	✓
Watson Knowledge Catalog	✓	✓
Watson Machine Learning	✓	✓
Watson OpenScale	✓	✓
Watson Studio	✓	✓

Special requirements for services

The following services are supported only on Microsoft Azure machine series that use CPUs that support AVX 2:

Watson Assistant
Watson Assistant for Voice Interaction
Watson Knowledge Studio
Watson Language Translator
Watson Speech to Text
Watson Text to Speech

Supported storage

If you are running self-managed OpenShift, the following storage is supported:

NFS
Portworx

Installation documentation

Follow the appropriate documentation for your use case:

Azure Quick Start

Use this installation method for a one-click installation with default VNet parameters.

See:

Cloud Pak for Data on the Azure Marketplace

This video provides an overview of the quick start installation on Azure.

Terraform

Use this installation method to deploy a customized VNet.

See Cloud Pak for Data 3.5 on AWS and Azure.

Google Cloud
If you already use Google Cloud and you don't want to set up and manage your own hardware, you can deploy Cloud Pak for Data on Google Cloud.
Red Hat OpenShift
You must deploy an instance of Red Hat OpenShift Container Platform on your cluster.
For more information, see:
- System requirements for IBM Cloud Pak for Data
- Red Hat OpenShift Container Platform partner solution on Google Cloud
Cluster architecture

Cloud Pak for Data is deployed on a multi-node cluster. Although you can deploy Cloud Pak for Data on a 3-node cluster for development or proof of concept environments, it is strongly recommended that you deploy your production environment on a larger, highly available cluster with multiple dedicated master and worker nodes. This configuration provides better performance, better cluster stability, and increased ease of scaling the cluster to support workload growth. The specific requirements for a production-level cluster are identified in System requirements for IBM Cloud Pak for Data. For more information, see Architecture for IBM Cloud Pak for Data.

Services
After you install Cloud Pak for Data, you can decide which services you want to install. For a list of available services, see:
The storage that you use determines the services that you can run. For more information, see System requirements for services.
Supported storage
If you are running self-managed OpenShift, the following storage is supported:
- Red Hat OpenShift Container Storage
- NFS
- Portworx
Installation documentation

See Installing IBM Cloud Pak for Data.

In addition to Cloud Pak for Data software, IBM offers IBM Cloud Pak® for Data as a Service. IBM Cloud Pak for Data as a Service might be right for you if you already use IBM Cloud to run business-critical applications and you don't want to set up and manage your own deployment of Cloud Pak for Data. IBM Cloud Pak for Data as a Service differs from the Cloud Pak for Data software in several ways. For details, see the IBM Cloud Pak for Data as a Service documentation.