Hardware requirements
Before you install IBM® Cloud Pak for Data, review the hardware requirements for the control plane, the shared cluster components, and the services that you plan to install.
Cloud Pak for Data platform hardware requirements
You must install Cloud Pak for Data on a Red Hat® OpenShift® Container Platform cluster. For information about the supported versions of Red Hat OpenShift Container Platform, see Software requirements.
It is strongly recommended that you deploy Cloud Pak for Data on a highly available cluster.
The following requirements are the minimum recommendations for a small, stable deployment of Cloud Pak for Data. Use the minimum recommended configuration as a starting point for your cluster configuration. If you use fewer resources, you are likely to encounter stability problems. (If you are sizing a cluster for a proof of concept or for very limited use, you can use the guidance in Limited-use configurations for Cloud Pak for Data.)
The following configuration has been tested and validated by IBM. However, Red Hat OpenShift Container Platform supports other configurations. If the configuration in the following table does not work in your environment, you can adapt the configuration based on the guidance in the Red Hat OpenShift documentation. (Links to the relevant Red Hat OpenShift documentation are available in Software requirements.) In general, Cloud Pak for Data is primarily concerned with the resources that are available on your worker nodes.
In addition to these minimum requirements, the amount of vCPU and memory that your cluster needs depends on:
- The shared components that you need to install
- The services that you plan to install
The sizing requirements for services are available in Service hardware requirements. If you install only a few services with small vCPU and memory requirements, you might not need additional resources. However, if you plan to install multiple services or services with large footprints, add the appropriate amount of vCPU and memory to the minimum recommendations below.
- The types of workloads that you plan to run
For example, if you plan to run complex analytics workloads in addition to other resource-intensive workloads, such as ETL jobs, you can expect reduced concurrency levels unless you add computing power to your cluster.
Because workloads vary based on a number of factors, use measurements from running real workloads with realistic data to size your cluster.
Node role | Hardware | Number of servers | Minimum available vCPU | Minimum memory | Minimum storage |
---|---|---|---|---|---|
Master + infra | | 3 master (for high availability) and 3 infrastructure on the same 3 nodes | 8 vCPU per node | 32 GB RAM per node | No additional storage is needed. For sizing guidance, refer to the Red Hat OpenShift Container Platform documentation. |
Worker/compute | | 3+ worker/compute nodes | 16 vCPU per node | | 300 GB of storage space per node for storing container images locally. See Cloud Pak for Data platform storage requirements for details. |
Load balancer | | 2 load balancer nodes | 4 vCPU per node | 4 GB RAM per node. Add another 4 GB of RAM for access restrictions and security control. | Add 100 GB of root storage for access restrictions and security control. |
- s390x hardware
- s390x is supported only on Red Hat OpenShift Container Platform Version 4.8.
Not all services support s390x. For details, see Service hardware requirements.
- POWER® hardware
- POWER is supported only on Red Hat OpenShift Container Platform Version 4.8.
The platform supports Power® 9 and Power 10, but does not take advantage of Power 10 optimizations.
Not all services support POWER. Some services that support POWER support only Power 9 or Power 10. For details, see Service hardware requirements.
On POWER hardware the maximum supported configuration for each worker node is:
- 160 vCPU
- 512 GB RAM
- Load balancer
- A load balancer is required when using three master nodes. The load balancer distributes the traffic load of the master and proxy nodes, securely isolates the master and compute node IP addresses, and facilitates external communication, including accessing the management console and API or making other requests to the master and proxy nodes.
- Bastion node
- You can optionally use a bastion node to download the required software images. For more information about the options that are available for accessing the software images, see Mirroring images to your container registry.
If you choose to use a bastion node, use an x86-64 system with Red Hat Enterprise Linux®. The bastion node must be able to run the following software:
Prerequisite | Purpose |
---|---|
OpenShift CLI | Required to interact with your Red Hat OpenShift Container Platform cluster. |
IBM Cloud Pak® CLI (cloudctl) | Required to download images from the IBM Entitled Registry. |
httpd-tools | Required to run the IBM Cloud Pak CLI (cloudctl). |
skopeo (Version 1.2.0 or later) | Required to run the IBM Cloud Pak CLI (cloudctl). |

For details on installing this software on your bastion node, see Mirroring images with a bastion node.
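For illustration, the following commands are one way to confirm that these prerequisites are in place on a Red Hat Enterprise Linux bastion node. The httpd-tools and skopeo package names are standard RHEL packages; the oc and cloudctl binaries are assumed to have been downloaded and placed on your PATH already.

```sh
# Install httpd-tools and skopeo from the RHEL repositories (requires root).
sudo dnf install -y httpd-tools skopeo

# Confirm that skopeo meets the 1.2.0 minimum version requirement.
skopeo --version

# Confirm that the OpenShift CLI and the IBM Cloud Pak CLI are on the PATH.
oc version --client
cloudctl version
```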
Cluster node settings
The time on all of the nodes must be synchronized within 500 ms.
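For example, on nodes that keep time with chrony, you can spot-check the current offset with chronyc. This is a verification sketch only; how you configure time synchronization depends on your environment.

```sh
# Report clock statistics on a node that runs chronyd.
# The "System time" line should show an offset well under 500 ms.
chronyc tracking
```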
To ensure that services can run correctly on your cluster, verify that your nodes meet the requirements specified in Changing required node settings. You must change the node settings before you install Cloud Pak for Data.
Installation node
You can run the Cloud Pak for Data installation from a workstation that can connect to the Red Hat OpenShift Container Platform cluster. The workstation does not have to be part of the cluster. However, you must be able to run oc login from the workstation to connect to the cluster.
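For example, a login from the installation workstation might look like the following. The API server URL, user name, and credentials are placeholders for your own values.

```sh
# Log in with a user name and password.
oc login https://api.mycluster.example.com:6443 -u kubeadmin -p '<password>'

# Or log in with a token that you obtain from the web console.
oc login https://api.mycluster.example.com:6443 --token='<token>'
```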
- IBM Entitled Registry
- You can use any workstation, provided that:
  - You can install the Red Hat OpenShift Container Platform CLI
  - The workstation can connect to the internet and to the cluster
- Private container registry (recommended)
- You must use a Linux x86-64 system with Red Hat Enterprise Linux. For additional requirements, see:
Disk requirements
To prepare your storage disks, verify that they provide good I/O performance and, if needed, prepare them for encryption.
- I/O performance
- When I/O performance is not sufficient, services can experience poor performance or cluster instability, such as functional failures with timeouts. This is especially true when you are running a heavy workload.
The I/O performance requirements for Cloud Pak for Data are based on extensive testing in various cloud environments. The tests validate the I/O performance in these environments. The requirements are based on the performance of writing data to representative storage classes using the following block size and thread count combinations:
- To evaluate disk latency, the I/O tests use a small block (4 KB) with 8 threads
- To evaluate disk throughput, the I/O tests use a large block (1 GB) with 2 threads
To evaluate the storage performance on the cluster where you plan to install Cloud Pak for Data, run the Cloud Pak for Data storage performance validation playbook. Ensure that the results are comparable to the following recommended minimum values:
- Disk latency (4 KB block with 8 threads)
- For disk latency tests, 18 MB/s has been found to provide sufficient performance.
- Disk throughput (1 GB block with 2 threads)
- For disk throughput tests, 226 MB/s has been found to provide sufficient performance.
To ensure sufficient performance, both requirements should be satisfied.
Some storage types might have more stringent I/O requirements. For details, see Storage considerations.
Important: It is recommended that you run the validation playbook several times to account for variations in workloads, access patterns, and network traffic. In addition, if your storage volumes are remote, network speed can be a key factor in your I/O performance. For good I/O performance, ensure that you have sufficient network speed, as described in Storage considerations.
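If you want a quick manual spot check in addition to the playbook, a general-purpose tool such as fio can approximate the two test profiles that are described above. The following invocations are a sketch, not the validation playbook itself; /mnt/testpv is a placeholder for a mount point that is backed by the storage class under test.

```sh
# Latency-oriented profile: 4 KB blocks, 8 threads, direct writes.
fio --name=latency-test --directory=/mnt/testpv --rw=write \
    --bs=4k --numjobs=8 --size=1g --ioengine=libaio --direct=1 \
    --group_reporting

# Throughput-oriented profile: 1 GB blocks, 2 threads, direct writes.
fio --name=throughput-test --directory=/mnt/testpv --rw=write \
    --bs=1g --numjobs=2 --size=2g --ioengine=libaio --direct=1 \
    --group_reporting
```

Compare the reported write bandwidth against the recommended minimum values above (18 MB/s for the latency profile, 226 MB/s for the throughput profile).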
- Encryption with Linux Unified Key Setup
- To ensure that your data within Cloud Pak for Data is stored securely, you can encrypt your disks. If you use Linux Unified Key Setup-on-disk-format (LUKS), you must enable LUKS when you install Red Hat OpenShift Container Platform. For more information, see Encrypting disks during installation in the Red Hat OpenShift Container Platform documentation.
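As a sketch of what this looks like in practice, recent OpenShift versions let you enable LUKS by generating a MachineConfig from a Butane configuration before you start the installation. The following example enables TPM2-backed encryption of the root disk on worker nodes; treat it as illustrative only, and follow the Red Hat documentation for the exact variant and version values that match your cluster.

```yaml
# Butane sketch for TPM2-backed LUKS encryption of worker root disks.
# Process with: butane worker-luks.bu -o 99-worker-luks.yaml
# and place the output in the installer's manifests directory.
variant: openshift
version: 4.8.0
metadata:
  name: worker-luks
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  luks:
    tpm2: true
```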
Service hardware requirements
Use this table to determine whether you have the appropriate hardware to install each service that you want to use.
Service | x86-64 | POWER | Z | vCPU | Memory | Storage | Notes and additional requirements |
---|---|---|---|---|---|---|---|
Anaconda Repository for IBM Cloud Pak for Data | ✓ | | | 4 vCPU | 8 GB RAM | 500 GB | This service cannot be installed on your Red Hat OpenShift cluster. For details, see the Anaconda documentation. |
Analytics Engine Powered by Apache Spark | ✓ | ✓ | | | | Local disk storage (SSDs) on OpenShift nodes. | Spark jobs use emptyDir volumes for temporary storage and shuffling. If your Spark jobs use a lot of disk space for temporary storage or shuffling, make sure that you have sufficient space on the local disk where emptyDir volumes are created. On OpenShift 4.6, the recommended location is a partition in /var/lib. For details, see Understanding ephemeral storage. If you don't have sufficient space on the local disk, Spark jobs might run slowly and some of the executors might evict jobs. A minimum of 50 GB of temporary storage for each vCPU request is recommended. (A sketch of an emptyDir volume definition appears after this table.) Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Cognos® Analytics | ✓ | | | | | | When you provision the Cognos Analytics service, you specify the size of the instance. The information here is for the smallest instance. For other sizes, see Provisioning the Cognos Analytics service. |
Cognos Dashboards | ✓ | | | | | Not applicable | Minimum resources for an installation with a single replica per service. |
Data Privacy | ✓ | | | | | Not applicable | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Data Refinery | ✓ | | | | | Not applicable | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. This service is installed when you install Watson™ Knowledge Catalog or Watson Studio. |
Data Virtualization | ✓ | | | | | 220 GB (default) | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. When you provision the service, you can specify: |
DataStage® | ✓ | | | | | 300 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Db2® | ✓ | ✓* | ✓ | | | 200 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. A dedicated node is recommended for production deployments of Db2. For details, see Setting up dedicated nodes. * Support includes Power 9 and Power 10. However, this software does not take advantage of Power 10 optimizations. |
Db2 Big SQL | ✓ | | | | | 410 GB (default) | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. When you provision the service, you can specify: |
Db2 Data Gate | ✓ | ✓ | | | | 50 GB per instance | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Db2 Data Management Console | ✓ | ✓* | ✓ | | | 10 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. For information on sizing the provisioned instance, see Provisioning the service. * Support includes Power 9 and Power 10. However, this software does not take advantage of Power 10 optimizations. |
Db2 Event Store | ✓ | | | 8 vCPU per node | 64 GB RAM per node | | Each Db2 Event Store database must have its own set of dedicated nodes. |
Db2 Warehouse | ✓ | ✓* | ✓ | | | 200 GB | Minimum resources for an installation with a single replica per service. Use dedicated nodes. For details, see Setting up dedicated nodes. * Support includes Power 9 and Power 10. However, this software does not take advantage of Power 10 optimizations. |
Decision Optimization | ✓ | | | | | 12 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
EDB Postgres | ✓ | | | | | 100 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Execution Engine for Apache Hadoop | ✓ | ✓ | | | | 2 GB per image pushed | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. Each image that is pushed to the remote Hadoop cluster requires disk space where the image tgz file can be stored. Execution Engine for Apache Hadoop requires an Execution Engine for Hadoop RPM installation on the Apache Hadoop or IBM Spectrum® Conductor cluster. For details, see: |
Financial Services Workbench | ✓ | | | This information is not currently available. | This information is not currently available. | This information is not currently available. | Requires 3 or more nodes. For details, see the Financial Services Workbench documentation. |
Guardium® External S-TAP® | ✓ | | | | | | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
IBM Match 360 with Watson | ✓ | | | | | 190 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Informix® | ✓ | | | | | 20 GB | Minimum resources for an installation with a single replica per service. |
MongoDB | ✓ | | | | | 100 GB | Minimum resources for an installation with a single replica per service. Dedicated nodes are recommended. For details, see Setting up dedicated nodes. |
Open Data for Industries | ✓ | | | 48 vCPU | 144 GB RAM | 4 TB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. The minimum requirement of 48 vCPU and 144 GB RAM applies only to worker nodes. |
OpenPages® | ✓ | | | | | 250 GB | When you provision the OpenPages service, you specify the size of the instance and the storage class to use. You also specify whether to use the database that is provided with the OpenPages service or a database that is on an external server. These values represent the minimum resources for OpenPages with a Db2 database on Cloud Pak for Data. |
Planning Analytics | ✓ | | | | | 20 GB | Work with IBM Sales to get a more accurate sizing based on your expected workload. Select the size of your instance when you provision Planning Analytics. For details, see Provisioning the Planning Analytics service. |
Product Master | ✓ | | | | | 200 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
RStudio® Server with R 3.6 | ✓ | | | | | Not applicable | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
SPSS® Modeler | ✓ | | | | | Not applicable | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Virtual Data Pipeline | ✓ | | | This information is not currently available. | This information is not currently available. | This information is not currently available. | This service cannot be installed on your Red Hat OpenShift cluster. For details, see Installing Virtual Data Pipeline. |
Voice Gateway | ✓ | | | | | Not applicable | Minimum resources for a system that can provide voice-only support for up to 11 concurrent calls. Work with IBM Sales to get a more accurate sizing based on your expected workload. Dedicated nodes are recommended for production environments. |
Watson Assistant | ✓ | | | | | 282 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. Your hardware must meet the following additional requirements: |
Watson Assistant for Voice Interaction | | | | | | | Watson Assistant for Voice Interaction is composed of multiple services. Refer to the system requirements for the services that you plan to install. |
Watson Discovery | ✓ | | | | | 508 GB | Development installations have a single replica per service. Production installations have multiple replicas per service. CPUs must support the AVX2 instruction set. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Watson Knowledge Catalog | ✓ | | | 34 vCPU required to operate a small service. | 136 GB required to operate a small service. | 900 GB | Minimum resources to install and operate the service with the smallest possible configuration. Work with IBM Sales to get a more accurate sizing based on your expected workload. If Data Refinery is not installed, add the vCPU and memory required for Data Refinery to the information listed for Watson Knowledge Catalog. |
Watson Knowledge Studio | ✓ | | | | | 360 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Watson Machine Learning | ✓ | | ✓* | | | 150 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. AVX2 requirements: * For a list of the features that are available on s390x hardware (IBM Z® and LinuxONE), see Capabilities on IBM Z. |
Watson Machine Learning Accelerator | ✓ | ✓* | | | | 120 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. GPU support is limited to NVIDIA V100, A100, and T4 GPUs. * Power support is limited to Power 9. Power 10 is not supported. |
Watson OpenScale | ✓ | ✓ | | | | 100 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. |
Watson Speech to Text | ✓ | | | | | 900 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. CPUs must support the AVX2 instruction set. |
Watson Studio | ✓ | | ✓* | | | Not applicable | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. If Data Refinery is not installed, add the vCPU and memory required for Data Refinery to the information listed for Watson Studio. * For a list of the features that are available on s390x hardware (IBM Z and LinuxONE), see Capabilities on IBM Z. |
Watson Studio Runtimes | ✓ | | ✓* | | | Not applicable | Runtimes use on-demand vCPU and memory. Watson Studio Runtimes includes the following runtimes: * For a list of the features that are available on s390x hardware (IBM Z and LinuxONE), see Capabilities on IBM Z. |
Watson Text to Speech | ✓ | | | | | 900 GB | Minimum resources for an installation with a single replica per service. Work with IBM Sales to get a more accurate sizing based on your expected workload. CPUs must support the AVX2 instruction set. |
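To make the Analytics Engine note about emptyDir volumes concrete, the following pod fragment is a hypothetical sketch of how an emptyDir scratch volume with a size limit is declared in Kubernetes. Analytics Engine creates these volumes for you; the image name, mount path, and size are placeholders for illustration only.

```yaml
# Hypothetical pod fragment showing an emptyDir scratch volume.
apiVersion: v1
kind: Pod
metadata:
  name: spark-executor-example
spec:
  containers:
  - name: executor
    image: example.com/spark-executor:latest  # placeholder image
    volumeMounts:
    - name: spark-scratch
      mountPath: /tmp/spark-scratch
  volumes:
  - name: spark-scratch
    emptyDir:
      sizeLimit: 50Gi  # aligns with the 50 GB per vCPU guidance for a 1 vCPU executor
```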