Installing IBM Cloud Pak for Data

A Red Hat® OpenShift® Container Platform cluster administrator and project administrator can work together to prepare the cluster and install IBM® Cloud Pak for Data.

Before you begin

Ensure that you review the following information before you install Cloud Pak for Data:
- Planning
- System requirements
  Ensure that you install the software on a system that has sufficient resources and that aligns with the guidance in the System requirements. For example, if you do not follow the specified disk requirements, you can run into out-of-memory errors.
- Determine which services you want to install.
  Some of the pre-installation tasks, such as creating catalog sources and operator subscriptions, include steps for the services as well as for the Cloud Pak for Data platform. If you know which services you plan to install, you can streamline your installation by batching these tasks.

Use the following information to ensure that you complete the appropriate tasks for your environment.

1. Do you have an existing Red Hat OpenShift Container Platform cluster?

Cloud Pak for Data is installed on a Red Hat OpenShift Container Platform Version 4.6 or Version 4.8 cluster.

Options	What to do
You already have an OpenShift 4.6 or 4.8 cluster	Go to 3. Do you already have supported persistent storage on your cluster?
You have an older version of OpenShift	Upgrade your cluster. For details, see the Red Hat OpenShift Container Platform documentation. Then, go to 3. Do you already have supported persistent storage on your cluster?
You don't have an OpenShift cluster	Decide where you want to host your Cloud Pak for Data installation. Go to 2. Where do you want to host your Cloud Pak for Data installation?
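The decision in this step can be expressed as a small check. The following Python sketch assumes that you already have the cluster version string (for example, from the VERSION column of `oc get clusterversion`). The function names are hypothetical, and versions that are neither older than 4.6 nor listed in this step (such as 4.7) are deliberately left to the System requirements rather than guessed.

```python
from typing import Optional, Tuple

# OpenShift minor releases named in this step (4.6 and 4.8).
SUPPORTED_MINORS = {(4, 6), (4, 8)}
STEP_2 = "Go to 2. Where do you want to host your Cloud Pak for Data installation?"
STEP_3 = "Go to 3. Do you already have supported persistent storage on your cluster?"


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as "4.8.12"."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def openshift_next_step(version: Optional[str]) -> str:
    """Map the cluster's OpenShift version (None = no cluster) to the next step."""
    if version is None:
        return STEP_2
    current = major_minor(version)
    if current in SUPPORTED_MINORS:
        return STEP_3
    if current < min(SUPPORTED_MINORS):
        # Older than 4.6: upgrade first, as the table says.
        return "Upgrade your cluster first. " + STEP_3
    return "Not covered by this step: check the System requirements."


if __name__ == "__main__":
    for v in (None, "4.8.12", "4.5.3"):
        print(v, "->", openshift_next_step(v))
```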

2. Where do you want to host your Cloud Pak for Data installation?

You can deploy Cloud Pak for Data on-premises or on the cloud. Your deployment environment determines how you can install Red Hat OpenShift Container Platform:

Options	What to do
You want to deploy Cloud Pak for Data on-premises	Follow the Red Hat OpenShift Container Platform documentation for Version 4.6 or 4.8 to install OpenShift. Additional guidance on setting up OpenShift is available in the IBM Cloud Paks documentation. Alternative: If you don't have existing hardware, you can purchase IBM Cloud Pak for Data System, which comes with Red Hat OpenShift Container Platform and Cloud Pak for Data already installed. Go to 3. Do you already have supported persistent storage on your cluster?
You want to deploy Cloud Pak for Data on cloud	Decide which cloud provider you want to use. Decide how you want to install and manage Red Hat OpenShift Container Platform. For details, see Installing Red Hat OpenShift Container Platform. Go to 3. Do you already have supported persistent storage on your cluster?

3. Do you already have supported persistent storage on your cluster?

The Cloud Pak for Data platform supports the following storage:

- Red Hat OpenShift Container Storage: Version 4.6 or later fixes. Available in the IBM Storage Suite for IBM Cloud® Paks.
- IBM Spectrum® Fusion: Version 2.1.2 or later fixes.
- IBM Spectrum Scale Container Native storage: IBM Spectrum Scale Container Native Storage Access Version 5.1.1.3 or later fixes; Container Storage Interface Version 2.3.0 or later fixes. Available in the IBM Storage Suite for IBM Cloud Paks.
- Network File System (NFS): Version 4.
- Portworx: Version 2.7.0 or later fixes.
- IBM Cloud File Storage: Version not applicable.

Ensure that you have storage that works with the services that you plan to install.

Options	What to do
You have the supported storage	Review Setting up shared persistent storage to determine whether you need to complete any additional tasks to configure the storage for Cloud Pak for Data. Go to 4. Do you have the required OpenShift projects on your cluster?
You don't have supported storage	Decide which storage you want to use. Ensure that you choose storage that works with the services that you plan to install. Follow the guidance in Setting up shared persistent storage for installing and configuring the storage. Go to 4. Do you have the required OpenShift projects on your cluster?
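As a rough aid, the platform minimums listed above can be encoded in a lookup table. This Python sketch covers only the platform list in this step, not the per-service storage requirements. The dictionary keys are labels chosen for the sketch, versions are compared numerically part by part, and treating any NFS 4.x revision as "Version 4" is an assumption.

```python
from typing import Dict, Optional, Tuple

# Platform minimums from the list above; None means no version applies.
MIN_STORAGE_VERSIONS: Dict[str, Optional[str]] = {
    "Red Hat OpenShift Container Storage": "4.6",
    "IBM Spectrum Fusion": "2.1.2",
    "IBM Spectrum Scale Container Native Storage Access": "5.1.1.3",
    "IBM Spectrum Scale Container Storage Interface": "2.3.0",
    "Portworx": "2.7.0",
    "IBM Cloud File Storage": None,
}
NFS = "Network File System (NFS)"


def as_tuple(version: str) -> Tuple[int, ...]:
    """Turn "2.1.10" into (2, 1, 10) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def storage_is_supported(storage: str, version: Optional[str] = None) -> bool:
    """Return True if the storage and version meet the platform list in this step."""
    if storage == NFS:
        # Assumption: any NFS 4.x revision counts as "Version 4".
        return version is not None and as_tuple(version)[0] == 4
    if storage not in MIN_STORAGE_VERSIONS:
        return False
    minimum = MIN_STORAGE_VERSIONS[storage]
    if minimum is None:
        return True
    return version is not None and as_tuple(version) >= as_tuple(minimum)
```

For IBM Spectrum Scale, both the Container Native Storage Access entry and the Container Storage Interface entry would need to pass.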

4. Do you have the required OpenShift projects on your cluster?

At a minimum, you need two projects: one where you install the Cloud Pak for Data operators and the service operators, and one where you install an instance of Cloud Pak for Data. You might need additional projects if you want to:

- Separate the IBM Cloud Pak® foundational services operators from the Cloud Pak for Data operators
- Install multiple instances of Cloud Pak for Data on the cluster

Options	What to do
You know which projects you plan to use when you install the software	Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to ensure that you have the necessary projects on your cluster and to determine whether you need to create operator groups for the projects. Go to 5. Do you have your API key?
You don't know which projects you plan to use when you install the software	Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to determine which projects you need to create on your cluster and to set up the required operator groups for the projects. Go to 5. Do you have your API key?
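Setting up projects (namespaces) on Red Hat OpenShift Container Platform is the authority on which operator groups you need. If it tells you to create one, the object is a standard Operator Lifecycle Manager `OperatorGroup`. The Python sketch below shows only the shape of that manifest; the project name `cpd-operators` and the operator group name are hypothetical placeholders, and the target namespaces you choose depend on that guidance.

```python
import json
from typing import List


def operator_group(name: str, namespace: str, target_namespaces: List[str]) -> dict:
    """Build an OLM OperatorGroup manifest as a plain dictionary."""
    return {
        "apiVersion": "operators.coreos.com/v1",
        "kind": "OperatorGroup",
        "metadata": {"name": name, "namespace": namespace},
        # Namespaces that operators installed in `namespace` are allowed to watch.
        "spec": {"targetNamespaces": list(target_namespaces)},
    }


if __name__ == "__main__":
    # Hypothetical project and operator group names.
    project = "cpd-operators"
    print(json.dumps(operator_group("operatorgroup", project, [project]), indent=2))
```

The printed JSON can be piped to `oc apply -f -`.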

5. Do you have your API key?

The Cloud Pak for Data software images are hosted on the IBM Entitled Registry. To access the images, you must have your IBM entitlement API key.

Options	What to do
You have your API key	Go to 6. How are you going to access the required software images?
You don't have your API key	Follow the guidance in Obtaining your IBM entitlement API key. Go to 6. How are you going to access the required software images?

6. How are you going to access the required software images?

Cloud Pak for Data images are accessible from the IBM Entitled Registry. In most situations, it is strongly recommended that you mirror the necessary software images from the IBM Entitled Registry to a private container registry.

The only situation in which you might consider pulling images directly from the IBM Entitled Registry is when your cluster is not air-gapped, your network is extremely reliable, and latency is not a concern. However, for predictable and reliable performance, you should mirror the images to a private container registry.

Important:

You must mirror the necessary images to your private container registry in the following situations:

- Your cluster is air-gapped (also called an offline or disconnected cluster).
- Your cluster uses an allowlist to permit direct access by specific sites, and the allowlist does not include the IBM Entitled Registry.
- Your cluster uses a blocklist to prevent direct access by specific sites, and the blocklist includes the IBM Entitled Registry.

Options	What to do
You are pulling images from the IBM Entitled Registry	Go to 7. Configuring your cluster to pull software images.
You are mirroring images to a private container registry	Review the guidance in Mirroring images to your private container registry to ensure that you have a private container registry that meets the minimum requirements. Determine how you will mirror the images and complete the appropriate task: Mirroring images with a bastion node or Mirroring images with an intermediary container registry. Go to 7. Configuring your cluster to pull software images.
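The mirroring guidance in this step reduces to a short decision rule. The following Python sketch encodes it; the `ClusterNetwork` fields are hypothetical names for the conditions described above, and the defaults assume the less favorable case (latency matters and the network is not known to be extremely reliable).

```python
from dataclasses import dataclass


@dataclass
class ClusterNetwork:
    """Hypothetical description of the cluster's network situation."""
    air_gapped: bool
    uses_allowlist: bool = False
    allowlist_includes_entitled_registry: bool = False
    uses_blocklist: bool = False
    blocklist_includes_entitled_registry: bool = False
    network_extremely_reliable: bool = False
    latency_is_a_concern: bool = True


def must_mirror(net: ClusterNetwork) -> bool:
    """True when any of the three situations in the Important note applies."""
    return (
        net.air_gapped
        or (net.uses_allowlist and not net.allowlist_includes_entitled_registry)
        or (net.uses_blocklist and net.blocklist_includes_entitled_registry)
    )


def image_source(net: ClusterNetwork) -> str:
    """Return where the cluster should get Cloud Pak for Data images."""
    if must_mirror(net):
        return "mirror (required)"
    # Direct pulls are worth considering only when the cluster is not air-gapped,
    # the network is extremely reliable, and latency is not a concern.
    if net.network_extremely_reliable and not net.latency_is_a_concern:
        return "pull directly (possible, but mirroring is still recommended)"
    return "mirror (recommended)"
```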

7. Configuring your cluster to pull software images

You must ensure that your cluster is configured to pull the software images from the appropriate location.

Options	What to do
You are pulling images from the IBM Entitled Registry	Follow the guidance in Configuring your cluster to pull Cloud Pak for Data images to configure the global image pull secret to include your IBM entitlement API key. Go to 8. Creating catalog sources.
You are pulling images from a private container registry	Follow the guidance in Configuring your cluster to pull Cloud Pak for Data images to configure the global image pull secret to include the credentials of an account that can pull images from the registry, and to configure an image content source policy. Go to 8. Creating catalog sources.
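The global image pull secret is the `pull-secret` secret in the `openshift-config` project, and its `.dockerconfigjson` value maps each registry host to a base64-encoded `user:password` token. The Python sketch below merges one more registry entry into that document. The IBM Entitled Registry host `cp.icr.io` and the user name `cp` are assumptions to confirm in Configuring your cluster to pull Cloud Pak for Data images; for a private container registry, pass its host and the credentials of your pull account instead.

```python
import base64
import copy
import json


def add_registry_auth(dockerconfig: dict, registry: str, username: str, password: str) -> dict:
    """Return a copy of a .dockerconfigjson document with one registry entry added or replaced."""
    merged = copy.deepcopy(dockerconfig)  # never mutate the caller's document
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    merged.setdefault("auths", {})[registry] = {"auth": token}
    return merged


if __name__ == "__main__":
    # Stand-in for the decoded value of the existing global pull secret.
    existing = {"auths": {"quay.io": {"auth": "<existing-token>"}}}
    # Assumed Entitled Registry host and user name; the API key is a placeholder.
    updated = add_registry_auth(existing, "cp.icr.io", "cp", "<entitlement-api-key>")
    print(json.dumps(updated, indent=2))
```

A typical round trip is to export the current secret, merge the new entry, and write the result back with `oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<file>`. Keep every existing entry, because the cluster still needs them.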

8. Creating catalog sources

You must create catalog sources to ensure that your cluster uses the correct software images for your environment.

Options	What to do
You are pulling images from the IBM Entitled Registry	Review the guidance in Creating catalog sources to determine which method is appropriate for your environment. Go to 9. Are the IBM Cloud Pak foundational services already installed on your cluster?
You are pulling images from a private container registry	Follow the guidance in Creating catalog sources for a private container registry. Go to 9. Are the IBM Cloud Pak foundational services already installed on your cluster?
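Whichever method Creating catalog sources recommends for your environment, each catalog source is an Operator Lifecycle Manager `CatalogSource` object that points the cluster at a catalog image. The Python sketch below shows only that shape. The catalog name and image reference are hypothetical placeholders (with a private container registry, the image reference would use your registry host), and the polling interval is an illustrative value rather than an IBM recommendation.

```python
import json


def catalog_source(name: str, image: str, display_name: str,
                   publisher: str = "IBM",
                   namespace: str = "openshift-marketplace") -> dict:
    """Build an OLM CatalogSource manifest that serves a catalog image over gRPC."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "CatalogSource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "sourceType": "grpc",
            "image": image,
            "displayName": display_name,
            "publisher": publisher,
            # Illustrative value: how often OLM polls the registry for an updated catalog image.
            "updateStrategy": {"registryPoll": {"interval": "45m"}},
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image reference.
    manifest = catalog_source("example-catalog",
                              "registry.example.com/example/catalog:latest",
                              "Example catalog")
    print(json.dumps(manifest, indent=2))
```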

9. Are the IBM Cloud Pak foundational services already installed on your cluster?

The IBM Cloud Pak foundational services are a prerequisite for Cloud Pak for Data. However, in some situations the IBM Cloud Pak for Data platform operator can automatically install the IBM Cloud Pak foundational services operators and services on the cluster.

For information about supported versions of IBM Cloud Pak foundational services, see the Cloud Pak for Data platform software requirements.

Options	What to do
IBM Cloud Pak foundational services Version 3.18.0 or later is already installed	Go to 10. Creating operator subscriptions.
An earlier version of IBM Cloud Pak foundational services is installed	Follow the guidance in Installing IBM Cloud Pak foundational services. Go to 10. Creating operator subscriptions.
IBM Cloud Pak foundational services is not installed and you are using the express installation method	With the express installation method, all of the operators are in the same OpenShift project and the IBM Cloud Pak for Data platform operator can automatically install IBM Cloud Pak foundational services. Go to 10. Creating operator subscriptions.
IBM Cloud Pak foundational services is not installed and you are using the specialized installation method	With the specialized installation method, the IBM Cloud Pak foundational services operators and the Cloud Pak for Data operators are in separate OpenShift projects. To ensure IBM Cloud Pak foundational services is installed in the correct project, you must manually install it. Follow the guidance in Installing IBM Cloud Pak foundational services. Go to 10. Creating operator subscriptions.
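The table above can be read as a function of the installed version and the installation method. This Python sketch takes the 3.18.0 threshold from the table; the method names `express` and `specialized` and the returned strings are illustrative.

```python
from typing import Optional, Tuple

MINIMUM_VERSION = (3, 18, 0)  # from the table above
GO_TO_10 = "Go to 10. Creating operator subscriptions."
INSTALL_FIRST = "Follow Installing IBM Cloud Pak foundational services, then go to 10."


def as_tuple(version: str) -> Tuple[int, ...]:
    """Turn "3.18.0" into (3, 18, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def foundational_services_step(installed_version: Optional[str], method: str) -> str:
    """installed_version is None when foundational services are not installed."""
    if installed_version is not None:
        if as_tuple(installed_version) >= MINIMUM_VERSION:
            return GO_TO_10
        return INSTALL_FIRST  # an earlier version is installed
    if method == "express":
        # All operators share one project, so the platform operator can
        # install foundational services automatically.
        return GO_TO_10
    if method == "specialized":
        # Separate projects: install foundational services manually in the correct project.
        return INSTALL_FIRST
    raise ValueError(f"unknown installation method: {method!r}")
```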

10. Creating operator subscriptions

An operator subscription tells the cluster where to install a given operator and gives information about the operator to Operator Lifecycle Manager (OLM).

Complete the appropriate steps for your environment in Creating operator subscriptions.
Go to 11. Do you plan to install services that require custom SCCs?
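An operator subscription is an Operator Lifecycle Manager `Subscription` object that names the operator package, the channel to follow, and the catalog source to install it from. The Python sketch below builds one. Every value in the example (`cpd-operators`, `example-operator`, `example-channel`, `example-catalog`) is a hypothetical placeholder; the real package names, channels, and catalog sources are in Creating operator subscriptions.

```python
import json


def operator_subscription(name: str, namespace: str, package: str, channel: str,
                          source: str, source_namespace: str = "openshift-marketplace",
                          approval: str = "Automatic") -> dict:
    """Build an OLM Subscription manifest as a plain dictionary."""
    if approval not in ("Automatic", "Manual"):
        raise ValueError("installPlanApproval must be 'Automatic' or 'Manual'")
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        # The subscription's namespace is where OLM installs the operator.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": package,                 # operator package in the catalog
            "channel": channel,              # update channel to follow
            "source": source,                # CatalogSource name
            "sourceNamespace": source_namespace,
            "installPlanApproval": approval,
        },
    }


if __name__ == "__main__":
    # Hypothetical placeholder values.
    sub = operator_subscription("example-operator-sub", "cpd-operators",
                                "example-operator", "example-channel", "example-catalog")
    print(json.dumps(sub, indent=2))
```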

11. Do you plan to install services that require custom SCCs?

The following services require custom security context constraints (SCCs):

- Data Virtualization
- Db2®
- Db2 Big SQL
- Db2 Warehouse
- OpenPages®
- Watson Knowledge Catalog

Options	What to do
You plan to install one or more of these services	Create the appropriate SCCs for your environment. For details, see Custom security context constraints for services. Go to 12. Do you plan to install services that require specific node settings?
You don't plan to install any of these services	Go to 12. Do you plan to install services that require specific node settings?
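If you already have a list of planned services, checking it against the list in this step is a set intersection. In this Python sketch the service names are copied from the list without trademark symbols, and the function is a hypothetical helper.

```python
from typing import Iterable, List

SERVICES_NEEDING_CUSTOM_SCCS = frozenset({
    "Data Virtualization",
    "Db2",
    "Db2 Big SQL",
    "Db2 Warehouse",
    "OpenPages",
    "Watson Knowledge Catalog",
})


def services_needing_custom_sccs(planned: Iterable[str]) -> List[str]:
    """Return the planned services that need custom SCCs, sorted by name."""
    return sorted(SERVICES_NEEDING_CUSTOM_SCCS.intersection(planned))


if __name__ == "__main__":
    print(services_needing_custom_sccs(["Watson Studio", "OpenPages", "Db2"]))
```

An empty result means you can go straight to step 12.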

12. Do you plan to install services that require specific node settings?

The following services require specific node settings:

- Data Virtualization
- Db2
- Db2 Big SQL
- Db2 Warehouse
- Jupyter Notebooks with Python 3.7 for GPU
- OpenPages
- Watson Discovery
- Watson Knowledge Catalog
- Watson Machine Learning Accelerator
- Watson Studio

You might also need to adjust some node settings if you are working with large data sets or you have slower network speeds.

Options	What to do
You plan to install one or more of these services, you have large data sets, or you have slower network speeds	Change the appropriate node settings. For details, see Changing required node settings. Go to 13. Do you need to install the scheduling service?
You don't plan to install any of these services, and you don't have large data sets or slower network speeds	Go to 13. Do you need to install the scheduling service?
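The node-settings decision combines the service list with the two environmental conditions. As with a simple service check, this Python sketch uses names copied from the list above; the boolean flags are hypothetical stand-ins for your own judgment about data set size and network speed.

```python
from typing import Iterable

SERVICES_NEEDING_NODE_SETTINGS = frozenset({
    "Data Virtualization",
    "Db2",
    "Db2 Big SQL",
    "Db2 Warehouse",
    "Jupyter Notebooks with Python 3.7 for GPU",
    "OpenPages",
    "Watson Discovery",
    "Watson Knowledge Catalog",
    "Watson Machine Learning Accelerator",
    "Watson Studio",
})


def needs_node_settings(planned: Iterable[str], large_data_sets: bool = False,
                        slow_network: bool = False) -> bool:
    """True if any planned service, or the environment, calls for changed node settings."""
    return (bool(SERVICES_NEEDING_NODE_SETTINGS.intersection(planned))
            or large_data_sets
            or slow_network)
```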

13. Do you need to install the scheduling service?

The scheduling service is required if you plan to install Watson Machine Learning Accelerator.

Even if you don't plan to install Watson Machine Learning Accelerator, it is strongly recommended that you install the scheduling service so that you can programmatically enforce the quotas that you set on the platform and on individual services.

Options	What to do
You need to install the scheduling service	Follow the guidance in Installing the scheduling service. Go to 14. Installing Cloud Pak for Data.
You don't plan to install the scheduling service	Go to 14. Installing Cloud Pak for Data.

14. Installing Cloud Pak for Data

Depending on the number of OpenShift projects you created, you can install one or more instances of Cloud Pak for Data on your cluster.
Go to 15. Completing post-installation tasks.

15. Completing post-installation tasks

After you install Cloud Pak for Data, make sure that your cluster is secure, and complete the tasks that affect how users interact with Cloud Pak for Data, such as configuring SSO or changing the route to the platform.

Complete the appropriate tasks for your environment in Post-installation tasks.
Go to 16. Installing services.

16. Installing services

You are ready to install services on your cluster.

Instructions for installing IBM services are available in Services.