Installing IBM Cloud Pak for Data

A Red Hat® OpenShift® Container Platform cluster administrator and project administrator can work together to prepare the cluster and install IBM® Cloud Pak for Data.

Before you begin

  1. Ensure that you review the following information before you install Cloud Pak for Data:
  2. Determine which services you want to install.

    Some of the pre-installation tasks, such as creating catalog source and operator subscriptions, include steps for the services as well as the Cloud Pak for Data platform. If you know which services you plan to install, you can streamline your installation by batching these tasks.

  3. Use the following information to ensure that you complete the appropriate tasks for your environment.

1. Do you have an existing Red Hat OpenShift Container Platform cluster?

Cloud Pak for Data is installed on a Red Hat OpenShift Container Platform Version 4.6 or Version 4.8 cluster.

Options What to do
You already have an OpenShift 4.6 or 4.8 cluster
  1. Go to 3. Do you already have supported persistent storage on your cluster?
You have an older version of OpenShift
  1. Upgrade your cluster. For details, see the Red Hat OpenShift Container Platform documentation.
  2. Then, go to 3. Do you already have supported persistent storage on your cluster?
You don't have an OpenShift cluster
  1. Decide where you want to host your Cloud Pak for Data installation.
  2. Go to 2. Where do you want to host your Cloud Pak for Data installation?
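
The routing in this step can be sketched in Python. This is a minimal, hypothetical helper, not part of any IBM or Red Hat tooling. It assumes that you already have the cluster version as a string, for example the server version that `oc version` reports. It compares only the major and minor numbers against the 4.6 and 4.8 versions listed above.

```python
from typing import Optional, Tuple

# Versions that this page lists as supported for Cloud Pak for Data.
SUPPORTED_OPENSHIFT = {(4, 6), (4, 8)}


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '4.8.12' or 'v4.8.12'."""
    parts = version.strip().lstrip("v").split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected version string: {version!r}")
    return int(parts[0]), int(parts[1])


def is_supported_openshift(version: str) -> bool:
    """Return True if the cluster runs OpenShift 4.6.x or 4.8.x."""
    return major_minor(version) in SUPPORTED_OPENSHIFT


def step_one(version: Optional[str]) -> str:
    """Route through step 1. None means that there is no OpenShift cluster yet."""
    if version is None:
        return "Go to step 2: decide where to host Cloud Pak for Data."
    mm = major_minor(version)
    if mm in SUPPORTED_OPENSHIFT:
        return "Go to step 3: check persistent storage."
    if mm < (4, 8):
        # Older (or in-between) versions: upgrade first, as described above.
        return "Upgrade OpenShift to 4.6 or 4.8, then go to step 3."
    # This page does not describe what to do with newer versions.
    return "Version not covered by this page; check the platform requirements."


print(step_one("4.8.12"))
print(step_one("4.5.0"))
print(step_one(None))
```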

2. Where do you want to host your Cloud Pak for Data installation?

You can deploy Cloud Pak for Data on-premises or on the cloud. Your deployment environment determines how you can install Red Hat OpenShift Container Platform:

Options What to do
You want to deploy Cloud Pak for Data on-premises
  1. Follow the Red Hat OpenShift Container Platform 4.6 or 4.8 documentation to install OpenShift.

    Additional guidance on setting up OpenShift is available in the IBM Cloud Paks documentation.

    Alternative: If you don't have existing hardware, you can purchase IBM Cloud Pak for Data System, which comes with Red Hat OpenShift Container Platform and Cloud Pak for Data already installed.
  2. Go to 3. Do you already have supported persistent storage on your cluster?
You want to deploy Cloud Pak for Data on cloud
  1. Decide which cloud provider you want to use.
  2. Decide how you want to install and manage Red Hat OpenShift Container Platform. For details, see Installing Red Hat OpenShift Container Platform.
  3. Go to 3. Do you already have supported persistent storage on your cluster?

3. Do you already have supported persistent storage on your cluster?

The Cloud Pak for Data platform supports the following storage:
  • Red Hat OpenShift Container Storage: Version 4.6 or later fixes. Available in the IBM Storage Suite for IBM Cloud® Paks.
  • IBM Spectrum® Fusion: Version 2.1.2 or later fixes
  • IBM Spectrum Scale Container Native Storage: IBM Spectrum Scale Container Native Storage Access Version 5.1.1.3 or later fixes, and Container Storage Interface Version 2.3.0 or later fixes. Available in the IBM Storage Suite for IBM Cloud Paks.
  • Network File System (NFS): Version 4
  • Portworx: Version 2.7.0 or later fixes
  • IBM Cloud File Storage: Version not applicable
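
The minimum versions in the preceding list can be checked mechanically, as in the following hypothetical Python sketch. The dictionary keys are labels chosen for this example, and the two IBM Spectrum Scale components are tracked separately. Two points are assumptions: any NFS 4.x minor version is treated as "Version 4", and "or later fixes" is treated as "this version or higher". Passing this check covers only the platform; it does not confirm that the storage works with the services that you plan to install.

```python
from itertools import zip_longest
from typing import Optional, Tuple

# Minimum versions from the supported-storage list (keys are labels for this sketch).
MINIMUM_STORAGE_VERSIONS = {
    "Red Hat OpenShift Container Storage": "4.6",
    "IBM Spectrum Fusion": "2.1.2",
    "IBM Spectrum Scale Container Native Storage Access": "5.1.1.3",
    "IBM Spectrum Scale Container Storage Interface": "2.3.0",
    "Portworx": "2.7.0",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def version_at_least(actual: str, minimum: str) -> bool:
    """Compare dotted versions, treating missing trailing parts as 0."""
    for a, m in zip_longest(version_tuple(actual), version_tuple(minimum), fillvalue=0):
        if a != m:
            return a > m
    return True


def storage_supported(product: str, version: Optional[str] = None) -> bool:
    if product == "IBM Cloud File Storage":
        return True  # The list gives no version requirement.
    if product == "Network File System (NFS)":
        # Assumption: any NFS 4.x minor version counts as "Version 4".
        return version is not None and version_tuple(version)[0] == 4
    minimum = MINIMUM_STORAGE_VERSIONS.get(product)
    if minimum is None or version is None:
        return False  # Unknown product, or no version to compare.
    return version_at_least(version, minimum)
```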

Ensure that you have storage that works with the services that you plan to install.

Options What to do
You have the supported storage
  1. Review Setting up shared persistent storage to determine whether you need to complete any additional tasks to configure the storage for Cloud Pak for Data.
  2. Go to 4. Do you have the required OpenShift projects on your cluster?
You don't have supported storage
  1. Decide which storage you want to use. Ensure that you choose storage that works with the services that you plan to install.
  2. Follow the guidance in Setting up shared persistent storage for installing and configuring the storage.
  3. Go to 4. Do you have the required OpenShift projects on your cluster?

4. Do you have the required OpenShift projects on your cluster?

At a minimum, you need two projects: one where you will install the Cloud Pak for Data operators and the service operators, and one where you will install an instance of Cloud Pak for Data. You might need additional projects depending on whether you want to:
  • Separate the IBM Cloud Pak® foundational services operators from the Cloud Pak for Data operators
  • Install multiple instances of Cloud Pak for Data on the cluster
Options What to do
You know which projects you plan to use when you install the software
  1. Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to:
    • Ensure that you have the necessary projects on your cluster
    • Determine whether you need to create operator groups for the projects
  2. Go to 5. Do you have your API key?
You don't know which projects you plan to use when you install the software
  1. Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to:
    • Determine which projects you need to create on your cluster
    • Set up the required operator groups for the projects
  2. Go to 5. Do you have your API key?
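
If the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform indicates that a project needs an operator group, the object is a standard Operator Lifecycle Manager `OperatorGroup`. The following Python sketch builds one as JSON, which `oc apply -f -` accepts on standard input. The project and group names are hypothetical. Targeting only the operator group's own project is shown purely as an example; follow the linked guidance for the target namespaces that your installation needs.

```python
import json
from typing import Dict, List


def operator_group(name: str, namespace: str, target_namespaces: List[str]) -> Dict:
    """Build an OLM OperatorGroup in `namespace` that watches `target_namespaces`."""
    return {
        "apiVersion": "operators.coreos.com/v1",
        "kind": "OperatorGroup",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"targetNamespaces": list(target_namespaces)},
    }


# Hypothetical project name; substitute the project that you chose in this step.
OPERATORS_PROJECT = "cpd-operators"

if __name__ == "__main__":
    manifest = operator_group("cpd-operatorgroup", OPERATORS_PROJECT, [OPERATORS_PROJECT])
    # Example: python make_og.py | oc apply -f -
    print(json.dumps(manifest, indent=2))
```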

5. Do you have your API key?

The Cloud Pak for Data software images are hosted on the IBM Entitled Registry. To access the images, you must have your IBM entitlement API key.

Options What to do
You have your API key
  1. Go to 6. How are you going to access the required software images?
You don't have your API key
  1. Follow the guidance in Obtaining your IBM entitlement API key.
  2. Go to 6. How are you going to access the required software images?
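
The entitlement API key is a credential. Scripts that use it should read it from the environment or a secret store rather than hard-coding it, and should never print it in full. The following minimal Python sketch assumes a hypothetical environment variable named `IBM_ENTITLEMENT_KEY`.

```python
import os


def load_entitlement_key(env_var: str = "IBM_ENTITLEMENT_KEY") -> str:
    """Read the entitlement API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your IBM entitlement API key.")
    return key


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so that the key is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    try:
        key = load_entitlement_key()
        print(f"Using entitlement key {mask(key)}")
    except RuntimeError as err:
        print(err)
```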

6. How are you going to access the required software images?

Cloud Pak for Data images are accessible from the IBM Entitled Registry. In most situations, it is strongly recommended that you mirror the necessary software images from the IBM Entitled Registry to a private container registry.

The only situation in which you might consider pulling images directly from the IBM Entitled Registry is when your cluster is not air-gapped, your network is extremely reliable, and latency is not a concern. However, for predictable and reliable performance, you should mirror the images to a private container registry.

Important:
You must mirror the necessary images to your private container registry in the following situations:
  • Your cluster is air-gapped (also called an offline or disconnected cluster)
  • Your cluster uses an allowlist to permit direct access by specific sites and the allowlist does not include the IBM Entitled Registry
  • Your cluster uses a blocklist to prevent direct access by specific sites and the blocklist includes the IBM Entitled Registry
Options What to do
You are pulling images from the IBM Entitled Registry
  1. Go to 7. Configuring your cluster to pull software images.
You are mirroring images to a private container registry
  1. Review the guidance in Mirroring images to your private container registry to ensure you have a private container registry that meets the minimum requirements.
  2. Determine how you will mirror the images and complete the appropriate task:
  3. Go to 7. Configuring your cluster to pull software images.
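
The decision in this step can be sketched in Python as follows. The sketch encodes the three situations in which mirroring is mandatory and the single situation in which pulling directly is acceptable. The class, the function names, and the `cp.icr.io` host that allowlists or blocklists are assumed to match are all assumptions for illustration. Real allowlists might need to cover additional registry hosts.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Assumption: the host name that an allowlist or blocklist would need to match.
ENTITLED_REGISTRY_HOST = "cp.icr.io"


@dataclass
class ClusterNetwork:
    air_gapped: bool
    allowlist: Optional[Set[str]] = None  # None means that no allowlist is enforced
    blocklist: Optional[Set[str]] = None  # None means that no blocklist is enforced
    reliable_network: bool = False
    latency_sensitive: bool = True


def must_mirror(net: ClusterNetwork) -> bool:
    """True in the three situations where mirroring is mandatory."""
    if net.air_gapped:
        return True
    if net.allowlist is not None and ENTITLED_REGISTRY_HOST not in net.allowlist:
        return True
    if net.blocklist is not None and ENTITLED_REGISTRY_HOST in net.blocklist:
        return True
    return False


def image_source(net: ClusterNetwork) -> str:
    if must_mirror(net):
        return "mirror: required"
    if net.reliable_network and not net.latency_sensitive:
        return "entitled registry: acceptable"
    return "mirror: recommended"
```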

7. Configuring your cluster to pull software images

You must ensure that your cluster is configured to pull the software images from the appropriate location.

Options What to do
You are pulling images from the IBM Entitled Registry
  1. Follow the guidance in Configuring your cluster to pull Cloud Pak for Data images to configure the global image pull secret to include your IBM entitlement API key.
  2. Go to 8. Creating catalog sources.
You are pulling images from a private container registry
  1. Follow the guidance in Configuring your cluster to pull Cloud Pak for Data images to:
    1. Configure the global image pull secret to include the credentials of an account that can pull images from the registry.
    2. Configure an image content source policy.
  2. Go to 8. Creating catalog sources.
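
Both tasks in this step edit well-known OpenShift objects. The global pull secret is the `pull-secret` secret in the `openshift-config` project. The OpenShift documentation on updating the global cluster pull secret extracts it with `oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}'` and writes it back with `oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<file>`. The Python sketch below shows only the in-between step of adding one registry entry, plus an `ImageContentSourcePolicy` that maps a source repository to a mirror.

The following values are assumptions:
  • The private registry, account, and mirror paths are hypothetical.
  • For the IBM Entitled Registry, the entry is commonly documented as registry `cp.icr.io` with user name `cp` and your entitlement API key as the password. Confirm these values in Configuring your cluster to pull Cloud Pak for Data images.

```python
import base64
import json
from typing import Dict


def add_registry_credentials(dockerconfigjson: str, registry: str,
                             username: str, password: str) -> str:
    """Return a .dockerconfigjson document with one registry entry added or replaced."""
    config = json.loads(dockerconfigjson) if dockerconfigjson.strip() else {}
    auths = config.setdefault("auths", {})
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    auths[registry] = {"auth": token}  # Existing entries for other registries are kept.
    return json.dumps(config)


def image_content_source_policy(name: str, mirrors: Dict[str, str]) -> Dict:
    """Build an ImageContentSourcePolicy that maps each source repository to one mirror."""
    return {
        "apiVersion": "operator.openshift.io/v1alpha1",
        "kind": "ImageContentSourcePolicy",
        "metadata": {"name": name},
        "spec": {
            "repositoryDigestMirrors": [
                {"source": source, "mirrors": [mirror]}
                for source, mirror in mirrors.items()
            ]
        },
    }


if __name__ == "__main__":
    existing = '{"auths": {"quay.io": {"auth": "..."}}}'
    # Hypothetical private registry and pull-only account.
    updated = add_registry_credentials(existing, "registry.example.com:5000", "puller", "s3cret")
    print(sorted(json.loads(updated)["auths"]))
    icsp = image_content_source_policy(
        "cpd-mirror", {"cp.icr.io/cp": "registry.example.com:5000/cp"})
    print(json.dumps(icsp, indent=2))
```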

8. Creating catalog sources

You must create catalog sources to ensure that your cluster uses the correct software images for your environment.

Options What to do
You are pulling images from the IBM Entitled Registry
  1. Review the guidance in Creating catalog sources to determine which method is appropriate for your environment.
  2. Go to 9. Are the IBM Cloud Pak foundational services already installed on your cluster?
You are pulling images from a private container registry
  1. Follow the guidance in Creating catalog sources for a private container registry.
  2. Go to 9. Are the IBM Cloud Pak foundational services already installed on your cluster?
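
A catalog source is an Operator Lifecycle Manager `CatalogSource` object that points the cluster at an index image. The following Python sketch builds a gRPC catalog source. The catalog name and image reference are hypothetical placeholders; use the values from Creating catalog sources or Creating catalog sources for a private container registry. Two choices are assumptions made for this example: placing the object in the `openshift-marketplace` project and polling every 45 minutes.

```python
import json
from typing import Dict


def catalog_source(name: str, image: str, display_name: str,
                   publisher: str = "IBM", poll_interval: str = "45m") -> Dict:
    """Build a gRPC CatalogSource in the openshift-marketplace project."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "CatalogSource",
        "metadata": {"name": name, "namespace": "openshift-marketplace"},
        "spec": {
            "sourceType": "grpc",
            "image": image,  # index image that lists the operator packages
            "displayName": display_name,
            "publisher": publisher,
            "updateStrategy": {"registryPoll": {"interval": poll_interval}},
        },
    }


if __name__ == "__main__":
    # Hypothetical catalog name and image in a private registry.
    cs = catalog_source("example-catalog",
                        "registry.example.com:5000/cpopen/example-catalog:latest",
                        "Example catalog")
    print(json.dumps(cs, indent=2))
```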

9. Are the IBM Cloud Pak foundational services already installed on your cluster?

The IBM Cloud Pak foundational services are a prerequisite for Cloud Pak for Data. However, in some situations the IBM Cloud Pak for Data platform operator can automatically install the IBM Cloud Pak foundational services operators and services on the cluster.

For information about supported versions of IBM Cloud Pak foundational services, see the Cloud Pak for Data platform software requirements.

Options What to do
IBM Cloud Pak foundational services Version 3.18.0 or later is already installed
  1. Go to 10. Creating operator subscriptions.
An earlier version of IBM Cloud Pak foundational services is installed
  1. Follow the guidance in Installing IBM Cloud Pak foundational services.
  2. Go to 10. Creating operator subscriptions.
IBM Cloud Pak foundational services is not installed and you are using the express installation method
  With the express installation method, all of the operators are in the same OpenShift project, and the IBM Cloud Pak for Data platform operator can automatically install IBM Cloud Pak foundational services.
  1. Go to 10. Creating operator subscriptions.
IBM Cloud Pak foundational services is not installed and you are using the specialized installation method
  With the specialized installation method, the IBM Cloud Pak foundational services operators and the Cloud Pak for Data operators are in separate OpenShift projects. To ensure that IBM Cloud Pak foundational services is installed in the correct project, you must install it manually.
  1. Follow the guidance in Installing IBM Cloud Pak foundational services.
  2. Go to 10. Creating operator subscriptions.
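
The four options in this step reduce to one question: do you need to follow Installing IBM Cloud Pak foundational services before you continue? The following hypothetical Python sketch answers it. It assumes that you know the installed foundational services version, if there is one, and which installation method you are using. The 3.18.0 threshold comes from the table above; the function and parameter names are illustrative.

```python
from itertools import zip_longest
from typing import Optional

MINIMUM_FOUNDATIONAL_SERVICES = "3.18.0"


def version_at_least(actual: str, minimum: str) -> bool:
    """Compare dotted versions, treating missing trailing parts as 0."""
    a_parts = [int(p) for p in actual.split(".")]
    m_parts = [int(p) for p in minimum.split(".")]
    for a, m in zip_longest(a_parts, m_parts, fillvalue=0):
        if a != m:
            return a > m
    return True


def needs_manual_foundational_install(installed: Optional[str], method: str) -> bool:
    """Return True if you must install or upgrade foundational services yourself.

    installed: installed foundational services version, or None if not installed.
    method: "express" or "specialized".
    """
    if installed is not None:
        return not version_at_least(installed, MINIMUM_FOUNDATIONAL_SERVICES)
    if method == "express":
        return False  # The platform operator installs foundational services itself.
    if method == "specialized":
        return True   # Operators are in separate projects, so install it manually.
    raise ValueError(f"unknown installation method: {method!r}")
```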

10. Creating operator subscriptions

An operator subscription tells the cluster where to install a given operator and gives information about the operator to Operator Lifecycle Manager (OLM).

  1. Complete the appropriate steps for your environment in Creating operator subscriptions.
  2. Go to 11. Do you plan to install services that require custom SCCs?
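
Each subscription is an Operator Lifecycle Manager `Subscription` object. It names the operator package, the update channel, and the catalog source that provides the package. The following Python sketch builds one as JSON. The package, channel, project, and catalog names in the usage example are hypothetical; take the real values for the platform operator and each service operator from Creating operator subscriptions.

```python
import json
from typing import Dict


def subscription(name: str, namespace: str, package: str, channel: str,
                 source: str, source_namespace: str = "openshift-marketplace",
                 approval: str = "Automatic") -> Dict:
    """Build an OLM Subscription for one operator package."""
    if approval not in ("Automatic", "Manual"):
        raise ValueError("installPlanApproval must be 'Automatic' or 'Manual'")
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": package,                     # package name in the catalog
            "channel": channel,                  # update channel to follow
            "source": source,                    # CatalogSource that provides the package
            "sourceNamespace": source_namespace,  # project that contains the CatalogSource
            "installPlanApproval": approval,
        },
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sub = subscription("example-operator", "cpd-operators", "example-package",
                       "v1.0", "example-catalog")
    print(json.dumps(sub, indent=2))
```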

11. Do you plan to install services that require custom SCCs?

The following services require custom security context constraints:
  • Data Virtualization
  • Db2®
  • Db2 Big SQL
  • Db2 Warehouse
  • OpenPages®
  • Watson™ Knowledge Catalog
Options What to do
You plan to install one or more of these services
  1. Create the appropriate SCCs for your environment. For details, see Custom security context constraints for services.
  2. Go to 12. Do you plan to install services that require specific node settings?
You don't plan to install any of these services
  1. Go to 12. Do you plan to install services that require specific node settings?
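
If you script your installation planning, the check in this step is a set intersection between the services that you plan to install and the services in the preceding list. The following Python sketch is a hypothetical illustration. The service names are plain strings without trademark symbols.

```python
from typing import Iterable, List

# Services that this page lists as requiring custom security context constraints.
SERVICES_NEEDING_CUSTOM_SCC = {
    "Data Virtualization",
    "Db2",
    "Db2 Big SQL",
    "Db2 Warehouse",
    "OpenPages",
    "Watson Knowledge Catalog",
}


def services_needing_scc(planned: Iterable[str]) -> List[str]:
    """Return the planned services that need custom SCCs, sorted for stable output."""
    return sorted(set(planned) & SERVICES_NEEDING_CUSTOM_SCC)


print(services_needing_scc(["Watson Studio", "Db2", "OpenPages"]))
```

An empty result means that you can go straight to step 12.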

12. Do you plan to install services that require specific node settings?

The following services require specific node settings:
  • Data Virtualization
  • Db2
  • Db2 Big SQL
  • Db2 Warehouse
  • Jupyter Notebooks with Python 3.7 for GPU
  • OpenPages
  • Watson Discovery
  • Watson Knowledge Catalog
  • Watson Machine Learning Accelerator
  • Watson Studio

You might also need to adjust some node settings if you are working with large data sets or you have slower network speeds.

Options What to do
You plan to install one or more of these services, you have large data sets, or you have slower network speeds
  1. Change the appropriate node settings. For details, see Changing required node settings.
  2. Go to 13. Do you need to install the scheduling service?
You don't plan to install any of these services, you don't have large data sets, and you don't have slower network speeds
  1. Go to 13. Do you need to install the scheduling service?
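
Unlike the SCC check, this step has three triggers: the listed services, large data sets, and slower network speeds. The following hypothetical Python sketch collects the reasons that apply so that you can record why the node settings were changed. An empty list means that you can go straight to step 13. The function name is illustrative, and deciding what counts as a "large" data set or a "slower" network is left to you.

```python
from typing import Iterable, List

# Services that this page lists as requiring specific node settings.
SERVICES_NEEDING_NODE_SETTINGS = {
    "Data Virtualization",
    "Db2",
    "Db2 Big SQL",
    "Db2 Warehouse",
    "Jupyter Notebooks with Python 3.7 for GPU",
    "OpenPages",
    "Watson Discovery",
    "Watson Knowledge Catalog",
    "Watson Machine Learning Accelerator",
    "Watson Studio",
}


def node_setting_reasons(planned: Iterable[str], large_data_sets: bool = False,
                         slow_network: bool = False) -> List[str]:
    """List every reason to change node settings; an empty list means none apply."""
    reasons = [f"service: {s}" for s in sorted(set(planned) & SERVICES_NEEDING_NODE_SETTINGS)]
    if large_data_sets:
        reasons.append("large data sets")
    if slow_network:
        reasons.append("slower network speeds")
    return reasons
```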

13. Do you need to install the scheduling service?

The scheduling service is required if you plan to install Watson Machine Learning Accelerator.

Even if you don't plan to install Watson Machine Learning Accelerator, it is strongly recommended that you install the scheduling service so that you can programmatically enforce the quotas that you set on the platform and on individual services.

Options What to do
You need to install the scheduling service
  1. Follow the guidance in Installing the scheduling service.
  2. Go to 14. Installing Cloud Pak for Data.
You don't plan to install the scheduling service
  1. Go to 14. Installing Cloud Pak for Data.
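
The scheduling service decision distinguishes between required and recommended, as the following hypothetical Python sketch shows. The function name and the `enforce_quotas` flag are illustrative assumptions. The flag stands for whether you want quotas to be enforced rather than only tracked.

```python
from typing import Iterable, Tuple


def scheduling_service_decision(planned: Iterable[str],
                                enforce_quotas: bool = True) -> Tuple[bool, str]:
    """Return (install, reason) for the scheduling service."""
    if "Watson Machine Learning Accelerator" in set(planned):
        return True, "required by Watson Machine Learning Accelerator"
    if enforce_quotas:
        return True, "recommended to enforce platform and service quotas"
    return False, "optional"
```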

14. Installing Cloud Pak for Data

Depending on the number of OpenShift projects you created, you can install one or more instances of Cloud Pak for Data on your cluster.
  1. Install Cloud Pak for Data.
  2. Go to 15. Completing post-installation tasks.

15. Completing post-installation tasks

After you install Cloud Pak for Data, make sure that your cluster is secure, and complete the tasks that affect how users interact with Cloud Pak for Data, such as configuring SSO or changing the route to the platform.
  1. Complete the appropriate tasks for your environment in Post-installation tasks.
  2. Go to 16. Installing services.

16. Installing services

You are ready to install services on your cluster.
  1. Instructions for installing IBM services are available in Services.