Installing IBM Cloud Pak for Data

A Red Hat® OpenShift® Container Platform cluster administrator and project administrator can work together to prepare the cluster and install IBM Cloud Pak® for Data.

Before you begin

Before you install Cloud Pak for Data, review the information in the Planning section.

Specifically, ensure that you review the System requirements. You must install the software on a cluster that has sufficient resources and that aligns with the guidance in the System requirements. For example, if you do not follow the specified disk requirements, you can run into out-of-memory errors.

1. Setting up a client workstation

To install IBM Cloud Pak for Data, you must have a client workstation that can connect to the Red Hat OpenShift Container Platform cluster.

Tip: You can set up multiple client workstations if you want to enable multiple people to work on the installation.
The client workstation must be a Windows, Mac OS, or Linux® machine with the following software installed:
  • Cloud Pak for Data command-line interface (cpd-cli) Version 12.0.6 or later.
  • OpenShift command-line interface (oc) at a version that is compatible with your cluster.
Options What to do
You already have a client workstation set up
  1. Go to 2. Collecting required information.
You don't have a client workstation set up
  1. Review the guidance in Setting up a client workstation.
  2. Complete the following tasks to install the required software on the client workstation:
    1. Installing the IBM Cloud Pak for Data command-line interface
    2. Installing the OpenShift command-line interface
  3. Go to 2. Collecting required information.
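
The workstation requirements above can be checked with a short script. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the version comparison assumes that `sort -V` (GNU coreutils) is available on the workstation. The script does not parse the output of `cpd-cli`; you pass in the version string that your installation reports.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the required command-line tools are on the PATH
# and compare a version string against the documented minimum.

MIN_CPD_CLI_VERSION="12.0.6"   # minimum cpd-cli version stated above

# have_tool NAME: succeeds if NAME resolves to an executable on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# version_ge ACTUAL MINIMUM: succeeds if ACTUAL is the same as or newer
# than MINIMUM. Relies on version sort (sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_workstation: report which of the required tools are present.
# Returns a nonzero status if at least one tool is missing.
check_workstation() {
  missing=0
  for tool in cpd-cli oc; do
    if have_tool "$tool"; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `version_ge "12.0.7" "$MIN_CPD_CLI_VERSION"` succeeds, and `check_workstation` lists any tool that still needs to be installed.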

2. Collecting required information

To successfully install IBM Cloud Pak for Data, you must have specific information about your environment.

a. Obtaining your IBM entitlement API key
All IBM Cloud Pak for Data images are accessible from the IBM® Entitled Registry. The IBM entitlement API key enables you to pull software images from the IBM Entitled Registry, either for installation or for mirroring to a private container registry.
Options What to do
You already have your API key
  1. Go to b. Determining the list of components that you plan to install.
You don't have your API key
  1. Complete Obtaining your IBM entitlement API key.
  2. Go to b. Determining the list of components that you plan to install.
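
To illustrate how the entitlement API key is used, the following sketch shows the credential that a registry pull secret stores for the key. The registry host `cp.icr.io` and the user name `cp` are not stated in this topic; they are assumptions that you should confirm in Obtaining your IBM entitlement API key. The function names are hypothetical.

```shell
# Assumed values (not stated in this topic): registry host and user name.
ENTITLED_REGISTRY="cp.icr.io"
ENTITLED_REGISTRY_USER="cp"

# registry_auth USER KEY: print the base64 "auth" value that a container
# registry pull secret stores for USER and KEY.
registry_auth() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# require_entitlement_key: fail early if the key has not been exported.
require_entitlement_key() {
  if [ -z "${IBM_ENTITLEMENT_KEY:-}" ]; then
    echo "IBM_ENTITLEMENT_KEY is not set" >&2
    return 1
  fi
}

# Possible usage, assuming podman is installed on the workstation:
#   require_entitlement_key && printf '%s' "$IBM_ENTITLEMENT_KEY" |
#     podman login "$ENTITLED_REGISTRY" -u "$ENTITLED_REGISTRY_USER" --password-stdin
```

Reading the key from an environment variable keeps it out of scripts that you might share with other people who work on the installation.
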
b. Determining the list of components that you plan to install
IBM Cloud Pak for Data is composed of numerous components so that you can install the specific services that support your needs. Before you install Cloud Pak for Data, determine which components you need to install.
What to do
  1. Review Determining which components to install to ensure that you:
    • Install all the required components
    • Know which tasks you must complete to prepare your cluster (some services have additional pre-installation configuration)
  2. Go to c. Collecting information about your cluster that can be used to set up environment variables.
c. Collecting information about your cluster that can be used to set up environment variables
The commands for installing and upgrading IBM Cloud Pak for Data use variables with the format ${VARIABLE_NAME}. You can create a script to automatically export the appropriate values as environment variables before you run the installation commands. After you source the script, you will be able to copy most install and upgrade commands from the documentation and run them without making any changes.
What to do
  1. Complete Setting up installation environment variables.
  2. Go to 3. Preparing your cluster.
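
As an illustration of the script described above, the following sketch exports a set of placeholder values and checks that none are empty. The file name, the variable names, and every value are assumptions for illustration only; use the names and values given in Setting up installation environment variables.

```shell
#!/usr/bin/env bash
# cpd_vars.sh (hypothetical file name): source this file before you run
# installation commands. Every value below is a placeholder.

# Cluster access
export OCP_URL="https://api.example.com:6443"
export OCP_USERNAME="kubeadmin"
export OCP_PASSWORD="changeme"

# Projects (namespaces)
export PROJECT_CPD_OPS="cpd-operators"
export PROJECT_CPD_INSTANCE="cpd-instance"

# Storage classes
export STG_CLASS_BLOCK="example-block-sc"
export STG_CLASS_FILE="example-file-sc"

# Registry access and release details
export IBM_ENTITLEMENT_KEY="replace-with-your-key"
export VERSION="4.6.x"
export COMPONENTS="cpfs,cpd_platform"

# missing_vars NAME...: print each named variable that is not exported
# or is empty, so that gaps are caught before a command uses ${NAME}.
missing_vars() {
  for name in "$@"; do
    [ -n "$(printenv "$name")" ] || echo "$name"
  done
}
```

After you run `source ./cpd_vars.sh`, a command such as `missing_vars OCP_URL VERSION COMPONENTS` prints nothing when all of the listed variables are set.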

3. Preparing your cluster

Before you install Cloud Pak for Data, you must prepare your cluster.

a. Do you have an existing Red Hat OpenShift Container Platform cluster?

Supported versions of Red Hat OpenShift Container Platform

Cloud Pak for Data can be installed on the following versions of Red Hat OpenShift Container Platform:

  • Version 4.8.0 or later fixes (Cloud Pak for Data 4.6.0 - 4.6.2 only)
  • Version 4.10.0 or later fixes (Cloud Pak for Data 4.6.x)
  • Version 4.12.0 or later fixes (Cloud Pak for Data 4.6.4 or later)


Options What to do
You are running a supported version of OpenShift
  1. Go to b. Do you need to run the installation in a restricted environment?
You have an older version of OpenShift
  1. Upgrade your cluster.
  2. Go to b. Do you need to run the installation in a restricted environment?
You don't have an OpenShift cluster
  1. Complete Installing Red Hat OpenShift Container Platform.
  2. Go to b. Do you need to run the installation in a restricted environment?
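
To decide which option applies, you can compare the version of your cluster with the supported release streams listed above. The following is a minimal sketch with a hypothetical function name. It checks only the OpenShift release stream (4.8, 4.10, or 4.12) and does not check the Cloud Pak for Data release restrictions that apply to each stream.

```shell
# ocp_release_supported VERSION: succeeds if VERSION (for example 4.10.3)
# belongs to one of the release streams listed above.
ocp_release_supported() {
  case "$1" in
    4.8.*|4.10.*|4.12.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage on a workstation that is logged in to the cluster:
#   version="$(oc get clusterversion version -o jsonpath='{.status.desired.version}')"
#   ocp_release_supported "$version" || echo "Upgrade your cluster: $version"
```
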
b. Do you need to run the installation in a restricted environment?
If you need to run cpd-cli manage commands against a cluster in a restricted network, you must make the olm-utils image available inside the cluster network.
Options What to do
Your cluster is not in a restricted network
  1. Go to c. Do you have supported persistent storage on your cluster?
Your cluster is in a restricted network
  1. Review the guidance in Running cpd-cli manage commands in a restricted network to determine which method to use to make the required image available to one or more workstations in the cluster.
  2. Go to c. Do you have supported persistent storage on your cluster?
c. Do you have supported persistent storage on your cluster?

Supported storage for the Cloud Pak for Data platform

The Cloud Pak for Data platform supports the following storage:

OpenShift Data Foundation
  Version:
    • Version 4.8 or later fixes (Cloud Pak for Data 4.6.0 - 4.6.2 only)
    • Version 4.10 or later fixes (Cloud Pak for Data 4.6.x)
    • Version 4.12 or later fixes (Cloud Pak for Data 4.6.4 or later)
  Notes:
    Available in either:
    • IBM Storage Fusion
    • Red Hat OpenShift Platform Plus
    Ensure that you install a version of OpenShift Data Foundation that is compatible with the version of Red Hat OpenShift Container Platform that you are running. For details, see https://access.redhat.com/articles/4731161.
IBM Storage Fusion
  Version:
    • Version 2.4.0 or later fixes
    • Version 2.5.2 or later fixes (Recommended)
  Notes:
    Available in IBM Storage Fusion.
IBM Storage Scale Container Native (with IBM Storage Scale Container Storage Interface)
  Version:
    • Version 5.1.5 or later fixes
    • CSI Version 2.6.x or later fixes
  Notes:
    Available in either:
    • IBM Storage Fusion
    • IBM Storage Suite for IBM Cloud® Paks
Portworx
  Version:
    • Version 2.9.1.3 or later fixes
    • Version 2.12.2 or later fixes
NFS
  Version:
    Version 3 or 4
  Notes:
    Version 3 is recommended if you are using any of the following services:
    • Db2®
    • Db2 Big SQL
    • Db2 Warehouse
    • Watson™ Knowledge Catalog
    • Watson Query
    If you use Version 4, ensure that your storage class uses NFS Version 3 as the mount option. For details, see Setting up dynamic provisioning.
Amazon Elastic Block Store (EBS)
  Version:
    Not applicable
  Notes:
    In addition to EBS storage, your environment must also include EFS storage.
Amazon Elastic File System (EFS)
  Version:
    Not applicable
  Notes:
    It is recommended that you use both EBS and EFS storage.
IBM Cloud Block Storage
  Version:
    Not applicable
  Notes:
    In addition to IBM Cloud Block Storage, your environment must also include IBM Cloud File Storage.
IBM Cloud File Storage
  Version:
    Not applicable
  Notes:
    It is recommended that you use both IBM Cloud Block Storage and IBM Cloud File Storage.
NetApp Trident
  Version:
    Version 22.4.0 or later fixes
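
The NFS note above asks for a storage class that uses NFS Version 3 as the mount option. The following sketch prints a StorageClass manifest with that mount option. The function name, the storage class name, and the provisioner name are hypothetical; the provisioner depends on how dynamic provisioning is set up on your cluster, so follow Setting up dynamic provisioning for the actual values.

```shell
# render_nfs_storage_class NAME PROVISIONER: print a StorageClass manifest
# whose volumes are mounted with NFS Version 3.
render_nfs_storage_class() {
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: $1
provisioner: $2
mountOptions:
  - nfsvers=3
EOF
}

# Possible usage (both names are placeholders):
#   render_nfs_storage_class managed-nfs example.com/nfs | oc apply -f -
# To list the storage classes that already exist on the cluster:
#   oc get storageclass
```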

Options What to do
You have supported storage
  1. Ensure that you have storage that works with the services that you plan to install.
  2. Review Setting up persistent storage to determine whether you need to complete any additional tasks to configure the storage for Cloud Pak for Data.
  3. Go to d. Do you have the required OpenShift projects on your cluster?
You don't have supported storage
  1. Decide which storage you want to use. Ensure that you choose storage that works with the services that you plan to install.
  2. Follow the guidance in Setting up persistent storage for installing and configuring the storage.
  3. Go to d. Do you have the required OpenShift projects on your cluster?
d. Do you have the required OpenShift projects on your cluster?
At a minimum, you must have a project where you will install the Cloud Pak for Data operators and a project where you will install an instance of Cloud Pak for Data. You will need additional projects if you want to:
  • Separate the Cloud Pak for Data operators from the IBM Cloud Pak foundational services operators
  • Install multiple instances of Cloud Pak for Data
  • Deploy service instances or workloads in tethered projects

For details, see Supported project (namespace) configurations.

Options What to do
You know which projects you plan to use when you install the software
  1. Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to:
    • Ensure that you have the necessary projects on your cluster
    • Determine whether you need to label any projects
    • Set up tethered projects, if applicable
  2. Go to e. Do you plan to install any services that require custom SCCs?
You don't know which projects you plan to use when you install the software
  1. Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to determine which projects you need to create on your cluster and then create the appropriate projects.
  2. Go to e. Do you plan to install any services that require custom SCCs?
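
The minimum project setup described above can be sketched as follows. The project names are placeholders and the function name is hypothetical. The sketch creates only the projects themselves; it does not label projects or set up tethered projects, which are covered in Setting up projects (namespaces) on Red Hat OpenShift Container Platform.

```shell
# Placeholder project names; replace them with the projects you chose.
PROJECT_CPD_OPS="${PROJECT_CPD_OPS:-cpd-operators}"
PROJECT_CPD_INSTANCE="${PROJECT_CPD_INSTANCE:-cpd-instance}"

# create_projects NAME...: create each project that does not exist yet.
# With DRY_RUN=1 the oc commands are printed instead of being run.
create_projects() {
  for project in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "oc new-project $project"
    elif oc get project "$project" >/dev/null 2>&1; then
      echo "exists: $project"
    else
      oc new-project "$project"
    fi
  done
}

# Possible usage:
#   DRY_RUN=1 create_projects "$PROJECT_CPD_OPS" "$PROJECT_CPD_INSTANCE"
```
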
e. Do you plan to install any services that require custom SCCs?

Services that require custom SCCs

If you plan to install any of the following Cloud Pak for Data services, you must create the appropriate custom SCCs:

Service Required SCCs
Db2
Db2 requires a custom SCC.

By default, the SCC is created automatically; however, you can choose to create the SCC manually.

For details, see Creating the custom security context constraint for Db2.

Db2 Big SQL
Db2 Big SQL embeds an instance of Db2, which requires a custom SCC. This SCC is used only by the instance of Db2 Big SQL that embeds the Db2 database.

The required SCC is created automatically.

For details, see Creating the custom security context constraint for embedded Db2 databases.

Db2 Warehouse
Db2 Warehouse requires a custom SCC.

By default, the SCC is created automatically; however, you can choose to create the SCC manually.

For details, see Creating the custom security context constraint for Db2 Warehouse.

Informix®

Informix requires a custom SCC.

You must create this SCC manually.

For details, see Creating the custom security context constraint for Informix.

OpenPages®
The OpenPages service can optionally embed an instance of Db2.

If you choose to use an embedded instance of Db2, OpenPages requires a custom SCC for the Db2 database. This SCC is used only by the instance of OpenPages that embeds the Db2 database.

The required SCC is created automatically.

For details, see Creating the custom security context constraint for embedded Db2 databases.

If you choose to use an external database, the custom SCC is not required.

Watson Knowledge Catalog
Watson Knowledge Catalog requires two custom SCCs.

If you install Data Privacy, the service uses the Watson Knowledge Catalog SCC.

Watson Query
Watson Query embeds an instance of Db2, which requires a custom SCC. This SCC is used only by the instance of Watson Query that embeds the Db2 database.

The required SCC is created automatically.

For details, see Creating the custom security context constraint for embedded Db2 databases.


Options What to do
You plan to install one or more of these services
  1. Create the appropriate SCCs for your environment. For details, see Creating custom security context constraints for services.
  2. Go to f. Do you plan to install any services that require specific node settings?
You don't plan to install any of these services
  1. Go to f. Do you plan to install any services that require specific node settings?
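
When you plan this step, it can help to separate the services whose SCC is created automatically from the services that need manual work. The following sketch restates the table above as a lookup. The function name is hypothetical, and the result for any service whose creation method is not described above is reported as "not stated".

```shell
# scc_creation_mode SERVICE: print how the custom SCC for SERVICE is
# created, as described in the table above.
scc_creation_mode() {
  case "$1" in
    "Db2"|"Db2 Warehouse")
      echo "automatic by default, manual optional" ;;
    "Db2 Big SQL"|"Watson Query")
      echo "automatic" ;;
    "OpenPages")
      echo "automatic with embedded Db2, not required with an external database" ;;
    "Informix")
      echo "manual" ;;
    *)
      echo "not stated" ;;
  esac
}

# To list the SCCs that already exist on the cluster:
#   oc get scc
```
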
f. Do you plan to install any services that require specific node settings?

Services that require node settings
Node setting Services that require changes to the setting
Load balancer timeout settings
  • Db2
  • Db2 Data Gate
  • Db2 Warehouse
  • OpenPages
  • Watson Discovery
  • Watson Knowledge Catalog
  • Watson Query
  • Watson Speech services
  • Watson Studio
CRI-O container settings
  • Cognos® Analytics
  • Db2
  • Db2 Big SQL
  • Db2 Warehouse
  • Watson Discovery
  • Watson Knowledge Catalog
  • Watson Query
  • Watson Studio
  • Watson Machine Learning Accelerator
Kernel parameter settings
  • Db2
  • Db2 Big SQL
  • Db2 Warehouse
  • Watson Knowledge Catalog
  • Watson Query
GPU settings
  • Runtime 22.1 with Python 3.9 for GPU
  • Runtime 22.2 with Python 3.10 for GPU
  • Watson Machine Learning Accelerator

Options What to do
You plan to install one or more of these services
  1. Update the node settings. For details, see Changing required node settings.
  2. Go to g. How are you going to access the software images?
You don't plan to install any of these services
  1. Go to g. How are you going to access the software images?
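
To work out which node settings apply to your planned services, you can take the union of the rows in the table above. The following sketch encodes that table as a lookup; the function names are hypothetical, and the service names must be written exactly as they appear in the table.

```shell
# node_settings_for SERVICE: print the node settings that SERVICE requires,
# one per line, based on the table above. Prints nothing for other services.
node_settings_for() {
  case "$1" in
    "Db2"|"Db2 Warehouse"|"Watson Knowledge Catalog"|"Watson Query")
      printf '%s\n' "load balancer timeout" "CRI-O container" "kernel parameter" ;;
    "Db2 Big SQL")
      printf '%s\n' "CRI-O container" "kernel parameter" ;;
    "Watson Discovery"|"Watson Studio")
      printf '%s\n' "load balancer timeout" "CRI-O container" ;;
    "Db2 Data Gate"|"OpenPages"|"Watson Speech services")
      printf '%s\n' "load balancer timeout" ;;
    "Cognos Analytics")
      printf '%s\n' "CRI-O container" ;;
    "Watson Machine Learning Accelerator")
      printf '%s\n' "CRI-O container" "GPU" ;;
    "Runtime 22.1 with Python 3.9 for GPU"|"Runtime 22.2 with Python 3.10 for GPU")
      printf '%s\n' "GPU" ;;
  esac
}

# required_node_settings SERVICE...: print the distinct node settings that
# the given services require.
required_node_settings() {
  for service in "$@"; do
    node_settings_for "$service"
  done | LC_ALL=C sort -u
}
```

For example, `required_node_settings "Db2 Big SQL" "OpenPages"` prints the CRI-O container, kernel parameter, and load balancer timeout settings.
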
g. How are you going to access the software images?
Cloud Pak for Data images are accessible from the IBM Entitled Registry. In most situations, it is strongly recommended that you mirror the necessary software images from the IBM Entitled Registry to a private container registry.
Where should you pull images from?
Important:
You must mirror the necessary images to your private container registry in the following situations:
  • Your cluster is air-gapped (also called an offline or disconnected cluster).
  • Your cluster uses an allowlist to permit direct access by specific sites, and the allowlist does not include the IBM Entitled Registry.
  • Your cluster uses a blocklist to prevent direct access by specific sites, and the blocklist includes the IBM Entitled Registry.
Even if these situations do not apply to your environment, you should consider using a private container registry if you want to:
  • Run security scans against the software images before you install them on your cluster
  • Ensure that you have the same images available for multiple deployments, such as development or test environments and production environments

The only situation in which you might consider pulling images directly from the IBM Entitled Registry is when your cluster is not air-gapped, your network is extremely reliable, and latency is not a concern. However, for predictable and reliable performance, you should mirror the images to a private container registry.


Options What to do
You are pulling images from the IBM Entitled Registry
  1. Complete Updating the global image pull secret.
  2. Go to 4. Installing the Cloud Pak for Data platform and services.
You are pulling images from a private container registry
  1. Complete Updating the global image pull secret.
  2. Complete Mirroring images to a private container registry.
  3. Go to 4. Installing the Cloud Pak for Data platform and services.
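
The guidance above reduces to a simple rule: mirroring is required if any one of the three listed situations applies, and recommended otherwise. The following sketch expresses that rule. The function names and the "yes" or "no" argument convention are hypothetical.

```shell
# must_mirror AIR_GAPPED ALLOWLIST_OMITS_REGISTRY BLOCKLIST_INCLUDES_REGISTRY
# Each argument is "yes" or "no". Succeeds if any situation that makes
# mirroring mandatory applies.
must_mirror() {
  [ "$1" = "yes" ] || [ "$2" = "yes" ] || [ "$3" = "yes" ]
}

# image_source: print where images should be pulled from for the given
# answers, following the recommendation above.
image_source() {
  if must_mirror "$@"; then
    echo "private container registry (required)"
  else
    echo "private container registry (recommended)"
  fi
}
```

For example, `image_source no yes no` reports that a private container registry is required, because the allowlist does not include the IBM Entitled Registry.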

4. Installing the Cloud Pak for Data platform and services

After you prepare your cluster, you can install the Cloud Pak for Data platform and services.

What to do
  1. Complete the appropriate tasks for your environment in Installing the IBM Cloud Pak for Data platform and services.
  2. Go to 5. Completing post-installation tasks.
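
As a rough illustration of what the installation tasks look like once the environment variables are set, the following sketch wraps two `cpd-cli manage` commands in a dry-run helper. The command names and flags are not given in this topic and are assumptions about this release of cpd-cli; copy the exact commands from Installing the IBM Cloud Pak for Data platform and services. The `run` and `install_platform` function names are hypothetical.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# install_platform: install the operators, then create the custom
# resources. Flags are assumptions; verify them against the documentation.
install_platform() {
  run cpd-cli manage apply-olm \
    --release="${VERSION}" \
    --components="${COMPONENTS}" || return 1

  run cpd-cli manage apply-cr \
    --release="${VERSION}" \
    --components="${COMPONENTS}" \
    --cpd_instance_ns="${PROJECT_CPD_INSTANCE}" \
    --block_storage_class="${STG_CLASS_BLOCK}" \
    --file_storage_class="${STG_CLASS_FILE}" \
    --license_acceptance=true
}
```

Running `DRY_RUN=1 install_platform` prints the expanded commands so that you can review the values before anything is changed on the cluster.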

5. Completing post-installation tasks

After you install Cloud Pak for Data, make sure your cluster is secure and complete tasks that will impact how users interact with Cloud Pak for Data, such as configuring SSO or changing the route to the platform.

What to do
Complete the appropriate tasks for your environment in Post-installation setup (Day 1 operations).

6. Installing services

Options What to do
You installed the services when you installed the platform (Best practice)
  1. Review Getting started with Cloud Pak for Data.
You didn't install the services when you installed the platform
  1. Install the services that you want to use. See the instructions for installing each service individually in Services.