Installing IBM Cloud Pak for Data
A Red Hat® OpenShift® Container Platform cluster administrator and project administrator can work together to prepare the cluster and install IBM Cloud Pak® for Data.
Before you begin
Before you install Cloud Pak for Data, review the information in the Planning section.
Specifically, ensure that you review the System requirements. You must install the software on a cluster that has sufficient resources and that aligns with the guidance in the System requirements. For example, if you do not follow the specified disk requirements, you can encounter errors, such as out-of-memory errors.
1. Setting up a client workstation
To install IBM Cloud Pak for Data, you must have a client workstation that can connect to the Red Hat OpenShift Container Platform cluster.
The workstation must have the following command-line interfaces installed:
- Cloud Pak for Data command-line interface (`cpd-cli`) Version 12.0.6 or later
- OpenShift command-line interface (`oc`) at a version that is compatible with your cluster
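For a quick check that both CLIs are installed and at suitable versions, you can run the following commands from the workstation (both `version` subcommands are standard for these tools):

```sh
# Check the installed version of the Cloud Pak for Data command-line interface.
cpd-cli version

# Check the installed OpenShift client version (and the server version, if you are logged in).
oc version
```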
Options | What to do |
---|---|
You already have a client workstation set up | Go to 2. Collecting required information. |
You don't have a client workstation set up | Install the required command-line interfaces on the workstation, then go to 2. Collecting required information. |
2. Collecting required information
To successfully install IBM Cloud Pak for Data, you must have specific information about your environment.
- a. Obtaining your IBM entitlement API key
- All IBM Cloud Pak for Data images are accessible from the IBM® Entitled Registry. The IBM entitlement API key enables you to pull software images from the IBM Entitled Registry, either for installation or for mirroring to a private container registry.
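As a minimal check, assuming podman is installed on the client workstation, you can confirm that your entitlement key works by logging in to the IBM Entitled Registry (cp.icr.io) with the user name cp; the key value below is a placeholder:

```sh
# Store your entitlement API key in an environment variable (placeholder value shown).
export IBM_ENTITLEMENT_KEY=<your-entitlement-key>

# Log in to the IBM Entitled Registry to confirm that the key is accepted.
# The user name for the IBM Entitled Registry is always "cp".
podman login cp.icr.io --username cp --password "${IBM_ENTITLEMENT_KEY}"
```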
Options | What to do |
---|---|
You already have your API key | Go to b. Determining the list of components that you plan to install. |
You don't have your API key | Obtain your IBM entitlement API key, then go to b. Determining the list of components that you plan to install. |

- b. Determining the list of components that you plan to install
- IBM Cloud Pak for Data comprises numerous components so that you can install only the specific services that support your needs. Before you install Cloud Pak for Data, determine which components you need to install.

What to do
- Review Determining which components to install to ensure that you:
  - Install all the required components
  - Know which tasks you must complete to prepare your cluster (some services have additional pre-installation configuration)
- Go to c. Collecting information about your cluster that can be used to set up environment variables.
- c. Collecting information about your cluster that can be used to set up environment variables
- The commands for installing and upgrading IBM Cloud Pak for Data use variables with the format `${VARIABLE_NAME}`. You can create a script that exports the appropriate values as environment variables before you run the installation commands. After you source the script, you can copy most install and upgrade commands from the documentation and run them without making any changes. A sketch of such a script is shown after the following list.

What to do
- Complete Setting up installation environment variables.
- Go to 3. Preparing your cluster.
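The following is a minimal sketch of such a script. The variable names and values are illustrative assumptions; use the variables that Setting up installation environment variables lists for your release and the components that you plan to install.

```sh
#!/usr/bin/env bash
# cpd_vars.sh -- example installation environment variables (illustrative values only).

# Cluster connection details
export OCP_URL=<openshift-api-url>            # for example, https://api.cluster.example.com:6443
export OCP_USERNAME=<cluster-admin-username>
export OCP_PASSWORD=<cluster-admin-password>

# Projects (namespaces) for the operators and the Cloud Pak for Data instance
export PROJECT_CPD_OPS=cpd-operators
export PROJECT_CPD_INSTANCE=cpd-instance

# Storage classes to use for block and file storage
export STG_CLASS_BLOCK=<block-storage-class>
export STG_CLASS_FILE=<file-storage-class>

# Entitlement, release, and component information
export IBM_ENTITLEMENT_KEY=<your-entitlement-key>
export VERSION=<cloud-pak-for-data-release>
export COMPONENTS=<comma-separated-list-of-components>
```

After you save the script, run `source ./cpd_vars.sh` in your session so that the exported variables are available to the commands that you copy from the documentation.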
3. Preparing your cluster
Before you install Cloud Pak for Data, you must prepare your cluster.
- a. Do you have an existing Red Hat OpenShift Container Platform cluster?
- Supported versions of Red Hat OpenShift Container Platform
Cloud Pak for Data can be installed on the following versions of Red Hat OpenShift Container Platform:

Red Hat OpenShift Container Platform version | Cloud Pak for Data releases that support this version |
---|---|
Version 4.8.0 or later fixes | 4.6.0 - 4.6.2 only |
Version 4.10.0 or later fixes | 4.6.x |
Version 4.12.0 or later fixes | 4.6.4 or later |
Options | What to do |
---|---|
You are running a supported version of OpenShift | Go to b. Do you need to run the installation in a restricted environment? |
You have an older version of OpenShift | Upgrade your cluster. If you are using self-managed OpenShift, see the Red Hat OpenShift Container Platform documentation. If you are using managed OpenShift, refer to the documentation for your OpenShift provider. Then go to b. Do you need to run the installation in a restricted environment? |
You don't have an OpenShift cluster | Install a supported version of Red Hat OpenShift Container Platform, then go to b. Do you need to run the installation in a restricted environment? |
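If you already have a cluster and want to confirm which version it is running, you can log in from the client workstation and query the cluster version (the environment variables are the examples from the script earlier on this page):

```sh
# Log in to the cluster as a user with sufficient permissions.
oc login "${OCP_URL}" --username="${OCP_USERNAME}" --password="${OCP_PASSWORD}"

# Show the OpenShift Container Platform version that the cluster is running.
oc get clusterversion
```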
- b. Do you need to run the installation in a restricted environment?
- If you need to run `cpd-cli manage` commands against a cluster in a restricted network, you must make the `olm-utils` image available inside the cluster network.

Options | What to do |
---|---|
Your cluster is not in a restricted network | Go to c. Do you have supported persistent storage on your cluster? |
Your cluster is in a restricted network | Review the guidance in Running cpd-cli manage commands in a restricted network to determine which method to use to make the required image available to one or more workstations in the cluster. Then go to c. Do you have supported persistent storage on your cluster? |
- c. Do you have supported persistent storage on your cluster?
- Supported storage for the Cloud Pak for Data platform
The Cloud Pak for Data platform supports the following storage:

Storage option | Version | Notes |
---|---|---|
OpenShift Data Foundation | Version 4.8 or later fixes (Cloud Pak for Data 4.6.0 - 4.6.2 only); Version 4.10 or later fixes (Cloud Pak for Data 4.6.x); Version 4.12 or later fixes (Cloud Pak for Data 4.6.4 or later) | Available in either IBM Storage Fusion or Red Hat OpenShift Platform Plus. Ensure that you install a version of OpenShift Data Foundation that is compatible with the version of Red Hat OpenShift Container Platform that you are running. For details, see https://access.redhat.com/articles/4731161. |
IBM Storage Fusion | Version 2.4.0 or later fixes; Version 2.5.2 or later fixes (recommended) | Available in IBM Storage Fusion. |
IBM Storage Scale Container Native (with IBM Storage Scale Container Storage Interface) | Version 5.1.5 or later fixes; CSI Version 2.6.x or later fixes | Available in either IBM Storage Fusion or IBM Storage Suite for IBM Cloud® Paks. |
Portworx | Version 2.9.1.3 or later fixes; Version 2.12.2 or later fixes | |
NFS | Version 3 or 4 | Version 3 is recommended if you are using any of the following services: Db2®, Db2 Big SQL, Db2 Warehouse, Watson™ Knowledge Catalog, Watson Query. If you use Version 4, ensure that your storage class uses NFS Version 3 as the mount option. For details, see Setting up dynamic provisioning. |
Amazon Elastic Block Store (EBS) | Not applicable | In addition to EBS storage, your environment must also include EFS storage. |
Amazon Elastic File System (EFS) | Not applicable | It is recommended that you use both EBS and EFS storage. |
IBM Cloud Block Storage | Not applicable | In addition to IBM Cloud Block Storage, your environment must also include IBM Cloud File Storage. |
IBM Cloud File Storage | Not applicable | It is recommended that you use both IBM Cloud Block Storage and IBM Cloud File Storage. |
NetApp Trident | Version 22.4.0 or later fixes | |
Options | What to do |
---|---|
You have supported storage | Ensure that you have storage that works with the services that you plan to install. Review Setting up persistent storage to determine whether you need to complete any additional tasks to configure the storage for Cloud Pak for Data. Then go to d. Do you have the required OpenShift projects on your cluster? |
You don't have supported storage | Decide which storage you want to use. Ensure that you choose storage that works with the services that you plan to install. Follow the guidance in Setting up persistent storage for installing and configuring the storage. Then go to d. Do you have the required OpenShift projects on your cluster? |
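To confirm which storage classes are already available on the cluster before you decide, you can list them with the OpenShift CLI:

```sh
# List the storage classes on the cluster. Verify that the block and file storage
# classes that you plan to reference during the installation exist.
oc get storageclass
```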
- d. Do you have the required OpenShift projects on your cluster?
- At a minimum, you must have a project where you will install the Cloud Pak for Data operators and a project where you will install an instance of Cloud Pak for Data. You will need additional projects if you want to:
  - Separate the Cloud Pak for Data operators from the IBM Cloud Pak foundational services operators
  - Install multiple instances of Cloud Pak for Data
  - Deploy service instances or workloads in tethered projects

For details, see Supported project (namespace) configurations.
Options | What to do |
---|---|
You know which projects you plan to use when you install the software | Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to ensure that you have the necessary projects on your cluster, determine whether you need to label any projects, and set up tethered projects, if applicable. Then go to e. Do you plan to install any services that require custom SCCs? |
You don't know which projects you plan to use when you install the software | Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to determine which projects you need to create on your cluster, and then create the appropriate projects. Then go to e. Do you plan to install any services that require custom SCCs? |
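As a minimal sketch, assuming the example project names from the environment variables script earlier on this page, you could create the operators project and the instance project as follows:

```sh
# Create the project for the Cloud Pak for Data operators.
oc new-project "${PROJECT_CPD_OPS}"

# Create the project for the Cloud Pak for Data instance.
oc new-project "${PROJECT_CPD_INSTANCE}"
```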
- e. Do you plan to install any services that require custom SCCs?
- Services that require custom SCCs
If you plan to install any of the following Cloud Pak for Data services, you must create the appropriate custom SCCs:

Service | Required SCCs |
---|---|
Db2 | Db2 requires a custom SCC. By default, the SCC is created automatically; however, you can choose to create the SCC manually. For details, see Creating the custom security context constraint for Db2. |
Db2 Big SQL | Db2 Big SQL embeds an instance of Db2, which requires a custom SCC. This SCC is used only by the instance of Db2 Big SQL that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. |
Db2 Warehouse | Db2 Warehouse requires a custom SCC. By default, the SCC is created automatically; however, you can choose to create the SCC manually. For details, see Creating the custom security context constraint for Db2 Warehouse. |
Informix® | Informix requires a custom SCC. You must create this SCC manually. For details, see Creating the custom security context constraint for Informix. |
OpenPages® | The OpenPages service can optionally embed an instance of Db2. If you choose to use an embedded instance of Db2, OpenPages requires a custom SCC for the Db2 database. This SCC is used only by the instance of OpenPages that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. If you choose to use an external database, the custom SCC is not required. |
Watson Knowledge Catalog | Watson Knowledge Catalog requires two custom SCCs: an SCC for Watson Knowledge Catalog, which you must create manually (see Creating the custom security context constraint for Watson Knowledge Catalog), and an SCC for the instance of Db2 that is embedded in Watson Knowledge Catalog. The Db2 SCC is used only by the instance of Watson Knowledge Catalog that embeds the Db2 database and is created automatically when you install Watson Knowledge Catalog (see Creating the custom security context constraint for embedded Db2 databases). If you install Data Privacy, the service uses the Watson Knowledge Catalog SCC. |
Watson Query | Watson Query embeds an instance of Db2, which requires a custom SCC. This SCC is used only by the instance of Watson Query that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. |
Options | What to do |
---|---|
You plan to install one or more of these services | Create the appropriate SCCs for your environment. For details, see Creating custom security context constraints for services. Then go to f. Do you plan to install any services that require specific node settings? |
You don't plan to install any of these services | Go to f. Do you plan to install any services that require specific node settings? |
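After the SCCs for your services exist (whether they were created automatically or manually), you can list the security context constraints on the cluster to confirm that they are present; this check does not assume any particular SCC names:

```sh
# List all security context constraints on the cluster so that you can confirm
# that the custom SCCs required by your services were created.
oc get scc
```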
- f. Do you plan to install any services that require specific node settings?
- Services that require node settings

Node setting | Services that require changes to the setting |
---|---|
Load balancer timeout settings | Db2, Db2 Data Gate, Db2 Warehouse, OpenPages, Watson Discovery, Watson Knowledge Catalog, Watson Query, Watson Speech services, Watson Studio |
CRI-O container settings | Cognos® Analytics, Db2, Db2 Big SQL, Db2 Warehouse, Watson Discovery, Watson Knowledge Catalog, Watson Query, Watson Studio, Watson Machine Learning Accelerator |
Kernel parameter settings | Db2, Db2 Big SQL, Db2 Warehouse, Watson Knowledge Catalog, Watson Query |
GPU settings | Runtime 22.1 with Python 3.9 for GPU, Runtime 22.2 with Python 3.10 for GPU, Watson Machine Learning Accelerator |
Options | What to do |
---|---|
You plan to install one or more of these services | Update the node settings. For details, see Changing required node settings. Then go to g. How are you going to access the software images? |
You don't plan to install any of these services | Go to g. How are you going to access the software images? |
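For example, the CRI-O container settings raise limits such as the maximum number of processes (PIDs) per container. The sketch below shows how such a change can be applied with an OpenShift ContainerRuntimeConfig resource; the resource name, pool selector, and pidsLimit value are illustrative assumptions, so use the exact values that Changing required node settings specifies for your services and OpenShift version. Applying the change triggers a rolling update of the worker nodes.

```sh
# Example only: raise the CRI-O pidsLimit on worker nodes with a ContainerRuntimeConfig.
# The resource name, label selector, and pidsLimit value are illustrative assumptions.
cat <<EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: cpd-crio-settings
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    pidsLimit: 16384
EOF
```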
- g. How are you going to access the software images?
- Cloud Pak for Data images are accessible from the IBM Entitled Registry. In most situations, it is strongly recommended that you mirror the necessary software images from the IBM Entitled Registry to a private container registry.

Where should you pull images from?

Important: You must mirror the necessary images to your private container registry in the following situations:
- Your cluster is air-gapped (also called an offline or disconnected cluster).
- Your cluster uses an allowlist to permit direct access by specific sites, and the allowlist does not include the IBM Entitled Registry.
- Your cluster uses a blocklist to prevent direct access by specific sites, and the blocklist includes the IBM Entitled Registry.

Even if these situations do not apply to your environment, you should consider using a private container registry if you want to:
- Run security scans against the software images before you install them on your cluster
- Ensure that you have the same images available for multiple deployments, such as development or test environments and production environments

The only situation in which you might consider pulling images directly from the IBM Entitled Registry is when your cluster is not air-gapped, your network is extremely reliable, and latency is not a concern. However, for predictable and reliable performance, you should mirror the images to a private container registry.
Options | What to do |
---|---|
You are pulling images from the IBM Entitled Registry | Go to 4. Installing the Cloud Pak for Data platform and services. |
You are pulling images from a private container registry | Mirror the necessary images to your private container registry, then go to 4. Installing the Cloud Pak for Data platform and services. |
4. Installing the Cloud Pak for Data platform and services
After you prepare your cluster, you can install the Cloud Pak for Data platform and services.
What to do |
---|
Install the Cloud Pak for Data platform and the services that you plan to use. A sketch of the overall command flow is shown after this table. |
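As an illustration of the overall flow, the commands below show the general shape of a platform installation with `cpd-cli manage`, using the example environment variables defined earlier on this page. The exact commands, options, and order depend on your release, components, and environment, so follow the installation commands in the documentation rather than treating this sketch as definitive.

```sh
# Log the cpd-cli in to the cluster.
cpd-cli manage login-to-ocp \
  --username="${OCP_USERNAME}" \
  --password="${OCP_PASSWORD}" \
  --server="${OCP_URL}"

# Install the operators for the selected components in the operators project.
cpd-cli manage apply-olm \
  --release="${VERSION}" \
  --cpd_operator_ns="${PROJECT_CPD_OPS}" \
  --components="${COMPONENTS}"

# Create the custom resources for the selected components in the instance project.
cpd-cli manage apply-cr \
  --release="${VERSION}" \
  --cpd_instance_ns="${PROJECT_CPD_INSTANCE}" \
  --components="${COMPONENTS}" \
  --block_storage_class="${STG_CLASS_BLOCK}" \
  --file_storage_class="${STG_CLASS_FILE}" \
  --license_acceptance=true
```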
5. Completing post-installation tasks
After you install Cloud Pak for Data, make sure your cluster is secure and complete tasks that will impact how users interact with Cloud Pak for Data, such as configuring SSO or changing the route to the platform.
What to do |
---|
Complete the appropriate tasks for your environment in Post-installation setup (Day 1 operations). |
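For example, to find the URL that users will use to reach the platform (and to see the current route before you change it), you can list the routes in the instance project:

```sh
# List the routes in the Cloud Pak for Data instance project. The route host name
# is the URL that users enter to access the Cloud Pak for Data web client.
oc get routes -n "${PROJECT_CPD_INSTANCE}"
```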
6. Installing services
Options | What to do |
---|---|
You installed the services when you installed the platform | Best practice: Review Getting started with Cloud Pak for Data. |
You didn't install the services when you installed the platform | Install the services that you want to use. See the instructions for installing each service individually in Services. |