Installing IBM Cloud Pak for Data
A Red Hat® OpenShift® Container Platform cluster administrator and project administrator can work together to prepare the cluster and install IBM Cloud Pak® for Data.
Before you begin
Before you install Cloud Pak for Data, review the information in the Planning section.
Specifically, ensure that you review the System requirements. You must install the software on a cluster that has sufficient resources and that aligns with the guidance in the System requirements. For example, if you do not follow the specified disk requirements, you can encounter errors, such as out-of-memory errors.
1. Setting up a client workstation
To install IBM Cloud Pak for Data, you must have a client workstation that can connect to the Red Hat OpenShift Container Platform cluster.
The workstation must have the following command-line interfaces installed:
- Cloud Pak for Data command-line interface (`cpd-cli`) Version 12.0.6 or later
- OpenShift command-line interface (`oc`) at a version that is compatible with your cluster
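For a quick check that both CLIs are installed and at suitable versions, you can run the following commands from the workstation (both `version` subcommands are standard for these tools):

```sh
# Check the installed version of the Cloud Pak for Data command-line interface.
cpd-cli version

# Check the installed OpenShift client version (and the server version, if you are logged in).
oc version
```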
Options | What to do |
---|---|
You already have a client workstation set up | Go to 2. Collecting required information. |
You don't have a client workstation set up | Install the required command-line interfaces on the workstation, then go to 2. Collecting required information. |
2. Collecting required information
To successfully install IBM Cloud Pak for Data, you must have specific information about your environment.
- a. Obtaining your IBM entitlement API key
- All IBM Cloud Pak for Data images are accessible from the IBM® Entitled Registry. The IBM entitlement API key enables you to pull software images from the IBM Entitled Registry, either for installation or for mirroring to a private container registry.
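As a minimal check, assuming podman is installed on the client workstation, you can confirm that your entitlement key works by logging in to the IBM Entitled Registry (cp.icr.io) with the user name cp; the key value below is a placeholder:

```sh
# Store your entitlement API key in an environment variable (placeholder value shown).
export IBM_ENTITLEMENT_KEY=<your-entitlement-key>

# Log in to the IBM Entitled Registry to confirm that the key is accepted.
# The user name for the IBM Entitled Registry is always "cp".
podman login cp.icr.io --username cp --password "${IBM_ENTITLEMENT_KEY}"
```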
Options | What to do |
---|---|
You already have your API key | Go to b. Determining the list of components that you plan to install. |
You don't have your API key | Obtain your IBM entitlement API key, then go to b. Determining the list of components that you plan to install. |

- b. Determining the list of components that you plan to install
- IBM Cloud Pak for Data comprises numerous components so that you can install only the specific services that support your needs. Before you install Cloud Pak for Data, determine which components you need to install.

What to do
- Review Determining which components to install to ensure that you:
  - Install all the required components
  - Know which tasks you must complete to prepare your cluster (some services have additional pre-installation configuration)
- Go to c. Collecting information about your cluster that can be used to set up environment variables.
- c. Collecting information about your cluster that can be used to set up environment variables
- The commands for installing and upgrading IBM Cloud Pak for Data use variables with the format `${VARIABLE_NAME}`. You can create a script that exports the appropriate values as environment variables before you run the installation commands. After you source the script, you can copy most install and upgrade commands from the documentation and run them without making any changes. A sketch of such a script is shown after the following list.

What to do
- Complete Setting up installation environment variables.
- Go to 3. Preparing your cluster.
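The following is a minimal sketch of such a script. The variable names and values are illustrative assumptions; use the variables that Setting up installation environment variables lists for your release and the components that you plan to install.

```sh
#!/usr/bin/env bash
# cpd_vars.sh -- example installation environment variables (illustrative values only).

# Cluster connection details
export OCP_URL=<openshift-api-url>            # for example, https://api.cluster.example.com:6443
export OCP_USERNAME=<cluster-admin-username>
export OCP_PASSWORD=<cluster-admin-password>

# Projects (namespaces) for the operators and the Cloud Pak for Data instance
export PROJECT_CPD_OPS=cpd-operators
export PROJECT_CPD_INSTANCE=cpd-instance

# Storage classes to use for block and file storage
export STG_CLASS_BLOCK=<block-storage-class>
export STG_CLASS_FILE=<file-storage-class>

# Entitlement, release, and component information
export IBM_ENTITLEMENT_KEY=<your-entitlement-key>
export VERSION=<cloud-pak-for-data-release>
export COMPONENTS=<comma-separated-list-of-components>
```

After you save the script, run `source ./cpd_vars.sh` in your session so that the exported variables are available to the commands that you copy from the documentation.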
3. Preparing your cluster
Before you install Cloud Pak for Data, you must prepare your cluster.
- a. Do you have an existing Red Hat OpenShift Container Platform cluster?
- Supported versions of Red Hat OpenShift Container Platform
Cloud Pak for Data can be installed on the following versions of Red Hat OpenShift Container Platform:

Red Hat OpenShift Container Platform version | Cloud Pak for Data releases that support this version |
---|---|
Version 4.8.0 or later fixes | 4.6.0 - 4.6.2 only |
Version 4.10.0 or later fixes | 4.6.x |
Version 4.12.0 or later fixes | 4.6.4 or later |
Options | What to do |
---|---|
You are running a supported version of OpenShift | Go to b. Do you need to run the installation in a restricted environment? |
You have an older version of OpenShift | Upgrade your cluster. If you are using self-managed OpenShift, see the Red Hat OpenShift Container Platform documentation. If you are using managed OpenShift, refer to the documentation for your OpenShift provider. Then go to b. Do you need to run the installation in a restricted environment? |
You don't have an OpenShift cluster | Install a supported version of Red Hat OpenShift Container Platform, then go to b. Do you need to run the installation in a restricted environment? |
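If you already have a cluster and want to confirm which version it is running, you can log in from the client workstation and query the cluster version (the environment variables are the examples from the script earlier on this page):

```sh
# Log in to the cluster as a user with sufficient permissions.
oc login "${OCP_URL}" --username="${OCP_USERNAME}" --password="${OCP_PASSWORD}"

# Show the OpenShift Container Platform version that the cluster is running.
oc get clusterversion
```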
- b. Do you need to run the installation in a restricted environment?
- If you need to run `cpd-cli manage` commands against a cluster in a restricted network, you must make the `olm-utils` image available inside the cluster network.

Options | What to do |
---|---|
Your cluster is not in a restricted network | Go to c. Do you have supported persistent storage on your cluster? |
Your cluster is in a restricted network | Review the guidance in Running cpd-cli manage commands in a restricted network to determine which method to use to make the required image available to one or more workstations in the cluster. Then go to c. Do you have supported persistent storage on your cluster? |
- c. Do you have supported persistent storage on your cluster?
- Supported storage for the Cloud Pak for Data platform
The Cloud Pak for Data platform supports the following storage:

Storage option | Version | Notes |
---|---|---|
OpenShift Data Foundation | Version 4.8 or later fixes (Cloud Pak for Data 4.6.0 - 4.6.2 only); Version 4.10 or later fixes (Cloud Pak for Data 4.6.x); Version 4.12 or later fixes (Cloud Pak for Data 4.6.4 or later) | Available in either IBM Storage Fusion or Red Hat OpenShift Platform Plus. Ensure that you install a version of OpenShift Data Foundation that is compatible with the version of Red Hat OpenShift Container Platform that you are running. For details, see https://access.redhat.com/articles/4731161. |
IBM Storage Fusion | Version 2.4.0 or later fixes; Version 2.5.2 or later fixes (recommended) | Available in IBM Storage Fusion. |
IBM Storage Scale Container Native (with IBM Storage Scale Container Storage Interface) | Version 5.1.5 or later fixes; CSI Version 2.6.x or later fixes | Available in either IBM Storage Fusion or IBM Storage Suite for IBM Cloud® Paks. |
Portworx | Version 2.9.1.3 or later fixes; Version 2.12.2 or later fixes | |
NFS | Version 3 or 4 | Version 3 is recommended if you are using any of the following services: Db2®, Db2 Big SQL, Db2 Warehouse, Watson™ Knowledge Catalog, Watson Query. If you use Version 4, ensure that your storage class uses NFS Version 3 as the mount option. For details, see Setting up dynamic provisioning. |
Amazon Elastic Block Store (EBS) | Not applicable | In addition to EBS storage, your environment must also include EFS storage. |
Amazon Elastic File System (EFS) | Not applicable | It is recommended that you use both EBS and EFS storage. |
IBM Cloud Block Storage | Not applicable | In addition to IBM Cloud Block Storage, your environment must also include IBM Cloud File Storage. |
IBM Cloud File Storage | Not applicable | It is recommended that you use both IBM Cloud Block Storage and IBM Cloud File Storage. |
NetApp Trident | Version 22.4.0 or later fixes | |
Options | What to do |
---|---|
You have supported storage | Ensure that you have storage that works with the services that you plan to install. Review Setting up persistent storage to determine whether you need to complete any additional tasks to configure the storage for Cloud Pak for Data. Then go to d. Do you have the required OpenShift projects on your cluster? |
You don't have supported storage | Decide which storage you want to use. Ensure that you choose storage that works with the services that you plan to install. Follow the guidance in Setting up persistent storage for installing and configuring the storage. Then go to d. Do you have the required OpenShift projects on your cluster? |
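To confirm which storage classes are already available on the cluster before you decide, you can list them with the OpenShift CLI:

```sh
# List the storage classes on the cluster. Verify that the block and file storage
# classes that you plan to reference during the installation exist.
oc get storageclass
```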
- d. Do you have the required OpenShift projects on your cluster?
- At a minimum, you must have a project where you will install the Cloud Pak for Data operators and a project where you will install an instance of Cloud Pak for Data. You will need additional projects if you want to:
  - Separate the Cloud Pak for Data operators from the IBM Cloud Pak foundational services operators
  - Install multiple instances of Cloud Pak for Data
  - Deploy service instances or workloads in tethered projects

For details, see Supported project (namespace) configurations.
Options | What to do |
---|---|
You know which projects you plan to use when you install the software | Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to ensure that you have the necessary projects on your cluster, determine whether you need to label any projects, and set up tethered projects, if applicable. Then go to e. Do you plan to install any services that require custom SCCs? |
You don't know which projects you plan to use when you install the software | Review the guidance in Setting up projects (namespaces) on Red Hat OpenShift Container Platform to determine which projects you need to create on your cluster, and then create the appropriate projects. Then go to e. Do you plan to install any services that require custom SCCs? |
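As a minimal sketch, assuming the example project names from the environment variables script earlier on this page, you could create the operators project and the instance project as follows:

```sh
# Create the project for the Cloud Pak for Data operators.
oc new-project "${PROJECT_CPD_OPS}"

# Create the project for the Cloud Pak for Data instance.
oc new-project "${PROJECT_CPD_INSTANCE}"
```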
- e. Do you plan to install any services that require custom SCCs?
- Services that require custom SCCs
If you plan to install any of the following Cloud Pak for Data services, you must create the appropriate custom SCCs:

Service | Required SCCs |
---|---|
Db2 | Db2 requires a custom SCC. By default, the SCC is created automatically; however, you can choose to create the SCC manually. For details, see Creating the custom security context constraint for Db2. |
Db2 Big SQL | Db2 Big SQL embeds an instance of Db2, which requires a custom SCC. This SCC is used only by the instance of Db2 Big SQL that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. |
Db2 Warehouse | Db2 Warehouse requires a custom SCC. By default, the SCC is created automatically; however, you can choose to create the SCC manually. For details, see Creating the custom security context constraint for Db2 Warehouse. |
Informix® | Informix requires a custom SCC. You must create this SCC manually. For details, see Creating the custom security context constraint for Informix. |
OpenPages® | The OpenPages service can optionally embed an instance of Db2. If you choose to use an embedded instance of Db2, OpenPages requires a custom SCC for the Db2 database. This SCC is used only by the instance of OpenPages that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. If you choose to use an external database, the custom SCC is not required. |
Watson Knowledge Catalog | Watson Knowledge Catalog requires two custom SCCs: an SCC for Watson Knowledge Catalog, which you must create manually (see Creating the custom security context constraint for Watson Knowledge Catalog), and an SCC for the instance of Db2 that is embedded in Watson Knowledge Catalog. The Db2 SCC is used only by the instance of Watson Knowledge Catalog that embeds the Db2 database and is created automatically when you install Watson Knowledge Catalog (see Creating the custom security context constraint for embedded Db2 databases). If you install Data Privacy, the service uses the Watson Knowledge Catalog SCC. |
Watson Query | Watson Query embeds an instance of Db2, which requires a custom SCC. This SCC is used only by the instance of Watson Query that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. |
Options | What to do |
---|---|
You plan to install one or more of these services | Create the appropriate SCCs for your environment. For details, see Creating custom security context constraints for services. Then go to f. Do you plan to install any services that require specific node settings? |
You don't plan to install any of these services | Go to f. Do you plan to install any services that require specific node settings? |
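After the SCCs for your services exist (whether they were created automatically or manually), you can list the security context constraints on the cluster to confirm that they are present; this check does not assume any particular SCC names:

```sh
# List all security context constraints on the cluster so that you can confirm
# that the custom SCCs required by your services were created.
oc get scc
```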
- f. Do you plan to install any services that require specific node settings?
- Services that require node settings

Node setting | Services that require changes to the setting |
---|---|
Load balancer timeout settings | Db2, Db2 Data Gate, Db2 Warehouse, OpenPages, Watson Discovery, Watson Knowledge Catalog, Watson Query, Watson Speech services, Watson Studio |
CRI-O container settings | Cognos® Analytics, Db2, Db2 Big SQL, Db2 Warehouse, Watson Discovery, Watson Knowledge Catalog, Watson Query, Watson Studio, Watson Machine Learning Accelerator |
Kernel parameter settings | Db2, Db2 Big SQL, Db2 Warehouse, Watson Knowledge Catalog, Watson Query |
GPU settings | Runtime 22.1 with Python 3.9 for GPU, Runtime 22.2 with Python 3.10 for GPU, Watson Machine Learning Accelerator |
Options | What to do |
---|---|
You plan to install one or more of these services | Update the node settings. For details, see Changing required node settings. Then go to g. How are you going to access the software images? |
You don't plan to install any of these services | Go to g. How are you going to access the software images? |
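For example, the CRI-O container settings raise limits such as the maximum number of processes (PIDs) per container. The sketch below shows how such a change can be applied with an OpenShift ContainerRuntimeConfig resource; the resource name, pool selector, and pidsLimit value are illustrative assumptions, so use the exact values that Changing required node settings specifies for your services and OpenShift version. Applying the change triggers a rolling update of the worker nodes.

```sh
# Example only: raise the CRI-O pidsLimit on worker nodes with a ContainerRuntimeConfig.
# The resource name, label selector, and pidsLimit value are illustrative assumptions.
cat <<EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: cpd-crio-settings
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    pidsLimit: 16384
EOF
```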
- g. How are you going to access the software images?
- Cloud Pak for Data images are accessible from the IBM Entitled Registry. In most situations, it is strongly recommended that you mirror the necessary software images from the IBM Entitled Registry to a private container registry.

Where should you pull images from?

Important: You must mirror the necessary images to your private container registry in the following situations:
- Your cluster is air-gapped (also called an offline or disconnected cluster).
- Your cluster uses an allowlist to permit direct access by specific sites, and the allowlist does not include the IBM Entitled Registry.
- Your cluster uses a blocklist to prevent direct access by specific sites, and the blocklist includes the IBM Entitled Registry.

Even if these situations do not apply to your environment, you should consider using a private container registry if you want to:
- Run security scans against the software images before you install them on your cluster
- Ensure that you have the same images available for multiple deployments, such as development or test environments and production environments

The only situation in which you might consider pulling images directly from the IBM Entitled Registry is when your cluster is not air-gapped, your network is extremely reliable, and latency is not a concern. However, for predictable and reliable performance, you should mirror the images to a private container registry.
Options | What to do |
---|---|
You are pulling images from the IBM Entitled Registry | Go to 4. Installing the Cloud Pak for Data platform and services. |
You are pulling images from a private container registry | Mirror the necessary images to your private container registry, then go to 4. Installing the Cloud Pak for Data platform and services. |
4. Installing the Cloud Pak for Data platform and services
After you prepare your cluster, you can install the Cloud Pak for Data platform and services.
What to do |
---|
Install the Cloud Pak for Data platform and the services that you plan to use. A sketch of the overall command flow is shown after this table. |
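As an illustration of the overall flow, the commands below show the general shape of a platform installation with `cpd-cli manage`, using the example environment variables defined earlier on this page. The exact commands, options, and order depend on your release, components, and environment, so follow the installation commands in the documentation rather than treating this sketch as definitive.

```sh
# Log the cpd-cli in to the cluster.
cpd-cli manage login-to-ocp \
  --username="${OCP_USERNAME}" \
  --password="${OCP_PASSWORD}" \
  --server="${OCP_URL}"

# Install the operators for the selected components in the operators project.
cpd-cli manage apply-olm \
  --release="${VERSION}" \
  --cpd_operator_ns="${PROJECT_CPD_OPS}" \
  --components="${COMPONENTS}"

# Create the custom resources for the selected components in the instance project.
cpd-cli manage apply-cr \
  --release="${VERSION}" \
  --cpd_instance_ns="${PROJECT_CPD_INSTANCE}" \
  --components="${COMPONENTS}" \
  --block_storage_class="${STG_CLASS_BLOCK}" \
  --file_storage_class="${STG_CLASS_FILE}" \
  --license_acceptance=true
```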
5. Completing post-installation tasks
After you install Cloud Pak for Data, make sure your cluster is secure and complete tasks that will impact how users interact with Cloud Pak for Data, such as configuring SSO or changing the route to the platform.
What to do |
---|
Complete the appropriate tasks for your environment in Post-installation setup (Day 1 operations). |
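For example, to find the URL that users will use to reach the platform (and to see the current route before you change it), you can list the routes in the instance project:

```sh
# List the routes in the Cloud Pak for Data instance project. The route host name
# is the URL that users enter to access the Cloud Pak for Data web client.
oc get routes -n "${PROJECT_CPD_INSTANCE}"
```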
6. Installing services
Options | What to do |
---|---|
You installed the services when you installed the platform | Best practice: Review Getting started with Cloud Pak for Data. |
You didn't install the services when you installed the platform | Install the services that you want to use. See the instructions for installing each service individually in Services. |