Installing IBM Cloud Pak for Data
A Red Hat® OpenShift® Container Platform cluster administrator and an instance administrator can work together to prepare the cluster and install IBM Cloud Pak® for Data.
Upgrade to IBM Cloud Pak for Data Version 5.0 before Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to Version 5.0.
Before you begin
Before you install Cloud Pak for Data, review the information in the Planning section. Specifically, ensure that you review the following sections:
Section | What you need to know |
---|---|
System requirements | You must install the software on a cluster that has sufficient resources and that aligns with the guidance in the System requirements. For example, if you do not follow the specified disk requirements, you can encounter out-of-memory errors. |
Storage considerations | You must install the software on persistent storage that is accessible to your cluster, meets the stated requirements, and works with the services that you plan to install. For example, if your network speeds are too slow or if the disks do not have sufficient I/O performance, services can experience poor performance or cluster instability. |
Installation overview
The Cloud Pak for Data installation is divided into the following phases:
- 1. Setting up a client workstation
- 2. Setting up a cluster
- 3. Collecting required information
- 4. Preparing to run installs in a restricted network
- 5. Preparing to run installs from a private container registry
- 6. Preparing the cluster for Cloud Pak for Data
- 7. Preparing to install an instance of Cloud Pak for Data
- 8. Installing an instance of Cloud Pak for Data
- 9. Completing post-installation tasks
- 10. Installing services
1. Setting up a client workstation
To install IBM Cloud Pak for Data, you must have a client workstation that can connect to the Red Hat OpenShift Container Platform cluster.
All administrators Repeat as needed
Client workstation requirements
- Cloud Pak for Data command-line interface (cpd-cli) Version 13.1.7 or later.
- OpenShift command-line interface (oc) at a version that is compatible with your cluster.
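The workstation requirements above can be checked with a short script. The following is a minimal sketch, not an official procedure: the helper names `have_tool` and `version_ge` are hypothetical, and you would supply the installed cpd-cli version yourself (for example, by reading it from the output of the tool's version command).

```shell
#!/bin/sh
# Return success if the named command is available on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Return success if dotted version $1 is greater than or equal to $2.
# Fields are compared numerically, so 13.1.10 is newer than 13.1.7.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Report which of the two required CLIs are missing from this workstation.
for tool in cpd-cli oc; do
  have_tool "$tool" || echo "missing: $tool"
done
```

For example, `version_ge 13.1.10 13.1.7` succeeds, and `version_ge 13.0.9 13.1.7` fails.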
What to do |
---|
|
2. Setting up a cluster
Before you install Cloud Pak for Data, you must install and set up a Red Hat OpenShift Container Platform cluster.
Cluster administrator One-time setup
- a. Do you have an existing Red Hat OpenShift Container Platform cluster?
Supported versions of Red Hat OpenShift Container Platform
Cloud Pak for Data can be installed on the following versions of Red Hat OpenShift Container Platform:
- Version 4.12 or later fixes
- Version 4.14 or later fixes
- Version 4.15 or later fixes
- Version 4.16 or later fixes
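The supported-version list above can be expressed as a small check. This is an illustrative sketch only: the function name `is_supported_ocp` is hypothetical, and the version string is assumed to be one that you obtained from your cluster.

```shell
#!/bin/sh
# Return success if the given OpenShift version (for example, 4.14.20)
# belongs to one of the minor releases listed as supported.
is_supported_ocp() {
  minor=$(printf '%s' "$1" | cut -d. -f1,2)
  case "$minor" in
    4.12|4.14|4.15|4.16) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: 4.13 is not in the list, so this prints a warning.
is_supported_ocp "4.13.5" || echo "4.13.5 is not a supported version"
```

Note that the list skips Version 4.13, which is why the check matches minor releases explicitly instead of testing for a minimum version.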
Options | What to do |
---|---|
You are running a supported version of OpenShift | |
You have an older version of OpenShift | Upgrade your cluster: if you are using self-managed OpenShift, see the Red Hat OpenShift Container Platform documentation; if you are using managed OpenShift, refer to the documentation for your OpenShift provider. Then go to b. Do you have supported persistent storage on your cluster? |
You don't have an OpenShift cluster | |
- b. Do you have supported persistent storage on your cluster?
Cloud Pak for Data software uses persistent storage. You must have persistent storage that is accessible from your cluster.
Supported storage for the Cloud Pak for Data platform
Storage option | Version | Notes |
---|---|---|
OpenShift Data Foundation | Version 4.12 or later fixes; Version 4.14 or later fixes; Version 4.15 or later fixes; Version 4.16 or later fixes | Available in Red Hat OpenShift Platform Plus. Ensure that you install a version of OpenShift Data Foundation that is compatible with the version of Red Hat OpenShift Container Platform that you are running. For details, see https://access.redhat.com/articles/4731161. |
IBM® Storage Fusion Data Foundation | | Available in IBM Storage Fusion. Ensure that you install a version of IBM Storage Fusion Data Foundation that is compatible with the version of Red Hat OpenShift Container Platform that you are running. If you are upgrading to IBM Cloud Pak for Data Version 4.8, upgrade your storage after you upgrade IBM Cloud Pak for Data. |
IBM Storage Fusion Global Data Platform | | Available in IBM Storage Fusion or IBM Storage Fusion HCI System. If you are upgrading to IBM Cloud Pak for Data Version 4.8, upgrade your storage after you upgrade IBM Cloud Pak for Data. |
IBM Storage Scale Container Native (with IBM Storage Scale Container Storage Interface) | Version 5.1.7 or later fixes, with CSI Version 2.9.0 or later fixes | Available in the following storage: IBM Storage Fusion; IBM Storage Suite for IBM Cloud® Paks |
Portworx | Version 2.13.3 or later fixes; Version 3.0.2 or later fixes | If you are running Red Hat OpenShift Container Platform Version 4.12, you must use Portworx Version 2.13.3 or later. If you are running Red Hat OpenShift Container Platform Version 4.14, you must use Portworx Version 3.0.2 or later. |
NFS | Version 3 or 4 | Version 3 is recommended if you are using any of the following services: Db2®, Db2 Big SQL, Db2 Warehouse, IBM Knowledge Catalog, Watson™ Query. If you use Version 4, ensure that your storage class uses NFS Version 3 as the mount option. For details, see Setting up dynamic provisioning. |
Amazon Elastic Block Store (EBS) | Not applicable | In addition to EBS storage, your environment must also include EFS storage. |
Amazon Elastic File System (EFS) | Not applicable | It is recommended that you use both EBS and EFS storage. |
NetApp Trident | Version 22.4.0 or later fixes | This information applies to both self-managed and managed NetApp Trident. |
Options | What to do |
---|---|
You have supported storage | |
You don't have supported storage | |
- c. Do you have a private container registry?
IBM Cloud Pak for Data software images are accessible from the IBM Entitled Registry. In most situations, it is strongly recommended that you mirror the necessary software images from the IBM Entitled Registry to a private container registry.
Where should you pull images from?
Important: You must mirror the Cloud Pak for Data software images to your private container registry in the following situations:
- Your cluster is air-gapped (also called an offline or disconnected cluster).
- Your cluster uses an allowlist to permit direct access by specific sites, and the allowlist does not include the IBM Entitled Registry.
- Your cluster uses a blocklist to prevent direct access by specific sites, and the blocklist includes the IBM Entitled Registry.
Even if these situations do not apply to your environment, you should consider using a private container registry if you want to:
- Run security scans against the software images before you install them on your cluster
- Ensure that you have the same images available for multiple deployments, such as development or test environments and production environments
The only situation in which you might consider pulling images directly from the IBM Entitled Registry is when your cluster is not air-gapped, your network is extremely reliable, and latency is not a concern. However, for predictable and reliable performance, you should mirror the images to a private container registry.
Options | What to do |
---|---|
You plan to pull images from the IBM Entitled Registry | |
You plan to pull images from your existing private container registry | |
You plan to pull images from a private container registry but don't have one yet | |
3. Collecting required information
To successfully install IBM Cloud Pak for Data, you must have specific information about your environment.
Cloud Pak for Data operations team Cluster administrator Repeat as needed
- a. Obtaining your IBM entitlement API key
All IBM Cloud Pak for Data images are accessible from the IBM Entitled Registry. The IBM entitlement API key enables you to pull software images from the IBM Entitled Registry, either for installation or for mirroring to a private container registry.
Options | What to do |
---|---|
You already have your API key | |
You don't have your API key | |
- b. Determining the list of components that you plan to install
IBM Cloud Pak for Data is composed of numerous components so that you can install the specific services that support your needs. Before you install Cloud Pak for Data, determine which components you need to install to support your business requirements.
What to do |
---|
|
- c. Collecting information about your cluster that can be used to set up environment variables
The commands for installing and upgrading IBM Cloud Pak for Data use variables with the format ${VARIABLE_NAME}. You can create a script to automatically export the appropriate values as environment variables before you run the installation commands. After you source the script, you can copy most install and upgrade commands from the documentation and run them without making any changes.
What to do
- Complete Setting up installation environment variables.
- Go to the appropriate section based on your environment:
- If your cluster is in a restricted network, go to 4. Preparing to run installs in a restricted network.
- If you plan to mirror images to a private container registry, go to 5. Preparing to run installs from a private container registry.
- If neither of the preceding scenarios applies, go to 6. Preparing the cluster for Cloud Pak for Data.
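The environment-variable script described in step 3c might look like the following minimal sketch. The file name `cpd_vars.sh`, the helper `require_vars`, and every value are placeholders chosen for illustration; the variable names are assumptions that should be checked against Setting up installation environment variables.

```shell
#!/bin/sh
# cpd_vars.sh (hypothetical file name). Source this file before you run
# installation commands. Every value below is a placeholder.

# Cluster connection
export OCP_URL="https://api.cluster.example.com:6443"

# Projects (namespaces)
export PROJECT_CERT_MANAGER="ibm-cert-manager"
export PROJECT_LICENSE_SERVICE="ibm-licensing"
export PROJECT_CPD_INST_OPERATORS="cpd-operators"
export PROJECT_CPD_INST_OPERANDS="cpd-instance"

# Storage classes
export STG_CLASS_BLOCK="example-block-storage"
export STG_CLASS_FILE="example-file-storage"

# Release and components to install
export VERSION="4.8.7"
export COMPONENTS="cpd_platform"

# Return success only if every named variable is set and non-empty.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_vars OCP_URL VERSION COMPONENTS STG_CLASS_BLOCK STG_CLASS_FILE
```

After you run `. ./cpd_vars.sh` in a shell session, commands that reference `${VERSION}` or `${PROJECT_CPD_INST_OPERANDS}` resolve to these values. Keep secrets such as the entitlement API key out of files that are shared or committed to version control.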
4. Preparing to run installs in a restricted network
If you will run the IBM Cloud Pak for Data installation commands in a restricted network, you must prepare the client workstations before you move them behind your firewall.
All administrators Repeat as needed
What to do |
---|
|
5. Preparing to run installs from a private container registry
If you plan to use a private container registry to host the IBM Cloud Pak for Data software images, you must mirror the images from the IBM Entitled Registry and configure the cluster to pull the images from the private container registry.
Different users need to complete the appropriate tasks.
Some of these tasks can be completed once, but some of the tasks must be repeated for each user involved in the installation.
- a. Mirroring the Cloud Pak for Data images to the private container registry
If your cluster is in a restricted network or if you want to ensure that all images are pulled from a trusted source, mirror the IBM Cloud Pak for Data images to your private container registry.
Registry administrator Repeat as needed
What to do |
---|
|
- b. Configuring an image content source policy
If you mirror images to a private container registry, you must tell your cluster where to find the software images by creating an image content source policy or image digest mirror set.
Cluster administrator One-time setup
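As a rough illustration of what an image content source policy contains, the following sketch prints one for a given private registry. The function name `print_icsp`, the policy name, the registry host, and the short list of source repositories are all assumptions for illustration; the documented procedure defines the complete list of repositories to mirror.

```shell
#!/bin/sh
# Print an ImageContentSourcePolicy that redirects image pulls to a
# private registry. $1 is the private registry location (host:port).
# The two source repositories shown are examples, not a complete list.
print_icsp() {
  registry=$1
  cat <<EOF
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: cloud-pak-for-data-mirror
spec:
  repositoryDigestMirrors:
  - mirrors:
    - ${registry}/cp
    source: cp.icr.io/cp
  - mirrors:
    - ${registry}/cpopen
    source: icr.io/cpopen
EOF
}

# Review the generated policy before you apply it to a cluster.
print_icsp "registry.example.com:5000"
```

A cluster administrator could pipe the output to `oc apply -f -` after reviewing it. Applying a policy like this typically causes the cluster nodes to be updated, so it is usually done once, before installation.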
What to do |
---|
|
- c. Do users need to pull the olm-utils-v2 image from the private container registry?
If the olm-utils-v2 image is available in the private container registry, you must update the cpd-cli to pull the image from the private container registry.
All administrators Repeat as needed
Options | What to do |
---|---|
Your cluster is not in a restricted network, and users can pull the image from the IBM Entitled Registry | Go to 6. Preparing the cluster for Cloud Pak for Data. |
Your cluster is not in a restricted network, but you want users to pull the image from the private container registry | |
Your cluster is in a restricted network | |
6. Preparing the cluster for Cloud Pak for Data
Before you install Cloud Pak for Data, you must prepare your Red Hat OpenShift Container Platform cluster.
Cluster administrator One-time setup
- a. Updating the global image pull secret
The global image pull secret ensures that your cluster has the necessary credentials to pull images. The credentials that you add to the global image pull secret depend on where you want to pull images from.
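To make the idea of a pull-secret credential concrete, the following sketch builds the registry entry for the IBM Entitled Registry, assuming the registry host `cp.icr.io` and the user name `cp`. The function name `print_icr_auth` is hypothetical, and the sketch only shows the shape of the credential; it does not merge the entry into the existing global pull secret.

```shell
#!/bin/sh
# Print a registry "auths" document for the IBM Entitled Registry.
# $1 is the IBM entitlement API key. The auth value is the base64
# encoding of "<user>:<key>", which is the standard registry format.
print_icr_auth() {
  token=$(printf '%s' "cp:$1" | base64 | tr -d '\n')
  printf '{"auths":{"cp.icr.io":{"auth":"%s"}}}\n' "$token"
}

# Example with a dummy key; never print a real key to a shared terminal.
print_icr_auth "example-key"
```

If you pull from a private container registry instead, the entry would name that registry and its credentials in place of the entitlement key.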
What to do |
---|
|
- b. Do you want to allow the cpd-cli to create projects (namespaces) for you?
The IBM Cloud Pak for Data command-line interface can automatically create any projects that don't exist on the cluster. However, you can choose to create the projects for the shared cluster components manually.
Options | What to do |
---|---|
You will allow the cpd-cli to create projects | |
You won't allow the cpd-cli to create projects | |
- c. Installing shared cluster components
Before you install IBM Cloud Pak for Data, you must install the IBM Cloud Pak foundational services Certificate manager and License Service. You can optionally install the Cloud Pak for Data scheduling service.
What to do |
---|
|
- d. Configuring persistent storage for Cloud Pak for Data
Before you can install IBM Cloud Pak for Data, you must ensure that the persistent storage on your Red Hat OpenShift cluster is configured for dynamic provisioning and includes the appropriate storage classes.
What to do |
---|
Complete the appropriate tasks in Configuring persistent storage for IBM Cloud Pak for Data. Then go to e. Do you plan to install any services that require custom SCCs? |
- e. Do you plan to install any services that require custom SCCs?
If you plan to install services that require a custom security context constraint, you might need to create the appropriate SCCs manually. However, some custom SCCs are created automatically.
Services that require custom SCCs
Service | Required SCCs |
---|---|
Db2 | Db2 requires a custom SCC. By default, the SCC is created automatically; however, you can choose to create the SCC manually. For details, see Creating the custom security context constraint for Db2. |
Db2 Big SQL | Db2 Big SQL uses an embedded Db2 database, which requires a custom SCC. The SCC is used only by the instance of Db2 Big SQL that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. |
Db2 Warehouse | Db2 Warehouse requires a custom SCC. By default, the SCC is created automatically; however, you can choose to create the SCC manually. For details, see Creating the custom security context constraint for Db2 Warehouse. |
IBM Knowledge Catalog | IBM Knowledge Catalog uses an embedded Db2 database, which requires a custom SCC. The SCC is used only by the instance of IBM Knowledge Catalog that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. |
Informix® | Informix requires a custom SCC. You must create this SCC manually. For details, see Creating the custom security context constraint for Informix. |
OpenPages® | The OpenPages service can optionally embed a Db2 database. If you choose to use an embedded Db2 database, OpenPages requires a custom SCC for the Db2 database. The SCC is used only by the instance of OpenPages that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. If you choose to use an external database, the custom SCC is not required. |
Watson Query | Watson Query uses an embedded Db2 database, which requires a custom SCC. The SCC is used only by the instance of Watson Query that embeds the Db2 database. The required SCC is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. |
watsonx.governance | If you install the OpenPages component of watsonx.governance, you can either use an existing OpenPages service instance or use the default OpenPages service instance that is created when you install watsonx.governance. The default OpenPages instance uses an embedded Db2 database. The embedded Db2 database requires a custom SCC, which is created automatically. For details, see Creating the custom security context constraint for embedded Db2 databases. |
Options | What to do |
---|---|
You are not installing services that require custom SCCs | |
You are installing services that require custom SCCs, but all of the SCCs are created automatically | |
You are installing services that require custom SCCs, but you want to create the SCCs manually | Complete the appropriate tasks in Creating custom security context constraints for services. Then go to f. Do you plan to install any services that require specific node settings? |
You are installing services that require custom SCCs, and you must create the SCC manually | Complete the appropriate tasks in Creating custom security context constraints for services. Then go to f. Do you plan to install any services that require specific node settings? |
- f. Do you plan to install any services that require specific node settings?
Some services that run on IBM Cloud Pak for Data require specific settings on the nodes in the cluster. To ensure that the cluster has the required settings for these services, adjust the settings on the appropriate nodes in the cluster.
Services that require specific node settings
Node setting | Services that require changes to the setting |
---|---|
Load balancer timeout | Cognos® Dashboards, Db2, Db2 Data Gate, Db2 Warehouse, OpenPages, Watson Discovery, IBM Knowledge Catalog, Watson Query, Watson Speech services, Watson Studio, watsonx.ai. Even if you don't plan to install the preceding services, you might need to adjust the timeout settings if you are working with large data sets or you have slower network speeds. For example, you might need to increase the timeout value if you receive a timeout or failure when you upload a large file. |
Process IDs limit | DataStage®, Db2, Db2 Big SQL, Db2 Warehouse, Watson Discovery, IBM Knowledge Catalog, Watson Query, Watson Studio, Watson Machine Learning Accelerator |
Kernel parameter settings | Db2, Db2 Big SQL, Db2 Warehouse, OpenPages, IBM Knowledge Catalog, Watson Query |
GPU settings | Runtime 23.1 on Python 3.10 for GPU, Runtime 22.2 on Python 3.10 for GPU, Watson Machine Learning Accelerator, watsonx.ai |
Options | What to do |
---|---|
You are not installing services that require specific node settings | |
You are installing services that require specific node settings | Complete the appropriate tasks in Changing required node settings. Then go to g. Do you plan to install services that have a dependency on prerequisite software? |
- g. Do you plan to install services that have a dependency on prerequisite software?
Services with a dependency on prerequisite software:
Services that have prerequisites | Prerequisite software |
---|---|
Watson Discovery | Multicloud Object Gateway |
Watson Speech services | Multicloud Object Gateway |
watsonx Assistant | Multicloud Object Gateway; Red Hat OpenShift Serverless Knative Eventing |
watsonx Orchestrate® | IBM App Connect in containers |
Options | What to do |
---|---|
You are not installing services with a dependency on prerequisite software | |
You are installing services with a dependency on prerequisite software that must be installed on the cluster | Complete the appropriate tasks for your environment. Then go to 7. Preparing to install an instance of Cloud Pak for Data. |
7. Preparing to install an instance of Cloud Pak for Data
Before you can install IBM Cloud Pak for Data, you must create and configure the projects for an instance of Cloud Pak for Data.
Cluster administrator Repeat as needed
- a. Do you want to allow the cpd-cli to create projects (namespaces) for you?
The IBM Cloud Pak for Data command-line interface can automatically create any projects that don't exist on the cluster. However, you can optionally create the projects manually.
Options | What to do |
---|---|
You will allow the cpd-cli to create projects | |
You won't allow the cpd-cli to create projects | |
- b. Applying the required permissions to projects
Before you install an instance of IBM Cloud Pak for Data, you must ensure that the project where the operators will be installed can watch the project where the Cloud Pak for Data control plane and services are installed.
What to do |
---|
|
- c. Who will install and manage the instance?
If a user other than the cluster administrator will install IBM Cloud Pak for Data, you must give a Red Hat OpenShift Container Platform user the required roles to install the Cloud Pak for Data software in the instance projects.
Options | What to do |
---|---|
The cluster administrator will install the instance | |
Another user will install the instance | |
- d. Do you plan to install any services with a dependency on Multicloud Object Gateway in the instance?
If you plan to install services with a dependency on Multicloud Object Gateway in this instance of IBM Cloud Pak for Data, you must create the secrets that the services use to communicate with Multicloud Object Gateway.
Options | What to do |
---|---|
The instance will not include services with a dependency on Multicloud Object Gateway | |
The instance will include one or more services with a dependency on Multicloud Object Gateway | |
8. Installing an instance of Cloud Pak for Data
You can install one or more instances of IBM Cloud Pak for Data on your Red Hat OpenShift Container Platform cluster. Each instance of Cloud Pak for Data has its own project for operators and its own project for the Cloud Pak for Data control plane and services (also called operands).
Instance administrator Repeat as needed
- a. Installing the IBM Cloud Pak foundational services for the instance
Before you can install IBM Cloud Pak for Data, you must install the IBM Cloud Pak foundational services that Cloud Pak for Data requires.
What to do |
---|
|
- b. Do you plan to install any services with a dependency on Db2U in the instance?
If you install services with a dependency on Db2U, create a db2u-product-cm ConfigMap to specify whether Db2U runs with limited privileges or elevated privileges.
Services with a dependency on Db2U
- Db2
- Db2 Big SQL
- Db2 Warehouse
- OpenPages
- IBM Knowledge Catalog
- Watson Query
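For orientation, a ConfigMap of this kind might look like the output of the following sketch. The function name `print_db2u_product_cm` is hypothetical, and the data key `DB2U_RUN_WITH_LIMITED_PRIVS` and the choice of project are assumptions that you should confirm against the Db2U documentation for your release before you apply anything.

```shell
#!/bin/sh
# Print a db2u-product-cm ConfigMap manifest.
# $1: project (namespace) in which to create the ConfigMap
# $2: "true" for limited privileges, "false" for elevated privileges
# The data key name is an assumption; verify it for your release.
print_db2u_product_cm() {
  namespace=$1
  limited=$2
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: db2u-product-cm
  namespace: ${namespace}
data:
  DB2U_RUN_WITH_LIMITED_PRIVS: "${limited}"
EOF
}

# Example: limited privileges in a placeholder project name.
print_db2u_product_cm "cpd-operators" "true"
```

The value is quoted in the manifest because ConfigMap data values must be strings, not YAML booleans.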
Options | What to do |
---|---|
The instance will not include services with a dependency on Db2U | |
The instance will include one or more services with a dependency on Db2U | |
- c. Do you plan to install services that support installation options in the instance?
If you plan to install the services at the same time as the IBM Cloud Pak for Data control plane, create an install-options.yml file to specify the installation options for the services that you plan to install.
Services that support installation options
Service | Details |
---|---|
Analytics Engine powered by Apache Spark | The settings are optional. If you do not specify installation options, the default values are used. |
Data Replication | The settings are required. You must specify the license that you purchased. |
IBM Match 360 with Watson | The settings are optional. If you do not specify installation options, the default values are used. |
Informix | The settings are optional. If you do not specify installation options, the default values are used. |
Voice Gateway | The settings are optional. If you do not specify installation options, the default values are used. |
watsonx Assistant | The settings are optional. If you do not specify installation options, the default values are used. |
Watson Discovery | The settings are optional. If you do not specify installation options, the default values are used. |
IBM Knowledge Catalog | The settings are optional. If you do not specify installation options, the default values are used. |
Watson Speech services | The settings are optional. If you do not specify installation options, the default values are used. |
Options | What to do |
---|---|
You don't plan to install services when you install the Cloud Pak for Data control plane | |
The instance will not include services with installation options | |
The instance will include one or more services that support installation options | Review Specifying installation options for services to determine whether you want to adjust any of the installation options. Then go to d. Installing Cloud Pak for Data. |
The instance will include one or more services that require installation options | |
- d. Installing Cloud Pak for Data
After you install IBM Cloud Pak foundational services for the instance, you can install the IBM Cloud Pak for Data control plane and services.
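The order of operations in this step (operators first, then the control plane and services) can be sketched as a dry run that only prints the commands. The `cpd-cli manage` subcommands and flags shown here are written from memory of the Version 4.8 procedure and should be treated as assumptions to verify against Installing Cloud Pak for Data; the default values are placeholders, and `print_install_commands` is a hypothetical helper.

```shell
#!/bin/sh
# Placeholder defaults so the sketch runs on its own. Real values would
# come from your environment-variable script.
: "${VERSION:=4.8.7}"
: "${PROJECT_CPD_INST_OPERATORS:=cpd-operators}"
: "${PROJECT_CPD_INST_OPERANDS:=cpd-instance}"
: "${COMPONENTS:=cpd_platform}"
: "${STG_CLASS_BLOCK:=example-block-storage}"
: "${STG_CLASS_FILE:=example-file-storage}"

# Print (do not run) the install commands in order:
# 1. install the operators, 2. install the control plane and services.
print_install_commands() {
  printf '%s\n' \
    "cpd-cli manage apply-olm --release=${VERSION} --cpd_operator_ns=${PROJECT_CPD_INST_OPERATORS} --components=${COMPONENTS}" \
    "cpd-cli manage apply-cr --release=${VERSION} --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} --components=${COMPONENTS} --block_storage_class=${STG_CLASS_BLOCK} --file_storage_class=${STG_CLASS_FILE} --license_acceptance=true"
}

print_install_commands
```

Printing the commands first makes it easy to confirm that every variable resolved to the intended value before anything changes on the cluster.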
What to do
- e. Do you plan to tether any projects to this instance of Cloud Pak for Data?
You can use tethered projects to isolate service instances or workloads from the rest of your IBM Cloud Pak for Data deployment. After you install Cloud Pak for Data, you can tether projects to the Cloud Pak for Data control plane.
Remember: Not all services support tethered projects. For information about which services can deploy service instances to tethered projects, see Multitenancy support.
Options | What to do |
---|---|
You don't plan to use tethered projects | |
The instance will not include services with installation options | |
9. Completing post-installation tasks
After you install Cloud Pak for Data, make sure your cluster is secure and complete tasks that will impact how users interact with Cloud Pak for Data, such as configuring SSO or changing the route to the platform.
Instance administrator Repeat as needed
Options | What to do |
---|---|
You didn't install the services when you installed the platform. |
|
You installed the services when you installed the platform |
|
10. Installing services
Install the services that support your business needs.
Instance administrator Repeat as needed to install additional services.
What to do |
---|
|