How to Install IBM Cloud Pak for Data into a VPC OpenShift Cluster Using OpenShift Data Foundation


Instructions on how to install and configure OpenShift Data Foundation (ODF) on Red Hat OpenShift on IBM Cloud and prepare it for an IBM Cloud Pak for Data installation.

OpenShift Data Foundation is a highly available storage solution that consists of several open-source operators and technologies like Ceph, NooBaa and Rook. These operators allow you to provision and manage file, block and object storage for your containerized workloads in Red Hat® OpenShift® on IBM Cloud® clusters. Unlike other storage solutions where you might need to configure separate drivers and operators for each type of storage, ODF is a unified solution capable of adapting or scaling to your storage needs. You can also deploy ODF on any OCP cluster.

Architecture overview

The documentation has a detailed architecture overview for OpenShift Data Foundation.

How does OpenShift Data Foundation work?

OpenShift Data Foundation (ODF) uses storage volumes in multiples of three and replicates your app data across these volumes. The underlying storage volumes that you use for ODF depend on your cluster type:

  • For IBM Cloud VPC clusters, the storage volumes are dynamically provisioned block storage for VPC devices.
  • For bare metal Classic clusters, the storage volumes are local disks on your bare metal worker nodes.
  • For IBM Cloud Satellite clusters, the storage volumes are either local disks on your worker nodes, or you can dynamically provision disks by using a compatible block storage driver.

ODF uses these devices to create a virtualized storage layer, where your app data is replicated for high availability. Because ODF abstracts your underlying storage, you can use ODF to create file, block or object storage claims from the same underlying raw block storage.

For a full overview of the features and benefits, see OpenShift Data Foundation.

Step-by-step instructions

In this step-by-step guide, we will show you how to install and configure OpenShift Data Foundation (ODF) on Red Hat OpenShift on IBM Cloud and prepare it for an IBM Cloud Pak for Data installation.

1. OpenShift Data Foundation Installation

Prerequisites

This tutorial does not demonstrate how to provision and configure a Red Hat OpenShift on IBM Cloud cluster. Before starting, you'll need to install the required CLIs (ibmcloud and oc) on your computer or use IBM Cloud Shell in your browser.

  1. Install the Red Hat OpenShift on IBM Cloud 4.8 (ROKS) cluster (instructions here).
  2. Create a separate worker pool in this cluster, used only for the ODF installation. ODF needs at least three worker nodes, each with 16 vCPU and 64 GB of memory. The default worker pool will be used by Cloud Pak for Data. Having a dedicated worker pool for ODF makes it easier to resize (up and down) and update workers in the application pool without compromising the ODF installation.

Installation

  1. Install the OpenShift Data Foundation (ODF) add-on. In the Overview Section of the OpenShift cluster in IBM Cloud Portal, click on OpenShift Data Foundation Install:
  2. In the Install OpenShift Data Foundation panel, enter the configuration parameters that you want to use for your ODF deployment and click Install:
    • osdSize: Enter the size of the block storage for VPC devices that you want to provision for the OSD pods. The default size is 250Gi. 
    • osdStorageClassName: Enter the block storage for VPC storage class that you want to use to dynamically provision storage for the OSD pods. The default storage class is ibmc-vpc-block-metro-10iops-tier.
    • osdDevicePaths: Invalid for VPC clusters. Leave this parameter as-is.
    • numOfOsd: Enter the number of block storage device sets that you want to provision for ODF. A numOfOsd value of one provisions one device set, which includes three block storage devices. The devices are provisioned evenly across your worker nodes. For more information, see Understanding ODF.
    • workerNodes: Enter the worker nodes where you want to deploy ODF. You must have at least three worker nodes. The default setting is all. To deploy ODF only on nodes of ODF's worker pool, enter the IP addresses of the worker nodes in a comma-separated list without spaces. For example: 10.242.0.32,10.242.0.33,10.242.0.34
    • ocsUpgrade: For initial deployment, leave this setting as false. The default setting is false.
    • clusterEncryption: The default setting is false.
  3. Wait a few minutes for the add-on deployment to complete. When the deployment is complete, the add-on status is Normal - Add-on Ready.
  4. Verify your installation. Access your Red Hat OpenShift cluster.
  5. Run the following command to verify that the ODF pods are running in the openshift-storage namespace (project). At this point, 3 x 250 GB (data) and 3 x 50 GB (monitoring) block storage volumes have also been provisioned:
    oc get pods -n openshift-storage -o wide
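
To spot problems quickly in a long pod listing, the output of the command above can be filtered down to pods that are not yet healthy. The following is a minimal sketch and not part of the original procedure: the function name pods_not_ready is hypothetical, and it assumes the default oc get pods table layout, in which STATUS is the third column.

```shell
#!/usr/bin/env bash
# Read `oc get pods` table output on stdin and print the names of pods
# whose STATUS column is neither Running nor Completed.
# Assumption: STATUS is the third whitespace-separated column.
pods_not_ready() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Intended use against a live cluster (not executed here):
#   oc get pods -n openshift-storage -o wide | pods_not_ready
```

An empty result suggests that every pod in the namespace is either Running or Completed.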

2. Create OpenShift taint for ODF's worker pool

To keep IBM Cloud Pak for Data pods from being scheduled onto the OpenShift Data Foundation (ODF) worker pool, use the taint and toleration features of OpenShift. Set Kubernetes taints on all the worker nodes in ODF's worker pool. Taints prevent pods without matching tolerations from running on those worker nodes. To learn more about taints and tolerations, check out the documentation.

Setting the taint on ODF's worker pool means that all new worker nodes (in case of an upgrade, for example) also receive the same taint configuration.

  1. In IBM Cloud CLI or Shell, log in to the Red Hat OpenShift on IBM Cloud cluster, following the site.
  2. In IBM Cloud CLI or Shell, run the following commands:
    • Syntax: ibmcloud oc worker-pool taint set --worker-pool WORKER_POOL --cluster CLUSTER --taint KEY=VALUE:EFFECT [--taint KEY=VALUE2:EFFECT] [-f]
    • Example: ibmcloud oc worker-pool taint set --worker-pool odf --cluster roks-odf-test-cguarany --taint node.ocs.openshift.io/storage=true:NoSchedule
  3. After setting the custom taint for ODF's worker pool, confirm that the taint is set on each of the worker nodes by getting the private IP address of the worker node (`ibmcloud oc worker ls -c <cluster_name_or_ID>`) and running `oc describe node <worker_private_IP>`
    • Example: ibmcloud oc worker ls -c roks-odf-test-cguarany and oc describe node 10.242.0.33.
    • The taints section of the describe results must have the following information: node.ocs.openshift.io/storage=true:NoSchedule
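
Checking the describe output by eye gets tedious with several nodes. As a rough sketch (the function name has_odf_taint is hypothetical and not part of the original steps), the check can be reduced to a fixed-string search for the taint used in the example above:

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if the text on stdin contains the ODF storage taint.
# Assumption: the taint is exactly the key=value:effect set earlier.
has_odf_taint() {
  grep -qF 'node.ocs.openshift.io/storage=true:NoSchedule'
}

# Intended use against a live cluster (not executed here):
#   for node in 10.242.0.32 10.242.0.33 10.242.0.34; do
#     oc describe node "$node" | has_odf_taint || echo "taint missing on $node"
#   done
```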

3. Increase ODF's storage sizing (scale)

You can scale the OpenShift Data Foundation (ODF) configuration by increasing the numOfOsd setting. When you increase the number of OSDs, ODF provisions that number of disks, each of osdSize capacity, on each of the worker nodes in your ODF cluster. Because your data is replicated across those disks, the total storage that is available to your applications is equal to osdSize multiplied by numOfOsd.
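
The sizing arithmetic can be checked with a few lines of shell. This is an illustrative sketch only: the function name odf_capacity is hypothetical, and the factor of three comes from the statement earlier in this guide that one device set includes three block storage devices.

```shell
#!/usr/bin/env bash
# odf_capacity OSD_SIZE_GI NUM_OF_OSD
# Print the number of provisioned devices, the raw provisioned capacity,
# and the capacity usable by applications (osdSize x numOfOsd).
odf_capacity() {
  local osd_size_gi="$1" num_of_osd="$2"
  local devices_per_set=3   # one device set = three block storage devices
  local devices=$(( devices_per_set * num_of_osd ))
  local raw_gi=$(( devices * osd_size_gi ))
  local usable_gi=$(( osd_size_gi * num_of_osd ))
  echo "devices=${devices} raw=${raw_gi}Gi usable=${usable_gi}Gi"
}

odf_capacity 250 1   # default add-on settings
odf_capacity 250 2   # after scaling numOfOsd to 2
```

With the default 250Gi osdSize, moving numOfOsd from 1 to 2 doubles the usable capacity from 250Gi to 500Gi while the number of provisioned devices goes from three to six.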

The following is an example of ODF storage distribution:


Edit the ocscluster custom resource to increase numOfOsd. Run oc edit ocscluster and change the value from 1 to 2 to make 500Gi of storage capacity available to Cloud Pak for Data:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: ocs.ibm.io/v1
kind: OcsCluster
metadata:
  creationTimestamp: "2022-03-29T19:38:26Z"
  finalizers:
  - finalizer.ocs.ibm.io
  generation: 3
  name: ocscluster-auto
  resourceVersion: "3621968"
  uid: 263df534-d471-4d24-864c-a1e20f77a2c1
spec:
  autoDiscoverDevices: false
  billingType: advanced
  clusterEncryption: false
  numOfOsd: 2
  ocsUpgrade: false
  osdDevicePaths:
  - ""
  osdSize: 250Gi
  osdStorageClassName: ibmc-vpc-block-metro-10iops-tier
  workerNodes:
  - 10.242.0.32
  - 10.242.0.33
  - 10.242.0.34
status:
  storageClusterStatus: Ready
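
For scripted environments, the same change could probably be made without an interactive editor by using oc patch with a merge patch. The sketch below only builds the patch document; the function name num_of_osd_patch is hypothetical, the resource name ocscluster-auto is taken from the listing above, and the assumption that this custom resource accepts a merge patch has not been verified here.

```shell
#!/usr/bin/env bash
# Build a JSON merge patch that sets spec.numOfOsd to the given integer.
num_of_osd_patch() {
  printf '{"spec":{"numOfOsd":%d}}' "$1"
}

# Intended use against a live cluster (not executed here):
#   oc patch ocscluster ocscluster-auto --type merge -p "$(num_of_osd_patch 2)"
```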

4. Install Cloud Pak for Data using ODF

This section shows, step by step, how to install IBM Cloud Pak for Data using OpenShift Data Foundation (ODF). IBM's official installation documentation is available on IBM Docs.

Prerequisites

Before starting, you'll need to install the oc CLI (in a version compatible with your OpenShift version) and the cloudctl CLI on a bastion computer or in IBM Cloud Shell (IBM Cloud Shell already has the oc CLI installed).

  1. In IBM Cloud Shell, install the latest release for cloudctl using wget: 
    wget https://github.com/IBM/cloud-pak-cli/releases/latest/download/cloudctl-linux-amd64.tar.gz
  2. Run tar -xvf cloudctl-linux-amd64.tar.gz to extract the downloaded archive.
  3. Finish the cloudctl configuration by performing the following steps and then check the version:
    ln -s cloudctl-linux-amd64 cloudctl
    export PATH=$PWD:$PATH
    echo $PATH
    cloudctl version

If you are using a bastion computer to install Cloud Pak for Data, you must also install oc. The steps are the same as those above for cloudctl and are not covered in detail in this guide because we are using IBM Cloud Shell, which already has oc installed.

To install the oc CLI on your bastion computer, you can download the latest oc version for OpenShift 4.8 by using the following command:

wget -O oc.4.8.tar.gz https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/latest-4.8/openshift-client-linux.tar.gz
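
Before moving on to the installation, it can help to confirm that the required CLIs are actually on the PATH. The helper below is a small sketch, not part of the official procedure; the function name require_cli is hypothetical.

```shell
#!/usr/bin/env bash
# require_cli NAME...
# Return 0 if every named command is found on the PATH; otherwise report
# the first missing one on stderr and return 1.
require_cli() {
  local cli
  for cli in "$@"; do
    if ! command -v "$cli" >/dev/null 2>&1; then
      echo "missing required CLI: $cli" >&2
      return 1
    fi
  done
}

# Intended use before starting the installation (not executed here):
#   require_cli oc cloudctl && echo "prerequisites found"
```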

Installation

  1. In IBM Cloud Shell, log in to the Red Hat OpenShift on IBM Cloud cluster by following the instructions on this site.
  2. Create the yaml files used to install the Cloud Pak for Data control plane:
    • Create a yaml file for Catalog Source — IBM Operator Catalog:
      cat > catalogsource.yaml <<EOF
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
       name: ibm-operator-catalog
       namespace: openshift-marketplace
      spec:
       displayName: "IBM Operator Catalog"
       publisher: IBM
       sourceType: grpc
       image: icr.io/cpopen/ibm-operator-catalog:latest
       updateStrategy:
          registryPoll:
            interval: 45m
      EOF
    • Create a yaml file to configure Operator Group to use the namespace ibm-common-services:
      cat > operatorgroup.ibm-common-services.yaml <<EOF
      apiVersion: operators.coreos.com/v1alpha2
      kind: OperatorGroup
      metadata:
       name: operatorgroup
       namespace: ibm-common-services
      spec:
       targetNamespaces:
       - ibm-common-services
      EOF
    • Create a yaml file for the IBM Common Services Operator:
      cat > subscription.ibm-common-services.yaml <<EOF
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
       name: ibm-common-service-operator
       namespace: ibm-common-services
      spec:
       channel: v3
       installPlanApproval: Automatic
       name: ibm-common-service-operator
       source: opencloud-operators
       sourceNamespace: openshift-marketplace
      EOF
    • Create a yaml file for the Cloud Pak for Data platform operator:
      cat > subscription.cpd-operator.yaml <<EOF
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
       name: cpd-operator
       namespace: ibm-common-services
      spec:
       channel: v2.0
       installPlanApproval: Automatic
       name: cpd-platform-operator
       source: cpd-platform
       sourceNamespace: openshift-marketplace
      EOF
    • Create a yaml file for the Cloud Pak for Data service. This yaml file is configured to install Cloud Pak for Data version 4.0.7 using the ODF storage classes ocs-storagecluster-cephfs and ocs-storagecluster-ceph-rbd in the OpenShift namespace (project) cp4d. If you want to install in another namespace, you only need to adjust the namespace parameter in the yaml file:
      cat << EOF > ibmcpd-cr.yaml
      apiVersion: cpd.ibm.com/v1
      kind: Ibmcpd
      metadata:
       name: ibmcpd-cr
       namespace: cp4d
      spec:
       license:
          accept: true
          license: Enterprise
       storageClass: ocs-storagecluster-cephfs
       zenCoreMetadbStorageClass: ocs-storagecluster-ceph-rbd
       cloudpakfordata: true
       iamIntegration: false
       generateAdminPassword: false
       cert_manager_enabled: true
       version: "4.0.7"
      EOF
  3. Apply the yaml files to install all operators used to install the IBM Cloud Pak for Data control plane:
    • Apply the yaml file to install Catalog Source - IBM Operator Catalog:
      oc apply -f catalogsource.yaml
      You can check IBM Operator Catalog status by performing the following commands to see if it is READY and get the AGE of the catalog source:
      oc get catsrc -n openshift-marketplace ibm-operator-catalog -o jsonpath='{.status.connectionState.lastObservedState}{"\n"}'
      oc get catsrc -n openshift-marketplace
    • Apply the yaml file to create the Operator Group:
      oc apply -f operatorgroup.ibm-common-services.yaml
    • Use cloudctl to save and launch CASE to install catalog for IBM Common Services.
    • Create a directory to save packages:
      mkdir -p case
    • Set package parameters and use cloudctl to save CASE — define the version you want to install:
      GITHUBURL=https://github.com/IBM/cloud-pak/raw/master/repo/case
      CASENAME=ibm-cp-common-services
      CASEINVENTORY=ibmCommonServiceOperatorSetup
      CASEVERSION=1.12.3
      CASEARCHIVE=${CASENAME}-${CASEVERSION}.tgz
      cloudctl case save --case $GITHUBURL/${CASEARCHIVE} --outputdir case
    • Use cloudctl to launch CASE to install the IBMCS Operator and check it:
      cloudctl case launch --case case/$CASEARCHIVE --inventory $CASEINVENTORY --namespace openshift-marketplace --action install-catalog --args "--registry icr.io --inputDir case --recursive"
      Check the IBMCS Operator catalog source installation by performing the following command:
      oc get catsrc -n openshift-marketplace
  4. Use cloudctl to save and launch CASE to install catalog for the Cloud Pak for Data Platform Operator:
    • Set package parameters and use cloudctl to save CASE — define the version you want to install:
      GITHUBURL=https://github.com/IBM/cloud-pak/raw/master/repo/case
      CASENAME=ibm-cp-datacore
      CASEINVENTORY=cpdPlatformOperator
      CASEVERSION=2.0.12
      CASEARCHIVE=${CASENAME}-${CASEVERSION}.tgz
      cloudctl case save --case $GITHUBURL/${CASEARCHIVE} --outputdir case
    • Use cloudctl to launch CASE to install the Cloud Pak for Data Platform Operator and check it:
      cloudctl case launch --case case/$CASEARCHIVE --inventory $CASEINVENTORY --namespace openshift-marketplace --action install-catalog --args "--registry icr.io --inputDir case --recursive"
      Check the Cloud Pak for Data Platform Operator source installation by performing the following command and looking for cpd-platform catalog:
      oc get catsrc -n openshift-marketplace
  5. Apply the yaml file to create the subscription (and, through it, the csv) for the IBM Common Services Operator:
    oc apply -f subscription.ibm-common-services.yaml
    You can check the csv for IBM Common Services status by performing the following command and checking if the PHASE is listed as Succeeded:
    oc get csv -n ibm-common-services
  6. Apply the yaml file to create the Cloud Pak for Data Operator:
    oc apply -f subscription.cpd-operator.yaml
    You can check the status of the installation in the OpenShift Console by going to Administrator view > Menu Operators > Installed Operators > Status of Cloud Pak for Data Platform Operator.
    When the installation completes, you will see Succeeded in the OpenShift Console. You can also check it with the following command:
    oc get csv -n ibm-common-services
  7. Install the IBM Cloud Pak for Data control plane service:
    • Apply the yaml file to create the Cloud Pak for Data service.
    • Change the project to the namespace you define to install your Cloud Pak for Data:
      oc project cp4d
    • Apply the yaml file to start installing the Cloud Pak for Data service (this takes approximately 30 minutes or more):
      oc apply -f ibmcpd-cr.yaml
    • You can use the oc get events command to check the status of the installation:
      oc get events -A -w
    • Alternatively, you can use the oc get pods command. The filter below hides pods that are fully ready or completed, so when the installation is finished, no pods are listed:
      watch 'oc get pod -A | grep -Ev "1/1|2/2|3/3|4/4|5/5|6/6|7/7|Complete"'
    • You can also check the zenStatus of zenservices. When the installation finishes, the zenStatus will show Completed:
      oc get zenservice lite-cr -n cp4d -o jsonpath="{.status.zenStatus}"
    • You now have IBM Cloud Pak for Data installed using ODF. To access Cloud Pak for Data, you first need to retrieve the URL by running the following command:
      oc get ZenService lite-cr -n cp4d -o jsonpath="{.status.url}"
    • You also need to retrieve your initial password for user admin (in this case, the initial password for user admin is password):
      oc extract secret/admin-user-details --keys=initial_admin_password --to=-
    • Open https://{your_CP4D_url} in your browser and log in with the admin user and the password retrieved above.
    • After logging in, you can access IBM Cloud Pak for Data.
    • Use the hamburger menu to view all Cloud Pak for Data services in the Catalog that you can install and use on your IBM Cloud Pak for Data platform by accessing Menu Services > Services Catalog.
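
Because the control plane installation in step 7 can take 30 minutes or more, the zenStatus check lends itself to a polling loop rather than repeated manual runs. The following is a generic sketch under stated assumptions: the function name wait_for_value and the timeout and interval values are hypothetical, and the expected value Completed is taken from the step above.

```shell
#!/usr/bin/env bash
# wait_for_value EXPECTED TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until its output equals EXPECTED. Return 0 on a match,
# or 1 once TIMEOUT_SECONDS have elapsed without one.
wait_for_value() {
  local expected="$1" timeout="$2" interval="$3"
  shift 3
  local elapsed=0 current
  while :; do
    current="$("$@" 2>/dev/null)"
    if [ "$current" = "$expected" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out; last value: ${current:-<none>}" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
}

# Intended use against a live cluster (not executed here):
#   wait_for_value Completed 3600 60 \
#     oc get zenservice lite-cr -n cp4d -o jsonpath='{.status.zenStatus}'
```

Passing the command as arguments keeps the loop reusable for other status fields, such as the csv phase checked in the earlier steps.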

5. Update ODF's OpenShift worker nodes 

When necessary, follow these steps to update the ODF worker nodes while keeping ODF storage working properly. To update your VPC worker nodes that use OpenShift Data Foundation, you must cordon, drain and replace each worker node individually. If you deployed OpenShift Data Foundation to a subset of worker nodes in your cluster, after you replace the worker node, you must then edit the ocscluster resource to include the new worker node. The detailed process can be found on the site.

  1. List your worker nodes by using oc get nodes and determine which worker nodes you want to update:
    cguarany@cloudshell:~$ oc get nodes
    NAME          STATUS   ROLES           AGE   VERSION
    10.242.0.32   Ready    master,worker   1d    v1.21.8+ee73ea2
    10.242.0.33   Ready    master,worker   1d    v1.21.8+ee73ea2
    10.242.0.34   Ready    master,worker   1d    v1.21.8+ee73ea2
    10.242.0.40   Ready    master,worker   1d    v1.21.8+ee73ea2
    10.242.0.41   Ready    master,worker   1d    v1.21.8+ee73ea2
  2. Cordon the node (for example, 10.242.0.32). Cordoning the node prevents any pods from being scheduled on this node. Run oc adm cordon 10.242.0.32.
  3. Drain the node to remove all the pods. When you drain the worker node, the pods move to the other worker nodes, ensuring there is no downtime. Draining also ensures that there is no disruption of the pod disruption budget. Run oc adm drain 10.242.0.32 --force --delete-local-data --ignore-daemonsets.
  4. Wait until the draining finishes, then replace/update the worker node. When you replace a worker node in VPC Gen 2, you get a new worker node with the latest patch updates:
  5. List your worker nodes by using oc get nodes and determine which is the new worker node that needs to be included in the ocscluster.
  6. Edit the ocscluster custom resource to include the new node. Run oc edit ocscluster and replace the old node with the new one. After you save the change, ODF automatically deploys the necessary pods onto the new node:
    • From:
        workerNodes:
        - 10.242.0.32
        - 10.242.0.33
        - 10.242.0.34
    • To:
        workerNodes:
        - 10.242.0.33
        - 10.242.0.34
        - 10.242.0.39
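
Since the cordon and drain steps must be repeated for each worker node, one at a time, they can be wrapped in a small helper. The sketch below is an illustration only: the names run, cordon_and_drain and DRY_RUN are hypothetical, and by default it just prints the commands from steps 2 and 3 so they can be reviewed before anything touches the cluster.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# cordon_and_drain NODE
# Cordon the node, then drain it with the flags used in step 3.
cordon_and_drain() {
  local node="$1"
  run oc adm cordon "$node" &&
    run oc adm drain "$node" --force --delete-local-data --ignore-daemonsets
}

cordon_and_drain 10.242.0.32   # dry run: only prints the two commands
```

Setting DRY_RUN=0 in the environment would execute the commands for real; replacing the worker node and editing ocscluster remain manual steps.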

Learn more

To learn more about Red Hat OpenShift on IBM Cloud and IBM Cloud Pak for Data, check out the links below:
