Preparing to back up Cloud Pak for Data with IBM Storage Fusion

Complete the following prerequisite tasks before you create an online backup of Cloud Pak for Data with IBM Storage Fusion. Some tasks are service-specific and need to be done only when those services are installed.

Best practice: You can run the commands in this task exactly as written if you set up environment variables. For instructions, see Setting up installation environment variables.

Ensure that you source the environment variables before you run the commands in this task.

Set up a client workstation

Set up a client workstation from which you will install the required software, prepare the cluster, and create backups. Make sure that the following clients are installed.
  • Red Hat® OpenShift® command-line interface (oc)
  • Cloud Pak for Data command-line interface (cpd-cli)
    Note: Install the cpd-cli version that is specific to the Cloud Pak for Data version that you are using.

For more information, see Setting up a client workstation.
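
Before you continue, it can help to confirm that both clients are on the PATH of the workstation. The following is a minimal sketch. The check_clients function is a hypothetical helper for illustration and is not part of oc or cpd-cli.

```shell
# check_clients is a hypothetical helper, not part of oc or cpd-cli.
# It reports each named command as found or missing and returns
# non-zero if at least one command is not on the PATH.
check_clients() {
  cc_missing=0
  for cc_cmd in "$@"; do
    if command -v "$cc_cmd" >/dev/null 2>&1; then
      echo "found: $cc_cmd"
    else
      echo "missing: $cc_cmd" >&2
      cc_missing=1
    fi
  done
  return "$cc_missing"
}
```

To use the sketch, run check_clients oc cpd-cli. The check confirms only that the commands exist. It does not confirm that the cpd-cli version matches your Cloud Pak for Data version.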

Install the required software on the source cluster

Install the following software on the source cluster:

  1. IBM Storage Fusion Version 2.7.2 with the latest hotfix or later fixes, or Version 2.8.0 with the latest hotfix or later fixes.
    Note: Backup and restore with IBM Storage Fusion 2.8.1 is not supported on OpenShift Version 4.16 or later fixes.
  2. cpdbr service for IBM Storage Fusion integration

    The version of cpdbr resources must match the Cloud Pak for Data version. For example, if you upgraded Cloud Pak for Data from version 4.8.4 to 5.0.0, you must also upgrade the cpdbr service to version 5.0.0.

    Do the following steps to check the version of cpdbr resources.

    1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
      ${OC_LOGIN}
      Remember: OC_LOGIN is an alias for the oc login command.
    2. Check the version of cpdbr-oadp by running the following command:
      oc get po -l component=cpdbr-tenant,icpdsupport/app=br-service -n ${PROJECT_CPD_INST_OPERATORS} -o jsonpath='{.items[0].spec.containers[0].image}'
      Example output:
      icr.io/cpopen/cpd/cpdbr-oadp:5.0.1-x86_64
    3. Check the version of the IBM Storage Fusion backup and restore recipe for Cloud Pak for Data by running the following command:
      oc get -n ${PROJECT_CPD_INST_OPERATORS} frcpe ibmcpd-tenant -o jsonpath='{.metadata.labels.icpdsupport/version}'
      For Cloud Pak for Data 5.0.1 to 5.0.3, check that the output of the command is:
      5.0.1
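
The image check in the previous steps can be scripted so that the version comparison is not done by eye. The following is a sketch under stated assumptions: image_version and check_cpdbr_version are hypothetical helper names, and the parsing assumes the tag format from the example output (version, hyphen, architecture).

```shell
# image_version is a hypothetical helper. It assumes the tag format shown in
# the example output (<version>-<architecture>) and does not handle image
# references that are pinned by digest (@sha256:...).
image_version() {
  iv_tag=${1##*:}        # keep the text after the last colon, e.g. 5.0.1-x86_64
  echo "${iv_tag%%-*}"   # drop the architecture suffix, e.g. 5.0.1
}

# check_cpdbr_version compares the running cpdbr-oadp image with the version
# that is passed as $1. It requires an active oc session and the
# PROJECT_CPD_INST_OPERATORS environment variable.
check_cpdbr_version() {
  cv_image=$(oc get po -l component=cpdbr-tenant,icpdsupport/app=br-service \
    -n "${PROJECT_CPD_INST_OPERATORS}" \
    -o jsonpath='{.items[0].spec.containers[0].image}') || return 1
  cv_found=$(image_version "$cv_image")
  if [ "$cv_found" = "$1" ]; then
    echo "cpdbr-oadp version matches: $cv_found"
  else
    echo "cpdbr-oadp version is $cv_found, expected $1" >&2
    return 1
  fi
}
```

For example, after you log in with ${OC_LOGIN}, run check_cpdbr_version 5.0.1, replacing 5.0.1 with the version that you expect.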

Create a volume snapshot class

To take PersistentVolumeClaim (PVC) volume backup snapshots, a volume snapshot class is needed. If the Container Storage Interface (CSI) driver that you are using does not have one, you must create it. For details on creating a volume snapshot class, see Creating volume snapshot classes.
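
To find out whether a volume snapshot class already exists for your CSI driver, you can list the classes on the cluster together with their drivers. The following is a minimal sketch. The function names are hypothetical, and csi.example.com in the usage note is a placeholder, not a real driver name.

```shell
# classes_for_driver is a hypothetical helper. It reads "class<TAB>driver"
# lines on stdin and prints the classes that use the driver named in $1.
classes_for_driver() {
  awk -F '\t' -v drv="$1" '$2 == drv { print $1 }'
}

# list_snapshot_classes prints one "class<TAB>driver" line for each
# VolumeSnapshotClass on the cluster. It requires an active oc session.
list_snapshot_classes() {
  oc get volumesnapshotclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.driver}{"\n"}{end}'
}
```

Run list_snapshot_classes | classes_for_driver csi.example.com, replacing the placeholder with the name of your CSI driver. If nothing is printed, no class exists for that driver and you must create one.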

Clean up MongoDB resources

Remove residual MongoDB resources before you create a backup. Do the following steps:

  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Edit the authentication.operator.ibm.com custom resource:
    oc edit authentication.operator.ibm.com -n ${PROJECT_CPD_INST_OPERANDS}
  3. Change the annotation authentication.operator.ibm.com/retain-migration-artifacts to false.

Expand PVCs that are smaller than 5Gi when using IBM Storage Scale Container Native storage

If your Cloud Pak for Data deployment is using IBM Storage Scale Container Native or IBM Storage Fusion Global Data Platform storage, expand Persistent Volume Claims (PVCs) that are smaller than 5Gi to at least that amount to ensure that restoring a backup is successful. For details on expanding PVCs, see Volume Expansion in the IBM Storage Scale Container Storage Interface Driver documentation.

Note: You cannot manually expand Watson OpenScale PVCs. To manage PVC sizes for Watson OpenScale, see Managing persistent volume sizes for Watson OpenScale.
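
To identify the PVCs that need expanding, you can compare the requested size of each PVC with the 5Gi threshold. The following is a sketch, not a supported tool. The function names are hypothetical, and the size parser handles only whole-number quantities with a binary suffix (Ki, Mi, Gi, Ti).

```shell
# to_mib is a hypothetical helper. It converts a storage quantity with a
# binary suffix (Ki, Mi, Gi, Ti) and a whole-number value to MiB.
# Other forms (for example 1.5Gi or 5G) are rejected with a non-zero status.
to_mib() {
  tm_num=${1%??}                 # value without the two-character suffix
  case "$tm_num" in
    ''|*[!0-9]*) return 1 ;;     # not a whole number
  esac
  case "$1" in
    *Ki) echo $(( tm_num / 1024 )) ;;
    *Mi) echo "$tm_num" ;;
    *Gi) echo $(( tm_num * 1024 )) ;;
    *Ti) echo $(( tm_num * 1024 * 1024 )) ;;
    *)   return 1 ;;
  esac
}

# small_pvcs reads "name<TAB>size" lines on stdin and prints the lines
# whose size is below 5Gi (5120 MiB). Unparsable sizes are reported on stderr.
small_pvcs() {
  sp_tab=$(printf '\t')
  while IFS="$sp_tab" read -r sp_name sp_size; do
    if sp_mib=$(to_mib "$sp_size"); then
      [ "$sp_mib" -lt 5120 ] && printf '%s\t%s\n' "$sp_name" "$sp_size"
    else
      echo "skipped (unrecognized size): $sp_name $sp_size" >&2
    fi
  done
  return 0
}

# list_pvc_sizes prints the requested size of every PVC in the operands
# project. It requires an active oc session and PROJECT_CPD_INST_OPERANDS.
list_pvc_sizes() {
  oc get pvc -n "${PROJECT_CPD_INST_OPERANDS}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.resources.requests.storage}{"\n"}{end}'
}
```

Run list_pvc_sizes | small_pvcs to print the candidates. The sketch reads the requested size (spec.resources.requests.storage) and covers only the operands project, so repeat the check for any tethered projects.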

Prepare IBM Storage Fusion

Prepare IBM Storage Fusion by setting up one of the clusters as the IBM Storage Fusion backup and restore hub.

  1. In IBM Storage Fusion, open the Services page and click the Backup & Restore tile.
  2. In the Install service window, select the storage class (RWO) that you want to use to deploy the service.

    The ibm-backup-restore project (namespace) is created on the cluster, and the service is installed in that project.

  3. Verify that the hub is in a healthy state by clicking Backup & restore > Topology and checking the Service status column of the hub.
Important: IBM Storage Fusion service backups must be configured to protect against a cluster failure by backing up to a location outside the IBM Storage Fusion cluster. For details, see Configuring service backups.

Check the content of the IBM Storage Fusion application for the Cloud Pak for Data operator

Check that the IBM Storage Fusion application custom resource for the Cloud Pak for Data operator includes the following information:

  • All projects (namespaces) that are members of the Cloud Pak for Data instance, including:
    • The Cloud Pak for Data operators project (${PROJECT_CPD_INST_OPERATORS}).
    • The Cloud Pak for Data operands project (${PROJECT_CPD_INST_OPERANDS}).
    • All tethered projects, if they exist.
  • The PARENT_NAMESPACE variable, which is set to ${PROJECT_CPD_INST_OPERATORS}.

Do the following steps:

  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Set the PROJECT_FUSION environment variable:
    export PROJECT_FUSION=<fusion-namespace>
    Tip: By default, the IBM Storage Fusion project is ibm-spectrum-fusion-ns.
  3. To get the list of all projects that are members of the Cloud Pak for Data instance, run the following command:
    oc get -n ${PROJECT_FUSION} applications.application.isf.ibm.com ${PROJECT_CPD_INST_OPERATORS} -o jsonpath='{.spec.includedNamespaces}'
  4. To get the PARENT_NAMESPACE variable, run the following command:
    oc get -n ${PROJECT_FUSION} applications.application.isf.ibm.com ${PROJECT_CPD_INST_OPERATORS} -o jsonpath='{.spec.variables}'
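
The check of the included projects can be scripted as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the helper assumes that oc prints includedNamespaces as a JSON array of quoted names, for example ["project-a","project-b"]. If your oc version prints the list in a different form, compare the output manually instead.

```shell
# includes_project is a hypothetical helper. It succeeds if the text in $1
# (the includedNamespaces output) contains the quoted project name in $2.
includes_project() {
  printf '%s' "$1" | grep -qF "\"$2\""
}

# check_fusion_app reports whether each project that is passed as an argument
# is listed in includedNamespaces. It requires an active oc session and the
# PROJECT_FUSION and PROJECT_CPD_INST_OPERATORS environment variables.
check_fusion_app() {
  cf_included=$(oc get -n "${PROJECT_FUSION}" applications.application.isf.ibm.com \
    "${PROJECT_CPD_INST_OPERATORS}" -o jsonpath='{.spec.includedNamespaces}') || return 1
  cf_rc=0
  for cf_ns in "$@"; do
    if includes_project "$cf_included" "$cf_ns"; then
      echo "included: $cf_ns"
    else
      echo "not included: $cf_ns" >&2
      cf_rc=1
    fi
  done
  return "$cf_rc"
}
```

For example, run check_fusion_app "${PROJECT_CPD_INST_OPERATORS}" "${PROJECT_CPD_INST_OPERANDS}" and add the names of any tethered projects as further arguments.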

Check that the primary instance of every PostgreSQL cluster is in sync with its replicas

The replicas for Cloud Native PostgreSQL and EDB Postgres clusters occasionally get out of sync with the primary node. For information about diagnosing and fixing this problem, see PostgreSQL cluster replicas get out of sync.

Prepare IBM Knowledge Catalog

If large metadata enrichment jobs are running when an online backup operation is triggered, the Db2 pre-backup hooks might fail because the database cannot be put into a write-suspended state. Keep the enrichment workload to a minimum during the time that an online backup is scheduled.

Prepare watsonx Assistant

5.0.0 - 5.0.2: If you upgraded Cloud Pak for Data from a previous release, some labels on PostgreSQL Persistent Volume Claims (PVCs) must be removed before a backup is taken. Do the following steps:

  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Set the watsonx Assistant instance name and Cloud Pak for Data instance project (namespace) environment variables:
    export INSTANCE=<watsonx Assistant instance name>
    export NAMESPACE=<Cloud Pak for Data namespace>
  3. Remove the labels:
    for pvc in $(oc get pvc -n $NAMESPACE -l app=$INSTANCE-postgres -o jsonpath='{.items[*].metadata.name}'); do
        if [ "X$(oc get pvc $pvc -o jsonpath='{.metadata.labels.velero\.io/exclude-from-backup}' -n $NAMESPACE)" != "X" ]; then 
            oc patch pvc $pvc -p '{"metadata": {"labels": {"velero.io/exclude-from-backup": null}}}'  -n $NAMESPACE
            echo "Label 'velero.io/exclude-from-backup' removed for PVC: $pvc"
        else
            echo "Label 'velero.io/exclude-from-backup' not found for PVC: $pvc"
        fi
        if [ "X$(oc get pvc $pvc -o jsonpath='{.metadata.labels.icpdsupport/empty-on-nd-backup}' -n $NAMESPACE)" != "X" ]; then 
            oc patch pvc $pvc -p '{"metadata": {"labels": {"icpdsupport/empty-on-nd-backup": null}}}'  -n $NAMESPACE
            echo "Label 'icpdsupport/empty-on-nd-backup' removed for PVC: $pvc"
        else
            echo "Label 'icpdsupport/empty-on-nd-backup' not found for PVC: $pvc"
        fi
    done
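
After the loop finishes, you can confirm that neither label remains on the PostgreSQL PVCs. The following is a minimal sketch. The function names are hypothetical, and the sketch reuses the INSTANCE and NAMESPACE variables that were set in the previous step.

```shell
# pvcs_with_label is a hypothetical helper. It reads "name<TAB>labels" lines
# on stdin and prints the names of the PVCs whose label text contains $1.
pvcs_with_label() {
  awk -F '\t' -v key="$1" 'index($2, key) > 0 { print $1 }'
}

# list_postgres_pvc_labels prints the name and labels of each PostgreSQL PVC.
# It requires an active oc session and the INSTANCE and NAMESPACE variables.
list_postgres_pvc_labels() {
  oc get pvc -n "$NAMESPACE" -l "app=$INSTANCE-postgres" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels}{"\n"}{end}'
}
```

Run list_postgres_pvc_labels | pvcs_with_label velero.io/exclude-from-backup, and then repeat the command with icpdsupport/empty-on-nd-backup. If both commands print nothing, the labels were removed.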

Check the status of installed services

Ensure that the status of all installed services is Completed. Do the following steps:

  1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
    ${CPDM_OC_LOGIN}
    Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
  2. Run the following command to get the status of all services.
    cpd-cli manage get-cr-status \
    --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}

Separately back up services that do not support online backups

For services that do not support online backups, back up those services separately by using their service-specific backup process before you back up a Cloud Pak for Data instance. For more information about services that do not support online backups, see Services that support backup and restore.