Preparing to back up Cloud Pak for Data with IBM Storage Fusion
Complete various prerequisite tasks before you create an online backup of Cloud Pak for Data with IBM Storage Fusion. Some tasks are service-specific and need to be done only when those services are installed.
Ensure that you source the environment variables before you run the commands in this task.
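To catch a missed or incomplete `source` step early, you can check that the variables used in this topic are set before you run any command. The following is a minimal POSIX shell sketch; the helper name `require_vars` and the script file name `cpd_vars.sh` in the usage comment are hypothetical, so substitute the script that you actually use to define your environment variables.

```shell
# Hypothetical helper: fail if any of the named environment variables is empty or unset.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect expansion in POSIX sh; names must be valid shell identifiers.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "Environment variable $name is not set" >&2
      missing=1
    fi
  done
  return $missing
}

# Usage sketch (cpd_vars.sh is a hypothetical file name):
# . ./cpd_vars.sh
# require_vars PROJECT_CPD_INST_OPERATORS PROJECT_CPD_INST_OPERANDS OC_LOGIN
```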
Set up a client workstation
Install the following command-line interfaces on your client workstation:
- Red Hat® OpenShift® command-line interface (oc)
- Cloud Pak for Data command-line interface (cpd-cli)
  Note: Install the cpd-cli version that corresponds to the Cloud Pak for Data version that you are using.
For more information, see Setting up a client workstation.
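Before you continue, you can confirm that both command-line interfaces are on your PATH. This is a minimal sketch with a hypothetical helper, `require_tools`; it only checks that the commands exist. To check the installed versions, run `oc version --client` and `cpd-cli version` yourself.

```shell
# Hypothetical helper: fail if any of the named commands cannot be found on PATH.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Required tool not found on PATH: $tool" >&2
      missing=1
    fi
  done
  return $missing
}

# Usage sketch:
# require_tools oc cpd-cli
```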
Install the required software on the source cluster
Install the following software on the source cluster:
- IBM Storage Fusion Version 2.7.2 with the latest hotfix or later fixes, or Version 2.8.0 with the latest hotfix or later fixes.
  Note: Backup and restore with IBM Storage Fusion 2.8.1 is not supported on OpenShift Version 4.16 or later.
- cpdbr service for IBM Storage Fusion integration
The version of cpdbr resources must match the Cloud Pak for Data version. For example, if you upgraded Cloud Pak for Data from version 4.8.4 to 5.0.0, you must also upgrade the cpdbr service to version 5.0.0.
Do the following steps to check the version of cpdbr resources.
- Log in to Red Hat OpenShift Container Platform as a cluster administrator.
  ${OC_LOGIN}
  Remember: OC_LOGIN is an alias for the oc login command.
- Check the version of cpdbr-oadp by running the following command:
  oc get po -l component=cpdbr-tenant,icpdsupport/app=br-service -n ${PROJECT_CPD_INST_OPERATORS} -o jsonpath='{.items[0].spec.containers[0].image}'
  Example output:
  icr.io/cpopen/cpd/cpdbr-oadp:5.0.1-x86_64
- Check the version of the IBM Storage Fusion backup and restore recipe for Cloud Pak for Data by running the following command:
  oc get -n ${PROJECT_CPD_INST_OPERATORS} frcpe ibmcpd-tenant -o jsonpath='{.metadata.labels.icpdsupport/version}'
  For Cloud Pak for Data 5.0.1 to 5.0.3, check that the output of the command is: 5.0.1
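If you check versions regularly, you can script the comparison. The following POSIX shell sketch uses two hypothetical helpers. `cpdbr_image_version` assumes that the image reference ends in a tag of the form `<version>-<arch>`, like the example output above, and not in a `@sha256` digest. `expected_recipe_version` encodes only the 5.0.1 to 5.0.3 mapping stated in this topic. The `oc` commands appear only in comments because they need a live cluster.

```shell
# Hypothetical helper: print the version part of a cpdbr-oadp image reference.
# Assumes a tag such as 5.0.1-x86_64 (not a @sha256 digest).
cpdbr_image_version() {
  tag=${1##*:}                 # text after the last colon is the tag
  printf '%s\n' "${tag%%-*}"   # drop the architecture suffix
}

# Hypothetical helper: print the recipe version expected for a Cloud Pak for Data version.
# Only the 5.0.1 to 5.0.3 mapping from this topic is encoded; other versions return 1.
expected_recipe_version() {
  case $1 in
    5.0.1|5.0.2|5.0.3) echo 5.0.1 ;;
    *) return 1 ;;
  esac
}

# Usage sketch against a live cluster:
# image=$(oc get po -l component=cpdbr-tenant,icpdsupport/app=br-service \
#   -n "${PROJECT_CPD_INST_OPERATORS}" -o jsonpath='{.items[0].spec.containers[0].image}')
# cpdbr_image_version "$image"
# recipe=$(oc get -n "${PROJECT_CPD_INST_OPERATORS}" frcpe ibmcpd-tenant \
#   -o jsonpath='{.metadata.labels.icpdsupport/version}')
# [ "$recipe" = "$(expected_recipe_version 5.0.3)" ] && echo "Recipe version matches"
```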
Create a volume snapshot class
To take snapshots of PersistentVolumeClaim (PVC) volumes for backups, a VolumeSnapshotClass is required. If the Container Storage Interface (CSI) driver that you are using does not provide one, you must create it. For details on creating a volume snapshot class, see Creating volume snapshot classes.
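The shape of a volume snapshot class is small. The following sketch uses a hypothetical helper, `snapshot_class_manifest`, that prints a minimal `snapshot.storage.k8s.io/v1` VolumeSnapshotClass for a given CSI driver. The class name, driver name, and the default deletion policy of `Delete` are placeholders. Check your storage vendor's documentation for the driver name, any required `parameters`, and whether `Delete` or `Retain` is appropriate.

```shell
# Hypothetical helper: print a minimal VolumeSnapshotClass manifest.
# Arguments: class name, CSI driver name, optional deletion policy (Delete or Retain).
snapshot_class_manifest() {
  name=$1
  driver=$2
  policy=${3:-Delete}
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ${name}
driver: ${driver}
deletionPolicy: ${policy}
EOF
}

# Usage sketch:
# oc get volumesnapshotclass     # check whether a class already exists for your driver
# oc get csidriver               # look up the CSI driver name
# snapshot_class_manifest my-snapclass <csi-driver-name> | oc apply -f -
```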
Clean up MongoDB resources
Remove residual MongoDB resources before you create a backup. Do the following steps:
- Log in to Red Hat OpenShift Container Platform as a cluster administrator.
  ${OC_LOGIN}
  Remember: OC_LOGIN is an alias for the oc login command.
- Edit the authentication.operator.ibm.com custom resource:
  oc edit authentication.operator.ibm.com -n ${PROJECT_CPD_INST_OPERANDS}
- Set the value of the authentication.operator.ibm.com/retain-migration-artifacts annotation to false.
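If you prefer not to use an interactive editor, the same annotation change can be made with `oc annotate`. This is a minimal sketch with a hypothetical helper, `disable_migration_artifacts`. It applies the annotation to every authentication.operator.ibm.com resource in the project (`--all`) and overwrites the existing value (`--overwrite`). The `OC` variable is an assumption that lets you set `OC=echo` to print the command instead of running it.

```shell
# Hypothetical helper: set retain-migration-artifacts to false without oc edit.
# Set OC=echo to print the command instead of running it.
disable_migration_artifacts() {
  namespace=$1
  ${OC:-oc} annotate authentication.operator.ibm.com --all -n "$namespace" \
    authentication.operator.ibm.com/retain-migration-artifacts=false --overwrite
}

# Usage sketch:
# disable_migration_artifacts "${PROJECT_CPD_INST_OPERANDS}"
# Verify (dots in the annotation key are escaped for JSONPath):
# oc get authentication.operator.ibm.com -n "${PROJECT_CPD_INST_OPERANDS}" \
#   -o jsonpath='{.items[*].metadata.annotations.authentication\.operator\.ibm\.com/retain-migration-artifacts}'
```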
Expand PVCs that are smaller than 5Gi when using IBM Storage Scale Container Native storage
If your Cloud Pak for Data deployment uses IBM Storage Scale Container Native or IBM Storage Fusion Global Data Platform storage, expand any Persistent Volume Claims (PVCs) that are smaller than 5Gi to at least 5Gi so that the backup can be restored successfully. For details on expanding PVCs, see Volume Expansion in the IBM Storage Scale Container Storage Interface Driver documentation.
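To find the PVCs that need expansion, you can compare each PVC's capacity with 5Gi. The following POSIX shell sketch uses two hypothetical helpers. `quantity_to_bytes` handles only integer quantities with a `Ki`, `Mi`, `Gi`, or `Ti` suffix or no suffix. `needs_expansion` returns success for sizes below 5Gi and returns 2 for quantities it cannot parse. The commented `oc patch` call is the standard Kubernetes way to request a larger size, and it works only if the storage class allows volume expansion. Repeat the check for every project in the instance, not only the operands project.

```shell
# Hypothetical helper: convert a Kubernetes quantity with a binary suffix to bytes.
# Only integer values with Ki, Mi, Gi, Ti, or no suffix are supported.
quantity_to_bytes() {
  q=$1
  case $q in
    *Ki) n=${q%Ki}; m=1024 ;;
    *Mi) n=${q%Mi}; m=$((1024 * 1024)) ;;
    *Gi) n=${q%Gi}; m=$((1024 * 1024 * 1024)) ;;
    *Ti) n=${q%Ti}; m=$((1024 * 1024 * 1024 * 1024)) ;;
    *) n=$q; m=1 ;;
  esac
  case $n in
    ''|*[!0-9]*) return 1 ;;   # reject empty values, decimals, and other suffixes
  esac
  echo $((n * m))
}

# Hypothetical helper: succeed if the quantity is smaller than 5Gi.
needs_expansion() {
  bytes=$(quantity_to_bytes "$1") || return 2
  [ "$bytes" -lt $((5 * 1024 * 1024 * 1024)) ]
}

# Usage sketch against a live cluster:
# oc get pvc -n "${PROJECT_CPD_INST_OPERANDS}" \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.capacity.storage}{"\n"}{end}' |
# while read -r pvc size; do
#   if needs_expansion "$size"; then echo "$pvc ($size) is smaller than 5Gi"; fi
# done
# oc patch pvc <pvc-name> -n "${PROJECT_CPD_INST_OPERANDS}" \
#   -p '{"spec":{"resources":{"requests":{"storage":"5Gi"}}}}'
```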
Prepare IBM Storage Fusion
Prepare IBM Storage Fusion by setting up one of the clusters as the IBM Storage Fusion backup and restore hub.
- In IBM Storage Fusion, open the Services page and click the Backup & Restore tile.
- In the Install service window, select the storage class (RWO) that you want to use to deploy the service.
The ibm-backup-restore project (namespace) is created on the cluster, and the service is installed in that project.
- Verify that the hub is in a healthy state by checking the Service status column for the hub.
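As a supplementary command-line check, you can look for pods in the ibm-backup-restore project that are not fully ready. This does not replace the Service status check in the IBM Storage Fusion console. The following sketch uses a hypothetical helper, `not_ready_pods`, and assumes the default `oc get pods --no-headers` column order (NAME, READY, STATUS, RESTARTS, AGE). It skips pods in the Completed state and prints the names of pods that are not Running or whose ready count does not match their container count.

```shell
# Hypothetical helper: read `oc get pods --no-headers` output on stdin and
# print the names of pods that are not fully ready.
not_ready_pods() {
  awk '{
    split($2, ready, "/")                          # READY column, e.g. 1/1
    if ($3 == "Completed") next                    # finished job pods are fine
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage sketch:
# oc get pods -n ibm-backup-restore --no-headers | not_ready_pods
```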
Check the content of the IBM Storage Fusion application for the Cloud Pak for Data operator
Check that the IBM Storage Fusion application custom resource for the Cloud Pak for Data operator includes the following information:
- All projects (namespaces) that are members of the Cloud Pak for Data instance, including:
  - The Cloud Pak for Data operators project (${PROJECT_CPD_INST_OPERATORS}).
  - The Cloud Pak for Data operands project (${PROJECT_CPD_INST_OPERANDS}).
  - All tethered projects, if they exist.
- The PARENT_NAMESPACE variable, which is set to ${PROJECT_CPD_INST_OPERATORS}.
Do the following steps:
- Log in to Red Hat OpenShift Container Platform as a cluster administrator.
  ${OC_LOGIN}
  Remember: OC_LOGIN is an alias for the oc login command.
- Set the PROJECT_FUSION environment variable:
  export PROJECT_FUSION=<fusion-namespace>
  Tip: By default, the IBM Storage Fusion project is ibm-spectrum-fusion-ns.
- To get the list of all projects that are members of the Cloud Pak for Data instance, run the following command:
  oc get -n ${PROJECT_FUSION} applications.application.isf.ibm.com ${PROJECT_CPD_INST_OPERATORS} -o jsonpath='{.spec.includedNamespaces}'
- To get the PARENT_NAMESPACE variable, run the following command:
  oc get -n ${PROJECT_FUSION} applications.application.isf.ibm.com ${PROJECT_CPD_INST_OPERATORS} -o jsonpath='{.spec.variables}'
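To check the included projects without reading the output by eye, you can search it for each expected project name. The following sketch uses a hypothetical helper, `missing_namespaces`. It assumes that the JSONPath query for `.spec.includedNamespaces` prints a JSON array of double-quoted strings, such as `["cpd-operators","cpd-instance"]`. It prints each expected project that is not listed and returns 1 if any project is missing. Add your tethered projects to the argument list if they exist.

```shell
# Hypothetical helper: print expected projects that are absent from a JSON array string.
# Arguments: the includedNamespaces output, then the expected project names.
missing_namespaces() {
  included=$1
  shift
  missing=0
  for ns in "$@"; do
    case $included in
      *"\"$ns\""*) ;;                 # the quoted project name is present
      *) echo "$ns"; missing=1 ;;
    esac
  done
  return $missing
}

# Usage sketch:
# included=$(oc get -n "${PROJECT_FUSION}" applications.application.isf.ibm.com \
#   "${PROJECT_CPD_INST_OPERATORS}" -o jsonpath='{.spec.includedNamespaces}')
# missing_namespaces "$included" "${PROJECT_CPD_INST_OPERATORS}" "${PROJECT_CPD_INST_OPERANDS}"
```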
Check that the primary instance of every PostgreSQL cluster is in sync with its replicas
The replicas for Cloud Native PostgreSQL and EDB Postgres clusters occasionally get out of sync with the primary node. For information about diagnosing and fixing this problem, see PostgreSQL cluster replicas get out of sync.
Prepare IBM Knowledge Catalog
If large metadata enrichment jobs are running when an online backup operation is triggered, the Db2 pre-backup hooks might fail because the database cannot be put into a write-suspended state. Keep the metadata enrichment workload to a minimum while an online backup is scheduled to run.
Prepare watsonx Assistant
Applies to: 5.0.0 to 5.0.2. If you upgraded Cloud Pak for Data from a previous release, you must remove some labels from the PostgreSQL Persistent Volume Claims (PVCs) before you take a backup. Do the following steps:
- Log in to Red Hat OpenShift Container Platform as a cluster administrator.
  ${OC_LOGIN}
  Remember: OC_LOGIN is an alias for the oc login command.
- Set the watsonx Assistant instance name and Cloud Pak for Data instance project (namespace) environment variables:
  export INSTANCE=<watsonx Assistant instance name>
  export NAMESPACE=<Cloud Pak for Data namespace>
- Remove the labels:
  for pvc in $(oc get pvc -n "$NAMESPACE" -l app="$INSTANCE"-postgres -o jsonpath='{.items[*].metadata.name}'); do
    # Remove the velero.io/exclude-from-backup label if it is set
    if [ -n "$(oc get pvc "$pvc" -n "$NAMESPACE" -o jsonpath='{.metadata.labels.velero\.io/exclude-from-backup}')" ]; then
      oc patch pvc "$pvc" -n "$NAMESPACE" -p '{"metadata": {"labels": {"velero.io/exclude-from-backup": null}}}'
      echo "Label 'velero.io/exclude-from-backup' removed for PVC: $pvc"
    else
      echo "Label 'velero.io/exclude-from-backup' not found for PVC: $pvc"
    fi
    # Remove the icpdsupport/empty-on-nd-backup label if it is set
    if [ -n "$(oc get pvc "$pvc" -n "$NAMESPACE" -o jsonpath='{.metadata.labels.icpdsupport/empty-on-nd-backup}')" ]; then
      oc patch pvc "$pvc" -n "$NAMESPACE" -p '{"metadata": {"labels": {"icpdsupport/empty-on-nd-backup": null}}}'
      echo "Label 'icpdsupport/empty-on-nd-backup' removed for PVC: $pvc"
    else
      echo "Label 'icpdsupport/empty-on-nd-backup' not found for PVC: $pvc"
    fi
  done
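To confirm that the labels are gone, you can query for PVCs that still carry either label. In a Kubernetes label selector, a bare key such as `velero.io/exclude-from-backup` matches any resource that has that label, whatever its value. The following sketch uses a hypothetical helper, `remaining_labeled_pvcs`. If the helper prints no PVC names, both labels were removed; when no PVCs match, `oc get` reports "No resources found" on stderr. The `OC` variable is an assumption that lets you set `OC=echo` to print the commands instead of running them.

```shell
# Hypothetical helper: print PVCs of a watsonx Assistant instance that still have either label.
# Arguments: instance name, project (namespace). Set OC=echo to print the commands instead.
remaining_labeled_pvcs() {
  instance=$1
  namespace=$2
  for label in velero.io/exclude-from-backup icpdsupport/empty-on-nd-backup; do
    # A bare key in a label selector matches resources that have that label.
    ${OC:-oc} get pvc -n "$namespace" -l "app=${instance}-postgres,${label}" -o name
  done
}

# Usage sketch:
# remaining_labeled_pvcs "$INSTANCE" "$NAMESPACE"
```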
Check the status of installed services
Ensure that the status of all installed services is Completed. Do the following steps.
- Log in to the Red Hat OpenShift Container Platform cluster with the cpd-cli:
  ${CPDM_OC_LOGIN}
  Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
- Run the following command to get the status of all services:
  cpd-cli manage get-cr-status \
    --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}
Separately back up services that do not support online backups
For services that do not support online backups, back up those services separately by using their service-specific backup process before you back up the Cloud Pak for Data instance. For more information about services that do not support online backups, see Services that support backup and restore.