Offline backup prerequisite tasks

Important: IBM Cloud Pak® for Data Version 4.7 will reach end of support (EOS) on 31 July 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.7 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

Complete the following prerequisite tasks before you create an offline backup of IBM Cloud Pak for Data. Some tasks are service-specific and need to be done only when the corresponding services are installed.

Best practice: You can run the commands in these tasks exactly as written if you set up environment variables. For instructions, see Setting up installation environment variables.

Ensure that you source the environment variables before you run the commands in these tasks.
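For example, if you saved the variables in a script named cpd_vars.sh (the file name is an assumption based on the convention in Setting up installation environment variables; adjust the path if yours differs), you can source it and spot-check one variable:

```shell
# Source the saved environment variables, then confirm one of them is set.
# The file name cpd_vars.sh is an assumed convention.
if [ -f ./cpd_vars.sh ]; then
  source ./cpd_vars.sh
fi
echo "Operands project: ${PROJECT_CPD_INST_OPERANDS}"
```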

Set up a client workstation

Set up a client workstation from which you will install the required software, prepare the cluster, and create backups. Make sure that the following clients are installed.

  • Red Hat® OpenShift® command-line interface (oc)
  • Cloud Pak for Data command-line interface (cpd-cli)

For more information, see Setting up a client workstation.
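A quick way to confirm that both clients are available is to check whether they are on your PATH (a minimal sketch):

```shell
# Check that the required command-line clients are installed.
for cli in oc cpd-cli; do
  if command -v "$cli" >/dev/null 2>&1; then
    echo "$cli: found"
  else
    echo "$cli: NOT FOUND - install it before you continue"
  fi
done
```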

Install the required software on the source cluster

Install and configure the Cloud Pak for Data OADP backup and restore utility.

Note: When you install the backup and restore components, ensure that the csi default plugin is specified under spec.configuration.velero.defaultPlugins when you create the DataProtectionApplication (Velero) instance.
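For reference, the relevant part of the DataProtectionApplication custom resource might look like the following sketch. The instance name and namespace shown here are assumptions (openshift-adp is the typical OADP operator namespace); the csi entry under spec.configuration.velero.defaultPlugins is the setting that the note requires:

```yaml
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample         # assumed instance name
  namespace: openshift-adp # assumed OADP operator namespace
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - csi              # required: the csi default plugin
```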

Estimate how much storage to allocate for backups

Tech preview: Estimate the amount of storage that you need to allocate for backups by doing the following steps.

Note: Before you can run the command, the cpdbr-agent must first be installed. For more information, see Installing the cpdbr-agent.
  1. Export the following environment variable.
    export CPDBR_ENABLE_FEATURES=volume-util
  2. Run the following command.
    cpd-cli oadp du-pv

For more information about this command, see oadp du-pv.

Check the status of installed services

Ensure that the status of all installed services is Completed. Do the following steps.

  1. Run the cpd-cli manage login-to-ocp command to log in to the cluster as a user with sufficient permissions to complete this task. For example:
    cpd-cli manage login-to-ocp \
    --username=${OCP_USERNAME} \
    --password=${OCP_PASSWORD} \
    --server=${OCP_URL}
    Tip: The login-to-ocp command takes the same input as the oc login command. Run oc login --help for details.
  2. Run the following command to get the status of all services.
    cpd-cli manage get-cr-status \
    --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}
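To spot services that are not yet in the Completed state, you can filter the command output (a sketch; the exact table layout of the get-cr-status output is an assumption, so adjust the pattern if your output differs):

```shell
# Print any lines that do not contain "Completed" (header lines are
# printed too). Skipped gracefully when cpd-cli is not on the PATH.
if command -v cpd-cli >/dev/null 2>&1; then
  cpd-cli manage get-cr-status --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
    | awk '!/Completed/'
fi
```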

Separately back up services that do not support offline backups

For services that do not support offline backups, back up those services separately by using their service-specific backup process before you back up a Cloud Pak for Data instance. For more information, see Services that support backup and restore.

Prepare Watson Machine Learning

Before you back up the Watson Machine Learning service, disable scheduled jobs, and either cancel existing jobs or wait for any starting or running jobs to finish. Run the following command, replacing <timeout_in_seconds> with the number of seconds to wait for jobs to complete before they are terminated:
oc -n ${PROJECT_CPD_INST_OPERANDS} get pods -l app=wml-deployment-manager -o name | xargs -I{} oc -n ${PROJECT_CPD_INST_OPERANDS} exec {} -- bash -c "/opt/ibm/wml-online-scoring/runtime-manager/bin/startQuiesce.sh <timeout_in_seconds>"

Prepare Watson Machine Learning Accelerator

Before you back up the Watson Machine Learning Accelerator service, do the following steps:
  1. Stop all running jobs:
    oc delete $(oc get pj -l release=wmla -o name)
  2. Stop any deployed models. For more information, see Stop an inference service.
  3. Stop any notebook servers. For more information, see Stopping a notebook server.
  4. If you upgraded from IBM Cloud Pak for Data Version 4.6 and you are using IBM® Storage Fusion, Portworx, or Red Hat OpenShift Data Foundation storage, delete conda content from the persistent volume (PV):
    1. Get the conda pod name:
      oc get po | grep wmla-conda
    2. Delete conda PV data from wmla-conda pod:
      oc exec -it <wmla-conda-pod-name> -- bash
      bash-4.4$ rm -rfv /opt/conda/*
    3. Delete the conda_synced file from the wmla-conda pod:
      oc exec -it <wmla-conda-pod-name> -- bash
      bash-4.4$ rm -rf /var/shareDir/dli/work/conda_synced

Prepare SPSS Modeler

Before you back up the SPSS® Modeler service, stop all active runtimes and jobs. Do the following steps:
  1. Before you start the backup, confirm that you are logged in as cluster administrator.
  2. To stop all active SPSS Modeler runtimes and jobs, run the following command:
    oc delete rta -l type=service,job -l component=spss-modeler
  3. To check whether any SPSS Modeler runtime sessions are still running, run the following command:
    oc get pod -l type=spss-modeler

    If no pods are running, the command produces no output.

Prepare Data Refinery

To avoid unnecessary data loss, stop all Data Refinery runtimes and jobs. Do the following steps:
  1. Before you start the backup, confirm that you are logged in as cluster administrator.
  2. To stop all active Data Refinery runtimes and jobs, run the following commands:
    oc delete $(oc get deployment -l type=shaper -o name)
    oc delete $(oc get svc -l type=shaper -o name)
    oc delete $(oc get job -l type=shaper -o name)
    oc delete $(oc get secrets -l type=shaper -o name)
    oc delete $(oc get cronjobs -l type=shaper -o name)
    oc scale --replicas=0 deploy wdp-shaper wdp-dataprep

Prepare Db2® Warehouse

Add a label to the Db2U cluster so that backups can successfully complete. Do the following steps:

  1. Retrieve the names of the Cloud Pak for Data deployment's Db2U clusters:
    oc get db2ucluster -A -o jsonpath='{.items[?(@.spec.environment.dbType=="db2wh")].metadata.name}'
  2. For each Db2U cluster, do the following substeps:
    1. Export the Db2U cluster name:
      export DB2UCLUSTER=<db2ucluster_name>
    2. Label the cluster:
      oc label db2ucluster ${DB2UCLUSTER} db2u/cpdbr=db2u --overwrite
  3. Verify that the Db2U cluster now contains the new label:
    oc get db2ucluster ${DB2UCLUSTER} --show-labels
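The substeps above can be combined into one pass over every matching Db2U cluster. This is a sketch, assuming cluster-admin access; the helper name label_db2uclusters is illustrative, and the same loop works for the Db2 service if you change db2wh to db2oltp:

```shell
# Apply the backup label to every "namespace name" pair read from stdin.
# Set DRY_RUN=true to print the oc commands instead of running them.
DRY_RUN=${DRY_RUN:-false}
label_db2uclusters() {
  while read -r ns name; do
    if [ -n "$name" ]; then
      if [ "$DRY_RUN" = "true" ]; then
        echo "oc label db2ucluster -n $ns $name db2u/cpdbr=db2u --overwrite"
      else
        oc label db2ucluster -n "$ns" "$name" db2u/cpdbr=db2u --overwrite
      fi
    fi
  done
}

# List all db2wh clusters as "namespace name" pairs and label each one.
# Skipped gracefully when oc is not on the PATH.
if command -v oc >/dev/null 2>&1; then
  oc get db2ucluster -A -o jsonpath='{range .items[?(@.spec.environment.dbType=="db2wh")]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' \
    | label_db2uclusters
fi
```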

Prepare Db2

Add a label to the Db2U cluster so that backups can successfully complete. Do the following steps:

  1. Retrieve the names of the Cloud Pak for Data deployment's Db2U clusters:
    oc get db2ucluster -A -o jsonpath='{.items[?(@.spec.environment.dbType=="db2oltp")].metadata.name}'
  2. For each Db2U cluster, do the following substeps:
    1. Export the Db2U cluster name:
      export DB2UCLUSTER=<db2ucluster_name>
    2. Label the cluster:
      oc label db2ucluster ${DB2UCLUSTER} db2u/cpdbr=db2u --overwrite
  3. Verify that the Db2U cluster now contains the new label:
    oc get db2ucluster ${DB2UCLUSTER} --show-labels

Prepare Watson Discovery

Before you back up a cluster where the Watson Discovery service is installed, back up the Watson Discovery data separately by running the Watson Discovery backup script. For more information, see Backing up and restoring data.