Post-restore tasks after restoring an offline backup with the OADP utility
For the Cloud Pak for Data control plane and some services, additional tasks must be done after you restore Cloud Pak for Data from an offline backup with the OADP utility.
Cloud Pak for Data control plane
If you use node lists to pin pods to nodes, you must re-run the cpd-cli manage
apply-entitlement command after you restore Cloud Pak for Data on the target cluster. Any pods that need to be
rescheduled will be unavailable while they are moved to different nodes. For more information, see
Passing node information to Cloud Pak for Data.
- Start the evictor job:
cpd-cli manage enable-rsi --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}
- Patch all pods that did not get the proxy patch:
cpd-cli manage apply-rsi-patches --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} -vvv
Restoring Data Replication
After Cloud Pak for Data is restored, do the following steps:
- If you restored Cloud Pak for Data to a different cluster, stop the replication on the source cluster to avoid having two streams of data flowing from the same data source to the same destination when the service is restarted on the restored cluster.
- Connect to the restored Cloud Pak for Data instance.
- Go to the restored replications and stop them.
- Restart the replications.
Restoring Db2
Restoring Db2 Warehouse
Restoring Execution Engine for Apache Hadoop
After restoring a Cloud Pak for Data deployment that
includes Execution Engine for Apache Hadoop, the
hadoop-cr custom resource might be stuck in the InProgress
state. If that state does not change for more than 20 minutes, the
hadoop-addon-translation-job job might be stuck. Do the following steps:
- Confirm that the hadoop-addon-translation-job job is stuck by running the following command:
oc describe hadoop hadoop-cr | grep Message
- Check for the following message:
"Job" "hadoop-addon-translation-job": Timed out waiting on resource
- If you see this message, manually delete the job:
oc delete job/hadoop-addon-translation-job
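The check-then-delete steps above could be combined into a small script. The following is a minimal sketch, not a supported tool: the function names are hypothetical, and it assumes the timeout message appears in the oc describe output exactly as quoted above.

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical, not part of oc or cpd-cli.

# Reads `oc describe` output on stdin and succeeds only if it contains
# the timeout message for hadoop-addon-translation-job.
translation_job_timed_out() {
    grep -q '"hadoop-addon-translation-job": Timed out waiting on resource'
}

# Deletes the job only when the timeout message is present.
delete_stuck_translation_job() {
    if oc describe hadoop hadoop-cr | grep Message | translation_job_timed_out; then
        oc delete job/hadoop-addon-translation-job
    else
        echo "Timeout message not found; leaving the job in place."
    fi
}
```

After logging in to the cluster and switching to the project that contains hadoop-cr, you would call delete_stuck_translation_job. Because the delete is gated on the message, a job that is still making progress is left alone.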
Verifying the Watson Machine Learning restore operation
After restoring from a backup, users might be unable to deploy new models and score existing models. To resolve this issue, after the restore operation, wait until operator reconciliation completes.
- Log in to Red Hat® OpenShift® Container Platform as a user with sufficient permissions to complete the task.
${OC_LOGIN}
Remember: OC_LOGIN is an alias for the oc login command.
- Check the status of the operator with the following commands:
export PROJECT_WML=<wml-namespace>
kubectl describe WmlBase wml-cr -n ${PROJECT_WML} | grep "Wml Status" | awk '{print $3}'
- After backup and restore operations, before using Watson Machine Learning, make sure that wml-cr is in completed state and all the wml pods are in running state. Use this command to check that all wml pods are in running state:
oc get pods -n <wml-namespace> -l release=wml
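Waiting for reconciliation can be scripted as a polling loop around the status command above. This is a rough sketch under stated assumptions: the helper names are hypothetical, PROJECT_WML is already exported, and the reported status is the word "Completed" (compared case-insensitively here because the exact casing is not confirmed by this document).

```shell
#!/bin/sh
# Sketch only: wml_status and wait_for_wml_completed are hypothetical helpers.
# Assumes PROJECT_WML is exported as shown in the steps above.

# Prints the third field of the "Wml Status" line, as in the documented command.
wml_status() {
    kubectl describe WmlBase wml-cr -n "${PROJECT_WML}" | grep "Wml Status" | awk '{print $3}'
}

# Polls until the status is "completed" (case-insensitive assumption).
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 60)
wait_for_wml_completed() {
    max_tries=${1:-30}
    delay=${2:-60}
    i=1
    while [ "$i" -le "$max_tries" ]; do
        status=$(wml_status | tr '[:upper:]' '[:lower:]')
        if [ "$status" = "completed" ]; then
            echo "wml-cr is completed"
            return 0
        fi
        echo "wml-cr status: ${status:-unknown} (attempt $i of $max_tries)"
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, wait_for_wml_completed 20 30 would check every 30 seconds for up to 20 attempts. The pod check (oc get pods -n <wml-namespace> -l release=wml) is still worth running afterward, because the sketch looks only at the custom resource status.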
Restoring owner references to Watson Machine Learning Accelerator resources
To restore owner references to all Watson Machine Learning Accelerator resources, complete the additional steps in Backing up and restoring Watson Machine Learning Accelerator.
Restoring Watson OpenScale to a different cluster
5.0.0 When you are restoring Watson OpenScale to a different cluster, the cpd-external-route key in *-aios-service-secrets must be updated to point to the target cluster external route. This update is needed to make sure that the Watson OpenScale Dashboard URL that is included in the notification email and the published metrics CSV is valid and correct.
Do the following steps:
- Log in to Red Hat OpenShift Container Platform as a user with sufficient permissions to complete the task.
${OC_LOGIN}
Remember: OC_LOGIN is an alias for the oc login command.
- Change to the project where Analytics Engine powered by Apache Spark was installed:
oc project <Project>
- Describe the mrm pod and look for the secretKeyRef.name set for the ICP_ROUTE environment variable:
oc get pods | grep aios-mrm
oc describe pod <pod-name>
For example:
oc describe pod aiopenscale-ibm-aios-mrm-87c9d8bfc-8fr6t
    - name: ICP_ROUTE
      valueFrom:
        secretKeyRef:
          key: cpd-external-route
          name: aiopenscale-ibm-aios-service-secrets
- Get the value of the cpd-external-route key from the secret:
oc get secret secret-name -o jsonpath={.data.cpd-external-route} | base64 -d
The output of this command is the source cluster hostname. Change it to the target cluster hostname in base64-encoded format.
- Update the cpd-external-route key in the secret:
oc edit secret secret-name
This command opens a vi editor. Replace the cpd-external-route value with the base64-encoded value of the target cluster, and save the secret by exiting the vi editor. You can obtain the encoded value of the target cluster URL by using the base64 command or by using the Base64 Encode and Decode website.
- Restart the mrm pod:
oc delete pod mrm-pod-name
Note: After the pod is deleted, a new pod is started automatically. Make sure that the new pod is up and running.
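The encode-and-update steps above can also be done without opening an editor. The sketch below is an illustration only: the helper names are hypothetical, and using oc patch with a merge patch is an alternative to the documented oc edit step, not something this procedure prescribes.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; oc patch is used here as a
# non-interactive alternative to the documented `oc edit secret` step.

# Base64-encodes a value with no trailing newline and no line wrapping.
encode_route() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

# Merge-patches only the cpd-external-route key of the secret.
# $1 = secret name, $2 = target cluster route value in plain text
patch_external_route() {
    encoded=$(encode_route "$2")
    oc patch secret "$1" --type merge \
        -p "{\"data\":{\"cpd-external-route\":\"${encoded}\"}}"
}
```

Using printf instead of echo avoids encoding a trailing newline into the value, which is an easy mistake when the base64 command is run by hand. After patching, the mrm pod still needs to be restarted as described in the last step.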
The Spark connection (also called integrated_system in Watson OpenScale) that is created for Analytics Engine powered by Apache Spark must be updated with the new apikey of the target cluster. For more information, see Step 2 in Configuring the batch processor in Watson OpenScale.
Restoring watsonx Assistant
After you restore the watsonx Assistant backup, you must retrain the existing skills. To trigger training, modify a skill. Training a skill typically takes less than 10 minutes. For more information, see the Retraining your backend model section in the IBM Cloud documentation.
In addition, create the secrets that watsonx Assistant uses to connect to Multicloud Object Gateway. For details, see Creating secrets for services that use Multicloud Object Gateway. Then delete the following resources that connect to Multicloud Object Gateway, if they are present. After these resources are deleted, they are recreated with the updated object store secrets.
- Set the instance name environment variable to the name that you want to use for the service instance.
export INSTANCE=<Watson_Assistant_Instance_Name>
- If they are present, delete the following resources:
oc delete job $INSTANCE-create-bucket-store-cos-job
oc delete secret registry-$INSTANCE-clu-training-$INSTANCE-dwf-training
oc delete job $INSTANCE-clu-training-update
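Because the resources are deleted only if they are present, the commands above can be wrapped so that a missing resource does not cause an error. This is a minimal sketch: the function names are hypothetical, and it relies on the --ignore-not-found option of oc delete to skip resources that do not exist.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Prints the resources named in the steps above for a given instance name.
wa_mcg_resources() {
    instance=$1
    echo "job/${instance}-create-bucket-store-cos-job"
    echo "secret/registry-${instance}-clu-training-${instance}-dwf-training"
    echo "job/${instance}-clu-training-update"
}

# Deletes each resource; --ignore-not-found makes a missing resource a no-op.
delete_wa_mcg_resources() {
    for resource in $(wa_mcg_resources "$1"); do
        oc delete "$resource" --ignore-not-found
    done
}
```

With INSTANCE exported as in the first step, delete_wa_mcg_resources "$INSTANCE" would attempt all three deletions in the current project. Running wa_mcg_resources "$INSTANCE" on its own first is a quick way to review the names before anything is deleted.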
Restoring services that do not support offline backup and restore
The following list shows the services that don't support offline backup and restore. If any of these services are installed in your Cloud Pak for Data deployment, do the appropriate steps to make them functional after a restore.
- Data Gate
- Data Gate synchronizes Db2 for z/OS data in real time. After Cloud Pak for Data is restored, data might be out of sync from Db2 for z/OS. It is recommended that you re-add tables after Cloud Pak for Data foundational services are restored.
- MongoDB
- The service must be deleted and reinstalled. Recreate the instance as a new instance, and then restore the data with MongoDB tools. For more information, see Installing the MongoDB service and Back Up and Restore with MongoDB Tools.
- Watson Discovery
- The service must be uninstalled and reinstalled, and then the data must be restored.
- For more information about how to uninstall Watson Discovery, see Uninstalling Watson Discovery.
- For more information about how to reinstall Watson Discovery, see Installing Watson Discovery.
- For more information about how to restore the data, see Backing up and restoring data in Cloud Pak for Data.
- Watson Speech services
- The service is functional and you can re-import data. For more information, see Importing and exporting data.