Post-restore tasks after restoring an offline backup with the OADP utility

For the Cloud Pak for Data control plane and some services, additional tasks must be done after you restore Cloud Pak for Data from an offline backup with the OADP utility.

Cloud Pak for Data control plane

If you use node lists to pin pods to nodes, you must re-run the cpd-cli manage apply-entitlement command after you restore Cloud Pak for Data on the target cluster. Any pods that need to be rescheduled will be unavailable while they are moved to different nodes. For more information, see Passing node information to Cloud Pak for Data.

If you applied cluster HTTP proxy settings or other RSI patches to a Cloud Pak for Data instance in the source cluster, run the following commands to apply the settings to the restored instance:
  1. Start the evictor job:
    cpd-cli manage enable-rsi --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}
  2. Patch all pods that did not get the proxy patch:
    cpd-cli manage apply-rsi-patches --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} -vvv

Restoring Data Replication

After Cloud Pak for Data is restored, do the following steps:

  1. If you restored Cloud Pak for Data to a different cluster, stop the replication on the source cluster to avoid having two streams of data flowing from the same data source to the same destination when the service is restarted on the restored cluster.
  2. Connect to the restored Cloud Pak for Data instance.
  3. Go to the restored replications and stop them.
  4. Restart the replications.

Restoring Db2

If Q Replication is enabled on the source cluster, it must be re-enabled after the restore. Follow the instructions in the following topics:

Restoring Db2 Warehouse

If Q Replication is enabled on the source cluster, it must be re-enabled after the restore. Follow the instructions in the following topics:

Restoring Execution Engine for Apache Hadoop

After restoring a Cloud Pak for Data deployment that includes Execution Engine for Apache Hadoop, the hadoop-cr custom resource might be stuck in the InProgress state. If the state does not change for more than 20 minutes, the hadoop-addon-translation-job job is likely the cause. Do the following steps:

  1. Confirm that the hadoop-addon-translation-job job is stuck by running the following command:
    oc describe hadoop hadoop-cr | grep Message
  2. Check for the following message:
    "Job" "hadoop-addon-translation-job": Timed out waiting on resource
  3. If you see this message, manually delete the job:
    oc delete job/hadoop-addon-translation-job
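The check in steps 1 through 3 can be combined into a small helper. This is a sketch; `is_translation_job_stuck` is a hypothetical function name, and the `oc` commands in the comments assume that you are already logged in to the cluster:

```shell
# Return success (0) if the hadoop-cr message indicates that the
# hadoop-addon-translation-job timed out, failure (1) otherwise.
is_translation_job_stuck() {
  case "$1" in
    *"Timed out waiting on resource"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against a live cluster (requires oc login):
#   msg=$(oc describe hadoop hadoop-cr | grep Message)
#   if is_translation_job_stuck "$msg"; then
#     oc delete job/hadoop-addon-translation-job
#   fi
```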

Verifying the Watson Machine Learning restore operation

After restoring from a backup, users might be unable to deploy new models or score existing models. To resolve this issue, wait for operator reconciliation to complete after the restore operation.

  1. Log in to Red Hat® OpenShift® Container Platform as a user with sufficient permissions to complete the task.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Check the status of the operator with the following commands:
    export PROJECT_WML=<wml-namespace>
    kubectl describe WmlBase wml-cr -n ${PROJECT_WML} | grep "Wml Status" | awk '{print $3}'
    
  3. After backup and restore operations, before you use Watson Machine Learning, make sure that wml-cr is in the Completed state and that all Watson Machine Learning pods are in the Running state. Use the following command to check that all pods are running:
    oc get pods -n ${PROJECT_WML} -l release=wml
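The status check in step 2 can be wrapped in a small parsing helper; `wml_status` is a hypothetical name, and the expected value after a successful restore is `Completed`:

```shell
# Extract the value of the "Wml Status" field from
# `kubectl describe WmlBase wml-cr` output that is passed in as a string.
wml_status() {
  printf '%s\n' "$1" | grep "Wml Status" | awk '{print $3}'
}

# Usage (requires cluster access):
#   out=$(kubectl describe WmlBase wml-cr -n ${PROJECT_WML})
#   wml_status "$out"   # prints "Completed" after a successful restore
```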

Restoring owner references to Watson Machine Learning Accelerator resources

To restore owner references to all Watson Machine Learning Accelerator resources, complete the additional steps in Backing up and restoring Watson Machine Learning Accelerator.

Restoring Watson OpenScale to a different cluster

When you are restoring Watson OpenScale to a different cluster, the cpd-external-route key in *-aios-service-secrets must be updated to point to the target cluster external route. This update is needed to make sure that the Watson OpenScale Dashboard URL that is included in the notification email and the published metrics CSV is valid and correct.

Do the following steps:

  1. Log in to Red Hat OpenShift Container Platform as a user with sufficient permissions to complete the task.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Change to the project where Watson OpenScale was installed:
    oc project <Project>
  3. Describe the mrm pod and look for the secretKeyRef.name set for the ICP_ROUTE environment variable:
    oc get pods | grep aios-mrm
    oc describe pod <pod-name>
    For example:
    oc describe pod aiopenscale-ibm-aios-mrm-87c9d8bfc-8fr6t
     - name: ICP_ROUTE
       valueFrom:
         secretKeyRef:
           key: cpd-external-route
           name: aiopenscale-ibm-aios-service-secrets
  4. Get the value of the cpd-external-route key from the secret:
    oc get secret <secret-name> -o jsonpath='{.data.cpd-external-route}' | base64 -d

    The output of this command is the source cluster hostname. Replace it with the target cluster hostname in base64-encoded format.

  5. Update the cpd-external-route key in the secret:
    oc edit secret <secret-name>

    This command opens a vi editor. Replace the cpd-external-route value with the base64 encoded value of the target cluster, and save the secret by exiting the vi editor. You can obtain the encoded value of the target cluster URL by using the base64 command or by using the Base64 Encode and Decode website.

  6. Restart the mrm pod:
    oc delete pod <mrm-pod-name>
    Note: The oc delete pod command brings up a new pod. Make sure that the pod is back up and running.
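To generate the base64-encoded value for step 5 on the command line, you can use the base64 utility. This is a minimal sketch; the hostname is a hypothetical example that you must replace with your target cluster route:

```shell
# Base64-encode the target cluster route for the cpd-external-route key.
# TARGET_ROUTE is a hypothetical example; substitute your target cluster hostname.
TARGET_ROUTE="cpd-cpd-instance.apps.target.example.com"
# printf '%s' avoids encoding a trailing newline into the secret value.
ENCODED=$(printf '%s' "$TARGET_ROUTE" | base64 | tr -d '\n')
echo "$ENCODED"
```

Paste the printed value over the existing cpd-external-route value in the vi editor.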

The Spark connection (also called integrated_system in Watson OpenScale) that is created for Analytics Engine powered by Apache Spark must be updated with the new apikey of the target cluster. For more information, see Step 2 in Configuring the batch processor in Watson OpenScale.

Restoring watsonx Assistant

After restoring the watsonx Assistant backup, you must retrain the existing skills. To trigger training, modify a skill; training a skill typically takes less than 10 minutes to complete. For more information, see the Retraining your backend model section in the IBM Cloud documentation.

In addition, create the secrets that watsonx Assistant uses to connect to Multicloud Object Gateway. For details, see Creating secrets for services that use Multicloud Object Gateway. Then delete the following resources that connect to Multicloud Object Gateway, if they are present. After these resources are deleted, they are recreated with the updated object store secrets.

  1. Set the instance name environment variable to the name of the watsonx Assistant service instance.
    export INSTANCE=<Watson_Assistant_Instance_Name>
  2. If they are present, delete the following resources:
    oc delete job $INSTANCE-create-bucket-store-cos-job
    oc delete secret registry-$INSTANCE-clu-training-$INSTANCE-dwf-training
    oc delete job $INSTANCE-clu-training-update
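The cleanup in step 2 can be scripted so that missing resources do not cause errors; this sketch prints the delete commands (with the --ignore-not-found flag) so that you can review them before piping them to sh. `mcg_cleanup_cmds` is a hypothetical helper name:

```shell
# Print the oc delete commands for the Multicloud Object Gateway
# resources of a watsonx Assistant instance. --ignore-not-found makes
# each delete a no-op when the resource does not exist.
mcg_cleanup_cmds() {
  i="$1"
  echo "oc delete job $i-create-bucket-store-cos-job --ignore-not-found"
  echo "oc delete secret registry-$i-clu-training-$i-dwf-training --ignore-not-found"
  echo "oc delete job $i-clu-training-update --ignore-not-found"
}

# Review, then execute against the cluster:
#   mcg_cleanup_cmds "$INSTANCE"        # dry run: print only
#   mcg_cleanup_cmds "$INSTANCE" | sh   # execute the deletes
```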

Restoring services that do not support offline backup and restore

The following list shows the services that do not support offline backup and restore. If any of these services are installed in your Cloud Pak for Data deployment, do the appropriate steps to make them functional after a restore.

Data Gate
Data Gate synchronizes Db2 for z/OS data in real time. After Cloud Pak for Data is restored, data might be out of sync with Db2 for z/OS. Re-add tables after Cloud Pak for Data foundational services are restored.
MongoDB
The service must be deleted and reinstalled. Recreate the instance as a new instance, and then restore the data with MongoDB tools. For more information, see Installing the MongoDB service and Back Up and Restore with MongoDB Tools.
Watson Discovery
The service must be uninstalled and reinstalled, and then the data must be restored.

Watson Speech services
The service is functional and you can re-import data. For more information, see Importing and exporting data.