Post-restore tasks after restoring a Cloud Pak for Data online backup

Complete extra tasks for some services after you restore an IBM Cloud Pak® for Data deployment from an online backup.


IBM Knowledge Catalog metadata import jobs

After Cloud Pak for Data is restored, long-running metadata import jobs might not resume. The job run status might still be Running even though the actual import job is no longer running. Such a job must be canceled and manually restarted. You can cancel and restart a job in IBM Knowledge Catalog or by using an API call.

To cancel and restart a job in IBM Knowledge Catalog, do the following steps.

  1. Go to a Jobs page, either the general one or the one for the project that contains the metadata import asset.
  2. Look for the job and cancel it.
  3. Restart the job.

To cancel and restart a job by using an API call, run the following command. You must have the Admin role to use this API call.

POST /v2/metadata_imports/recover_task

The request payload must look like the following example:

{
  "recovery_date": "2022-05-05T01:00:00Z",
  "pending_type": "running"
}

For recovery_date, specify the date when IBM Knowledge Catalog was restored from the backup image. Any jobs that were started before the specified date are restarted automatically.
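
As a rough sketch, the request could be sent with curl as shown here. The CPD_HOST and CPD_TOKEN variables and both function names are assumptions made for illustration and do not come from the product documentation. Bearer-token authentication is also assumed, so adjust the headers to match how your deployment authenticates API calls.

```shell
#!/usr/bin/env bash
# Sketch only: CPD_HOST, CPD_TOKEN, and both function names are hypothetical.

# Print the request payload for a given restore timestamp (ISO 8601, UTC).
build_recover_payload() {
  local recovery_date="$1"
  printf '{"recovery_date": "%s", "pending_type": "running"}\n' "${recovery_date}"
}

# Send the payload to the recover_task endpoint.
# Assumes a bearer token that belongs to a user with the Admin role.
recover_metadata_imports() {
  local recovery_date="$1"
  curl -sS -X POST "https://${CPD_HOST}/v2/metadata_imports/recover_task" \
    -H "Authorization: Bearer ${CPD_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(build_recover_payload "${recovery_date}")"
}

# Example (not run here):
#   recover_metadata_imports "2022-05-05T01:00:00Z"
```

Building the payload in a separate function makes it easy to inspect the JSON before anything is sent to the cluster.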



IBM Knowledge Catalog metadata enrichment jobs

After Cloud Pak for Data is restored, running metadata enrichment jobs might not complete successfully. Such jobs must be manually restarted.

To restart a metadata enrichment job, do the following steps.

  1. In IBM Knowledge Catalog, open the project that contains the metadata enrichment asset.
  2. Select the asset.
  3. Click the button to start or delete an enrichment job on the asset, and then click Enrich to start a new enrichment job.


IBM Knowledge Catalog lineage data import jobs

If a lineage data import job is running at the same time that an online backup is taken, the job is in a Complete state when the backup is restored. However, users cannot see lineage data in the catalog. Rerun the lineage import job.



watsonx Assistant

After you restore the watsonx Assistant backup, you must retrain the existing skills. To trigger training, modify each skill. Training a skill typically takes less than 10 minutes. For more information, see the Retraining your backend model section in the IBM Cloud documentation.

In addition, create the secrets that watsonx Assistant uses to connect to Multicloud Object Gateway. For details, see Creating secrets for services that use Multicloud Object Gateway. Then delete the following resources that connect to Multicloud Object Gateway, if they are present. After these resources are deleted, they are recreated with the updated object store secrets.

  1. Set the INSTANCE environment variable to the name of the watsonx Assistant service instance.
    export INSTANCE=<Watson_Assistant_Instance_Name>
  2. If they are present, delete the following resources:
    oc delete job $INSTANCE-create-bucket-store-cos-job
    oc delete secret registry-$INSTANCE-clu-training-$INSTANCE-dwf-training
    oc delete job $INSTANCE-clu-training-update
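
Because these resources might not all be present, one option is to let the delete command skip missing ones. The following is a minimal sketch that assumes the current oc project is the one where watsonx Assistant is installed. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical. Pass the value of INSTANCE from step 1.

delete_wa_mcg_resources() {
  local instance="$1"
  # --ignore-not-found turns each delete into a no-op when the resource is absent.
  oc delete job "${instance}-create-bucket-store-cos-job" --ignore-not-found
  oc delete secret "registry-${instance}-clu-training-${instance}-dwf-training" --ignore-not-found
  oc delete job "${instance}-clu-training-update" --ignore-not-found
}

# Example (not run here):
#   delete_wa_mcg_resources "${INSTANCE}"
```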


Watson Discovery

Complete the restore process by doing the following steps:

  1. Create the secrets that Watson™ Discovery uses to connect to Multicloud Object Gateway. For details, see Creating secrets for services that use Multicloud Object Gateway.
  2. If you are using Cloud Pak for Data 4.8.2, delete the wd-discovery-s3-bucket-job job:
    oc delete job wd-discovery-s3-bucket-job -n ${PROJECT_CPD_INST_OPERANDS}


Watson Machine Learning Accelerator

Complete additional steps to restore owner references to all Watson Machine Learning Accelerator resources. For more information, see Backing up and restoring Watson Machine Learning Accelerator.



Watson OpenScale
Some Watson OpenScale features, such as scheduled or on-demand model evaluations, might not function properly after a restore. Do the following steps to verify that Watson OpenScale runs correctly:
  1. Log in to Red Hat® OpenShift® Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Query the Watson OpenScale operator pod name with the following command:
    OPERATOR_POD_NAME=$(oc get pods -n ${PROJECT_CPD_INST_OPERATORS} | grep wos | awk '{print $1}')
  3. Run the post-restore script in the operator by specifying the required arguments with the following command:
    instanceCRName='aiopenscale'
    
    oc exec ${OPERATOR_POD_NAME} -n ${PROJECT_CPD_INST_OPERATORS} -- /opt/ansible/roles/service/files/post_restore.sh -c ${instanceCRName} -n ${PROJECT_CPD_INST_OPERANDS}

    If you did not use aiopenscale as the name of the Watson OpenScale custom resource, specify the correct value in instanceCRName.

  4. Check the status of the Watson OpenScale custom resource reconciliation with the following command:
    oc get WOService ${instanceCRName} -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.status.wosStatus} {"\n"}'

    The status of the custom resource changes to Completed when the reconciliation finishes successfully.
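
Reconciliation can take a while, so the status check in step 4 can be repeated until it reports Completed. The following sketch polls the same jsonpath field. The function name, the default retry count, and the default sleep interval are hypothetical values chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the function name, retry count, and interval are hypothetical.

wait_for_wos_completed() {
  local cr_name="$1"
  local namespace="$2"
  local attempts="${3:-30}"   # how many times to check
  local interval="${4:-20}"   # seconds to wait between checks
  local status
  local i=1
  while [ "${i}" -le "${attempts}" ]; do
    status="$(oc get WOService "${cr_name}" -n "${namespace}" -o jsonpath='{.status.wosStatus}')"
    if [ "${status}" = "Completed" ]; then
      echo "Reconciliation completed"
      return 0
    fi
    echo "Attempt ${i}/${attempts}: status is '${status}', waiting ${interval}s"
    sleep "${interval}"
    i=$((i + 1))
  done
  echo "Timed out waiting for ${cr_name} to reach Completed" >&2
  return 1
}

# Example (not run here):
#   wait_for_wos_completed "${instanceCRName}" "${PROJECT_CPD_INST_OPERANDS}"
```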

In addition, the cpd-external-route key in *-aios-service-secrets must be updated to point to the target cluster external route. This update is needed to make sure that the Watson OpenScale Dashboard URL that is included in the notification email and the published metrics CSV is valid and correct. Do the following steps:
  1. Log in to Red Hat OpenShift Container Platform as a user with sufficient permissions to complete the task.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Change to the project where Analytics Engine powered by Apache Spark was installed:
    oc project <Project>
  3. Describe the mrm pod and look for the secretKeyRef.name set for the ICP_ROUTE environment variable:
    oc get pods | grep aios-mrm
    oc describe pod <pod-name>
    For example:
    oc describe pod aiopenscale-ibm-aios-mrm-87c9d8bfc-8fr6t
     - name: ICP_ROUTE
          valueFrom:
            secretKeyRef:
              key: cpd-external-route
              name: aiopenscale-ibm-aios-service-secrets
  4. Get the value of the cpd-external-route key from the secret:
    oc get secret <secret-name> -o jsonpath='{.data.cpd-external-route}' | base64 -d

    The output of this command is the hostname of the source cluster. It must be replaced with the base64-encoded hostname of the target cluster.

  5. Update the cpd-external-route key in the secret:
    oc edit secret <secret-name>

    This command opens the secret in the vi editor. Replace the cpd-external-route value with the base64-encoded value for the target cluster, and then save the secret and exit the editor. You can obtain the encoded value of the target cluster URL by using the base64 command or by using the Base64 Encode and Decode website.

  6. Restart the mrm pod:
    oc delete pod <mrm-pod-name>
    Note: After the pod is deleted, a new pod is started automatically. Make sure that the new pod is up and running.
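
As an alternative to editing the secret interactively, the encoded value can be generated and applied in one pass. This is a sketch under stated assumptions: the function names are hypothetical, the secret name and route value are placeholders that you supply, and oc patch with a merge patch is used in place of oc edit.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; arguments are placeholders you supply.

# Base64-encode a value. printf avoids encoding a trailing newline,
# and tr removes any line wrapping that base64 adds to long output.
encode_route() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Set the cpd-external-route key non-interactively with a merge patch.
update_external_route() {
  local secret_name="$1"
  local target_route="$2"
  local encoded
  encoded="$(encode_route "${target_route}")"
  oc patch secret "${secret_name}" --type merge \
    -p "{\"data\":{\"cpd-external-route\":\"${encoded}\"}}"
}

# Example (not run here):
#   update_external_route aiopenscale-ibm-aios-service-secrets "<target-cluster-route>"
```

Encoding with printf rather than echo matters here, because a stray newline in the encoded value would end up in the dashboard URL.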

The Spark connection (also called integrated_system in Watson OpenScale) that is created for Analytics Engine powered by Apache Spark must be updated with the new API key of the target cluster. For more information, see Step 2 in Configuring the batch processor in Watson OpenScale.



Watson Speech services

Some Watson Speech services pods might be in an Error state because they cannot connect to Multicloud Object Gateway. Do the following steps:

  1. Create the secrets that Watson Speech services uses to connect to Multicloud Object Gateway. For details, see Creating secrets for services that use Multicloud Object Gateway.
  2. To enable the upload models and voices job pods to run again with the updated secrets, delete them. The following command lists the pods to delete:
    oc get po -l 'app.kubernetes.io/component in (stt-models, tts-voices)' -n ${PROJECT_CPD_INST_OPERANDS} | grep ${CUSTOM_RESOURCE_SPEECH}
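
The command in step 2 lists the matching pods but does not remove them. One possible way to delete them is sketched here. The function name is hypothetical, and the approach of piping pod names from oc get into oc delete is an assumption, not a documented procedure.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical.

delete_speech_upload_pods() {
  local namespace="$1"
  local cr_name="$2"
  # -o name prints one resource per line in the form pod/<name>.
  oc get po -l 'app.kubernetes.io/component in (stt-models, tts-voices)' \
    -n "${namespace}" -o name |
    grep "${cr_name}" |
    while read -r pod; do
      oc delete "${pod}" -n "${namespace}"
    done
}

# Example (not run here):
#   delete_speech_upload_pods "${PROJECT_CPD_INST_OPERANDS}" "${CUSTOM_RESOURCE_SPEECH}"
```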


Services that do not support online backup and restore

The following list shows the services that don't support online backup and restore. If any of these services are installed in your Cloud Pak for Data deployment, actions must be taken after an online backup is restored to make them functional.

Db2® Data Gate
Db2 Data Gate synchronizes Db2 for z/OS® data in real time. After Cloud Pak for Data is restored, data might be out of sync from Db2 for z/OS. It is recommended that you re-add tables after Cloud Pak for Data foundational services are restored.
MANTA Automated Data Lineage
The service is functional and data can be re-imported. For information about importing data, see Managing existing metadata imports (IBM Knowledge Catalog).
MongoDB
The service must be deleted and reinstalled. Recreate the instance as a new instance, and then restore the data with MongoDB tools. For more information, see Installing the MongoDB service and Back Up and Restore with MongoDB Tools.