Post-restore tasks after restoring a Cloud Pak for Data online backup
Complete extra tasks for some services after you restore an IBM Cloud Pak® for Data deployment from an online backup.
IBM Knowledge Catalog metadata import jobs
After Cloud Pak for Data is restored, long-running metadata import jobs might not resume. The job run status might still be Running, even though the actual import job isn't running. The job must be canceled and manually restarted.
You can cancel and restart a job in IBM Knowledge Catalog or by using an API call.
To cancel and restart a job in IBM Knowledge Catalog, do the following steps.
- Go to a Jobs page, either the general one or the one for the project that contains the metadata import asset.
- Look for the job and cancel it.
- Restart the job.
To cancel and restart a job by using an API call, run the following command. You must have the Admin role to use this API call.
POST /v2/metadata_imports/recover_task
The request payload must look like the following example:
{
  "recovery_date": "2022-05-05T01:00:00Z",
  "pending_type": "running"
}
For recovery_date, specify the date when IBM Knowledge Catalog was restored from the backup image. Any jobs that were started before the specified date are restarted automatically.
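For example, the API can be called with curl. The following is a minimal sketch, which assumes that the Cloud Pak for Data route is stored in CPD_ROUTE and a bearer token for a user with the Admin role is stored in TOKEN (both placeholders):
# Placeholder values; replace CPD_ROUTE and TOKEN with your own route and bearer token
curl -k -X POST \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"recovery_date": "2022-05-05T01:00:00Z", "pending_type": "running"}' \
  "https://${CPD_ROUTE}/v2/metadata_imports/recover_task"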
IBM Knowledge Catalog metadata enrichment jobs
After Cloud Pak for Data is restored, running metadata enrichment jobs might not complete successfully. Such jobs must be manually restarted.
To restart a metadata enrichment job, do the following steps.
- In IBM Knowledge Catalog, open the project that contains the metadata enrichment asset.
- Select the asset.
- Click the asset's menu button and then click Enrich to start a new enrichment job.
IBM Knowledge Catalog lineage data import jobs
If a lineage data import job is running at the same time that an online backup is taken, the job is in a Complete state when the backup is restored. However, users cannot see lineage data in the catalog. Rerun the lineage import job.
watsonx Assistant
After you restore the watsonx Assistant backup, you must retrain the existing skills. Modifying a skill triggers training. The training process for a skill typically takes less than 10 minutes to complete. For more information, see the Retraining your backend model section in the IBM Cloud documentation.
In addition, create the secrets that watsonx Assistant uses to connect to Multicloud Object Gateway. For details, see Creating secrets for services that use Multicloud Object Gateway. Then delete the following resources that connect to Multicloud Object Gateway, if they are present. After these resources are deleted, they are recreated with the updated object store secrets.
- Set the instance name environment variable to the name that you want to use for the service instance.
export INSTANCE=<Watson_Assistant_Instance_Name>
- If they are present, delete the following resources:
oc delete job $INSTANCE-create-bucket-store-cos-job
oc delete secret registry-$INSTANCE-clu-training-$INSTANCE-dwf-training
oc delete job $INSTANCE-clu-training-update
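After the resources are recreated with the updated object store secrets, you can optionally confirm that they exist again. A minimal check, which assumes the same INSTANCE variable as above:
oc get job $INSTANCE-create-bucket-store-cos-job $INSTANCE-clu-training-update
oc get secret registry-$INSTANCE-clu-training-$INSTANCE-dwf-training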
Watson Discovery
Complete the restore process by doing the following steps:
- Create the secrets that Watson™ Discovery uses to connect to Multicloud Object Gateway. For details, see Creating secrets for services that use Multicloud Object Gateway.
- If you are using Cloud Pak for Data 4.8.2, delete the wd-discovery-s3-bucket-job job:
oc delete job wd-discovery-s3-bucket-job -n ${PROJECT_CPD_INST_OPERANDS}
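Optionally, you can confirm that the job was removed (and, if the operator recreates it, watch for the new job) with the following check:
oc get job wd-discovery-s3-bucket-job -n ${PROJECT_CPD_INST_OPERANDS}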
Watson Machine Learning Accelerator
Complete additional steps to restore owner references to all Watson Machine Learning Accelerator resources. For more information, see Backing up and restoring Watson Machine Learning Accelerator.
Watson OpenScale
- Log in to Red Hat® OpenShift® Container Platform as a cluster administrator.
${OC_LOGIN}
Remember: OC_LOGIN is an alias for the oc login command.
- Query the Watson OpenScale operator pod name with the following command:
OPERATOR_POD_NAME=$(oc get pods -n ${PROJECT_CPD_INST_OPERATORS} | grep wos | awk {'print $1'})
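You can optionally confirm that the variable contains a single pod name before you continue:
echo ${OPERATOR_POD_NAME}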
- Run the post-restore script in the operator by specifying the required arguments with the following commands:
instanceCRName='aiopenscale'
oc exec ${OPERATOR_POD_NAME} -n ${PROJECT_CPD_INST_OPERATORS} -- /opt/ansible/roles/service/files/post_restore.sh -c ${instanceCRName} -n ${PROJECT_CPD_INST_OPERANDS}
If you did not use aiopenscale as the name of the Watson OpenScale custom resource, specify the correct value in instanceCRName.
- Check the status of the Watson OpenScale custom resource reconciliation with the following command:
oc get WOService ${instanceCRName} -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.status.wosStatus} {"\n"}'
The status of the custom resource changes to Completed when the reconciliation finishes successfully.
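If you prefer to wait for the reconciliation instead of rerunning the command manually, a small polling loop such as the following sketch can be used. It assumes the same variables as the previous command:
# Poll the custom resource until the reconciliation reports Completed
while [ "$(oc get WOService ${instanceCRName} -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.status.wosStatus}')" != "Completed" ]; do
  echo "Waiting for Watson OpenScale reconciliation to complete..."
  sleep 60
done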
The cpd-external-route key in the *-aios-service-secrets secret must be updated to point to the external route of the target cluster. This update is needed to make sure that the Watson OpenScale Dashboard URL that is included in the notification email and the published metrics CSV is valid and correct. Do the following steps:
- Log in to Red Hat OpenShift Container Platform as a user with sufficient permissions to complete the task.
${OC_LOGIN}
Remember: OC_LOGIN is an alias for the oc login command.
- Change to the project where Analytics Engine powered by Apache Spark was installed:
oc project <Project>
- Describe the mrm pod and look for the secretKeyRef.name set for the ICP_ROUTE environment variable:
oc get pods | grep aios-mrm
oc describe pod <pod-name>
For example:
oc describe pod aiopenscale-ibm-aios-mrm-87c9d8bfc-8fr6t
- name: ICP_ROUTE
  valueFrom:
    secretKeyRef:
      key: cpd-external-route
      name: aiopenscale-ibm-aios-service-secrets
- Get the value of the cpd-external-route key from the secret:
oc get secret <secret-name> -o jsonpath={.data.cpd-external-route} | base64 -d
The output of this command is the source cluster hostname. Change it to the target cluster hostname in base64-encoded format.
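For example, to produce the encoded value for a hypothetical target cluster hostname (replace the hostname with your own), you can use the base64 command:
echo -n "cpd-cpd.apps.target-cluster.example.com" | base64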
- Update the cpd-external-route key in the secret:
oc edit secret <secret-name>
This command opens a vi editor. Replace the cpd-external-route value with the base64-encoded value of the target cluster, and save the secret by exiting the vi editor. You can obtain the encoded value of the target cluster URL by using the base64 command or by using the Base64 Encode and Decode website.
- Restart the mrm pod:
oc delete pod <mrm-pod-name>
Note: The oc delete pod command brings up a new pod. Make sure that the pod is back up and running.
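To confirm that the replacement pod is running, you can list the mrm pods again:
oc get pods | grep aios-mrm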
The Spark connection (also called integrated_system in Watson OpenScale) that is created for Analytics Engine powered by Apache Spark must be updated with the new apikey of the target cluster. For more information, see Step 2 in Configuring the batch processor in Watson OpenScale.
Watson Speech services
Some Watson Speech services pods might be in an Error state because they cannot connect to Multicloud Object Gateway. Do the following steps:
- Create the secrets that Watson Speech services uses to connect to Multicloud Object Gateway. For details, see Creating secrets for services that use Multicloud Object Gateway.
- To enable the upload models and voices job pods to run again with the updated secrets, delete them:
oc get po -l 'app.kubernetes.io/component in (stt-models, tts-voices)' -n ${PROJECT_CPD_INST_OPERANDS} | grep ${CUSTOM_RESOURCE_SPEECH}
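The preceding command only lists the matching pods. One way to delete them after you review the list is to pipe the pod names to oc delete. The following is a sketch that assumes the same label selector and environment variables:
oc get po -l 'app.kubernetes.io/component in (stt-models, tts-voices)' -n ${PROJECT_CPD_INST_OPERANDS} --no-headers \
  | grep ${CUSTOM_RESOURCE_SPEECH} | awk '{print $1}' \
  | xargs oc delete po -n ${PROJECT_CPD_INST_OPERANDS}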
Services that do not support online backup and restore
The following list shows the services that don't support online backup and restore. If any of these services are installed in your Cloud Pak for Data deployment, actions must be taken after an online backup is restored to make them functional.
- Db2® Data Gate
- Db2 Data Gate synchronizes Db2 for z/OS® data in real time. After Cloud Pak for Data is restored, data might be out of sync with Db2 for z/OS. Re-add the tables after the Cloud Pak for Data foundational services are restored.
- MANTA Automated Data Lineage
- The service is functional and data can be re-imported. For information about importing data, see Managing existing metadata imports (IBM Knowledge Catalog).
- MongoDB
- The service must be deleted and reinstalled. Recreate the instance as a new instance, and then restore the data with MongoDB tools. For more information, see Installing the MongoDB service and Back Up and Restore with MongoDB Tools.