Troubleshooting problems with the Cloud Pak for Data OADP backup and restore utility

Use the following commands and log files to troubleshoot problems when you are using the Cloud Pak for Data OpenShift® APIs for Data Protection (OADP) backup and restore utility.

Best practice: You can run the commands in this task exactly as written if you set up environment variables. For instructions, see Setting up installation environment variables.

Ensure that you source the environment variables before you run the commands in this task.
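For example, if you saved the variables in a script file, source it in your current shell session before you run any commands. The following sketch assumes the file is named cpd_vars.sh; use the name that you chose when you set up installation environment variables.

```shell
# Minimal sketch; cpd_vars.sh is an assumed file name from the setup task.
if [ -f ./cpd_vars.sh ]; then
  . ./cpd_vars.sh
fi

# Verify that the variable used throughout this task is now set.
echo "PROJECT_CPD_INSTANCE=${PROJECT_CPD_INSTANCE:-<not set>}"
```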

Commands

Run the following commands to troubleshoot online backup and restore pre-hooks and post-hooks.

Run checkpoint backup pre-hooks only
cpd-cli oadp backup prehooks \
--include-namespaces ${PROJECT_CPD_INSTANCE} \
--hook-kind=checkpoint \
--log-level=debug \
--verbose
Run checkpoint backup post-hooks only
cpd-cli oadp backup posthooks \
--include-namespaces ${PROJECT_CPD_INSTANCE} \
--hook-kind=checkpoint \
--log-level=debug \
--verbose
Run checkpoint restore post-hooks only
cpd-cli oadp restore posthooks \
--include-namespaces ${PROJECT_CPD_INSTANCE} \
--hook-kind=checkpoint \
--log-level=debug \
--verbose
If the Cloud Pak for Data OADP backup and restore utility terminates abnormally, release and clean up resources with the checkpoint reset command
cpd-cli oadp checkpoint reset \
--log-level=debug \
--verbose

Run the following commands to troubleshoot offline backup and restore pre-hooks and post-hooks.

To troubleshoot offline backup failures, split the offline backup command into three separate stages: pre-hooks to quiesce applications, backup, and post-hooks to unquiesce applications.

Run backup pre-hooks only to investigate pre-hook errors
cpd-cli oadp backup prehooks \
--include-namespaces ${PROJECT_CPD_INSTANCE} \
--log-level=debug \
--verbose
When backup pre-hooks are successful, run Velero backup only

Backup using CSI snapshot:

cpd-cli oadp backup create <backup_name> \
--include-namespaces ${PROJECT_CPD_INSTANCE} \
--exclude-resources='Event,Event.events.k8s.io' \
--snapshot-volumes \
--skip-hooks=true \
--log-level=debug \
--verbose

Backup using Restic:

cpd-cli oadp backup create <backup_name> \
--include-namespaces ${PROJECT_CPD_INSTANCE} \
--exclude-resources='Event,Event.events.k8s.io' \
--default-volumes-to-restic \
--snapshot-volumes=false \
--cleanup-completed-resources \
--skip-hooks=true \
--log-level=debug \
--verbose
When the Velero backup is successful, run backup post-hooks only to bring services back up
cpd-cli oadp backup posthooks \
--include-namespaces ${PROJECT_CPD_INSTANCE} \
--log-level=debug \
--verbose
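The three stages above can be chained in a small wrapper that stops at the first failure, so that a failed stage can be investigated before applications are unquiesced. This is a minimal sketch, not part of the product: the run helper, the default project name zen, and the backup name offline-backup-1 are all assumptions, and DRY_RUN=1 (the default here) prints each command instead of executing it.

```shell
#!/bin/bash
# Minimal sketch: run the three offline backup stages in order and stop at the
# first failure. DRY_RUN=1 prints each command instead of executing it.
set -u
PROJECT_CPD_INSTANCE="${PROJECT_CPD_INSTANCE:-zen}"   # assumed project name
BACKUP_NAME="${BACKUP_NAME:-offline-backup-1}"        # assumed backup name
DRY_RUN="${DRY_RUN:-1}"

run() {
  echo "+ $*"
  if [ "${DRY_RUN}" != "1" ]; then
    "$@"
  fi
}

# Stage 1: pre-hooks quiesce applications.
run cpd-cli oadp backup prehooks \
  --include-namespaces "${PROJECT_CPD_INSTANCE}" \
  --log-level=debug --verbose || exit 1

# Stage 2: Velero backup only (CSI snapshot variant), with hooks skipped.
run cpd-cli oadp backup create "${BACKUP_NAME}" \
  --include-namespaces "${PROJECT_CPD_INSTANCE}" \
  --exclude-resources='Event,Event.events.k8s.io' \
  --snapshot-volumes --skip-hooks=true \
  --log-level=debug --verbose || exit 1

# Stage 3: post-hooks unquiesce applications.
run cpd-cli oadp backup posthooks \
  --include-namespaces "${PROJECT_CPD_INSTANCE}" \
  --log-level=debug --verbose
```

Set DRY_RUN=0 only after the printed commands match what you intend to run against the cluster.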
Run restore post-hooks only to investigate errors
cpd-cli oadp restore posthooks \
--include-namespaces ${PROJECT_CPD_INSTANCE} \
--log-level=debug \
--verbose

Log files

cpd-cli oadp and cpd-cli log
The CPD-CLI*.log file is in the cpd-cli-workspace/logs directory. Errors that occur during pre-hooks and post-hooks, and errors from the REST client, are captured in this log.

For additional tracing, add the --log-level=debug and --verbose options when you run commands.

Restic pod logs
Check the logs of the Restic pods in the oadp-operator project (namespace) for Restic errors.
Velero backup and restore log
Run cpd-cli oadp backup logs <backup_name> or cpd-cli oadp restore logs <restore_name> to check for errors.

Errors during Velero backup and restore are captured in these logs.
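Velero log lines carry a logrus-style level=<severity> field, so a simple filter can surface only the error and warning entries from a long log. The following helper is a sketch written for this page, and the backup name mybackup in the usage comment is a placeholder.

```shell
# Minimal sketch: keep only error and warning entries from a Velero log stream.
scan_velero_log() {
  grep -E 'level=(error|warning)'
}

# Usage against a real cluster (backup name "mybackup" is a placeholder):
#   cpd-cli oadp backup logs mybackup | scan_velero_log
```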

Velero pod log
Run oc logs -n <oadp_project_name> deployment/velero to check the Velero pod for Velero server errors.
REST server log
Run oc logs deploy/cpdbr-api -n ${PROJECT_CPD_INSTANCE} to check for REST server errors.

Locked repository

A common problem is a locked repository, indicated by the following error message:

stderr=unable to create lock in backend: repository is already locked

To resolve the problem, complete the following steps:

  1. Set the following environment variables:
    OADP_OPERATOR_NS=<OADP_operator_project>
    NODE_AGENT_POD=$(oc get pods -l name=node-agent -o jsonpath='{.items[0].metadata.name}' -n ${OADP_OPERATOR_NS})
  2. Connect to the OADP container:
    oc rsh -n ${OADP_OPERATOR_NS} ${NODE_AGENT_POD}
  3. Unlock the repository:
    restic unlock -r s3:https://<server>:<port>/<bucket>/<prefix>/restic/${PROJECT_CPD_INSTANCE} --insecure-tls --remove-all
Tip: For more information about this problem, see Unlock a locked repository using the restic backup tool.
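The steps above can also be run non-interactively with oc exec instead of an oc rsh session. The following sketch is an illustration under stated assumptions: restic_repo_url is a helper invented here that only assembles the s3: repository URL used in step 3, and the endpoint, bucket, and prefix values must be replaced with your own before the commented oc exec command is run.

```shell
# Minimal sketch: build the restic repository URL from its parts, then run the
# unlock through `oc exec` instead of an interactive `oc rsh` session.
restic_repo_url() {
  # $1=endpoint (https://<server>:<port>)  $2=bucket  $3=prefix  $4=CPD project
  echo "s3:${1}/${2}/${3}/restic/${4}"
}

# Fill in your own values; the angle-bracket parts are placeholders.
# REPO_URL=$(restic_repo_url "https://<server>:<port>" "<bucket>" "<prefix>" "${PROJECT_CPD_INSTANCE}")
# oc exec -n ${OADP_OPERATOR_NS} ${NODE_AGENT_POD} -- \
#   restic unlock -r "${REPO_URL}" --insecure-tls --remove-all
```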

Related information

Troubleshooting problems with backup and restore