Troubleshooting problems with the Cloud Pak for Data OADP backup and restore utility
Use the following commands and log files to troubleshoot problems when you are using the Cloud Pak for Data OpenShift® APIs for Data Protection (OADP) backup and restore utility.
Ensure that you source the environment variables before you run the commands in this task.
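A quick way to catch a missed `source` step is to check that the variables used in this task are set before you run anything. The following is a minimal bash sketch, not part of the utility: the function name `require_vars` is hypothetical, and the choice of variables to check (`PROJECT_CPD_INSTANCE`, plus `OADP_OPERATOR_NS` for the locked-repository steps) is an assumption based only on the names that appear in this task.

```bash
# Source this file in bash. Hypothetical helper: require_vars.
# Returns non-zero and names every variable that is unset or empty.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named in $name.
    if [[ -z "${!name:-}" ]]; then
      echo "Environment variable ${name} is not set. Source your environment variables first." >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `require_vars PROJECT_CPD_INSTANCE || exit 1` at the top of a troubleshooting script stops it before any `cpd-cli` command runs with an empty project name.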
Commands
Run the following commands to troubleshoot online backup and restore pre-hooks and post-hooks.
- Run checkpoint backup pre-hooks only:
  cpd-cli oadp backup prehooks \
    --include-namespaces ${PROJECT_CPD_INSTANCE} \
    --hook-kind=checkpoint \
    --log-level=debug \
    --verbose
- Run checkpoint backup post-hooks only:
  cpd-cli oadp backup posthooks \
    --include-namespaces ${PROJECT_CPD_INSTANCE} \
    --hook-kind=checkpoint \
    --log-level=debug \
    --verbose
- Run checkpoint restore post-hooks only:
  cpd-cli oadp restore posthooks \
    --include-namespaces ${PROJECT_CPD_INSTANCE} \
    --hook-kind=checkpoint \
    --log-level=debug \
    --verbose
- If the Cloud Pak for Data OADP backup and restore utility terminates abnormally, release and clean up resources by running the checkpoint reset command:
  cpd-cli oadp checkpoint reset \
    --log-level=debug \
    --verbose
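When you rerun a checkpoint hook stage several times, it can help to keep each run's debug output. The following bash sketch wraps the three checkpoint hook commands and the reset command above. The function names, the `CPD_CLI` override (there so the commands can be stubbed), and the `LOG_DIR` location are hypothetical. It deliberately does not run `checkpoint reset` on its own, because this task reserves that command for when the utility terminates abnormally.

```bash
# Source this file in bash. Hypothetical names: run_checkpoint_hooks, reset_checkpoint, CPD_CLI, LOG_DIR.
CPD_CLI="${CPD_CLI:-cpd-cli}"
LOG_DIR="${LOG_DIR:-./oadp-troubleshooting}"

# Usage: run_checkpoint_hooks backup-prehooks|backup-posthooks|restore-posthooks
run_checkpoint_hooks() {
  local stage="$1"
  local -a subcmd
  case "${stage}" in
    backup-prehooks)   subcmd=(backup prehooks) ;;
    backup-posthooks)  subcmd=(backup posthooks) ;;
    restore-posthooks) subcmd=(restore posthooks) ;;
    *) echo "Unknown stage: ${stage}" >&2; return 2 ;;
  esac
  mkdir -p "${LOG_DIR}" || return 1
  local log="${LOG_DIR}/${stage}-$(date +%Y%m%d-%H%M%S).log"
  # Same flags as the commands above; tee keeps a copy of stdout and stderr.
  "${CPD_CLI}" oadp "${subcmd[@]}" \
    --include-namespaces "${PROJECT_CPD_INSTANCE}" \
    --hook-kind=checkpoint \
    --log-level=debug \
    --verbose 2>&1 | tee "${log}"
  # PIPESTATUS[0] is the exit status of cpd-cli, not of tee.
  return "${PIPESTATUS[0]}"
}

# Run only after the utility has terminated abnormally.
reset_checkpoint() {
  "${CPD_CLI}" oadp checkpoint reset --log-level=debug --verbose
}
```

For example, `run_checkpoint_hooks backup-prehooks` runs the checkpoint backup pre-hooks, saves the output under `LOG_DIR`, and returns the exit status of `cpd-cli`.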
Run the following commands to troubleshoot offline backup and restore pre-hooks and post-hooks.
To troubleshoot offline backup failures, split the offline backup command into three separate stages: pre-hooks to quiesce applications, backup, and post-hooks to unquiesce applications.
- Run backup pre-hooks only to investigate pre-hook errors:
  cpd-cli oadp backup prehooks \
    --include-namespaces ${PROJECT_CPD_INSTANCE} \
    --log-level=debug \
    --verbose
- When the backup pre-hooks succeed, run only the Velero backup, using the command that matches your volume backup method.
  Backup using CSI snapshots:
  cpd-cli oadp backup create <backup_name> \
    --include-namespaces ${PROJECT_CPD_INSTANCE} \
    --exclude-resources='Event,Event.events.k8s.io' \
    --snapshot-volumes \
    --skip-hooks=true \
    --log-level=debug \
    --verbose
  Backup using Restic:
  cpd-cli oadp backup create <backup_name> \
    --include-namespaces ${PROJECT_CPD_INSTANCE} \
    --exclude-resources='Event,Event.events.k8s.io' \
    --default-volumes-to-restic \
    --snapshot-volumes=false \
    --cleanup-completed-resources \
    --skip-hooks=true \
    --log-level=debug \
    --verbose
- When the Velero backup succeeds, run only the backup post-hooks to bring services back up:
  cpd-cli oadp backup posthooks \
    --include-namespaces ${PROJECT_CPD_INSTANCE} \
    --log-level=debug \
    --verbose
- Run restore post-hooks only to investigate errors:
  cpd-cli oadp restore posthooks \
    --include-namespaces ${PROJECT_CPD_INSTANCE} \
    --log-level=debug \
    --verbose
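The three offline backup stages can be chained so that each stage runs only when the previous one succeeds. This bash sketch uses the commands and flags listed above; the function name, the `csi`/`restic` method argument (which defaults to `csi` here), and the `CPD_CLI` override are hypothetical.

```bash
# Source this file in bash. Hypothetical names: run_offline_backup_stages, CPD_CLI.
CPD_CLI="${CPD_CLI:-cpd-cli}"

# Usage: run_offline_backup_stages <backup_name> [csi|restic]
run_offline_backup_stages() {
  local backup_name="$1" method="${2:-csi}"
  [[ -n "${backup_name}" ]] || { echo "Usage: run_offline_backup_stages <backup_name> [csi|restic]" >&2; return 2; }
  local -a common=(--include-namespaces "${PROJECT_CPD_INSTANCE}" --log-level=debug --verbose)
  local -a volume_opts
  case "${method}" in
    csi)    volume_opts=(--snapshot-volumes) ;;
    restic) volume_opts=(--default-volumes-to-restic --snapshot-volumes=false --cleanup-completed-resources) ;;
    *) echo "Method must be csi or restic" >&2; return 2 ;;
  esac

  echo "Stage 1: backup pre-hooks (quiesce applications)"
  "${CPD_CLI}" oadp backup prehooks "${common[@]}" || {
    echo "Pre-hooks failed; investigate before continuing." >&2; return 1; }

  echo "Stage 2: Velero backup ${backup_name} (${method})"
  "${CPD_CLI}" oadp backup create "${backup_name}" \
    --include-namespaces "${PROJECT_CPD_INSTANCE}" \
    --exclude-resources='Event,Event.events.k8s.io' \
    "${volume_opts[@]}" \
    --skip-hooks=true \
    --log-level=debug --verbose || {
    echo "Backup failed; post-hooks were not run, so applications are still quiesced." >&2; return 1; }

  echo "Stage 3: backup post-hooks (unquiesce applications)"
  "${CPD_CLI}" oadp backup posthooks "${common[@]}"
}
```

If the Velero backup stage fails, the sketch stops without running the post-hooks, so the applications remain quiesced until you run the backup post-hooks command yourself.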
Log files
- cpd-cli oadp and cpd-cli log
  The CPD-CLI*.log file is found in the cpd-cli-workspace/logs directory. Errors during pre-hooks and post-hooks are captured in this log. Errors with the REST client are also captured in this log. For additional tracing, add --log-level=debug --verbose when you run commands.
- Restic pod logs
  Check the logs of the Restic pods in the oadp-operator project (namespace) for Restic errors.
- Velero backup and restore log
  Run cpd-cli oadp backup logs <backup_name> or cpd-cli oadp restore logs <restore_name> to check for errors. Errors during Velero backup and restore are captured in these logs.
- Velero pod log
  Run oc logs -n <oadp_project_name> deployment/velero to check the Velero pod for Velero server errors.
- REST server log
  Run oc logs deploy/cpdbr-api -n ${PROJECT_CPD_INSTANCE} to check for REST server errors.
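To share these logs with support or compare them across runs, you can gather them into one directory. This is a bash sketch, not an official collection tool. The function name and the `OUT_DIR`, `CPD_CLI_WORKSPACE`, `NODE_AGENT_LABEL`, `CPD_CLI`, and `OC` variables are hypothetical, and it assumes `OADP_OPERATOR_NS` holds the OADP operator project, as in the locked-repository steps. The default pod label `name=node-agent` is the one those steps use; the Restic pods in your OADP version might carry a different label.

```bash
# Source this file in bash. Hypothetical names: collect_oadp_logs, CPD_CLI, OC,
# OUT_DIR, CPD_CLI_WORKSPACE, NODE_AGENT_LABEL.
CPD_CLI="${CPD_CLI:-cpd-cli}"
OC="${OC:-oc}"

# Usage: collect_oadp_logs backup|restore <name>   (prints the output directory)
collect_oadp_logs() {
  local kind="$1" name="$2"
  if [[ "${kind}" != backup && "${kind}" != restore ]] || [[ -z "${name}" ]]; then
    echo "Usage: collect_oadp_logs backup|restore <name>" >&2
    return 2
  fi
  local out="${OUT_DIR:-./oadp-logs-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "${out}" || return 1

  # cpd-cli log files: pre-hook, post-hook, and REST client errors.
  cp "${CPD_CLI_WORKSPACE:-./cpd-cli-workspace}"/logs/CPD-CLI*.log "${out}/" 2>/dev/null \
    || echo "No CPD-CLI*.log files copied" >&2

  # Velero backup or restore log.
  "${CPD_CLI}" oadp "${kind}" logs "${name}" > "${out}/${kind}-${name}.log" 2>&1

  # Velero server log and REST server log.
  "${OC}" logs -n "${OADP_OPERATOR_NS}" deployment/velero > "${out}/velero-pod.log" 2>&1
  "${OC}" logs deploy/cpdbr-api -n "${PROJECT_CPD_INSTANCE}" > "${out}/cpdbr-api.log" 2>&1

  # Restic (node agent) pod logs; the label is an assumption, adjust it for your OADP version.
  local pod
  for pod in $("${OC}" get pods -n "${OADP_OPERATOR_NS}" -l "${NODE_AGENT_LABEL:-name=node-agent}" -o name); do
    "${OC}" logs -n "${OADP_OPERATOR_NS}" "${pod}" > "${out}/${pod#pod/}.log" 2>&1
  done

  echo "${out}"
}
```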
Locked repository
A common problem is a locked repository, indicated by the following error message:
stderr=unable to create lock in backend: repository is already locked
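To check whether a saved Velero backup or restore log contains this message, a fixed-string search is enough. The helper name in this bash sketch is hypothetical; the search string is the error text shown above.

```bash
# Source this file in bash. Hypothetical helper: repo_is_locked.
# Returns 0 if the given log file (or standard input when no file is given)
# contains the locked-repository error text.
repo_is_locked() {
  grep -qF 'unable to create lock in backend: repository is already locked' "${1:--}"
}
```

For example: `cpd-cli oadp backup logs <backup_name> | repo_is_locked && echo "Repository is locked"`.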
To resolve the problem, do the following steps:
- Set the following environment variables:
  OADP_OPERATOR_NS=<OADP_operator_project>
  NODE_AGENT_POD=$(oc get pods -l name=node-agent -o jsonpath='{.items[0].metadata.name}' -n ${OADP_OPERATOR_NS})
- Connect to the OADP container:
  oc rsh -n ${OADP_OPERATOR_NS} ${NODE_AGENT_POD}
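- Unlock the repository:
  restic unlock \
    -r s3:https://<server>:<port>/<bucket>/<prefix>/restic/${PROJECT_CPD_INSTANCE} \
    --insecure-tls \
    --remove-all
  The environment variables that you sourced on your workstation are not defined inside the container, so replace ${PROJECT_CPD_INSTANCE} with the name of your Cloud Pak for Data instance project before you run this command.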
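The unlock steps can also be run without an interactive session. This bash sketch substitutes `oc exec` for the interactive `oc rsh` session used in the steps; the function name and the `OC` override are hypothetical. It assumes `OADP_OPERATOR_NS` is set as in the first step and that you pass the full repository URL.

```bash
# Source this file in bash. Hypothetical names: unlock_restic_repo, OC.
# Assumes OADP_OPERATOR_NS is already set.
OC="${OC:-oc}"

# Usage: unlock_restic_repo <restic_repository_url>
unlock_restic_repo() {
  local repo_url="$1"
  [[ -n "${repo_url}" ]] || { echo "Usage: unlock_restic_repo <restic_repository_url>" >&2; return 2; }

  # Same pod lookup as the NODE_AGENT_POD variable in the steps above.
  local node_agent_pod
  node_agent_pod="$("${OC}" get pods -l name=node-agent \
    -o jsonpath='{.items[0].metadata.name}' -n "${OADP_OPERATOR_NS}")" || return 1
  if [[ -z "${node_agent_pod}" ]]; then
    echo "No node agent pod found in ${OADP_OPERATOR_NS}" >&2
    return 1
  fi

  # oc exec runs one command in the pod instead of opening an interactive oc rsh shell.
  # repo_url is expanded locally, so variables in it are resolved before it reaches the pod.
  "${OC}" exec -n "${OADP_OPERATOR_NS}" "${node_agent_pod}" -- \
    restic unlock -r "${repo_url}" --insecure-tls --remove-all
}
```

For example: `unlock_restic_repo "s3:https://<server>:<port>/<bucket>/<prefix>/restic/${PROJECT_CPD_INSTANCE}"`. Because the argument is expanded on your workstation, `${PROJECT_CPD_INSTANCE}` comes from your sourced environment variables.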