Unable to restore Cloud Pak for Data volume backup
When you try to restore Cloud Pak for Data volume data from the backup, the unquiesce step does not complete.
Symptoms
During volume restore, when you run the following command, the Common core services custom resource gets stuck in the InMaintenance state:
cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INST_OPERANDS}
Environment
This problem occurs when Cloud Pak for Data is deployed on NFS storage.
Diagnosing the problem
If the Common core services custom resource still shows InMaintenance one hour after you run the cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INST_OPERANDS} command during a volume restore, check the custom resource state and whether the wkc-unquiesce job completed:
oc get ccs ccs-cr -n ${PROJECT_CPD_INST_OPERANDS}
oc get jobs -n ${PROJECT_CPD_INST_OPERANDS} | grep wkc-unquiesce
If the job shows 0/1 in the COMPLETIONS column, it did not complete.
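The check above can also be scripted. The following is a minimal sketch, not part of the product: the function name unquiesce_job_completed is hypothetical, and it assumes that the job name contains wkc-unquiesce (as the grep above implies) and that the second column of the oc get jobs output is COMPLETIONS.

```shell
# Hypothetical helper: reads `oc get jobs` output on stdin and succeeds only
# if at least one job whose name contains "wkc-unquiesce" is listed and every
# such job reports 1/1 in the second (COMPLETIONS) column.
unquiesce_job_completed() {
  awk '
    $1 ~ /wkc-unquiesce/ { found = 1; if ($2 != "1/1") bad = 1 }
    END { if (found && !bad) exit 0; exit 1 }
  '
}

# Example usage (run against a live cluster):
#   if oc get jobs -n "${PROJECT_CPD_INST_OPERANDS}" | unquiesce_job_completed; then
#     echo "wkc-unquiesce job completed"
#   else
#     echo "wkc-unquiesce job did not complete"
#   fi
```

The function returns a nonzero status when no matching job is listed, so a missing job is treated the same as an incomplete one.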
Resolving the problem
To resolve the problem, do the following steps:
- Check the data directory in each of the three Elasticsearch server pods:
oc get po -n ${PROJECT_CPD_INST_OPERANDS} | grep es-server
oc exec <es-server-pod> -n ${PROJECT_CPD_INST_OPERANDS} -- bash -c 'ls /workdir/apps/elasticsearch/data'
- If an Elasticsearch server pod shows Stale file handle, restart it by deleting the pod:
oc delete pod <es-server-pod> -n ${PROJECT_CPD_INST_OPERANDS}
- Wait for the wkc-unquiesce job to complete. To confirm that the job completed, run:
oc get jobs -n ${PROJECT_CPD_INST_OPERANDS} | grep wkc-unquiesce
The job shows 1/1 when it is completed.
- Run the unquiesce command again:
cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INST_OPERANDS}
When the unquiesce job is completed, the Common core services custom resource state shows Completed.
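The pod check and restart in the first two steps can be looped over all Elasticsearch server pods. The following is a rough sketch under stated assumptions, not a supported procedure: the function names find_stale_es_pods and restart_stale_es_pods are hypothetical, pod names are assumed to contain es-server (as the grep in the first step implies), and the Stale file handle message is assumed to appear in the output of the ls command.

```shell
# Hypothetical helper: prints the name of each Elasticsearch server pod whose
# data directory listing reports "Stale file handle".
# $1 is the project (namespace), for example ${PROJECT_CPD_INST_OPERANDS}.
find_stale_es_pods() {
  ns="$1"
  oc get po -n "$ns" --no-headers | awk '/es-server/ { print $1 }' |
  while read -r pod; do
    # Capture stdout and stderr; a failing ls must not stop the loop.
    out=$(oc exec "$pod" -n "$ns" -- bash -c 'ls /workdir/apps/elasticsearch/data' 2>&1 </dev/null) || true
    case "$out" in
      *"Stale file handle"*) echo "$pod" ;;
    esac
  done
}

# Hypothetical helper: deletes each stale pod so that it is restarted.
restart_stale_es_pods() {
  ns="$1"
  for pod in $(find_stale_es_pods "$ns"); do
    oc delete pod "$pod" -n "$ns"
  done
}

# Example usage (run against a live cluster):
#   restart_stale_es_pods "${PROJECT_CPD_INST_OPERANDS}"
```

Listing the affected pods with find_stale_es_pods before deleting anything makes it possible to review them first. After the restart, continue with the remaining steps: wait for the wkc-unquiesce job to show 1/1, then run the unquiesce command again.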