Unable to restore Cloud Pak for Data volume backup

When you try to restore Cloud Pak for Data volume data from the backup, the unquiesce step does not complete.

Symptoms

During a volume restore, when you run the cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INST_OPERANDS} command, the Common core services custom resource remains stuck in the InMaintenance state.

Environment

This problem occurs when Cloud Pak for Data is deployed on NFS storage.

Diagnosing the problem

After you run the cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INST_OPERANDS} command as part of the volume restore, wait up to one hour. If the Common core services custom resource still shows the InMaintenance state, check whether the wkc-unquiesce job completed:

oc get ccs ccs-cr -n ${PROJECT_CPD_INST_OPERANDS}
oc get jobs -n ${PROJECT_CPD_INST_OPERANDS} | grep wkc-unquiesce

If the job shows 0/1 completions, it did not complete.
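
The check above can also be scripted. The following is a minimal sketch, not an official procedure: the function name wkc_unquiesce_done is hypothetical, and it assumes that the job rows contain wkc-unquiesce in the NAME column and report completions as 1/1 when finished, as described in this topic.

```shell
# Hypothetical helper: reads `oc get jobs --no-headers` rows on stdin.
# Exits 0 only if at least one wkc-unquiesce job is listed and every
# such job reports 1/1 completions; exits 1 otherwise.
wkc_unquiesce_done() {
  awk '
    $1 ~ /wkc-unquiesce/ {
      seen = 1
      ok = 0
      # Scan all columns, because the column layout can differ between CLI versions.
      for (i = 2; i <= NF; i++) if ($i == "1/1") ok = 1
      if (!ok) pending = 1
    }
    END { exit (seen && !pending) ? 0 : 1 }
  '
}

# Usage:
#   oc get jobs -n "${PROJECT_CPD_INST_OPERANDS}" --no-headers | wkc_unquiesce_done \
#     && echo "wkc-unquiesce completed" || echo "wkc-unquiesce not completed"
```

The function only parses text, so you can try it against saved command output before you run it on a live cluster.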

Resolving the problem

To resolve the problem, complete the following steps:

  1. Check the data directory in the three Elasticsearch server pods:
    oc get po -n ${PROJECT_CPD_INST_OPERANDS} | grep es-server
    oc exec <es-server-pod> -n ${PROJECT_CPD_INST_OPERANDS} -- bash -c 'ls /workdir/apps/elasticsearch/data'
  2. If the command returns a Stale file handle error for an Elasticsearch server pod, restart that pod by deleting it:
    oc delete pod <es-server-pod> -n ${PROJECT_CPD_INST_OPERANDS}
  3. Wait for the wkc-unquiesce job to complete. To confirm that the job completed, run:
    oc get jobs -n ${PROJECT_CPD_INST_OPERANDS} | grep wkc-unquiesce

    The job shows 1/1 when it is completed.

  4. Run the unquiesce command again:
    cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INST_OPERANDS}

    When the unquiesce job is completed, the Common core services custom resource state shows Completed.
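
Checking the Elasticsearch server pods in steps 1 and 2 can be sketched as a loop. This is an illustrative sketch under stated assumptions, not a supported tool: the function name find_stale_es_pods is hypothetical, and it assumes that the relevant pod names contain es-server and that the error text includes Stale file handle, as described in the steps above.

```shell
# Hypothetical helper: prints the name of each es-server pod whose data
# directory listing fails with "Stale file handle".
# Argument 1: the project (namespace) where the operands are installed.
find_stale_es_pods() {
  ns="$1"
  oc get pods -n "$ns" --no-headers | awk '/es-server/ { print $1 }' |
  while read -r pod; do
    # Merge stderr into stdout so that the error text can be matched.
    # stdin is redirected so that oc exec cannot consume the pod list.
    if oc exec "$pod" -n "$ns" -- bash -c 'ls /workdir/apps/elasticsearch/data' 2>&1 < /dev/null |
       grep -q 'Stale file handle'; then
      echo "$pod"
    fi
  done
}

# Usage (review the list before you delete any pod):
#   find_stale_es_pods "${PROJECT_CPD_INST_OPERANDS}"
```

The function only reports the affected pods. Restart each reported pod with the oc delete pod command from step 2, and then continue with step 3.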