Backing up Cloud Pak for Data volumes to a persistent volume claim or object store

You can do an offline backup of all persistent volumes (PVs) in your IBM Cloud Pak® for Data deployment to a separate PersistentVolumeClaim (PVC) or an S3 or S3-compatible object store with the Cloud Pak for Data volume backup and restore utility.

Before you begin

Cloud Pak for Data provides the cpd-cli backup-restore command-line interface for backing up and restoring PVs. If your cluster is in a restricted network, ensure that you completed the steps in Moving images for the Cloud Pak for Data volume backup and restore utility to the private container registry before you run any cpd-cli backup-restore commands.

The cpd-cli backup-restore command-line interface requires a cluster administrator or similar role that can create, read, write, and delete Kubernetes resources, such as deployments, stateful sets, cron jobs, jobs, replica sets, config maps, secrets, pods, namespaces, PVCs, and PVs.

If Cloud Pak for Data is installed on NFS, NFS storage must be configured with no_root_squash.

Volume backup of Cloud Pak for Data installed on Amazon Elastic Block Store storage classes is not supported.

Note: You can create volume backups only of the Cloud Pak for Data instance project (namespace). You cannot create volume backups of Cloud Pak for Data foundational services or operators projects (for example, Cloud Pak for Data common core services).

About this task

The cpd-cli backup-restore command-line interface backs up and restores volume data in the same project and installation, and assumes that Kubernetes objects are still in place.

When you store backup data in a separate PVC, it is recommended that the PVC be backed by a remote volume to ensure its availability.

Important: Backing up persistent volumes alone is not sufficient for disaster recovery purposes because Kubernetes objects like secrets are needed along with volume data to restore applications in a project.

During the backup process, write operations in application workloads are suspended (quiesced) so that you can do backups or other maintenance activities. The quiesce command calls hooks that are provided by Cloud Pak for Data services to do the quiesce. These hooks offer optimizations or other enhancements compared to scaling down all resources in the project. For example, services might be quiesced and unquiesced in a specific order, or suspended without bringing down their pods, to reduce the time that it takes to bring applications down and back up.

You can back up volumes in two ways:
  1. Manually scale down resources, back up volumes, and then manually scale up resources.
  2. Automatically scale down resources, back up volumes, and automatically scale up resources with a single command.
Tip: It is a good idea to manually scale down application Kubernetes resources before you do a backup so that you can identify any services whose pods cannot scale down correctly. You can then fix any problems that you find and do the backup.
Best practice: You can run the commands in this task exactly as written if you set up environment variables. For instructions, see Setting up installation environment variables.

Ensure that you source the environment variables before you run the commands in this task.
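
For example, a minimal setup might look like the following sketch. The values are placeholders, and the sketch assumes that both variables refer to the same Cloud Pak for Data instance project; PRIVATE_REGISTRY_LOCATION is needed only for clusters that use a private image registry:

  # Placeholder values; substitute your own project name and registry location
  export NAMESPACE=<instance-project-name>
  export PROJECT_CPD_INSTANCE=<instance-project-name>
  export PRIVATE_REGISTRY_LOCATION=<registry-host:port/path>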

For more information about the Cloud Pak for Data volume backup and restore utility, including a list of commands that you can run, see the cpd-cli backup-restore reference documentation.

Procedure

  1. Initialize cpd-cli backup-restore.
    Note: If your Docker image registry is different from what is shown in the following examples, change the appropriate options.

    The following command is an example that initializes cpd-cli backup-restore when you are using an S3 object store to store the backups, and the cluster has access to icr.io/cpopen/cpd.

    # Initialize cpdbr with a PVC name and S3 storage. Note that the S3 bucket must already exist.
    # Example for cluster with access to ICR
    cpd-cli backup-restore init \
    --namespace $NAMESPACE \
    --pvc-name cpdbr-pvc \
    --image-prefix=icr.io/cpopen/cpd \
    --provider=s3 \
    --s3-endpoint="<s3 endpoint>" \
    --s3-bucket=cpdbr \
    --s3-prefix=$NAMESPACE/
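
    The S3 bucket (cpdbr in these examples) must already exist. As an illustration only, if the AWS CLI happens to be configured for your object store, you can confirm that the bucket is reachable with a command like the following; the AWS CLI is not part of cpd-cli, and any S3 client works:

    # Illustrative check with a generic S3 client; substitute your own endpoint
    aws s3 ls s3://cpdbr --endpoint-url "<s3 endpoint>"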

    The following command is an example that initializes cpd-cli backup-restore when you are using an S3 object store to store the backups, in an environment that uses a private image registry, such as when your cluster is air-gapped.

    # Example for air-gapped environment
    cpd-cli backup-restore init \
    --namespace $NAMESPACE \
    --pvc-name cpdbr-pvc \
    --image-prefix=${PRIVATE_REGISTRY_LOCATION} \
    --provider=s3 \
    --s3-endpoint="<s3 endpoint>" \
    --s3-bucket=cpdbr \
    --s3-prefix=$NAMESPACE/
    

    The following command is an example that initializes cpd-cli backup-restore when you are using a separate PVC to store the backups, and the cluster has access to icr.io/cpopen/cpd.

    # Example for cluster with access to ICR
    cpd-cli backup-restore init \
    --namespace $NAMESPACE \
    --log-level=debug \
    --verbose \
    --pvc-name cpdbr-pvc \
    --image-prefix=icr.io/cpopen/cpd \
    --provider=local

    The following command is an example that initializes cpd-cli backup-restore when you are using a separate PVC to store the backups, in an environment that uses a private image registry, such as when your cluster is air-gapped.

    # Example for air-gapped environment
    cpd-cli backup-restore init \
    --namespace $NAMESPACE \
    --log-level=debug \
    --verbose \
    --pvc-name cpdbr-pvc \
    --image-prefix=${PRIVATE_REGISTRY_LOCATION} \
    --provider=local
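
    After the init command completes, you can optionally verify that the utility's workload is up in the project. This check assumes that the utility's pods have names that contain cpdbr, which might vary by version:

    # List pods in the instance project and filter for the backup-restore workload
    oc get pods -n $NAMESPACE | grep cpdbr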
  2. To back up volumes by manually scaling down resources, backing up volumes, and manually scaling up resources, do the following steps.
    1. Manually scale down application Kubernetes resources:
      cpd-cli backup-restore quiesce -n ${PROJECT_CPD_INSTANCE}

      If you want to scale down all resources, include the --force option.

    2. Check for completed jobs and pods by running the volume backup command with the --dry-run option, specifying a backup name identifier.

      The --dry-run option reports jobs or pods that are still attached to the PVCs to be backed up.

      Note: The backup name identifier must consist of lowercase alphanumeric characters or the hyphen (-), and must start and end with an alphanumeric character. The underscore character (_) is not supported.
      cpd-cli backup-restore volume-backup create <backup_name> -n ${PROJECT_CPD_INSTANCE} --dry-run
    3. If the dry run reports completed or failed jobs or pods that reference PVCs, delete them, as shown in the example that follows.
      Tip: Consider saving the job or pod YAML before you manually delete these resources, or include the --cleanup-completed-resources option in the backup step.
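      For example, to save a completed job's definition before removing it (a generic oc sequence; <job_name> is a placeholder for whatever the dry run reported):

      # Save the job definition, then delete the job
      oc get job <job_name> -n ${PROJECT_CPD_INSTANCE} -o yaml > <job_name>.yaml
      oc delete job <job_name> -n ${PROJECT_CPD_INSTANCE}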
    4. Run the backup command with the --skip-quiesce option:
      cpd-cli backup-restore volume-backup create <backup_name> -n ${PROJECT_CPD_INSTANCE} --skip-quiesce=true
      Notes:

      If you run another backup job with the same backup name, an incremental backup occurs. If you specify a new backup name, a full volume backup occurs.

      With certain storage providers, Kubernetes resources must be scaled down to unmount the PVCs before you create the backup. In such scenarios, the volume-backup create command with the --skip-quiesce option can fail if pods are running with mounted PVCs. If this problem occurs, use the quiesce command with the --force option to scale down the resources, and rerun the volume-backup create command with the --skip-quiesce option. You can then scale up the Kubernetes resources after backup by using the unquiesce command.
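
      The recovery sequence for that failure uses only commands that are shown elsewhere in this task; the backup name is whatever identifier you chose earlier:

      # Scale down all resources, retry the backup, then scale back up
      cpd-cli backup-restore quiesce -n ${PROJECT_CPD_INSTANCE} --force
      cpd-cli backup-restore volume-backup create <backup_name> -n ${PROJECT_CPD_INSTANCE} --skip-quiesce=true
      cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INSTANCE}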

    5. Manually scale up application Kubernetes resources:
      cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INSTANCE}
  3. To automatically scale down resources, back up volumes, and automatically scale up resources, do the following steps.
    1. Run the following volume backup command, specifying a backup name identifier.
      Note: The backup name identifier must consist of lowercase alphanumeric characters or the hyphen (-), and must start and end with an alphanumeric character. The underscore character (_) is not supported.
      cpd-cli backup-restore volume-backup create <backup_name> -n ${PROJECT_CPD_INSTANCE}
      Note: When you initiate a new backup with the same backup name identifier, only the incremental changes are added to the backup.
    2. If the backup fails because of completed or failed jobs or pods that reference PVCs, delete them, and rerun the backup command.
      Tip: Consider saving the job or pod YAML before you manually delete these resources, or include the --cleanup-completed-resources option in the backup command.
    3. If the backup does not automatically scale up resources because of a previous failure, manually scale up resources:
      cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INSTANCE}
  4. To check the status of a backup job, run the following command:
    cpd-cli backup-restore volume-backup status <backup_name> -n ${PROJECT_CPD_INSTANCE}
  5. To view a list of existing volume backups, run the following command:
    cpd-cli backup-restore volume-backup list -n ${PROJECT_CPD_INSTANCE}
  6. To view the size of the backup, run the following command:
    cpd-cli backup-restore volume-backup list --details -n ${PROJECT_CPD_INSTANCE}
  7. To get the logs of a volume backup, run the following command:
    cpd-cli backup-restore volume-backup logs <backup_name> -n ${PROJECT_CPD_INSTANCE}
  8. Optional: After the volume backup is complete, clean up cpd-cli backup-restore (delete the cpd-cli backup-restore deployment and other metadata) by running the following command:
    cpd-cli backup-restore reset -n ${PROJECT_CPD_INSTANCE} --force
  9. If you stopped all Data Refinery runtimes and jobs before you created the backup, restart the service by running the following command.

    The value of <number_of_replica> depends on the scaleConfig setting when Data Refinery was installed (1 for small, 3 for medium, and 4 for large).

    oc scale --replicas=<number_of_replica> deploy wdp-shaper wdp-dataprep
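
    For example, for a medium-sized installation (a scaleConfig of medium), the command might look like the following; the -n option is added here to target the Cloud Pak for Data instance project explicitly:

    # Restart Data Refinery with 3 replicas (medium scaleConfig)
    oc scale --replicas=3 deploy wdp-shaper wdp-dataprep -n ${PROJECT_CPD_INSTANCE}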