You can use the Cloud Pak for Data volume backup and restore utility to create an offline backup of all persistent volumes (PVs)
in your IBM Cloud Pak® for Data deployment. Backups are written to a separate PersistentVolumeClaim (PVC) or to an S3 or S3-compatible object store.
Before you begin
Cloud Pak for Data provides the cpd-cli
backup-restore command-line interface for backing up and restoring PVs. If your cluster is
in a restricted network, ensure that you completed the steps in Moving images for the Cloud Pak for Data volume backup and restore utility to the private container registry
before you run any cpd-cli backup-restore commands.
The cpd-cli backup-restore command-line interface requires a cluster
administrator or similar role that is able to create, read, write, and delete Kubernetes resources, such as deployments, StatefulSets,
cronjobs, jobs, replicasets, configmaps, secrets, pods, namespaces,
persistent volume claims (PVCs), and PVs.
If Cloud Pak for Data is installed on NFS, NFS storage must be configured with no_root_squash.
Volume backup of Cloud Pak for Data installed on
Amazon Elastic Block Store storage classes is not
supported.
Note: You can create volume backups only of the Cloud Pak for Data instance project (namespace). You cannot create
volume backups of Cloud Pak for Data foundational
services or operators projects (for example, Cloud Pak for Data common core services).
About this task
The cpd-cli
backup-restore command-line interface backs up and restores volume data in the same
project and installation, and assumes that Kubernetes objects are still in place.
When you are storing data in a separate PVC, it is recommended that the PVC is backed by a remote
volume to ensure its availability.
Important: Backing up persistent volumes alone is not sufficient for disaster recovery
purposes because Kubernetes objects like secrets are needed along with volume data to restore applications in a
project.
During the backup process, write operations in application workloads are suspended (quiesced) so
that you can do backups or other maintenance activities. The quiesce command
calls hooks that are provided by Cloud Pak for Data services to do the quiesce.
Compared with scaling down all resources in the project, these hooks can offer optimizations:
services might be quiesced and unquiesced in a specific order, or suspended without bringing down
pods, which reduces the time that it takes to bring applications down and back up.
You can back up volumes in two ways:
- Manually scale down resources, back up volumes, and then manually scale up resources.
- Automatically scale down resources, back up volumes, and automatically scale up resources with a
single command.
Tip: Manually scale down application Kubernetes resources before you do a backup so that you can
detect any services that do not scale down correctly. You can then do the backup after you
fix any problems that were found.
Best practice: You can run the commands in
this task exactly as written if you set up environment variables. For instructions, see
Setting up installation environment variables.
Ensure that you source the environment variables
before you run the commands in this task.
For more information about the Cloud Pak for Data
volume backup and restore utility, including a list of commands that you can run, see the cpd-cli backup-restore reference
documentation.
Procedure
- Initialize cpd-cli backup-restore.
Note: If your Docker image registry differs from the one that is shown in the following examples,
change the appropriate options.
The following command is an example that initializes cpd-cli backup-restore
when you are using an S3 object store to store the backups, and the cluster has access to icr.io/cpopen/cpd.
# Initialize cpdbr first with the PVC name and S3 storage. Note that the bucket must exist.
# Example for cluster with access to ICR
cpd-cli backup-restore init \
--namespace $NAMESPACE \
--pvc-name cpdbr-pvc \
--image-prefix=icr.io/cpopen/cpd \
--provider=s3 \
--s3-endpoint="s3 endpoint" \
--s3-bucket=cpdbr \
--s3-prefix=$NAMESPACE/
The following command is an example that initializes cpd-cli backup-restore
when you are using an S3 object store to store the backups, in an environment that uses a private
image registry, such as when your cluster is air-gapped.
# Example for air-gapped environment
cpd-cli backup-restore init \
--namespace $NAMESPACE \
--pvc-name cpdbr-pvc \
--image-prefix=${PRIVATE_REGISTRY_LOCATION} \
--provider=s3 \
--s3-endpoint="s3 endpoint" \
--s3-bucket=cpdbr \
--s3-prefix=$NAMESPACE/
The following command is an example that initializes cpd-cli backup-restore
when you are using a separate PVC to store the backups, and the cluster has access to icr.io/cpopen/cpd.
# Example for cluster with access to ICR
cpd-cli backup-restore init \
--namespace $NAMESPACE \
--log-level=debug \
--verbose \
--pvc-name cpdbr-pvc \
--image-prefix=icr.io/cpopen/cpd \
--provider=local
The following command is an example that initializes cpd-cli backup-restore
when you are using a separate PVC to store the backups, in an environment that uses a private image
registry, such as when your cluster is air-gapped.
# Example for air-gapped environment
cpd-cli backup-restore init \
--namespace $NAMESPACE \
--log-level=debug \
--verbose \
--pvc-name cpdbr-pvc \
--image-prefix=${PRIVATE_REGISTRY_LOCATION} \
--provider=local
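The four init examples differ only in the image prefix and the storage provider. The following is a minimal sketch of a helper that assembles the matching arguments, using only the options that are shown in the examples. The function name cpdbr_init_args is hypothetical and is not part of cpd-cli. The sketch assumes that PRIVATE_REGISTRY_LOCATION is set only in environments that use a private image registry, and it keeps the cpdbr-pvc PVC name from the examples.

```shell
# Hypothetical helper (not part of cpd-cli): print the init arguments, one per line.
# Usage: cpdbr_init_args <local|s3> <namespace> [s3_endpoint] [s3_bucket]
cpdbr_init_args() {
  local provider="$1" namespace="$2" endpoint="${3:-}" bucket="${4:-}"
  # Assumption: PRIVATE_REGISTRY_LOCATION is set only when a private registry is used.
  local image_prefix="${PRIVATE_REGISTRY_LOCATION:-icr.io/cpopen/cpd}"

  case "$provider" in
    "local")
      ;;
    "s3")
      # The S3 examples need an endpoint and a bucket that already exists.
      if [ -z "$endpoint" ] || [ -z "$bucket" ]; then
        echo "s3 provider needs an endpoint and an existing bucket" >&2
        return 1
      fi
      ;;
    *)
      echo "unknown provider: $provider" >&2
      return 1
      ;;
  esac

  # Options common to all four examples.
  printf '%s\n' --namespace "$namespace" --pvc-name cpdbr-pvc \
    "--image-prefix=${image_prefix}" "--provider=${provider}"

  # Options that appear only in the S3 examples.
  if [ "$provider" = "s3" ]; then
    printf '%s\n' "--s3-endpoint=${endpoint}" "--s3-bucket=${bucket}" \
      "--s3-prefix=${namespace}/"
  fi
}
```

Review the printed arguments before you pass them to cpd-cli backup-restore init. The optional --log-level=debug and --verbose options from the PVC examples are left out of the sketch and can be added by hand.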
- To back up volumes by manually scaling down resources, backing up volumes, and manually
scaling up resources, do the following steps.
- Manually scale down application Kubernetes resources:
cpd-cli backup-restore quiesce -n ${PROJECT_CPD_INSTANCE}
If you want to scale down all resources, include the --force option.
- Check for completed jobs and pods by running the volume backup command with the --dry-run option,
specifying a backup name identifier.
The --dry-run option reports jobs or pods that are still attached to the PVCs to be backed up.
Note: The backup name identifier must consist of lowercase alphanumeric characters or the hyphen
(-), and must start and end with an alphanumeric character. The underscore character (_) is not
supported.
cpd-cli backup-restore volume-backup create <backup_name> -n ${PROJECT_CPD_INSTANCE} --dry-run
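The naming rule in the preceding note can be checked locally before you call cpd-cli. The following is a minimal sketch that encodes only the rule as stated in the note. The function name is_valid_backup_name is hypothetical, and the utility might enforce additional limits (for example, a maximum length) that are not described here.

```shell
# Hypothetical helper: succeed only if the name follows the rule in the note.
is_valid_backup_name() {
  # Force byte-wise character ranges so that [a-z] never matches uppercase letters.
  local LC_ALL=C
  case "$1" in
    ''|*[!a-z0-9-]*) return 1 ;;  # empty, or contains a character outside a-z, 0-9, hyphen
    -*|*-)           return 1 ;;  # starts or ends with a hyphen
    *)               return 0 ;;
  esac
}
```

For example, is_valid_backup_name nightly-01 succeeds, whereas my_backup, Nightly, and -nightly are rejected.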
- If the dry run reports completed or failed jobs, or pods, that reference PVCs, delete
them.
Tip: Consider saving the YAML of each job or pod before you manually delete it, or include the
--cleanup-completed-resources option in the backup step.
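One way to follow the preceding tip is to export the definition of a job and delete the job only if the export succeeds. The following is a minimal sketch. The function name save_and_delete_job, the OC override variable, and the output file layout are assumptions made for illustration.

```shell
# Hypothetical helper: save a job definition as YAML, then delete the job.
# Usage: save_and_delete_job <job_name> <project> [output_dir]
save_and_delete_job() {
  local job="$1" project="$2" outdir="${3:-.}"
  # OC is a hypothetical override for the oc binary (useful for testing).
  local oc_cmd="${OC:-oc}"

  # Stop if the export fails so that the job is never deleted without a saved copy.
  "$oc_cmd" get job "$job" -n "$project" -o yaml > "${outdir}/${job}.yaml" || return 1
  "$oc_cmd" delete job "$job" -n "$project"
}
```

The same pattern applies to pods if you replace job with pod in both oc commands.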
- Run the backup command with the --skip-quiesce option:
cpd-cli backup-restore volume-backup create <backup_name> -n ${PROJECT_CPD_INSTANCE} --skip-quiesce=true
Notes:
If you run another backup job with the same backup name, an incremental backup occurs. If you
specify a new backup name, a full volume backup occurs.
With certain storage providers, Kubernetes resources must be scaled down to unmount the PVCs
before you create the backup. In such scenarios, the volume-backup create command with the
--skip-quiesce option can fail if pods are running with mounted PVCs. If this problem occurs, use
the quiesce command with the --force option to scale down the resources, and rerun the
volume-backup create command with the --skip-quiesce option. You can then scale up the Kubernetes
resources after the backup by using the unquiesce command.
- Manually scale up application Kubernetes resources:
cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INSTANCE}
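The manual sequence can be wrapped so that resources are scaled back up even when the backup command fails. The following is a minimal sketch that uses only the commands from the preceding steps. The function name manual_volume_backup and the CPD_CLI override variable are hypothetical. The dry run is deliberately left out because its report needs to be reviewed by a person, and the sketch assumes that cpd-cli returns a nonzero exit code on failure.

```shell
# Hypothetical wrapper: quiesce, back up with --skip-quiesce, then always unquiesce.
# Usage: manual_volume_backup <backup_name> <project>
manual_volume_backup() {
  local name="$1" project="$2" rc=0
  # CPD_CLI is a hypothetical override for the cpd-cli binary (useful for testing).
  local cli="${CPD_CLI:-cpd-cli}"

  if ! "$cli" backup-restore quiesce -n "$project"; then
    echo "quiesce failed; attempting to scale resources back up" >&2
    "$cli" backup-restore unquiesce -n "$project"
    return 1
  fi

  # Remember a backup failure, but do not return before resources are scaled up.
  "$cli" backup-restore volume-backup create "$name" -n "$project" --skip-quiesce=true || rc=$?

  # Scale up regardless of the backup result.
  "$cli" backup-restore unquiesce -n "$project" || rc=$?
  return "$rc"
}
```

For example, manual_volume_backup nightly-01 "${PROJECT_CPD_INSTANCE}" runs the three commands in order and returns a nonzero code if the backup or the unquiesce step fails.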
- To automatically scale down resources, back up volumes, and automatically scale up
resources, do the following steps.
- Run the following volume backup command, specifying a backup name identifier.
Note: The backup name identifier must consist of lowercase alphanumeric characters or the hyphen
(-), and must start and end with an alphanumeric character. The underscore character (_) is not
supported.
cpd-cli backup-restore volume-backup create <backup_name> -n ${PROJECT_CPD_INSTANCE}
Note: When you initiate a new backup with an existing backup name identifier, only the incremental
changes are added to that backup.
- If the backup fails because of completed or failed jobs, or pods, that reference PVCs,
delete them, and rerun the backup command.
Tip: Consider saving the YAML of each job or pod before you manually delete it, or include the
--cleanup-completed-resources option in the backup command.
- If the backup does not automatically scale up resources because of a previous failure,
manually scale up resources:
cpd-cli backup-restore unquiesce -n ${PROJECT_CPD_INSTANCE}
- To check the status of a backup job, run the following command:
cpd-cli backup-restore volume-backup status <backup_name> -n ${PROJECT_CPD_INSTANCE}
- To view a list of existing volume backups, run the following command:
cpd-cli backup-restore volume-backup list -n ${PROJECT_CPD_INSTANCE}
- To view the size of the backup, run the following command:
cpd-cli backup-restore volume-backup list --details -n ${PROJECT_CPD_INSTANCE}
- To get the logs of a volume backup, run the following command:
cpd-cli backup-restore volume-backup logs <backup_name> -n ${PROJECT_CPD_INSTANCE}
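The status, list, and logs commands are often run together after a backup. The following is a minimal sketch that runs them in sequence for one backup. The function name volume_backup_report and the CPD_CLI override variable are hypothetical, and the sketch does not parse the command output because the output format is not described here.

```shell
# Hypothetical helper: print the status, the detailed list, and the logs of one backup.
# Usage: volume_backup_report <backup_name> <project>
volume_backup_report() {
  local name="$1" project="$2"
  # CPD_CLI is a hypothetical override for the cpd-cli binary (useful for testing).
  local cli="${CPD_CLI:-cpd-cli}"

  "$cli" backup-restore volume-backup status "$name" -n "$project"
  "$cli" backup-restore volume-backup list --details -n "$project"
  "$cli" backup-restore volume-backup logs "$name" -n "$project"
}
```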
- Optional: After the volume backup is complete, clean up cpd-cli
backup-restore (delete the cpd-cli backup-restore deployment and other
metadata) by running the following command:
cpd-cli backup-restore reset -n ${PROJECT_CPD_INSTANCE} --force
- If you stopped all Data Refinery runtimes and jobs before you created the backup, restart the
service by running the following command.
The value of <number_of_replica> depends on the scaleConfig setting when Data Refinery was
installed (1 for small, 3 for medium, and 4 for large).
oc scale --replicas=<number_of_replica> deploy wdp-shaper wdp-dataprep
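The mapping from the scaleConfig setting to the replica count can be captured in a small lookup. The following is a minimal sketch that uses the values stated in the preceding step. The function name replicas_for_scale_config is hypothetical.

```shell
# Hypothetical helper: map a scaleConfig value to the Data Refinery replica count.
replicas_for_scale_config() {
  case "$1" in
    small)  echo 1 ;;
    medium) echo 3 ;;
    large)  echo 4 ;;
    *)
      echo "unknown scaleConfig: $1" >&2
      return 1
      ;;
  esac
}
```

For example, oc scale --replicas="$(replicas_for_scale_config medium)" deploy wdp-shaper wdp-dataprep scales both deployments to 3 replicas.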