Creating volume snapshots

If you are using Portworx storage, you can back up all persistent volumes (PVs) in your IBM® Cloud Pak for Data deployment by creating volume snapshots with the Cloud Pak for Data volume backup and restore utility.

Before you begin

To create snapshots, the cpd-cli backup-restore command-line interface requires a cluster administrator or similar role that is able to create, read, write, and delete Stork CRDs and other Kubernetes resources, such as deployments, StatefulSets, cronjobs, jobs, replicasets, configmaps, secrets, pods, namespaces, persistent volume claims (PVCs), and PVs.

To run snapshot-related commands, your cluster must also meet the following requirements:

  • The minimum version of Portworx that Cloud Pak for Data supports. For more information, see Storage considerations.

    To check the Portworx version, run the following commands:

    PX_POD=$(oc get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
    oc exec -it $PX_POD -n kube-system -- /opt/pwx/bin/pxctl --version
  • Stork 2.3.3 or later.

    To check the Stork version, run the following commands:

    STORK_POD=$(oc get pods -n kube-system -l name=stork -o jsonpath='{.items[0].metadata.name}')
    oc exec -it $STORK_POD -n kube-system -- /storkctl/linux/storkctl version
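
The Stork version requirement above can be checked in a script rather than by eye. The following is a minimal sketch, not part of the product tooling: the helper names version_ge and extract_version are hypothetical, the comparison relies on the version sort (-V) option of GNU sort, and it is an assumption that the storkctl output contains a dotted version number of the form X.Y.Z.

```shell
# Hypothetical helpers; neither is shipped with cpd-cli, Stork, or Portworx.

# extract_version TEXT: print the first X.Y.Z pattern found in TEXT.
# Assumption: the version command prints a dotted version somewhere in its output.
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Uses `sort -V` (version sort), so 2.10.0 is ordered after 2.3.3.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (commented out so that the block has no side effects):
# STORK_OUT=$(oc exec "$STORK_POD" -n kube-system -- /storkctl/linux/storkctl version)
# version_ge "$(extract_version "$STORK_OUT")" 2.3.3 || echo "Stork is older than 2.3.3" >&2
```

A plain string comparison would rank 2.10.0 below 2.3.3, which is why the sketch uses version sort instead.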
Note: You can create snapshots only of the Cloud Pak for Data instance project (namespace), such as zen. You cannot create snapshots of the Cloud Pak for Data foundational services or operators projects (for example, Cloud Pak for Data common core services).

About this task

The cpd-cli backup-restore command-line interface creates a snapshot of the Portworx PVCs in your system at a particular moment in time. The interface backs up and restores volume data in the same project and installation, and assumes that Kubernetes objects are still in place.

Important: Backing up persistent volumes alone is not sufficient for disaster recovery purposes because Kubernetes objects like secrets are needed along with volume data to restore applications in a project.

The commands in the following section use an example Cloud Pak for Data project named zen.

For a list of all available options, run cpd-cli backup-restore -h.

Procedure

  1. Stop all active runtimes and jobs for each of the following services that you are using.
    1. Before starting the backup, confirm that you are logged in as cluster administrator.
    2. If Jupyter Notebooks with Python 3.9 for GPU, Jupyter Notebooks with Python 3.9, or Jupyter Notebooks with R 3.6 is installed, run the following commands:
      oc get pod -l app.kubernetes.io/name=ibm-cpd-ws-runtimes,runtime=true -o name | xargs --no-run-if-empty -I{} oc exec {} -- /opt/ibm/ws/bin/shutdown.sh
      oc get pod -l app.kubernetes.io/name=ibm-cpd-ws-runtimes,job=true -o name | xargs --no-run-if-empty -I{} oc exec {} -- /opt/ibm/ws/bin/shutdown.sh
    3. If RStudio® Server with R 3.6 is installed, run the following commands:
      oc delete $(oc get deployment -l type=rstudio -o name)
      oc delete $(oc get svc -l type=rstudio -o name)
      oc delete $(oc get job -l type=rstudio -o name)
      oc delete $(oc get secrets -l type=rstudio -o name)
      oc delete $(oc get cronjobs -l type=rstudio -o name)

      To check if there are any RStudio Server with R 3.6 pods that are still running, run the following command:

      oc get pod -l type=rstudio

      When there are no pods running, no output is produced for this command.

    4. If SPSS® Modeler is installed, run the following commands:
      oc delete $(oc get deployment -l type=spss-modeler -o name)
      oc delete $(oc get svc -l type=spss-modeler -o name)
      oc delete $(oc get secret -l type=spss-modeler -o name)

      To check if there are any SPSS Modeler pods that are still running, run the following command:

      oc get pod -l type=spss-modeler

      When there are no pods running, no output is produced for this command.

    5. If Data Refinery is installed, stop all runtimes and jobs to avoid data loss. Run the following commands:
      oc delete $(oc get deployment -l type=shaper -o name)
      oc delete $(oc get svc -l type=shaper -o name)
      oc delete $(oc get job -l type=shaper -o name)
      oc delete $(oc get secrets -l type=shaper -o name)
      oc delete $(oc get cronjobs -l type=shaper -o name)
      oc scale --replicas=0 deploy wdp-shaper wdp-dataprep
    6. If Watson™ Machine Learning is installed, disable scheduled jobs, and either cancel existing jobs that are starting or running or wait for them to finish. Run the following command, replacing <service_namespace> with the name of the Cloud Pak for Data project (namespace) and <timeout_in_seconds> with the number of seconds to wait for jobs to complete before they are terminated:
      oc -n <service_namespace> get pods -l app=wml-deployment-manager -o name | xargs -I{} oc -n <service_namespace> exec {} -- bash -c "/opt/ibm/wml-online-scoring/runtime-manager/bin/startQuiesce.sh <timeout_in_seconds>"
  2. Create a local volume snapshot, specifying the snapshot name.
    Note: The snapshot name must consist of lowercase alphanumeric characters or the hyphen (-), and must start and end with an alphanumeric character. The underscore character (_) is not supported.
    cpd-cli backup-restore snapshot create <snapshot_name> -n zen
  3. To check the status of a snapshot, run the following command:
    cpd-cli backup-restore snapshot status <snapshot_name> -n zen
  4. To view a list of existing snapshots, run the following command:
    cpd-cli backup-restore snapshot list -n zen
  5. If you stopped all Data Refinery runtimes and jobs in step 1, restart the service by running the following command.

    The value of <number_of_replica> depends on the scaleConfig setting when Data Refinery was installed (1 for small, 3 for medium, and 4 for large).

    oc scale --replicas=<number_of_replica> deploy wdp-shaper wdp-dataprep
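
Two rules in the procedure above lend themselves to small shell checks: the snapshot naming rule in step 2 and the replica counts in step 5. The following is a minimal sketch under stated assumptions. The function names valid_snapshot_name and replicas_for_scale are hypothetical and are not part of cpd-cli, and the scaleConfig values small, medium, and large are taken from the description in step 5.

```shell
# Hypothetical helpers; not part of cpd-cli.

# valid_snapshot_name NAME: succeed when NAME uses only lowercase
# alphanumeric characters or hyphens, and starts and ends with an
# alphanumeric character. LC_ALL=C keeps [a-z] from matching uppercase
# letters in some locales.
valid_snapshot_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?$'
}

# replicas_for_scale SIZE: print the replica count for a scaleConfig
# setting (1 for small, 3 for medium, 4 for large); fail on anything else.
replicas_for_scale() {
  case "$1" in
    small)  echo 1 ;;
    medium) echo 3 ;;
    large)  echo 4 ;;
    *)      echo "unknown scaleConfig: $1" >&2; return 1 ;;
  esac
}

# Example use (commented out so that the block has no side effects):
# valid_snapshot_name "$SNAPSHOT_NAME" && cpd-cli backup-restore snapshot create "$SNAPSHOT_NAME" -n zen
# oc scale --replicas="$(replicas_for_scale medium)" deploy wdp-shaper wdp-dataprep
```

Validating the name before calling cpd-cli catches an underscore or a leading hyphen before any snapshot request is sent.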