Kubernetes Resource Protection

Kubernetes applications rely on Kubernetes API resources to function within a cluster. For example, Deployments specify the program components of an application, ConfigMaps control how a user wants to run the application, and custom resources control the overall operation of the application through application operators. Protecting Kubernetes resources against data loss and disasters in an application-independent manner simplifies development for Kubernetes applications that require backup and restore services.

Selective protection and recovery of Kubernetes resources for disaster recovery

To minimize recovery time, an application can pre-deploy some of its Kubernetes resources on the recovery cluster. Such an application needs a way to avoid recovering those resources a second time, so that the benefits of pre-deployment are preserved. Other Kubernetes resources are created dynamically by Kubernetes itself, and an application does not need to preserve their history, so protecting them is unnecessary. Kubernetes events are a prime example: applications might need a way to exclude them from protection and recovery. The RamenDR Recipe custom resource for disaster recovery provides a flexible mechanism that supports these cases and generalizes to other use cases.

The general technique for selective protection and recovery is to filter Kubernetes resources by kind and label. Resources can be selectively protected and recovered by kind, using an include and exclude mechanism that is provided within the VolumeReplicationGroup (VRG). In addition to include and exclude, the standard Kubernetes label selector mechanism can be used to protect specific resources.
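
The filtering described above can be illustrated with a small sketch. It is not Ramen's implementation: the function names, the dictionary layout for resources and groups, and the equality-only `key=value` selector parsing are assumptions made for illustration. Only the field names `includedResourceTypes`, `excludedResourceTypes`, and `labelSelector` are taken from the Recipe sample in this document.

```python
def parse_selector(selector):
    """Parse an equality-only selector such as 'app=my-app,tier=db' into a dict."""
    pairs = (item.split("=", 1) for item in selector.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


def is_selected(resource, group):
    """Return True if the resource passes the group's kind and label filters."""
    kind = resource["kind"].lower()
    included = group.get("includedResourceTypes")
    excluded = group.get("excludedResourceTypes", [])
    # An include list, when present, admits only the listed kinds.
    if included is not None and kind not in included:
        return False
    # An exclude list rejects the listed kinds.
    if kind in excluded:
        return False
    # Every key=value pair in the selector must match the resource's labels.
    wanted = parse_selector(group.get("labelSelector", ""))
    labels = resource.get("labels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


# Hypothetical resources in a namespace.
resources = [
    {"kind": "ConfigMap", "name": "settings", "labels": {"app": "my-app"}},
    {"kind": "Event", "name": "pod-started", "labels": {}},
    {"kind": "Deployment", "name": "web", "labels": {"app": "other"}},
]
group = {"excludedResourceTypes": ["event"], "labelSelector": "app=my-app"}
selected = [r["name"] for r in resources if is_selected(r, group)]
print(selected)
```

In this sketch the event is dropped by the exclude list and the Deployment is dropped by the label selector, so only the ConfigMap is selected.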

Kubernetes resource protection and recovery order

Kubernetes applications are designed around a declared desired state that the system continuously attempts to achieve and then maintains over time. It is up to the application to deal with the asynchronous behavior that this attempt, achieve, and maintain architecture requires. However, this level of asynchrony cannot be sustained in highly complex applications with stringent user expectations. First, asynchrony prevents programmers from predicting every sequence of events. Second, asynchrony can be a source of failed dependencies, which result in back-off retry loops that can violate the application's Recovery Time Objective (RTO). Restoring resources in a prescribed order avoids both problems, so the Ramen VRG provides a mechanism for capturing and restoring resources in a prescribed order.
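
To make the dependency argument concrete, the following sketch restores groups in a prescribed order and fails fast if a group's dependency has not been restored yet. The group names mirror the Recipe sample later in this document; the `depends_on` mapping and the `restore_in_order` function are hypothetical and exist only for this illustration.

```python
def restore_in_order(sequence, depends_on):
    """Restore groups one by one, checking that each dependency came first."""
    restored = []
    for group in sequence:
        missing = [dep for dep in depends_on.get(group, []) if dep not in restored]
        if missing:
            # With a prescribed order this is detected immediately,
            # instead of surfacing later as a back-off retry loop.
            raise RuntimeError(f"{group} restored before {missing}")
        restored.append(group)
    return restored


# Hypothetical dependencies: deployments read config, instances need deployments.
depends_on = {"deployments": ["config"], "instance-resources": ["deployments"]}

order = restore_in_order(["config", "deployments", "instance-resources"], depends_on)
print(order)
```

Passing the groups in a different order, for example `["deployments", "config"]`, raises an error in this sketch because `config` has not been restored when `deployments` is processed.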

An example of Kubernetes resource protection specification


apiVersion: ramendr.openshift.io/v1alpha1
kind: Recipe
metadata:
  name: recipe-sample
  namespace: my-app-ns
spec:
  appType: demo-app  # required, but not currently used
  groups:
  - name: volumes
    type: volume
    labelSelector: app=my-app
  - name: config
    backupRef: config
    type: resource
    includedResourceTypes:
    - configmap
    - secret
  - name: deployments
    backupRef: deployments
    type: resource
    includedResourceTypes:
    - deployment
  - name: instance-resources
    backupRef: instance-resources
    type: resource
    excludedResourceTypes:
    - configmap
    - secret
    - deployment
  hooks:
  - name: service-hooks
    namespace: my-app-ns
    labelSelector: shouldRunHook=true
    ops:
    - name: pre-backup
      container: main
      command: ["/scripts/pre_backup.sh"]  # must exist in 'main' container
      timeout: 1800
    - name: post-restore
      container: main
      command: ["/scripts/post_restore.sh"]  # must exist in 'main' container
      timeout: 3600
  workflows:
  - name: capture  # referenced in VRG
    sequence:
    - group: config
    - group: deployments
    - hook: service-hooks/pre-backup
    - group: instance-resources
  - name: recover  # referenced in VRG
    sequence:
    - group: config
    - group: deployments
    - group: instance-resources
    - hook: service-hooks/post-restore

VRG sample that uses this Recipe

apiVersion: ramendr.openshift.io/v1alpha1
kind: VolumeReplicationGroup
metadata:
  name: vrg-sample
  namespace: my-app-ns  # same Namespace as Recipe
spec:
  kubeObjectProtection:
    recipeRef:
      name: recipe-sample
      captureWorkflowName: capture
      recoverWorkflowName: recover
      volumeGroupName: volumes

Explanation of the capture and recovery specifications in VRG

The scope of VRG disaster protection is a single Kubernetes namespace. The VRG protects the persistent volumes that are associated with the namespace and optionally protects Kubernetes resources in the namespace. This documentation covers the preview of Kubernetes resource protection, so it does not explain persistent volume disaster protection.

The VRG allows Kubernetes resources to be captured (backed up) and restored as part of disaster recovery. This is achieved through the RamenDR Recipe. If a RamenDR Recipe specification is not included in the VRG, Kubernetes resources are not protected as part of VRG disaster protection.

A RamenDR Recipe has two workflows: capture and recover. The capture workflow provides instructions on how to capture a namespace's Kubernetes resources. The recover workflow provides instructions on how to recover a namespace's Kubernetes resources after a disaster. Because the two workflows are specified separately, their order can differ and they need not include the same resources. Each workflow contains a list of resource instructions. Each item in the list is acted upon, even if it duplicates work that is done by other items in the list, so take care to avoid duplication in the capture and recover lists to achieve the best Recovery Point Objective (RPO) and RTO.
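
As a rough illustration of how a workflow is processed, the sketch below walks a sequence like the `capture` workflow in the Recipe sample and emits one step per item, in order. Nothing is deduplicated, which mirrors the statement that every item is acted upon. The `plan` and `duplicates` functions and the tuple output format are assumptions for illustration, not part of Ramen.

```python
from collections import Counter


def plan(sequence):
    """Turn a workflow sequence into an ordered list of (action, target) steps."""
    steps = []
    for item in sequence:
        if "group" in item:
            steps.append(("process-group", item["group"]))
        elif "hook" in item:
            steps.append(("run-hook", item["hook"]))
        else:
            raise ValueError(f"unknown workflow item: {item}")
    return steps


def duplicates(steps):
    """Return the steps that appear more than once, that is, repeated work."""
    counts = Counter(steps)
    return [step for step, count in counts.items() if count > 1]


capture = [
    {"group": "config"},
    {"group": "deployments"},
    {"hook": "service-hooks/pre-backup"},
    {"group": "instance-resources"},
    {"group": "config"},  # deliberate duplicate to show that it is not skipped
]
steps = plan(capture)
print(len(steps))
print(duplicates(steps))
```

A check such as `duplicates` could be run before a workflow is applied to spot repeated items that would lengthen capture or recovery time.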

The list items must meet the following requirements:
  1. The name of each item in the captureOrder list must be unique.
  2. The backupName of each item in the recoverOrder list must match a name in the captureOrder list.
  3. A labelSelector in a list item applies only to that item in the list.
  4. If a list item contains multiple labelSelectors, then any resource that matches any one of the label selectors is operated upon.
  5. IncludeClusterResources in a list item applies only to that item in the list.
  6. Each list item can contain either an includedResources section or an excludedResources section, but not both.
Note: If the requirements are not met, the behavior of the lists is undefined.
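
Requirements 1, 2, and 6 can be checked mechanically. The sketch below does so over plain dictionaries. The key names (`name`, `backupName`, `includedResources`, `excludedResources`) follow the wording of the list above, while the `validate` function, the sample data, and the error messages are hypothetical. Requirement 2 is interpreted here as matching a name in the capture list, which is an assumption about its intent.

```python
def validate(capture_order, recover_order):
    """Return a list of human-readable violations; an empty list means valid."""
    errors = []
    names = [item["name"] for item in capture_order]
    # Requirement 1: capture names must be unique.
    for name in sorted(set(names)):
        if names.count(name) > 1:
            errors.append(f"duplicate capture name: {name}")
    # Requirement 2: every backupName must refer to a capture name.
    for item in recover_order:
        if item["backupName"] not in names:
            errors.append(f"unknown backupName: {item['backupName']}")
    # Requirement 6: include and exclude sections are mutually exclusive.
    for item in capture_order + recover_order:
        if "includedResources" in item and "excludedResources" in item:
            label = item.get("name") or item.get("backupName")
            errors.append(f"both include and exclude set: {label}")
    return errors


capture_order = [
    {"name": "config", "includedResources": ["ConfigMap", "Secret"]},
    {"name": "everything-else", "excludedResources": ["ConfigMap", "Secret"]},
]
recover_order = [
    {"backupName": "config", "includedResources": ["ConfigMap", "Secret"]},
    {"backupName": "missing"},
]
print(validate(capture_order, recover_order))
```

Here the second recover item refers to a capture name that does not exist, so the sketch reports one violation for it.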