Backup & restore configuration parameters
Configuration parameters allow customization of IBM Fusion Backup & restore settings.
Change defaults for IBM Fusion Backup & restore agent
- deleteBackupWait
- Timeout, in minutes, for the restic command to delete a backup in S3 storage. The default value is 20 minutes, and the allowed range is 10 to 120.
- pvcSnapshotMaxParallel
- Number of threads available to take concurrent snapshots. The default value is 20.
- backupDatamoverTimeout
- Maximum amount of time, in minutes, for the datamover to complete a backup. The default value is 20 minutes (1200 seconds), and the allowed range is 10 to 14400. After you modify backupDatamoverTimeout, update cancelJobAfter.
- restoreDatamoverTimeout
- Maximum amount of time, in minutes, for the datamover to complete a restore. The default value is 20 minutes (1200 seconds), and the allowed range is 10 to 14400. After you modify restoreDatamoverTimeout, update cancelJobAfter.
- snapshotRestoreJobTimeLimit
- This parameter is not used.
- pvcSnapshotRestoreTimeout
- Timeout for creating PVC from snapshot in minutes. The default value is 15 minutes.
- kafka-thread-size
- The number of processing threads in the transaction manager. The default value is 10.
- snapshotTimeout
- Timeout for snapshot to resolve to the ready state in minutes. The default value is 20, and the allowed range is 10 to 120.
- datamoverJobpodEphemeralStorageLimit
- Datamover pod ephemeral storage limit. The default value is 2000Mi.
- datamoverJobPodDataMinGB
- Minimum PVC capacity for each datamover pod before a new datamover pod is started. It is set in GB, and the default value is 10 GB.
- datamoverJobpodMemoryLimit
- Datamover pod memory limit. The default value is 15000Mi.
- datamoverJobpodCPULimit
- Datamover pod CPU limit. The default value is 2.
- cancelJobAfter
- If you modify backupDatamoverTimeout or restoreDatamoverTimeout, update the job-manager deployment configuration parameter cancelJobAfter. It is the maximum amount of time in milliseconds that the job-manager waits before it cancels the long-running job. The default value is 3600000 (1 hour).
- MaxNumJobPods
- Maximum number of datamover pods that can be created for each backup or restore.
This field helps to control the number of PersistentVolumeClaims that are attached to datamover pods during backup and restore to the BackupStorageLocation. It is set on a per-install basis. If your IBM Fusion installation has spoke clusters, set it on each spoke cluster individually; it is not a global field that applies to all clusters in the installation. It helps to distribute the storage load across multiple nodes of the cluster when available. Some StorageClasses impose a maximum number of PVCs that can be attached to an individual node of the cluster, and this field helps to manage that limitation. To find out whether your StorageClass has this limitation, check whether the CSINode of your storage provider has the spec.drivers[].allocatable.count field set. VPC Block on IBM Cloud is one such storage provider, typically limited to 10 PVCs per node. If the application has more than 30 PVCs, increasing the number of pods decreases the number of PVCs attached to each node. Below that, the default is more than sufficient.
This field can increase or decrease the maximum number of datamover pods that are assigned to each backup or restore. More pods use more resources such as CPU and memory, and can improve performance for backups and restores with larger numbers of PVCs. Increasing this value can help if more than 30 PVCs are backed up or restored at the same time.
This field does not guarantee the creation of more datamover pods. A number of heuristics are used at runtime to determine the assignment of PVCs to datamovers, including total PVC capacity, number of PVCs, amount of data transferred during previous backups, total number of PVCs handled, and the storage providers involved. This field changes the maximum allowed; it does not guarantee the specified number of datamovers.
For more information about how to change the value, see Change defaults for IBM Fusion Backup & restore agent.
- DeleteBackupRequest CR cleanup
- The DeleteBackupRequest CRs in the Completed state are deleted automatically after a default retention of 14 days. You can set the dbrCRRetention configuration parameter in the guardian-configmap ConfigMap to change the default. By default, the cleanup thread runs every 2 hours and cleans up a maximum of 100 DeleteBackupRequest CRs in one run. These two values can be overridden by specifying the dbrCleanupCheckInterval and dbrCleanupBatchSize configuration parameters in the guardian-configmap ConfigMap.
- dbrCRRetention
- Number of days after which DeleteBackupRequest CRs in the Completed state are deleted. If not specified, the default is 14 days.
- dbrCleanupCheckInterval
- Interval, in hours, at which the DeleteBackupRequest cleanup thread runs. If not specified, the default is 2 hours.
- dbrCleanupBatchSize
- Number of DeleteBackupRequest CRs to clean up in one run of the cleanup thread. If not specified, the default is 100.
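A minimal command sketch for overriding these defaults, assuming the guardian-configmap data keys match the parameter names above and that the agent runs in the ibm-backup-restore namespace; the values shown are illustrative only:
# illustrative values: 7-day retention, 4-hour check interval, 200 CRs per run
oc patch configmap guardian-configmap -n ibm-backup-restore --type merge \
  -p '{"data":{"dbrCRRetention":"7","dbrCleanupCheckInterval":"4","dbrCleanupBatchSize":"200"}}'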
Change defaults for IBM Fusion Backup & restore agent
- In the OpenShift® Container Platform console, click .
- Change the project to the IBM Spectrum Fusion Backup and Restore namespace, for example ibm-backup-restore.
- Click to open IBM Fusion Backup & Restore Agent.
- Click the Data Protection Agent tab and click the dpagent installation. Alternatively, use the oc command:
oc edit -n ibm-backup-restore dataprotectionagent
- Go to the YAML tab.
- Edit spec.transactionManager.datamoverJobPodCountLimit. The value must be numeric and in quotes, for example '3', '5', or '10'.
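A minimal sketch of the corresponding DataProtectionAgent YAML, based on the field path in the previous step; '10' is an illustrative value:
spec:
  transactionManager:
    # maximum number of datamover pods per backup or restore (illustrative value, quoted string)
    datamoverJobPodCountLimit: '10'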
Back up and restore a large number of files
- Prevent the transaction manager from failing long-running backup jobs. In the ibm-backup-restore project, edit the ConfigMap named guardian-configmap and look for backupDatamoverTimeout. This value is in minutes, and the default is 20 minutes. For example, increase this value to 8 hours (480).
- Prevent the job manager from canceling long-running jobs. In the ibm-backup-restore project, edit the job-manager deployment. Under env, look for cancelJobAfter. This value is in milliseconds, and the default is 1 hour. For example, increase this value to 20 hours (72000000).
- Prevent the transaction manager from failing long-running restore jobs. In the same guardian-configmap ConfigMap, look for restoreDatamoverTimeout. This value is in minutes, and the default is 20 minutes. For example, increase this value to 20 hours (1200).
Additional parameters are available when you back up and restore a large number of files that are located on CephFS. The optimal values depend on your individual environment, but the values in this example represent backup and restore of a million files. For such a large number of files, you must be on OpenShift Container Platform 4.14 or later and Data Foundation 4.12 or later.
- In the same config map, increase the amount of ephemeral storage that the data mover is allowed to use. Increase datamoverJobpodEphemeralStorageLimit to 4000Mi or more.
- Increase the resources available to Red Hat OpenShift Data Foundation. Increase the limits and requests for the two MDS pods, for example, to 2 CPU and 32 Gi memory. For the procedure, see Changing resources for the OpenShift Data Foundation components.
- Set up to prevent SELinux relabeling. At restore time, OpenShift attempts to relabel each of the files. If the relabeling takes too long, the restored pod fails with CreateContainerError. This article explains the situation and some of the possible workarounds to prevent the relabeling: https://access.redhat.com/solutions/6221251.
- Run the following command to check whether the MDS pods are restarting:
oc get pod -n openshift-storage | grep mds
- If they are restarting, check the termination reason:
- Describe the pod.
- Check whether the termination reason is OOM Kill.
- Run the following command to check the memory usage by the MDS pods and monitor for memory usage:
oc adm top pod -n openshift-storage
- If the memory usage keeps spiking until the pod restarts, then see Changing resources for the OpenShift Data Foundation components.
Configure DataMover type for Backup and Restore
- kopia is the only allowed value.
- The default value is kopia.
- The legacy DataMover applies to versions equal to or lower than 2.8.
- Change the DataMover type used across all new backups
- From the OpenShift console:
- Log in to OpenShift Console.
- Go to .
- Select the IBM Fusion install namespace.
- Open isf-data-protection-config and update the data.DataMover field.
Example command:
oc patch configmap -n ibm-spectrum-fusion-ns isf-data-protection-config -p '{"data":{"DataMover":"kopia"}}'
Here, replace the namespace ibm-spectrum-fusion-ns with your Fusion namespace. This example shows the kopia DataMover; for the legacy DataMover, replace kopia with legacy.
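To confirm the current setting, one option is a jsonpath query against the same ConfigMap, assuming the DataMover key shown in the patch example:
# illustrative check of the currently configured DataMover type
oc get configmap isf-data-protection-config -n ibm-spectrum-fusion-ns -o jsonpath='{.data.DataMover}'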
- Choose DataMover during policy assignment
- To facilitate a trial of the DataMover options, select it by annotating PolicyAssignment objects. Each PolicyAssignment object associates an existing Policy and BackupStorageLocation with an application to back up. By setting this value according to the following instructions, the global setting is ignored. Consequently, all new backups associated with the PolicyAssignment use the DataMover type specified in the annotation.
- Search by label to find the existing PolicyAssignment for your applications.
- To search by application: dp.isf.ibm.com/application-name=<application name>
- To search by policy: dp.isf.ibm.com/backuppolicy-name=<policy name>
- To search by backup storage location: dp.isf.ibm.com/backupstoragelocation-name=<backupstoragelocation name>
The general naming format is as follows:
<application name>-<policy name>-<cluster url>
Example:
aws-20240716-220437-awsdaily-apps.bnr-hcp-munch.apps.blazehub01.mydomain.com
- Change the value of DataMover. From the OpenShift console:
- Go to .
- Select the IBM Fusion install namespace.
- Open IBM Fusion.
- Go to the Policy Assignment tab and change the DataMover value.
Using commands:
- Get the policy assignment object:
oc get policyassignment -n ibm-spectrum-fusion-ns
- Add the annotation to the PolicyAssignment object:
dp.isf.ibm.com/dataMover: <datamover type>
Example:
dp.isf.ibm.com/dataMover: legacy
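A minimal command sketch, assuming the label keys and annotation shown above and PolicyAssignment objects in the ibm-spectrum-fusion-ns namespace; the placeholders are illustrative:
# find the PolicyAssignment for an application (replace the placeholder)
oc get policyassignment -n ibm-spectrum-fusion-ns -l dp.isf.ibm.com/application-name=<application name>
# set or update the DataMover annotation on it (replace the placeholder)
oc annotate policyassignment <policyassignment name> -n ibm-spectrum-fusion-ns dp.isf.ibm.com/dataMover=legacy --overwrite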
Choose nodes for DataMovers for OADP DataMover
- spec.datamoverConfiguration
- The field spec.datamoverConfiguration is used to configure the new OADP DataMovers. For backwards compatibility, you can continue to use the fields under spec.transactionManager to configure the DataMovers from the previous releases.
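The following sketch shows how the sub-fields described in the rest of this section fit together inside the DataProtectionAgent spec; the values are placeholders rather than recommendations:
spec:
  datamoverConfiguration:
    nodeAgentConfig:
      env: []                   # environment variables inserted into the node-agent pods
      labels: {}                # extra labels for the node-agent daemonset and pods
      tolerations: []           # tolerations for constrained or multi-architecture clusters
      resourceAllocations: {}   # requests and limits for the node-agent controller
      nodeSelector: {}          # restrict the node-agent daemonset to labeled nodes
    datamoverPodConfig:
      loadConcurrency:
        globalConfig: 5         # maximum datamovers per node (placeholder)
      loadAffinity: []          # restrict datamover pods to selected nodes
      podResources: {}          # CPU, memory, and ephemeral storage for datamover pods
      backupPVC: {}             # per-StorageClass settings for backup PVCs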
- spec.datamoverConfiguration.nodeAgentConfig.env
- This array allows users to specify environment variables that are inserted into the node-agent pods. The example value of HOME sets the folder that kopia uses for local files. It controls the on-container folder where cache files related to the storage repository are stored. The default value is /home/velero, and it uses local ephemeral storage.
spec:
  datamoverConfiguration:
    nodeAgentConfig:
      env:
        - name: HOME
          value: /home/velero
- spec.datamoverConfiguration.nodeAgentConfig.labels
- This field adds additional labels to the node-agent daemonset and pods. It is an optional field to organize pods and does not affect backup and restore behavior. This field is a YAML object.
spec:
  datamoverConfiguration:
    nodeAgentConfig:
      labels:
        cloudpakbackup: datamover
- spec.datamoverConfiguration.nodeAgentConfig.tolerations
- This array field specifies the conditions for nodes that permit the execution of pods in resource-constrained environments or multi-architecture clusters. For more information about this field, see the Kubernetes documentation.
spec:
  datamoverConfiguration:
    nodeAgentConfig:
      tolerations:
        - key: "kubernetes.io/arch"
          operator: "Equal"
          value: "amd64"
          effect: "NoSchedule"
- spec.datamoverConfiguration.nodeAgentConfig.resourceAllocations
- This field defines the resource allocation for the node-agent controller, including both scheduling requirements and limits. If backups or restores cause evictions in the node-agent pods, increase the values in this field. Check for resource violations in the Events of the Backup and Restore install namespace and in the following places:
- The status field of the Daemonset node-agent
- The OpenShift Dashboard for resource monitoring in the Backup and Restore namespace
- Grafana in the Backup and Restore namespace if installed
- Other applications that use Prometheus metrics.
spec:
  datamoverConfiguration:
    nodeAgentConfig:
      resourceAllocations:
        limits:
          cpu: "2"
          ephemeral-storage: "50Mi"
          memory: "2048Mi"
        requests:
          cpu: "200m"
          ephemeral-storage: "25Mi"
          memory: "256Mi"
- spec.datamoverConfiguration.nodeAgentConfig.nodeSelector
- This field decides the nodes where the DaemonSet node-agent must run. It is a crucial security feature to restrict the HostPath mount of the node-agent, which exposes the following values:
- All PersistentVolumeClaims
- Projected API volumes such as configmaps and secrets through volume mount
- Kubernetes API tokens that are in use on the node
Note: As this field causes restore failures, remove it before restore operations. If needed, this field can isolate the DaemonSet node-agent pods to nodes without any security concerns.
The pods are assigned to nodes that match the labels attached to the nodes. If more than one label is used, the labels are treated as a logical AND: only nodes that match all of the labels run the pod.
For example, if you set the nodeSelector to "kubernetes.io/hostname: bnr-hcp-munch-6df6702b-2bv7z", then the DaemonSet node-agent pods run only on the node with this label. If there is only one node with that label, the DaemonSet node-agent is restricted to deploying pods to only that node.
spec:
  datamoverConfiguration:
    nodeAgentConfig:
      nodeSelector:
        kubernetes.io/hostname: bnr-hcp-munch-6df6702b-2bv7z
- spec.datamoverConfiguration.datamoverPodConfig
- This setting manages the PVCs, resource allocations, and the location within the cluster where the datamover pods operate. loadConcurrency provides the number and loadAffinity provides the location.
- spec.datamoverConfiguration.datamoverPodConfig.loadConcurrency
- This field controls the maximum number of datamovers assigned to nodes in the cluster.
If anything appears to be in conflict, see the Velero documentation.
This setting controls the number of datamovers deployed per node-agent controller pod, and it consists of two parts:
spec.datamoverConfiguration.datamoverPodConfig.loadConcurrency.globalConfig
This sets the maximum number of datamovers deployable from a single node-agent controller pod for a generic node in the cluster. The subsequent field, perNodeConfig, overrides globalConfig and manages this setting independently. It is of type integer, with a default value of 5 and a minimum value of 1. If loadConcurrency is added to the config, this value is required.
spec.datamoverConfiguration.datamoverPodConfig.loadConcurrency.perNodeConfig
This optional field sets a maximum number of datamovers assigned to a node. The perNodeConfig uses the default nodeSelector elements from Kubernetes to select nodes by using labels. When multiple labels are specified in matchLabels, the selected nodes are found by using an AND operation, meaning all the labels must be found. It also takes a more complex matchExpressions field as an alternative to nodeSelector. For more information about assigning pods to nodes, see the Kubernetes documentation.
Example:
loadConcurrency:
  globalConfig: 2
  perNodeConfig:
    - nodeSelector:
        matchLabels:
          kubernetes.io/hostname: node1
      number: 3
    - nodeSelector:
        matchLabels:
          beta.kubernetes.io/instance-type: Standard_B4ms
      number: 5
In this example, the node with the label kubernetes.io/hostname=node1 can run up to 3 DataMovers at once, and a node with the label beta.kubernetes.io/instance-type=Standard_B4ms can run up to 5 DataMovers. With globalConfig set to 2, all other nodes with storage matching the PVC can run up to 2 DataMovers at the same time.
- spec.datamoverConfiguration.datamoverPodConfig.loadAffinity
- This field controls the nodes in the cluster where the DataMovers must run. For more information, see the Velero reference documentation.
loadAffinity:
  - nodeSelector:
      matchLabels:
        beta.kubernetes.io/instance-type: Standard_B4ms
      matchExpressions:
        - key: kubernetes.io/hostname
          values:
            - node-1
            - node-2
            - node-3
          operator: In
        - key: xxx/critical-workload
          operator: DoesNotExist
The value is a list of nodeSelectors. The selected nodes are controlled through a nodeSelector field that matches the labels. Each member of the list is evaluated independently, and the nodes that match each element are combined to create the final list of nodes where the DataMovers can run. If loadConcurrency.globalConfig is set to 0, then only the nodes selected in loadAffinity and perNodeConfig can run the backup and restore jobs. Ensure that you meet the following criteria:
- The intersection of nodes matching loadConcurrency.perNodeConfig and loadAffinity is where the jobs run. Make sure the size of the intersection is at least one node.
- If the storage used during the backup or restore process is only available on a subset of nodes and loadConcurrency.globalConfig is set to 0, you must select at least one node where the storage is accessible. Otherwise, the job fails.
- If loadConcurrency.globalConfig is set to 0 and the selected nodes do not have enough CPU and memory resources to schedule a DataMover pod, then the backup or restore job fails.
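A minimal sketch, using the fields described above, of restricting DataMovers to a set of labeled nodes by combining globalConfig: 0 with perNodeConfig and loadAffinity; the label is a placeholder, and using the same label in both places keeps the intersection non-empty:
loadConcurrency:
  globalConfig: 0                          # nodes not selected below run no DataMovers
  perNodeConfig:
    - nodeSelector:
        matchLabels:
          example.com/backup-node: "true"  # placeholder label
      number: 3
loadAffinity:
  - nodeSelector:
      matchLabels:
        example.com/backup-node: "true"    # same placeholder label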
- spec.datamoverConfiguration.datamoverPodConfig.podResources
- Controls the amount of CPU and memory that is made available to the DataMovers. It has the following sub-fields; default values are set for any missing fields.
Example:
podResources:
  cpuRequest: 2
  memoryRequest: 4Gi
  ephemeralStorageRequest: 5Gi
  cpuLimit: 4
  memoryLimit: 16Gi
  ephemeralStorageLimit: 5Gi
For more information about the default Velero resource profile for large clusters, see CPU and memory requirements in Red Hat documentation.
Resource limits must be sufficient to back up the most resource-intensive volumes in the cluster.
- spec.datamoverConfiguration.datamoverPodConfig.backupPVC
- This section defines the format for the PersistentVolumeClaim used in the backup process. See the Velero documentation.
During backup, for each execution of a volume group in a Recipe (the default Recipe covers all PersistentVolumeClaims), a VolumeSnapshot is created from a PersistentVolumeClaim and represents the volume state at the time of the VolumeSnapshot timestamp. Some storage systems, such as CephFS from Data Foundation and IBM Storage Scale, implement shallow-copy backup volume exposure without a file copy to address performance concerns compared to regular volume restores.
By default, the agent configures CephFS, IBM Storage Scale, and NFS PersistentVolumeClaims to be created in ReadOnlyMany mode during backups.
Setting values in backupPVC allows you to configure PersistentVolumeClaims from any StorageClass to use their "shallow-copy" implementation. This way, IBM Fusion does not need the specifics, and you can leverage alternative StorageClasses to access different storage features.
Example:
backupPVC:
  storage-class-1:
    storageClass: backupPVC-storage-class
    readOnly: true
  storage-class-2:
    storageClass: backupPVC-storage-class
  storage-class-3:
    readOnly: true
In this example, a data transfer of PersistentVolumeClaims of StorageClass "storage-class-1" uses an alternative StorageClass "backupPVC-storage-class" to create the volume with accessModes: ReadOnlyMany. The storage-class-2 entry is the same as storage-class-1 except that the accessModes remains unchanged. In storage-class-3, no alternate StorageClass is used, and the volume is created with accessModes: ReadOnlyMany. All DataMovers mount the backup volume as ReadOnly, and this affects the behavior of storage on volume creation. The original application PersistentVolumeClaim is not modified; only the PersistentVolumeClaim used for the backup is modified. If the StorageClass belongs to the CephFS or IBM Storage Scale storage systems, the setting overwrites the agent's internal behavior with the configured one.
Not all storage systems support PersistentVolumeClaims with accessModes: ReadOnlyMany. See your storage documentation. OpenShift does not provide a way to check which accessModes are supported prior to PersistentVolumeClaim creation.
To use PVC storage as a local cache for kopia datamovers:
- Update the node-agent-config ConfigMap:
kind: ConfigMap
apiVersion: v1
metadata:
data:
  node-agent-config: '{"loadConcurrency": {"globalConfig": 6}, "podResources": {"cpuRequest": "500m", "cpuLimit": "4", "memoryRequest": "500Mi", "memoryLimit": "4Gi", "ephemeralStorageRequest": "5Gi", "ephemeralStorageLimit": "5Gi"}, "backupPVC": {"ibm-spectrum-fusion-mgmt-sc": {"storageClass": "ibm-spectrum-fusion-mgmt-sc", "readOnly": true, "spcNoRelabeling": true}, "ocs-storagecluster-cephfs": {"storageClass": "ocs-storagecluster-cephfs", "readOnly": true, "spcNoRelabeling": true}}}'
- Add the following field, highlighted with ** **:
kind: ConfigMap
apiVersion: v1
metadata:
data:
  node-agent-config: '{**"storage":{"storageClassName":"<storage class to use for PVC>","size":"<size of PVC>"}**,"loadConcurrency":{"globalConfig":6},"podResources":{"cpuRequest":"500m","cpuLimit":"4","memoryRequest":"500Mi","memoryLimit":"4Gi","ephemeralStorageRequest":"5Gi","ephemeralStorageLimit":"5Gi"},"backupPVC":{"ibm-spectrum-fusion-mgmt-sc":{"storageClass":"ibm-spectrum-fusion-mgmt-sc","readOnly":true,"spcNoRelabeling":true},"ocs-storagecluster-cephfs":{"storageClass":"ocs-storagecluster-cephfs","readOnly":true,"spcNoRelabeling":true}}}'
Where storageClassName is the StorageClass to allocate the storage from, for example ibm-spectrum-fusion-mgmt-sc, and size is the Kubernetes resource size field, for example "500Mi", "5Gi", "10Gi", "20Gi", or "50Gi".
- Save the ConfigMap.
- Restart the pods of the node-agent daemonset. Replace the ibm-backup-restore namespace with your Fusion Backup & Restore install namespace:
oc delete pod -n ibm-backup-restore -l name=node-agent
- spec.datamoverConfiguration.datamoverPodConfig.maintenanceConfig
- Kopia requires periodic maintenance jobs on the backup repository. See the Velero documentation.
For each namespace, a Kopia repository is created and maintained. If a cloud object storage repository is used, do not use automatic object expiration because it triggers repository corruption. The maintenance tasks enhance efficiency and reduce resource usage in future backup and restore jobs, and they typically last less than a minute. Their resource usage is similar to backup expiration tasks. These maintenance jobs run by default every 4 hours. The maintenance frequency can be adjusted by editing the OADP DataProtectionAgent under the field spec.configuration.velero.args.default-repo-maintain-frequency. These are the "Quick Maintenance Tasks" described by Kopia. See https://kopia.io/docs/advanced/maintenance/.
The "Full Maintenance Tasks" run during backup expiration. The only configuration item available through the DataProtectionAgent object is the pod resources used during maintenance jobs.
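A minimal sketch of that frequency setting, assuming the field path given above and the Go-style duration format that Velero uses; 4h matches the stated default:
spec:
  configuration:
    velero:
      args:
        default-repo-maintain-frequency: 4h   # interval between quick maintenance runs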
- spec.datamoverConfiguration.datamoverPodConfig.maintenanceConfig.resourceAllocations
- This field is set by using the Kubernetes resourceAllocations object format. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/. The following example also shows the default values. Missing values, such as requests.cpu, are set to unlimited. The default ephemeral storage value is 5Gi.
datamoverConfiguration:
  datamoverPodConfig:
    podResources:
      cpuLimit: '4'
      cpuRequest: '2'
      memoryLimit: 16Gi
      memoryRequest: 4Gi
      ephemeralStorageRequest: 5Gi
      ephemeralStorageLimit: 5Gi
- spec.datamoverConfiguration.maintenanceConfig.jobSettings.ttl
- This field adds a Time To Live (TTL) to Velero maintenance jobs. Velero defaults to keeping a record of the 3 previously completed maintenance jobs. IBM Fusion changes this default value to 1. When you add this field, the Velero maintenance jobs are removed within the specified number of seconds after completion.
datamoverConfiguration:
  maintenanceConfig:
    jobSettings:
      ttl: 240
    podResources:
      cpuLimit: '4'
      cpuRequest: '2'
      memoryLimit: '4Gi'
      memoryRequest: '500Mi'
      ephemeralStorageLimit: '5Gi'
      ephemeralStorageRequest: '5Gi'
If you do not set this field and you have spec.datamoverConfiguration.maintenanceConfig.storage configured, then the ttl is automatically set to 300. For more information, see https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/.
- spec.datamoverConfiguration.maintenanceConfig.storage
- This section enables you to use a PersistentVolumeClaim to act as a cache during Velero
maintenance jobs. Most usage of ephemeral-storage during maintenance jobs originates from the kopia
datamover cache. The addition of this field with a sufficiently sized PersistentVolumeClaim to
handle the required cached data eliminates the aforementioned ephemeral-storage usage.
maintenanceConfig:
  jobSettings:
    ttl: 240
  storage:
    storageClassName: "ocs-external-storagecluster-ceph-rbd"
    size: 25Gi
  podResources:
    cpuLimit: '4'
    cpuRequest: '2'
    memoryLimit: '4Gi'
    memoryRequest: '500Mi'
    ephemeralStorageLimit: '5Gi'
    ephemeralStorageRequest: '5Gi'
If storage is configured, then the following fields are required:
storageClassName
The name of the StorageClass to be used with the PersistentVolumeClaim. The StorageClass must support accessMode: ReadWriteOnce.
size
The size of the volume to be created. This field uses the existing Kubernetes resource request format, such as "5Gi" and "250Mi". A PVC of this size is created and used by each data mover pod.
During maintenance jobs, a PersistentVolumeClaim is created. It is deleted when the job is removed, based on the spec.datamoverConfiguration.maintenanceConfig.jobSettings.ttl field. If spec.datamoverConfiguration.maintenanceConfig.jobSettings.ttl is not specified and the storage configuration exists, then the ttl is set to 300 (five minutes).
- Advanced: Per Namespace Configuration
- The maintenance config values mentioned previously are applied globally. These fields can also be applied manually by editing the appropriate maintenance-config ConfigMap, which allows overrides by namespace in case of resource usage differences.
The format is the same as the DataProtectionAgent spec.datamoverConfiguration.maintenanceConfiguration, in JSON instead of YAML. The configuration is applied by using a JSON format where the key is the repository. If a value is set for a repository, the value overrides the global values. For instance, if the global values are as described in the following example:
kind: ConfigMap
apiVersion: v1
metadata:
  name: maintenance-config
  namespace: ibm-backup-restore
data:
  global: '{"jobSettings":{"ttl":300},"podResources":{"cpuRequest":"500m","cpuLimit":"4","memoryRequest":"500Mi","memoryLimit":"4Gi","ephemeralStorageLimit":"5Gi","ephemeralStorageRequest":"5Gi"}, "storage":{"storageClassName":"ocs-external-storagecluster-ceph-rbd","size":"20Gi"}}'
Additional keys are in the <namespace name>-<bsl name>-kopia format:
kind: ConfigMap
apiVersion: v1
metadata:
  name: maintenance-config
  namespace: ibm-backup-restore
data:
  global: '{"jobSettings":{"ttl":300},"podResources":{"cpuRequest":"500m","cpuLimit":"4","memoryRequest":"500Mi","memoryLimit":"4Gi","ephemeralStorageLimit":"5Gi","ephemeralStorageRequest":"5Gi"}}'
  zen-s3-kopia: '{"jobSettings":{"ttl":300},"podResources":{"cpuRequest":"500m","cpuLimit":"4","memoryRequest":"500Mi","memoryLimit":"4Gi"}, "storage":{"storageClassName":"ocs-external-storagecluster-ceph-rbd","size":"25Gi"}}'
This ensures that the maintenance jobs related to data from the zen namespace connected to the S3 BSL use a 25Gi PersistentVolumeClaim cache and do not set resource requests or limits on ephemeral storage.
- node-agent-config
- To configure the data mover pods to use PVCs as a cache in place of ephemeral storage, add the following field into the node-agent-config ConfigMap:
"storage":{"storageClassName":"ocs-external-storagecluster-ceph-rbd","size":"25Gi"}
Example value of node-agent-config:
{"loadConcurrency": {"globalConfig": 5}, "podResources": {"cpuRequest": "2", "cpuLimit": "4", "memoryRequest": "4Gi", "memoryLimit": "16Gi", "ephemeralStorageRequest": "5Gi", "ephemeralStorageLimit": "5Gi"}, "storage":{"storageClassName":"ocs-external-storagecluster-ceph-rbd","size":"25Gi"}, "backupPVC": {"ocs-storagecluster-cephfs": {"storageClass": "ocs-storagecluster-cephfs", "readOnly": true, "spcNoRelabeling": true}}}
Latest permissible start time for the backup process
You can set the latest permissible start time for a scheduled backup. This setting is useful whenever multiple backups use the same policy or schedule and the agent gets overloaded frequently with jobs as they all start at the same time. Without the windowEndTime setting, jobs have, by default, up to one hour for the agent to start processing before they are considered hung and subsequently cancelled. You can manually spawn additional agent replicas to handle the spike in job activity and process the backlog of jobs sooner, but these replicas remain idle for the remainder of the day. With the windowEndTime option, the job can remain queued up for processing by the agent for a custom period of time before getting cancelled. This allows for a single agent replica to be fully utilized for a longer period of time throughout the day without the need to manually create multiple policies or schedules.
Set the windowEndTime field in the BackupPolicy CR in the 24-hour "HH:MM" format.
spec:
  backupStorageLocation: kn-aws
  provider: isf-backup-restore
  retention:
    number: 5
    unit: days
  schedule:
    cron: '45 14 * * *'
    timezone: America/Los_Angeles
  windowEndTime: '17:45'
For BackupPolicy CRs that have a specific start time (not just a repeat interval), if windowEndTime is not explicitly specified, a default end time is set to 4 hours after the scheduled start time. Existing BackupPolicy CRs maintain their original behavior without windowEndTime, resulting in a one-hour limit before they are considered hung and subsequently cancelled. You can edit the existing Policy CRs to add the windowEndTime option.
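A minimal command sketch for adding windowEndTime to an existing policy, assuming backuppolicy is the resource name exposed by the CRD and that policies live in the Fusion namespace; the policy name and time are illustrative:
# illustrative: set a 17:45 latest permissible start time on an existing policy
oc patch backuppolicy awsdaily -n ibm-spectrum-fusion-ns --type merge -p '{"spec":{"windowEndTime":"17:45"}}'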