Backup & restore hub performance and scaling

Use this topic to understand how many spoke clusters can be connected to a single hub and how far the hub can scale. It outlines the capacities and the scalability considerations that ensure efficient backup and restore operations.

The Backup & restore hub service is designed to handle large numbers of concurrent backup and restore jobs across clusters. The system has been tested successfully with up to 1000 concurrent jobs. This figure is a reference point, not a limit; the hub can scale further with appropriate resource allocation.

Sizing blueprints for varying workloads

The following table provides the recommended configurations for each required service when handling different numbers of concurrent jobs. As the number of concurrent jobs increases, you can scale out (increase the replicas) or scale up (increase the resource limits) for the required services as outlined:

Table 1. Sizing blueprints
Clusters | Concurrent jobs | applicationsvc pods | backup-service pods | mongodb CPU and memory limit
10       | 100             | 1                   | 1                   | Default
20       | 200             | 1                   | 1                   | Default
25       | 250             | 1                   | 1                   | Default
30       | 300             | 1                   | 1                   | 1 CPU and 1 GiB
40       | 400             | 1                   | 2                   | 1 CPU and 1 GiB
50       | 500             | 1                   | 3                   | 2 CPU and 2 GiB
60       | 600             | 1                   | 4                   | 2 CPU and 2 GiB
70       | 700             | 1                   | 5                   | 3 CPU and 3 GiB
80       | 800             | 2                   | 6                   | 3 CPU and 3 GiB
90       | 900             | 2                   | 7                   | 4 CPU and 4 GiB
100      | 1000            | 2                   | 8                   | 4 CPU and 4 GiB

Services not listed in the blueprint table do not need to be scaled. They can handle the load of up to 1000 concurrent jobs without any changes.

Services such as mongodb and any operator-control-manager pods in the ibm-backup-restore namespace cannot be scaled out (their replica count cannot be increased). For these services, increase the CPU and memory limits instead so that they continue to perform well as the number of concurrent jobs increases.

The guardian-bridge service automatically scales to handle the additional load.
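
The sizing blueprints in Table 1 can also be expressed as a small lookup helper, which may be convenient in automation. The following is a minimal sketch, not part of the product: the function name blueprint_for_jobs is hypothetical, and rounding a job count up to the next row of the table is an assumption made here to stay on the conservative side.

```shell
#!/bin/sh
# Sketch: return the Table 1 blueprint row that covers a given number of
# concurrent jobs. Job counts between rows are rounded up to the next row
# (an assumption, not a documented rule).
# Output fields: <applicationsvc pods> <backup-service pods> <mongodb limit>
# where <mongodb limit> is "default" or N, meaning N CPU and N GiB.
blueprint_for_jobs() {
  jobs="$1"
  if   [ "$jobs" -le 250 ];  then echo "1 1 default"
  elif [ "$jobs" -le 300 ];  then echo "1 1 1"
  elif [ "$jobs" -le 400 ];  then echo "1 2 1"
  elif [ "$jobs" -le 500 ];  then echo "1 3 2"
  elif [ "$jobs" -le 600 ];  then echo "1 4 2"
  elif [ "$jobs" -le 700 ];  then echo "1 5 3"
  elif [ "$jobs" -le 800 ];  then echo "2 6 3"
  elif [ "$jobs" -le 900 ];  then echo "2 7 4"
  elif [ "$jobs" -le 1000 ]; then echo "2 8 4"
  else
    # Table 1 stops at 1000 concurrent jobs; nothing to look up beyond that.
    echo "no blueprint in Table 1 for $jobs concurrent jobs" >&2
    return 1
  fi
}
```

For example, blueprint_for_jobs 450 prints the values of the 500-job row: one applicationsvc pod, three backup-service pods, and a mongodb limit of 2 CPU and 2 GiB.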

Scaling services

Use the following command to scale out a required service to the desired number of replicas:
oc scale deployment <deployment-name> --replicas=<desired-replicas> -n <backup-restore-namespace>

Or

oc scale sts <statefulset-name> --replicas=<desired-replicas> -n <backup-restore-namespace>
Scale backup-service pods:
oc scale deployment backup-service --replicas=<desired-replicas> -n <backup-restore-namespace>
Scale applicationsvc pods:
oc scale deployment applicationsvc --replicas=<desired-replicas> -n <backup-restore-namespace>
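
After scaling out, it can be useful to confirm that the new pods are ready. The following sketch is one possible check, not a documented procedure: the helper name replica_status is hypothetical, and the namespace defaults to ibm-backup-restore on the assumption that the services run there.

```shell
#!/bin/sh
# Assumption: the services run in the ibm-backup-restore namespace.
# Override by exporting NS before running.
NS="${NS:-ibm-backup-restore}"

# Print "<ready>/<desired>" replicas for a deployment.
replica_status() {
  deployment="$1"
  ready=$(oc get deployment "$deployment" -n "$NS" \
    -o jsonpath='{.status.readyReplicas}')
  desired=$(oc get deployment "$deployment" -n "$NS" \
    -o jsonpath='{.spec.replicas}')
  # readyReplicas is absent from the status while no pod is ready yet,
  # so fall back to 0 in that case.
  echo "${ready:-0}/${desired}"
}
```

For example, replica_status backup-service prints 3/3 once all three replicas of a three-replica deployment are ready. To block until a rollout finishes instead of polling, oc rollout status deployment/backup-service -n <backup-restore-namespace> can be used.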
Use the following command to scale up the resource limits for the required services:
oc set resources deployment <deployment-name> --limits=cpu=<desired-cpu-limit>,memory=<desired-memory-limit> -n <backup-restore-namespace>
Or
oc set resources sts <statefulset-name> --limits=cpu=<desired-cpu-limit>,memory=<desired-memory-limit> -n <backup-restore-namespace>
Scale up mongodb resource limits:
oc set resources sts mongodb --limits=cpu=<desired-cpu-limit>,memory=<desired-memory-limit> --containers=mongodb -n <backup-restore-namespace>
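
As a worked example, the commands above can be filled in with the values from the 500-job row of Table 1 (three backup-service pods, mongodb limited to 2 CPU and 2 GiB). This is a sketch under stated assumptions: the namespace defaults to ibm-backup-restore, the run helper and the APPLY variable are hypothetical conveniences that only print the commands unless APPLY=1 is set, and 2 GiB is written as 2Gi, the Kubernetes quantity notation.

```shell
#!/bin/sh
# Assumption: the services run in the ibm-backup-restore namespace.
NS="${NS:-ibm-backup-restore}"

# Print a command, and execute it only when APPLY=1 (dry run by default).
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Apply the Table 1 blueprint for 500 concurrent jobs.
# applicationsvc stays at 1 pod in this row, so it is not touched.
apply_500_job_blueprint() {
  run oc scale deployment backup-service --replicas=3 -n "$NS"
  run oc set resources sts mongodb --limits=cpu=2,memory=2Gi \
    --containers=mongodb -n "$NS"
}
```

Calling apply_500_job_blueprint with APPLY unset prints the two oc commands for review; calling it with APPLY=1 runs them against the cluster that oc is currently logged in to.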