Backup & restore hub performance and scaling

Use this topic to understand how many spoke clusters can be connected to a single hub and how far the hub can be scaled. It outlines the tested capacities and the scalability considerations for efficient backup and restore operations.

The Backup & restore hub service is designed to handle large numbers of concurrent backup and restore jobs across clusters. The system has been tested successfully with up to 1000 concurrent jobs. This figure is a reference point and not a limit; the hub can scale further with appropriate resource allocation.

Sizing blueprints for varying workloads

The following table lists the recommended configuration of each required service for different numbers of concurrent jobs. As the number of concurrent jobs increases, scale out (increase the replicas) or scale up (increase the resource limits) the required services as outlined:

Table 1. Sizing blueprints

Clusters | Concurrent Jobs | Applicationsvc Pods | Backup-service Pods | Mongodb CPU and Memory Limit | guardian-bridge Memory Required and Limit | guardian-bridge CPU Required and Limit | guardian-bridge JVM Xms and Xmx
10 | 100 | 1 | 1 | Default | 1 GiB and 2 GiB (default) | 0.5 and 1 (default) | 1 G and 1 G
20 | 200 | 1 | 1 | Default | 2 GiB and 4 GiB | 0.5 and 1 | 2 G and 2 G
25 | 250 | 1 | 1 | Default | 2 GiB and 4 GiB | 0.5 and 1 | 2 G and 2 G
30 | 300 | 1 | 1 | 1 CPU and 1 GiB | 3 GiB and 6 GiB | 1 and 2 | 4 G and 4 G
40 | 400 | 1 | 2 | 1 CPU and 1 GiB | 4 GiB and 7 GiB | 1 and 2 | 5 G and 5 G
50 | 500 | 1 | 3 | 2 CPU and 2 GiB | 5 GiB and 8 GiB | 1 and 3 | 6 G and 6 G
60 | 600 | 1 | 4 | 2 CPU and 2 GiB | 6 GiB and 10 GiB | 1 and 3 | 8 G and 8 G
70 | 700 | 1 | 5 | 3 CPU and 3 GiB | 8 GiB and 12 GiB | 2 and 4 | 9 G and 9 G
80 | 800 | 2 | 6 | 3 CPU and 3 GiB | 8 GiB and 12 GiB | 2 and 4 | 10 G and 10 G
90 | 900 | 2 | 7 | 4 CPU and 4 GiB | 10 GiB and 15 GiB | 3 and 5 | 11 G and 11 G
100 | 1000 | 2 | 8 | 4 CPU and 4 GiB | 10 GiB and 15 GiB | 3 and 5 | 12 G and 12 G
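
As an illustration, the replica columns of Table 1 can be expressed as a small lookup. The function name recommend_replicas is hypothetical, and rounding a job count up to the next table row is an assumption made for this sketch, not documented behavior.

```shell
#!/bin/sh
# Hypothetical helper: print the Table 1 replica counts for a given number
# of concurrent jobs. Assumption: a job count that falls between two rows
# uses the next higher row.
recommend_replicas() {
  n=$1
  if   [ "$n" -le 300 ];  then app=1; backup=1
  elif [ "$n" -le 400 ];  then app=1; backup=2
  elif [ "$n" -le 500 ];  then app=1; backup=3
  elif [ "$n" -le 600 ];  then app=1; backup=4
  elif [ "$n" -le 700 ];  then app=1; backup=5
  elif [ "$n" -le 800 ];  then app=2; backup=6
  elif [ "$n" -le 900 ];  then app=2; backup=7
  elif [ "$n" -le 1000 ]; then app=2; backup=8
  else
    # Table 1 stops at 1000 concurrent jobs; nothing to look up beyond it.
    echo "no blueprint row for $n concurrent jobs" >&2
    return 1
  fi
  echo "applicationsvc=$app backup-service=$backup"
}

recommend_replicas 500
```

The lookup covers only the two services that Table 1 scales out; the mongodb and guardian-bridge columns are resource limits and still need to be read from the table.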

Services that are not listed in the blueprint table do not need to be scaled. They can handle up to 1000 concurrent jobs without any changes.

Services such as mongodb and the operator-control-manager pods in the ibm-backup-restore namespace cannot be scaled out (their replica count cannot be increased). For these services, increase the CPU and memory resources instead so that they continue to perform well as the number of concurrent jobs increases.

Scaling services

Use the following command to scale out a deployment to the desired number of replicas:

oc scale deployment <deployment-name> --replicas=<desired-replicas> -n <backup-restore-namespace>

Or, for a stateful set:

oc scale sts <statefulset-name> --replicas=<desired-replicas> -n <backup-restore-namespace>

Scale backup-service pods:

oc scale deployment backup-service --replicas=<desired-replicas> -n <backup-restore-namespace>

Scale applicationsvc pods:

oc scale deployment applicationsvc --replicas=<desired-replicas> -n <backup-restore-namespace>
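
As a worked example, the following sketch prints the scale-out commands for the 800 concurrent jobs row of Table 1 (2 applicationsvc pods and 6 backup-service pods), each followed by an oc rollout status command that waits for the new pods. The function name scale_cmds, the ibm-backup-restore default for NAMESPACE, and the 300 second timeout are assumptions for illustration. The script only prints commands so that they can be reviewed first.

```shell
#!/bin/sh
# Assumption: the hub runs in ibm-backup-restore unless NAMESPACE is set.
NAMESPACE="${NAMESPACE:-ibm-backup-restore}"

# Hypothetical helper: print (not run) the commands that scale one
# deployment and then wait for its rollout to complete.
scale_cmds() {
  deployment=$1
  replicas=$2
  echo "oc scale deployment $deployment --replicas=$replicas -n $NAMESPACE"
  echo "oc rollout status deployment/$deployment -n $NAMESPACE --timeout=300s"
}

# Table 1 row for 800 concurrent jobs.
scale_cmds applicationsvc 2
scale_cmds backup-service 6
```

After checking the printed commands, one option is to pipe the script output to sh in a session that is already logged in with oc.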

Use the following command to scale up a deployment by raising its resource limits:

oc set resources deployment <deployment-name> --limits=cpu=<desired-cpu-limit>,memory=<desired-memory-limit> -n <backup-restore-namespace>

Or, for a stateful set:

oc set resources sts <statefulset-name> --limits=cpu=<desired-cpu-limit>,memory=<desired-memory-limit> -n <backup-restore-namespace>

Scale up mongodb resource limits:

oc set resources sts mongodb --limits=cpu=<desired-cpu-limit>,memory=<desired-memory-limit> --containers=mongodb -n <backup-restore-namespace>
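
For instance, the 500 concurrent jobs row of Table 1 lists a mongodb limit of 2 CPU and 2 GiB. The sketch below fills those values into the mongodb command and prints it together with a read-back command. The function names, the ibm-backup-restore default for NAMESPACE, and writing 2 GiB as the Kubernetes quantity 2Gi are assumptions for illustration.

```shell
#!/bin/sh
# Assumption: the hub runs in ibm-backup-restore unless NAMESPACE is set.
NAMESPACE="${NAMESPACE:-ibm-backup-restore}"

# Hypothetical helper: print (not run) the command that sets the mongodb
# container limits.
mongodb_limits_cmd() {
  cpu=$1      # CPU limit in cores, for example 2
  memory=$2   # memory limit as a Kubernetes quantity, for example 2Gi
  echo "oc set resources sts mongodb --limits=cpu=$cpu,memory=$memory --containers=mongodb -n $NAMESPACE"
}

# Hypothetical helper: print a command that reads the limits back from
# the stateful set so the change can be confirmed.
mongodb_verify_cmd() {
  echo "oc get sts mongodb -n $NAMESPACE -o jsonpath='{.spec.template.spec.containers[*].resources.limits}'"
}

# Table 1 row for 500 concurrent jobs: 2 CPU and 2 GiB.
mongodb_limits_cmd 2 2Gi
mongodb_verify_cmd
```

Changing the resources of a stateful set normally restarts its pods, so it may be worth applying the change when no backup or restore jobs are running.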

Scale up guardian-bridge resource requests, resource limits, and JVM heap size:

oc patch kafkabridge guardian-bridge \
  -n ibm-backup-restore \
  --type=merge \
  -p '{
    "spec": {
      "resources": {
        "limits": {
          "cpu": "<cpu-limit>",
          "memory": "<memory-limit>"
        },
        "requests": {
          "cpu": "<cpu-request>",
          "memory": "<memory-request>"
        }
      },
      "jvmOptions": {
        "-Xms": "<min-heap>",
        "-Xmx": "<max-heap>"
      }
    }
  }'
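
To show how a Table 1 row maps onto the patch, the following sketch builds the merge patch body for the 300 concurrent jobs row (memory 3 GiB and 6 GiB, CPU 1 and 2, JVM heap 4 G). The function name bridge_patch is hypothetical, and writing the table values as 3Gi, 6Gi, and 4G is an assumption about the accepted quantity formats.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON merge patch for guardian-bridge.
# Arguments: cpu request, cpu limit, memory request, memory limit, heap.
bridge_patch() {
  cpu_request=$1
  cpu_limit=$2
  mem_request=$3
  mem_limit=$4
  heap=$5   # used for both -Xms and -Xmx, as Table 1 lists equal values
  printf '{"spec":{"resources":{"limits":{"cpu":"%s","memory":"%s"},"requests":{"cpu":"%s","memory":"%s"}},"jvmOptions":{"-Xms":"%s","-Xmx":"%s"}}}\n' \
    "$cpu_limit" "$mem_limit" "$cpu_request" "$mem_request" "$heap" "$heap"
}

# Table 1 row for 300 concurrent jobs.
bridge_patch 1 2 3Gi 6Gi 4G
```

The printed JSON can then be passed to the patch command, for example with -p "$(bridge_patch 1 2 3Gi 6Gi 4G)" in place of the inline JSON.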