Autoscaling
You can configure Decision Center, Decision Runner, and Decision Server Runtime deployments, both default and dedicated, to automatically scale horizontally when the workload demands it.
About this task
When autoscaling is enabled, a new Kubernetes resource, called
HorizontalPodAutoscaler, is created. This resource controls the scale of a
deployment and its replica set by adding or removing pods according to the volume of the workload. For
more information, see the Horizontal Pod Autoscaling documentation.
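To make the shape of that resource concrete, the following sketch builds a HorizontalPodAutoscaler manifest for the Kubernetes `autoscaling/v2` API and prints it as JSON. It is an illustration only: the resource name is borrowed from the sample output at the end of this topic, the numeric values are the Decision Server Runtime defaults described in this topic, and the helper `build_hpa` is hypothetical. The resource that is actually generated for your deployment might differ.

```python
import json


def build_hpa(name, min_replicas=2, max_replicas=3, cpu_target=300, memory_target=600):
    """Return an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict.

    Hypothetical helper for illustration; the defaults mirror the values
    documented in this topic for Decision Server Runtime.
    """

    def resource_metric(resource, utilization):
        # Utilization targets are percentages of the pod resource request.
        return {
            "type": "Resource",
            "resource": {
                "name": resource,
                "target": {"type": "Utilization", "averageUtilization": utilization},
            },
        }

    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The autoscaler adjusts the replica count of this deployment.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                resource_metric("cpu", cpu_target),
                resource_metric("memory", memory_target),
            ],
        },
    }


manifest = build_hpa("hpa-odm-decisionserverruntime")
print(json.dumps(manifest, indent=2))
```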
The decision to scale a deployment up or down is based on a comparison of the average CPU usage and average memory consumption of all the pods in the deployment with the configurable targets for both metrics. The following settings also affect the decision:
- Tolerance around the configurable targets (10% by default)
- Minimum duration (stabilization window) during which the need to scale up or down must persist before the autoscaler acts
- Rate of creation or deletion of pods
For more information, see Configurable scaling behavior.
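The comparison against a target can be sketched with the replica formula from the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of current to target utilization, rounded up, and no change is made while the ratio stays within the tolerance. The function below is a simplified model under those assumptions. The name `desired_replicas` is hypothetical, and the stabilization window and the rate policies are not modeled.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=3, tolerance=0.10):
    """Approximate the autoscaler's replica calculation for a single metric."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band, the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * current_utilization / target_utilization)
    # The result is always kept between minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, proposed))


print(desired_replicas(2, 450, 300))  # 450% against a 300% target: scale up to 3
print(desired_replicas(2, 320, 300))  # within the 10% tolerance: stay at 2
print(desired_replicas(3, 100, 300))  # low usage: scale down, but not below 2
```

When both CPU and memory targets are set, Kubernetes evaluates each metric separately and uses the largest resulting replica count.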
| Parameter name | Description | Default value |
|---|---|---|
| enabled | Specifies whether to enable autoscaling for the component | false |
| minReplicas | The minimum number of replicas | 2 |
| maxReplicas | The maximum number of replicas | 3 |
| targetAverageCpuUtilization | The target average utilization of CPU | 300% |
| targetAverageMemoryUtilization | The target average utilization of memory | 600% for Decision Server Runtime |
| behavior | Additional parameters, including tolerance and stabilization window, as defined in Configurable scaling behavior | Depends on the component. For Decision Server Runtime: |
The target average utilization is expressed as a percentage of the request for CPU and memory
(resources.requests.cpu and resources.requests.memory).
The default values are greater than 100% because CPU and memory utilization can exceed
the request, up to the specified limits (resources.limits.cpu and
resources.limits.memory).
In Decision Server Runtime, the
default value for resources.requests.memory is 512 MB, and the default value for
resources.limits.memory is 4 GB. The default value of 600% for
targetAverageMemoryUtilization amounts to 75% of the maximum amount of memory for a
pod (600% of 512 MB = 3 GB, which is 75% of 4 GB).
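The arithmetic above can be reproduced for other request and limit values with a short calculation. This sketch assumes binary units (1 GB = 1024 MB), which is what the figures in the text imply, and the function name `utilization_threshold` is hypothetical.

```python
def utilization_threshold(request_mib, limit_mib, target_percent):
    """Convert a utilization target into an absolute amount and a share of the limit."""
    # The target is a percentage of the request, not of the limit.
    threshold_mib = request_mib * target_percent / 100
    share_of_limit = threshold_mib / limit_mib * 100
    return threshold_mib, share_of_limit


# Decision Server Runtime defaults from the text: 512 MB request, 4 GB limit, 600% target.
threshold, share = utilization_threshold(512, 4096, 600)
print(f"{threshold:.0f} MiB, {share:.0f}% of the limit")  # 3072 MiB, 75% of the limit
```

If you change resources.requests.memory or resources.limits.memory, rerun the calculation to confirm that the target still corresponds to an amount below the limit.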
Procedure
You can configure horizontal pod autoscaling (HPA) for each component independently.
Results
After the deployment is scaled, check the computed usage of the component by running the following command.
    $ kubectl get hpa
Example output with RELEASE_NAME = hpa:
    NAME                                            REFERENCE                                                  TARGETS                            MINPODS   MAXPODS   REPLICAS   AGE
    hpa-odm-decisionserverruntime                   Deployment/hpa-odm-decisionserverruntime                   cpu: 7%/350%                       1         4         1          2m1s
    hpa-odm-decisionserverruntime-loan-validation   Deployment/hpa-odm-decisionserverruntime-loan-validation   cpu: 10%/350%, memory: 186%/700%   1         2         1          2m1s
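In the TARGETS column, each entry has the form current/target, both expressed as a percentage of the resource request. If you want to process this column in a script, a sketch such as the following might help. It assumes the text format shown in the example output, and the function name `parse_targets` is hypothetical.

```python
import re


def parse_targets(targets):
    """Parse a TARGETS cell such as 'cpu: 10%/350%, memory: 186%/700%'.

    Returns a dict that maps each metric name to (current, target) percentages.
    Entries without a numeric current value (for example, <unknown>) are skipped.
    """
    result = {}
    for name, current, target in re.findall(r"(\w+):\s*(\d+)%/(\d+)%", targets):
        result[name] = (int(current), int(target))
    return result


usage = parse_targets("cpu: 10%/350%, memory: 186%/700%")
for metric, (current, target) in usage.items():
    print(f"{metric}: current {current}%, target {target}%")
```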