Autoscaling

You can configure Decision Center, Decision Runner, and Decision Server Runtime deployments, both default and dedicated, to automatically scale horizontally when the workload demands it.

About this task

When autoscaling is enabled, a new Kubernetes resource, called HorizontalPodAutoscaler, is created. This resource controls the scale of a deployment and its replica set by adding or removing pods according to the volume of the workload. For more information, see the Horizontal Pod Autoscaling documentation.

The decision to scale a deployment up or down is based on comparing the average CPU usage and the average memory consumption across all the pods of the deployment with the configurable targets for both metrics.

Additional parameters:

- Tolerance around the configurable targets (10% by default)
- Stabilization window: the minimum duration during which the need to scale up or down must persist before the autoscaler acts
- Rate at which pods are created or deleted

For more information, see Configurable scaling behavior.
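
As a rough illustration of the comparison described above, the following Python sketch approximates the replica calculation given in the Kubernetes Horizontal Pod Autoscaling documentation: for each metric, the desired replica count is the current count multiplied by the ratio of current to target utilization, rounded up, and no change is proposed while the ratio stays within the tolerance. The function name and its arguments are hypothetical and are not part of any product API. The sketch ignores the stabilization window and the pod creation and deletion rates.

```python
import math

def desired_replicas(current_replicas, metrics, min_replicas, max_replicas, tolerance=0.10):
    """Illustrative only: approximate the HPA replica calculation.

    metrics is a list of (current_utilization, target_utilization) pairs,
    both expressed as a percentage of the resource request.
    """
    proposals = []
    for current, target in metrics:
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            # Within tolerance of the target: this metric asks for no change.
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # With several metrics, the largest proposed replica count wins.
    desired = max(proposals)
    # The result is always kept between minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(2, [(525, 350)], 1, 4))  # 3
print(desired_replicas(2, [(360, 350)], 1, 4))  # 2 (within the 10% tolerance)
```

In the first call, average CPU usage of 525% against a 350% target gives a ratio of 1.5, so two pods become three. In the second call, the ratio is about 1.03, which is inside the tolerance, so the replica count does not change.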

Horizontal pod autoscaling (HPA) is best suited for stateless applications that run short-lived transactions.

Warning: When HPA is enabled on Decision Center, users might be disconnected and lose their work when pods are scaled down. Likewise, test or simulation runs on Decision Runner might be interrupted. To mitigate the risk, the default value for the stabilization window when scaling down is set to a high value for Decision Center and Decision Runner (3,000 seconds = 50 minutes), and the default value for minReplicas is set to 2 for Decision Center.

The following table lists and defines the autoscaling parameters:

Table 1. Autoscaling parameters
Parameter name	Description	Default value
enabled	Specify whether to enable autoscaling for the component	false
minReplicas	The minimum number of replicas	2
maxReplicas	The maximum number of replicas	3
targetAverageCpuUtilization	The target average utilization of CPU	300%
targetAverageMemoryUtilization	The target average utilization of memory	200% in Decision Center; 600% in Decision Runner and Decision Server Runtime
behavior	Additional parameters, including tolerance and stabilization window, as defined in Configurable scaling behavior	Depends on the component. For Decision Server Runtime: `behavior: scaleDown: stabilizationWindowSeconds: 300 scaleUp: stabilizationWindowSeconds: 30`

The target average utilization is expressed as a percentage of the request for CPU and memory (resources.requests.cpu and resources.requests.memory).

The default values are greater than 100% because CPU and memory utilization can exceed the request, up to the specified limits (resources.limits.cpu and resources.limits.memory).

In Decision Server Runtime, the default value for resources.requests.memory is 512 MB, and the default value for resources.limits.memory is 4 GB. The default value of 600% for targetAverageMemoryUtilization amounts to 75% of the maximum amount of memory for a pod (600% of 512 MB = 3 GB, which is 75% of 4 GB).
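
The arithmetic above can be reproduced for other request, limit, and target values with a short Python sketch. The helper name is hypothetical, and the sketch assumes that 4 GB is counted as 4096 MB.

```python
def target_threshold(request_mb, limit_mb, target_percent):
    """Convert a utilization target (percentage of the request) into
    an absolute threshold and into a fraction of the limit."""
    threshold_mb = request_mb * target_percent / 100
    return threshold_mb, threshold_mb / limit_mb

# Decision Server Runtime defaults: 512 MB request, 4 GB limit, 600% target.
threshold_mb, fraction_of_limit = target_threshold(512, 4096, 600)
print(threshold_mb, fraction_of_limit)  # 3072.0 0.75
```

A target that yields a fraction above 1.0 can never be reached, because utilization cannot exceed the limit.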

Procedure

You can configure HPA for each component independently.

For example, for the Decision Server Runtime component, set the decisionServerRuntime.autoscaling parameter in a new dsr-autoscaling.yaml file:
```
decisionServerRuntime:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 4
    targetAverageCpuUtilization: 350
    targetAverageMemoryUtilization: ''
  dedicated:
  - name: loan-validation
    paths:
    - /loan-validation/
    autoscaling:
      enabled: true
      maxReplicas: 2
      targetAverageMemoryUtilization: 700
```
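With the parameters above:
- The default runtime pods are scaled automatically from 1 to 4 pods, based on the CPU utilization only (targetAverageMemoryUtilization is set to an empty string), with a target of 350%.
- The dedicated runtime pods (for the loan-validation rulesets) are scaled between 1 and 2 pods, with a CPU utilization target of 350% and a memory utilization target of 700%. The minReplicas value and the CPU target are not set for the dedicated runtime, so the values from the default runtime apply.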
Note: If autoscaling is enabled for one of the components, the value set for the parameter replicaCount is ignored.

Modify the existing Helm release by saving its current values and then running the helm upgrade command with both values files:

$ helm get values RELEASE_NAME > get-values.yaml
$ helm upgrade RELEASE_NAME -f get-values.yaml -f dsr-autoscaling.yaml charts/ibm-odm-prod-CHART_VERSION.tgz

Results

After the deployment is scaled, check the computed usage of the component by running the following command.

$ kubectl get hpa

Example output with RELEASE_NAME = hpa:


NAME                                            REFERENCE                                                  TARGETS                            MINPODS   MAXPODS   REPLICAS   AGE
hpa-odm-decisionserverruntime                   Deployment/hpa-odm-decisionserverruntime                   cpu:  7%/350%                      1         4         1          2m1s
hpa-odm-decisionserverruntime-loan-validation   Deployment/hpa-odm-decisionserverruntime-loan-validation   cpu: 10%/350%, memory: 186%/700%   1         2         1          2m1s