Scaling

Scaling is the process of adding or removing computing resources to meet the resource demands of a workload so that availability and performance are maintained.

As the traffic and the workload for an application increase, scaling the application enables it to keep up with user demand. Running multiple instances of an application distributes the traffic across all of them. Also, when multiple instances of an application are running, you can perform rolling updates without any downtime.

The following sections provide information about the types of scaling that are used within FTM on Red Hat® OpenShift®.

Manual scaling

In manual scaling, you change the minimum number of replicas for a particular pod yourself. You can change the number of replicas before you install the operator and also after the instance is deployed and running.

For example, suppose that OAC is running with a minimum replica count of 1. Based on the current performance, you decide to increase the minimum number of replicas to 3. For more information about how to increase the number of replicas, see Modifying the parameters of the deployed instance of your FTM offering.

Auto-scaling

Auto-scaling is a feature in Red Hat OpenShift where the applications that are deployed can scale the amount of resources that they use based on certain specifications. In Red Hat OpenShift applications, auto-scaling is also known as horizontal pod auto-scaling.

Auto-scaling is based on CPU usage targets in the following FTM products:
  • FTM
  • Check

The FTM operator uses CPU usage to dynamically increase or decrease the replica count. The threshold value is 80, which means that if a pod starts to use more than 80% of its CPU resource request, a new pod is started dynamically. After the CPU usage falls under 80%, the newly created pods are dynamically deleted until the pod count reaches the minimum replica count.

For example, suppose that the OAC pod is initially running with a CPU consumption of 4%. The CPU target is set to 80%. This target value is a percentage of the CPU resource request. The CPU resource request for OAC is 1 CPU. Therefore, 80% is 800m of CPU for OAC, which is the target for the Horizontal Pod Autoscaler (HPA).
Targets=4%/80%   Minpods=1   Maxpods=3   Replicas=1
The CPU load then increases to 99%, which exceeds the target value of 80%, so the HPA automatically increases the number of OAC pods.
Targets=99%/80%   Minpods=1   Maxpods=3   Replicas=2

Two OAC pods are now running. Based on CPU usage, the HPA for OAC scaled the pod out to two replicas.

The CPU load then decreases. The HPA detects that the CPU load is less than 80% and scales the pods back down to the original number of replicas.
Targets=1%/80%   Minpods=1   Maxpods=3   Replicas=1
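
The replica counts in this example are consistent with the scaling rule that the Kubernetes Horizontal Pod Autoscaler documentation describes: the desired replica count is the current replica count multiplied by the ratio of current usage to target usage, rounded up, and then limited to the minimum and maximum pod counts. The following Python sketch reproduces that arithmetic for the OAC values. The function names are illustrative only and are not part of FTM, and the sketch deliberately ignores the tolerance and stabilization behavior of a real HPA.

```python
import math


def target_millicores(request_millicores: int, target_percent: int) -> int:
    """CPU usage, in millicores, that corresponds to the HPA target."""
    return request_millicores * target_percent // 100


def desired_replicas(current_replicas: int, current_percent: float,
                     target_percent: float, min_pods: int, max_pods: int) -> int:
    """Simplified HPA rule: scale in proportion to usage/target, then clamp."""
    raw = math.ceil(current_replicas * current_percent / target_percent)
    return max(min_pods, min(max_pods, raw))


print(target_millicores(1000, 80))        # 1 CPU request at an 80% target -> 800
print(desired_replicas(1, 4, 80, 1, 3))   # initial load of 4% -> 1
print(desired_replicas(1, 99, 80, 1, 3))  # load rises to 99% -> 2
print(desired_replicas(2, 1, 80, 1, 3))   # load falls to 1% -> 1
```

Because the result is clamped, a sustained load far above the target never produces more than Maxpods replicas, and an idle workload never drops below Minpods.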

Configuring auto-scaling

By default, each FTM pod is configured with a maximum replica count of 1. Therefore, auto-scaling is not configured by default. Auto-scaling is configured by updating the custom resource for your FTM offering.

The auto-scaling parameters are in the spec.resources[].containers section of the custom resource. To meet your requirements, you can configure the number of replicas and the maximum number of replicas that can be created for the resource. The following example YAML snippet shows how to set the maximum replica count for Approvals to 3.
        - maxReplicas: 3
          name: approvals-engine
          replica: 1
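
As a quick sanity check on these settings, the following Python sketch tests whether an entry leaves any room for auto-scaling. It rests on assumptions that are not confirmed by the product: that replica acts as the minimum replica count, that a missing maxReplicas is treated as the default of 1, and that auto-scaling can take effect only when maxReplicas is greater than replica. The helper can_autoscale and the entry named example-pod are hypothetical.

```python
def can_autoscale(entry: dict) -> bool:
    """True when maxReplicas leaves room above the configured replica count."""
    # Assumption: a missing maxReplicas or replica means the default of 1.
    return entry.get("maxReplicas", 1) > entry.get("replica", 1)


# Mirrors the YAML snippet above as plain Python data.
approvals = {"name": "approvals-engine", "replica": 1, "maxReplicas": 3}
# Hypothetical entry that is left at the default maximum replica count of 1.
default_entry = {"name": "example-pod", "replica": 1}

for entry in (approvals, default_entry):
    print(entry["name"], can_autoscale(entry))
```

With these values, the approvals-engine entry can scale out to three replicas, while the entry that keeps the default maximum stays at a single replica.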