Calculating loads for applications can be time-consuming and challenging, because load varies with regional business hours and significant holidays. When preliminary testing was done on IBM Cloud Ingress ALBs to determine their baseline load, they were shown to handle approximately 20,000 connections per second. However, if your applications received traffic that exceeded this limit, you had to manually increase the number of ALB pods in the cluster. This manual solution is neither convenient nor able to handle varying load sizes, because it requires you to pick a static replica count, which might waste resources when the cluster is underutilized or cause request timeouts during peak usage.

On 28 July 2023, we introduced managed autoscaling for IBM Cloud Kubernetes Service Ingress ALBs. This feature allows you to dynamically and automatically adjust the number of ALB pods based on the actual load. Therefore, you can minimize your resource waste and preserve your ability to serve increased traffic demand.

What is horizontal pod autoscaling?

Horizontal pod autoscaling (HPA) is a Kubernetes concept. In simple terms, HPA horizontally scales an application in response to changes in load. In practice, pods are launched or terminated automatically.

The load can be determined by many factors. Most of the time, CPU utilization is a good measure, but it is possible to configure other factors, as well. When configured, Kubernetes tries to keep the average load of all pods within a specified range by adjusting the replica count of the deployment. This helps by scaling up the replica count in more demanding times and scaling back in less-active hours.
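
As a rough illustration of how the replica count is adjusted, the Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas x currentMetricValue / desiredMetricValue), bounded by the configured minimum and maximum. The sketch below implements only that formula. The function name desired_replicas is hypothetical, and the real controller additionally applies a tolerance and stabilization windows that are not modeled here.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_AVG_METRIC TARGET_METRIC MIN MAX
# Hypothetical helper: applies the basic HPA formula and clamps the result.
desired_replicas() {
  awk -v r="$1" -v c="$2" -v t="$3" -v lo="$4" -v hi="$5" 'BEGIN {
    raw = r * c / t          # replicas needed to bring the average back to the target
    n = int(raw)
    if (n < raw) n++         # round up (ceil) for positive values
    if (n < lo) n = lo       # never go below the minimum replica count
    if (n > hi) n = hi       # never go above the maximum replica count
    print n
  }'
}

desired_replicas 2 900 600 2 12   # prints 3
desired_replicas 2 11 600 2 12    # prints 2 (clamped to the minimum)
```

With 2 replicas at an average of 900% against a 600% target, the formula asks for 3 replicas. At 11% it would ask for 1, which the minimum of 2 overrides.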

To bring the benefits of HPA to IBM Cloud Kubernetes Service Ingress ALB deployments, a set of CLI commands was introduced. In the next sections, we demonstrate how you can configure autoscaling on resource metrics like CPU usage or even on custom metrics for more advanced use cases.

Enable ALB autoscaling based on CPU average utilization

As the ALB pods are predominantly CPU-heavy processes, using the CPU average utilization as the basis of autoscaling configuration is a good choice for most use cases.

To enable HPA based on CPU average utilization, you can use the following CLI command:

$ ibmcloud ks ingress alb autoscale set -c <clusterID> --alb <albID> --max-replicas <desired maximum replicas> --min-replicas <desired minimum replicas> --cpu-average-utilization <desired CPU average utilization>

To determine the desired CPU average utilization, you may find guidance in our documentation. It is not recommended to set the minimum replica count below two for high availability purposes. It is also not recommended to set the maximum replica count above the number of workers your cluster has, as these excess ALB pods will not be scheduled due to anti-affinity rules.
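
The two recommendations above can be checked before running the command. The following is a minimal sketch, assuming that every worker node in the cluster is eligible to run an ALB pod. The function name check_replica_bounds and its messages are hypothetical and not part of the IBM Cloud CLI.

```shell
# check_replica_bounds MIN_REPLICAS MAX_REPLICAS WORKER_COUNT
# Hypothetical helper: prints a warning for each recommendation that is not met.
check_replica_bounds() {
  min="$1"
  max="$2"
  workers="$3"
  problems=0
  if [ "$min" -lt 2 ]; then
    echo "warning: --min-replicas $min is below 2, which is not recommended for high availability"
    problems=1
  fi
  if [ "$max" -gt "$workers" ]; then
    echo "warning: --max-replicas $max exceeds the $workers workers, so excess ALB pods cannot be scheduled"
    problems=1
  fi
  if [ "$problems" -eq 0 ]; then
    echo "ok"
  fi
}

check_replica_bounds 2 12 12   # prints ok
```

One way to obtain the worker count is kubectl get nodes --no-headers | wc -l, under the same assumption that all listed nodes can host ALB pods.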

For example, if you would like to configure a minimum of 2 and a maximum of 12 replicas with a target CPU average utilization of 600%, you can use the following command:

$ ibmcloud ks ingress alb autoscale set -c <clusterID> --alb public-cr<clusterID>-alb1 --max-replicas 12 --min-replicas 2 --cpu-average-utilization 600

Keep in mind that it may take up to 10 minutes for the HPA resource to be deployed.
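
Because of this delay, a small polling loop can save repeated manual checks. The following is a sketch only, assuming that kubectl is already configured for the cluster and that the HPA resource is named after the ALB ID in the kube-system namespace (as in the output shown in this post). The function name wait_for_hpa and its defaults are hypothetical.

```shell
# wait_for_hpa HPA_NAME [ATTEMPTS] [INTERVAL_SECONDS]
# Hypothetical helper: polls until the HPA resource exists or the attempts run out.
wait_for_hpa() {
  name="$1"
  attempts="${2:-60}"    # default: 60 attempts x 10 seconds = 10 minutes
  interval="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get horizontalpodautoscaler -n kube-system "$name" >/dev/null 2>&1; then
      echo "HPA $name is present"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "HPA $name not found after $attempts attempts" >&2
  return 1
}

# Example call (replace the placeholder with your ALB ID):
#   wait_for_hpa "<albID>"
```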

To verify that your HPA resource is configured as intended, you can use the kubectl get horizontalpodautoscaler command:

$ kubectl get horizontalpodautoscaler -n kube-system
NAME                                 REFERENCE                                       TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
public-cr<clusterID>-alb1            Deployment/public-cr<clusterID>-alb1            11%/600%   2         12        2          131m

You can use the same command to determine whether the HPA is working as intended. The following output was captured on the same cluster after the utilization increased and the ALB was serving more requests. As you can see, the HPA automatically increased the number of ALB replicas as the load increased:

NAME                                 REFERENCE                                       TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
public-cr<clusterID>-alb1            Deployment/public-cr<clusterID>-alb1            418%/600%   2         12        5          3h

Enable ALB autoscaling based on custom metrics

The ALBs expose various metrics, including request statistics and Nginx process metrics. These metrics can be captured and aggregated by a metric collector like Prometheus and made available to Kubernetes HPA by using Prometheus Adapter.

Depending on your use case, you might want to scale your ALBs based on metrics other than CPU average utilization, such as the number of incoming requests per second or the number of established connections. We provide a way for you to fine-tune your scaling with custom metrics.

The following example shows how to set up autoscaling based on custom metrics, and you can use it as a starting point for designing your own custom metrics-based setup. As a prerequisite, we deployed an application, exposed it by using an Ingress resource (named alb-autoscale-example-ingress) and set up monitoring by using Prometheus. We also configured Prometheus Adapter to expose the nginx_ingress_controller_requests_rate metric for the Kubernetes metrics-server. If you would like to follow along, you can find the sample resources and instructions in the IBMCloud/kube-samples repository.

To verify whether the configured custom metrics are available for the Kubernetes metrics-server, we can issue the following command:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta2/namespaces/alb-autoscale-example/ingress/alb-autoscale-example-ingress/nginx_ingress_controller_requests_rate"
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta2",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Ingress",
        "namespace": "alb-autoscale-example",
        "name": "alb-autoscale-example-ingress",
        "apiVersion": "networking.k8s.io/v1"
      },
      "metric": {
        "name": "nginx_ingress_controller_requests_rate",
        "selector": null
      },
      "timestamp": "2023-08-08T09:11:02Z",
      "value": "0"
    }
  ]
}

To enable ALB autoscaling based on custom metrics, we can use the same command we used before, but this time we are going to define the --custom-metrics-file flag instead of --cpu-average-utilization. The custom metrics file must contain a MetricSpec array that will be directly injected in the HPA configuration.

Our custom metrics file is named custom-metrics.yaml and has the following content:

- type: Object
  object:
    metric:
      name: nginx_ingress_controller_requests_rate
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: alb-autoscale-example-ingress
      namespace: alb-autoscale-example
    target:
      type: Value
      value: 2k

Let’s use our custom metrics configuration for autoscale:

$ ibmcloud ks ingress alb autoscale set -c <clusterID> --alb <albID> --min-replicas 1 --max-replicas 12 --custom-metrics-file custom-metrics.yaml
Setting autoscaling configuration for <albID>...
OK

Now we can use the kubectl get horizontalpodautoscaler command again to see whether autoscaling works as expected:

$ kubectl get horizontalpodautoscaler -n kube-system public-cr<clusterID>-alb1
NAME                                 REFERENCE                                       TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
public-cr<clusterID>-alb1            Deployment/public-cr<clusterID>-alb1            0/2k          1         12        2          2h

The HPA scaled down the ALB to one replica because there were no requests and we configured the minimum replica count as one. As the request-per-second rate increases, the HPA scales up the ALB automatically:

$ kubectl get horizontalpodautoscaler -n kube-system public-cr<clusterID>-alb1
NAME                                 REFERENCE                                       TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
public-cr<clusterID>-alb1            Deployment/public-cr<clusterID>-alb1            399700m/2k    1         12        3          2h
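
The TARGETS column uses Kubernetes quantity notation, where the suffix m means thousandths and k means thousands. The value 399700m/2k therefore reads as 399.7 requests per second against a target of 2000. The following sketch converts such values to plain numbers. The function name quantity_to_number is hypothetical, and only the m, k and M suffixes are handled.

```shell
# quantity_to_number QUANTITY
# Hypothetical helper: converts a Kubernetes quantity such as 399700m or 2k
# to a plain decimal number. Other suffixes are not handled in this sketch.
quantity_to_number() {
  awk -v q="$1" 'BEGIN {
    suffix = substr(q, length(q), 1)
    number = substr(q, 1, length(q) - 1)
    if (suffix == "m") print number / 1000          # milli
    else if (suffix == "k") print number * 1000     # kilo
    else if (suffix == "M") print number * 1000000  # mega
    else print q + 0                                # no suffix
  }'
}

quantity_to_number 399700m   # prints 399.7
quantity_to_number 2k        # prints 2000
```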

More information

To learn more about autoscaling in general, you can check out the Horizontal Pod Autoscaling page in the official Kubernetes documentation.

For IBM Cloud specifics, check out “Dynamically scaling ALBs with autoscaler” in our official documentation.

Contact us

If you have questions, engage our team via Slack by registering here and join the discussion in the #general channel on our public IBM Cloud Kubernetes Service Slack.
