Horizontal pod autoscaling by using custom metrics

The Horizontal Pod Autoscaler (HPA) in IBM Cloud Private allows your system to automatically scale workloads up or down based on resource usage. This automatic scaling helps you meet service level agreements (SLAs) for your workloads.

By default, the HPA policy automatically scales the number of pods based on the observed CPU utilization. However, in many situations, you might want to scale the application based on other monitored metrics, such as the number of incoming requests or the memory consumption. Starting with IBM Cloud Private Version 3.1.2, you can automate scaling on such custom metrics by using Prometheus and the Prometheus adapter.
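
For reference, you can create the default CPU-based policy with a single kubectl command. The following sketch uses a hypothetical deployment name, my-app; replace it with one of your own workloads:

# Scale the hypothetical my-app deployment between 2 and 10 replicas,
# targeting 50% average CPU utilization across its pods.
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10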

Prometheus

Prometheus is widely used to monitor all the components of a Kubernetes cluster. These components include the control plane, the worker nodes, and the applications that are running on the cluster.
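
If you want to inspect the underlying data before you configure autoscaling, you can query Prometheus directly. The service name, namespace, and port in the following sketch (monitoring-prometheus on port 9090 in kube-system) are assumptions; adjust them to match your installation:

# Forward the Prometheus API to your workstation. The service name and port
# are assumptions; verify them with: kubectl get svc -n kube-system
kubectl -n kube-system port-forward svc/monitoring-prometheus 9090:9090

# In a second terminal, query the per-pod container memory usage that cAdvisor
# exports to Prometheus. The pod_name label reflects the Kubernetes versions
# of this era; newer versions use the pod label instead.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(container_memory_usage_bytes{pod_name=~"podinfo-.*"}) by (pod_name)'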

Prometheus adapter

Prometheus adapter uses the Kubernetes aggregation layer, which installs extra Kubernetes-style APIs and registers custom API servers with the Kubernetes cluster. The adapter gathers the names of available metrics from Prometheus at regular intervals and then exposes those metrics to the HPA for autoscaling.
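
Because the adapter registers itself through the aggregation layer, you can confirm the registration with standard kubectl commands:

# Confirm that the custom metrics API is registered with the aggregator.
kubectl get apiservice v1beta1.custom.metrics.k8s.io

# Show the registration details, including the backing service and whether
# the aggregator currently reports the API as available.
kubectl describe apiservice v1beta1.custom.metrics.k8s.io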

Preparing for the installation

By default, in IBM Cloud Private, the HPA is enabled to autoscale based on CPU utilization. To enable autoscaling based on custom metrics, you must ensure that the custom-metrics-adapter service is not disabled in the management_services parameter in the <installation_directory>/cluster/config.yaml file.

Your configuration file might resemble the following code:

## Management Services Settings
## You can disable following services: custom-metrics-adapter, istio, metering, monitoring, service-catalog, storage-glusterfs, vulnerability-advisor
management_services:
  istio: disabled
  vulnerability-advisor: disabled
  storage-glusterfs: disabled
  storage-minio: disabled

Verifying the installation

After installation completes, verify that the custom-metrics-adapter is enabled.

  1. Ensure that the autoscaling/v2beta1 API group is displayed.

     kubectl api-versions | grep "autoscaling/v2beta1"
    

    The output resembles the following code:

     autoscaling/v2beta1
    
  2. Ensure that the corresponding custom-metrics-adapter pod is deployed and is in a running state.

     kubectl get po -n kube-system | grep custom-metrics-adapter
    

    The output resembles the following code:

     custom-metrics-adapter-76d7bb8dcd-2pj4k                        1/1       Running     0          18m
    
  3. List the default pod custom metrics that are provided by the Prometheus adapter.

     kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . | grep "pods/"
    

    The output resembles the following code:

           "name": "pods/kube_pod_container_status_waiting_reason",
           "name": "pods/fs_read",
           "name": "pods/memory_failures",
           "name": "pods/kube_pod_status_phase",
           "name": "pods/kube_pod_container_resource_limits_memory_bytes",
           "name": "pods/cpu_user",
           "name": "pods/fs_usage_bytes",
           "name": "pods/tasks_state",
           "name": "pods/kube_pod_container_info",
           "name": "pods/cpu_cfs_throttled",
           "name": "pods/fs_sector_writes",
           "name": "pods/kube_pod_created",
           "name": "pods/network_tcp_usage",
           "name": "pods/spec_memory_limit_bytes",
           "name": "pods/network_udp_usage",
           "name": "pods/memory_max_usage_bytes",
           "name": "pods/spec_cpu_quota",
           "name": "pods/kube_pod_container_status_terminated_reason",
           "name": "pods/cpu_system",
           "name": "pods/kube_pod_container_status_running",
           "name": "pods/kube_pod_status_ready",
           "name": "pods/fs_io_time_weighted",
           "name": "pods/fs_reads_bytes",
           "name": "pods/kube_pod_info",
           "name": "pods/fs_reads_merged",
           "name": "pods/kube_pod_container_resource_requests_cpu_cores",
           "name": "pods/fs_io_time",
           "name": "pods/kube_pod_container_resource_limits_cpu_cores",
           "name": "pods/fs_inodes",
           "name": "pods/start_time_seconds",
           "name": "pods/kube_pod_container_status_terminated",
           "name": "pods/kube_pod_container_status_waiting",
           "name": "pods/cpu_usage",
           "name": "pods/spec_cpu_shares",
           "name": "pods/spec_memory_reservation_limit_bytes",
           "name": "pods/kube_pod_container_status_ready",
           "name": "pods/fs_writes_merged",
           "name": "pods/fs_inodes_free",
           "name": "pods/cpu_cfs_throttled_periods",
           "name": "pods/kube_pod_labels",
           "name": "pods/cpu_load_average_10s",
           "name": "pods/fs_io_current",
           "name": "pods/memory_working_set_bytes",
           "name": "pods/spec_memory_swap_limit_bytes",
           "name": "pods/fs_reads",
           "name": "pods/kube_pod_container_resource_requests_memory_bytes",
           "name": "pods/memory_rss",
           "name": "pods/cpu_cfs_periods",
           "name": "pods/fs_writes_bytes",
           "name": "pods/fs_writes",
           "name": "pods/last_seen",
           "name": "pods/spec_cpu_period",
           "name": "pods/kube_pod_start_time",
           "name": "pods/fs_write",
           "name": "pods/memory_failcnt",
           "name": "pods/kube_pod_container_status_restarts",
           "name": "pods/fs_sector_reads",
           "name": "pods/kube_pod_status_scheduled",
           "name": "pods/memory_cache",
           "name": "pods/memory_usage_bytes",
           "name": "pods/memory_swap",
           "name": "pods/fs_limit_bytes",
           "name": "pods/kube_pod_owner",
    

Example: Deploying an application with an HPA policy

This example shows you how to autoscale an nginx web application based on memory usage by using an HPA policy. When the average memory_usage_bytes of the nginx pods is greater than 10 MB (10485760 bytes), the policy scales up the nginx web application. Scaling up an application increases the number of pods that are available for a deployment. If the memory_usage_bytes value falls below 10 MB, the application scales down, but does not scale below the minimum number of replicas that is specified for the deployment.

  1. Create the podinfo-svc.yaml file by using the following code:

    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: podinfo
      labels:
        app: podinfo
      annotations:
        prometheus.io/scrape: "true"        
    spec:
      type: NodePort
      ports:
       - port: 80
         targetPort: 80
         nodePort: 31198
         protocol: TCP
      selector:
       app: podinfo
    
  2. Create a podinfo service by running the following command:

    kubectl create -f podinfo-svc.yaml
    

    The response resembles the following example:

    service "podinfo" created
    
  3. Create the podinfo-dep.yaml file by using the following code:

     ---
     apiVersion: extensions/v1beta1
     kind: Deployment
     metadata:
       name: podinfo
     spec:
       replicas: 2
       template:
         metadata:
           labels:
             app: podinfo
           annotations:
             prometheus.io/scrape: 'true'
         spec:
           containers:
           - name: podinfod
             image: nginx:latest
             imagePullPolicy: Always
             ports:
             - containerPort: 80
               protocol: TCP
             resources:
               requests:
                 memory: "32Mi"
                 cpu: "1m"
               limits:
                 memory: "256Mi"
                 cpu: "100m"
    
  4. Create a podinfo deployment by running the following command:

    kubectl create -f podinfo-dep.yaml
    

    The response resembles the following example:

    deployment "podinfo" created
    
  5. Create the podinfo-hpa-custom.yaml file by using the following code:

    ---
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo
    spec:
      scaleTargetRef:
        apiVersion: extensions/v1beta1
        kind: Deployment
        name: podinfo
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Pods
        pods:
          metricName: memory_usage_bytes
          targetAverageValue: 10485760
    
  6. Create a podinfo HPA policy that is based on pod memory usage (memory_usage_bytes; 10485760 bytes = 10 MB) by running the following command:

    kubectl create -f podinfo-hpa-custom.yaml
    

    The response resembles the following example:

     horizontalpodautoscaler.autoscaling "podinfo" created
    
  7. Simulate load by using the Apache HTTP server benchmarking tool (ab). The generated load triggers the HPA policy to scale the workload.

    for a in `seq 1 50`; do ab -rSqd -c 200 -n 20000 <node_ip>:31198/; done
    

    <node_ip> is the IP address of a node in your IBM Cloud Private cluster.
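
While the load runs, you can watch the HPA policy react and the replica count change:

# Watch the observed metric value and the desired replica count update.
kubectl get hpa podinfo --watch

# List the podinfo pods; the count grows toward maxReplicas under load
# and shrinks back toward minReplicas after the load stops.
kubectl get pods -l app=podinfo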