Horizontal pod autoscaling by using custom metrics

The Horizontal Pod Autoscaler (HPA) in IBM Cloud Private allows your system to automatically scale workloads up or down based on resource usage. This automatic scaling helps you meet service level agreements (SLAs) for your workloads.

By default, the HPA policy automatically scales the number of pods based on the observed CPU utilization. However, in many situations, you might want to scale the application based on other monitored metrics, such as the number of incoming requests or the memory consumption. Starting with IBM Cloud Private Version 3.1.2, you can automate scaling on such custom metrics by using Prometheus and the Prometheus adapter.
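
For reference, you can create the default CPU-based policy with a single kubectl command. The following sketch uses a hypothetical deployment name, my-app; replace it with one of your own workloads:

# Scale the hypothetical my-app deployment between 2 and 10 replicas,
# targeting 50% average CPU utilization across its pods.
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10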

Prometheus

Prometheus is widely used to monitor all the components of a Kubernetes cluster. These components include the control plane, the worker nodes, and the applications that are running on the cluster.
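
If you want to inspect the underlying data before you configure autoscaling, you can query Prometheus directly. The service name, namespace, and port in the following sketch (monitoring-prometheus on port 9090 in kube-system) are assumptions; adjust them to match your installation:

# Forward the Prometheus API to your workstation. The service name and port
# are assumptions; verify them with: kubectl get svc -n kube-system
kubectl -n kube-system port-forward svc/monitoring-prometheus 9090:9090

# In a second terminal, query the per-pod container memory usage that cAdvisor
# exports to Prometheus. The pod_name label reflects the Kubernetes versions
# of this era; newer versions use the pod label instead.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(container_memory_usage_bytes{pod_name=~"podinfo-.*"}) by (pod_name)'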

Prometheus adapter

Prometheus adapter uses the Kubernetes aggregation layer, which installs extra Kubernetes-style APIs and registers custom API servers with the Kubernetes cluster. The adapter gathers the names of available metrics from Prometheus at regular intervals and then exposes those metrics to the HPA for autoscaling.
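
Because the adapter registers itself through the aggregation layer, you can confirm the registration with standard kubectl commands:

# Confirm that the custom metrics API is registered with the aggregator.
kubectl get apiservice v1beta1.custom.metrics.k8s.io

# Show the registration details, including the backing service and whether
# the aggregator currently reports the API as available.
kubectl describe apiservice v1beta1.custom.metrics.k8s.io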

Preparing for the installation

By default, in IBM Cloud Private, the HPA is enabled to autoscale based on CPU utilization. To enable autoscaling based on custom metrics, you must ensure that the custom-metrics-adapter service is not disabled in the management_services parameter in the <installation_directory>/cluster/config.yaml file.

Your configuration file might resemble the following code:

## Management Services Settings
## You can disable following services: custom-metrics-adapter, istio, metering, monitoring, service-catalog, storage-glusterfs, vulnerability-advisor
management_services:
  istio: disabled
  vulnerability-advisor: disabled
  storage-glusterfs: disabled
  storage-minio: disabled

Verifying the installation

After installation completes, verify that the custom-metrics-adapter is enabled.

  1. Ensure that the autoscaling/v2beta1 API group is displayed.

     kubectl api-versions | grep "autoscaling/v2beta1"
    

    The output resembles the following code:

     autoscaling/v2beta1
    
  2. Ensure that the corresponding custom-metrics-adapter pod is deployed and is in a running state.

     kubectl get po -n kube-system | grep custom-metrics-adapter
    

    The output resembles the following code:

     custom-metrics-adapter-76d7bb8dcd-2pj4k                        1/1       Running     0          18m
    
  3. List the default pod custom metrics that are provided by the Prometheus adapter.

     kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . | grep "pods/"
    

    The output resembles the following code:

           "name": "pods/kube_pod_container_status_waiting_reason",
           "name": "pods/fs_read",
           "name": "pods/memory_failures",
           "name": "pods/kube_pod_status_phase",
           "name": "pods/kube_pod_container_resource_limits_memory_bytes",
           "name": "pods/cpu_user",
           "name": "pods/fs_usage_bytes",
           "name": "pods/tasks_state",
           "name": "pods/kube_pod_container_info",
           "name": "pods/cpu_cfs_throttled",
           "name": "pods/fs_sector_writes",
           "name": "pods/kube_pod_created",
           "name": "pods/network_tcp_usage",
           "name": "pods/spec_memory_limit_bytes",
           "name": "pods/network_udp_usage",
           "name": "pods/memory_max_usage_bytes",
           "name": "pods/spec_cpu_quota",
           "name": "pods/kube_pod_container_status_terminated_reason",
           "name": "pods/cpu_system",
           "name": "pods/kube_pod_container_status_running",
           "name": "pods/kube_pod_status_ready",
           "name": "pods/fs_io_time_weighted",
           "name": "pods/fs_reads_bytes",
           "name": "pods/kube_pod_info",
           "name": "pods/fs_reads_merged",
           "name": "pods/kube_pod_container_resource_requests_cpu_cores",
           "name": "pods/fs_io_time",
           "name": "pods/kube_pod_container_resource_limits_cpu_cores",
           "name": "pods/fs_inodes",
           "name": "pods/start_time_seconds",
           "name": "pods/kube_pod_container_status_terminated",
           "name": "pods/kube_pod_container_status_waiting",
           "name": "pods/cpu_usage",
           "name": "pods/spec_cpu_shares",
           "name": "pods/spec_memory_reservation_limit_bytes",
           "name": "pods/kube_pod_container_status_ready",
           "name": "pods/fs_writes_merged",
           "name": "pods/fs_inodes_free",
           "name": "pods/cpu_cfs_throttled_periods",
           "name": "pods/kube_pod_labels",
           "name": "pods/cpu_load_average_10s",
           "name": "pods/fs_io_current",
           "name": "pods/memory_working_set_bytes",
           "name": "pods/spec_memory_swap_limit_bytes",
           "name": "pods/fs_reads",
           "name": "pods/kube_pod_container_resource_requests_memory_bytes",
           "name": "pods/memory_rss",
           "name": "pods/cpu_cfs_periods",
           "name": "pods/fs_writes_bytes",
           "name": "pods/fs_writes",
           "name": "pods/last_seen",
           "name": "pods/spec_cpu_period",
           "name": "pods/kube_pod_start_time",
           "name": "pods/fs_write",
           "name": "pods/memory_failcnt",
           "name": "pods/kube_pod_container_status_restarts",
           "name": "pods/fs_sector_reads",
           "name": "pods/kube_pod_status_scheduled",
           "name": "pods/memory_cache",
           "name": "pods/memory_usage_bytes",
           "name": "pods/memory_swap",
           "name": "pods/fs_limit_bytes",
           "name": "pods/kube_pod_owner",
    

Example: Deploying an application with an HPA policy

This example shows you how to autoscale an nginx web application based on memory usage by using an HPA policy. When the average memory_usage_bytes of the nginx pods is greater than 10 MB (10485760 bytes), the policy scales up the nginx web application. Scaling up an application increases the number of pods that are available for a deployment. If the memory_usage_bytes value falls below 10 MB, the application scales down, but does not scale below the minimum number of replicas that is specified for the deployment.

  1. Create the podinfo-svc.yaml file by using the following code:

    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: podinfo
      labels:
        app: podinfo
      annotations:
        prometheus.io/scrape: "true"        
    spec:
      type: NodePort
      ports:
       - port: 80
         targetPort: 80
         nodePort: 31198
         protocol: TCP
      selector:
       app: podinfo
    
  2. Create a podinfo service by running the following command:

    kubectl create -f podinfo-svc.yaml
    

    The response resembles the following example:

    service "podinfo" created
    
  3. Create the podinfo-dep.yaml file by using the following code:

     ---
     apiVersion: extensions/v1beta1
     kind: Deployment
     metadata:
       name: podinfo
     spec:
       replicas: 2
       template:
         metadata:
           labels:
             app: podinfo
           annotations:
             prometheus.io/scrape: 'true'
         spec:
           containers:
           - name: podinfod
             image: nginx:latest
             imagePullPolicy: Always
             ports:
             - containerPort: 80
               protocol: TCP
             resources:
               requests:
                 memory: "32Mi"
                 cpu: "1m"
               limits:
                 memory: "256Mi"
                 cpu: "100m"
    
  4. Create a podinfo deployment by running the following command:

    kubectl create -f podinfo-dep.yaml
    

    The response resembles the following example:

    deployment "podinfo" created
    
  5. Create the podinfo-hpa-custom.yaml file by using the following code:

    ---
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo
    spec:
      scaleTargetRef:
        apiVersion: extensions/v1beta1
        kind: Deployment
        name: podinfo
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Pods
        pods:
          metricName: memory_usage_bytes
          targetAverageValue: 10485760
    
  6. Create a podinfo HPA policy that is based on pod memory usage (memory_usage_bytes; 10485760 bytes = 10 MB) by running the following command:

    kubectl create -f podinfo-hpa-custom.yaml
    

    The response resembles the following example:

     horizontalpodautoscaler.autoscaling "podinfo" created
    
  7. Simulate load by using the Apache HTTP server benchmarking tool (ab). The generated load triggers the HPA policy to scale the workload.

    for a in `seq 1 50`; do ab -rSqd -c 200 -n 20000 <node_ip>:31198/; done
    

    <node_ip> is the IP address of a node in your IBM Cloud Private cluster.
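
While the load runs, you can watch the HPA policy react and the replica count change:

# Watch the observed metric value and the desired replica count update.
kubectl get hpa podinfo --watch

# List the podinfo pods; the count grows toward maxReplicas under load
# and shrinks back toward minReplicas after the load stops.
kubectl get pods -l app=podinfo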