Horizontal pod autoscaling by using custom metrics
The Horizontal Pod Autoscaler (HPA) in IBM Cloud Private allows your system to automatically scale workloads up or down based on resource usage. This automatic scaling helps to guarantee service level agreements (SLAs) for your workloads.
By default, the HPA policy automatically scales the number of pods based on the observed CPU utilization. However, in many situations, you might want to scale the application based on other monitored metrics, such as the number of incoming requests or the memory consumption. Starting with IBM Cloud Private Version 3.1.2, you can automate scaling based on such custom metrics by using Prometheus and the Prometheus adapter.
Prometheus
Prometheus is widely used to monitor all the components of a Kubernetes cluster. These components include the control plane, the worker nodes, and the applications that are running on the cluster.
Prometheus adapter
The Prometheus adapter uses the Kubernetes aggregation layer to install extra Kubernetes-style APIs and register a custom API server with the Kubernetes cluster. The adapter gathers the names of available metrics from Prometheus at regular intervals and then exposes those metrics to the HPA for autoscaling.
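The adapter serves these metrics under the custom.metrics.k8s.io API group. As a rough sketch of how a single pod metric can be read from that API once the adapter is running, the following helper builds the request path. The helper name `pod_metric_path` and the `default` namespace are assumptions made for illustration; the path layout follows the Kubernetes custom metrics API convention.

```shell
# Hypothetical helper: print the custom metrics API path that returns
# one metric for all pods (the "*" wildcard) in a single namespace.
pod_metric_path() {
  namespace="$1"
  metric="$2"
  printf '/apis/custom.metrics.k8s.io/v1beta1/namespaces/%s/pods/*/%s\n' "$namespace" "$metric"
}

# Example usage (needs cluster access and the jq tool, so it is left as a comment):
#   kubectl get --raw "$(pod_metric_path default memory_usage_bytes)" | jq .
```

The response to such a request is a list with one entry per pod, which is the same data that the HPA reads when it evaluates a policy of type Pods.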
Preparing for the installation
By default, in IBM Cloud Private, HPA is enabled to autoscale based on CPU utilization. To enable autoscaling based on custom metrics, you must remove the custom-metrics-adapter option from the disabled_management_services parameter in the /<installation_directory>/cluster/config.yaml file.
Your configuration file might resemble the following code, in which custom-metrics-adapter is not listed as disabled:
    ## Management Services Settings
    ## You can disable following services: custom-metrics-adapter, istio, metering, monitoring, service-catalog, storage-glusterfs, vulnerability-advisor
    management_services:
      istio: disabled
      vulnerability-advisor: disabled
      storage-glusterfs: disabled
      storage-minio: disabled
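Before you run the installer, it can help to confirm that the adapter is not marked as disabled. The following sketch assumes that the file uses the management_services layout from the example, where a disabled service appears as `<name>: disabled`. The function names `adapter_is_disabled` and `check_adapter_config` are hypothetical, and the check is a plain text match, not a YAML parse.

```shell
# Hypothetical helper: succeeds (exit 0) when the file marks
# custom-metrics-adapter as disabled in the "<name>: disabled" layout.
# Comment lines that merely mention the service name do not match.
adapter_is_disabled() {
  config_file="$1"
  grep -Eq '^[[:space:]]*custom-metrics-adapter:[[:space:]]*disabled[[:space:]]*$' "$config_file"
}

# Print a short report for the given config.yaml path.
check_adapter_config() {
  if adapter_is_disabled "$1"; then
    echo "custom-metrics-adapter is disabled; remove that entry before installing"
  else
    echo "custom-metrics-adapter is not disabled"
  fi
}
```

For example, `check_adapter_config /<installation_directory>/cluster/config.yaml` prints one of the two messages for your installation directory.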
Verifying the installation
After installation completes, verify that the custom-metrics-adapter is enabled.
- Ensure that the autoscaling/v2beta1 API group displays.

      kubectl api-versions | grep "autoscaling/v2beta1"

  The output resembles the following code:

      autoscaling/v2beta1

- Ensure that the corresponding custom-metrics-adapter pod is deployed and is in a running state.

      kubectl get po -n kube-system | grep custom-metrics-adapter

  The output resembles the following code:

      custom-metrics-adapter-76d7bb8dcd-2pj4k   1/1   Running   0   18m

- List the default custom metrics that are provided by the Prometheus adapter on the pod.

      kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . | grep "pods/"

  The output resembles the following code:

      "name": "pods/kube_pod_container_status_waiting_reason",
      "name": "pods/fs_read",
      "name": "pods/memory_failures",
      "name": "pods/kube_pod_status_phase",
      "name": "pods/kube_pod_container_resource_limits_memory_bytes",
      "name": "pods/cpu_user",
      "name": "pods/fs_usage_bytes",
      "name": "pods/tasks_state",
      "name": "pods/kube_pod_container_info",
      "name": "pods/cpu_cfs_throttled",
      "name": "pods/fs_sector_writes",
      "name": "pods/kube_pod_created",
      "name": "pods/network_tcp_usage",
      "name": "pods/spec_memory_limit_bytes",
      "name": "pods/network_udp_usage",
      "name": "pods/memory_max_usage_bytes",
      "name": "pods/spec_cpu_quota",
      "name": "pods/kube_pod_container_status_terminated_reason",
      "name": "pods/cpu_system",
      "name": "pods/kube_pod_container_status_running",
      "name": "pods/kube_pod_status_ready",
      "name": "pods/fs_io_time_weighted",
      "name": "pods/fs_reads_bytes",
      "name": "pods/kube_pod_info",
      "name": "pods/fs_reads_merged",
      "name": "pods/kube_pod_container_resource_requests_cpu_cores",
      "name": "pods/fs_io_time",
      "name": "pods/kube_pod_container_resource_limits_cpu_cores",
      "name": "pods/fs_inodes",
      "name": "pods/start_time_seconds",
      "name": "pods/kube_pod_container_status_terminated",
      "name": "pods/kube_pod_container_status_waiting",
      "name": "pods/cpu_usage",
      "name": "pods/spec_cpu_shares",
      "name": "pods/spec_memory_reservation_limit_bytes",
      "name": "pods/kube_pod_container_status_ready",
      "name": "pods/fs_writes_merged",
      "name": "pods/fs_inodes_free",
      "name": "pods/cpu_cfs_throttled_periods",
      "name": "pods/kube_pod_labels",
      "name": "pods/cpu_load_average_10s",
      "name": "pods/fs_io_current",
      "name": "pods/memory_working_set_bytes",
      "name": "pods/spec_memory_swap_limit_bytes",
      "name": "pods/fs_reads",
      "name": "pods/kube_pod_container_resource_requests_memory_bytes",
      "name": "pods/memory_rss",
      "name": "pods/cpu_cfs_periods",
      "name": "pods/fs_writes_bytes",
      "name": "pods/fs_writes",
      "name": "pods/last_seen",
      "name": "pods/spec_cpu_period",
      "name": "pods/kube_pod_start_time",
      "name": "pods/fs_write",
      "name": "pods/memory_failcnt",
      "name": "pods/kube_pod_container_status_restarts",
      "name": "pods/fs_sector_reads",
      "name": "pods/kube_pod_status_scheduled",
      "name": "pods/memory_cache",
      "name": "pods/memory_usage_bytes",
      "name": "pods/memory_swap",
      "name": "pods/fs_limit_bytes",
      "name": "pods/kube_pod_owner",
Example: Deploying an application with an HPA policy
This example shows you how to autoscale an nginx web application based on memory usage by using an HPA policy. When the average memory_usage_bytes of the nginx pods is greater than 10 MiB (10485760 bytes), the policy scales up the nginx web application. Scaling up an application increases the number of pods available for a deployment. If the average memory_usage_bytes of the nginx pods is less than 10 MiB, the application scales down, but does not scale below the minimum number of replicas that is specified in the HPA policy.
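The scaling decision can be approximated with the formula from the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current average value / target average value), bounded by the minimum and maximum replica counts. The following sketch applies that formula with integer arithmetic. The function name `desired_replicas` is hypothetical, and the sketch deliberately ignores the tolerance band and other refinements that the real controller applies.

```shell
# Hypothetical helper: approximate the replica count that an HPA would request.
# Arguments: current replicas, current average metric value,
#            target average value, minimum replicas, maximum replicas.
desired_replicas() {
  current_replicas="$1"
  current_avg="$2"
  target_avg="$3"
  min_replicas="$4"
  max_replicas="$5"

  # Integer ceiling of (current_replicas * current_avg) / target_avg.
  product=$((current_replicas * current_avg))
  desired=$(( (product + target_avg - 1) / target_avg ))

  # Clamp the result to the configured bounds.
  if [ "$desired" -lt "$min_replicas" ]; then
    desired="$min_replicas"
  fi
  if [ "$desired" -gt "$max_replicas" ]; then
    desired="$max_replicas"
  fi
  echo "$desired"
}
```

With the values from this example (a 10485760-byte target, 2 to 10 replicas), `desired_replicas 2 20971520 10485760 2 10` prints 4: two pods that average 20 MiB each are scaled to four pods.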
- Create the podinfo-svc.yaml file by using the following code:

      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: podinfo
        labels:
          app: podinfo
        annotations:
          prometheus.io/scrape: "true"
      spec:
        type: NodePort
        ports:
          - port: 80
            targetPort: 80
            nodePort: 31198
            protocol: TCP
        selector:
          app: podinfo

- Create a podinfo service by running the following command:

      kubectl create -f podinfo-svc.yaml

  The response resembles the following example:

      service "podinfo" created

- Create the podinfo-dep.yaml file by using the following code:

      ---
      apiVersion: extensions/v1beta1
      kind: Deployment
      metadata:
        name: podinfo
      spec:
        replicas: 2
        template:
          metadata:
            labels:
              app: podinfo
            annotations:
              prometheus.io/scrape: 'true'
          spec:
            containers:
              - name: podinfod
                image: nginx:latest
                imagePullPolicy: Always
                ports:
                  - containerPort: 80
                    protocol: TCP
                resources:
                  requests:
                    memory: "32Mi"
                    cpu: "1m"
                  limits:
                    memory: "256Mi"
                    cpu: "100m"

- Create a podinfo deployment by running the following command:

      kubectl create -f podinfo-dep.yaml

  The response resembles the following example:

      deployment "podinfo" created

- Create the podinfo-hpa-custom.yaml file by using the following code:

      ---
      apiVersion: autoscaling/v2beta1
      kind: HorizontalPodAutoscaler
      metadata:
        name: podinfo
      spec:
        scaleTargetRef:
          apiVersion: extensions/v1beta1
          kind: Deployment
          name: podinfo
        minReplicas: 2
        maxReplicas: 10
        metrics:
          - type: Pods
            pods:
              metricName: memory_usage_bytes
              targetAverageValue: 10485760

- Create a podinfo HPA policy based on pod memory usage (memory_usage_bytes, 10485760 bytes = 10 MiB) by running the following command:

      kubectl create -f podinfo-hpa-custom.yaml

  The response resembles the following example:

      horizontalpodautoscaler.autoscaling "podinfo" created

- Simulate the load by using the Apache ab application. This application triggers an autoscaling workload.

      for a in `seq 1 50`; do ab -rSqd -c 200 -n 20000 <node_ip>:31198/;done

  <node_ip> is the IP address of a node in your IBM Cloud Private cluster.
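While the load runs, you might want to confirm that the policy actually adds pods. The following sketch polls the replica count that the HPA reports and stops as soon as it exceeds a baseline. The names `current_replicas` and `wait_for_scale_up`, and the `KUBECTL` and `POLL_SECONDS` variables, are hypothetical conveniences for this sketch. It assumes that the HPA status exposes the currentReplicas field and that kubectl is configured for your cluster.

```shell
# Hypothetical overrides: KUBECTL selects the client binary,
# POLL_SECONDS sets the delay between checks (default 10).
KUBECTL="${KUBECTL:-kubectl}"

# Print the replica count that the named HPA currently reports.
current_replicas() {
  "$KUBECTL" get hpa "$1" -o jsonpath='{.status.currentReplicas}'
}

# Poll until the HPA reports more replicas than the baseline.
# Prints the new count and returns 0, or returns 1 after all attempts.
wait_for_scale_up() {
  hpa_name="$1"
  baseline="$2"
  attempts="${3:-30}"
  while [ "$attempts" -gt 0 ]; do
    now="$(current_replicas "$hpa_name")"
    if [ "${now:-0}" -gt "$baseline" ]; then
      echo "$now"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}
```

For this example, run `wait_for_scale_up podinfo 2` in a second terminal after the load starts. Running `kubectl describe hpa podinfo` also shows the observed metric value and recent scaling events.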