September 4, 2019 By Ying Liu
Sachi Pradhan

How to use the autoscale feature to adjust in response to dynamic workload changes.

We previously announced new autoscaling capabilities for IBM Cloud Foundry Public that are now Generally Available. This blog post focuses on using the autoscaling feature to adjust your application capacity in response to dynamic workload changes.

Cloud Foundry Autoscaling helps you scale your application in or out horizontally by adding or removing application instances. Running your application across multiple instances also supports high availability.

You can consider your scalability design through the following aspects as defined in the “twelve-factor app” methodology: 

  1. Stateless Processes 
  2. Scaling based on process model
  3. Fast startup and graceful shutdown

Best practices when setting up autoscaling policies

Cloud Foundry Autoscaling allows you to customize when and how to scale your application using a pre-defined autoscaling policy. An autoscaling policy defines the min/max instance limits for your application, the metric type used as the scaling criterion, the thresholds that trigger scale-out/in, and the number of instances to be added/removed per step.

Here are some tips to help you create a better autoscaling policy: 

  • Set up performance baselines to understand how your application behaves under load.
  • Choose a proper metric to scale your application according to performance test results.
    • For CPU-intensive applications, be aware that the CPU is actually weighted and shared with other apps on the same host VMs. So, the autoscaling decision may be affected by other apps.  
    • For memory-intensive applications, the memory usage of an application depends on the process memory allocation algorithm of the runtime type. If the runtime doesn’t free up allocated memory in time, the scale-in based on memory-related metrics will be slow. 
    • Scaling rules with multiple metric types may conflict with one another and can introduce unexpected fluctuations on application instance counts. 
    • Use the same metric type for scale-out and scale-in for consistency.
  • Set proper threshold values to trigger scale out/in.
    • Scale out aggressively, but scale in more conservatively if the workload varies frequently.
    • Don’t set the upper threshold so high that the system could crash before autoscaling takes effect.
  • Use scheduled scaling to have enough resources ready if a burst in workload is predictable.
  • Allow enough quota for your organization for scaling out.
  • An autoscaling policy must be applied to each target application separately. If the target application is updated with the blue-green approach, in which a new application is pushed, make sure the same policy is attached to the newly created application as well.
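
Some of these tips can be checked mechanically before a policy is applied. The following is a minimal sketch, not part of the autoscaler itself: `check_policy` is a hypothetical helper, it assumes the policy has been loaded into a dict with the field names used in the sample policies in this post, and it assumes that rules with `>` or `>=` scale out while rules with `<` or `<=` scale in.

```python
def check_policy(policy):
    """Return a list of warnings for an autoscaling policy given as a dict."""
    warnings = []
    rules = policy.get("scaling_rules", [])
    # Assumption: rules with > or >= add instances; rules with < or <= remove them.
    scale_out = [r for r in rules if r["operator"] in (">", ">=")]
    scale_in = [r for r in rules if r["operator"] in ("<", "<=")]

    # Tip: use the same metric type for scale-out and scale-in.
    out_types = {r["metric_type"] for r in scale_out}
    in_types = {r["metric_type"] for r in scale_in}
    if out_types != in_types:
        warnings.append("scale-out and scale-in rules use different metric types")
    # Tip: rules with multiple metric types may conflict with one another.
    if len(out_types | in_types) > 1:
        warnings.append("rules with multiple metric types may conflict")

    # Sanity check: the scale-in threshold should sit below the scale-out threshold.
    for out_rule in scale_out:
        for in_rule in scale_in:
            same_metric = out_rule["metric_type"] == in_rule["metric_type"]
            if same_metric and in_rule["threshold"] >= out_rule["threshold"]:
                warnings.append(
                    f"{out_rule['metric_type']}: scale-in threshold is not below "
                    "the scale-out threshold"
                )

    if policy["instance_min_count"] > policy["instance_max_count"]:
        warnings.append("instance_min_count is greater than instance_max_count")
    return warnings
```

An empty list means none of these particular checks fired; it does not replace performance testing against your own baselines.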

Examples of the best practices in action

Example 1: Create a dynamic autoscaling policy with throughput

User scenario    

  • A web application is designed to serve about 1200 requests/second in total with at least three application instances. An automatic scale-out with a throughput metric is required to expand the capacity, when necessary, to support up to 4000 requests/second with more instances.
  • The application memory is set to 128MB for each instance.  
  • The application is hosted in Cloud Foundry with an organization memory of 4GB. Also, this application is the only consumer of the org memory. 

Solution

  • Define min/max instances count:
    • According to the use case, set instance_min_count to 3 to meet the minimum instance requirement.
    • Then, for instance_max_count: with the 4GB org memory quota and 128MB per instance, the maximum number of instances could be up to 32. But, given that more instances add more cost, you can also set instance_max_count from the estimated maximum capacity the application needs. In this case, 10 instances should be enough to support the required throughput of 4000 requests/second, so instance_max_count can safely be set to 12, which leaves some headroom.
  • Define dynamic scaling rules:
    • For dynamic scaling rules, throughput is selected as the metric_type. Since the maximum capacity of each instance is 400 requests/second, you can set 300 requests/second as the upper threshold to scale out and 100 requests/second as the lower threshold to scale in.
    • To add instances quickly on scale-out, you can express the adjustment as a percentage (for example, +100%), while scaling in one instance at a time (-1).
    • The breach_duration_secs controls how long the threshold must be breached before a scaling action is triggered. The cool_down_secs controls how long the next scaling action must wait, to make sure that the system is stable after a scaling action is done. You can define these explicitly or omit them to use the default values.
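
The sizing arithmetic behind these limits can be written out as a short sketch. The constant names are made up for illustration; the numbers come from the user scenario above (4GB org memory, 128MB per instance, 4000 requests/second peak, 400 requests/second per instance).

```python
import math

ORG_MEMORY_MB = 4 * 1024          # organization memory quota: 4GB
INSTANCE_MEMORY_MB = 128          # memory per application instance
PEAK_RPS = 4000                   # peak throughput the application must support
CAPACITY_PER_INSTANCE_RPS = 400   # maximum throughput of a single instance

# Upper bound imposed by the quota (this app is the only consumer of the org memory).
quota_limit = ORG_MEMORY_MB // INSTANCE_MEMORY_MB
# Instances needed at peak if every instance runs at full capacity.
needed_at_peak = math.ceil(PEAK_RPS / CAPACITY_PER_INSTANCE_RPS)

print(quota_limit)      # 32
print(needed_at_peak)   # 10

# The chosen instance_max_count of 12 covers the peak and stays within the quota.
instance_max_count = 12
assert needed_at_peak <= instance_max_count <= quota_limit
```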

Sample policy JSON

{
  "instance_min_count": 3,
  "instance_max_count": 12,
  "scaling_rules": [
    {
      "metric_type": "throughput",
      "operator": ">",
      "threshold": 300,
      "adjustment": "+100%",
      "breach_duration_secs": 120,
      "cool_down_secs": 120
    },
    {
      "metric_type": "throughput",
      "operator": "<=",
      "threshold": 100,
      "adjustment": "-1",
      "breach_duration_secs": 120,
      "cool_down_secs": 120
    }
  ]
}
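
To see how the two adjustments in this policy play out, here is a small sketch that applies them step by step. `next_instance_count` is a hypothetical helper, not the autoscaler's implementation; it assumes a percentage is taken of the current instance count and rounded up, and that the result is always kept within instance_min_count and instance_max_count.

```python
import math

def next_instance_count(current, adjustment, min_count, max_count):
    """Apply an adjustment such as "+100%" or "-1" and clamp to the limits."""
    if adjustment.endswith("%"):
        # Assumption: the percentage applies to the current count, rounded up.
        pct = float(adjustment.rstrip("%"))
        delta = math.ceil(current * abs(pct) / 100)
        change = delta if pct > 0 else -delta
    else:
        change = int(adjustment)
    return max(min_count, min(max_count, current + change))

# Scale-out with "+100%" doubles the count until the maximum is reached.
count = 3
for _ in range(3):
    count = next_instance_count(count, "+100%", 3, 12)
    print(count)   # 6, then 12, then 12

# Scale-in with "-1" removes one instance per step and stops at the minimum.
print(next_instance_count(12, "-1", 3, 12))   # 11
print(next_instance_count(3, "-1", 3, 12))    # 3
```

Under these assumptions, two scale-out steps are enough to go from the minimum of 3 to the maximum of 12, while returning to 3 takes nine scale-in steps, which matches the advice to scale out aggressively and scale in more conservatively.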

Example 2: Create a specific-date schedule to handle massive access during a special event

User scenario

  • A web application normally runs with 3–10 instances but is expected to handle more requests during a marketing event that is scheduled for New Year’s Eve.

Solution

If the usage of an application increases extremely quickly during a special event (e.g., marketing promotion events), dynamic scaling may not respond to the usage changes quickly enough. In this case, it is recommended to prepare more instances before the event starts.

  • Create a schedule for the specific event: 
    • You can use a specific-date schedule to override the default instance limits so that autoscaling can adjust the instance count over a wider range.
    • In this case, the default instance_min_count is 3 and the default instance_max_count is 10; for a pre-defined period, you can set instance_min_count to 10 and instance_max_count to 30. Additionally, you can set initial_min_instance_count to 15 if more instances are required at the beginning of the period.
    • During the period of the defined schedule, the dynamic rules still take effect to adjust the instance capacity, but within the larger range of 10–30 instead of the default 3–10.
    • Once the schedule period ends, the instance limits revert to the defaults.

Sample Policy JSON 

{
   "instance_min_count": 3,
   "instance_max_count": 10,
   "scaling_rules": [
       {
           "metric_type": "throughput",
           "operator": ">",
           "threshold": 300,
           "adjustment": "+100%",
           "breach_duration_secs": 120,
           "cool_down_secs": 120   
       },
       {
           "metric_type": "throughput",
           "operator": "<=",
           "threshold": 100,
           "adjustment": "-1",
           "breach_duration_secs": 120,
           "cool_down_secs": 120   
       } 
   ],
   "schedules": {
       "timezone": "Asia/Shanghai",
       "specific_date": [
          {
               "start_date_time": "2019-12-31T00:00",
               "end_date_time": "2020-01-01T23:59",
               "instance_min_count": 10,
               "instance_max_count": 30,
               "initial_min_instance_count": 15     
           } 
        ]      
   }
}
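
The way a specific-date schedule overrides the default limits can be illustrated with a short sketch. `effective_limits` is a hypothetical helper, not part of the autoscaler; it assumes the policy has been loaded into a dict and that `local_now` is a naive datetime already expressed in the policy's timezone. The inline policy uses the scenario's default limits of 3–10.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M"   # format of start_date_time / end_date_time

def effective_limits(policy, local_now):
    """Return the (min, max) instance limits that apply at `local_now`."""
    schedules = policy.get("schedules", {})
    for entry in schedules.get("specific_date", []):
        start = datetime.strptime(entry["start_date_time"], TIME_FORMAT)
        end = datetime.strptime(entry["end_date_time"], TIME_FORMAT)
        if start <= local_now <= end:
            # Inside the schedule window: the schedule's limits take over.
            return entry["instance_min_count"], entry["instance_max_count"]
    # Outside every window: fall back to the default limits.
    return policy["instance_min_count"], policy["instance_max_count"]

policy = {
    "instance_min_count": 3,
    "instance_max_count": 10,
    "schedules": {
        "timezone": "Asia/Shanghai",
        "specific_date": [{
            "start_date_time": "2019-12-31T00:00",
            "end_date_time": "2020-01-01T23:59",
            "instance_min_count": 10,
            "instance_max_count": 30,
            "initial_min_instance_count": 15,
        }],
    },
}

print(effective_limits(policy, datetime(2019, 12, 31, 20, 0)))   # (10, 30)
print(effective_limits(policy, datetime(2020, 1, 2, 9, 0)))      # (3, 10)
```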

How to change the specific schedule quickly with the command line

If you need to change the schedule frequently, it is recommended to use a command-line tool such as jq to edit the policy JSON file.

For example, if you would like to replace the start_date_time and end_date_time of the original schedule, you can use the following script snippet:

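starttime="2020-12-31T00:00"
endtime="2021-01-01T23:59"
# Pass the values with --arg so jq handles the quoting; the updated policy is printed to stdout.
jq --arg starttime "$starttime" --arg endtime "$endtime" '.schedules.specific_date[0].start_date_time = $starttime | .schedules.specific_date[0].end_date_time = $endtime' policy.json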

Feel free to use other tools to edit the policy JSON file with a script.
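
As one such alternative, the same edit can be sketched in Python with only the standard library. The function names here are hypothetical, and the sketch assumes the policy file has the `schedules.specific_date` structure shown in the sample policy above.

```python
import json

def set_schedule_window(policy, start, end, index=0):
    """Return a copy of `policy` with one specific_date window replaced."""
    updated = json.loads(json.dumps(policy))   # deep copy via a JSON round trip
    entry = updated["schedules"]["specific_date"][index]
    entry["start_date_time"] = start
    entry["end_date_time"] = end
    return updated

def rewrite_policy_file(path, start, end):
    """Load the policy at `path`, update the first window, and write it back."""
    with open(path) as f:
        policy = json.load(f)
    updated = set_schedule_window(policy, start, end)
    with open(path, "w") as f:
        json.dump(updated, f, indent=2)

# Example: rewrite_policy_file("policy.json", "2020-12-31T00:00", "2021-01-01T23:59")
```

Unlike the jq command, which prints the result, `rewrite_policy_file` overwrites the file in place, so keep a backup if you need the original schedule.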

Example 3: Apply autoscaling policy for blue-green deployments

Blue-green deployment is a common practice in Cloud Foundry to update an application with zero downtime. During the update, you push a NEW “green” application and then re-route the traffic to the NEW application.

An autoscaling policy is attached to a specific Cloud Foundry application, so once you push a NEW “green” application, you need to apply the same autoscaling policy to the new application.

You can achieve this easily by using the autoscaler CLI tool:

cf autoscaling-policy BLUE_APP_NAME --output policy.json 
cf attach-autoscaling-policy GREEN_APP_NAME policy.json
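
If your deployment is driven by a script, these two commands can be wrapped in a small helper. This is only a sketch: the function names are hypothetical, it assumes the cf CLI with the autoscaler plugin is installed and logged in, and it runs exactly the two commands shown above.

```python
import subprocess

def copy_policy_commands(blue_app, green_app, policy_file="policy.json"):
    """Build the two CLI invocations that copy a policy from blue to green."""
    return [
        ["cf", "autoscaling-policy", blue_app, "--output", policy_file],
        ["cf", "attach-autoscaling-policy", green_app, policy_file],
    ]

def copy_policy(blue_app, green_app, run=subprocess.run):
    """Run the commands in order; `run` can be swapped out for a dry run."""
    for command in copy_policy_commands(blue_app, green_app):
        # check=True stops at the first failing command instead of continuing.
        run(command, check=True)

# Dry run: print the commands instead of executing them.
copy_policy("BLUE_APP_NAME", "GREEN_APP_NAME",
            run=lambda command, check: print(" ".join(command)))
```

Passing `check=True` means the policy is not attached to the green application if exporting it from the blue application fails.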

About IBM Cloud Foundry

IBM Cloud Foundry is a Cloud Foundry-certified development platform for cloud-native applications on IBM Cloud. IBM Cloud Foundry is the fastest and cheapest way to build and host a cloud-native application on IBM Cloud.

Get started and deploy your first application today.
