Scaling applications in IBM Bluemix

Optimize application performance with the Auto-Scaling add-on in Bluemix

IBM® Bluemix™ provides capabilities for optimizing application performance through both vertical and horizontal scaling. This article covers the basics of scalability and the capabilities that Bluemix offers, and it presents an example of using the Bluemix Auto-Scaling add-on for a Java™ application.

Lak Sri (laksri@us.ibm.com), Solutions Architect, Ecosystems Development, IBM

Lak Sri has a master's degree in information technology and works as a Senior Engineer and Solution Architect for Bluemix Ecosystem Development, helping business partners and nonprofit organizations adapt to cloud computing. His development experience in high-performance algorithms, enterprise integration, and mobile applications has helped him serve as a trusted advisor and solution architect for large IBM accounts. He has presented and demonstrated innovative solutions at events such as Pulse for several years.



Ruth Willenborg (rewillen@us.ibm.com), IBM Distinguished Engineer, IBM

Ruth Willenborg is a Distinguished Engineer in the IBM Software Group, Ecosystem Development. She leads the introduction of new technologies to partners, startups, and academics across the globe. Ruth has more than 25 years of experience in software development at IBM. She is coauthor of Performance Analysis for Java Web Sites (Addison-Wesley, 2002) and numerous articles on WebSphere performance, using WebSphere with virtualization technologies, cloud computing, and Bluemix.



Bala K. Vellanki (bala.vellanki@us.ibm.com), Solutions Architect, Ecosystems Development, IBM

Bala Vellanki is a solutions architect and worldwide cloud technical lead for IBM Ecosystem Development. Since joining IBM in 2005, Bala has worked as a solutions architect for various Fortune 500 clients in multiple initiatives (including cloud, smarter commerce, and mobile). Bala holds a BS degree in electronics and an MS degree in computer science.



15 August 2014

You love how easy Bluemix is for deploying your applications, but now you are thinking ahead to performance and scalability. What capabilities does Bluemix provide to help you scale your application?

Bluemix provides capabilities for optimizing the performance and scalability of your application through both vertical and horizontal scaling. This article covers the basics of scalability and the capabilities that Bluemix currently provides, and it presents an example of using the Bluemix Auto-Scaling add-on for a Java application.

Types of scaling

Bluemix includes two different methods for scaling an application: vertical scaling and horizontal scaling. Both techniques can be applied to the same application.

Vertical scaling

Vertical scaling is often referred to as scaling up. Vertical scaling increases the resources available to an application by adding capacity directly to the individual nodes — for example, by adding memory or increasing the number of CPU cores. Figure 1 illustrates the concept of vertical scaling with the addition of both memory and CPU to an application.

Figure 1. Vertical scaling
Diagram illustrating vertical scaling via addition of memory and CPU to an application

Some resource changes require a restart, which results in application downtime. Vertical scaling techniques typically improve the performance of any application, but the improvements might not be linear.

Horizontal scaling

Horizontal scaling is often referred to as scaling out. The overall application resource capacity grows through the addition of entire nodes. Each additional node adds equivalent capacity, such as the same amount of memory and the same CPU. Horizontal scaling typically is achievable without downtime. In Figure 2, which illustrates the concept of horizontal scaling, you see additional identical nodes added with a load balancer in front of the application nodes.

Figure 2. Horizontal scaling
Diagram illustrating horizontal scaling, showing additional identical nodes added with a load balancer in front of the application nodes

Bluemix provides built-in load-balancing capabilities, so you do not need to deploy or manage these capabilities. Bluemix also provides the mechanism to deploy the additional, identical nodes of your application.

Horizontal scaling often achieves near-linear scaling results, but only if your application is designed for scalability. This design is critical, and not just for the performance of your application: if your application is not designed for horizontal scaling, it might not even function correctly when multiple instances are running.

Application considerations when choosing horizontal scaling

Entire articles are devoted to best practices for designing cloud applications. For a great, practical starting point on the topic, we recommend the "Top 9 rules for cloud applications" article on developerWorks. We won't go into as much detail here, but we'll touch on a few important areas so you get an appreciation of why your application design is so important:

  • Design and develop a solution that can be readily scaled out and scaled in. For example, a long-running task in an instance can prevent the instance from shutting down cleanly when the solution must be scaled in.
  • Do not assume that the code will always be running on a specific instance. When an application is scaled horizontally, a series of requests from the same source might not be routed to the same instance.
  • Consider the end-to-end performance of your application. Scaling the application tier puts more pressure on the network and on any back-end services. Especially if your application accesses enterprise resources, make sure that the back end does not become the bottleneck as the application tier is scaled.
  • Do not assume that continuing to apply additional application resources is the best way to improve performance. If your application has inherent bottlenecks, good performance analysis and improvements in the base application will yield increased performance from every instance.

Autoscaling

Autoscaling is the process of dynamically allocating or deallocating the resources required by an application to match performance requirements and satisfy service-level agreements (SLAs). Based on the growth of traffic, an application might require additional resources to perform its tasks efficiently and effectively in a timely manner.

Autoscaling is an elastic process whereby more resources are provisioned as the load increases and deprovisioned as the demand for the resource slackens. Autoscaling is designed to optimize performance to meet the SLAs without overprovisioning resources. Autoscaling helps ease management overhead by automatically monitoring the performance of the system, and by making decisions about adding or removing resources without requiring an operator.


Bluemix manual resource scaling

The Bluemix UI and command line support both vertical and horizontal scaling: you can increase the amount of memory and increase the number of instances of an application runtime. Both techniques can be applied to the same application. Figure 3 shows manual scaling of the number of instances and the memory for the Bluemix Liberty for Java runtime.

Figure 3. Bluemix Liberty for Java runtime horizontal and vertical scaling resources
Illustration that shows manual scaling of the number of instances and memory resources in the Liberty for Java runtime
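
Figure 3 shows the UI; from the command line, the standard Cloud Foundry cf scale command performs the same manual scaling. In this sketch, appname is a placeholder for your application name and the values are illustrative; note that changing the memory limit restarts the application:

    # Vertical scaling: raise the memory limit to 1 GB (the application restarts)
    cf scale appname -m 1G

    # Horizontal scaling: run three identical instances behind the built-in load balancer
    cf scale appname -i 3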

Bluemix Auto-Scaling add-on

Bluemix provides an autoscaling add-on that's available through the catalog. Select the Add-Ons category from the list next to the catalog's search box. Choose Auto-Scaling from the available DevOps add-ons, create an instance, and "bind" the add-on to your applications.

The Bluemix Auto-Scaling add-on manages the scaling up and down of the application. The scaling decisions are made based on a set of configurable policies. The policy configuration used during our experiments is provided in the "Procedure for configuring and testing the Auto-Scaling add-on in Bluemix" section of this article. The Auto-Scaling service provides a set of default autoscaling policies, but we do not recommend relying on defaults. It's better to test and tune the policies based on performance-testing your application.

The Auto-Scaling add-on in Bluemix monitors the chosen resources against their policies and increases or decreases the number of instances. Policies are definable on CPU utilization, memory, and heap-utilization metrics. Table 1 shows the currently available autoscalable triggers by runtime. Based on our experiences, we do not recommend using a memory trigger for Java applications.

Table 1. Bluemix Auto-Scaling add-on supported resources and runtimes

  Autoscalable resource | Description                              | Supported runtimes
  CPU                   | Usage percentage of the CPU              | Java, Node.js
  JVM heap              | Usage percentage of the JVM heap memory  | Java
  Memory                | Usage percentage of the memory           | Java (not recommended), Node.js, Ruby

Bluemix provides the automation for autoscaling. Figure 4 illustrates how the autoscaling process works with the Bluemix add-on.

Figure 4. How the Bluemix Auto-Scaling add-on works
Diagram that illustrates the Bluemix autoscaling process

This process is fully automated and is transparent to the application. The metrics and scaling decisions are viewable in the Bluemix console, as we'll show in the example:

  1. The Bluemix Auto-Scaling add-on injects an agent into the runtime to capture performance data of the application process.
  2. The Bluemix Auto-Scaling add-on monitors the performance metrics against the selected policies and decides when and how many instances to scale.
  3. The Bluemix Auto-Scaling add-on requests the scaling action from the Bluemix placement controller.
  4. Based on various parameters, the Bluemix placement controller decides where to create the instances.
  5. The Bluemix placement controller issues a request and starts the instances.

Procedure for configuring and testing the Auto-Scaling add-on in Bluemix

We experimented with the Auto-Scaling add-on by using a Java application that's available for download from IBM DevOps Services. However, you should test with your own Java application to determine your policy settings. For load generation, we used Rational® Performance Tester. Here are the steps from our Java autoscaling experiments.

Step 1: Create and deploy your Java application

Create your application and deploy it to Bluemix. You can deploy applications from DevOps Services or the command line. If you don't have a Java application, you can use the sample Java application from these experiments. If you are using the command line and our sample application, follow these steps:

  1. Download BlueMixM-AutoScalingV1.0.war and save it in a directory.
  2. Navigate to that directory and run:
    cf push appname -p BlueMixM-AutoScalingV1.0.war

Note: You need to repush your application after configuring the Auto-Scaling add-on.
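
After the push completes, you can confirm from the command line that the application is running and note its initial memory allocation and instance count (appname is the placeholder used above):

    # List all applications in the current space
    cf apps

    # Show the status, memory limit, and number of running instances for the app
    cf app appname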

When a Java application is deployed onto a Liberty runtime, it runs in a JVM heap. As shown in Figure 5, a single heap per process is shared by all threads. As more requests come in to an application, the average JVM heap consumed increases. Even accounting for garbage collection, our tests show that monitoring average JVM heap is a reasonable trigger for autoscaling our application. However, the dynamics of your application heap usage can be different, so performance testing with your own application is important.

Figure 5. Heap consumed per application instance
Illustration showing that as an application receives more requests, the average JVM heap consumed increases
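
If you want more visibility into heap usage and garbage collection during your tests, one option is to pass a verbose GC option to the JVM through an environment variable. This is only a sketch: cf set-env and cf restage are standard commands, but whether your Java buildpack honors JAVA_OPTS (or another variable such as JVM_ARGS) is an assumption you should verify for your environment:

    # Ask the JVM to log garbage-collection activity (variable name depends on your buildpack)
    cf set-env appname JAVA_OPTS "-verbose:gc"

    # Re-stage the application so the new environment variable takes effect
    cf restage appname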

Step 2: Collect no-load baseline metrics for your application

Before beginning any testing, capture the baseline statistics for your application in Bluemix. As shown in Figure 6, for our test, the heap usage was 27.93 MB before any load was applied to the web application.

Figure 6. Initial JVM heap usage (no load)
Screen capture of test results for heap usage

Step 3: Run a load test for the application (without the Auto-Scaling add-on)

After capturing the baseline no-load metrics, you can start load testing. The load test should be representative of the mix of requests that you expect to be made against your site. Your test should gradually ramp up load and also capture steady-state measurements. You want to understand your throughput plateau, when response time starts to degrade, and any cliffs where performance degrades sharply.

You should run tests against a single instance, as well as test horizontal scalability of multiple instances, using manual scaling. If horizontal scaling is not providing benefits when manually applied, autoscaling will not either. Remember, you must monitor any back-end services as you scale the application tier because the back-end tier can easily become a bottleneck in horizontal scaling.

We ran our load tests with the Rational Performance Tester and Apache JMeter tools.
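
As an illustration of driving load from the command line, a JMeter test plan can be run in non-GUI mode; the test-plan and results-file names below are placeholders for your own files:

    # Run the test plan without the GUI (-n) and write raw results to a file (-l)
    jmeter -n -t autoscale-testplan.jmx -l single-instance-results.jtl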

Step 4: Collect and analyze initial load test results with a single instance (without autoscaling)

In this step, collect the metrics from your performance testing tool, Bluemix performance monitors, and any other monitoring tools you use for your application. Look carefully at throughput and response-time results. Use the data to understand how much load your application handles before it hits the throughput plateau, when response time degrades below your SLA, and when any specific response time spikes or throughput cliffs occur.

Also look for any errors in serving pages. Under high-load conditions, some applications start producing errors. Because returning an error is often faster than completing a request, response-time metrics can be artificially lowered by the errors. This is a bad situation: you have an application that isn't working, and you won't realize it if you monitor only response time and not successful results.
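
In addition to the per-page error counts from your load-test tool, the recent application logs are a quick way to spot errors that appear only under load (appname is the placeholder used earlier):

    # Dump the recent router and application log entries, then scan them for errors
    cf logs appname --recent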

Figure 7 shows the throughput from our tests using a single instance. Our throughput plateau is around 54 requests per second; however, at high load, some significant degradation occurs. Response time (shown in Figure 8) continually increases as more load is applied.

Figure 7. Throughput measurements for a single instance (without autoscaling)
Screen capture of throughput test results

Once you understand the baseline load dynamics of your application, analyze the metrics from Bluemix to see which of them correlate with patterns in your application's performance. Your most important end-user metric is likely response time, so focus on the metrics that correspond to the point where response time begins to degrade toward exceeding your SLA, so that you can scale before this occurs. You should also analyze any spikes in response time or throughput cliffs.

The potential metrics in the current Bluemix Auto-Scaling add-on that you can use as scaling triggers are Average JVM Heap, CPU, and Memory. For our application, Average JVM Heap turned out to be the best indicator of the three currently available. As you can see in Figure 8, the Average JVM Heap correlates well to the increase in average response time.

Figure 8. Average JVM heap metrics and average response time for a single instance (without autoscaling)
Screen capture of results for average JVM heap metrics and average response (single instance without autoscaling)

If none of these available metrics correlates well for your application, you can set up an external monitoring capability and use metrics such as response time to trigger scaling using the Bluemix command-line interface. External performance management capabilities are now available in a SaaS model through IBM Service Engage, making these easy to try out with your Bluemix application.
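
As a minimal sketch of that approach, the loop below measures the response time of a URL with curl and adds an instance through the cf command line when the time exceeds a threshold. The URL, threshold, and instance counts are illustrative assumptions, and a production setup would use a real monitoring service rather than a shell loop:

    #!/bin/sh
    # Hypothetical values: adjust the app name, URL, and threshold for your environment
    APP=appname
    URL=http://appname.mybluemix.net/
    THRESHOLD=2.0   # seconds

    while true; do
      # Measure the total response time of one request
      ELAPSED=$(curl -o /dev/null -s -w '%{time_total}' "$URL")

      # If response time exceeds the threshold, scale out to two instances
      if [ "$(echo "$ELAPSED > $THRESHOLD" | bc)" -eq 1 ]; then
        cf scale "$APP" -i 2
      fi

      sleep 60
    done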

Note: Performance measurements were collected on a shared development-level system. There is no guarantee that these measurements will be the same on generally available systems. Actual results can vary. You should verify the applicable data for your specific applications and environment.

Step 5: Collect and analyze initial load test results with two instances (without autoscaling)

In this step, test horizontal scalability by performing the same load test as in the previous step, but with one additional instance of your application added manually.
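
With the cf command line, adding the second instance is a single command (appname is the placeholder used earlier):

    # Manually scale out to two identical instances
    cf scale appname -i 2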

Figure 9 shows the throughput from our tests using two instances. Our throughput plateau is much higher, leveling nicely at 80-85 requests per second, and average response time stays well within our target range.

Figure 9. Throughput measurements with two instances (adding an instance manually)
Screen capture of throughput test results (two instances, adding an instance manually)

Throughput is more consistent (compare Figure 7 with Figure 9) and response time improves (compare Figure 8 with Figure 10) when horizontal scaling is applied. Similarly, you can see the Average JVM Heap stabilize (Figure 8 versus Figure 10).

Figure 10. Average JVM heap metrics and average response time for two instances (without autoscaling)
Screen capture of average JVM heap metrics and average response time (two instances without autoscaling)

Horizontal scaling of our application clearly improves overall throughput and lowers average response time. However, manual horizontal scaling requires either that a person monitor all the instances or that the added capacity be left running even when it is not needed, which can increase cost.

Therefore, let's move to the next step and use the autoscaling capability to bring an additional instance online dynamically when response time degrades under additional load, and to remove the instance when the load subsides.

Step 6: Configure the Auto-Scaling add-on in Bluemix

To configure autoscaling for your application, select the Auto-Scaling add-on from the Bluemix catalog. The add-on is now part of your application, as shown in Figure 11, and ready to be configured.

Figure 11. Auto-Scaling add-on
Screenshot of the Bluemix Auto-Scaling add-on's overview page

Click the Open Dashboard link in the Auto-Scaling add-on and configure the policy to trigger autoscaling. In the configuration, you can scale applications based on three metrics: CPU, Memory, and JVM Heap.

Different autoscaling policy configurations are available based on the metric selected. In this example, we configured a policy for the JVM Heap metric to scale out when the upper threshold goes above 50 percent of the heap and scale in when the lower threshold goes below 20 percent of the heap. These thresholds were chosen to align with response-time goals. Figure 12 shows our Auto-Scaling JVM heap policy.

Figure 12. Sample policy configuration for JVM heap
Screenshot of the Bluemix Auto-Scaling add-on's configuration page

Step 7: Perform the same load test (with autoscaling)

Perform the same load test done previously with multiple instances. This time, as you are generating load, Bluemix is monitoring the JVM heap metrics against the configured policy. If the JVM heap hits either the scale-out or scale-in metrics, instances of your application are automatically added or removed.

Step 8: Look at autoscaling results

Normally, we'd tell you to look at throughput and response-time metrics first, but we know you won't be able to resist checking whether the application was autoscaled. You can find the Scaling History on the Auto-Scaling add-on dashboard, shown in Figure 13. Based on the configured triggers, autoscaling automatically added one new instance as the load increased; as the test wound down, the additional instance was automatically removed.

Figure 13. Scaling history
Screenshot of the Auto-Scaling history page
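
You can also watch the instance count change from the command line while the test runs; this simple loop polls the application status every 30 seconds (appname is the placeholder used earlier):

    # Poll the application status; the 'instances' line shows scale-out and scale-in as they happen
    while true; do cf app appname; sleep 30; done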

If you click the Metrics Statistics tab, you see the actual JVM Heap measurements that triggered the scaling, as shown in Figure 14.

Figure 14. Average JVM heap metric statistics
Screenshot of the Metric Statistics page for the average JVM heap metric

In Figure 14, you can see that:

  • As the load grows and the heap grows above the upper threshold (as configured in the autoscaling policy), the heap spikes and instances are added as part of scale-out.
  • After scale-out, the heap drops as the load gets distributed across the instances.
  • As the tests complete and the load falls below the lower threshold configured in the autoscaling policy, instances are removed as part of scale-in.

Step 9: Review throughput (with autoscaling)

Now, let's see if the autoscaling improved the performance dynamics of the application. The throughput measurements, shown in Figure 15, have an initial throughput plateau at around 50 requests per second, with response time degrading as in the single-instance tests.

Figure 15. Throughput measurements with autoscaling
Screenshot of throughput test results with autoscaling

However, in this test, as the upper threshold of the autoscaling configuration policy (Java heap) is reached, additional instances of the application are added to achieve better performance, as shown in Figure 16.

In Figure 15, the maximum throughput reaches a second plateau of 80 requests per second as the second instance is added, and response time stabilizes back down to our target.

Figure 16. Average JVM heap metrics and average response time
Screen shot of average JVM heap metrics and average response time with autoscaling

The throughput in Figure 15 compares closely with the manual two-instance test in Figure 9, demonstrating how autoscaling achieves similar results without manual intervention. As the load decreases and the JVM heap falls below the lower threshold of the autoscaling policy, the extra instance is scaled in, because a single instance can again handle the load.


Conclusion

This article described the capabilities available in Bluemix for vertically and horizontally scaling an application. A step-by-step example showed how to configure the Bluemix Auto-Scaling add-on to automatically scale out and scale in a Java application based on JVM heap metric triggers. The Auto-Scaling add-on automatically monitors the performance of a Bluemix application, adding and removing capacity based on the metric threshold settings you select. Automatic scaling out and scaling in helps you optimize application performance relative to the resources provisioned while reducing the need for manual operational monitoring.

