This article was written using the Bluemix classic interface. Given the rapid evolution of technology, some steps and illustrations may have changed.
You love how easy Bluemix is for deploying your applications, but now you are thinking ahead to performance and scalability. What capabilities does Bluemix provide to help you scale your application?
Bluemix provides capabilities for optimizing the performance and scalability of your application through both vertical and horizontal scaling. This article covers the basics of scalability and the capabilities that Bluemix currently provides, and it presents an example of using the Bluemix Auto-Scaling add-on for a Java application.
Types of scaling
Bluemix includes two different methods for scaling an application: vertical scaling and horizontal scaling. Both techniques can be applied to the same application.
Vertical scaling is often referred to as scaling up. Vertical scaling increases the resources available to an application by adding capacity directly to the individual nodes — for example, adding additional memory or increasing the number of CPU cores. Figure 1 illustrates the concept of vertical scaling with the addition of both memory and CPU to an application.
Figure 1. Vertical scaling
Some resource changes require a restart, which results in application downtime. Vertical scaling techniques typically improve the performance of any application, but the improvements might not be linear.
Horizontal scaling is often referred to as scaling out. The overall application resource capacity grows through the addition of entire nodes. Each additional node adds equivalent capacity, such as the same amount of memory and the same CPU. Horizontal scaling typically is achievable without downtime. In Figure 2, which illustrates the concept of horizontal scaling, you see additional identical nodes added with a load balancer in front of the application nodes.
Figure 2. Horizontal scaling
Bluemix provides built-in load-balancing capabilities, so you do not need to deploy or manage these capabilities. Bluemix also provides the mechanism to deploy the additional, identical nodes of your application.
Horizontal scaling often achieves near-linear scaling results, but only if your application is designed for scalability. This design is critical, and not just for the performance of your application: If your application is not designed for horizontal scaling, your application might not even function correctly when horizontal scaling is on.
Application considerations when choosing horizontal scaling
Entire articles are devoted to best practices for designing cloud applications. For a great, practical starting point on the topic, we recommend the "Top 9 rules for cloud applications" article on developerWorks. We won't go into as much detail here, but we'll touch on a few important areas so you get an appreciation of why your application design is so important:
- Design and develop a solution that can be readily scaled out and scaled in. For example, a long-running task in an instance can prevent the instance from shutting down easily when the solution must be scaled down.
- Do not assume that the code will always be running on a specific instance. When an application is scaled horizontally, a series of requests from the same source might not be routed to the same instance.
- Consider the end-to-end performance of your application. Scaling your application resources puts more pressure on the network and on any back-end services. This matters especially if your application accesses enterprise resources: as the application tier scales out, the back end must absorb the additional load without becoming the bottleneck.
- Do not assume that continuing to apply additional application resources is the best way to improve performance. If your application has inherent bottlenecks, good performance analysis and improvements in the base application will yield increased performance from every instance.
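The point about not relying on a specific instance can be illustrated with a small simulation. The Python sketch below is hypothetical and is not Bluemix code: `Instance`, `round_robin`, and the dictionary that stands in for a shared session store are names invented for illustration. It assumes simple round-robin routing and shows why state held in one instance's memory gives wrong answers once requests are spread across instances, while externalized state does not.

```python
from itertools import cycle


class Instance:
    """One application instance (hypothetical stand-in for a runtime node)."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # Local memory is private to this instance; a shared store, if given,
        # is visible to every instance (think of an external cache or database).
        self.store = shared_store if shared_store is not None else {}

    def handle(self, session_id):
        """Count the requests seen for a session and return the running total."""
        self.store[session_id] = self.store.get(session_id, 0) + 1
        return self.store[session_id]


def round_robin(instances, session_id, requests):
    """Send `requests` requests for one session to the instances in turn."""
    picker = cycle(instances)
    return [next(picker).handle(session_id) for _ in range(requests)]


# State kept in each instance's memory: every instance sees only its own share.
local = [Instance("a"), Instance("b")]
print(round_robin(local, "user-1", 4))     # [1, 1, 2, 2]

# State kept in a shared store: the count is correct on any instance.
shared = {}
external = [Instance("a", shared), Instance("b", shared)]
print(round_robin(external, "user-1", 4))  # [1, 2, 3, 4]
```

The same reasoning applies to scale-in: an instance that holds the only copy of some state cannot be removed without losing it.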
Autoscaling is the process of dynamically allocating or deallocating the resources required by an application to match performance requirements and satisfy service-level agreements (SLAs). Based on the growth of traffic, an application might require additional resources to perform its tasks efficiently and effectively in a timely manner.
Autoscaling is an elastic process whereby more resources are provisioned as the load increases and deprovisioned as the demand for the resource slackens. Autoscaling is designed to optimize performance to meet the SLAs without overprovisioning resources. Autoscaling helps ease management overhead by automatically monitoring the performance of the system, and by making decisions about adding or removing resources without requiring an operator.
Bluemix manual resource scaling
The Bluemix UI and command line support both vertical and horizontal scaling: you can increase the amount of memory and increase the number of instances of an application runtime. Both techniques can be applied to the same application. Figure 3 shows the manual scaling of the number of instances and the memory in the Bluemix Liberty for Java runtime.
Figure 3. Bluemix Liberty for Java runtime horizontal and vertical scaling resources
Bluemix Auto-Scaling add-on
Bluemix provides an autoscaling add-on that's available through the catalog. Select the Add-Ons category from the list next to the catalog's search box. Choose Auto-Scaling from the available DevOps add-ons, create an instance, and "bind" the add-on to your applications.
The Bluemix Auto-Scaling add-on manages the scaling up and down of the application. The scaling decisions are made based on a set of configurable policies. The policy configuration used during our experiments is provided in the "Procedure for configuring and testing the Auto-Scaling add-on in Bluemix" section of this article. The Auto-Scaling service provides a set of default autoscaling policies, but we do not recommend relying on defaults. It's better to test and tune the policies based on performance-testing your application.
The Auto-Scaling add-on in Bluemix monitors the chosen resources against their policies and increases or decreases the number of instances. Policies are definable on CPU utilization, memory, and heap-utilization metrics. Table 1 shows the currently available autoscalable triggers by runtime. Based on our experiences, we do not recommend using a memory trigger for Java applications.
Table 1. Bluemix Auto-Scaling add-on supported resource and runtimes
|Autoscalable resource|Description|Supported runtime|
|CPU|Usage percentage of the CPU|Java, Node.js|
|JVM heap|Usage percentage of the JVM heap memory|Java|
|Memory|Usage percentage of the memory|Java (not recommended), Node.js, Ruby|
Bluemix provides the automation for autoscaling. Figure 4 illustrates how the autoscaling process works with the Bluemix add-on.
Figure 4. How the Bluemix Auto-Scaling add-on works
This process is fully automated and is transparent to the application. The metrics and scaling decisions are viewable in the Bluemix console, as we'll show in the example. The process works as follows:
- The Bluemix Auto-Scaling add-on injects an agent into the runtime to capture performance data of the application process.
- The Bluemix Auto-Scaling add-on monitors the performance metrics against the selected policies and decides when and how many instances to scale.
- The Bluemix Auto-Scaling add-on asks the placement controller to carry out the scaling action.
- Based on the different parameters, the Bluemix placement controller decides where to create the instances.
- The Bluemix placement controller issues a request and starts the instances.
Procedure for configuring and testing the Auto-Scaling add-on in Bluemix
We experimented with the Auto-Scaling add-on by using a Java application that's available for download from IBM DevOps Services. However, you should test with your own Java application to determine your policy settings. For load generation, we used Rational® Performance Tester. Here are the steps from our Java autoscaling experiments.
Step 1: Create and deploy your Java application
Create your application and deploy it to Bluemix. You can deploy applications from DevOps Services or the command line. If you don't have a Java application, you can use the sample Java application from these experiments. If you are using the command line and our sample application, follow these steps:
- Download BlueMixM-AutoScalingV1.0.war and save it in a directory.
- Navigate to that directory and run:
cf push appname -p BlueMixM-AutoScalingV1.0.war
Note: You need to repush your application after configuring the Auto-Scaling add-on.
When a Java application is deployed onto a Liberty runtime, it runs in a JVM, and its objects are allocated on the JVM heap. As shown in Figure 5, a single heap per process is shared by all threads. As more requests come in to an application, the average JVM heap consumed increases. Even accounting for garbage collection, our tests show that monitoring average JVM heap is a reasonable trigger for autoscaling our application. However, the dynamics of your application heap usage can be different, so performance testing with your own application is important.
Figure 5. Heap consumed per application instance
Step 2: Collect no-load baseline metrics for your application
Before beginning any testing, capture the baseline statistics for your application in Bluemix. As shown in Figure 6, for our test, the heap usage is 27.93 MB before any load test is performed on the web application.
Figure 6. Initial JVM heap usage (no load)
Step 3: Run a load test for the application (without the Auto-Scaling add-on)
After capturing the baseline no-load metrics, you can start load testing. The load test should be representative of the mix of requests that you expect to be made against your site. Your test should gradually ramp up load and also capture steady-state measurements. You want to understand your throughput plateau, when response time starts to degrade, and any cliffs where performance degrades sharply.
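As a rough illustration of a test that ramps up gradually and then holds a steady state, the following Python sketch generates a load profile as a list of simulated user counts, one per test interval. The function name and its parameters are hypothetical; the sketch is not tied to Rational Performance Tester or JMeter, which have their own ways of defining load stages.

```python
def ramp_schedule(max_users, ramp_steps, steady_steps):
    """Return the number of simulated users for each test interval.

    The load rises in equal increments for `ramp_steps` intervals, then holds
    at `max_users` for `steady_steps` intervals so that steady-state
    measurements can be captured.
    """
    if max_users < 1 or ramp_steps < 1 or steady_steps < 0:
        raise ValueError("max_users and ramp_steps must be >= 1, steady_steps >= 0")
    ramp = [round(max_users * step / ramp_steps) for step in range(1, ramp_steps + 1)]
    return ramp + [max_users] * steady_steps


print(ramp_schedule(max_users=100, ramp_steps=4, steady_steps=3))
# [25, 50, 75, 100, 100, 100, 100]
```

Smaller ramp increments make it easier to see where the throughput plateau begins, at the cost of a longer test.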
You should run tests against a single instance, as well as test horizontal scalability of multiple instances, using manual scaling. If horizontal scaling is not providing benefits when manually applied, autoscaling will not either. Remember, you must monitor any back-end services as you scale the application tier because the back-end tier can easily become a bottleneck in horizontal scaling.
We ran our load tests with Rational Performance Tester and Apache JMeter.
Step 4: Collect and analyze initial load test results with a single instance (without autoscaling)
In this step, collect the metrics from your performance testing tool, Bluemix performance monitors, and any other monitoring tools you use for your application. Look carefully at throughput and response-time results. Use the data to understand how much load your application handles before it hits the throughput plateau, when response time degrades beyond what your SLA allows, and when any specific response-time spikes or throughput cliffs occur.
Also look for any errors serving pages. Under high load conditions, some applications start producing errors. Because returning errors is often faster than completing requests, response-time metrics can be artificially lowered by the errors. This is a bad situation: You have an application that isn't working, and you don't realize it if you only monitor response time and not successful results.
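The effect of errors on response-time averages is easy to see with numbers. The Python sketch below uses made-up samples (four slow successful requests and four fast failures) and a hypothetical `summarize` helper; it is an illustration of the pitfall, not output from our tests.

```python
from statistics import mean


def summarize(samples):
    """Summarize (elapsed_ms, succeeded) samples from a load test."""
    if not samples:
        raise ValueError("no samples to summarize")
    all_times = [ms for ms, _ in samples]
    ok_times = [ms for ms, ok in samples if ok]
    return {
        "error_rate": 1 - len(ok_times) / len(samples),
        "mean_ms_all": mean(all_times),
        # Response time is only meaningful for requests that succeeded.
        "mean_ms_ok": mean(ok_times) if ok_times else None,
    }


# Hypothetical data: four slow successes and four fast failures.
samples = [(800, True)] * 4 + [(20, False)] * 4
print(summarize(samples))
```

With this data, the average over all requests is 410 ms, which looks healthy, while the successful requests actually average 800 ms and half of all requests fail. Reporting the error rate next to the response time exposes the problem.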
Figure 7 shows the throughput from our tests using a single instance. Our throughput plateau is around 54 requests per second; however, at high load, some significant degradation occurs. Response time (shown in Figure 8) continually increases as more load is applied.
Figure 7. Throughput measurements for a single instance (without autoscaling)
Once you understand the baseline load dynamics of your application, analyze the metrics from Bluemix to see which metrics correlate to patterns in your application performance. Your most important end-user metric is likely response time, so focus on metrics that track the point where response time begins its linear degradation toward your SLA limit, so that you can scale before the limit is exceeded. You should also analyze any spikes in response time, or throughput cliffs.
The potential metrics in the current Bluemix Auto-Scaling add-on that you can use as scaling triggers are Average JVM Heap, CPU, and Memory. For our application, Average JVM Heap turned out to be the best indicator of the three currently available. As you can see in Figure 8, the Average JVM Heap correlates well to the increase in average response time.
Figure 8. Average JVM heap metrics and average response time for a single instance (without autoscaling)
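One simple way to compare candidate trigger metrics is to compute how strongly each one correlates with response time over the same sampling intervals. The Python sketch below does this with a Pearson correlation coefficient. All of the sample values are invented for illustration and are not the measurements behind Figure 8; the `pearson` helper is written out by hand so that the sketch has no dependencies.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same intervals during a load test.
response_ms = [120, 150, 210, 320, 480]
heap_pct = [22, 28, 37, 49, 63]
cpu_pct = [35, 60, 42, 58, 40]

for name, series in (("heap", heap_pct), ("cpu", cpu_pct)):
    print(name, round(pearson(series, response_ms), 2))
```

A coefficient close to 1 suggests the metric rises and falls with response time and is a plausible trigger; a value near 0 suggests it is not. Correlation over one test run is only a starting point, so confirm the choice with further load tests.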
If none of these available metrics correlates well for your application, you can set up an external monitoring capability and use metrics such as response time to trigger scaling using the Bluemix command-line interface. External performance management capabilities are now available in a SaaS model through IBM Service Engage, making these easy to try out with your Bluemix application.
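A minimal sketch of such an external trigger follows, in Python. It assumes that some monitoring tool supplies an average response time and that the Cloud Foundry `cf scale APP -i N` command is used to set the instance count. The function name, the half-SLA rule for scaling in, and the instance limits are illustrative assumptions rather than Bluemix defaults. The sketch only builds the command and does not run it.

```python
def scale_command(app, current_instances, avg_response_ms, sla_ms,
                  min_instances=1, max_instances=4):
    """Return a `cf scale` argument list, or None if no change is needed.

    Scale out by one instance when the average response time exceeds the SLA;
    scale in by one when it is under half the SLA. The half-SLA rule and the
    instance limits are illustrative assumptions.
    """
    target = current_instances
    if avg_response_ms > sla_ms:
        target = min(current_instances + 1, max_instances)
    elif avg_response_ms < sla_ms / 2:
        target = max(current_instances - 1, min_instances)
    if target == current_instances:
        return None
    # `cf scale APP -i N` sets the number of running instances.
    return ["cf", "scale", app, "-i", str(target)]


print(scale_command("appname", 1, avg_response_ms=950, sla_ms=500))
# ['cf', 'scale', 'appname', '-i', '2']
```

In a real script, the returned list could be passed to `subprocess.run(cmd, check=True)` from a session that is already logged in with the `cf` command line.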
Note: Performance measurements were collected on a shared development-level system. There is no guarantee that these measurements will be the same on generally available systems. Actual results can vary. You should verify the applicable data for your specific applications and environment.
Step 5: Collect and analyze initial load test results with two instances (without autoscaling)
In this step, test horizontal scalability by repeating the load test from the previous step with one more instance of your application, added manually.
Figure 9 shows the throughput from our tests using two instances. Our throughput plateau is much higher, leveling nicely at 80-85 requests per second, and average response time stays well within our target range.
Figure 9. Throughput measurements with two instances (adding an instance manually)
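To put a number on how close this result is to linear scaling, the throughput plateaus can be turned into a speedup and a scaling efficiency. The Python sketch below uses the approximate plateaus reported in this article (about 54 requests per second for one instance, 80-85 for two); the function name is hypothetical, and because the plateau values are read from charts, the result is only a rough estimate.

```python
def scaling_efficiency(single_rps, scaled_rps, instances):
    """Return (speedup, efficiency) of a scaled deployment vs. one instance.

    Efficiency is 1.0 for perfectly linear scaling.
    """
    speedup = scaled_rps / single_rps
    return speedup, speedup / instances


# Approximate plateaus: ~54 requests/s for one instance,
# 80-85 requests/s for two instances.
for scaled in (80, 85):
    speedup, efficiency = scaling_efficiency(54, scaled, instances=2)
    print(f"{scaled} rps: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
# 80 rps: speedup 1.48x, efficiency 74%
# 85 rps: speedup 1.57x, efficiency 79%
```

An efficiency well below 100 percent is a hint to look for a shared bottleneck, such as a back-end service, before adding more instances.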
Throughput (compare Figure 7 with Figure 9) and response time (compare Figure 8 with Figure 10) are both more consistent when horizontal scaling is applied. Similarly, you can see the stabilization in Average JVM Heap (compare Figure 8 with Figure 10).
Figure 10. Average JVM heap metrics and average response time for two instances (without autoscaling)
Horizontal scaling of our application clearly improves overall throughput and lowers average response time. However, horizontal scaling via manual techniques requires that either a person monitors all the instances, or the added scaling capacity is always on even when not needed. Leaving the instances running when they are not required can increase the cost.
Therefore, let's move to the next step and look at using the autoscaling capability to bring an additional instance online dynamically when response time degrades under additional load, and to remove the instance when the load subsides.
Step 6: Configure the Auto-Scaling add-on in Bluemix
To configure autoscaling for your application, select the Auto-Scaling add-on from the Bluemix catalog, create an instance, and bind it to your application. Once bound, the add-on is part of your application, as shown in Figure 11, and ready to be configured.
Figure 11. Auto-Scaling add-on
Click the Open Dashboard link in the Auto-Scaling add-on and configure the policy to trigger autoscaling. In the configuration, you can scale applications based on three metrics: CPU, Memory, and JVM Heap.
Different autoscaling policy configurations are available based on the metric selected. In this example, we configured a policy for the JVM Heap metric to scale out when heap usage rises above the upper threshold of 50 percent and to scale in when heap usage falls below the lower threshold of 20 percent. These thresholds were chosen to align with response-time goals. Figure 12 shows our Auto-Scaling JVM heap policy.
Figure 12. Sample policy configuration for JVM heap
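The logic of a two-threshold policy like this one can be sketched in a few lines of Python. The model below is hypothetical: `HeapPolicy` and `desired_instances` are invented names, the instance limits are assumptions, and the real add-on offers more settings than are modeled here. Only the 50 percent and 20 percent thresholds come from the configuration described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeapPolicy:
    """Hypothetical model of a JVM heap scaling policy."""
    upper_pct: float = 50.0   # scale out above this average heap usage
    lower_pct: float = 20.0   # scale in below this average heap usage
    min_instances: int = 1    # assumed limit, not taken from the article
    max_instances: int = 2    # assumed limit, not taken from the article


def desired_instances(policy, current, avg_heap_pct):
    """Return the instance count the policy asks for, one step at a time."""
    if avg_heap_pct > policy.upper_pct:
        return min(current + 1, policy.max_instances)
    if avg_heap_pct < policy.lower_pct:
        return max(current - 1, policy.min_instances)
    return current  # between the thresholds: leave the application alone


policy = HeapPolicy()
print(desired_instances(policy, current=1, avg_heap_pct=62.0))  # 2
print(desired_instances(policy, current=2, avg_heap_pct=35.0))  # 2
print(desired_instances(policy, current=2, avg_heap_pct=12.0))  # 1
```

The gap between the two thresholds matters: if they were close together, the drop in heap usage that follows a scale-out could immediately trigger a scale-in, and the instance count would oscillate.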
Step 7: Perform the same load test (with autoscaling)
Perform the same load test that you ran previously. This time, as you are generating load, Bluemix is monitoring the JVM heap metrics against the configured policy. If JVM heap usage crosses either the scale-out or the scale-in threshold, instances of your application are automatically added or removed.
Step 8: Look at autoscaling results
Normally, we'd tell you to look at throughput and response time metrics first, but we know you won't be able to resist looking at whether the application was autoscaled. You can find the Scaling History on the Auto-Scaling add-on dashboard, shown in Figure 13. You can see that, based on the triggers configured, the add-on automatically added one new instance as the load increased, and as the test wound down, the additional instance was automatically removed.
Figure 13. Scaling history
If you click the Metrics Statistics tab, you see the actual JVM Heap measurements that triggered the scaling, as shown in Figure 14.
Figure 14. Average JVM heap metric statistics
In Figure 14, you can see that:
- As the load grows, heap usage rises above the upper threshold configured in the autoscaling policy, and instances are added as part of scale-out.
- After scale-out, the heap drops as the load gets distributed across the instances.
- As the tests complete, heap usage falls below the lower threshold configured in the autoscaling policy, and the instances are removed as part of scale-in.
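The pattern in these three points can be reproduced with a toy simulation. The Python sketch below replays a load profile against a deliberately simple model in which each instance idles at 10 percent heap and gains one point of heap usage per request per second that it handles. That model, the load values, and the `simulate` function are all assumptions chosen for illustration; they do not describe how the JVM or the add-on actually behaves.

```python
def simulate(load_per_step, upper=50.0, lower=20.0, max_instances=2):
    """Replay a load profile and record (load, instances, avg_heap_pct).

    Toy model (an assumption): each instance idles at 10% heap, and heap
    usage grows by one point per request/s handled by that instance.
    """
    instances = 1
    history = []
    for load in load_per_step:
        avg_heap = 10.0 + load / instances
        history.append((load, instances, avg_heap))
        # Decide the instance count for the next interval.
        if avg_heap > upper and instances < max_instances:
            instances += 1
        elif avg_heap < lower and instances > 1:
            instances -= 1
    return history


for load, instances, heap in simulate([10, 30, 50, 50, 50, 30, 10, 5, 5]):
    print(f"load={load:>2} instances={instances} heap={heap:.0f}%")
```

In this run, heap usage reaches 60 percent at a load of 50, a second instance is added, and usage drops to 35 percent at the same load. When the load falls to 10, usage drops to 15 percent and the second instance is removed.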
Step 9: Review throughput (with autoscaling)
Now, let's see if the autoscaling improved the performance dynamics of the application. The throughput measurements, shown in Figure 15, have an initial throughput plateau at around 50 requests per second, with response time degrading as in the single-instance tests.
Figure 15. Throughput measurements with autoscaling
However, in this test, as the upper threshold of the autoscaling configuration policy (Java heap) is reached, additional instances of the application are added to achieve better performance, as shown in Figure 16.
In Figure 15, the maximum throughput reaches a second plateau of 80 requests per second as the second instance is added, and response time stabilizes back down to our target.
Figure 16. Average JVM heap metrics and average response time
The throughput in Figure 15 compares closely with the manual two-instance test in Figure 9, demonstrating how autoscaling helps achieve similar results without manual intervention. As heap usage falls below the policy's lower threshold, an instance is scaled in, because the single instance can handle the load.
This article described capabilities available in Bluemix for vertically and horizontally scaling an application. A step-by-step example showed how to configure the Bluemix Auto-Scaling add-on to automatically scale out and scale in a Java application based on JVM heap metric triggers. The Auto-Scaling add-on automatically monitors the performance of a Bluemix application, adding and removing capacity based on the metric threshold settings selected. Automatic scaling out and scaling in helps to optimize your application performance relative to the resources provisioned and the operational monitoring requirements.
Resources
- Auto-Scaling in IBM Bluemix: Watch this video presentation to learn more about autoscaling in Bluemix.
- "Top 9 rules for cloud applications" (Kyle Brown and Mike Capern, developerWorks, April 2014): Learn how following some simple rules in your application design can make your existing applications cloud-ready without needing full reimplementation.
- "Build highly scalable applications with Node.js and Redis" (Ryan Baxter, developerWorks, June 2014): See how to build a chat application and scale the app across multiple instances in Bluemix to handle the load.
- Performance Analysis for Java Websites (Stacy Joines, Ruth Willenborg, and Ken Hygh, Addison-Wesley, 2002): Check out this guide for enterprise website developers and quality assurance teams.
- Cloud Architecture Patterns (Bill Wilder, O'Reilly Media, 2012): Learn 11 architectural patterns that can help you take advantage of cloud platform services.
- Programming for PaaS (Lucas Carlson and Doug Baldwin, O'Reilly Media, 2013): Understand the PaaS model from a developer perspective.
- Cloud Computing Patterns (Christoph Fehling et al., Springer, 2014): Learn about proven practices and recent academic advances in cloud computing and about the differences among available cloud offerings.