May 30, 2017 | Written by: Frederic Lavigne
Categorized: Compute Services | Trending
Our journey to compare compute options started with a post describing the choices and the introduction of a simple Fibonacci microservice. In the second post, we tested how the microservice would behave after a crash.
How about scalability? As your app gains users and traffic, how do you scale its components to handle the additional load? Scaling a solution is not a simple task. Do I need to add a new node? Should I add more memory or CPU? Is there some tuning I can do in the code? Can this scaling happen automatically? How and when do I scale down? Again, there is no single answer. And when you build a system with multiple components, identifying which component to scale, and how, becomes even more complex.
To highlight the (auto)scaling capabilities, we need to generate some load on our microservice. Several tools and online services are available to achieve this; we are going to use Apache JMeter. Apache JMeter is open-source software: a Java application designed to load-test functional behavior and measure performance. With JMeter, you define a test plan describing the requests you want to run against the service you are testing. You can simulate multiple users calling your service, and JMeter has reporting options to measure throughput and response time.
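As a quick illustration, a test plan saved as a .jmx file can be run from the command line in non-GUI mode, which is recommended for load tests (the file names below are hypothetical, not taken from the project):

```shell
# Run a JMeter test plan in non-GUI mode
#   -n : non-GUI mode
#   -t : path to the test plan (.jmx)
#   -l : file to log sample results to
jmeter -n -t fibonacci-test-plan.jmx -l results.jtl

# Afterwards, generate an HTML dashboard report from the results
#   -g : results file to read
#   -o : output directory for the report (must not already exist)
jmeter -g results.jtl -o report/
```

The generated dashboard includes throughput and response-time graphs similar to the ones shown in this post.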
The GitHub project comes with JMeter scripts to test the behavior of Cloud Foundry, Kubernetes and OpenWhisk. The scripts require the JMeter Plugins Manager and the 3 Basic Graphs and 5 Additional Graphs plugin sets to be installed.
View the JMeter scripts in GitHub
Cloud Foundry – instance scaling
The JMeter script for our Cloud Foundry application simulates 100 users. Each user calls the Fibonacci service, computing numbers for 10ms, in a continuous loop.
If you run this script against your deployment of the Fibonacci service, you will quickly notice that the app reaches a maximum number of requests per second. To those looking at the code right now and suggesting improvements, or a runtime other than Node.js, to increase the number of requests: yes, maybe, I get your point, but keep in mind the implementation is written this way to illustrate the capabilities of the compute options. Some implementation choices were deliberately made to force this behavior.
Now we can add more instances to our application. Cloud Foundry makes it easy either through the console or manually with:
cf scale fibonacci-service -i 4
The impact is immediate. As soon as new instances become available, the number of requests per second increases.
When the load decreases, we can scale down the instances from the console or with:
cf scale fibonacci-service -i 1
The manual option is handy if you are monitoring the service throughput and want to scale it interactively. Another option is the Auto-Scaling service in Bluemix. This service enables you to automatically increase or decrease the compute capacity of your application: the number of application instances is adjusted dynamically based on the Auto-Scaling policy you define.
The Auto-Scaling service can be configured with a rule to automatically add instances if the throughput reaches a threshold.
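Such a policy is expressed as a JSON document attached to the application. The sketch below is illustrative only: the field names and values are assumptions based on the general shape of Auto-Scaling policies, not the exact schema, so check the service documentation before using it:

```json
{
  "instanceMinCount": 1,
  "instanceMaxCount": 5,
  "policyTriggers": [
    {
      "metricType": "Throughput",
      "statWindow": 120,
      "lowerThreshold": 10,
      "upperThreshold": 50,
      "instanceStepCountUp": 2,
      "instanceStepCountDown": 1
    }
  ]
}
```

Read as: keep between 1 and 5 instances; if throughput averaged over 120 seconds exceeds 50 requests per second, add 2 instances; if it drops below 10, remove 1.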
For further details, these two blog posts, here and there, give an in-depth look at the Auto-Scaling service.
Kubernetes – pod and cluster scaling
For Kubernetes, the JMeter script simulates 50 users. Each user calls the Fibonacci service, computing numbers for 1000 iterations, in a continuous loop.
The attentive reader will have noticed that this script differs from the Cloud Foundry one. It does. The goal of this post is to showcase the scalability capabilities of each compute option. To make this easy to observe for Kubernetes, the Fibonacci deployment is configured with CPU limits, which are reached quickly under the above JMeter script.
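For reference, CPU limits live in the container spec of the deployment. A minimal sketch follows; the image name and the values are illustrative, not taken from the project:

```yaml
apiVersion: extensions/v1beta1   # Deployment API group in common use at the time
kind: Deployment
metadata:
  name: fibonacci-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: fibonacci
    spec:
      containers:
      - name: fibonacci
        image: registry.example.com/fibonacci:latest  # placeholder image
        resources:
          requests:
            cpu: 100m   # guaranteed share: 0.1 of a CPU core
          limits:
            cpu: 200m   # hard cap: the container is throttled beyond 0.2 core
```

With a cap like this, the load test saturates each pod quickly, which makes the effect of adding replicas easy to see on the throughput graph.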
Once the limits are reached, adding pod replicas will increase our throughput:
kubectl scale -f fibonacci-deployment.yml --replicas=4
Like Cloud Foundry, Kubernetes has a mechanism to automatically scale pods: the Horizontal Pod Autoscaler. It works by monitoring metrics and adjusting the number of replicas to match a target average CPU utilization.
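An autoscaler targeting CPU can be created with a single command (the target percentage and replica bounds below are arbitrary examples):

```shell
# Keep average CPU utilization around 80%, scaling the deployment
# between 1 and 10 replicas
kubectl autoscale deployment fibonacci-deployment \
  --cpu-percent=80 --min=1 --max=10

# Inspect current utilization and replica counts
kubectl get hpa
```

As the JMeter load pushes CPU utilization above the target, the autoscaler adds replicas; when the load stops, it scales back down toward the minimum.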
In addition to the built-in CPU-based scaling, Kubernetes supports custom metrics. The documentation on this is still light, but I found someone who managed to implement a requests-per-second custom metric.
The Horizontal Pod Autoscaler (HPA) is only part of the scaling strategy for Kubernetes. As you increase the number of pods, you will likely need to increase the overall capacity of your cluster by adding nodes; new nodes mean more room for pods.
OpenWhisk – designed to scale
Serverless platforms are designed to scale with the load. You don’t have to plan the capacity of the environment; it should handle whatever load you throw at it. Of course, behind the scenes, the platform vendor has planned the environment capacity and deployed mechanisms to scale it, but for you, the developer, this is transparent.
Take the OpenWhisk JMeter script. It simulates up to 150 users. Each user calls the Fibonacci service, computing numbers for 100ms, in a continuous loop. Watch the number of requests per second increase as more users hit the Fibonacci action.
And we did not have to manually scale the system or configure an autoscaling rule; OpenWhisk takes care of this. Be aware, though, of the OpenWhisk system limits, such as the maximum number of activations that may be submitted per namespace per minute, and the number of concurrent activations per namespace (executing or queued for execution).
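You can also exercise the action directly from the wsk CLI. The action name and parameter below are assumptions for illustration; adapt them to your deployment:

```shell
# Invoke the action synchronously and print only its result
#   --blocking : wait for the activation to complete
#   --result   : print just the result payload
#   --param    : pass a parameter to the action
wsk action invoke fibonacci --blocking --result --param iterations 1000

# In another terminal, watch activations arrive in real time
# as JMeter ramps up the load
wsk activation poll
```

Polling the activation feed while the load test runs is a simple way to see the platform absorbing the additional traffic without any scaling configuration on your part.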
What will you choose?
Each compute option can scale manually or automatically based on metrics (memory, CPU, throughput, or response time, depending on the platform you pick). Obviously this is only one aspect of scaling a system. Depending on the problem you are addressing, adding more nodes may be a better option than adding raw CPU or memory, and fine-tuning the runtime and your code might be enough in some cases to deliver better performance. As usual, there is no one-size-fits-all answer, but understanding the available options is the first step toward making conscious decisions.
If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter: @L2FProd.