How to scale my microservice with Cloud Foundry, Containers and OpenWhisk?

Our journey to compare compute options started with a post describing the choices and introducing a simple Fibonacci microservice. In the second post, we tested how the microservice behaves after a crash.

How about scalability? As your app gets more users and traffic, how do you scale its components to handle the additional load? Scaling a solution is not a simple task. Do I need to add a new node? Should I add more memory or CPU? Is there some tuning I can do in the code? Can this scaling happen automatically? How and when do I scale down? Again, there is no single answer. And when you build a system with multiple components, identifying which component to scale, and how, becomes even more complex.

Apache JMeter

To highlight the (auto)scaling capabilities, we need to generate some load on our microservice. Several tools and online services can do this; we are going to use Apache JMeter, an open source Java application designed to load test functional behavior and measure performance. With JMeter, you define a test plan that describes the requests you want to run against the service under test. You can simulate multiple users calling your service, and JMeter's reporting options let you measure throughput and response time.
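
JMeter is the right tool for the job, but the core idea of a closed-loop test plan is simple enough to sketch. The Python snippet below is a minimal illustration, not a replacement for JMeter: each thread plays one virtual user that calls a URL back to back, with no think time, until a deadline, and the function reports how many requests succeeded or failed and the resulting requests per second. The function name run_closed_loop and its parameters are hypothetical.

```python
import threading
import time
import urllib.request


def run_closed_loop(url, users, duration_s, timeout_s=5.0):
    """Simulate `users` virtual users that call `url` back to back, with no
    think time, until `duration_s` seconds have elapsed.

    Returns (successful_requests, failed_requests, requests_per_second).
    """
    deadline = time.monotonic() + duration_s
    lock = threading.Lock()
    counts = {"ok": 0, "failed": 0}

    def virtual_user():
        # Each thread loops roughly like one JMeter user in an infinite-loop thread group.
        while time.monotonic() < deadline:
            try:
                with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                    resp.read()
                outcome = "ok"
            except Exception:  # HTTP errors, timeouts, refused connections
                outcome = "failed"
            with lock:
                counts[outcome] += 1

    start = time.monotonic()
    threads = [threading.Thread(target=virtual_user) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return counts["ok"], counts["failed"], counts["ok"] / elapsed
```

A call such as run_closed_loop("http://localhost:3000/", users=100, duration_s=30) would approximate a 100-user loop against a local deployment; the URL is a placeholder, not the actual route of the Fibonacci service. JMeter adds what this sketch lacks: ramp-up, assertions, and the graphs used throughout this post.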

The GitHub project comes with JMeter scripts to test the behavior of Cloud Foundry, Kubernetes and OpenWhisk. The scripts require the JMeter Plugins Manager, with the 3 Basic Graphs and 5 Additional Graphs plugins installed.

 


Cloud Foundry – instance scaling

The JMeter script for our Cloud Foundry application simulates 100 users. Each user calls the Fibonacci service, asking it to compute Fibonacci numbers for 10 ms, in a non-stop loop.

If you run this script against your deployment of the Fibonacci service, you will quickly notice that the app reaches a maximum number of requests per second. To those looking at the code right now and suggesting improvements, or a runtime other than Node.js, to increase the number of requests: yes, maybe, I get your point. Keep in mind that the implementation is deliberately written this way to illustrate the capabilities of each compute option, so some implementation choices were made to exaggerate the behavior.

Now we can add more instances to our application. Cloud Foundry makes this easy, either through the console or from the command line with:

cf scale fibonacci-service -i 4

The impact is immediate. As soon as new instances become available, the number of requests per second increases.
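
Why does throughput plateau, and why do new instances help right away? A back-of-the-envelope model makes it concrete. In a closed-loop test with no think time, Little's law ties the number of users N, the throughput X and the response time R together as N = X * R. The sketch below assumes that each instance is CPU-bound and serves one request at a time, so a 10 ms computation caps one instance at roughly 100 requests per second. The function name and that per-instance capacity are illustrative assumptions, not measurements of the Fibonacci service.

```python
def closed_loop_estimate(users, service_time_s, instances, capacity_per_instance):
    """Rough steady-state estimate for a closed-loop load test with no think time.

    Returns (throughput in requests/s, response time in seconds).
    """
    # Without any queueing, every user could complete 1 / service_time_s requests per second.
    offered = users / service_time_s
    # The service cannot process more than its aggregate capacity.
    throughput = min(offered, instances * capacity_per_instance)
    # Little's law: users = throughput * response_time, so extra users only add waiting time.
    response_time = users / throughput
    return throughput, response_time


# Assumption: one CPU-bound 10 ms request at a time per instance, i.e. ~100 req/s.
per_instance = 1 / 0.010
for n in (1, 4):
    rps, latency = closed_loop_estimate(100, 0.010, n, per_instance)
    print(f"{n} instance(s): ~{rps:.0f} req/s, ~{latency * 1000:.0f} ms per request")
```

Under these assumptions, 100 users saturate one instance at about 100 requests per second with about one second of latency, while four instances push the ceiling to about 400 requests per second and cut latency to about 250 ms. Real numbers will differ, but the shape matches what the JMeter graphs show: a plateau, then a step up when instances are added.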

When the load decreases, we can scale down the instances from the console or with:

cf scale fibonacci-service -i 1

The manual option is handy if you are monitoring the service throughput and want to scale it interactively. Another option is the Auto-Scaling service in Bluemix, which automatically increases or decreases the compute capacity of your application. The number of application instances is adjusted dynamically based on the Auto-Scaling policy you define.

The Auto-Scaling service can be configured with a rule to automatically add instances if the throughput reaches a threshold.
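
The Auto-Scaling service is configured through its policy rather than through code, but the decision a throughput rule makes at each evaluation is easy to picture. Here is a minimal sketch of a threshold rule with an upper bound, a lower bound and instance limits. ThresholdPolicy and next_instance_count are hypothetical names, and this is not the Bluemix Auto-Scaling service's API or its exact algorithm; a real policy typically also waits out a cooldown period so it does not flap between sizes.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPolicy:
    upper: float           # scale out when the metric goes above this value
    lower: float           # scale in when the metric drops below this value
    step: int = 1          # instances added or removed per decision
    min_instances: int = 1
    max_instances: int = 10


def next_instance_count(policy: ThresholdPolicy, current: int, metric: float) -> int:
    """Return the instance count a simple threshold rule would request."""
    if metric > policy.upper:
        target = current + policy.step
    elif metric < policy.lower:
        target = current - policy.step
    else:
        target = current  # inside the band: leave the app alone
    # Never go outside the configured bounds.
    return max(policy.min_instances, min(policy.max_instances, target))


# Hypothetical rule: scale out above 80 req/s per instance, scale in below 20 req/s.
policy = ThresholdPolicy(upper=80, lower=20, max_instances=4)
print(next_instance_count(policy, current=1, metric=95))
```

Evaluated repeatedly while the JMeter load runs, a rule like this adds one instance per breach until the per-instance throughput falls back inside the band or the maximum is reached.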

For further details, two earlier blog posts give an in-depth look at the Auto-Scaling service.

Kubernetes – pod and cluster scaling

For Kubernetes, the JMeter script simulates 50 users. Each user calls the Fibonacci service, asking it to compute 1000 iterations, in a non-stop loop.

The attentive reader will have noticed that this script differs from the Cloud Foundry one. It does. The goal of this post is to showcase the scalability capabilities of each compute option. To make this easy to observe with Kubernetes, the Fibonacci deployment is configured with CPU limits, which the above JMeter script reaches quickly.

Once the limits are reached, adding pod replicas will increase our throughput:

kubectl scale -f fibonacci-deployment.yml --replicas=4

Like Cloud Foundry, Kubernetes has a mechanism to scale pods automatically: Horizontal Pod Autoscaling. It monitors metrics and adjusts the number of replicas to match a target average CPU utilization.
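
The core of the autoscaler's calculation is documented by Kubernetes: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when that ratio is within a tolerance of 1.0 (0.1 by default). The Python function below reproduces only that formula plus the min/max bounds; the real controller also accounts for pod readiness, missing metrics and a scale-down stabilization window, which this sketch ignores. The function name is hypothetical. In practice you would create the autoscaler with something like kubectl autoscale deployment <name> --cpu-percent=50 --min=1 --max=10.

```python
import math


def hpa_desired_replicas(current_replicas, current_cpu, target_cpu,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by the HPA formula for a CPU utilization target.

    current_cpu and target_cpu are average utilizations, in percent of the
    pods' CPU requests (for example 90 and 50).
    """
    ratio = current_cpu / target_cpu
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Two pods averaging 90% of their CPU request with a 50% target: 2 * 1.8 = 3.6, so 4 pods.
print(hpa_desired_replicas(2, 90, 50))
```

Rounding up with ceil is what makes the autoscaler err on the side of extra capacity, and the tolerance keeps it from resizing the deployment for small fluctuations around the target.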

In addition to the built-in CPU-based scaling, Kubernetes supports custom metrics. The documentation on this is very light, but I found someone who managed to implement a requests-per-second custom metric.

The Horizontal Pod Autoscaling (or HPA for short) is only part of the strategy for Kubernetes. As you increase the number of pods, you will likely need to increase the overall capacity of your cluster by adding nodes – new nodes mean more space for pods.
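
How many nodes is "enough"? The scheduler places pods based on their resource requests, and a pod cannot be split across nodes, so a rough first estimate counts how many whole pods fit on one node. This Python sketch assumes identical nodes, considers CPU only, and treats a fixed amount of CPU per node as already taken by system pods; the numbers in the example are made up, and the real scheduler also weighs memory, affinity rules and other constraints.

```python
import math


def nodes_needed(replicas, pod_cpu_request_m, node_allocatable_m, reserved_m=0):
    """Estimate worker nodes for `replicas` identical pods, CPU only.

    All values are in millicores (1000m = 1 CPU).
    """
    usable = node_allocatable_m - reserved_m
    pods_per_node = usable // pod_cpu_request_m  # only whole pods fit on a node
    if pods_per_node < 1:
        raise ValueError("a single pod does not fit on one node")
    return math.ceil(replicas / pods_per_node)


# Hypothetical: 500m per pod, 2-CPU nodes with 300m already used by system pods.
print(nodes_needed(10, 500, 2000, reserved_m=300))
```

With those hypothetical numbers, only three pods fit per node, so ten replicas need four nodes even though ten times 500m is just 5 CPUs.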

OpenWhisk – designed to scale

Serverless platforms are designed to scale with the load. You don’t have to plan the capacity of the environment; it should handle the load you throw at it. Of course, behind the scenes, the platform vendor has planned the environment capacity and put mechanisms in place to scale it. But for you, the developer, this is transparent.

Take the OpenWhisk JMeter script. It simulates up to 150 users. Each user calls the Fibonacci service, asking it to compute Fibonacci numbers for 100 ms, in a non-stop loop. Watch the number of requests per second increase as more users hit the Fibonacci action.

We did not have to scale the system manually or configure an autoscaling rule; OpenWhisk takes care of this. Be aware, though, of the OpenWhisk system limits, such as the maximum number of activations that can be submitted per namespace per minute, and the maximum number of concurrent activations per namespace, whether executing or queued for execution.
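
When a load generator or another service calls an action in a tight loop, it can be worth staying under those limits on the client side rather than discovering them through rejected requests. Below is a minimal Python sketch of such a guard: it allows at most per_minute activations in any sliding 60-second window and at most max_concurrent activations in flight. The class ActivationThrottle is hypothetical, counting "per minute" with a sliding window is my assumption, and the actual limit values depend on your OpenWhisk deployment, so check them rather than hard-coding numbers.

```python
import collections
import time


class ActivationThrottle:
    """Client-side guard for two per-namespace limits: at most `per_minute`
    activations started in any 60-second window, and at most
    `max_concurrent` activations in flight at once."""

    def __init__(self, per_minute, max_concurrent, clock=time.monotonic):
        self.per_minute = per_minute
        self.max_concurrent = max_concurrent
        self.clock = clock  # injectable so the window can be tested without sleeping
        self.recent = collections.deque()  # start times inside the current window
        self.in_flight = 0

    def try_acquire(self):
        """Return True and record an activation if both limits allow it."""
        now = self.clock()
        # Forget activations that started 60 seconds ago or earlier.
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()
        if len(self.recent) >= self.per_minute or self.in_flight >= self.max_concurrent:
            return False
        self.recent.append(now)
        self.in_flight += 1
        return True

    def release(self):
        """Call once an activation has completed."""
        self.in_flight = max(0, self.in_flight - 1)
```

A caller would invoke the action only when try_acquire() returns True, and call release() once the activation completes; when it returns False, backing off briefly is usually better than retrying immediately.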

What will you choose?

Each compute option can scale manually or automatically based on metrics (memory, CPU, throughput or response time, depending on the platform you pick). Obviously, this is only one aspect of scaling a system. Depending on the problem you are addressing, adding more nodes may be a better option than adding raw CPU or memory. In some cases, fine-tuning the runtime and your code might be enough to deliver better performance. As usual, there is no one-size-fits-all answer, but understanding the available options is the first step toward making informed decisions.

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter: @L2FProd.

Offering Manager - IBM Cloud
