
My microservice has crashed, now what?


In a previous post, we defined a simple microservice to compute Fibonacci numbers. We then deployed it as a Cloud Foundry application, as an action in OpenWhisk, and as a container in a Kubernetes cluster. Now it is time to use these deployed Fibonacci services to compare the compute options.

Let’s start with the case where a microservice crashes. It could be because of a bug in the code (it happens!), an out-of-memory error (that too), or a hardware failure of the underlying platform (yes, there are still servers running your code somewhere!).

How does the platform where the microservice runs react? Does it help you keep the service available? With Cloud Foundry, Kubernetes, and OpenWhisk, the answer is yes. Want some proof? Here it comes!

Testing the Fibonacci services

Our Fibonacci service exposes these API methods:

  • given an iteration number n, return the Fibonacci number at position n in the sequence;
  • given a duration t in milliseconds, compute the Fibonacci sequence for that duration and return the last value computed and the iteration number reached;
  • and a special endpoint that crashes the microservice.
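To make the first two operations concrete, here is a minimal sketch of the compute logic, assuming plain Node.js, BigInt arithmetic, the convention F(0) = 0 and F(1) = 1, and hypothetical function names `fibonacciAt` and `fibonacciFor`. It is not the code from the repository, which may index the sequence differently.

```javascript
// Hypothetical helpers mirroring the two compute endpoints of the Fibonacci service.
// BigInt keeps large Fibonacci numbers exact.

// Return the Fibonacci number at position n, assuming F(0) = 0 and F(1) = 1.
function fibonacciAt(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("n must be a non-negative integer");
  }
  let previous = 0n; // F(i)
  let current = 1n; // F(i + 1)
  for (let i = 0; i < n; i++) {
    [previous, current] = [current, previous + current];
  }
  return previous;
}

// Iterate the sequence until durationMs has elapsed and report how far it got.
function fibonacciFor(durationMs) {
  const deadline = Date.now() + durationMs;
  let previous = 0n; // F(iteration)
  let current = 1n; // F(iteration + 1)
  let iteration = 0;
  while (Date.now() < deadline) {
    [previous, current] = [current, previous + current];
    iteration++;
  }
  return { value: previous, iteration };
}
```

Note that `JSON.stringify` cannot serialize a BigInt, so an HTTP handler built on these helpers would need to convert the value with `String(value)` before returning it.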

The GitHub repository hosting the Fibonacci service code also includes a web application to test this failure scenario. It lets you register Fibonacci service endpoints. Once the endpoints are registered, the web tester pings each endpoint every second and displays each ping together with its result.

If you want to follow along, you can deploy your own Fibonacci services by going through the manual instructions or by using the semi-automated toolchain:

Deploy the Fibonacci service project

Register your endpoints

If you have deployed the Fibonacci service to all compute options, register your endpoints with the web tester.

Open the deployed Cloud Foundry application. The URL should look like https://fibonacci-service-<random-route>. Click the Add this service to the web tester link.

This redirects your web browser to a pre-deployed web tester and adds your endpoint to the list. The web tester saves the endpoint configuration in your web browser's local storage. Nothing is stored on the server, so only you can see the registered endpoints.

Do the same for the Kubernetes service. With the free Kubernetes cluster, the URL looks like http://cluster-public-ip:30080.

Finally, do the same for the OpenWhisk action. Its URL contains your organization name and the Bluemix space where the action was deployed.
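Keeping the endpoint list entirely in the browser could look like the following sketch. The storage key `fibonacci-endpoints`, the endpoint fields, and the helper names are hypothetical, not the web tester's actual code. The helpers only rely on the `getItem` and `setItem` methods of the Web Storage API, so they accept `window.localStorage` in a browser or the small in-memory stand-in defined at the end.

```javascript
// Hypothetical key under which the endpoint list is stored.
const STORAGE_KEY = "fibonacci-endpoints";

// Read the registered endpoints; an empty list if nothing was saved yet.
function loadEndpoints(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  return raw ? JSON.parse(raw) : [];
}

// Add an endpoint unless its URL is already registered, then persist the list.
function registerEndpoint(storage, name, url) {
  const endpoints = loadEndpoints(storage);
  if (!endpoints.some((endpoint) => endpoint.url === url)) {
    endpoints.push({ name, url, enabled: true });
    storage.setItem(STORAGE_KEY, JSON.stringify(endpoints));
  }
  return endpoints;
}

// In-memory stand-in implementing the two Web Storage methods used above.
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => {
      items.set(key, String(value));
    },
  };
}
```

Because nothing leaves the browser, clearing the site data removes the registered endpoints, which matches the "only you can see them" behavior described above.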

Ping the services

Once you have registered the three endpoints, start the ping loop.

Each green checkmark corresponds to a successful call to the endpoint (HTTP status code 200). The gray dots correspond to calls still waiting for a reply. You can stop the loop, change the interval between two pings, clear the results, and edit, disable, or delete individual endpoints. And of course, you can also call the crash endpoint.
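The ping loop can be approximated with a few lines of JavaScript. This is a hypothetical sketch rather than the web tester's implementation: it assumes a `fetch`-compatible function (the global `fetch` in browsers and Node.js 18+), reports a pending state before each call, and treats anything other than HTTP 200, including network errors, as a failure.

```javascript
// Map one ping outcome to the state shown by the web tester.
// status is undefined while the request is in flight,
// and null when the request failed before any HTTP response arrived.
function classifyPing(status) {
  if (status === undefined) return "pending"; // gray dot
  return status === 200 ? "success" : "failure"; // green checkmark / red cross
}

// Ping every endpoint now and then every intervalMs, reporting each state
// through onResult(url, state). Returns a function that stops the loop.
function startPingLoop(endpoints, intervalMs, onResult, fetchFn = fetch) {
  const pingAll = () => {
    for (const url of endpoints) {
      onResult(url, classifyPing(undefined));
      fetchFn(url).then(
        (response) => onResult(url, classifyPing(response.status)),
        () => onResult(url, classifyPing(null))
      );
    }
  };
  pingAll();
  const timer = setInterval(pingAll, intervalMs);
  return () => clearInterval(timer);
}
```

Every ping is independent, so a slow or crashed endpoint never delays the pings sent to the others.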

Crash the services

In all three deployments of the service, the crash endpoint is implemented as process.exit(1). This effectively kills the Node.js process where the service is running.
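In a plain Node.js HTTP server, such an endpoint might look like the sketch below. The `/crash` path, the JSON body returned for other paths, and the `createHandler` factory are hypothetical, not the repository's code. The `exit` function defaults to `process.exit` and is injectable only so the handler can be exercised without actually killing the process.

```javascript
// Build a request handler; exit defaults to process.exit but can be replaced in tests.
function createHandler(exit = (code) => process.exit(code)) {
  return (req, res) => {
    if (req.url === "/crash") {
      // Terminate the Node.js process with a non-zero code.
      // No response is sent, so the caller simply sees the connection drop.
      exit(1);
      return;
    }
    // Any other path answers normally.
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ status: "ok" }));
  };
}
```

To serve it, pass the handler to `http.createServer(createHandler())` and listen on `process.env.PORT`, the port Cloud Foundry assigns to the application.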

With the ping loop running, click the Crash button in the web tester.

The black checkmark marks the moment you clicked Crash. Let the ping loop run for a bit. Some failures (red crosses) should be reported, and after a few seconds all endpoints should be back to green checkmarks, as shown in the image above.

Cloud Foundry automatically restarts the microservice

When we call the crash endpoint, our Cloud Foundry application quits.

  • Subsequent pings fail;
  • Cloud Foundry detects that the application has ended;
  • it kills the container where it was running;
  • it starts a new instance;
  • and makes it available to the outside world.
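Cloud Foundry performs these steps at the container level, and how it does so is internal to the platform. As an analogy only, the detect-and-restart idea can be sketched as a tiny process supervisor built on Node.js `child_process.spawnSync`. The `superviseSync` helper is hypothetical, and it stops on a clean exit, a policy choice that a real platform may not share.

```javascript
const { spawnSync } = require("child_process");

// Run a command, and run it again each time it exits abnormally
// (non-zero status or killed by a signal), at most maxRestarts times.
// Returns the exit status of every run, in order.
function superviseSync(command, args, maxRestarts) {
  const statuses = [];
  for (let run = 0; run <= maxRestarts; run++) {
    const result = spawnSync(command, args, { stdio: "inherit" });
    statuses.push(result.status);
    if (result.status === 0) break; // clean exit: nothing to restart
    if (run < maxRestarts) {
      console.log(`instance exited (status ${result.status}), starting a new one`);
    }
  }
  return statuses;
}
```

For example, `superviseSync(process.execPath, ["-e", "process.exit(1)"], 2)` starts a Node.js child that exits immediately, restarts it twice, and returns `[1, 1, 1]`. A real supervisor would be asynchronous and would also wait between restarts; the synchronous version just keeps the loop easy to read.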

You can observe this sequence in the application logs in the Bluemix console.

This is the standard behavior of an application container in Cloud Foundry. You can learn more about the application container lifecycle in the Cloud Foundry documentation.

Kubernetes also restarts the microservice

Kubernetes behaves similarly. The microservice crashes, and subsequent pings fail until Kubernetes brings the microservice back.

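This works because Kubernetes also watches over our microservice. The pod created by our Deployment uses the default restart policy, Always, so when the Node.js process exits, the kubelet restarts the container. On top of that, the Deployment manages a ReplicaSet, which ensures that a specified number of pod “replicas” are running at any given time and replaces pods that disappear, for example when a node fails. We set this number to 1 in the Fibonacci service deployment file: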

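apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: fibonacci-deployment
spec:
  replicas: 1                  # number of pod replicas to keep running
  template:
    metadata:
      labels:
        run: fibonacci
    spec:
      containers:
      - name: fibonacci-container
        image: ""
        imagePullPolicy: Always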

To better understand the automatic restart, read about pod restart policies and ReplicaSets in the Kubernetes documentation.
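Conceptually, a ReplicaSet works like a reconciliation loop: it compares the desired number of replicas with the pods that actually exist and creates or deletes pods to close the gap. The toy functions below illustrate only that comparison. The pod naming scheme and the action objects are hypothetical, and the real controller in Kubernetes is considerably more involved.

```javascript
// Toy reconciliation step: compare the desired replica count with the pods
// that currently exist and return the actions needed to close the gap.
function reconcile(desiredReplicas, existingPods) {
  const actions = [];
  for (let i = existingPods.length; i < desiredReplicas; i++) {
    actions.push({ type: "create" });
  }
  for (const pod of existingPods.slice(desiredReplicas)) {
    actions.push({ type: "delete", pod });
  }
  return actions;
}

// Apply the actions to a list of pod names, numbering new pods from nextId.
function applyActions(pods, actions, nextId) {
  let result = [...pods];
  for (const action of actions) {
    if (action.type === "create") {
      result.push(`fibonacci-${nextId++}`);
    } else {
      result = result.filter((name) => name !== action.pod);
    }
  }
  return result;
}
```

With `replicas: 1`, losing the only pod means `reconcile(1, [])` asks for one `create`, while scaling to two replicas means `reconcile(2, ["fibonacci-1"])` asks for one more.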

OpenWhisk is highly available by default

The OpenWhisk results are a bit different. You don’t see any failures other than the call to the crash endpoint itself. High availability is inherent to the platform, at no extra cost or configuration: one failing call does not impact the next. The invoker where the call was running fails, but other invokers for your actions handle the next call. The OpenWhisk documentation has a good comparison of compute options around this topic.
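A toy model helps show why only the crashing call fails. In the sketch below, each invocation is routed round robin to an invoker that is still healthy, and a crash only takes down the invoker it ran on. The `createDispatcher` helper and the invoker names are hypothetical; this mirrors the behavior described above and is not OpenWhisk's actual scheduling algorithm.

```javascript
// Toy model: route each invocation to a healthy invoker, round robin.
// An action that throws marks its invoker as failed; later calls go elsewhere.
function createDispatcher(invokerNames) {
  const healthy = new Set(invokerNames);
  let next = 0;
  return function invoke(action) {
    const candidates = invokerNames.filter((name) => healthy.has(name));
    if (candidates.length === 0) {
      return { ok: false, error: "no healthy invoker" };
    }
    const invoker = candidates[next++ % candidates.length];
    try {
      return { ok: true, invoker, result: action() };
    } catch (err) {
      healthy.delete(invoker); // only this invoker is affected
      return { ok: false, invoker, error: String(err && err.message) };
    }
  };
}
```

Calling `invoke` with an action that throws returns a single failure, and the very next call is served by another invoker, which matches the lone red cross seen for OpenWhisk.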

Avoid downtime in Cloud Foundry and Kubernetes with multiple instances

In our current configuration, we only have one instance of our microservice.

This explains the downtime and the errors (red crosses) we observe while Cloud Foundry or Kubernetes spawns a new application container instance. To avoid this downtime, both platforms let you run multiple instances of the microservice.

With Cloud Foundry, navigate to the application dashboard and add an instance. You can also set the number of instances in manifest.yml before pushing your application.

With Kubernetes, from the service directory, scale the deployment with the command:
kubectl scale --replicas=2 -f fibonacci-deployment.yml
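Before running the experiment, a back-of-the-envelope simulation shows what to expect. The model is hypothetical and deliberately simple: one ping arrives per tick, a router sends it round robin to the next instance that is up, the crashed instance stays down for `restartTicks` ticks, and requests already in flight on the crashing instance are ignored.

```javascript
// Count failed pings when instance 0 crashes at crashTick and needs
// restartTicks ticks to come back, for a given number of instances.
function countFailedPings({ instances, totalTicks, crashTick, restartTicks }) {
  const downUntil = new Array(instances).fill(-1); // first tick each instance is up again
  let nextInstance = 0;
  let failures = 0;
  for (let tick = 0; tick < totalTicks; tick++) {
    if (tick === crashTick) downUntil[0] = crashTick + restartTicks;
    // Look for an instance that is up, starting from the round-robin position.
    let served = false;
    for (let attempt = 0; attempt < instances; attempt++) {
      const candidate = (nextInstance + attempt) % instances;
      if (tick >= downUntil[candidate]) {
        nextInstance = (candidate + 1) % instances;
        served = true;
        break;
      }
    }
    if (!served) failures++;
  }
  return failures;
}
```

With `instances: 1`, every ping during the restart window fails; with `instances: 2`, the router always finds a healthy instance. The real platforms can still fail a few requests that were in flight on the crashing instance, which this model leaves out.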

Now going back to the web tester, let’s clear the results, restart our ping loop and inject a crash.

The results are self-explanatory:

With two instances or replicas, if one goes down, the other can still serve requests. A few concurrent requests being handled by the crashing instance may fail (as observed with our Cloud Foundry application), but overall the availability of the service increases.

We only scratched the surface of high availability

With our simple service and the ping loop, we highlighted one important feature of cloud platforms when it comes to keeping an application running. Obviously, this is only part of the answer to providing high availability. Multiple instances in the same region, as configured here, are one step towards high availability, but what if the region or datacenter becomes unavailable? Failing over to another region addresses this issue. If you want to find out more about this approach, the corresponding practice on the IBM Cloud Garage Method website is a good starting point.
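Production setups usually implement regional failover in DNS or a global load balancer, but the idea is easy to illustrate on the client side: try the primary region first, and move on to the next one when it does not answer with HTTP 200. The `fetchWithFailover` helper and the region URLs in the example are hypothetical, and the sketch expects a `fetch`-compatible function.

```javascript
// Try each regional endpoint in order and return the first HTTP 200 response.
// Throws if every region fails, listing why each attempt failed.
async function fetchWithFailover(urls, fetchFn = fetch) {
  const errors = [];
  for (const url of urls) {
    try {
      const response = await fetchFn(url);
      if (response.status === 200) return { url, response };
      errors.push(`${url}: HTTP ${response.status}`);
    } catch (err) {
      errors.push(`${url}: ${err.message}`);
    }
  }
  throw new Error(`all regions failed (${errors.join("; ")})`);
}
```

For example, calling it with `["https://fibonacci.region-a.example.com", "https://fibonacci.region-b.example.com"]` returns the region-b response when region-a is unreachable.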

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter: @L2FProd.

Offering Manager - IBM Cloud
