Apps may be composed of many microservices—save your admins headaches by enabling proper distributed tracing
In a cloud-native application, multiple microservices collaborate to deliver the expected functionality. If you have hundreds of services, how do you debug an individual request as it travels through a distributed system? For Java enterprise developers, the Eclipse MicroProfile OpenTracing specification makes it easier.
In our reference implementation, our storefront application is composed of several independent microservice applications, as depicted below:
To ensure transactions are delivered reliably and thus keep the database data consistent, the microservices shown above coordinate updates through the open source message queue RabbitMQ. To demonstrate the need for microservices-specific tracing, let’s assume RabbitMQ has a problem and has stopped working. If standard logs were written only to the server hosting each service, finding the source of the problem could require the admin to traipse through logs server by server, delaying problem resolution:
With distributed tracing, it’s much easier to find where the problem lies and resolve it (e.g., by restarting the server, allocating more disk space, etc.).
With a proper distributed tracing system in place, you have important clues to help you debug the problematic services. Fortunately for Java enterprise developers, you can easily enable it in your MicroProfile application without any explicit code, thanks to MicroProfile OpenTracing.
This blog series is based on my team’s experience with migrating from Spring Boot-based microservices to MicroProfile, an optimized enterprise Java programming model for a microservices architecture. Both projects are hosted on GitHub: a Spring Boot version and a MicroProfile version of our reference application.
Monitoring your microservices app in Spring Boot
A commonly used distributed tracing tool for Spring Cloud is Spring Cloud Sleuth. Adding trace information to log statements helps us debug transaction flows. Using the trace IDs and span IDs provided by Spring Cloud Sleuth, the log statements identify the individual traces across the existing microservices and allow developers and admins to troubleshoot the cause.
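To make the idea of correlating logs by trace ID concrete, here is a minimal sketch in plain Java. It is not Spring Cloud Sleuth's API: the `LogLine` class, the bracketed prefix format, the service name `web`, and the sample messages are all assumptions made up for illustration. It only shows why a shared trace ID lets you pull one request's log lines out of the combined output of several services.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Simplified, hypothetical model of trace correlation; not Spring Cloud Sleuth's actual API.
class TraceLogSketch {

    // One log line tagged with the IDs a tracing library would attach to it.
    static final class LogLine {
        final String service;  // which microservice wrote the line
        final String traceId;  // shared by every hop of one request
        final String spanId;   // unique to one unit of work within the trace
        final String message;

        LogLine(String service, String traceId, String spanId, String message) {
            this.service = service;
            this.traceId = traceId;
            this.spanId = spanId;
            this.message = message;
        }

        @Override
        public String toString() {
            return "[" + service + "," + traceId + "," + spanId + "] " + message;
        }
    }

    // Keep only the lines that belong to one request, whichever service wrote them.
    static List<LogLine> forTrace(List<LogLine> allLogs, String traceId) {
        return allLogs.stream()
                .filter(line -> line.traceId.equals(traceId))
                .collect(Collectors.toList());
    }

    // Made-up log lines from three services handling two interleaved requests.
    static List<LogLine> sampleLogs() {
        List<LogLine> logs = new ArrayList<>();
        logs.add(new LogLine("web", "t1", "s1", "GET /catalog"));
        logs.add(new LogLine("catalog", "t1", "s2", "loading items"));
        logs.add(new LogLine("web", "t2", "s3", "GET /orders"));
        logs.add(new LogLine("inventory", "t1", "s4", "stock lookup failed"));
        return logs;
    }

    public static void main(String[] args) {
        // Print every line of request t1 in the order it was logged.
        forTrace(sampleLogs(), "t1").forEach(System.out::println);
    }
}
```

Filtering on `t1` returns the lines from `web`, `catalog`, and `inventory` for that one request and drops the unrelated `t2` line, which is the kind of lookup a trace ID makes possible across servers.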
Optionally, you can integrate Zipkin with Spring Cloud Sleuth to add more information to the traces. Add Zipkin as a dependency, enable the Zipkin server using the @EnableZipkinServer annotation, and provide the required configuration in the .properties file. The integration is simple and gives good insight into the transactions and communication between the microservices in our application.
To enable custom tracing in your MicroProfile application, you need an implementation of the OpenTracing Tracer interface. In WebSphere Liberty, a Zipkin tracer implementation is available as a user feature. To download and install this feature, add the following to your pom.xml:
<!-- Plugin to install opentracing zipkin tracer -->
Once the MicroProfile OpenTracing feature (mpOpenTracing-1.0) is enabled, distributed tracing is on by default for all the JAX-RS methods in your application. You can further customize the traces using the @Traced annotation and an ActiveSpan object to retrieve messages.
Now, let’s consider the Catalog service in our reference implementation. This is a sample snippet that gives you an idea of how we defined custom traces in our sample application:
.buildSpan("Grabbing messages from Messaging System")
// Return a default fallback list
Defining custom traces is straightforward. @Traced includes an option to disable the default tracing: set its value parameter to false. You can also name your spans using the operationName parameter, and you can define customized traces with a tracer object obtained through the @Inject annotation.
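The effect of the value and operationName parameters can be illustrated with a small, self-contained sketch. To keep it compilable without the MicroProfile jars, it declares a local stand-in annotation with the same two members; a real application would import `org.eclipse.microprofile.opentracing.Traced` instead. The `CatalogResource` class, its method names, and the `spanNameFor` helper are hypothetical, and the fallback span name is a simplification of what an actual runtime generates.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class TracedSketch {

    // Local stand-in with the same two members as MicroProfile's @Traced.
    // Assumption: declared here only so the sketch compiles on its own.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Traced {
        boolean value() default true;       // false switches tracing off
        String operationName() default "";  // custom span name, if any
    }

    // Hypothetical resource class showing the three cases.
    static class CatalogResource {
        @Traced(operationName = "catalog.getInventory")
        public String getInventory() { return "items"; }

        @Traced(false)
        public String health() { return "UP"; }

        public String getById() { return "item"; }
    }

    // Decide which span name a tracing runtime could use for a method,
    // or return null when the annotation disables tracing.
    static String spanNameFor(Class<?> type, String methodName) {
        Method method;
        try {
            method = type.getMethod(methodName);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
        Traced traced = method.getAnnotation(Traced.class);
        if (traced != null && !traced.value()) {
            return null;                    // tracing explicitly disabled
        }
        if (traced != null && !traced.operationName().isEmpty()) {
            return traced.operationName();  // custom name wins
        }
        // Simplified default; real runtimes build a longer name.
        return type.getSimpleName() + "." + methodName;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"getInventory", "health", "getById"}) {
            System.out.println(name + " -> " + spanNameFor(CatalogResource.class, name));
        }
    }
}
```

In this sketch, the method annotated with `@Traced(false)` yields no span name, the one with an operationName uses that name, and the unannotated method falls back to a default, which mirrors the three behaviors described above.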
Now that you are all set to use distributed tracing in your application, what does the result look like? Let’s take a look at the Zipkin traces recorded for an actual problem in our storefront app. In this example, I’ve intentionally terminated the RabbitMQ service. When this happens, the updated stock is not passed to the inventory service, so the MySQL database is not synchronized. A clear indicator of a problem is the trace depth of “3” instead of the expected “6” from a completed transaction sequence:
If the admin doesn’t have logging that indicates the cause (i.e., RabbitMQ is down), the problem is hard to pinpoint because, from the user’s point of view, the web interface is (almost) working.
Below are the Zipkin trace details for the sequence above, including a log of the HTTP 500 error that interrupts the workflow of the application:
Compare this to the trace of the same transaction during normal operation:
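The depth comparison mentioned above (3 observed versus 6 expected) can be checked mechanically. The following is a rough sketch, not something Zipkin provides: it assumes each span records the ID of its parent, and it uses made-up span IDs (`span-1`, `span-2`, ...) arranged in a simple chain rather than the real spans from our storefront app.

```java
import java.util.HashMap;
import java.util.Map;

class TraceDepthSketch {

    // parentOf maps each span ID to its parent span ID; the root span maps to null.
    // Returns the number of levels on the longest path from the root to any span.
    static int depth(Map<String, String> parentOf) {
        int max = 0;
        for (String span : parentOf.keySet()) {
            int levels = 0;
            // Walk up toward the root, counting one level per span.
            for (String current = span; current != null; current = parentOf.get(current)) {
                levels++;
            }
            max = Math.max(max, levels);
        }
        return max;
    }

    // A trace that stops short of the expected depth suggests an interrupted call chain.
    static boolean looksTruncated(Map<String, String> parentOf, int expectedDepth) {
        return depth(parentOf) < expectedDepth;
    }

    // Build a hypothetical trace in which every span calls exactly one child.
    static Map<String, String> chain(int length) {
        Map<String, String> parentOf = new HashMap<>();
        String parent = null;
        for (int i = 1; i <= length; i++) {
            String id = "span-" + i;
            parentOf.put(id, parent);
            parent = id;
        }
        return parentOf;
    }

    public static void main(String[] args) {
        System.out.println("complete trace depth: " + depth(chain(6)));
        System.out.println("interrupted trace depth: " + depth(chain(3)));
        System.out.println("interrupted looks truncated: " + looksTruncated(chain(3), 6));
    }
}
```

A check like this could feed an alert, but the expected depth depends on the transaction type, so treat the threshold of 6 as specific to the sequence described here.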
You can speed problem resolution with proper distributed tracing
In a large organization, your applications may be composed of hundreds of microservices—save your SREs headaches by enabling proper distributed tracing. It will speed problem resolution by simplifying the effort to isolate a failure to a specific microservices application and providing important context surrounding the problem. MicroProfile OpenTracing does (most of) the hard work for you.
This blog post excerpts the code and specifications for distributed tracing in our sample application using MicroProfile OpenTracing. All the code is available on GitHub. The microservices in our simple storefront application can be run individually using Maven: you can import them and run them as-is locally. You can also run them on IBM Cloud and IBM Cloud Private.