Mobile and cloud development have dramatically changed the way companies operate and connect with customers. Enterprises increasingly seek back-end solutions that are easy to scale, ready for cloud and mobile deployments, and rapidly produced.
The Extreme Blue team from the IBM lab in RTP, NC, was challenged to develop an entire backend in Node.js for IBM Passes. The team successfully built it in 40% less time than required by an alternative Java solution, while offering the same functionality. The team also performed comprehensive performance tests that demonstrated easier scalability and better hardware utilization of the Node.js backend (compared to Java).
This article introduces the key features and advantages of Node.js used with MongoDB. You will learn which types of solutions are best suited to Node.js, as well as the advantages and disadvantages of running Node in production.
IBM Passes is a solution built around Apple Passbook, which was released by Apple in Fall 2012. Apple Passbook is an application that allows users to conveniently store in one location boarding passes, movie tickets, retail coupons, loyalty cards, and other materials for customer engagement (referred to as passes). Apple released the application together with specifications for passes. However, the company did not provide a way for businesses to create and manage these passes. IBM Passes addresses this problem. It features a RESTful web service that provides an API that other applications can consume. This service interacts with Apple's Push Notification Service via IBM PushWorks and handles analytics using IBM WebSphere® Analytics Platform.
Figure 1. IBM Pass builder interface
The current implementation of the IBM Passes backend uses Java servlets to access DB2. With this solution, JSON documents are stored as strings in DB2's relational database. Every transaction that involves these JSON documents therefore requires serialization and deserialization, which adds overhead. Furthermore, querying properties of these documents becomes more complex as nesting grows deeper.
An alternative to Java: a Node.js and MongoDB technology stack
Our project focused on an alternative implementation of the RESTful backend to the IBM Passes solution. We used a completely different technology stack: Node.js and MongoDB. This enabled us to replace the Java implementation while still supporting the frontend client. Our implementation provides the same set of APIs as the Java implementation. As a result, it is interchangeable as a backend to the Pass builder client.
In addition to recreating the functionality of the Java version, we conducted extensive performance testing on the two applications. Furthermore, the technology stack we used allowed us to implement the core functionality in three weeks of development time, compared with the Java team's five weeks (two weeks saved out of five, a 40% reduction in time to value). We chose this technology stack for several reasons:
- Apple Passbook specification (strictly JSON)
- RESTful web service with JSON as the primary data exchange format
Why we chose Node.js
Table 1. Key technology concepts used for IBM Passes
|Pass builder frontend||Passes backend|
Node contains features that support the workloads of the modern web: small and frequent structured data exchanges. This makes Node perfect for systems of engagement. In other words, it is an ideal candidate for applications that involve exchanges of information rather than computational processes. Technically speaking, this means choosing Node for I/O-bound applications rather than CPU-bound ones. IBM Passes is a perfect example of such an application. The only computationally expensive operations are cryptographic signing and compression (zipping). The rest of the application revolves around the exchange of JSON data and image resources.
Node differs from most web application runtimes in the way it handles concurrency. Rather than using threads to accomplish concurrency, Node relies on an event-driven loop running in a single process. Node uses an asynchronous (non-blocking) model, whereas Java servlet applications typically follow a synchronous (blocking) model in which each request occupies a thread until it completes. To clarify the key differences between these two concepts, consider the following restaurant metaphor:
Think of a web application as a restaurant and its incoming requests as customers making orders. An asynchronous application would allow a single waiter to serve multiple customers at once. The waiter would serve the customers as orders are completed. In the down time, the same waiter would handle new orders from customers.
A synchronous "restaurant" would dedicate a single waiter to a single customer from the start of that customer's order to its completion. As a result, the restaurant needs as many waiters as customers. Furthermore, there would be times when waiters stand idle even though other work could be done.
The difference illustrated by this metaphor is simply the mechanism used to achieve concurrency. Java accomplishes it with threads, while Node uses an event loop. As a result, Java incurs additional overhead from context switching between threads. In other words, more threads mean more time spent switching contexts and less time doing work on incoming requests. This makes scaling a Java application more expensive.
Node supports asynchronous I/O, based on events such as the completion of a file read operation or a database query. These events are handled with callback functions, which enable the application to proceed while I/O is being performed. The event loop handles these events. For more on the event loop, refer to "Node.js for Java developers," a developerWorks article by Andrew Glover.
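The waiter metaphor can be sketched in a few lines of JavaScript. This is a minimal illustration rather than code from IBM Passes: the function names, customer names, and delays are hypothetical, and setTimeout stands in for real non-blocking I/O such as a database query.

```javascript
// A single "waiter" (the event loop) takes every order without waiting
// for the kitchen. setTimeout stands in for non-blocking I/O; all names
// and delays here are hypothetical.
const log = [];

function takeOrder(customer, kitchenMs, onServed) {
  log.push(`order taken: ${customer}`);
  // Register a callback and return immediately; the waiter is free again.
  setTimeout(() => {
    log.push(`served: ${customer}`);
    onServed();
  }, kitchenMs);
}

function runRestaurant(orders) {
  return new Promise((resolve) => {
    let pending = orders.length;
    for (const [customer, kitchenMs] of orders) {
      takeOrder(customer, kitchenMs, () => {
        pending -= 1;
        if (pending === 0) resolve(log);
      });
    }
  });
}

// Both orders are taken before either is served, and the quicker dish
// (bob's) is served first even though alice ordered first.
const done = runRestaurant([["alice", 30], ["bob", 10]]);
done.then((entries) => console.log(entries.join("\n")));
```

Both orders are accepted back to back on one thread, and each is completed whenever its simulated I/O finishes. A thread-per-request design would instead hold one thread for the full duration of each order.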
Why we chose MongoDB
MongoDB was a great solution for our problem domain because, as mentioned, Apple's Passbook specification is highly reliant on JSON. In fact, the passes themselves are described as JSON objects. As a result, storing them as-is with MongoDB is a better approach than reducing the semi-complex JSON structure to multiple relations in a relational database (such as DB2).
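The contrast between the two storage styles can be illustrated without a database. The sketch below is plain JavaScript and makes several assumptions: the pass document is a pared-down, hypothetical sample, and getByPath is a hypothetical helper that only imitates the dotted-path addressing a document store such as MongoDB offers for nested fields.

```javascript
// Hypothetical, pared-down pass document for illustration only.
const pass = {
  serialNumber: "abc-123",
  description: "Sample coupon",
  coupon: {
    primaryFields: [{ key: "offer", label: "Any purchase", value: "20% off" }],
  },
};

// Document-style access: the nested structure is kept as-is and a field
// is addressed by a dotted path such as "coupon.primaryFields.0.value".
// (Hypothetical helper; it mimics dot notation, it is not a driver call.)
function getByPath(doc, path) {
  return path.split(".").reduce(
    (node, key) => (node == null ? undefined : node[key]),
    doc
  );
}

// String-column storage, as described for the Java/DB2 version: the
// whole document must be parsed before any nested property can be read.
const storedAsString = JSON.stringify(pass);
function readFromStringColumn(column, path) {
  return getByPath(JSON.parse(column), path);
}

console.log(getByPath(pass, "coupon.primaryFields.0.value"));
console.log(readFromStringColumn(storedAsString, "coupon.primaryFields.0.value"));
```

Both calls return the same value, but the string-column version pays for a full parse on every read, and the database itself cannot see inside the stored string to filter on a nested property.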
We wanted to compare our Node implementation of IBM Passes to its Java version. We decided to benchmark the two applications' end-to-end performance, with the metric of interest being the response times of HTTP requests made to the various API endpoints. The main purpose of this approach was to capture the performance of the entire stack behind each solution. By keeping variables such as operating system and network the same for both applications, we aimed for an objective comparison. Moreover, we believed that end-to-end response times reflect the actual user experience better than more granular analysis, such as timing database queries or individual functions.
To benchmark both applications, we wrote a Node wrapper around ApacheBench (ab), an HTTP benchmarking command-line tool. From there, we created a benchmarking framework that ran against generic test definitions and generated results. ApacheBench gave us the mean response time, its standard deviation, the number of requests completed per second, the number of unsuccessful (non-200) responses, the number of server errors, and the number of timeouts. The input parameters were concurrency level, number of requests, and request configuration. We added test definitions and setup functions to this framework and proceeded to benchmark our application.
With the tooling in place, we turned to the test environment, which is of equal, if not greater, importance given the intended scientific approach to this process. We decided to use OpenStack to mimic a cloud environment. We configured identical instances for the Node application, the Java application, and the load generation application. Each instance was given a 2.4 GHz quad-core (virtual) processor, 8 GB of (virtual) RAM, and 80 GB of storage. Figure 2 shows the topology of the OpenStack deployment used.
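A wrapper of this kind has two core pieces: building the command line and parsing the report. The following is a minimal sketch under stated assumptions, not the team's actual framework. The function names and the URL are hypothetical, the sample report is hand-written with made-up numbers, and its layout is based on the plain-text report that the ab tool typically prints. The -n (total requests) and -c (concurrency) flags are standard ab options.

```javascript
// Build the argument list for the `ab` command-line tool.
// -n is the total number of requests, -c the concurrency level.
function buildAbArgs({ url, requests, concurrency }) {
  return ["-n", String(requests), "-c", String(concurrency), url];
}

// Pull a few metrics out of ab's plain-text report. A metric is null
// when its line does not appear in the report.
function parseAbReport(report) {
  const num = (pattern) => {
    const match = report.match(pattern);
    return match ? Number(match[1]) : null;
  };
  return {
    requestsPerSecond: num(/^Requests per second:\s+([\d.]+)/m),
    meanTimePerRequestMs: num(/^Time per request:\s+([\d.]+) \[ms\] \(mean\)/m),
    failedRequests: num(/^Failed requests:\s+(\d+)/m),
    non2xxResponses: num(/^Non-2xx responses:\s+(\d+)/m),
  };
}

// Hand-written sample report with made-up numbers (not a measurement).
const sampleReport = [
  "Complete requests:      100",
  "Failed requests:        0",
  "Requests per second:    200.00 [#/sec] (mean)",
  "Time per request:       50.000 [ms] (mean)",
  "Time per request:       5.000 [ms] (mean, across all concurrent requests)",
].join("\n");

// Hypothetical endpoint URL.
const args = buildAbArgs({
  url: "http://localhost:3000/passes",
  requests: 100,
  concurrency: 10,
});
console.log("ab " + args.join(" "));
console.log(parseAbReport(sampleReport));
```

In a real wrapper, the argument list could be handed to Node's child_process.execFile, and the captured standard output passed to the parser.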
Figure 2. OpenStack deployment topology
These instances were configured with the same base image (Ubuntu Server 12.04 64-bit). Furthermore, only software necessary for running and monitoring the applications was installed. For runtime configuration, we used the default configuration for Node (the same defaults that the Chrome V8 engine relies on).
We consulted the Java team regarding runtime configuration of the JRE and used default configurations for the servlet container and DB2. We acknowledge that further configuration could potentially improve the performance of the Java back end. However, the same is true for the Node runtime, and tuning either one misses the point. Our purpose is to show that the Node implementation meets the performance criteria and can be scaled horizontally rather than vertically, in addition to facilitating agile development.
Benchmarking jobs were queued manually via the load generator instance's web interface, or automatically as a result of changes to the code base. Benchmarking proceeded within the OpenStack environment. We defined a benchmarking job as the execution of all test suites (such as groups of API endpoint runs) given the parameters of branch name, concurrency level, and total requests to make. Within a benchmark, a fixed set of API endpoints was tested by the benchmarking tool. These tests were run sequentially (that is, without overlapping traffic). Furthermore, benchmarks (groups of tests) were run sequentially for the same reason.
Our test case demonstrated that the Node and MongoDB implementation had faster response times when the concurrency level exceeded 50. We attribute this to the I/O-bound nature of the Passes application. Below a concurrency level of 50, the Java application was faster because, under the reduced load, the workload shifted from I/O-bound to CPU-bound. At higher concurrency levels, the single Node instance clearly outperformed the Java/DB2 instance for every API call.
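The sequential execution described above can be sketched as follows. This is an illustrative outline, not the team's framework: runJob, the job fields, the endpoint paths, and the branch name are hypothetical, and fakeRunTest is a stand-in that records when each test starts and ends instead of generating real traffic.

```javascript
// Run every endpoint test of a benchmarking job one after another, so
// that no two tests generate overlapping traffic. `runTest` is a
// hypothetical function that benchmarks one endpoint and returns a promise.
async function runJob(job, runTest) {
  const results = [];
  for (const endpoint of job.endpoints) {
    // await inside the loop: the next test starts only after this one ends.
    const result = await runTest(endpoint, job.concurrency, job.totalRequests);
    results.push({ endpoint, ...result });
  }
  return { branch: job.branch, results };
}

// Stand-in for a real benchmark run: records start and end to show ordering.
const timeline = [];
function fakeRunTest(endpoint) {
  timeline.push(`start ${endpoint}`);
  return new Promise((resolve) =>
    setTimeout(() => {
      timeline.push(`end ${endpoint}`);
      resolve({ ok: true });
    }, 5)
  );
}

// Hypothetical job definition: branch name, concurrency level, total requests.
const job = {
  branch: "master",
  concurrency: 50,
  totalRequests: 1000,
  endpoints: ["/passes", "/templates"],
};
const finished = runJob(job, fakeRunTest);
finished.then(() => console.log(timeline.join(" -> ")));
```

Because each test is awaited before the next one begins, the timeline never shows two tests in flight at once. Queuing whole jobs through the same kind of loop would keep benchmarks from overlapping as well.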
Figure 3. Performance outcomes (counting “faster” endpoints per application)
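The tally behind a chart of this kind is simple to compute. The sketch below assumes each application's results are available as a map from endpoint to mean response time in milliseconds. The function name, the endpoint paths, and the numbers are placeholders for illustration only, not measurements from our benchmarks.

```javascript
// Count, per application, the endpoints on which it had the lower mean
// response time. Ties are not counted for either side.
function countFasterEndpoints(nodeMs, javaMs) {
  const tally = { node: 0, java: 0 };
  for (const endpoint of Object.keys(nodeMs)) {
    if (!(endpoint in javaMs)) continue; // compare only shared endpoints
    if (nodeMs[endpoint] < javaMs[endpoint]) tally.node += 1;
    else if (javaMs[endpoint] < nodeMs[endpoint]) tally.java += 1;
  }
  return tally;
}

// Placeholder numbers, NOT measurements from the article.
const nodeMs = { "/a": 10, "/b": 30, "/c": 20 };
const javaMs = { "/a": 15, "/b": 25, "/c": 20 };
console.log(countFasterEndpoints(nodeMs, javaMs)); // { node: 1, java: 1 }
```

Running the same tally once per concurrency level gives one pair of counts per level, which is the shape of data a "faster endpoints" chart needs.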
To compare the hardware utilization of the Node and Java implementations, we needed to profile the instances that the two applications were running on. For this purpose, we used two popular statistics-collection applications: CollectD and StatsD. Installing CollectD on our instances introduced negligible overhead and made CPU, memory, and disk usage visible. We used StatsD on the monitoring instance to collect these statistics. Graphite was used on the same instance to view the time series data as graphs in a dashboard.
We were able to correlate each benchmark with time series hardware utilization data. While we have average usage values for memory and CPU usage during the benchmarking of each endpoint, we found it more valuable to show general trends. Figure 4 shows these metrics over the time period of a long-running benchmark. The Node application used a lower percentage of the CPU and less memory on average. The low CPU usage might be interpreted as a result of the I/O-bound nature of the application.
Figure 4. Performance results of hardware utilization benchmarking
While Node might not be a panacea for all the challenges of the modern web, it is well suited to the demands of systems of engagement. Node was designed specifically for I/O-bound applications and frequent exchanges of information. Its lightweight runtime enables agile development and immediate iteration. As demonstrated in our test case of IBM Passes, Node led to a 40% reduction in time to value while allowing us to double the traffic served with half the servers (in comparison to the Java implementation). Thus, Node delivered the same functionality as Java, and it outperformed Java in terms of rapid development and hardware utilization.
Special thanks to our mentors, whose wisdom and experience guided us throughout our internship: Joshua A. Alger, Andy Dingsor, Curtis M. Gearhart, Christopher Hambridge, and Todd Kaplinger. A big thank you to Ross Grady, RTP Lab Manager for IBM Extreme Blue, whose patience and push for constant improvement ensured our success. We would also like to express our gratitude to Jeff Jagoda for his Node.js experience and invaluable feedback.
- IBM Pass builder comprehensive introduction and video: Learn more about the IBM Pass builder interface.
- "Use Node.js as a full cloud environment development stack" (developerWorks): Embrace the concurrency model using asynchronous I/O via callbacks, and build a chat server.
- "Node.js for Java developers" (developerWorks): Lightweight, event driven I/O for your web apps.
- "Node.js beyond the basics" (developerWorks): Rapid web application development for the cloud.
- "Will Node.js be as popular as Ruby on Rails?" (developerWorks): Build event-driven dynamic webservers with Node.
- "Explore node.js" (developerWorks): Build more scalable web applications.
- At the developerWorks Mobile development site, try the latest tools and technologies for mobile application developers in the comprehensive IBM MobileFirst product portfolio. Explore our free software downloads and cloud trials, sample code, expert how-to advice, videos, training, and discussion.
- developerWorks Java technology site: Find hundreds of articles about every aspect of Java programming.