Developing mobile apps with Node.js and MongoDB, Part 2: Hints and tips

Avoiding the pitfalls of using an alternative stack

Get implementation details on using Node.js (server-side JavaScript), rather than Java technology, to develop systems of engagement. In this article, the IBM Extreme Blue team that developed a RESTful backend application using Node.js and MongoDB shares its thought process and recommendations. For an introduction to the team's goals and challenges, start with Part 1 of this series.

Zach Cross, IBM Extreme Blue intern, IBM

Zach Cross holds a B.S. in Computer Science from the University of North Carolina (UNC) at Chapel Hill. He is currently pursuing a graduate degree in Computer Science at UNC. Zach's expertise lies in software engineering and systems, especially network protocols. He also has a strong interest in distributed systems and GPGPU programming. He enjoys biking, reading, and cooking in his spare time. He completed an IBM Extreme Blue internship in August 2013.



Aga Pochec, IBM Extreme Blue intern, IBM

Aga Pochec has dual master's degrees in International Business from the Warsaw School of Economics and the prestigious Community of European Management Schools and International Companies (CEMS) program in Europe. She is currently completing her Master of Business Administration at Harvard Business School in Boston and is an Extreme Blue intern at IBM. Aga is an experienced project manager with over five years of business intelligence and consulting experience spanning Europe and Asia Pacific. Her expertise lies in marketing, strategy, and business development. She loves traveling and exploring new cultures and cuisines in her free time. She completed an IBM Extreme Blue internship in August 2013.



Daniel Santiago, IBM Extreme Blue intern, IBM

Daniel Santiago is currently a computer engineering student at the University of Puerto Rico, Mayagüez. He is passionate about programming and interested in mobile, backend, and web development. Daniel likes working with emerging technologies such as Node.js and MongoDB. In his free time, he enjoys cooking, running, and assembling gaming computers. He completed an IBM Extreme Blue internship in August 2013.



Divit Singh, IBM Extreme Blue intern, IBM

Divit Singh is currently an undergraduate student at Virginia Tech, pursuing a B.S. in Computer Science. Divit's expertise lies in mobile technologies; he works with PhoneGap and Sencha Touch to create cross-platform applications. He enjoys developing applications and is a published developer on the Google Play Store as well as GitHub. He also likes working with emerging web technologies such as Node.js. He spends his spare time playing sports, music, and video games. He completed an IBM Extreme Blue internship in August 2013.



19 August 2013


As discussed in Part 1, the Extreme Blue team from the IBM lab in RTP, NC, was challenged to develop an entire backend for IBM Passes in Node.js. The team successfully built it in 40% less time than required by an alternative Java solution, while offering the same functionality. Performance tests demonstrated easier scalability and better hardware utilization of the Node.js backend (compared to Java).

About the IBM Extreme Blue team

The Extreme Blue program is IBM's internship program for talented students pursuing software development and MBA degrees. Learn more at ibm.com/extremeblue.

In this article, we share lessons we learned in the process of developing a RESTful backend application using Node.js and MongoDB. We highlight our thought process in several high-level design and implementation decisions. We'll cover these topics:

  • Leveraging NPM (Node Packaged Modules)
  • Understanding asynchronous programming
  • Structuring a large Express.js application's routing configuration
  • Application-level logging
  • Approaching data persistence and modeling
  • Solving Passbook-specific challenges including cryptographic functions not supported by Node

And we address subtle "gotchas" in hopes of expediting the learning process for those interested in Node.js and MongoDB development:

  • Don't make assumptions about objects or their properties in a dynamically typed language
  • Avoid implicit control flow
  • Understand your dependencies' behavior

Developers who are familiar with the basic concepts of Node.js, JavaScript, JSON, and RESTful web services will get the most out of this article.

 

NPM

Node Packaged Modules (NPM) is the package manager used for Node. It allows fast and simple dependency management (including the installation of third-party libraries). Its approach to organizing dependencies results in highly contained applications that reside in a single directory alongside their dependencies.

Installation

Dependency installation is as simple as typing npm install [module name] at the command line on any supported operating system. For example, to install the Express web framework module, simply run npm install express. Assuming that the current working directory is the project root, this module and its dependencies will be installed in a folder called node_modules.

Product configuration

How are dependencies codified with NPM? A single file in the project root, package.json, contains a single JSON object that describes the project and its dependencies. You can generate this file with the npm init command, and you can record dependencies at installation time by adding the --save parameter. For example, running npm install async --save will add the most recent version of the async module to package.json. Similarly, running npm install mocha --save-dev will add mocha as a developer dependency.

With dependencies and developer dependencies enumerated in a list (with versioning information), installation is as simple as running npm install with no additional parameters. The end result of this approach is that getting an application up and running is very simple. Essentially, a user can download an application's code base, change into its project's root directory, and run npm install. At that point, the application should be ready to run.
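
For illustration, a minimal package.json might look like the following (the project name and version ranges shown are hypothetical):

{
  "name": "sample-app",
  "version": "0.1.0",
  "dependencies": {
    "express": "3.x",
    "async": "~0.2.9"
  },
  "devDependencies": {
    "mocha": "~1.12.0"
  }
}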

Selecting modules

When it comes to selecting modules from NPM, the choices are plentiful. There are many libraries that strive for the same high-level functionality. For example, there are many unit testing modules. However, as you can see on the NPM registry, there are modules that stand out in their popularity. Because this package repository is community-driven, information about usage, updates, and bugs is abundant. We recommend choosing modules that are the most popular within their category, heavily depended upon, and frequently updated, if possible.

Command line tools as NPM modules

Some NPM modules also provide command line tools in addition to libraries that may be used within an application. Examples include Mocha (unit testing), JSHint (style checking), and JSDoc (code documentation generation). Modules installed with npm install are only installed to the local project directory; the underlying "binaries" are not added to the user's PATH. With the global (-g) option, however, they are. As a result, rather than running such CLIs with ./node_modules/[module name]/bin/[script name], a user can simply run [script name]. This is convenient, but should be used sparingly because keeping dependencies local leads to a more lightweight and contained installation. Excluding a module and assuming that it is globally installed on end users' machines is bad practice and limits portability.

Automation with NPM

Automation beyond dependency management is easy with NPM. A project's package.json file can include a scripts key at the top level. This key maps command aliases to the literal commands they run, so you can alias more complicated tasks (whether single commands or bash scripts) for use with NPM. For example, you can define a "test" command that runs mocha when npm run-script test is typed, using the following configuration:

Listing 1. Defining a "test" command
"scripts": {
     "test": "mocha -R spec"
}

Module licensing

NPM is a community-driven package manager with code contributed by a number of developers. When developing enterprise applications, be sure to keep track of the licenses used by each module. In addition to keeping track of the specific modules' licensing, also be aware of the licensing of dependencies of each module. From what we observed, the most prevalent license seems to be the MIT License.


Asynchronous programming

Coming from a language like Java, where code typically executes synchronously, a developer may face a steep learning curve while working with Node. This might occur even if you are an expert in frontend JavaScript development and DOM manipulation. One dimension of this is getting used to conventions related to modules, the event loop, and so on. Another dimension is writing asynchronous code. The latter may be less of a problem for JavaScript developers, but the complexity of asynchronous Node code can make asynchronous DOM manipulation and event handling seem trivial by comparison.

We discussed how Node handles dependencies and how modules are used. Let's now review asynchronous programming in more detail.

Consider the following code snippet:

Listing 2. Asynchronous execution
1   console.log("Hello World!");
2   aFunction( aParam, function (aCallbackParam) {
3       console.log("Node is asynchronous");
4   });
5   console.log("Life is great");

Assuming aFunction is asynchronous, the expected output of this fictitious code snippet is:

Hello World!
Life is great
Node is asynchronous

It is not feasible to think along the lines of a single path of execution in Node. You must be aware of the flow of execution within a particular function call (and any asynchronous functions that it calls). It's easy to recognize this asynchronous execution, but there are subtleties that must be respected. For instance, any sort of application-wide state should be minimized or used in a read-only fashion. Any number of asynchronous functions could be operating on such state at any time, with no guarantees of atomicity. This is not a huge leap from other programming languages with synchronization primitives, but there are no such primitives in JavaScript/Node.js.

What's going on with line 2 of Listing 2? Functions may be passed as parameters in JavaScript, which enables asynchronous execution. These functions-as-parameters are typically referred to as callback functions. Essentially, you call a function with a parameter that is expected to be invoked upon completion of that function. When calling asynchronous functions like this, execution proceeds directly to the next line of code without waiting. Utilizing this characteristic of JavaScript allows you to write asynchronous Node applications that are highly concurrent. In addition to writing functions with callbacks, you should prefer the asynchronous functions provided by modules over their synchronous counterparts. For example, you should use functions like readFile instead of readFileSync to read from the file system.
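
For example, here is a minimal sketch of an asynchronous file read with the core fs module (the file path is a placeholder):

var fs = require('fs');

// Asynchronous read: execution continues immediately, and the
// callback runs once the file contents are available
fs.readFile('/tmp/example.txt', 'utf8', function (err, data) {
    if (err) {
        return console.error(err);
    }
    console.log(data);
});

console.log('This line prints before the file contents.');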

Asynchronous code is not just about callbacks. Node is event-driven. Let's examine the behavior of making an asynchronous HTTP request using Node's core http module:

Listing 3. An asynchronous HTTP request
var http = require('http');

var req = http.request("http://www.ibm.com", function(res) {
  res.on('data', function (chunk) {
    console.log('BODY: ' + chunk);
  });
});

req.on('error', function(e) {
  console.log('problem with request: ' + e.message);
});

// write data to request body
req.write('data\n');
req.write('data\n');
req.end();

The code in Listing 3 might be confusing. In the first line, we make an HTTP request to a URL and provide a callback to act on the response. Notice that we work with chunks of the response rather than the entire object; this is achieved by registering a callback for the "data" event that logs each response chunk. Next, we register an event handler related to the request rather than the response: an error logger. Perhaps the most confusing part is the last three lines, which actually write the request payload. The final call to req.end() is what causes the request to begin sending, which in turn allows events such as the request's 'error' event and the response's 'data' event to fire in the event loop.

Fortunately, most core Node libraries and community libraries abstract away most of this event management. They tend to simply provide functions that take callbacks as a parameter, guaranteeing the asynchronous execution but not placing the concern of events on the end user.

The learning curve for this new stack is not prohibitive. You might pick it up over the course of a few small projects. The tedious points tend to sink in after a few sessions of debugging an elusive race condition resulting from improper control flow. On a final note, the async library is a very popular solution for managing the control flow of asynchronous code.
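
As a minimal sketch of what the async library offers, here is how two placeholder tasks can be run in sequence:

var async = require('async');

// Run the tasks in order; if any task passes an error to its
// callback, the final callback is invoked immediately with it
async.series([
    function (callback) {
        // ... first asynchronous task ...
        callback(null, 'one');
    },
    function (callback) {
        // ... second asynchronous task ...
        callback(null, 'two');
    }
], function (err, results) {
    // results is ['one', 'two'] if neither task produced an error
});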


Express.js in large applications

Express is the most widely used web application framework for Node. It is fundamentally built on the concept of middlewares. Middlewares are functions applied to an HTTP request. Express applications map requests to a series (a stack) of middlewares based on the request (path, method, etc). You can use Express to construct simple model view controller (MVC) web applications in little time.

There are many guides to using the Express web application framework in Node. However, these tend to reflect the simplest features and usage of Express. As an Express-based application grows in size, the issue of structuring can make or break maintainability. While there is no canonical or best approach to structuring a large Express application, we would like to share our recommendations and lessons based on our experience.

Express "routes" are defined by three things: an HTTP method, a regular expression or string matching the path portion of the request, and a middleware function to apply. For example, to route all GET requests for images/[imageID] to a function that returns the particular image object, we would use something like the following code snippet:

Listing 4. Routing GET requests
var express = require("express"),
      app = express();
app.get("/images/:imageID", function (req, res, next) { /* ... code to return image object ... */ });

A naïve approach to structuring an Express application might be to have a line like the above for every route. As our project grew in complexity, we refactored our route configuration to match the modularity of the rest of the application. We moved our route configurations to the modules where the middleware functions were defined. Each of these modules had an instance of Express (ultimately, a small stack of middleware) that we routed to from the main application via prefixes. For example, refer to the way we configure routes related to image resources:

Listing 5. File: app.js
…
var imagesAPI = require("./routes/images.js");
…
app.use("/images*", imagesAPI);
app.use("/users*", usersAPI);

Listing 6. File: images.js
…
function getImage(req, res, next) {…}
function postImage(req, res, next) {…}
function init(req, res, next) {
    …
    return next();
}

var api = express();
// Init middleware for API
api.all("/*", init);
// Path: /images/:imageID
api.get("/:imageID", getImage);
// Path: /images/
api.post("/", express.bodyParser(), postImage);

// Export the small Express app composed of the routes representing this API module
module.exports = api;

This example illustrates how middlewares (and middlewares composed of multiple middlewares) can be used to define Express routes in a modular way at the top level of the application. This modularity is maintainable, and the application is thus more extensible. It would be particularly easy to version an API (for example, "/v1/mobile/images") following this method. Furthermore, the capability to have "init" functions for groupings of middlewares is convenient for reducing duplicated code (adding state to requests, for example). There is probably room for improving or standardizing this approach, but it is definitely a better solution than leaving all routes at the top level of a complicated Express application. This is slightly different from using app.map because, to our knowledge, that approach does not let you apply intermediate middleware like the built-in body parser (express.bodyParser).
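
For example, versioning could be as simple as mounting the same module under an additional prefix (a hypothetical sketch, not something our application did):

// In app.js: expose the images API from Listing 6 under a versioned path,
// with no changes to the module itself
app.use("/v1/mobile/images*", imagesAPI);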

This approach to structuring an Express application was convenient for our case of building a RESTful web service (IBM Passes). Our application does not include templating because resources are only served in JSON. That keeps us from making any sort of authoritative statement regarding this approach, but it is hard to imagine that templating would complicate this in any way. Templating is simply an additional step in the middleware stack, which could fit in well with the modular approach. Overall, this is a simple change that can bring an Express application from something "built to work" to a solution built to maintain and extend.


Application-level logging

Six levels of logging

  • Fatal (60) – Application is going to or has become unusable
  • Error (50) – Fatal level for a particular feature, but the application is still running
  • Warn (40) – Warning that the operator should look into
  • Info (30) – Information explaining a regular function of application
  • Debug (20) – Includes more detail regarding a regular function
  • Trace (10) – Very detailed application logging

For the important function of logging, we used a popular module called Bunyan. Bunyan is a simple and fast JSON logging module for Node. It lets you filter log messages based on their level. Furthermore, it is easy to configure: you simply require the module and declare the level of logging needed. In addition to easy configuration, it lets you intelligently format output into readable logs. For example, you can print out correctly formatted JSON objects instead of unstructured lines containing object properties. Bunyan also offers an easy-to-use command line interface for reading logs. This can be powerful when combined with command line tools like Unix tail for real-time logging.

The numerical values for each level represent priority values. For example, if you want to see all the logs at the warning level, you can specify it by its numerical value (40) or the name itself. And you will receive the logs for the level you specify as well as the logs at every level above it. For example, if you want to see the logs at the debug level, you will see logs at that level as well as at the info, warn, error, and fatal levels.
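
A minimal sketch of configuring and using Bunyan follows; the logger name and the fields logged are our own examples:

var bunyan = require('bunyan');

// Create a logger whose threshold is "info" (30): records at info
// and above are emitted, records below are suppressed
var log = bunyan.createLogger({ name: 'passes-backend', level: 'info' });

log.info({ passId: 'abc123' }, 'pass generated'); // emitted
log.debug('intermediate state');                  // suppressed at this level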


Approaching data modeling

Early in the project, we decided to use Mongoose, a popular object modeling tool for MongoDB. This Node module is analogous to an object relational mapper you might use with a relational database, but it works with MongoDB. Mongoose provides a familiar declarative approach to data modeling (with schema conveniently represented as a native construct of the host programming language). Furthermore, it lets you define validation logic on the properties of these objects, including logic to be executed before and after actions like creation, retrieval, updating, and deletion.

Listing 7. Example of a schema and usage of a data model with Mongoose
var mongoose = require('mongoose');

var catSchema = mongoose.Schema({ name: { type: String, required: true } });
var Cat = mongoose.model('Cat', catSchema);
var cat1 = new Cat();
cat1.name = "Felix";
cat1.save(function (err) { /* handle result */ });

This approach made sense for our strictly defined pass objects. The JSON behind passes is directly interpretable as a JavaScript object, and we thought we could model it one-to-one with a Mongoose schema. Unfortunately, we realized that Mongoose had limitations regarding nesting. The pass object structure has levels of nesting, with several dimensions of validation. In concept, the validation functionality of Mongoose was a convenient solution to this. However, we could not define nested schemas. In other words, we would need schemas for every “sub-document” of the pass and we would need to reference these at the top level. This would essentially degrade to a relational database approach. This seemed contrary to the point of using MongoDB. Furthermore, it would significantly increase the number of database queries per operation relating to passes.

So we decided to use the native MongoDB driver for Node. Since we did not restrict ourselves to schema, we had the flexibility to iterate on our data modeling. We essentially inserted our complex pass objects into the database, maintaining their exact structure. Our approach was the standard "garbage-in, garbage-out" line of thinking. Because we lost Mongoose's convenient validation enforcement, we had to implement our own validation logic. It was easy to write general-purpose, declarative-style validator classes that looked similar to Mongoose's in use, but that also supported nesting (and thus recursion). With these in place, we recreated the validation logic of Apple's Passbook specification one-to-one.
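
A minimal sketch of this approach with the native driver (the connection string, collection name, and document contents are hypothetical):

var MongoClient = require('mongodb').MongoClient;

// A pass document with arbitrary nesting (placeholder content)
var passObject = { description: 'sample', barcode: { format: 'PKBarcodeFormatQR' } };

MongoClient.connect('mongodb://localhost:27017/passesdb', function (err, db) {
    if (err) {
        return console.error(err);
    }
    // Insert the pass document exactly as structured; no schema is imposed
    db.collection('passes').insert(passObject, function (err, result) {
        db.close();
    });
});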


Application-specific challenges and lessons learned

The Apple Passbook specification for the files representing passes is very detailed. It involves producing a zip file containing several JSON documents and images that are used to create the pass. The zip also contains a manifest file with a checksum of every file. The last requirement is that a detached PKCS#7 signature of this manifest is included to verify that it (and therefore the checksums inside it) has not been tampered with or corrupted.

Unfortunately, PKCS#7 is not supported in Node's native cryptography module ("crypto"). This shortcoming is due to Node's immaturity. As a result, we were forced to investigate alternatives. We were unable to find any publicly available modules that were asynchronous. While we could have read the specification for PKCS#7 and implemented a Node module (whether in pure JavaScript or in native code with JavaScript bindings), we determined that this was not an optimal solution given our time constraints. We went with the path of least resistance: using OpenSSL, which is a dependency of Node. We reasoned that using a dependency of Node should not negatively affect portability. We used Node's capability to spawn child processes and invoked OpenSSL just as we would from the command line.
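
A hypothetical sketch of this technique follows; the certificate, key, and file names are illustrative, not our exact invocation:

var execFile = require('child_process').execFile;

// Produce a detached PKCS#7 signature of the manifest by invoking
// the openssl CLI in a child process, keeping the event loop free
execFile('openssl', [
    'smime', '-binary', '-sign', '-outform', 'DER',
    '-signer', 'cert.pem', '-inkey', 'key.pem',
    '-in', 'manifest.json', '-out', 'signature'
], function (err, stdout, stderr) {
    if (err) {
        return console.error('signing failed: ' + stderr);
    }
    // The signature file can now be added to the pass archive
});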

This is an area that is obviously in need of improvement. Not only is shelling out to a command line tool shaky practice, but the performance implications were negative. Our benchmarks revealed that even though we were faster at generating finalized passes, the code path for signing was a bottleneck within the Node application. While we were able to spawn detached processes that did not block the event loop, synchronous code was unavoidable in the long-running signing process. The slowest part of pass generation was the signature calculation. Naturally, cryptography is inherently computationally expensive. However, we still think that this process is a critical and unnecessary bottleneck. A production application would need a native implementation with proper Node bindings.

In addition to the bottleneck of signing, the compression portion of the pass generation pipeline was relatively slow. This is more or less unavoidable, as compression is another CPU-bound task. This is a clear reminder that Node is not the best for computationally intensive workloads.


Common pitfalls in Node development

Avoid implicit control flow

We discussed the importance of using asynchronous functions in Node development. However, using asynchronous functions with callbacks complicates control flow. Even with asynchronous patterns, you may wind up with multiple paths of execution where a single logical path was intended. A common cause of this problem is invoking an asynchronous function or its callback without explicit control flow. This can appear to work when the call happens to be the last statement in a block, because control flow is then implicit. However, in a growing code base, changes might add instructions after such implicit returns, with unintended consequences.

Essentially, while relying on implicit control flow may be harmless elsewhere, it can lead to side effects that are hard to diagnose in Node applications. We recommend always making control flow explicit, as shown in Listings 8 and 9:

Listing 8. Use return statements with callbacks and asynchronous functions that should be the last statement in their code block
function (callback) {
    if (someCondition) {
        return someAsyncFunction(param1, param2, function (err, results) {
            if (err) {
                log.error(err);
            }
            return callback(err);
        });
    }
    return callback();
}

Listing 9. Avoid calling a callback multiple times by using a single callback at the end of an asynchronous method
function (callback) {
    return someAsyncFunction(param, function (err, results) {
        if (err) {
            log.error(err);
            // Calling callback in this branch is a Bad Idea
        }
        return callback(err, results);
    });
}

Don't make assumptions about objects or their properties in a dynamically typed language

This sounds like common sense, but it is easy to skip in practice. If you are accustomed to statically typed languages, it is especially important to recognize the shift of responsibility from the compiler to the developer. In the context of web application development, you must assume "garbage in, garbage out." In this spirit, a generally useful practice is to check for the negation of your assumptions (or expectations) and return early if they are not met. This is especially true for JavaScript objects and their properties. There is even more nuance when you operate on nested properties: you must check that every property (or key) referenced in a chain exists.

Listing 10. Checking for the negation of your assumptions
function (someParam) {

    if (!assumptions about param) {
        // Handle this unexpected value
    }

    // Proceed with assumptions tested
}

Listing 11. Nuances of nested properties
function (someParam) {

    if (!someParam ||
        !someParam.someProperty ||
        !someParam.someProperty.nestedProp) {
        // Handle unexpected value
    }

    // Proceed with assumptions
}

Understand your dependencies' behavior

This statement tends to go without saying, but is important in the context of Node development because of the importance of using asynchronous code. Using a library without understanding its guarantees, or its actual implementation if those guarantees are unclear, may introduce bottlenecks in your application.

Also consider the side effects left by dependencies. The ideal approach to writing an Express-based Node application is to write fully functional (stateless) middleware. However, even some core components of Express have side effects. For instance, the bodyParser middleware, which automatically parses the request body (such as converting the body from a JSON string to a native JavaScript object), has side effects: temporary files that must be deleted by the user. After weeks of load testing our application, we had a mysterious abundance of temporary files bogging down the operating system. We resolved this by deleting the temporary files left by the bodyParser.
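
As a hypothetical sketch of the kind of cleanup this required (the middleware name is our own, and we assume single-file fields exposed on req.files by express.bodyParser):

var fs = require('fs');

// Cleanup middleware: once the response has finished, delete any
// temporary upload files that express.bodyParser left on disk
function cleanupTempFiles(req, res, next) {
    res.on('finish', function () {
        Object.keys(req.files || {}).forEach(function (key) {
            fs.unlink(req.files[key].path, function (err) {
                if (err) {
                    console.error(err);
                }
            });
        });
    });
    return next();
}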


In closing

Our goal with this article was not to authoritatively prescribe approaches to Node development. Instead, we chose to share some of the challenges we faced and the lessons we learned in the process of solving them. We do believe there is value in the approaches we have highlighted. We also want to emphasize the philosophical points that extend beyond the scope of our particular application. More generally, we believe in the importance of constantly challenging the status quo. Only through continual evaluation can we be certain that we are approaching problems optimally.


Acknowledgments

Special thanks to our mentors, whose wisdom and experience guided us throughout our internship: Joshua A. Alger, Andy Dingsor, Curtis M. Gearhart, Christopher Hambridge, and Todd Kaplinger. A big thank you to Ross Grady, RTP Lab Manager for IBM Extreme Blue, whose patience and push for constant improvement ensured our success. We would also like to express our gratitude to Jeff Jagoda for his Node.js experience and invaluable feedback.
