Refactoring to microservices, Part 2: What to consider when moving your data

When you should and should not move your data from traditional middleware architectures to microservices


In Part 1 of this series, I introduced some key reasons and considerations for refactoring your code to a microservices-based approach. I ended Part 1 with the question of what to do with your data. In large-scale enterprise applications, data is often the thorniest issue and well worth an in-depth treatment.

First, consider what data you're actually storing

When you look at the structure of your application, choosing the best approach for managing your data often comes down to this question: "What are you actually storing in your database?"

Since the early 1990s, I've helped companies build, maintain, and often torture object-relational mapping (ORM) frameworks. In several cases, the data a company was storing actually did not map well to a relational data model. In those cases, we found ourselves having to either "twist" the relational model to fit, or more likely, jump through hoops in the programs in order to force the code to match the relational data store.

Now that we've moved on to an era of polyglot persistence choices, we can re-examine some of those earlier decisions and make better ones. In particular, let's look at four cases where the relational model was not the best option, and then consider one case where the relational model was the best option (and refactoring the data would not have been the right option).

Case 1: Blob storage

Many times I've looked through the persistence code of enterprise systems only to find, to my surprise, that they've actually been storing binary representations of serialized Java™ objects in their relational database. These objects are usually stored in "Binary Large Object" ("Blob") columns, often because the team threw their hands up at the complexity of trying to map their Java objects into relational tables and columns. Blob storage, of course, has several disadvantages: it cannot be queried on a column basis, it is slow, and it is sensitive to changes in the structure of the Java objects themselves. Older data may not be readable if the object structure changes significantly.

So if your application (or more likely, a subset of your application) is using Blob storage in a relational database, that's a pretty good sign that you might be better off using a key-value store like Memcached or Redis.
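As a rough illustration of that access pattern, the sketch below saves and loads a serialized object purely by key. The Portfolio class, the key format, and the in-memory map are all assumptions made for this example; the map only stands in for a store such as Memcached or Redis and does not use either product's client API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BlobAsKeyValue {

    // Hypothetical domain object that the application persists as one opaque value.
    static class Portfolio implements Serializable {
        private static final long serialVersionUID = 1L;
        final String owner;
        final int positions;

        Portfolio(String owner, int positions) {
            this.owner = owner;
            this.positions = positions;
        }
    }

    // In-memory stand-in for a key-value store (assumption for this sketch).
    static final Map<String, byte[]> STORE = new ConcurrentHashMap<>();

    // Serialize the whole object and store it under a single key.
    static void save(String key, Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("Could not serialize value for " + key, e);
        }
        STORE.put(key, bytes.toByteArray());
    }

    // Fetch by key only; there is no way to query on individual fields.
    static Object load(String key) {
        byte[] raw = STORE.get(key);
        if (raw == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not deserialize value for " + key, e);
        }
    }

    public static void main(String[] args) {
        save("portfolio:42", new Portfolio("alice", 3));
        Portfolio loaded = (Portfolio) load("portfolio:42");
        System.out.println(loaded.owner + " holds " + loaded.positions + " positions");
    }
}
```

Every read and write goes through the key, which is exactly how the Blob column was being used already. The value is still an opaque serialized object, so the sensitivity to class changes remains.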

On the other hand, if your application is storing just a structured Java object (perhaps deeply structured, but not natively binary), then you may be better off using a document store like Cloudant or MongoDB. What's more, by putting a little bit of effort into how you store your documents (for instance, both Cloudant and MongoDB databases are JSON document stores, and JSON parsers are widely available and easy to customize), you can handle any "schema drift" issues much more easily than you could with a Blob store approach, which is far more opaque in its storage mechanism.
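One way to picture that tolerance for schema drift is a reader that accepts both older and newer shapes of the same document. This is only a sketch: it assumes the JSON has already been parsed into a Map by whatever parser you use, and the field names (name, mail, email, tier) and the default value are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class SchemaDriftReader {

    // Hypothetical in-memory view of a customer document.
    static final class Customer {
        final String name;
        final String email;
        final String tier;

        Customer(String name, String email, String tier) {
            this.name = name;
            this.email = email;
            this.tier = tier;
        }
    }

    // Build a Customer from a parsed document, tolerating older document shapes.
    static Customer fromDocument(Map<String, Object> doc) {
        String name = (String) doc.get("name");
        // Assumed drift: older documents used "mail", newer ones use "email".
        Object email = doc.containsKey("email") ? doc.get("email") : doc.get("mail");
        // Assumed drift: "tier" was added later, so older documents fall back to a default.
        String tier = (String) doc.getOrDefault("tier", "standard");
        return new Customer(name, (String) email, tier);
    }

    public static void main(String[] args) {
        Map<String, Object> oldDoc = new HashMap<>();
        oldDoc.put("name", "Ann");
        oldDoc.put("mail", "ann@example.com");

        Map<String, Object> newDoc = new HashMap<>();
        newDoc.put("name", "Raj");
        newDoc.put("email", "raj@example.com");
        newDoc.put("tier", "gold");

        for (Map<String, Object> doc : java.util.Arrays.asList(oldDoc, newDoc)) {
            Customer c = fromDocument(doc);
            System.out.println(c.name + " / " + c.email + " / " + c.tier);
        }
    }
}
```

Because the stored form is readable text with named fields, this kind of compatibility logic lives in one small method. With a serialized Blob, the equivalent fix usually means custom deserialization code or a data migration.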

Case 2: Flat objects and the Active Record pattern

Years ago when Martin Fowler was writing Patterns of Enterprise Application Architecture, we had an active correspondence and several lively review meetings about many of the patterns. One pattern, in particular, always struck me as an odd duck: the Active Record pattern. It was odd because I personally had never encountered it, although Martin assured me that it was common in the Microsoft .NET programming community. But what really struck me about it, especially when I saw a few Java implementations of it using open source technologies like iBATIS, was that it seemed like the best case for using it was when the objects were, well, flat.

If the object you are mapping to a database is completely and totally flat — with no relationships to other objects (with the limited exception perhaps of nested objects) — then you probably aren't taking advantage of the full capabilities of the relational model. In fact, you are much more likely to be storing documents, such as electronic versions of paper documents like customer satisfaction surveys, problem tickets, etc. In that case, a document database like Cloudant or MongoDB is probably a better match for you. Splitting your code out into services that work on that type of database will result in much simpler code that is easier to maintain.

Case 3: Reference data

Another common pattern that I've seen in object-relational mapping systems is the combination of "reference data in a table sucked into an in-memory cache." Reference data consists of things that are rarely (or never) updated, but that are continually read. A good example is the list of U.S. states or Canadian provinces, but other examples include medical codes and standard parts lists. This kind of data is often used to populate drop-downs in GUIs. The common pattern is to start by reading the list from a table (usually a flat two- or three-column table) each time it is needed. However, reading the table on every use is prohibitively slow, so instead, the system reads it into an in-memory cache like Ehcache at startup.

Whenever you have this problem, it is begging to be refactored into a simpler, faster caching mechanism. Again, this is a situation where Memcached, the Data Cache service on Bluemix, or Redis would be perfectly suited. If the reference data is independent of the rest of your database structure (and it often is, or is at most loosely coupled), then splitting the data and its services away from the rest of your system can help.
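A minimal sketch of such a split-out reference data service might look like the following. The class name, the loader, and the refresh method are assumptions for illustration; the in-process map stands in for whichever cache you choose, and the loader stands in for the original table read.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

class ReferenceDataService {

    // Stand-in for the slow source of truth, such as the original two-column table.
    private final Supplier<Map<String, String>> loader;

    // Read-mostly snapshot; a real deployment might keep this in an external cache instead.
    private volatile Map<String, String> cache;

    ReferenceDataService(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    // Load once on first use, then serve every later read from memory.
    Map<String, String> all() {
        Map<String, String> local = cache;
        if (local == null) {
            synchronized (this) {
                local = cache;
                if (local == null) {
                    local = Collections.unmodifiableMap(new LinkedHashMap<>(loader.get()));
                    cache = local;
                }
            }
        }
        return local;
    }

    String nameFor(String code) {
        return all().get(code);
    }

    // Reference data changes rarely, so an explicit refresh is usually enough.
    void refresh() {
        synchronized (this) {
            cache = null;
        }
    }

    public static void main(String[] args) {
        ReferenceDataService states = new ReferenceDataService(() -> {
            Map<String, String> rows = new LinkedHashMap<>();
            rows.put("NC", "North Carolina");
            rows.put("TX", "Texas");
            return rows;
        });
        System.out.println(states.nameFor("TX"));
        System.out.println(states.all().keySet());
    }
}
```

Callers such as a GUI drop-down only see all() and nameFor(), so the backing store can change without touching them.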

Case 4: The query from Hell

In one customer system that I worked on, we were doing complex financial modeling that required very complicated queries (on the order of six- or seven-way joins) just to create the objects that the program was manipulating. Updates were even more complicated. We had to combine several different levels of optimistic locking checks just to find out what had changed and if what was in the database still matched the structure we had created and manipulated.

In retrospect, what we were doing would have been more naturally modeled as a graph. Situations like this (in this case, we were modeling tranches of funds, each made up of different types of equities and debt obligations, each of those priced in different currencies and due at different times, with different rules surrounding each valuation) almost beg for a data structure that will allow you to easily do what you really want to do: traverse up and down the graph and move parts of the graph around at will.

This is where a solution like Neo4j or Apache TinkerPop (which is behind the IBM Graph database service on Bluemix) would be a good approach. By modeling the solution directly as a graph, we could have avoided a lot of complicated Java and SQL code, and at the same time, probably improved our runtime performance significantly.
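To make the idea of traversing and rearranging a graph concrete, here is a small in-memory sketch using a plain adjacency map. It does not use the Neo4j or TinkerPop APIs, and the node names (fund, tranches, instruments, currencies) are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HoldingsGraph {

    // Directed edges, for example fund -> tranche -> instrument -> currency.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Depth-first walk that returns every node reachable from start, excluding start itself.
    List<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!seen.add(node)) {
                continue; // already visited
            }
            for (String next : edges.getOrDefault(node, Collections.emptyList())) {
                stack.push(next);
            }
        }
        seen.remove(start);
        return new ArrayList<>(seen);
    }

    // Re-parent a node, together with everything below it, in one step.
    void move(String node, String oldParent, String newParent) {
        List<String> children = edges.get(oldParent);
        if (children == null || !children.remove(node)) {
            throw new IllegalArgumentException(node + " is not a child of " + oldParent);
        }
        addEdge(newParent, node);
    }

    public static void main(String[] args) {
        HoldingsGraph graph = new HoldingsGraph();
        graph.addEdge("fund", "trancheA");
        graph.addEdge("fund", "trancheB");
        graph.addEdge("trancheA", "bond1");
        graph.addEdge("trancheB", "equity1");
        graph.addEdge("bond1", "EUR");
        graph.addEdge("equity1", "USD");

        System.out.println(graph.reachableFrom("trancheA"));
        graph.move("bond1", "trancheA", "trancheB");
        System.out.println(graph.reachableFrom("trancheB"));
    }
}
```

The traversal and the move are each a few lines here. In the relational version described above, each corresponds to multi-way joins and coordinated updates.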

Case 5: Your data model is working

Now that we've been through four cases of data "dissonance" with the relational model, I want to show you a case where you would not want to try to split off your data from the main enterprise data store.

One of my customers had the following problem: They were a governmental agency that existed to serve one constituency, the citizen. But they had hundreds of representations of the citizen in their enterprise systems. It was impossible to find out the answer to simple questions like "Is this person who asked this question the same person who filled out this form?" It frustrated the government, it frustrated the citizens, and it frustrated the IT staff who were caught in the middle. They knew they had to try something different.

You see, whenever I begin talking to customers about microservices, they often tell me "Oh, we can't do that. We've invested too much in our enterprise data modeling." In a sense they're right: You don't want to go into your enterprise data model and pull things out that are deeply connected with other concepts. That's a fool's errand. If your data model is working and isn't giving you trouble, there's no reason at all to change it.

The trend toward master data management

Not all data models are equally trouble-free. In fact, there's another trend that is moving in a similar direction. This simultaneous trend is not coming from developers, but rather from data modelers and DBAs. It's the trend toward master data management, or MDM. In a sense, it's the same idea: the notion that you shouldn't have multiple views of an important concept like a "customer" in an enterprise. MDM tools help you combine all these different views of the concepts and thus eliminate the unnecessary duplication.

The difference is that the MDM solutions are data-centric rather than API-centric. However, a common outcome of applying an MDM solution is to create a centralized set of APIs to represent the access to those common concepts through the MDM tool. There's no reason why you can't treat those APIs as the foundation of a microservices-based approach. Just know that the implementation isn't done in the same tool. But in the end, the take-away of the microservices architecture is that it's the API that matters and not the implementation.
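The point about the API mattering more than the implementation can be sketched with a single interface and two interchangeable backends. Everything named here (CustomerApi, the two implementation classes, the greet method) is hypothetical, and neither class calls a real MDM product or document database; plain maps stand in for both.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class CustomerApiSketch {

    // The contract that consumers depend on.
    interface CustomerApi {
        Optional<String> displayName(String customerId);
    }

    // Hypothetical backend that would delegate to APIs exposed by an MDM tool.
    static class MdmBackedCustomers implements CustomerApi {
        private final Map<String, String> masterRecords = new HashMap<>();

        MdmBackedCustomers register(String id, String name) {
            masterRecords.put(id, name);
            return this;
        }

        @Override
        public Optional<String> displayName(String customerId) {
            return Optional.ofNullable(masterRecords.get(customerId));
        }
    }

    // Hypothetical backend that would read from a service-owned document store.
    static class DocumentBackedCustomers implements CustomerApi {
        private final Map<String, Map<String, String>> documents = new HashMap<>();

        DocumentBackedCustomers register(String id, String name) {
            Map<String, String> doc = new HashMap<>();
            doc.put("name", name);
            documents.put(id, doc);
            return this;
        }

        @Override
        public Optional<String> displayName(String customerId) {
            return Optional.ofNullable(documents.get(customerId)).map(doc -> doc.get("name"));
        }
    }

    // Consumer code sees only the interface, never the backend behind it.
    static String greet(CustomerApi api, String customerId) {
        return api.displayName(customerId)
                .map(name -> "Hello, " + name)
                .orElse("Unknown customer");
    }

    public static void main(String[] args) {
        CustomerApi mdm = new MdmBackedCustomers().register("c1", "Ann");
        CustomerApi docs = new DocumentBackedCustomers().register("c1", "Ann");
        System.out.println(greet(mdm, "c1"));
        System.out.println(greet(docs, "c1"));
        System.out.println(greet(docs, "c2"));
    }
}
```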


We've now gone through four cases where a specific "code smell" suggests that, at heart, you have a data modeling problem that could be solved with a microservices-based refactoring of your data. If you find you have one or more of these particular coding issues, then you may be better off splitting that data away from your existing enterprise data store and re-envisioning it with a different type of data store.

In the next and final article in this series, we'll take what we've learned in these first two articles and show how you can follow a step-by-step procedure for evolving your existing applications from monoliths to microservices.
