The Geronimo renegade: The push for clustering, Part 2

Highlights of an interview with Jules Gosnell

Neal Sanche (neal@nsdev.org), Java Developer, Pure Technologies
Neal Sanche is a Java developer recently beached in the Microsoft® .NET world and fighting for any ties back to his old, comfortable roots. His experience includes development of several commercial J2EE applications, as well as several stand-alone Java applications. In his spare time, he writes music, takes photographs, and writes technical articles. Visit his Web site to see several examples of each. You can reach the author at neal@nsdev.org.

Summary:  With all the buzz that "The push for clustering" (developerWorks, December 2005) created, I'm revisiting this topic with a detailed interview with one of Apache Geronimo's clustering experts. You'll find out exactly what clustering is and get a look, in detail, at the WADI Application Distributed Infrastructure (WADI) project, one of the projects being combined with Geronimo to provide its clustering functionality.

Date:  21 Mar 2006
Level:  Introductory

History of clustering in Geronimo

On a snowy winter morning, I rode the bus to my downtown Calgary office, mindful of the Voice-over-IP call I was intending to make that day with Julian (Jules) Gosnell, creator of the WADI project on Codehaus, cofounder of the Apache Geronimo application server project, and one of a group of committers focused on clustering solutions for this project. Initially I was unsure how to go about the interview, but once it was underway, it was clear that Jules was passionate about the topic -- the conversation developed quickly and lasted for over an hour. This article brings to light specific details about the Geronimo clustering effort that were missed, or alluded to but not explored fully, in the first article on this topic.

So, while snow was falling outside my office, and while snow was falling on the garden outside a log cabin office near a small house in the U.K. 45 minutes south of London, Jules and I talked about the history of clustering in Geronimo -- from his perspective.

The road to WADI

I started by asking Jules how he first got involved with the Java™ Enterprise Edition space. He mentioned he was working for a bank in London, writing Java servlets to replace an older system written in C++. It was through this work that he became aware of the high-quality development efforts of Greg Wilkins and Jan Bartel, creators and developers of the lightweight HTTP server and servlet runner called Jetty. His interest in Jetty led him to participate in some JBoss application server development work to integrate Jetty. After this, he worked on a clustered Web session implementation for Jetty. Shortly after the Geronimo project began, back in August 2003, talk began about implementing clustering functionality within the project.

I asked Jules why WADI was developed in the Codehaus repository instead of inside the Geronimo project source tree. He said that initially, James Strachan suggested building a general clustering API (ActiveCluster) to support the cluster abstraction for Geronimo or anything else that needed such functionality. Codehaus was chosen as license-neutral ground. ActiveCluster was intended to connect with implementations that were both Lesser General Public License (LGPL) and Apache Software Foundation (ASF) licensed, and they wanted to involve both communities in the project.

Jules explained that since his interest and experience were mainly in Web sessions, his focus soon shifted from the fundamental clustering problems of ActiveCluster to those more specific to his area of interest. The result was the WADI project, created at Codehaus in April 2004. The name originally stood for Web Application Distributed Infrastructure and now stands for WADI Application Distributed Infrastructure (note the traditional recursive acronym), because WADI has become a more generalized session management framework. James continued with ActiveCluster, and his interest in clustered caching led him to create the ActiveSpace project at Codehaus, Jules said, which, like WADI, makes use of the ActiveCluster API.

Obviously there are a few forces at work within the application server space that we need to consider to understand why these projects have been written. What problems are they intended to solve?


The clustering strategy

Clustering, in short, is a strategy meant to solve a number of fundamental problems. We will introduce two of the most significant of these and show how they are resolved in the Geronimo application server.

Availability and scalability problems

The first problem is formally called the availability problem. As enterprise applications become more popular, any downtime becomes very noticeable. So you can increase the availability of a system by adding more computers, designed to take over if something goes wrong. This is called failover -- the mechanism by which high availability is achieved. The second problem, formally called the scalability problem, arises as more people use a system. Some computers can become overloaded, so if you add more computers and combine load balancing into the solution, each computer takes a fraction of the load, leading to better response times overall.

Session state

For a user to be able to interact with an application, often information about what the user has been doing to the application is kept on the server. This information is called session state and is an important ingredient of a useful application. For example, say you have logged in to an online store, added a few items to your shopping cart, and are about to check out and purchase the items. Without session state, this operation would be difficult to handle. Jules points out that session state makes it possible for a user to have a conversation with a remote application, like a Web application. Without it, every request you make would have no relationship to the last. He said, in effect, it would be like talking to someone who forgets everything about the conversation after making each reply.

Session affinity

So, session state is very important to preserve in an application. Often, in a clustered situation, session state is kept on one computer in the cluster, and the load balancer always directs requests from a user to the machine that is storing his or her session. Traditionally, this is known as session affinity. But this doesn't solve the availability problem at all. If the computer storing the session crashes, the shopper is going to have a bad experience. Many open source solutions broadcast a backup copy of the session to every other node in the cluster each time it changes. If any of the machines goes down, the session is still safe. You can find one of those backup copies and reactivate it.
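The affinity rule described above can be illustrated with a minimal sketch. This is not the code of any real load balancer; the class name StickyBalancer, its route method, and the round-robin choice for new sessions are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sticky load balancer; class and method names are illustrative only.
class StickyBalancer {
    private final List<String> nodes;
    private final Map<String, String> ownerBySession = new HashMap<>();
    private int next = 0;

    StickyBalancer(List<String> nodes) {
        this.nodes = nodes;
    }

    // The first request for a session picks a node round-robin;
    // every later request for that session goes to the same node.
    String route(String sessionId) {
        String owner = ownerBySession.get(sessionId);
        if (owner == null) {
            owner = nodes.get(next % nodes.size());
            next++;
            ownerBySession.put(sessionId, owner);
        }
        return owner;
    }

    public static void main(String[] args) {
        StickyBalancer lb = new StickyBalancer(Arrays.asList("node-a", "node-b"));
        System.out.println(lb.route("alice"));
        System.out.println(lb.route("bob"));
        System.out.println(lb.route("alice")); // same node as the first "alice" request
    }
}
```

Note that nothing in this sketch protects the session if its node crashes, which is exactly the availability gap the paragraph above describes.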

This approach works for small deployments (three to four nodes), but Jules tells me that it doesn't scale. Each extra session requires extra memory for its backups on every node in the cluster, so you quickly run out of memory. Furthermore, every change to any session also creates work for every node, because each node has to process an incoming update message and make its backup copy of the session consistent with the changes. As you increase the number of nodes, sessions, and users, each node spends more of its time and bandwidth keeping backup copies consistent and less of it servicing user requests, so performance degrades.

Static partitioning

Some open source solutions attempt to compensate for this scaling issue by employing a strategy called static partitioning. This is where you break your cluster into smaller ones to avoid hitting these memory and bandwidth limits. However, Jules points out that static partitioning is not only more complex to configure, but it also impacts availability. You end up managing many smaller clusters, and smaller clusters have lower availability, because availability is a function of the size of the cluster.
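To make the link between cluster size and availability concrete, here is a simplified model. It is an assumption of this sketch, not something Jules stated: each node is up with probability a, nodes fail independently, and a partition can still serve requests as long as at least one of its n nodes is up.

```latex
A(n) = 1 - (1 - a)^{n}
% Worked example with a = 0.99:
% A(2) = 1 - 0.01^{2} = 0.9999
% A(4) = 1 - 0.01^{4} = 0.99999999
```

Under these assumptions, splitting one cluster into several smaller static partitions lowers n for each partition, and with it the availability of each one. Real failures are rarely fully independent, so the numbers are illustrative only.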

Dynamic partitioning

WADI began, in a way, because Jules saw the problems associated with static partitioning first-hand and became frustrated that he could not offer his clients a better solution using open source components. He also realized that this was his way of contributing to Geronimo. As the WADI project despot (which is the term used for the main project overseer) at Codehaus, he was able to work toward solving the problems of session state distribution and toward a better way of partitioning a cluster, called dynamic partitioning.


The WADI solution

Jules had a problem to solve, and he spent the better part of two years working on it, starting in April of 2004 when he imported the initial WADI source code into Codehaus. In those two years, the source base for WADI grew from nothing to around 44,000 lines of code. I had originally looked at the Codehaus repository for WADI to understand a little about the history of the project, but of course this didn't give me a valid picture of the people Jules credits for helping it come to fruition.

He told me that the design was influenced -- right from the beginning -- first by James Strachan and Hiram Chirino, because WADI makes use of ActiveCluster and ActiveMQ, and second by Greg Wilkins and Jan Bartel who wrote Jetty, which WADI interfaces with. A friend of Greg's, Simone Bordet (MX4J and LiveTribe), became interested in the project early on. Gianni Scenini, who worked with Jules for a client, spent a lot of time with Jules talking about how WADI should work. At the end of 2005, Jules was joined by Jeff Genender, Bruce Snyder, James Goodwill, and Bill Dudney. Around this time Gianny Damour started working with WADI to integrate OpenEJB to provide it with a stateful session bean clustering solution.

Distributed hash-map

Jules explained to me that in WADI the session information is distributed among the nodes in the cluster, and the location of each session is stored in an index. This index is itself broken up and shared among the nodes as a kind of distributed database called a distributed hash-map. Each node therefore has access to the location of every session, which is a much smaller piece of information than the session itself, and this reduces the overall memory load.
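The idea of a partitioned location index can be sketched as follows. This is a single-process illustration under stated assumptions, not WADI's actual data structure or API: the class SessionLocationIndex, the fixed bucket count, and the modulo assignment of buckets to nodes are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a partitioned session-location index (not WADI's API).
class SessionLocationIndex {
    // bucket number -> node responsible for that slice of the index
    private final String[] bucketOwner;
    // node -> the (sessionId -> location) entries that node is responsible for
    private final Map<String, Map<String, String>> indexOnNode = new HashMap<>();

    SessionLocationIndex(String[] nodes, int buckets) {
        bucketOwner = new String[buckets];
        for (int b = 0; b < buckets; b++) {
            bucketOwner[b] = nodes[b % nodes.length];
        }
        for (String node : nodes) {
            indexOnNode.put(node, new HashMap<>());
        }
    }

    // Hash the session ID to a bucket; floorMod keeps the result non-negative.
    int bucketOf(String sessionId) {
        return Math.floorMod(sessionId.hashCode(), bucketOwner.length);
    }

    // The node that holds the index entry for this session.
    String indexNodeFor(String sessionId) {
        return bucketOwner[bucketOf(sessionId)];
    }

    void recordLocation(String sessionId, String holdingNode) {
        indexOnNode.get(indexNodeFor(sessionId)).put(sessionId, holdingNode);
    }

    // Returns the node holding the session, or null if the session is unknown.
    String locate(String sessionId) {
        return indexOnNode.get(indexNodeFor(sessionId)).get(sessionId);
    }

    public static void main(String[] args) {
        SessionLocationIndex index =
                new SessionLocationIndex(new String[] {"n1", "n2", "n3"}, 12);
        index.recordLocation("session-42", "n2");
        System.out.println("index entry lives on " + index.indexNodeFor("session-42"));
        System.out.println("session lives on " + index.locate("session-42"));
    }
}
```

Each node stores only its own slice of small location entries, yet any node can find any session by hashing the ID and asking the slice's owner.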

Rather than each session having a backup copy on every node, the application deployer (the piece of the application server that is responsible for getting an application up and running) decides on their required safety level and configures a fixed number of backups (typically one or two). No matter how large the cluster grows, no more than this number of backups of a session is made. This avoids the memory limitations mentioned earlier. Also, because fewer session backups need to be updated when the session is changed, WADI avoids issues with the degradation of service discussed earlier.
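A rough calculation shows the difference between copying every session to every node and keeping a fixed number of backups. The sketch below assumes equal-sized sessions spread evenly over the nodes. The class and method names are hypothetical, and the figures in main are made-up inputs, not measurements.

```java
// Back-of-the-envelope comparison; assumes equal-sized sessions spread evenly over the nodes.
class ReplicationCost {
    // Broadcast scheme: every node holds a copy of every session.
    static long broadcastBytesPerNode(long sessions, long bytesPerSession) {
        return sessions * bytesPerSession;
    }

    // Fixed-backup scheme: one primary plus `backups` copies, shared across all nodes.
    static long fixedBackupBytesPerNode(long sessions, long bytesPerSession,
                                        int nodes, int backups) {
        return sessions * bytesPerSession * (1 + backups) / nodes;
    }

    // Update messages sent when a single session changes.
    static int broadcastMessagesPerUpdate(int nodes) {
        return nodes - 1;
    }

    static int fixedBackupMessagesPerUpdate(int backups) {
        return backups;
    }

    public static void main(String[] args) {
        long sessions = 10_000;
        long bytesPerSession = 10_240;
        int nodes = 20;
        int backups = 1;
        System.out.println("broadcast bytes per node: "
                + broadcastBytesPerNode(sessions, bytesPerSession));
        System.out.println("fixed-backup bytes per node: "
                + fixedBackupBytesPerNode(sessions, bytesPerSession, nodes, backups));
        System.out.println("messages per update: "
                + broadcastMessagesPerUpdate(nodes) + " vs "
                + fixedBackupMessagesPerUpdate(backups));
    }
}
```

In the broadcast case the per-node memory does not shrink as nodes are added, while in the fixed-backup case it falls in proportion to the cluster size.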

Under normal operation, with session affinity enabled on the load balancer, requests are always routed to the node that has the relevant session, and the rendering of the request involves only this node and no further overhead associated with clustering. However, a number of situations exist in which session affinity cannot be maintained, such as when machines crash unexpectedly or even when they are shut down cleanly. In these situations, WADI uses its map of session locations to either relocate the incoming request to the session's new location or arrange the session's migration from this new location to the node on which the request is arriving.
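The two recovery options, sending the request to the session or bringing the session to the request, can be sketched like this. The class SessionRequestRouter and its boolean migrateSession switch are hypothetical; the article does not say how WADI chooses between the two.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two options when affinity breaks (not WADI's API).
class SessionRequestRouter {
    private final Map<String, String> locationBySession = new HashMap<>();

    void place(String sessionId, String node) {
        locationBySession.put(sessionId, node);
    }

    String locationOf(String sessionId) {
        return locationBySession.get(sessionId);
    }

    // Returns the node that ends up serving the request.
    String handle(String sessionId, String arrivingNode, boolean migrateSession) {
        String holder = locationBySession.get(sessionId);
        if (holder == null || holder.equals(arrivingNode)) {
            // New session, or affinity held: serve locally with no clustering overhead.
            locationBySession.put(sessionId, arrivingNode);
            return arrivingNode;
        }
        if (migrateSession) {
            // Option 1: move the session to the node where the request arrived.
            locationBySession.put(sessionId, arrivingNode);
            return arrivingNode;
        }
        // Option 2: relocate the request to the node that holds the session.
        return holder;
    }

    public static void main(String[] args) {
        SessionRequestRouter router = new SessionRequestRouter();
        router.place("s1", "n1");
        System.out.println(router.handle("s1", "n2", false)); // request relocated
        System.out.println(router.handle("s1", "n2", true));  // session migrated
        System.out.println(router.locationOf("s1"));
    }
}
```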

During the operation of the WADI-based cluster, computers can be added or removed at any time, dynamically. Computers added to the cluster add their memory and resources to the pool, take on responsibility for parts of the overall session location map, and form replication partnerships with other cluster members, causing them to be sent replicated copies of session information. These replicated sessions are insurance measures against machines in the cluster crashing unexpectedly. Machines can also undergo a controlled shutdown. When this happens, sessions are evacuated to other surviving computers. In case memory is low, unused sessions are written out to permanent storage so they can be retrieved if the user comes back to the computer before the session expires.

Storing replicated copies

To choose where replicated copies are stored, a pluggable election mechanism is employed to choose replication partners: far-away machines, lightly loaded machines, or machines with faster network connections to the machine holding the session to be replicated might be preferred. Some heuristics could go a long way toward choosing better replication partners, such as adding a penalty factor to machines on the same rack in the server room, so that if a whole rack goes down due to a power failure, the session would have been replicated to a machine on a different rack and would survive.
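A pluggable election with a same-rack penalty might look like the following sketch. Everything here is an assumption for illustration: the PartnerElector interface, the ClusterNode fields, and the scoring rule (load plus penalty, lowest score wins) are hypothetical and are not taken from WADI.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical rack-aware strategy: lowest score wins, same-rack nodes are penalized.
class RackAwareElector implements PartnerElector {
    private final double sameRackPenalty;

    RackAwareElector(double sameRackPenalty) {
        this.sameRackPenalty = sameRackPenalty;
    }

    double score(ClusterNode source, ClusterNode candidate) {
        double penalty = candidate.rack.equals(source.rack) ? sameRackPenalty : 0.0;
        return candidate.load + penalty;
    }

    @Override
    public Optional<ClusterNode> elect(ClusterNode source, List<ClusterNode> candidates) {
        return candidates.stream()
                .filter(c -> !c.name.equals(source.name)) // never replicate to yourself
                .min(Comparator.comparingDouble((ClusterNode c) -> score(source, c)));
    }

    public static void main(String[] args) {
        ClusterNode source = new ClusterNode("a", "rack-1", 0.1);
        List<ClusterNode> others = Arrays.asList(
                new ClusterNode("b", "rack-1", 0.2),
                new ClusterNode("c", "rack-2", 0.5));
        PartnerElector elector = new RackAwareElector(1.0);
        System.out.println(elector.elect(source, others).get().name);
    }
}

// The pluggable part: any strategy that picks a replication partner.
interface PartnerElector {
    Optional<ClusterNode> elect(ClusterNode source, List<ClusterNode> candidates);
}

// Minimal node description used by the sketch.
class ClusterNode {
    final String name;
    final String rack;
    final double load; // 0.0 = idle, 1.0 = fully loaded

    ClusterNode(String name, String rack, double load) {
        this.name = name;
        this.rack = rack;
        this.load = load;
    }
}
```

With a penalty of 1.0 the busier node on the other rack is chosen. With a penalty of 0.0 the lightly loaded neighbor on the same rack would win, which is the case the rack heuristic is meant to avoid.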

Session data has different characteristics from business data. Business data tends to live in databases, and it can be read from the database when needed. Session data is transient, usually short lived, and needs to be accessed quickly during the rendering of pages. Session data must be replicated quickly as a form of backup. Jules tells me that Gianny Damour is working on this replication problem.

Session equality

Geronimo has a number of different sessions -- Web application sessions, Enterprise JavaBeans (EJB) stateful session beans, Web service sessions, and more. Jules is working with the folks at the Geronimo project to converge some of the different approaches with WADI, with the idea that Geronimo will treat all sessions in exactly the same way. It will also group together related sessions so that, for example, sessions in the Web and EJB tiers that are associated with the same user can be kept in the same place within the cluster, and when a move of the session data is required, the related sessions will move together.

Keeping caches consistent

Another area related to clustering in an application server is caching. Entity beans are an abstraction of data that is usually stored in a database. To improve application performance, the data in the database is cached in the application server. Changes to the data are written back to the database, and any cached versions of the data are either thrown away (so they'll be retrieved on the next request), or the cached data is modified as well to be consistent. This works fine on a single computer, but when you move to a clustered model, it's important that cached data remain consistent across all nodes. This is commonly referred to as the cache consistency problem.
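The "throw away the cached copy" strategy can be sketched in a few lines. This is a single-process stand-in for a cluster, with in-memory maps playing the roles of the database and the per-node caches. It is not the ActiveSpace or JCache API, and the class name InvalidatingCacheCluster is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of invalidate-on-write across node caches.
class InvalidatingCacheCluster {
    private final Map<String, String> database = new HashMap<>();
    private final List<Map<String, String>> nodeCaches = new ArrayList<>();

    InvalidatingCacheCluster(int nodes) {
        for (int i = 0; i < nodes; i++) {
            nodeCaches.add(new HashMap<>());
        }
    }

    // Read through the node's cache, falling back to the database on a miss.
    String read(int node, String key) {
        Map<String, String> cache = nodeCaches.get(node);
        String value = cache.get(key);
        if (value == null) {
            value = database.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Write to the database, refresh the writer's cache, and discard every other copy.
    void write(int node, String key, String value) {
        database.put(key, value);
        for (int i = 0; i < nodeCaches.size(); i++) {
            if (i == node) {
                nodeCaches.get(i).put(key, value);
            } else {
                nodeCaches.get(i).remove(key);
            }
        }
    }

    public static void main(String[] args) {
        InvalidatingCacheCluster cluster = new InvalidatingCacheCluster(2);
        cluster.write(0, "price", "10");
        System.out.println(cluster.read(1, "price")); // node 1 caches the value
        cluster.write(0, "price", "12");              // node 1's copy is discarded
        System.out.println(cluster.read(1, "price")); // node 1 reloads the new value
    }
}
```

Without the removal step in write, node 1 would keep serving the stale value after node 0's update, which is the inconsistency the paragraph above describes.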

So, the picture of the Geronimo clustering solution looks much like the picture in Figure 1.


Figure 1. The Geronimo clustering picture

To explain this diagram, start at the bottom where the information sources lie. Information can come either from the database or from the other nodes in the cluster that are participating in peer-to-peer information sharing. Database information is mapped using a JCache implementation, cached in an ActiveSpace clustered cache, and distributed to the application through an object-relational mapping. On the other side of the picture, ActiveCluster is used by WADI to manage all of the various user-session data that exists in the application server cluster, including Web sessions, stateful session beans, Web service sessions, and so on. ActiveSpace also uses ActiveCluster to preserve the consistency of caches holding data such as entity bean, deployment, and JNDI directory information.

So, as you see, WADI is part of the solution, not the whole solution. Other projects are being used to fulfill the requirements for clustering. It also seems that Codehaus played an important role in nurturing many of these projects in their early stages, giving the people who cared deeply about them an innovative and friendly place to develop. Now, after many years of development, these projects are finding their way back into the fold, taking their various places within the Geronimo project.


Summary

Geronimo's clustering solution is still being actively developed, but I hope that through this article you see some of the problems that the Geronimo developers have endeavored to solve. Let's imagine we're transported to the future where Geronimo is completed so I can list the clustering functionality of Geronimo:

  • Highly scalable -- Geronimo has been architected to handle session state replication in novel ways to reduce the memory and network bandwidth overhead of clustering, leaving more resources available for users of the clustered application.
  • Highly available -- Through the dynamic partitioning features of Geronimo, larger clusters can be built, and the larger the cluster, the higher the availability potential.
  • Highly manageable -- Geronimo makes it easier to configure your cluster. Simply add more nodes, and all of the important work will be done for you automatically.

Finally, Jules would like me to point out that open source software is a collaborative effort of whole communities of developers. While we have only been able to draw attention to a few individuals in this article, he would like to acknowledge the contributions of a far wider collection of individuals to Geronimo, WADI, and associated projects.


Resources

Get products and technologies

  • Get ActiveSpace (now part of the ActiveMQ Apache incubator project), a simple-to-use yet powerful toolkit for building distributed systems in a SEDA-like way.

  • Explore ActiveCluster (now part of the ActiveMQ Apache incubator project).

  • Download ActiveMQ, a fast open source JMS 1.1 provider and message fabric supporting clustering, peer networks, discovery, TCP, SSL, multicast, persistence, and XA that integrates seamlessly into Java 2 Platform, Enterprise Edition (J2EE) 1.4 containers, lightweight containers, and any Java application. ActiveMQ is released under the Apache 2.0 License.

  • Get Jetty, a 100% Java HTTP server and servlet container.

  • Get WADI from Codehaus.

  • Download OpenEJB, an open source, modular, configurable, and extendable EJB container system and EJB server.

  • Get MX4J, a project to build an open source implementation of the Java Management Extensions (JMX) and of the JMX Remote API (JSR 160) specifications, and to build tools relating to JMX.

  • Innovate your next open source development project with IBM trial software, available for download or on DVD.

  • Download Apache Geronimo, Version 1.0.

  • Download your free copy of IBM WebSphere® Application Server Community Edition V1.0 -- a lightweight J2EE application server built on Apache Geronimo open source technology that is designed to help you accelerate your development and deployment efforts.
