The Geronimo renegade: The push for clustering

The inside scoop from Jeff Genender

Neal Sanche (neal@nsdev.org), Java Developer, Pure Technologies
Neal Sanche is a Java developer recently beached in the .NET world and fighting for any ties back to his old, comfortable roots. His experience includes development of several commercial J2EE applications as well as several stand-alone Java applications. In his spare time, he writes music, takes photographs, and writes technical articles. Visit his Web site to see several examples of each. You can contact Neal at neal@nsdev.org.

Summary:  Clustering allows an application server to support multiple nodes with failover, session data sharing, and load balancing across many network nodes. This article provides details -- direct from the developers involved in the Apache Geronimo clustering effort -- on the clustering technologies they are considering implementing. Find out who is working on the details, how they work together to get the code written, and the ramifications these efforts are having on the open source community.

Date:  13 Dec 2005
Level:  Introductory

News flash! Clustering being added to Geronimo

Start by imagining the sound of ten people typing simultaneously on their computer keyboards. These sounds are coming from all over the world: some from the U.S., some from Italy, some from Britain, and some from the keyboards of Australians masquerading as Europeans. Now imagine yourself in a medieval town nestled in the lush folds of the Southern Maritime Alps on the Italian Riviera, not too far from Nice, France. Imagine that you've bought a home in this village, which has been wired with 155-megabit fiber-optic Internet service. The keyboard noises floating out of the windows and into the hills are actually coming from two avid software developers who live and write software in this amazingly beautiful location, Colletta di Castelbianco! Next, imagine all of these lovely keyboard sounds are contributing to free software. Software with a vengeance: free and easy to use in your business.

You've just imagined the software developers on the Web Application Distributed Infrastructure (WADI) project, nearly all of whom are separated by great physical distances, and yet they are constantly making progress on some of the world's most advanced cluster-enabling software. WADI and Apache Geronimo have recently paired to build a suite of clustering functionality under the Geronimo banner. What's really amazing about this is that from start to finish the feature set has been put together and committed to the Geronimo source tree in a little less than three months!

In this article, I'll focus on clustering and the reasons it's so important to the long-term viability of Geronimo. I'll share insights from the developers involved in the clustering effort on how the implementation is being done and what the ramifications have been for the open source community. At the same time, I hope to inject into the discussion a few personal details about these ten people and how they work together.


What is clustering, and why would you do it?

Clustering describes a set of technologies applied to an application server that allows multiple instances of an application server, usually running on separate machines, to communicate with each other and synchronize data between them to increase performance and improve stability. Geronimo was lacking clustering functionality until a small development team decided to take on the challenge and find ways to bring these features to life. Their dedication will most likely result in this whole feature set being available in the 1.0 release of Geronimo due out this month.

The main reasons for clustering an application server are to increase its performance and stability. A clustered system can be designed either to maximize the uptime of the system -- so that if any node of the system fails, other nodes in the system take over the functionality of the disabled node -- or to minimize the length of time the user has to wait for any operation to take place, which is usually called load balancing.
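The two goals can be illustrated together in a few lines of Java. What follows is only a minimal sketch, not how Geronimo or WADI dispatch requests: the `ClusterNode` and `RoundRobinBalancer` classes are hypothetical names invented for this example. Requests rotate across the nodes (load balancing), and a node marked as failed is skipped so that the remaining nodes take over its work (failover).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cluster node: just a name and a health flag. */
final class ClusterNode {
    final String name;
    volatile boolean alive = true;

    ClusterNode(String name) {
        this.name = name;
    }
}

/** Hypothetical round-robin balancer that skips nodes marked as failed. */
final class RoundRobinBalancer {
    private final List<ClusterNode> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<ClusterNode> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    /** Returns the next live node in rotation, or throws if every node is down. */
    ClusterNode pick() {
        // Try each node at most once per call, starting from the rotation point.
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            int index = Math.floorMod(next.getAndIncrement(), nodes.size());
            ClusterNode candidate = nodes.get(index);
            if (candidate.alive) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live nodes in cluster");
    }
}

final class BalancerDemo {
    public static void main(String[] args) {
        ClusterNode a = new ClusterNode("node-a");
        ClusterNode b = new ClusterNode("node-b");
        RoundRobinBalancer balancer = new RoundRobinBalancer(Arrays.asList(a, b));

        System.out.println(balancer.pick().name); // load is spread across nodes
        a.alive = false;                          // simulate a node failure
        System.out.println(balancer.pick().name); // only live nodes are chosen
    }
}
```

A real cluster would also need health checks to set the `alive` flag and would usually weigh nodes by measured load rather than rotating blindly.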

Clustering technologies often make use of a fast communication protocol, such as User Datagram Protocol (UDP) or multicast protocols. This allows all of the cluster nodes to communicate quickly with each other, sharing necessary information so they can all be synchronized with the current session information from other nodes in the cluster. This communication protocol, and the software associated with it, provides a shared data space that is highly transactional and may actually reside in the memory of each node in the cluster. If anything changes in that shared data space on any node of the cluster, it's also changed on all of the other nodes. So, effectively, all of the nodes are on the same page with respect to what's going on in the applications hosted in the application server. If any node in the cluster were to fail, a user would be redirected to another functional node in the system, and his or her session would continue as if nothing had happened.
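The idea of a shared data space can be sketched without any networking at all. The example below is a deliberately simplified, in-memory simulation: the `ReplicatedSessionNode` class is a hypothetical name, direct method calls stand in for the UDP or multicast messages a real cluster would send, and nothing here reflects WADI's actual design. Every write on one node is pushed to its peers, so any surviving node can serve the session after a failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node holding a full copy of every session in the cluster. */
final class ReplicatedSessionNode {
    private final String name;
    private final Map<String, String> sessions = new HashMap<>();
    private final List<ReplicatedSessionNode> peers = new ArrayList<>();

    ReplicatedSessionNode(String name) {
        this.name = name;
    }

    /** Wires the given nodes together so each one knows all the others. */
    static void join(ReplicatedSessionNode... members) {
        for (ReplicatedSessionNode member : members) {
            for (ReplicatedSessionNode other : members) {
                if (other != member) {
                    member.peers.add(other);
                }
            }
        }
    }

    /** Stores session state locally, then replicates it to every peer. */
    void put(String sessionId, String state) {
        sessions.put(sessionId, state);
        for (ReplicatedSessionNode peer : peers) {
            // Stand-in for a network message to the peer.
            peer.applyRemote(sessionId, state);
        }
    }

    private void applyRemote(String sessionId, String state) {
        sessions.put(sessionId, state);
    }

    String get(String sessionId) {
        return sessions.get(sessionId);
    }

    String name() {
        return name;
    }
}

final class ReplicationDemo {
    public static void main(String[] args) {
        ReplicatedSessionNode first = new ReplicatedSessionNode("node-1");
        ReplicatedSessionNode second = new ReplicatedSessionNode("node-2");
        ReplicatedSessionNode.join(first, second);

        first.put("session-42", "cart=3 items");
        // If node-1 now fails, node-2 can continue the same session.
        System.out.println(second.name() + " sees: " + second.get("session-42"));
    }
}
```

Replicating every session to every node is the simplest policy to show, but it scales poorly; real clustering software typically has to trade replication cost against how many node failures it can survive.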

Finally, you'll usually find a shared file system associated with clustering in an application server. This allows quick distribution of application contents to each of the nodes in the cluster so that each node runs the same applications. Sometimes this is referred to as farming. The files associated with an application are farmed out to each node in the cluster.

In a Java™ 2 Platform, Enterprise Edition (J2EE™) server, a number of different parts within the server must perform clustering operations. The Web container -- in the case of Geronimo, this can be either Jetty or Tomcat -- must coordinate with the Web containers of other cluster nodes to share user session information and load measurements. This sharing provides a good way to balance load and reduce the probability that a single node in the cluster gets more requests than it can handle. Also, the Enterprise JavaBeans (EJB) container needs to ensure that stateful session beans are replicated to other cluster nodes so that if a user is redirected to another node in the cluster, his or her application appears to be in the same state as it was on the previous node. Another major required component is a highly available Java Naming and Directory Interface (JNDI) directory. This directory is where Java enterprise applications keep names and references that help the applications find the objects they need at run time. It's important that this directory does not lose information if a node in the cluster fails, so it's usually replicated to each node in the cluster.
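Replicating session or stateful bean data between nodes generally means turning that data into bytes on one node and rebuilding it on another. As a rough illustration, assuming standard Java serialization is the transport format (the article does not say what WADI uses), the sketch below round-trips a hypothetical `CartState` object through a byte array, which is roughly what one node would send and another would receive. The `StateReplicator` helper is likewise invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical conversational state, such as a shopping cart. */
final class CartState implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    List<String> items() {
        return Collections.unmodifiableList(items);
    }
}

/** Hypothetical helper that converts state to bytes and back. */
final class StateReplicator {
    /** Serializes state as the sending node would before transmitting it. */
    static byte[] toBytes(Serializable state) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(state);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds state as the receiving node would after a failover. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class SerializationDemo {
    public static void main(String[] args) {
        CartState original = new CartState();
        original.add("book");
        original.add("lamp");

        byte[] wireFormat = StateReplicator.toBytes(original);
        CartState copy = (CartState) StateReplicator.fromBytes(wireFormat);
        System.out.println("restored items: " + copy.items());
    }
}
```

The practical consequence for application developers is that anything stored in a replicated session needs to be serializable; an object holding a live socket or file handle cannot simply be copied to another node this way.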

According to Jeff Genender, Geronimo won't be adopted on a large scale or into the enterprise without horizontal scalability. He notes that high availability is an important concept in mission-critical systems that depend on 24/7 uptimes and high-transaction loads. These things are necessary to get Geronimo to the next level and are currently among Jeff's highest priorities.


Codehaus, a hotbed of open source technology

Now let's examine where the Geronimo team has been looking to find suitable components for adding clustering to its application server. The team needs software for performing the clustering operations, data sharing, and so on. Where do they go to get it? They could, of course, write these components from scratch, but that would defeat the purpose of open source software, which is to openly share and build upon libraries that exist. It turns out that Codehaus, an open source hosting operation akin to SourceForge, has many suitable projects. According to Jeff Genender, the barrier to entry for projects on Codehaus is higher, meaning that it's tougher to get a project hosted there. In turn, this means that the quality of the projects is generally higher and the projects are more mature. This doesn't mean that it's harder to develop a project on Codehaus. Quite the opposite: It's a place where development can progress quickly and smoothly, without a lot of the bureaucracy present in other software hatcheries. The Geronimo project has been looking at a number of projects that fit within their current strategy.

One Codehaus project that Geronimo already relies on heavily is the Java messaging library ActiveMQ (see Resources for a link), which provides the Java Message Service (JMS) functionality within Geronimo. And, of course, many other Codehaus projects are used within Geronimo, such as ActiveIO, ServiceMix, and, more recently, WADI.


Adding clustering to Geronimo

As Jeff and I further discussed clustering in Geronimo, he pointed clearly at the WADI project and identified its unique features, and how he hopes those features will allow Geronimo to play in the commercial space. In fact, recently the WADI project was being considered for migration into the Geronimo project from Codehaus, which is discussed at length in its incubator proposal (see Resources for a link).

His first statement was that WADI is unique because it allows developers to cluster in non-homogeneous environments, such as clustering Tomcat and Jetty implementations together. He hopes to take this to other application servers as well; so, for instance, Geronimo might be able to have IBM WebSphere® nodes in its Web server cluster. He thinks that this concept is a novel one and will allow companies that are happy with some commercial vendors to cluster with open source alternatives and build a cost-effective application server farm. This would certainly provide a migration path to open source and give Geronimo an edge.

I think this shows an important philosophy at work that is often absent in many of the alternatives to Geronimo. The Geronimo developers are looking at ways to allow their application server to interoperate with many other products instead of being concerned with being the best application server. In my opinion, it's much more important to foster cooperation than it is to assume a stance of pure competition.

I've just learned from Jeff that WADI has in fact been committed to the Geronimo source tree. Not only that, but he is confident that the functionality that WADI adds for clustering will allow Geronimo 1.0 to be fully cluster enabled!


Meet the clustering team

When I asked Jeff who would be building out the clustering features within Geronimo, he talked about the clustering team and how they are putting together the various pieces they need to deliver. I wondered if there was any friction in the team, and he laughed, assuring me that everyone on the team works well together. They have a number of Geronimo and Apache committers on the team already, as well as a number of non-Apache folks. He said the people leading the push for integrating the WADI clustering component into Geronimo are himself and Jules Gosnell, who is famous for his prior work integrating Jetty into JBoss as well as for being the principal designer on the WADI project. Remember the castle-like village in the Italian Alps that I mentioned? Well, that is where the developers of Jetty -- Greg Wilkins and Jan Bartel -- live with their beautiful views of the surrounding countryside and their outstanding Internet connection. Greg has been an advocate of WADI for over a year and has worked with Jules to ensure that Jetty works well with WADI.

I did a little research on WADI to understand where it plays in the clustering space and found out that it used to be concerned mainly with Web application distributed state management. But further reading in the WADI FAQ (see Resources for a link) has revealed that they are also working out how to integrate WADI with the EJB tier of an application server. In fact, work is nearly completed on a full OpenEJB integration with WADI. OpenEJB is, of course, the default (and currently only) EJB container used within Geronimo.

Jeff elaborated on this by telling me that Gianni Damour is responsible for development of these OpenEJB components, and they expect to be integrating clustering GBeans shortly. He told me that Jules and the team are getting things ready by splitting up WADI into different modules to deal with the Lesser General Public License (LGPL) dependency issues for other application servers and components that it supports (such as JBoss and JGroups). This points to a licensing conflict: it appears that the Apache Software Foundation (ASF) cannot (or, more likely, will not) distribute any components licensed under the LGPL. So, they must split WADI into modules and distribute them separately. The WADI project on Codehaus will remain for development of LGPL connector modules.

Jeff also hinted at the future work that he and Jules will be doing. Jules will be completing work on grid-based distributed caching, and Jeff will be working on the GBean integration of WADI into Geronimo. In the meantime, Jeff has integrated Tomcat clustering GBeans into Geronimo as a quick solution to get the Web tier clustered until his WADI GBeans are completed. He says this has temporarily satisfied a few users who needed clustering immediately, before the final solution is in place.

Wait, there are ten people. So to be fair I have to add Bruce Snyder, Gianni Scenini, James Strachan, James Goodwill, and Bill Dudney to the list. This is a stellar roster of open source committers and innovators.


The great Codehaus land grab

I find it interesting that the push to create a set of clustering technologies for Geronimo has precipitated the potential movement of a number of open source projects into the Geronimo project from Codehaus. This type of open source consolidation is healthy. It brings a broader community to bear on the implementation of the software and will undoubtedly result in stronger software.

I couldn't help noticing that the JBoss project has also recently brought a Codehaus project to live under the JBoss group's banner. This project -- called Drools -- is a system for developing a rules-based expert system within an application. I can only surmise that JBoss is concerning itself with increasing its niche platform portfolio. I hope that this doesn't mean there will be anything like a big Codehaus turf war between Geronimo and JBoss. Likely not, since JBoss has already chosen many of its building-block components.

Obviously, they're doing something right at Codehaus to have so many great projects and such a demand for these projects to become components of other open source projects. Bob the Despot (Bob McWhirter), the founding father of Codehaus, has done very well.


All the eggs find a basket

It's not only the keyboards of the Geronimo team but also the keyboards of the WADI developers that are smoking these days. Somewhere, drowned out by all of the keyboard noise, is the sound of jet planes, all flying out to San Diego, California, for ApacheCon US 2005, where the canvas will be torn off of Geronimo, revealing the bright LEDs of the flashing 1.0 sign. That's right, Geronimo will reach its first non-milestone release. All of its most important features will be in place. All of its eggs will be in the 1.0 basket just in time for the holiday season, when Java developers worldwide get that much-needed free time to download, compile, and kick the tires of some cool software. I'll definitely be kicking Geronimo's tires.


Resources

Learn

Get products and technologies

Discuss
