News flash! Clustering being added to Geronimo
Start by imagining the sound of ten people typing simultaneously on their computer keyboards. It's clear these sounds are coming from all over the world: some from the U.S., some from Italy, some from Britain, and some from the computer keyboards of Australians masquerading as Europeans. Now imagine yourself in a medieval town nestled in the lush folds of the Southern Maritime Alps, on the Italian Riviera, not too far from Nice, France. Imagine that you've bought a home in this village, which has been wired with 155-megabit fiber-optic Internet service. The keyboard noises floating out of the windows and into the hills are actually coming from two avid software developers who live and write software in this amazingly beautiful location (Colletta di Castelbianco)! Next, imagine all of these lovely keyboard sounds are contributing to free software. Software with a vengeance: free and easy to use in your business.
You've just imagined the software developers on the Web Application Distributed Infrastructure (WADI) project, nearly all of whom are separated by great physical distances, and yet they are constantly making progress on some of the world's most advanced cluster-enabling software. WADI and Apache Geronimo have recently paired to build a suite of clustering functionality under the Geronimo banner. What's really amazing about this is that from start to finish the feature set has been put together and committed to the Geronimo source tree in a little less than three months!
In this article, I'll focus on clustering and the reasons it's so important to the long-term viability of Geronimo. I'll provide some inside insight from the developers involved in the clustering effort on how the implementation is being done and what the ramifications have been for the open source community. At the same time, I hope to inject into the discussion a few personal details about these ten people and how they work together.
What is clustering, and why would you do it?
Clustering describes a set of technologies applied to an application server that allows multiple instances of an application server, usually running on separate machines, to communicate with each other and synchronize data between them to increase performance and improve stability. Geronimo was lacking clustering functionality until a small development team decided to take on the challenge and find ways to bring these features to life. Their dedication will most likely result in this whole feature set being available in the 1.0 release of Geronimo due out this month.
The main reasons for clustering an application server are to increase its performance and stability. A clustered system can be designed either to maximize the uptime of the system -- so that if any node of the system fails, other nodes in the system take over the functionality of the disabled node, which is usually called high availability or failover -- or to minimize the length of time the user has to wait for any operation to take place by spreading requests across the nodes, which is usually called load balancing.
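The two goals above can be sketched together in a few lines of Java. This is only an illustration under stated assumptions: the class `RoundRobinBalancer`, its methods, and the node names are hypothetical and are not part of Geronimo or WADI. It hands requests to nodes in turn (load balancing) and skips any node that has been marked as failed (failover).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical round-robin balancer that skips nodes marked as failed. */
class RoundRobinBalancer {
    private final List<String> nodes;
    private final boolean[] up;   // up[i] is true while nodes.get(i) is healthy
    private int next = 0;         // index of the node to try first on the next pick

    RoundRobinBalancer(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
        this.up = new boolean[nodes.size()];
        Arrays.fill(up, true);
    }

    synchronized void markDown(String node) { setState(node, false); }

    synchronized void markUp(String node) { setState(node, true); }

    private void setState(String node, boolean state) {
        int i = nodes.indexOf(node);
        if (i >= 0) {
            up[i] = state;
        }
    }

    /** Returns the next healthy node in rotation, or throws if every node is down. */
    synchronized String pick() {
        for (int tried = 0; tried < nodes.size(); tried++) {
            int i = next;
            next = (next + 1) % nodes.size();
            if (up[i]) {
                return nodes.get(i);
            }
        }
        throw new IllegalStateException("no healthy nodes in cluster");
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb =
            new RoundRobinBalancer(Arrays.asList("node-a", "node-b", "node-c"));
        System.out.println(lb.pick());   // first node in rotation
        System.out.println(lb.pick());   // second node in rotation
        lb.markDown("node-c");           // simulate a node failure
        System.out.println(lb.pick());   // node-c is skipped
    }
}
```

A real balancer would also weigh the load measurements reported by each node rather than rotating blindly; round-robin is used here only because it is the simplest policy to read.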
Clustering technologies often make use of a fast communication protocol, such as User Datagram Protocol (UDP) or IP multicast. This allows all of the cluster nodes to communicate quickly with each other, sharing necessary information so they can all be synchronized with the current session information from other nodes in the cluster. This communication protocol, and the software associated with it, provides a shared data space that is highly transactional and may actually reside in the memory of each of the nodes in the cluster. If anything changes in that shared data space on any of the nodes of the cluster, it's also changed on all of the other nodes in the cluster. So, effectively, all of the nodes are on the same page with respect to what's going on in the applications hosted in the application server. If any of the nodes in the cluster were to fail, a user would be redirected to another functional node in the system, and his or her session would continue as if nothing had happened.
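To make the idea of a shared data space concrete, here is a minimal in-memory sketch. It is an assumption-laden toy, not how WADI or any Geronimo component works: the `ClusterNode` class is hypothetical, and direct method calls between objects stand in for the network protocol. Every write on one node is copied to all of its peers, so a user who is redirected after a failure finds the same session data on the surviving node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical cluster node that keeps a full copy of every session in memory. */
class ClusterNode {
    final String name;
    // sessionId -> (attribute name -> attribute value)
    private final Map<String, Map<String, String>> sessions = new HashMap<>();
    private final List<ClusterNode> peers = new ArrayList<>();

    ClusterNode(String name) {
        this.name = name;
    }

    /** Wires the given nodes together so that each one knows all the others. */
    static void join(ClusterNode... members) {
        for (ClusterNode a : members) {
            for (ClusterNode b : members) {
                if (a != b) {
                    a.peers.add(b);
                }
            }
        }
    }

    /** Stores an attribute locally, then replicates the change to every peer. */
    void put(String sessionId, String key, String value) {
        applyLocally(sessionId, key, value);
        for (ClusterNode peer : peers) {
            // In a real cluster this would be a network message, not a method call.
            peer.applyLocally(sessionId, key, value);
        }
    }

    private void applyLocally(String sessionId, String key, String value) {
        sessions.computeIfAbsent(sessionId, id -> new HashMap<>()).put(key, value);
    }

    /** Reads an attribute from this node's own copy; null if it is unknown here. */
    String get(String sessionId, String key) {
        Map<String, String> session = sessions.get(sessionId);
        return session == null ? null : session.get(key);
    }

    public static void main(String[] args) {
        ClusterNode a = new ClusterNode("node-a");
        ClusterNode b = new ClusterNode("node-b");
        ClusterNode.join(a, b);

        a.put("session-1", "cart", "3 items");   // the user is talking to node-a
        // node-a fails; the user is redirected to node-b, which has the same data
        System.out.println(b.get("session-1", "cart"));
    }
}
```

Copying everything to every node is the simplest scheme to show, but it is only one possible design; it trades memory and network traffic for the ability to fail over to any node.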
Finally, you'll usually find a shared file system associated with clustering in an application server. This allows quick distribution of application contents to each of the nodes in the cluster so that each node runs the same applications. Sometimes this is referred to as farming. The files associated with an application are farmed out to each node in the cluster.
In a Java™ 2 Platform, Enterprise Edition (J2EE™) server, a number of different parts within the server must perform clustering operations. The Web container -- in the case of Geronimo, this can be either Jetty or Tomcat -- must coordinate with the Web containers of other cluster nodes to share user session information and load measurements. This sharing provides a good way to balance load and reduce the probability that a single node in the cluster gets more requests than it can handle. Also, the Enterprise JavaBeans (EJB) container needs to ensure that stateful session beans are replicated to other cluster nodes so that if a user is redirected to another node in the cluster, his or her application appears to be in the same state as it was on the previous node. Another major required component is a highly available Java Naming and Directory Interface (JNDI) directory. This directory is where Java enterprise applications keep names and references that help the applications find the objects they need at run time. It's important that this directory does not lose information if a node in the cluster fails, so it's usually replicated to each node in the cluster.
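Replicating session or stateful bean state to another node means copying objects from one JVM to another, and in Java that usually relies on standard object serialization. The sketch below shows only that one step, using the JDK's `ObjectOutputStream` and `ObjectInputStream`. The `ShoppingCart` and `SessionStateCopier` classes are hypothetical stand-ins for application state and container plumbing; they do not represent Geronimo, OpenEJB, or WADI code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that turns state into bytes and back, as a replica would. */
class SessionStateCopier {

    /** Serializes the state into a byte array that could be sent to another node. */
    static byte[] toBytes(Serializable state) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(state);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds the state from bytes, as the receiving node would. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ShoppingCart cart = new ShoppingCart();
        cart.add("book");
        ShoppingCart replica = (ShoppingCart) fromBytes(toBytes(cart));
        System.out.println(replica.items());
    }
}

/** Hypothetical piece of user state; it must be Serializable to be copied. */
class ShoppingCart implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    List<String> items() {
        return items;
    }
}
```

The practical consequence for application developers is that anything stored in a session or a stateful bean has to be serializable if it is meant to survive a move to another node.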
According to Jeff Genender, Geronimo won't be adopted on a large scale or into the enterprise without horizontal scalability. He notes that high availability is an important concept in mission-critical systems that depend on 24/7 uptimes and high-transaction loads. These things are necessary to get Geronimo to the next level and are currently among Jeff's highest priorities.
Codehaus, a hotbed of open source technology
Now let's examine where the Geronimo team has been looking to find suitable components for adding clustering to its application server. The team needs software for performing the clustering operations, data sharing, and so on. Where does it go to get that software? The team could, of course, write these components from scratch, but that would defeat the purpose of open source software, which is to openly share and build upon libraries that exist. It turns out that Codehaus, an open source hosting operation akin to SourceForge, has many suitable projects. According to Jeff Genender, the barrier to entry for projects on Codehaus is higher, meaning that it's tougher to get a project hosted there. In turn, this means that the quality of the projects is generally higher and the projects are more mature. This doesn't mean that it's harder to develop a project on Codehaus. Quite the opposite: It's a place where development can progress quickly and smoothly, without a lot of the bureaucracy present in other software hatcheries. The Geronimo project has been looking at a number of projects that fit within its current strategy.
One of the Codehaus projects that Geronimo has already relied on heavily is the Java messaging library called ActiveMQ (see Resources for a link), which is used to provide the Java Message Service (JMS) functionality within Geronimo. And, of course, many other Codehaus projects are used within Geronimo, such as ActiveIO, ServiceMix, and, more recently, WADI.
As Jeff and I further discussed clustering in Geronimo, he pointed clearly at the WADI project and identified its unique features, and how he hopes those features will allow Geronimo to play in the commercial space. In fact, recently the WADI project was being considered for migration into the Geronimo project from Codehaus, which is discussed at length in its incubator proposal (see Resources for a link).
His first statement was that WADI is unique because it allows developers to cluster in non-homogeneous environments, such as clustering Tomcat and Jetty implementations together. He hopes to take this to other application servers as well; so, for instance, Geronimo might be able to have IBM WebSphere® nodes in its Web server cluster. He thinks that this concept is a novel one and will allow companies that are happy with some commercial vendors to cluster with open source alternatives and build a cost-effective application server farm. This would certainly provide a migration path to open source and give Geronimo an edge.
I think this shows an important philosophy at work that is often absent in many of the alternatives to Geronimo. The Geronimo developers are looking at ways to allow their application server to interoperate with many other products instead of being concerned with being the best application server. In my opinion, it's much more important to foster cooperation than it is to assume a stance of pure competition.
I've just learned from Jeff that WADI has in fact been committed to the Geronimo source tree. Not only that, but he is confident that the functionality that WADI adds for clustering will allow Geronimo 1.0 to be fully cluster enabled!
When I asked Jeff who would be building out the clustering features within Geronimo, he talked about the clustering team and how they are putting together the various pieces they need to deliver. I wondered if there was any friction in the team, and he laughed, assuring me that everyone on the team works well together. They have a number of Geronimo and Apache committers on the team already, as well as a number of non-Apache folks. He said the people leading the push for integrating the WADI clustering component into Geronimo are himself and Jules Gosnell, who is famous for his prior integration work in putting Jetty into JBoss as well as for being the principal designer on the WADI project. Remember the castle-like village in the Italian Alps that I mentioned? Well, that is where the developers of Jetty -- Greg Wilkins and Jan Bartel -- live with their beautiful views of the surrounding countryside and their outstanding Internet connection. Greg has been an advocate of WADI for over a year and has worked with Jules to ensure that Jetty works well with WADI.
I did a little research on WADI to understand where it plays in the clustering space and found out that it used to be concerned mainly with Web application distributed state management. But further reading in the WADI FAQ (see Resources for a link) has revealed that they are also working out how to integrate WADI with the EJB tier of an application server. In fact, work is nearly completed on a full OpenEJB integration with WADI. OpenEJB is, of course, the default (and currently only) EJB container used within Geronimo.
Jeff elaborated on this by telling me that Gianni Damour is responsible for development of the OpenEJB components mentioned above, and they expect to be integrating clustering GBeans shortly. He told me that Jules and the team are getting things ready by splitting up WADI into different modules to deal with the Lesser General Public License (LGPL) dependency issues for other application servers and components that it supports (such as JBoss and JGroups). This points to a licensing conflict, because it appears that the Apache Software Foundation (ASF) cannot (or, more likely, will not) distribute any components licensed under the LGPL. So, they must split WADI into modules and distribute them separately. The WADI project on Codehaus will remain for development of LGPL connector modules.
Jeff also hinted at the future work that he and Jules will be doing. Jules will be completing work on grid-based distributed caching, and Jeff will be working on the GBean integration of WADI into Geronimo. In the meantime, Jeff has integrated Tomcat clustering GBeans into Geronimo as a quick solution to get the Web tier clustered until his WADI GBeans are completed. He says this has temporarily satisfied a few users who needed clustering immediately until the final solution is in place.
Wait, there are ten people. So to be fair I have to add Bruce Snyder, Gianni Scenini, James Strachan, James Goodwill, and Bill Dudney to the list. This is a stellar roster of open source committers and innovators.
I find it interesting that the push to create a set of clustering technologies for Geronimo has precipitated the potential movement of a number of open source projects into the Geronimo project from Codehaus. This type of open source consolidation is healthy. It brings a broader community to bear on the implementation of the software and will undoubtedly result in stronger software.
I couldn't help noticing that the JBoss project has also recently brought a Codehaus project to live under the JBoss group's banner. This project -- called Drools -- is a system for developing a rules-based expert system within an application. I can only surmise that JBoss is concerning itself with increasing its niche platform portfolio. I hope that this doesn't mean there will be anything like a big Codehaus turf war between Geronimo and JBoss. Likely not, since JBoss has already chosen many of its building-block components.
Obviously, they're doing something right at Codehaus to have so many great projects and such a demand for these projects to become components of other open source projects. Bob the Despot (Bob McWhirter), the founding father of Codehaus, has done very well.
It's not only the keyboards of the Geronimo team, but also the keyboards of the WADI developers that are smoking these days. Somewhere, drowned out by all of the keyboard noise, is the sound of jet planes, all flying out to San Diego, California, for ApacheCon US 2005, where the canvas will be torn off of Geronimo revealing the bright LEDs of the flashing 1.0 sign. That's right, Geronimo will reach its first non-milestone release. All of its most important features will be in place. All of its eggs will be in the 1.0 basket just in time for the holiday season when Java developers worldwide get that much-needed free time to download, compile, and kick the tires of some cool software. I'll definitely be kicking Geronimo's tires.
Learn
- See the WadiProposal for a recent discussion of the migration of WADI into the Geronimo project from Codehaus.
- Read the WADI FAQs.
- Get the Java Management Extensions (JMX), an API developed by Sun Microsystems.
- Read "Geronimo GBean Architecture," an article that shows you how to use the GBeans architecture.
- Find helpful resources for beginners and experienced users at the Get started now with Apache Geronimo section of developerWorks.
- Join the mailing list at the Apache Geronimo Web site.
- Check out Tom McQueeney's site, Geronimo Live, which is packed with information and resources on Geronimo.
- Read Applying the Apache License, Version 2.0 for software developer guidance, both inside and outside the Apache projects, to understand what you need to do to apply the Apache License, Version 2.0.
- Read more great content on Geronimo:
  - "Building a better J2EE server, the open source way" (developerWorks, May 2005)
  - "Geronimo! Part 1: The J2EE 1.4 engine that could" (developerWorks, May 2005)
  - "Geronimo! Part 2: Tame this J2EE 1.4 bronco" (developerWorks, May 2005)
  - "Three ways to connect a database to a Geronimo application server" (developerWorks, June 2005)
  - "Create, deploy, and debug Apache Geronimo applications" (developerWorks, May 2005)
  - "Apache Geronimo uncovered" (developerWorks, August 2005)
- Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products.
- Check out the developerWorks Apache Geronimo project area for articles, tutorials, and other resources to help you get started developing with Geronimo today.
- Check out the IBM Support for Apache Geronimo offering, which lets you develop Geronimo applications backed by world-class IBM support.
- Browse all the Apache articles and free Apache tutorials available in the developerWorks Open source zone.
Get products and technologies
- Download Apache Geronimo.
- Get the Java messaging library called ActiveMQ.
- Innovate your next open source development project with IBM trial software, available for download or on DVD.
- Download your free copy of IBM WebSphere Application Server Community Edition V1.0 -- a lightweight J2EE application server built on Apache Geronimo open source technology that is designed to help you accelerate your development and deployment efforts.
Discuss
- Participate in the discussion forum.
- Stay up to date on Geronimo developments at the Apache Geronimo blog.

Neal Sanche is a Java developer recently beached in the .NET world and fighting for any ties back to his old, comfortable roots. His experience includes development of several commercial J2EE applications as well as several stand-alone Java applications. In his spare time, he writes music, takes photographs, and writes technical articles. Visit his Web site to see several examples of each. You can contact Neal at neal@nsdev.org.