More of a good thing? It depends
Last summer, I contributed a Comment lines column with the same title (sans the Part 2), and received feedback that brought me back to do this again. Whether that makes me brave or foolhardy remains to be seen. As before, this is not a discussion of things you're afraid to ask about WebSphere Application Server, but rather a look at some of the questions I am asked repeatedly. While my most often-used response, "it depends," is at the ready, I really will try to provide some guidance to help you determine the best answers for your specific situation, as well as provide some definitive answers to a couple of common questions.
Q: How does EJB client workload management (WLM) behave?

I have a customer who is trying to design a workload distribution solution with EJB clients calling EJBs that are in the same cluster. The workload of the EJBs is significant and chunky, so he wants to make sure that they don't all hit just one server in the cluster. We are concerned that process affinity will stop any workload sharing; that is, the called EJB will always be on the same server as the calling EJB. We can turn off local affinity, but process affinity only appears to be documented in the performance Redbooks and not anywhere else. This is for WebSphere Application Server V5.1.x; are there any guidelines around EJB WLM, and how do we get past process affinity?

A: I'm going to begin my answer by noting that this question, which I received recently, brings up several points that are often misunderstood. Let me try to clear up a common misconception about EJB WLM and the confusion that exists regarding "prefer local" and "process affinity," terms that are often used synonymously but are in fact two different things:
Process affinity (in the context of the question above) refers to the fact that WebSphere Application Server will not make an out-of-process call from an EJB client to an EJB. Therefore, when a Web application component is an EJB client and the EJB is deployed in the same application server, EJB WLM does not come into play. WebSphere Application Server will always call the EJBs residing in the local application server and will not distribute the client requests across the other application servers in a WebSphere Application Server cluster running the same application (and hence the same EJBs). EJB WLM, or, more precisely, request distribution, only comes into play when the EJB client and the EJB are running in separate processes, whether the client is a standalone Java client or some component of a Web application that is running in an application server remote from the EJB.
Local affinity in the question above refers to the "prefer local" option that you can configure for a WebSphere Application Server cluster. This option only applies to EJB WLM, not to the HTTP server plug-in WLM. When this option is enabled, and the EJB client is running in a process remote from the application server, WebSphere Application Server will "prefer" application servers that are located on the same machine as the EJB client, hence the term "prefer local." If there are no running application servers on the same machine as the EJB client, and "prefer local" is enabled, then and only then will WebSphere Application Server distribute EJB client requests to remote machines. From a practical perspective, this means that "prefer local" only has an impact when you have chosen to deploy the components from a single application (J2EE EAR file) across multiple application servers. Typically, this means the Web components are deployed in one application server (or server cluster) and the EJB components are deployed in another application server (or server cluster). This normally occurs when one is trying to promote greater reuse of business objects, typically on the theory that the EJBs can be treated as "shared services." While aesthetically pleasing, this likely is not practical in many cases. Doing so implies some tradeoffs: performance will be 20 to 30% slower due to the out-of-process calls, and application maintenance will be more complex (see Deploying multiple applications in J2EE 1.2 in Resources).
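To make the "prefer local" rule concrete, here is a minimal sketch of the selection logic just described. This is a hypothetical illustration, not WebSphere's actual implementation: if any cluster member runs on the same host as the EJB client, only those members are eligible targets; otherwise all members are eligible.

```java
import java.util.*;

// Hypothetical sketch of "prefer local" target filtering; the class and
// method names are illustrative, not WebSphere APIs.
public class PreferLocal {
    // memberHosts maps cluster member name -> the host it runs on.
    public static List<String> eligibleTargets(String clientHost,
                                               LinkedHashMap<String, String> memberHosts,
                                               boolean preferLocal) {
        List<String> all = new ArrayList<>(memberHosts.keySet());
        if (!preferLocal) {
            return all;
        }
        List<String> local = new ArrayList<>();
        for (Map.Entry<String, String> e : memberHosts.entrySet()) {
            if (e.getValue().equals(clientHost)) {
                local.add(e.getKey());
            }
        }
        // Then and only then (no co-located member running) does the
        // full remote list become eligible.
        return local.isEmpty() ? all : local;
    }
}
```

With two members on hostA and hostB, a client on hostA would only be routed to the hostA member, while a client on a third host would see both members as eligible.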
Let's move on to some other points that are asked or implied in this question. First, there's the concern about how to distribute the requests to avoid overloading a server. As stated above, request distribution will only occur if the client and EJB are in separate processes, so in many cases there is no distribution; in fact, that's the case in this question, since the EJB client and EJB are in the same process. When the client and EJB are in separate processes, some subtle WLM behaviors are employed to ensure that the workload is evenly distributed across the WebSphere Application Server cluster.
For a remote EJB client, the InitialContext request goes through the Object Request Broker (ORB) and returns a JNDI context object. In turn, a lookup on the context returns the home object of the bean. This is an indirect Interoperable Object Reference (IOR), which actually points to the Location Service Daemon (LSD) on the local Node Agent. As a result, the first request goes to the LSD, and the LSD randomly selects one of the cluster members by using the WLM plug-in in the LSD; the LSD then returns a direct IOR to the specific cluster member. The random selection of a cluster member is the first mechanism used to ensure even request distribution.
The first request is then processed by that server, and future requests are distributed using WLM (see Resources). EJB WLM in WebSphere Application Server V5.0x (and above) also takes outstanding requests into account when determining where to send a request. It starts out as standard weighted round robin (decrementing the weights as it goes), but once there are several outstanding requests to multiple cluster members, an outstanding request weight algorithm comes into play. This algorithm compares each cluster member's weight to the number of outstanding requests that have been sent to that cluster member, and ensures the member carries the correct proportion of in-flight requests relative to its weight and the weights of the other cluster members. For instance, if the weights were 2 and 2, and the number of outstanding requests to the first server was higher than to the second, the next request will go to the second server even though it was the first server's turn (based on weighted round robin). This additional step helps ensure the servers are balanced based on their weights.
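A rough sketch of this behavior follows. This is an assumption-laden illustration of the idea (weighted round robin with an outstanding-request override), not WebSphere's actual algorithm or API: a member whose share of in-flight requests already exceeds its weighted proportion is skipped, even when it is that member's round-robin turn.

```java
import java.util.*;

// Illustrative sketch only; class and field names are hypothetical.
public class OutstandingAwareSelector {
    public static class Member {
        public final String name;
        public final int weight;
        public int outstanding;   // requests sent but not yet completed
        public Member(String name, int weight) { this.name = name; this.weight = weight; }
    }

    private final List<Member> members;
    private int next = 0;         // round-robin position

    public OutstandingAwareSelector(List<Member> members) { this.members = members; }

    public Member select() {
        int totalWeight = 0, totalOutstanding = 0;
        for (Member m : members) { totalWeight += m.weight; totalOutstanding += m.outstanding; }
        for (int i = 0; i < members.size(); i++) {
            int idx = (next + i) % members.size();
            Member m = members.get(idx);
            // Skip a member already carrying more than its weighted share of
            // in-flight requests (cross-multiplied to avoid division):
            // outstanding/totalOutstanding > weight/totalWeight
            if (totalOutstanding > 0 &&
                m.outstanding * totalWeight > m.weight * totalOutstanding) {
                continue;
            }
            next = (idx + 1) % members.size();
            m.outstanding++;
            return m;
        }
        // Every member is over its share: fall back to plain round robin.
        Member m = members.get(next);
        next = (next + 1) % members.size();
        m.outstanding++;
        return m;
    }
}
```

With weights of 2 and 2, and three outstanding requests to the first server versus one to the second, this sketch sends the next request to the second server even though it is the first server's turn, matching the example in the text.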
(In WebSphere Application Server V6.0 there are some additional enhancements: the weights are normalized, and the WLM algorithm will spread out the requests. For example, with weights of 2 (server "a") and 7 (server "b"), the resulting WebSphere Application Server V6 request distribution will be a-bbbb-a-bbb, while in WebSphere Application Server V5.x the request distribution would be a-b-a-bbbbbb. This further helps to eliminate overloading of a given server.)
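The two distribution patterns can be reproduced with a short sketch. The V5.x sequence follows from plain weighted round robin with decrementing weights; for the V6 "spread" behavior I assume a simple lowest-sent-to-weight-ratio rule, which happens to reproduce the documented sequence but is only an illustration, not WebSphere's published algorithm.

```java
// Illustrative sketch of the two request-distribution patterns described
// above; method names are hypothetical.
public class WlmDistribution {
    // V5.x style: take turns in order, decrementing each member's weight;
    // a member whose weight is exhausted is skipped until all are done.
    public static String v5Sequence(String[] names, int[] weights) {
        int[] remaining = weights.clone();
        int total = 0;
        for (int w : weights) total += w;
        StringBuilder seq = new StringBuilder();
        int idx = 0;
        for (int sent = 0; sent < total; ) {
            if (remaining[idx] > 0) {
                seq.append(names[idx]);
                remaining[idx]--;
                sent++;
            }
            idx = (idx + 1) % names.length;
        }
        return seq.toString();
    }

    // Assumed V6-style "spread": each request goes to the member whose
    // sent/weight ratio is currently lowest (ties go to the earlier member),
    // which interleaves the low-weight member's turns through the sequence.
    public static String v6Sequence(String[] names, int[] weights) {
        int[] sent = new int[names.length];
        int total = 0;
        for (int w : weights) total += w;
        StringBuilder seq = new StringBuilder();
        for (int r = 0; r < total; r++) {
            int pick = 0;
            for (int i = 1; i < names.length; i++) {
                // sent[i]/weights[i] < sent[pick]/weights[pick],
                // cross-multiplied to stay in integer arithmetic.
                if (sent[i] * weights[pick] < sent[pick] * weights[i]) pick = i;
            }
            seq.append(names[pick]);
            sent[pick]++;
        }
        return seq.toString();
    }
}
```

With weights 2 and 7, the first method yields a-b-a-bbbbbb and the second yields a-bbbb-a-bbb, the two sequences given in the text.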
(The discussion above assumes that you are already familiar with Chapter 6 of WebSphere Scalability or Chapter 21 in IBM WebSphere: Deployment and Advanced Configuration; see Resources. If you are not acquainted with this material, refer to them first and then revisit this discussion.)
Q: Can I run a WebSphere Application Server cell over multiple data centers?
A: Yes. But there are probably good reasons not to do so.
First, let's look at the most obvious issue: network speed and reliability between the data centers. In many cases, the performance and reliability of a WAN are not as good as a LAN's, though there are environments where the WAN is highly reliable and also provides LAN bandwidth; in such a case, the WAN appears the same as a LAN to applications (such as WebSphere Application Server). So the simple answer would be: sure, go ahead, if you have a fast and reliable WAN. Often overlooked, however, is the far more important question: why have multiple data centers at all? Normally, you do so to increase availability, so that if one entire data center fails or is lost (in other words, a true "disaster"), the remaining data center can handle the work without major problems. Given that, you need to plan for data center outages that are not brief, and reliability in this degraded state becomes very important as a result. Additionally, failover under these conditions is likely very difficult to test correctly, since it is outside the realm of "normal" WebSphere Application Server failover, which operates at the component level (server, Web, EJB, and so on), not the data center level.
What happens to WLM endpoints in such a case, specifically when the clients are in one data center and the servers are in another? This can arise with either EJB WLM or HTTP server plug-in WLM depending on the deployment and network architecture, and while both WLM implementations will recover (via timeouts), this is one more situation to consider (and likely avoid).
The WebSphere Application Server Network Deployment deployment manager is a single point of failure. As a result, if you lose the data center where the deployment manager is running, you lose the ability to manage the cell until a backup deployment manager is brought up in the surviving data center. While it is possible to plan for this, it's still one more thing to deal with during a failure.
If you have chosen to distribute your HTTP session objects, and you're using a database for this purpose, what happens to your session information if the database is in the now non-functioning data center?
While it's certainly possible to construct (and test) a cross-data-center clustering solution if you exercise extreme care, there is always the risk that you've missed something that will surface during a real disaster. It is because of the issues I have mentioned here (and those I haven't thought of) that I advise against running a WebSphere Application Server cell across multiple data centers. As I often mention, a disaster is not a time when you want to be learning on the job.
Q: Can I share sessions across WebSphere Application Server cells?
A: Yes. But again, there are likely good reasons not to.
One reason was mentioned above in the context of running a WebSphere Application Server cell across data centers. In addition, if you share sessions across cells, you're relying on a single database server (or a database server and its failover server). As a result, any maintenance on the database server results in an outage (or in the steps needed to avoid one) across two cells. This adds complexity.
In a like fashion, updates to the application server runtime can also make planning and executing an outage more difficult, as a software update (for example, to WebSphere Application Server) could result in the existing database server version no longer being supported. On the other hand, if each cell is independent, including the database server, then maintenance and updates can be applied on a cell by cell basis (meaning the WebSphere Application Server cell and its associated infrastructure), and an outage in one cell doesn't impact the other cell, which can continue to run and service requests.
That's the end of this installment of ramblings, for the time being at least. I hope that this information has provided some insight into how you can go about determining the best solutions for your environment, if not some direct answers to a couple of common WebSphere Application Server queries.
Thanks to Keys Botzum and Bill Hines for their review and comments.
- Comment lines from Tom Alcott: Everything you always wanted to know about WebSphere Application Server but were afraid to ask
- Deploying multiple applications in J2EE 1.2: Planning for reuse in your Enterprise JavaBeans components
- IBM Redbook: WebSphere Scalability: WLM and Clustering Using WebSphere Application Server Advanced Edition
- IBM WebSphere: Deployment and Advanced Configuration