In each column, The EJB Advocate presents the gist of a typical back-and-forth dialogue with actual customers and developers in the course of recommending a solution to an interesting design issue. Any identifying details have been obscured, and no "innovative" or proprietary architectures are presented. For more information, see Introducing the EJB Advocate.
Dear EJB Advocate,
My first J2EE project was to build a Web application that renders products from our company catalog for customers to add to their shopping cart for purchase. I was a big advocate of object technology and the idea of entity EJBs. That is, until I started testing a prototype on my laptop using IBM® WebSphere® Studio Application Developer's test environment. To make a long story short, it took over five seconds for my servlet to load a page's worth of local entity EJBs for display. When I recoded the servlet to use JDBC, it took less than one second.
I hate to say this, but sign me:
It was pretty obvious right away that the problem was not a flaw in the entity EJB architecture itself, because an apples-to-apples comparison of well-designed CMP entity EJBs to hand-coded JDBC components has never shown this level of difference. However, there were some good things to point out in my reply, starting with their focus on early testing. My thought was that by pointing out an extra test they could try, the team might be able to figure out the problem for themselves.
Dear No Longer,
It was good to see that you were using local versus remote entity EJBs, so that you didn't pay the additional performance penalty of remote method invocations. This overhead could have easily doubled the amount of time it took to load the page. One of the best practices associated with using entity EJBs is to always use local ones. We briefly discussed this approach in last month's column.
I was also very happy to hear that you immediately built a prototype to test the performance of local entity EJBs versus directly-coded JDBC as the persistence layer. Many teams don't do candidate architecture performance testing until they build out the entire application, and ultimately end up with a lot of rework -- and maybe a critical situation -- in production. Others will attempt to hide the details of the persistence mechanism behind an "access bean" to make it easy to switch later, adding yet another layer to the system (both JDBC and entity EJBs are already abstractions of the persistence mechanism).
I suspect I know what the problem is, but I'd like you to turn on tracing of the JDBC datasource to see how many SQL statements are getting executed (see the WebSphere Application Server Information Center). You can also turn on EJB tracing which, as Matt Oberlin points out in his Meet the Experts article, is pretty easy to do in WebSphere Studio Application Developer (Figure 1).
Figure 1. Turning on EJB trace in WebSphere Studio Application Developer
Further, and as Stacy Joines points out in her excellent book on WebSphere performance, gathering precise performance measurements is really important to finding and fixing bottlenecks. The reason I ask you to capture this more precise measure is that it is likely you will see many more SQL statements for the entity bean case than for the direct JDBC case, and that accounts for the difference in performance. In fact, I predict that you will see one SQL statement executed for each attribute read from each product entity, plus one!
Let me know what you find out.
Dear EJB Advocate,
How did you guess? I display up to ten products per page, with five attributes per product (SKU, description, price, a link to an image, and a date when the item will be available). I ended up with fifty-one SQL statements executed for the entity EJB case and only one SQL statement for the direct JDBC case! It is no wonder that the entity EJBs did not come anywhere close to the performance of the JDBC version. It seems like I made the right choice to go with JDBC.
Sign me still:
I had hoped that the SQL trace data would make it obvious what the problem was -- that "No Longer" would no longer rely on imprecise "stopwatch" measurements, and would try to find the cause of the difference or problem before giving up. Helping someone get over disillusionment is harder than I thought! Here was my reply:
Dear No Longer,
I suspect that you are invoking your entities outside the scope of a global transaction, which causes each call to an entity to be executed in a separate transaction and to result in its own SQL statement (depending on your deployment options). Specifically, there will be one call to the home for the finder returning the next ten entities, and five calls to each entity for the get() methods associated with the attributes displayed. Calling entities outside of a global transaction is definitely a "worst" practice (more commonly called an anti-pattern). In fact, it is so important to avoid calling entities outside of a transaction that some EJB developers suggest (as this EJB Advocate does) declaring transactions to be "mandatory" in the EJB deployment descriptor: use <trans-attribute>Mandatory</trans-attribute> within the <container-transaction> tag. This declaration will cause an exception to be thrown if there is not already a transaction in scope when the entity is accessed.
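In the ejb-jar.xml deployment descriptor, the declaration might look like the following sketch (the bean name Product is illustrative, and the wildcard method name applies the attribute to all methods):

```xml
<assembly-descriptor>
  <container-transaction>
    <method>
      <ejb-name>Product</ejb-name>
      <method-name>*</method-name>
    </method>
    <trans-attribute>Mandatory</trans-attribute>
  </container-transaction>
</assembly-descriptor>
```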
There are two ways to wrap logic that calls entity EJBs in a global transaction and greatly improve performance. One is the "easy way" and one is the "right way."
The easy way is to explicitly add code to your servlet to start and end a global transaction around the calls to the EJBs, like so:
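The original listing is not reproduced here; a minimal sketch consistent with the description might look like the following, assuming a hypothetical CatalogServlet (the entity lookup and access code is elided):

```java
import java.io.IOException;
import javax.naming.InitialContext;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import javax.transaction.UserTransaction;

public class CatalogServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        UserTransaction tx = null;
        try {
            InitialContext ctx = new InitialContext();
            tx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
            tx.begin(); // start the global transaction

            // ... look up the entity home, run the finder, and read the
            // attributes of each entity here, all in one transaction ...

            tx.commit(); // end the global transaction
        } catch (Exception e) {
            try {
                if (tx != null) tx.rollback(); // don't leak the transaction
            } catch (Exception ignored) { }
            throw new ServletException(e);
        }
    }
}
```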
Some teams go a little further and create a superclass servlet to handle this behavior using a technique called template inheritance. The superclass would be declared abstract. Its doGet() method would be declared final and would call down to an abstract doGetYourParent() method implemented by each subclass (like YourServlet) that inherits the behavior. The parent class code might look like the following:
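A sketch of such a parent class follows; the class name TransactionalServlet is an assumption, while doGetYourParent() follows the naming used in the column:

```java
import java.io.IOException;
import javax.naming.InitialContext;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import javax.transaction.UserTransaction;

public abstract class TransactionalServlet extends HttpServlet {
    // Declared final so that every subclass runs inside the transaction.
    protected final void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        UserTransaction tx = null;
        try {
            tx = (UserTransaction) new InitialContext()
                    .lookup("java:comp/UserTransaction");
            tx.begin();
            doGetYourParent(request, response); // subclass supplies the body
            tx.commit();
        } catch (Exception e) {
            try { if (tx != null) tx.rollback(); } catch (Exception ignored) { }
            throw new ServletException(e);
        }
    }

    // Subclasses implement the page-specific logic here.
    protected abstract void doGetYourParent(HttpServletRequest request,
            HttpServletResponse response) throws Exception;
}
```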
The changes required to your subclass servlets in order to use template inheritance are pretty simple:
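For example, a subclass might look like this hypothetical sketch (names are illustrative):

```java
import javax.servlet.http.*;

public class CatalogServlet extends TransactionalServlet {
    protected void doGetYourParent(HttpServletRequest request,
            HttpServletResponse response) throws Exception {
        // Page-specific logic runs inside the transaction started by
        // the parent class, so no transaction code is needed here.
        // ... load the entities and forward to the JSP ...
    }
}
```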
One major benefit of the template inheritance approach is that it makes it easy to consistently and transparently add qualities of service (such as transaction start and stop, cache checking, and error handling) that your team will likely want to include to ensure robustness.
Regardless of your approach to starting a global transaction, you should notice a drastic drop in the number of SQL statements in the trace (depending on the access intents and other deployment options; see the WebSphere Application Server Information Center on how to set attributes like the Collection Increment to the number you would like to read -- ten in your case).
But even if you make these changes and eliminate all the places where CMPs are called outside of the scope of a global transaction, a load analysis tool (one that measures the performance of the system under near production conditions), like IBM Rational® Performance Tester, will still show a significant difference between the throughput and CPU utilization of JDBC and entity EJB code, even if profiling tools like JInsight and path analysis tools like IBM Tivoli® Monitoring for Transaction Performance do not show a difference.
The "right way" to fix the code depends on the details of your design. You may already be pretty close, so let me ask you a question: are you using a JavaServer™ Page to render the page from a "data transfer object" (a POJO with get/set methods only) loaded by the servlet (a J2EE best practice)? Or does the servlet render the HTML reply directly?
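To be concrete, by "data transfer object" I mean something like the following minimal sketch; the class name ProductView matches the one discussed in this exchange, and the field names are assumed from the product attributes you described:

```java
import java.io.Serializable;

// A data transfer object: a Serializable POJO with get/set methods only.
public class ProductView implements Serializable {
    private String sku;
    private String description;
    private java.math.BigDecimal price;
    private String imageLink;
    private java.util.Date availableDate;

    public String getSku() { return sku; }
    public void setSku(String sku) { this.sku = sku; }

    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }

    public java.math.BigDecimal getPrice() { return price; }
    public void setPrice(java.math.BigDecimal price) { this.price = price; }

    public String getImageLink() { return imageLink; }
    public void setImageLink(String imageLink) { this.imageLink = imageLink; }

    public java.util.Date getAvailableDate() { return availableDate; }
    public void setAvailableDate(java.util.Date availableDate) { this.availableDate = availableDate; }
}
```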
Dear EJB Advocate,
You are right that I used a JSP to render the page following the Model 2 approach. In other words, the servlet loads an array of up to ten "ProductView" objects (which is the same as your data transfer object except that it is Serializable), and then calls the JSP. Just to be clear, here is the relevant code in the servlet, written in the style you suggested in your previous reply:
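The original listing does not survive here; a sketch consistent with the description might be the following, where ProductView is the data transfer object named in the exchange, and ProductHome, Product, findNextProducts(), and the startKey paging parameter are assumptions:

```java
import java.io.IOException;
import javax.naming.InitialContext;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.*;
import javax.transaction.UserTransaction;

public class CatalogServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        UserTransaction tx = null;
        try {
            InitialContext ctx = new InitialContext();
            tx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
            tx.begin(); // wrap the finder and all get() calls in one transaction

            ProductHome home = (ProductHome)
                ctx.lookup("java:comp/env/ejb/Product");
            java.util.Collection products =
                home.findNextProducts(request.getParameter("startKey"), 10);

            // Copy each entity's attributes into a ProductView for the JSP.
            ProductView[] views = new ProductView[products.size()];
            int i = 0;
            for (java.util.Iterator it = products.iterator(); it.hasNext(); i++) {
                Product product = (Product) it.next();
                ProductView view = new ProductView();
                view.setSku(product.getSku());
                view.setDescription(product.getDescription());
                view.setPrice(product.getPrice());
                view.setImageLink(product.getImageLink());
                view.setAvailableDate(product.getAvailableDate());
                views[i] = view;
            }
            tx.commit();

            request.setAttribute("products", views);
            RequestDispatcher rd = request.getRequestDispatcher("/catalog.jsp");
            rd.forward(request, response);
        } catch (Exception e) {
            try { if (tx != null) tx.rollback(); } catch (Exception ignored) { }
            throw new ServletException(e);
        }
    }
}
```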
By the way, the JSP uses a custom tag to navigate through an array of ProductView objects (actually, this tag can navigate through an array of any kind of object, simulating the "bean tag"), from which the "bean property tags" can be used to substitute the properties. I hope this is enough detail.
I was glad to find that this "easy way" code actually brought the performance of entity EJBs close to that of JDBC (as measured by JInsight). I also used the "mandatory" CMT attribute, which verified that this change took care of all the calls to the CMP outside of a transaction. However, using JDBC was still significantly better in a head-to-head comparison using our load testing tool (we use LoadRunner now, but will take a look at the Rational Performance Tester you mentioned).
Thanks, but I am still signed,
This time, I got more than I had hoped for. No Longer provided code samples, which are far more precise than descriptions or block diagrams. And it seemed like they took the hint to use more precise measures of system performance using load and path analysis tools.
If No Longer had intermixed the HTML rendering code (the view) with the code that retrieves the data, I would have wanted to delve into servlet best practices (not a stretch, since I am an all-around J2EE Advocate, too). I would have had to explain the best practice of using data transfer objects to flow data from the servlet to the JSP. The code that I include in my reply below would not have looked so familiar to No Longer.
Dear No Longer,
Thanks for the code sample. I much prefer that to any other form, since code is where the "rubber meets the road" with respect to performance. Nothing beats a static analysis to see if the design follows best practices. Together with load and path analysis, you get a pretty complete picture of how to find and fix the bottlenecks.
It is great that you are following Model 2 best practices and have gone a step further to provide a custom navigation tag for arrays. It is also great that you are already using data transfer objects, which are basically the same as Service Data Objects. Having this architecture makes the "right way" to get a global transaction the "easiest way" as well.
In other words, you can create a session facade EJB to encapsulate the logic associated with gathering the data for the page (the array of ProductView objects). This pattern is discussed in last month's column, as well as in Kyle Brown's book (see Resources), among others. The session facade might look something like this:
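A sketch of such a stateless session facade bean follows; the class name CatalogFacadeBean, the method getCatalog(), and the ProductHome lookup details are assumptions, while ProductView is the data transfer object already in use:

```java
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;
import javax.naming.InitialContext;

public class CatalogFacadeBean implements SessionBean {
    private SessionContext context;

    // With a container-managed transaction (e.g. Required), the finder
    // and all the entity get() calls share a single transaction scope.
    public ProductView[] getCatalog(String startKey, int count) throws Exception {
        InitialContext ctx = new InitialContext();
        ProductHome home = (ProductHome)
            ctx.lookup("java:comp/env/ejb/Product");
        java.util.Collection products = home.findNextProducts(startKey, count);

        ProductView[] views = new ProductView[products.size()];
        int i = 0;
        for (java.util.Iterator it = products.iterator(); it.hasNext(); i++) {
            Product product = (Product) it.next();
            ProductView view = new ProductView();
            view.setSku(product.getSku());
            view.setDescription(product.getDescription());
            view.setPrice(product.getPrice());
            view.setImageLink(product.getImageLink());
            view.setAvailableDate(product.getAvailableDate());
            views[i] = view;
        }
        return views;
    }

    // Standard stateless session bean lifecycle methods.
    public void ejbCreate() {}
    public void ejbRemove() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void setSessionContext(SessionContext ctx) { context = ctx; }
}
```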
As you might notice in the example above, a benefit of using the Model 2 architecture with data transfer objects is that most of the logic in your servlet doGet() method simply moves into the session facade getCatalog() method. A major benefit of this move is that the logic to get the next page of products is now usable outside of the context of a servlet (like from within a message-driven bean, or another EJB). A remote interface can be provided as well (automatically generated by tooling in WebSphere Studio Application Developer), making it available from a J2EE client. The use of the data transfer object minimizes the chattiness between the layers -- only one stateless call is needed. In any event, the servlet no longer needs to deal with a transaction. It looks something like:
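A sketch of the simplified servlet follows; the facade home and interface names (CatalogFacadeHome, CatalogFacade) and the JSP path are assumptions:

```java
import java.io.IOException;
import javax.naming.InitialContext;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.*;

public class CatalogServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        try {
            InitialContext ctx = new InitialContext();
            CatalogFacadeHome home = (CatalogFacadeHome)
                ctx.lookup("java:comp/env/ejb/CatalogFacade");
            CatalogFacade facade = home.create();

            // One stateless call; the facade manages the transaction.
            ProductView[] views =
                facade.getCatalog(request.getParameter("startKey"), 10);

            request.setAttribute("products", views);
            RequestDispatcher rd = request.getRequestDispatcher("/catalog.jsp");
            rd.forward(request, response);
        } catch (Exception e) {
            throw new ServletException(e);
        }
    }
}
```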
I know it seems like this best practice trades code to start and end a transaction with code to invoke an EJB method, but there are a couple of additional benefits beyond the ability to reuse the logic in other situations. First, the local session reference can be cached in the servlet init() method to eliminate the lookup in the doGet(). Second, and most importantly, handling transactions can be very complex, especially where exceptions are concerned. Improper handling can lead to "leaks" that result in their own kind of performance problems. In short, another best practice is to use container managed transactions wherever possible.
Regardless, this "right way" code will run essentially as fast as the "easy way" code (given that a local session interface is used). But the problem will still remain that the end-to-end performance using a local entity will be significantly worse than when using JDBC (now behind the session EJB, which effectively encapsulates the model from the view). The reason is that even though the get<Property>() methods on the entity are local, there is still a lot of overhead to check for security and transactions. Some estimates place this overhead at around ten thousand instructions per call, which for this example would be insignificant when measured by a path analysis tool: 50 x 10,000 = 500,000 instructions total. But what if the "count" above were 100 and the number of properties accessed were 100? The total would then be 100 million instructions, which starts to be a measurable difference. This phenomenon of "scale" is why load testing is best for finding real performance differences. Path testing lets you find the likely culprit, which you can then follow up on with static analysis (a code review). In this case, the number of instructions it takes to access a property on a data transfer object is estimated in the 10s, not 10s of thousands -- orders of magnitude better than accessing even a local EJB under a global transaction.
The key to optimal EJB performance is to use data transfer objects and custom methods to create, get, and set all the needed properties for a given use case in one call. This minimizes the chattiness between the session facade and entity EJB. In general, it is possible to design your entity EJBs with the right set of methods such that a user need only make one call after a find -- either a create, retrieve, update, or delete method. The following code illustrates what the entity EJB would look like with a custom get:
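A sketch of such a CMP 2.0 entity bean follows; getProductView() is the custom method named in this exchange, while the class name and the exact persistent fields are assumptions:

```java
import javax.ejb.EntityBean;

public abstract class ProductBean implements EntityBean {
    // Container-managed persistent fields (CMP 2.0 abstract accessors).
    public abstract String getSku();
    public abstract void setSku(String sku);
    public abstract String getDescription();
    public abstract void setDescription(String description);
    public abstract java.math.BigDecimal getPrice();
    public abstract void setPrice(java.math.BigDecimal price);
    public abstract String getImageLink();
    public abstract void setImageLink(String imageLink);
    public abstract java.util.Date getAvailableDate();
    public abstract void setAvailableDate(java.util.Date date);

    // Custom get: builds the data transfer object in a single call, so
    // the caller pays the EJB method-call overhead only once per entity.
    public ProductView getProductView() {
        ProductView view = new ProductView();
        view.setSku(getSku());
        view.setDescription(getDescription());
        view.setPrice(getPrice());
        view.setImageLink(getImageLink());
        view.setAvailableDate(getAvailableDate());
        return view;
    }

    // ... entity bean lifecycle methods (ejbLoad, ejbStore, and so on)
    // omitted from this sketch ...
}
```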
To enforce the use of the custom methods, many EJB designers only expose custom methods on the interface, like the following:
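For example, the local interface might expose only the custom method, as in this sketch (consistent with the naming note below, the interface is called simply Product):

```java
import javax.ejb.EJBLocalObject;

// Only the coarse-grained custom method is exposed; the individual
// CMP attribute accessors are deliberately absent.
public interface Product extends EJBLocalObject {
    ProductView getProductView();
}
```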
The home would have the custom creates and finds:
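A sketch of the local home follows; the custom create taking a ProductView, and the finder findNextProducts() with its paging parameters, are assumptions:

```java
import javax.ejb.CreateException;
import javax.ejb.EJBLocalHome;
import javax.ejb.FinderException;

public interface ProductHome extends EJBLocalHome {
    // Custom create: sets all the properties for the use case in one call.
    Product create(ProductView view) throws CreateException;

    // Custom finder: returns the next page of products.
    java.util.Collection findNextProducts(String startKey, int count)
        throws FinderException;
}
```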
You might have noticed that I did not use "Local" as part of the entity EJB interface name (for either the home or the bean). Since I never expose a remote interface to an entity EJB, it seems like overkill to add length to the class name.
In any event, the session facade would change as follows to exploit these custom methods:
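A sketch of the revised facade method follows (the surrounding session bean class is unchanged; the names match the earlier sketches and remain assumptions):

```java
// Revised getCatalog(): the per-attribute copying now lives inside the
// entity's getProductView() custom method, so the facade makes only
// one call per entity after the find.
public ProductView[] getCatalog(String startKey, int count) throws Exception {
    ProductHome home = (ProductHome)
        new javax.naming.InitialContext().lookup("java:comp/env/ejb/Product");
    java.util.Collection products = home.findNextProducts(startKey, count);

    ProductView[] views = new ProductView[products.size()];
    int i = 0;
    for (java.util.Iterator it = products.iterator(); it.hasNext(); i++) {
        views[i] = ((Product) it.next()).getProductView(); // one call per entity
    }
    return views;
}
```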
Note that we simply moved the code that was in the loop of the session EJB getCatalog() method into the entity EJB getProductView() method; then we replaced the moved code with a single method call. If you load test the code implemented in this manner, you will likely notice a much more reasonable overhead of using entity EJBs with respect to JDBC. Given that entity EJBs will be far more maintainable than JDBC, the tradeoff will be well worth it.
And just for future reference, here is a UML diagram showing the general end-to-end architecture. It shows the relationship between JSPs and servlets, session and entity EJBs, and key and view data transfer objects:
Figure 2. UML diagram showing end-to-end architecture
I hope this discussion helps to renew your enthusiasm for entity EJBs. At the very least, I hope you have some new tools and techniques for precisely measuring the performance differences and evaluating the tradeoffs.
Through this exchange, we saw why using a session facade in front of a local entity EJB is so important to ensure that only one transaction is executed per UI event, and to minimize the number of SQL calls sent to the database. Just as significant, we saw the importance of using a data transfer object (now being called an SDO) and custom methods on entity EJBs to minimize the chattiness between layers, even when the local interface is used. And, incidentally, we saw that using an SDO enables you to flow data all the way from the entity to the JavaServer Page rendering the HTML, passing through the session facade and the servlet (the model and view controllers respectively).
We discussed how template inheritance can be used to add behaviors (like starting and committing a global transaction) transparently to the servlet code. Even though the use of a session facade minimizes the need for this approach, template inheritance in servlets may still be useful in cases where a servlet calls more than one session EJB in the context of its doGet() or doPost().
We also discussed declaring transactions on entity EJBs to be mandatory, as well as not exposing the individual CMP attributes to the entity interface. Both of these best practices help to enforce your usage policies.
One process (rather than design) best practice we covered was related to measuring performance using load and path analysis tools, then analyzing the code and configuration to find and fix bottlenecks. The idea is to use tools like Rational Performance Tester, Tivoli Monitoring for Transaction Performance, and JInsight that capture the number of calls, the amount of data passed back and forth, and the amount of time each call takes. Also, we hinted at a best practice that static analysis should be based on code, not on class or high-level sequence diagrams (even though these are useful to get an overview).
If you have an interesting problem associated with using EJBs of any type, please feel free to contact the EJB Advocate. Otherwise, in the next column, we will examine Service Data Objects, and how EJBs (both entities and sessions) will play a role within a Service Oriented Architecture.
- The EJB Advocate: Getting EJB cross references right
- Java 2 Platform, Enterprise Edition (J2EE) specification, the definitive source
- Enterprise Java Programming with IBM WebSphere, Second Edition, by Kyle Brown, Gary Craig, Greg Hester, Russell Stinehour, W. David Pitt, Mark Weitzel, Jim Amsden, Peter M. Jakab, and Daniel Berg. Foreword by Martin Fowler.
- Performance Analysis for Java Websites, by Stacy Joines, Ruth Willenborg, and Ken Hygh.
- IBM Rational Performance Tester
- IBM Tivoli Monitoring for Transaction Performance
- WebSphere Application Server Information Center
- Meet the Experts: Matt Oberlin on WebSphere Studio Application Developer
- Browse for books on these and other technical topics.
Geoff Hambrick is a lead consultant with the IBM Software Services for WebSphere Enablement Team and lives in Round Rock, Texas (near to Austin). The Enablement Team generally helps support the pre-sales process through deep technical briefings and short term Proof of Concept engagements. Geoff was appointed an IBM Distinguished Engineer in March of 2004 for his work in creating and disseminating best practices for developing J2EE applications hosted on IBM WebSphere Application Server.