Optimizing container-managed persistence EJB entity beans

Avoid unnecessary store operations with CmpOPT

Matt Hogstrom (hogstrom@us.ibm.com), WebSphere performance analyst, IBM, Software Group
Matt joined IBM in 1999 as a performance analyst for WebSphere Application Server. He conducts performance work on several platforms including AIX, Windows, Solaris and z/OS. Currently Matt is representing IBM as a member of JSR-131, which is working on the ECperf 1.1 benchmark as well as the Spec Organization's OSG-Java workgroup. He resides and works in North Carolina. You can contact Matt at hogstrom@us.ibm.com.

Summary:  Optimizing EJB entity beans for performance can be done several ways. One way is to ensure that entity bean information that hasn't been modified during a transaction is not stored unnecessarily at the end of that transaction. By avoiding unnecessary store operations you can avoid expensive database operations, which means better performance and concurrency. This article includes a sample application that you can use to automate this process.

Date:  01 Nov 2001
Level:  Intermediate

Optimizing container-managed persistence of EJBs

The J2EE programming model significantly boosts productivity. The gain comes from a platform that allows applications to be deployed on several different hardware and software platforms without change. Code can be reused easily, and entity beans simplify access to data residing in a database table. CMP (container-managed persistence) entity beans are one of the benefits of the J2EE programming model.

CMP beans are used to access data located in a relational database. A CMP entity bean defers all interaction with the database to the EJB container and exposes a set of methods that allow the data to be referenced or updated by other programs. Typically, the container reads the data from the database and places it into the fields in the CMP bean, where it can be referenced or updated. At the end of a transaction the container accesses the data in the bean and updates the underlying row in the table.

This method of programming provides a significant benefit, since access to data in a database is simple and usable by a number of different programs. In a simple implementation, however, data that has only been referenced is written back to the database unnecessarily. Storing data that has only been referenced during a transaction causes expensive database operations that yield no benefit. In addition to executing SQL to update the table, the database often has to upgrade a lock from read-only to exclusive. At best, this increases response time; at worst, it can lead to deadlocks.

Avoiding unnecessary store operations

There are a variety of ways in which a container provider can attempt to avoid unnecessary store operations. One approach is to keep a copy of the data read from the database and compare it to the data in the bean at the end of the transaction. If any modifications have been made, the updated data will be written back to the database. If there was no change to the data, a costly store operation can be avoided. This strategy, on average, is less expensive than the SQL call. However, it has some drawbacks. Additional objects are allocated to store the copies of the data and extra processing is required to determine whether a store is needed. The additional objects increase heap usage and place an additional load on the garbage collector. The end result is an increase in CPU utilization. However, this is still less expensive than storing unmodified data unnecessarily.
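
The copy-and-compare approach can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not how any particular container implements it: the AccountBean class, its fields, and the SnapshotTracker class are hypothetical names invented for the example, and the "store" is a counter rather than a real SQL call.

```java
import java.util.Arrays;

// Minimal sketch of copy-and-compare; all class and field names are hypothetical.
class SnapshotStoreDemo {

    /** Hypothetical bean with two container-managed fields. */
    static class AccountBean {
        public String owner;
        public long balanceCents;

        // Flatten the persistent fields so they can be copied and compared.
        Object[] persistentState() {
            return new Object[] { owner, balanceCents };
        }
    }

    /** Plays the container's role for one bean in one transaction. */
    static class SnapshotTracker {
        private final AccountBean bean;
        private final Object[] loaded;   // the extra objects this approach costs
        int storeCount = 0;

        SnapshotTracker(AccountBean bean) {
            this.bean = bean;
            this.loaded = bean.persistentState(); // copy taken right after load
        }

        // Called at commit: store only if some field differs from the snapshot.
        boolean commit() {
            boolean dirty = !Arrays.equals(loaded, bean.persistentState());
            if (dirty) {
                storeCount++; // a real container would issue the SQL UPDATE here
            }
            return dirty;
        }
    }

    public static void main(String[] args) {
        AccountBean bean = new AccountBean();
        bean.owner = "alice";
        bean.balanceCents = 1000L;

        SnapshotTracker readOnlyTx = new SnapshotTracker(bean);
        long seen = bean.balanceCents;             // reference only
        System.out.println("read-only tx stored: " + readOnlyTx.commit());

        SnapshotTracker updateTx = new SnapshotTracker(bean);
        bean.balanceCents = seen + 500L;           // modification
        System.out.println("update tx stored: " + updateTx.commit());
    }
}
```

The snapshot array and the comparison at commit correspond to the extra heap usage and CPU cost described above.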

Keeping track of methods that modify data and those that merely reference it is more efficient. If the methods executed during a transaction only refer to the data, the expensive store can be avoided. This approach is optimal because it avoids unnecessary store operations, additional objects are not allocated, and the CPU cost to compare the objects is completely avoided. This approach, however, also has some challenges.

How are the methods that only reference data differentiated from those that update it? A programmer could look at the code and make the determination manually (WebSphere Application Server currently supports this method). The programmer can set the appropriate flag using the Application Assembly Tool. Unfortunately, for complex objects with many methods, the probability increases that the programmer will mark a method with an access intent of "update" when it really only refers to a field, causing an unnecessary update. Or worse, the programmer might miss an update and mark the method as read-only, causing data updates to be lost at the end of a transaction. Fortunately, there is another way.
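
The method-tracking approach, and the risk of a mismarked method, can be illustrated with a small simulation. This is a sketch under assumptions: CounterBean, IntentTracker, and the rule that an undeclared method counts as an update are all hypothetical choices made for the example, and the database row is just a local variable.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of per-method access intents; all names are hypothetical.
class AccessIntentDemo {

    enum Intent { READ, UPDATE }

    /** Hypothetical bean with one container-managed field. */
    static class CounterBean {
        public int count;
        int getCount() { return count; }      // only references the field
        void increment() { count++; }         // modifies the field
    }

    /** Plays the container's role: decides at commit whether to store. */
    static class IntentTracker {
        private final Map<String, Intent> declared = new HashMap<>();
        private boolean updateSeen = false;

        void declare(String method, Intent intent) { declared.put(method, intent); }

        // Record a call; an undeclared method is conservatively treated as UPDATE.
        void invoked(String method) {
            if (declared.getOrDefault(method, Intent.UPDATE) == Intent.UPDATE) {
                updateSeen = true;
            }
        }

        boolean storeNeededAtCommit() { return updateSeen; }
    }

    // Runs one transaction that calls increment() under the given declaration
    // and returns the value that would end up in the simulated database row.
    static int persistedAfterIncrement(Intent declaredForIncrement) {
        int databaseRow = 10;
        CounterBean bean = new CounterBean();
        bean.count = databaseRow;             // load

        IntentTracker tracker = new IntentTracker();
        tracker.declare("getCount", Intent.READ);
        tracker.declare("increment", declaredForIncrement);

        bean.increment();
        tracker.invoked("increment");

        if (tracker.storeNeededAtCommit()) {
            databaseRow = bean.count;         // store
        }
        return databaseRow;
    }

    public static void main(String[] args) {
        System.out.println("correctly marked: " + persistedAfterIncrement(Intent.UPDATE));
        System.out.println("mismarked as read: " + persistedAfterIncrement(Intent.READ));
    }
}
```

No copies are kept and nothing is compared at commit, but the second call shows the cost of a wrong flag: the increment never reaches the simulated row.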


Introducing CmpOPT

IBM developed a utility, CmpOPT, that analyzes the Java bytecode for remote methods defined for CMP beans. CmpOPT loads CMP beans and any classes they reference and tracks the access to all fields defined as container-managed for the entity. In this way the actual access to any given field can be determined with great accuracy and the flags that determine whether a method has an access intent of "read" or "update" can be set appropriately. You can eliminate potential errors made by manual examination and achieve optimal performance for the CMP entity bean.

This utility is offered as a technology preview. Because it is a new technology it is still evolving (your feedback is appreciated). Also, since it is new, there are a few things to consider:

  • CmpOPT runs for a long time. This is because the process of analyzing multiple classes is time-consuming. It also requires significant amounts of JVM heap memory.
  • The default options for execution are quite conservative. When in doubt, CmpOPT assumes an update. Reviewing the analysis results may reveal that relaxing the analysis criteria yields better results.
  • Currently only EJB JAR files are processed. You can't process an entire EAR file, though it's a possible future enhancement.
  • The analysis is static. Changes to class files might invalidate previous results. If you make changes to the application, CmpOPT must be re-run against the target JAR file to update the results.
  • CmpOPT is currently supported on the Windows platform only.

Types of fields the analysis encompasses

CMP fields can be either Java primitive types (such as int, long, double, and so on) or objects. For primitive types, determining the access intent is fairly straightforward: the code in a method either refers to the variable or updates it. The other type of CMP field is a reference to an object. This is a little more complicated because either the reference itself could be updated, or primitive types or references within the object could be modified. When examining objects, both the reference to the object and references to elements within the object are tracked. Where a definitive determination of access intent can be made, it is used to update the appropriate flags. Where an exact access intent cannot be identified, a conservative intent of "update" is assumed.
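
The difference between the two kinds of field can be seen in a short example. It is only an illustration: CustomerBean, Address, and their methods are hypothetical names, not part of CmpOPT or of any sample application.

```java
// Minimal sketch of primitive versus object-valued CMP fields; names are hypothetical.
class ReferenceFieldDemo {

    /** Hypothetical value object held in a CMP field. */
    static class Address {
        public String city;
        Address(String city) { this.city = city; }
    }

    /** Hypothetical bean: one primitive CMP field, one object-valued CMP field. */
    static class CustomerBean {
        public int visits;
        public Address address;

        int getVisits() { return visits; }                 // reads a primitive
        void recordVisit() { visits++; }                   // writes a primitive
        void relocate(Address a) { address = a; }          // replaces the reference
        void correctCity(String c) { address.city = c; }   // writes inside the object
    }

    public static void main(String[] args) {
        CustomerBean bean = new CustomerBean();
        bean.address = new Address("Raleigh");
        Address before = bean.address;

        bean.correctCity("Durham");

        // The field still holds the same reference, yet the persistent state changed.
        System.out.println("same reference: " + (before == bean.address));
        System.out.println("city now: " + bean.address.city);
    }
}
```

In this sketch correctCity never assigns to the address field itself, which is why an analysis has to follow writes into the referenced object as well as writes to the reference.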

Reasons for not being able to positively identify access intents

Reasons for not being able to complete the analysis include use of reflection, calls to native methods, or missing classes.

  • The target method uses reflection. Although this may be analyzable, CmpOPT currently does not attempt it. Most application logic does not use reflection, so this should not be a significant concern for most programs.
  • Calls are made to native methods not previously identified as "safe." Since native methods are written in a language other than Java, CmpOPT can't analyze their access patterns, and assumes a conservative answer of "update." There is, however, a set of native methods that can safely be eliminated from consideration (those that perform some of the I/O operations or get the system time, for example). When a native method has been identified as safe it can be placed in CmpOPT's property file. CmpOPT accepts methods that appear in the property file as safe.
  • Classes are missing. A missing class cannot be analyzed, so again, a conservative answer is used.
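
To see why reflection is hard for a static analysis, consider the following sketch. The ItemBean class and the setByName helper are hypothetical; the reflection calls themselves (Class.getField and Field.setInt) are standard Java.

```java
import java.lang.reflect.Field;

// Minimal sketch of a reflective field write; bean and helper names are hypothetical.
class ReflectionDemo {

    /** Hypothetical bean with one container-managed field. */
    static class ItemBean {
        public int quantity = 1;
    }

    // The field name is only known at run time, so the bytecode of this method
    // contains no direct write to ItemBean.quantity.
    static void setByName(Object target, String fieldName, int value)
            throws ReflectiveOperationException {
        Field f = target.getClass().getField(fieldName);
        f.setInt(target, value);
    }

    public static void main(String[] args) throws Exception {
        ItemBean bean = new ItemBean();
        // The name may come from the command line, a file, or any other input.
        setByName(bean, args.length > 0 ? args[0] : "quantity", 42);
        System.out.println(bean.quantity);
    }
}
```

Which field is written depends on a string value, not on anything visible in the method's instructions, so a conservative answer of "update" is the safe choice.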

Analysis results

There are three possible outcomes from the analysis of a remote method:

  • The method only references one or more CMP fields.
  • The method updates one or more CMP fields.
  • The access intent cannot be determined.

Analysis process

Here's the process CmpOPT uses to analyze a JAR:

  1. Validate that the supplied JAR is a valid EJB 1.1 JAR.
  2. Read the ejb-jar.xml deployment descriptor and build a list of CMP Entity beans to process.
  3. Iterate through the list and process each CMP Entity bean in turn.
  4. For each bean,
    1. Read the list of CMP fields.
    2. Read the list of Remote Methods.
    3. Iterate through each of the methods and analyze the bytecode to determine access intent for CMP fields.
    4. Optionally (depending on the options you select) update the access intent metadata for the method.
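
The per-bean part of this process (step 4) can be sketched as follows. This is a deliberately simplified model and not CmpOPT's implementation: instead of reading real bytecode, each method body is represented as a hand-written list of field reads, field writes, and native calls. The Insn class, the method and field names, and the Legacy.send native method are all hypothetical. The sketch also does not follow calls into other classes, which the real tool does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the analysis loop; instruction lists stand in for bytecode.
class AnalysisSketch {

    enum Intent { READ, UPDATE, UNKNOWN }
    enum Op { GETFIELD, PUTFIELD, INVOKE_NATIVE }

    /** One simplified instruction: an operation plus the field or method it names. */
    static class Insn {
        final Op op;
        final String name;
        Insn(Op op, String name) { this.op = op; this.name = name; }
    }

    // Step 4.3: scan one method body and classify its access to the CMP fields.
    static Intent classify(List<Insn> body, Set<String> cmpFields, Set<String> safeNatives) {
        Intent result = Intent.READ;
        for (Insn insn : body) {
            if (insn.op == Op.PUTFIELD && cmpFields.contains(insn.name)) {
                return Intent.UPDATE;              // a proven write settles the question
            }
            if (insn.op == Op.INVOKE_NATIVE && !safeNatives.contains(insn.name)) {
                result = Intent.UNKNOWN;           // cannot see inside this call
            }
        }
        return result;
    }

    // Steps 4.1 to 4.3 for one bean: classify every remote method.
    static Map<String, Intent> analyzeBean(Set<String> cmpFields,
                                           Map<String, List<Insn>> remoteMethods,
                                           Set<String> safeNatives) {
        Map<String, Intent> report = new LinkedHashMap<>();
        for (Map.Entry<String, List<Insn>> m : remoteMethods.entrySet()) {
            report.put(m.getKey(), classify(m.getValue(), cmpFields, safeNatives));
        }
        return report;
    }

    public static void main(String[] args) {
        Set<String> cmpFields = new HashSet<>(Arrays.asList("balance", "owner"));
        Set<String> safeNatives = new HashSet<>(Arrays.asList("System.currentTimeMillis"));

        Map<String, List<Insn>> methods = new LinkedHashMap<>();
        methods.put("getBalance", Arrays.asList(new Insn(Op.GETFIELD, "balance")));
        methods.put("deposit", Arrays.asList(
                new Insn(Op.GETFIELD, "balance"), new Insn(Op.PUTFIELD, "balance")));
        methods.put("audit", Arrays.asList(
                new Insn(Op.GETFIELD, "owner"), new Insn(Op.INVOKE_NATIVE, "Legacy.send")));

        System.out.println(analyzeBean(cmpFields, methods, safeNatives));
    }
}
```

The three values of Intent mirror the three possible outcomes listed earlier. A caller that writes metadata back to the JAR (step 4.4) would treat UNKNOWN as "update" to stay conservative.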

To use the tool, invoke the CmpOPT command line utility, specifying the directory and JAR to be analyzed. CmpOPT examines the JAR to confirm it is a valid EJB-JAR, according to the EJB 1.1 specification. It then opens the XML deployment descriptor file that defines the contents of the JAR and gets a list of beans. This list is used to iteratively process each bean. When CmpOPT identifies a CMP Entity bean, it loads the bean and analyzes the bean's bytecode. Depending on the command line options you provide, the JAR is automatically updated with the appropriate access information for the methods analyzed.

By default, CmpOPT provides a report of this analysis. The report lists the remote methods and signatures defined for the Entity bean and an overall status for each method (read-only or update). If a CMP field is updated, the report also gives the line number where the field was modified. If the access intent is unknown, the reason is stated.

Requirements

CmpOPT requires a machine running Windows NT, Windows 2000, or Windows XP; WebSphere Application Server, Version 4.0 or later; and enough memory to support a 384 MB Java virtual machine.

Invocation syntax

Information is updated in the following ways:

findByPrimaryKey
  • Read-only: If all remote methods simply refer to CMP data but do not make any modifications, the findByPrimaryKey method is marked as read.
  • Update: If any of the remote methods modify one or more CMP fields, the finder can optionally be specified as "update." This causes a FOR UPDATE clause to be appended to the SQL that is used to locate the data that populates the bean.
Remote bean methods (each method is marked based on its behavior)
  • Read-only: If a method merely refers to data, it is marked read-only.
  • Update: When a CMP field is updated during a remote method's invocation, the method is considered to have updated the bean. This flag indicates that the data should be persisted at the end of the transaction.
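
The rule for the finder can be sketched as a small function over the intents of the remote methods. This is an illustration only: the ACCOUNT table, its columns, and the finderSql helper are hypothetical, and a real container generates its own SQL. The sketch also always applies the "update" marking, whereas the text above describes it as optional.

```java
import java.util.Arrays;
import java.util.Collection;

// Minimal sketch of deriving the finder's intent; table and column names are hypothetical.
class FinderIntentDemo {

    enum Intent { READ, UPDATE }

    // The finder is read-only only when every remote method is read-only.
    static Intent finderIntent(Collection<Intent> remoteMethodIntents) {
        return remoteMethodIntents.contains(Intent.UPDATE) ? Intent.UPDATE : Intent.READ;
    }

    // Append FOR UPDATE when the finder is marked "update".
    static String finderSql(Intent intent) {
        String sql = "SELECT ID, BALANCE FROM ACCOUNT WHERE ID = ?";
        return intent == Intent.UPDATE ? sql + " FOR UPDATE" : sql;
    }

    public static void main(String[] args) {
        Intent readOnlyBean = finderIntent(Arrays.asList(Intent.READ, Intent.READ));
        Intent updatingBean = finderIntent(Arrays.asList(Intent.READ, Intent.UPDATE));

        System.out.println(finderSql(readOnlyBean));
        System.out.println(finderSql(updatingBean));
    }
}
```
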
-ignoreOpenFields
By default, CmpOPT is very conservative in its analysis and will choose "update" if there is any question about the possibility of an update to a CMP field. However, an object often has a field that is declared public and could be accessed by any object. Often these situations do not actually affect the CMP fields. Since the analysis cannot prove that access is read-only, update is assumed. When you specify this option, CmpOPT reports these observations as warnings but otherwise ignores them.
-ignoreNativeMethods
Many native methods are benign, since they do not relate to CMP fields. Again, if an EJB method ultimately drives native code, the analysis cannot be certain of the outcome and will default to "update" intent. Often, though, the native methods do something as simple as writing to a file (for example, logging) and do not modify a CMP field. When you specify this option, CmpOPT reports these observations as warnings but otherwise ignores them.
-noWarn
Suppresses warning messages

A sample run

We took the WebSphere benchmark sample and ran CmpOPT against it to show how to use the utility. You can see the output from that run here.

Questions and answers

Why do I need to have WebSphere installed?
CmpOPT uses classes provided by WebSphere to manipulate the JAR files, so you need to have WebSphere installed on the system where you are running CmpOPT.

What platforms can be used to run the CmpOPT tool?
Because this is a technology preview, we've limited the tool to the Windows platform.

Can I use CmpOPT on EJB 1.0 JARs?
No. The utility is specifically designed for EJB 1.1.

Can CmpOPT process an entire Enterprise Archive?
Not currently. The tool is currently limited to operating on a single EJB JAR file.

Downloading and installing CmpOPT

The code offered with this article is considered a technology preview and is offered as is with no warranty expressed or implied.

To make installation and execution simpler, install CmpOPT on the same drive as WebSphere Application Server. Simply unzip the download into a directory called CmpOPT (for example, C:\CmpOPT). All the files you need to run CmpOPT are in this directory. If WebSphere is installed on the same drive as CmpOPT and is in \WebSphere\AppServer, no additional modifications are necessary. Otherwise, edit the file cmpopt.bat and make the changes indicated in the batch file.


Acknowledgements

Matt would like to acknowledge the assistance of the following people with this project: Aaron Kershenbaum, Larry Koved, Bert Laonipon, and Marco Pistoia.



Download

Name          Size    Download method
cmpopt.zip            HTTP
