This article discusses performance tuning techniques and debugging methodologies for J2EE™ applications developed on DB2 Content Manager V8.x. Although most of the techniques and methodologies are derived from performance testing with DB2 Content Manager eClient 8.x, they apply to all J2EE applications on DB2 Content Manager.
The article first describes the high-level architecture of a J2EE application, introduces the concept of a queuing network, identifies the key components in that network, and lists the key tuning parameters for each component. We also cover the impact of JVM garbage collection and the related tuning parameters. Finally, the article identifies three types of performance goals and describes a tuning and debugging methodology for each.
As the figure below shows, an HTTP request is issued by a browser as a result of a user's action (clicking a link, a button, and so on). A Web server receives the request and forwards it to the Web server plug-in, which identifies the Web container that should serve the request. The Web container, upon receiving the request, assigns a Java thread to execute it and invokes the appropriate servlet. The servlet then invokes the appropriate DB2 Content Manager APIs, which return DDO/XDO objects. The DB2 Content Manager API internally uses a JDBC™ connection pool to communicate with a DB2 Content Manager server. The request is then forwarded to a JavaServer Page (JSP). The JSP generates a dynamic HTML page using the DDO/XDO objects. The DDO/XDO objects may or may not be cached by the application for future use.
J2EE application architecture
As described above, the request passes through various components during its processing. Together, these components form a queuing network. To optimize the performance of an application, you should attempt to minimize the wait time at each component.
WebSphere queuing network
Tuning parameters for each component
We do not give all the details on tuning parameters and suggested values for them. Instead, we focus on describing the impact of the parameters on the performance and scalability of an application. Please refer to the eClient 8.2 Tuning Guide for information on the suggested values of the parameters.
- KeepAlive: This parameter controls whether the connection to a browser is kept alive after the Web server serves a request. Keeping the connection alive eliminates the overhead of establishing a new connection, so subsequent requests from the browser are served faster. At the same time, it increases resource utilization on the Web server.
- MaxKeepAliveRequests: This parameter controls the maximum number of requests allowed on a single kept-alive connection. A higher value lets a browser reuse its connection for more requests, which improves response times. But, as mentioned earlier, it increases resource utilization on the Web server.
- MaxRequestsPerChild: This parameter controls the maximum number of requests a Web server process serves before it is restarted. This parameter is useful on platforms where Web server processes are known to leak memory (for example, Solaris).
- ThreadsPerChild: This parameter controls the maximum number of concurrent requests handled by a Web server. Once this limit is reached, new requests are rejected, which limits the scalability of the system.
- Thread pool size: This parameter controls the number of threads in a Web container. A high number of threads allows the Web container to serve more concurrent requests from the Web server and hence improves the performance of the system. But it also increases CPU usage and hence limits the scalability of the system.
- Connection backlog: When a thread pool is fully utilized, additional concurrent requests to the Web container are queued. This parameter controls the maximum number of requests a Web container can queue. Therefore, the thread pool size and the connection backlog together determine the maximum number of concurrent requests a Web container can serve (for example, with a thread pool size of 50 and a connection backlog of 500, a Web container can serve at most 550 concurrent requests). Make sure that the maximum number of concurrent requests served by the Web server never exceeds the maximum number that a Web container (or set of Web containers in a clustered environment) can serve.
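The capacity rule above is simple arithmetic, and it can be worth checking whenever a tuning parameter changes. The following is a minimal sketch in Python; the function names and the Web server limit of 600 are hypothetical values chosen for illustration, and the cluster case assumes that the plug-in can spread requests across all containers.

```python
def container_capacity(thread_pool_size, connection_backlog):
    """Maximum concurrent requests one Web container can hold:
    requests being executed plus requests waiting in the backlog."""
    return thread_pool_size + connection_backlog


def web_server_fits(web_server_max_requests, containers):
    """Return True if the Web server can never hand over more concurrent
    requests than the Web containers behind it can accept.

    containers is a list of (thread_pool_size, connection_backlog) pairs,
    one pair per Web container in the cluster.
    """
    total = sum(container_capacity(pool, backlog) for pool, backlog in containers)
    return web_server_max_requests <= total


# The example from the text: a pool of 50 threads and a backlog of 500.
print(container_capacity(50, 500))                    # 550
# Assumed Web server limit of 600 against a single container: too high.
print(web_server_fits(600, [(50, 500)]))              # False
# The same limit against a two-container cluster is within capacity.
print(web_server_fits(600, [(50, 500), (50, 500)]))   # True
```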
- Min/Max size: These parameters control the minimum and maximum number of JDBC connections in a connection pool. To avoid the overhead of creating and destroying connections in the pool, set both parameters to the same value. A high number of connections in the pool increases resource consumption (memory, network I/O, and so on) on both the mid-tier and back-end servers, and a low number of connections increases the response time of a request. Hence, the size of a connection pool should be optimized for the expected workload on the system. A rule of thumb is one connection for every 10 users.
- Statement cache size: This parameter controls the maximum number of prepared statements cached by a connection pool.
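As a starting point for the pool size, the rule of thumb of one connection per 10 users can be written down directly. This is only a sketch in Python: the function name is hypothetical, and the result is an initial value to refine by monitoring, not a recommended setting.

```python
import math


def pool_size_for_users(concurrent_users, users_per_connection=10):
    """Initial connection pool size from the one-per-10-users rule of thumb.

    Rounds up so that a partial group of users still gets a connection,
    and never returns less than one connection.
    """
    return max(1, math.ceil(concurrent_users / users_per_connection))


size = pool_size_for_users(400)
# Setting minimum and maximum to the same value avoids create/destroy overhead.
min_size = max_size = size
print(min_size, max_size)  # 40 40
```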
Impact of JVM garbage collection on performance and scalability
Garbage collection (GC) reclaims portions of the heap by removing objects that are no longer referenced by any other object in the heap. All threads in the JVM are suspended until GC is complete (except during a minor GC in the HotSpot JVM). GC is called a "stop-the-world" event because all activity in the system is blocked for the duration of the GC. Hence, the frequency and duration of GC have a great impact on both the performance and the scalability of J2EE applications.
The frequency and duration of GC depend on three factors:
- Java heap size: A smaller heap causes frequent GC; on the other hand, a larger heap causes each GC to run for a longer time.
- Number of short-lived objects: If an application creates too many short-lived objects, GC occurs frequently.
- Garbage collection algorithm: GC algorithms are vendor specific. More information about IBM's JVM and its GC algorithm can be obtained from the JVM Diagnostics Guide. More information about Sun's HotSpot JVM and its GC algorithm can be obtained from Tuning Garbage Collection.
Interpreting "verbosegc" for IBM's JVM
IBM's JVM 1.3.1 uses a mark-and-sweep garbage collection algorithm. GC occurs in three phases: mark, sweep, and, optionally, compact. In the mark phase, all the "live" objects are added to the mark vector. In the sweep phase, all the objects that have been allocated but are no longer referenced are identified. Once the garbage has been removed from the heap, GC can consider compacting the remaining objects to remove the spaces between them. Because compaction can take a long time, the GC tries to avoid it if possible; in practice, compaction is a rare event.
<AF [5 ]:Allocation Failure.need 32 bytes,286 ms since last AF>
<AF [5 ]:managing allocation failure,action=1 (0/6172496)(247968/248496)>
<GC(6):GC cycle started Tue Mar 19 08:24:46 2002
<GC(6):freed 1770544 bytes,31%free (2018512/6420992),in 25 ms>
<GC(6):mark:23 ms,sweep:2 ms,compact:0 ms>
<GC(6):refs:soft 1 (age >=4),weak 0,final 0,phantom 0>
<AF [5 ]:completed in 26 ms>
Above is a sample GC output for IBM's JVM. The first line shows that the thread needed 32 bytes and that 286 milliseconds have passed since the last allocation failure (that is, since the last GC occurred). An allocation failure happens when the heap, at its current size, cannot satisfy an allocation request. The allocation failure triggers GC to free up memory. The fourth line shows that the current heap size is roughly 6 MB, that roughly 1.7 MB was freed in this GC cycle, and that 31% of the heap is now free. The last line shows that the GC cycle completed in 26 milliseconds.
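For longer logs, the same figures can be pulled out with a short script. The sketch below, in Python, parses only the sample entry shown above; the regular expressions and the function name are our own, and the pause fraction assumes that the "since last AF" interval does not include the current pause, which the log itself does not state.

```python
import re

SAMPLE = """\
<AF [5 ]:Allocation Failure.need 32 bytes,286 ms since last AF>
<AF [5 ]:managing allocation failure,action=1 (0/6172496)(247968/248496)>
<GC(6):GC cycle started Tue Mar 19 08:24:46 2002
<GC(6):freed 1770544 bytes,31%free (2018512/6420992),in 25 ms>
<GC(6):mark:23 ms,sweep:2 ms,compact:0 ms>
<GC(6):refs:soft 1 (age >=4),weak 0,final 0,phantom 0>
<AF [5 ]:completed in 26 ms>
"""

SINCE_LAST = re.compile(r"need\s*(\d+)\s*bytes,\s*(\d+)\s*ms since last AF")
FREED = re.compile(r"freed\s*(\d+)\s*bytes,\s*(\d+)%\s*free\s*\((\d+)/(\d+)\)")
COMPLETED = re.compile(r"completed in\s*(\d+)\s*ms")


def summarize_af(text):
    """Extract the key figures from one allocation-failure entry."""
    since = SINCE_LAST.search(text)
    freed = FREED.search(text)
    done = COMPLETED.search(text)
    interval_ms = int(since.group(2))
    pause_ms = int(done.group(1))
    return {
        "needed_bytes": int(since.group(1)),
        "interval_ms": interval_ms,
        "freed_bytes": int(freed.group(1)),
        "free_bytes": int(freed.group(3)),
        "heap_bytes": int(freed.group(4)),
        "pause_ms": pause_ms,
        # Share of elapsed time spent paused (assumption: interval excludes pause).
        "pause_fraction": pause_ms / (interval_ms + pause_ms),
    }


summary = summarize_af(SAMPLE)
print(summary["heap_bytes"], summary["freed_bytes"], summary["pause_ms"])
print(round(summary["pause_fraction"], 3))  # 0.083
```

A pause fraction that grows over time is a hint that GC frequency or duration is becoming a problem for the workload.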
Interpreting "verbosegc" for Sun's JVM
Sun's JVM 1.3.1 uses a generational garbage collection algorithm. This algorithm assumes that the majority of objects in an application die "young." Hence, memory is managed in generations: memory pools holding objects of different ages. Garbage collection occurs in each generation when it fills up. Objects are allocated in Eden, and because of infant mortality, most objects "die" there. When Eden fills up, it causes a minor collection, in which some surviving objects are moved to an older generation. When older generations need to be collected, there is a major collection, which is often much slower because it involves all living objects. The longer an object survives, the more collections it endures and the slower GC becomes. By arranging for most objects to survive less than one collection, garbage collection can be very efficient.
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
The first two lines above show minor GCs. Each took roughly 230 to 245 milliseconds and freed roughly 237 MB of memory (for example, 325407K - 83000K = 242407K). The third line shows a full, or major, GC. It took 1.8 seconds and freed roughly 180 MB (267628K - 83769K = 183859K). Major GCs are therefore very expensive and should be avoided as much as possible.
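The memory freed by each collection is the difference between the "before" and "after" figures in an entry. As a sketch, the following Python script computes it for the three sample lines; the regular expression covers only this simple log format, and the function name is hypothetical.

```python
import re

SAMPLE = """\
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
"""

LINE = re.compile(r"\[(Full GC|GC) (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def parse_gc_log(text):
    """Return one record per GC entry: kind, KB freed, heap size, pause."""
    events = []
    for kind, before, after, total, secs in LINE.findall(text):
        events.append({
            "kind": "major" if kind == "Full GC" else "minor",
            "freed_kb": int(before) - int(after),
            "heap_kb": int(total),
            "pause_s": float(secs),
        })
    return events


events = parse_gc_log(SAMPLE)
for event in events:
    print(event["kind"], event["freed_kb"], "KB freed in", event["pause_s"], "s")
# minor 242407 KB freed in 0.2300771 s
# minor 242444 KB freed in 0.2454258 s
# major 183859 KB freed in 1.8479984 s
```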
Our experiments show that, for heap sizes larger than 512 MB, a major GC takes more than five seconds, which destabilizes the system. We therefore concluded that, on the Sun platform, a heap size larger than 512 MB results in performance degradation. Further information about JVM tuning parameters can be obtained from the eClient Tuning Guide.
Understanding performance goals
Performance goals can be divided into three major categories:
- Response-time goals: Goals are set for an "acceptable" response time for a single operation. For example, the response time for a simple parametric search with 100 hits should be <= 1 second.
- Throughput goals: Given an "acceptable" response time for a set of operations, determine the maximum transaction throughput a system can achieve. A goal might be, for example, 25000 views/hr on a particular setup.
- Scalability goals: Given the maximum transaction throughput, determine how many concurrent users the system can serve. A goal might be, for example, 25000 views/hr with 400 concurrent users on a system.
Debugging methodology for single-user response time issues
Performance logs from the various components make it easy to isolate single-user response time problems. Collect the eClient (or your custom application), DB2 Content Manager API, LS (Library Server), and RM (Resource Manager) logs in "PERF" mode. Identify where the time is spent and do code path analysis if needed. It is important to make sure that time is not being spent on the network, especially when viewing and ingesting large documents.
Here is a scenario where TCP stack tuning improves the response time for view operations. Let's say that the eClient trace logs indicate that loadDoc() takes three seconds. The DB2 Content Manager API log, dklog.log, shows that the call to RM to retrieve the document takes 2.9 seconds. The RM log reveals that the request to retrieve the document was served within 0.2 seconds. One can easily conclude that most of the time is spent transferring data from the RM server to the mid-tier server. This is especially true when a Windows® mid-tier server communicates with an AIX® or Sun back-end server. In this case, you should tune the TCP stack on both the mid-tier and RM machines.
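The reasoning in this scenario is a subtraction of nested timings: each layer's log reports the elapsed time for the same operation, and the difference between two adjacent layers is the time spent between them. A minimal sketch in Python, with hypothetical function and key names:

```python
def time_breakdown(client_s, api_call_s, server_s):
    """Split an end-to-end time into per-layer shares.

    Each argument is the elapsed time that one layer's log reports for
    the same operation, outermost layer first.
    """
    return {
        "client": round(client_s - api_call_s, 3),     # spent inside eClient
        "transfer": round(api_call_s - server_s, 3),   # between API call and RM
        "server": round(server_s, 3),                  # spent inside RM
    }


# Figures from the scenario: loadDoc() 3 s, API call to RM 2.9 s, RM 0.2 s.
parts = time_breakdown(3.0, 2.9, 0.2)
slowest = max(parts, key=parts.get)
print(parts)    # {'client': 0.1, 'transfer': 2.7, 'server': 0.2}
print(slowest)  # transfer
```

Here 2.7 of the 3 seconds fall between the API call and the RM server, which is what points to the network rather than to either endpoint.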
Debugging methodology for throughput issues
Throughput issues are generally caused by increased wait time during request processing due to contention for resources (CPU, memory, disk, thread pool, connection pool, and so on). For example, if the thread pool size in a Web container is set to 50, the container can handle only 50 concurrent requests. The 51st request is queued until one of the 50 requests finishes. Hence, to achieve the maximum possible throughput, all components and resources in the queuing network must be tuned to reduce (or, in many cases, eliminate) the wait time. WebSphere® Resource Analyzer (Tivoli® Performance Viewer in WebSphere 5.x) is an excellent tool for monitoring critical resources (thread pool, connection pool, JVM, Web container, and so on) and for identifying bottlenecks in the J2EE environment.
Probing a thread pool shows the average number of active threads, the number of threads created and destroyed, the average wait time for a request, and so on. The average number of active threads is a gauge of the number of concurrent requests reaching the container. If the average number of active threads is constantly close to the maximum size of the pool, the pool is not large enough to serve the workload. Consider increasing the size of the pool only if the CPU is not fully utilized. If the number of active threads fluctuates sharply or is very low for the expected workload, the requests are not reaching the Web container in an orderly fashion. This could be because they are queued at a router or switch on the network due to congestion, or because the Web server plug-in is not forwarding requests to the Web container.
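The first two rules above can be expressed as a small decision function. This is a sketch in Python only: the function name is hypothetical, and the 90% cut-offs for "close to maximum" and "fully utilized" are our assumptions, not values from the tuning guide.

```python
def thread_pool_advice(avg_active, max_size, cpu_utilization,
                       near_full=0.9, cpu_busy=0.9):
    """Suggest an action from thread pool and CPU readings.

    avg_active      average number of active threads observed
    max_size        configured maximum size of the thread pool
    cpu_utilization CPU usage as a fraction between 0 and 1
    near_full, cpu_busy  assumed cut-offs (not from the article)
    """
    if avg_active >= near_full * max_size:
        if cpu_utilization < cpu_busy:
            return "pool too small: increase thread pool size"
        return "pool saturated but CPU is busy: do not grow the pool"
    return "pool size is adequate for this workload"


print(thread_pool_advice(48, 50, 0.60))
print(thread_pool_advice(48, 50, 0.95))
print(thread_pool_advice(20, 50, 0.60))
```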
Probing a Web container shows the average response time for a JSP or servlet, the number of concurrent requests to a JSP or servlet, and so on. If the response time of a JSP or servlet increases with the number of concurrent requests, it is a clear indication of resource contention. Do code path analysis to see if there is a synchronized block in the code that prevents requests from executing concurrently. In some cases, it is useful to dump the threads to see exactly which method they are blocked on. The DrAdmin tool supplied with WebSphere can be used to take a thread dump (use the command DrAdmin -serverPort <serverPort> -dumpThreads).
Probing a connection pool shows the number of connections created, the number of connections destroyed, the number of connections taken out of the pool and returned to it, and so on. Observe the average wait time for a connection and the percentage of the pool in use to find out whether threads are waiting to get a connection from the pool. If they are, increase the pool size. Also, observe the average number of connections created and destroyed. Increase the idle timeout if too many connections are being destroyed and re-created.
Probing a JVM shows total memory used, percentage of free memory, percentage of used memory, and so on. Observe the percentage of free memory to identify memory leaks. Memory leaks cause frequent GCs and result in high CPU usage and low throughput.
If probing the above resources does not identify a bottleneck (that is, resource contention), observe the system resources (network I/O, disk I/O, paging, and so on). Lastly, monitor CPU usage as the workload increases. CPU usage should grow linearly with the workload. If the CPU is maxed out before the throughput goal is reached, try reducing the thread pool size. If CPU usage goes down and the response time is still within the "acceptable" limit, the goal is achieved. If the system cannot be tuned any further, it has hit the limit on the maximum throughput it can achieve. If the CPU is not maxed out but is spiking, look at GC behavior. If GC cycles coincide with the CPU spikes, garbage collection is using too much CPU. Apply JVM tuning techniques to reduce the frequency and duration of GC.
Performance debugging methodology for scalability issues
Here is a problem description: 10 users with "no think time" achieve the throughput goal, but 400 users with "practical think time" do not achieve the same throughput. This type of issue is mainly caused by resource contention that grows with the number of users (for example, shared memory usage grows as the number of users increases). All the techniques described above can be used to identify the bottleneck. In this case, pay attention to resource utilization as the number of users grows.
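It helps to know how much time each user population can afford per request before starting to look for the bottleneck. One standard way to estimate this is the interactive response time law from queuing theory, X = N / (R + Z), where N is the number of users, R the response time, and Z the think time. The article does not use this law itself, so treat the following Python sketch as our assumption, with one view counted as one request and the throughput goal of 25000 views/hr taken from the example goals above.

```python
def throughput_per_hour(users, response_s, think_s=0.0):
    """Interactive response time law: X = N / (R + Z), scaled to one hour."""
    return users / (response_s + think_s) * 3600


def max_cycle_time_s(users, target_per_hour):
    """Largest response time plus think time (R + Z), in seconds, that
    still lets the given number of users reach the target throughput."""
    return users * 3600 / target_per_hour


# 10 users with no think time: each request must finish within this time.
print(round(max_cycle_time_s(10, 25000), 2))    # 1.44
# 400 users: response time plus think time must stay within this time.
print(round(max_cycle_time_s(400, 25000), 2))   # 57.6
# Check: 400 users, 1.44 s response, 56.16 s think time, in views per hour.
print(round(throughput_per_hour(400, 1.44, 56.16)))  # 25000
```

If the measured think time is close to the budget and the goal is still missed, the response time itself has grown with the number of users, which is the contention the methodology above is meant to find.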
Performance tuning and debugging of a J2EE application requires a deep understanding not only of the J2EE architecture but also of Web servers and of the memory allocation models of both the JVM and the operating system. This article built the necessary background by first describing how an HTTP request to a J2EE application is processed by various modules that form a queuing network, along with the parameters used to tune each module. Next, it discussed the garbage collection algorithms of the Sun and IBM JVMs and their impact on the performance and scalability of the application. Finally, it divided typical performance and scalability issues with a J2EE application into three categories and described a systematic debugging methodology for each.
- Download the Content Manager V8.2 Performance Tuning Guide.
- Tuning Garbage Collection
- JVM Diagnostics Guide




