Solving memory problems in WebSphere applications

Learn how to identify root causes for and solve memory problems in WebSphere® Commerce during system test.


Jing Sun (sunjing@cn.ibm.com), Software Engineer, IBM

Jing Sun is a Software Engineer at the IBM China Software Development Lab, Beijing, China. She works on IBM WebSphere Commerce system testing.



James Tang (mfjtang@ca.ibm.com), Software Developer, IBM

James Tang is a Software Developer at the IBM Toronto Lab, Ontario, Canada. He has been a lead developer of the cache component of WebSphere Commerce. He is currently working as an Advisory I/T Specialist for IBM Software Services for WebSphere.



August 2010 (First published 27 June 2007)

Introduction

In Web applications based on WebSphere Application Server, memory utilization can significantly impact system performance. One of the most common memory problems is a memory leak, which causes severe performance degradation. In theory, memory leaks should not happen in Java™ because it has Garbage Collection (GC). However, GC only cleans up unused objects that are no longer referenced. Therefore, if an object is no longer used but is still referenced, GC cannot remove it, which leads to memory leaks in the JVM. Besides memory leaks, other memory problems that you might encounter are memory fragmentation, large objects, and tuning problems. In many cases, these memory problems can cause the application server to crash. Typically, users first notice that application server performance gradually declines, and eventually the server crashes with OutOfMemory exceptions.
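To make the "referenced but no longer used" failure mode concrete, here is a minimal, hypothetical sketch. It is written in Python, whose garbage collector follows the same reachability rule as the JVM's: a long-lived registry, analogous to a static Map in Java, keeps references to objects the application no longer needs, so the collector can never reclaim them. All names here are illustrative.

```python
import gc

# A long-lived, application-scoped registry (analogous to a static Map
# in Java). Entries added here stay reachable until explicitly removed.
REGISTRY = {}

class Session:
    def __init__(self, session_id, payload):
        self.session_id = session_id
        self.payload = payload  # imagine a large buffer

def handle_request(session_id):
    # Bug: every request registers a Session that is never unregistered,
    # so the object remains referenced and GC cannot collect it.
    REGISTRY[session_id] = Session(session_id, bytearray(1024))

for i in range(1000):
    handle_request(i)

gc.collect()
# All 1000 sessions survive collection because REGISTRY still references them.
leaked = len(REGISTRY)
```

The fix in such cases is not a bigger heap but removing (or weakly referencing) entries once they are no longer needed.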

Memory problems are hard to troubleshoot because they have multiple causes. This article provides methods of identifying the root causes of different memory problems and their corresponding solutions. It also introduces a methodology for memory problem determination used in WebSphere Commerce V6 testing. WebSphere Commerce is one of the largest J2EE applications deployed on WebSphere Application Server. This methodology detects and solves memory leak problems in WebSphere Commerce during system test.

Method overview

Figure 1 shows the whole process for determining and solving memory problems. The diagram lists five kinds of solutions, which are further explained in the following sections:

  • Tuning the max heap
  • Tuning Xk/Xp
  • Identifying by swprofiler
  • Tuning the cache size
  • Performing the heap dump
Figure 1. Process diagram for memory analysis methodology

Setting verbose Garbage Collection to get WebSphere Application Server logs for memory monitoring

To analyze memory problems in the application server, the first step is to gather GC information, and you need a tool to analyze that information.

Setting verbose GC to get WebSphere Application Server logs for memory monitoring

To monitor the usage of JVM memory, get verbose garbage collection (GC) logs from WebSphere Application Server; that is, native_stdout.log or native_stderr.log under the WebSphere Application Server Installation dir/profiles/default/logs/server1 directory. Verbose GC is not enabled by default, but you can enable it as in the following WebSphere Application Server V6.0 example:

  1. Open the WebSphere Application Server administrative console by typing http://hostname:port/ibm/console. The port is the number of the HTTP administrative port, which is 9060 by default. Type an ID (any ID; no password is required) and log in.
  2. Select Servers > Application servers > server1 > Java and Process Management > Process Definition > Java Virtual Machine.
  3. Select Verbose garbage collection as shown in Figure 2.
  4. Click Apply and click Save at the top of this page.
  5. Restart WebSphere Application Server.
    Figure 2. Enable verbose GC in WebSphere Application Server V6

After restarting WebSphere Application Server, you see the verbose GC output in native_stdout.log or native_stderr.log.
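As a starting point for the analysis described in the next section, the verbose GC output can be mined with a small script. This is a sketch only: the exact line format varies between IBM JDK levels, so the sample lines and the regular expression below are illustrative and would need adjusting to your actual log.

```python
import re

# Simplified examples of IBM JDK verbose GC lines; the exact format
# varies by JDK level, so treat these samples and the regex as a template.
SAMPLE_LOG = """\
<GC(1): freed 5251040 bytes, 79% free (21354240/26869248), in 43 ms>
<GC(2): freed 4120576 bytes, 75% free (20152320/26869248), in 51 ms>
<GC(3): freed 3985152 bytes, 71% free (19077120/26869248), in 48 ms>
"""

GC_LINE = re.compile(
    r"<GC\((\d+)\): freed (\d+) bytes, (\d+)% free \((\d+)/(\d+)\), in (\d+) ms>"
)

def parse_gc_log(text):
    """Return one record per GC cycle: free space after GC and pause time."""
    records = []
    for match in GC_LINE.finditer(text):
        cycle, freed, pct, free_bytes, heap_bytes, pause_ms = match.groups()
        records.append({
            "cycle": int(cycle),
            "free_after_gc": int(free_bytes),   # drives "Free Space after GC"
            "heap_size": int(heap_bytes),
            "pause_ms": int(pause_ms),          # drives "Total GC pause time"
        })
    return records

records = parse_gc_log(SAMPLE_LOG)
```

The extracted records are exactly the series you plot in the charts listed below.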

Analyzing verbose GC

There are many tools for verbose GC log analysis, such as Tivoli® Performance Viewer, Dump JVM (DMPJVM), and WebSphere's Resource Analyzer. These tools can extract useful information and illustrate the trend of JVM heap usage over time.

After you analyze your native_stdout.log or native_stderr.log, you should generate charts with the following information:

  • Occupancy (MB)
  • Allocation Rate (KB/sec)
  • Total GC pause time (ms)
  • Mark and Sweep Time (ms)
  • Compact Time (ms)
  • GC Cycle length and distribution (ms)
  • Free Space after GC (MB)
  • Free Space before AF (allocation failure) (MB)
  • Size of Request that caused AF (Bytes)

Among these charts, some are helpful in monitoring the effects of GC and detecting many problems. You can use "GC Cycle length and distribution" to analyze GC frequency and distribution, "Free Space after GC" to analyze memory leak, and "Free Space before AF" and "Size of Request that caused AF" to analyze fragmentation or large objects. Other charts can also assist in the analysis.

Figures 3 and 4 show some examples of "Free Space after GC". In a normal "Free Space after GC" graph, where the application is using the Java™ heap properly, the red line should be approximately on a horizontal line, as in Figure 3. In Figure 4, the declining red line means that the free space available to allocate is decreasing. If you suspect a memory problem, continue running the test until an OutOfMemory exception occurs because some downward trends in free space will stabilize after a period of time. This helps you to get better support from WebSphere Application Server and the JDK if the problem is related to WebSphere Application Server or the JDK.

Figure 3. Example of normal "Free Space after GC" chart
Figure 4. Example of "Free Space after GC" chart with problem
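The "declining red line" check can be automated with a simple trend test: fit a least-squares slope to the "Free Space after GC" series and flag a sustained downward trend. The tolerance below is an illustrative assumption, not a recommended value.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def looks_like_leak(free_after_gc_mb, tolerance_mb_per_gc=-0.5):
    # Flag a sustained downward trend in "Free Space after GC".
    return slope(free_after_gc_mb) < tolerance_mb_per_gc

healthy = [410, 405, 412, 408, 411, 407]   # roughly flat, as in Figure 3
leaking = [410, 380, 355, 322, 290, 261]   # steadily declining, as in Figure 4

healthy_flag = looks_like_leak(healthy)
leaking_flag = looks_like_leak(leaking)
```

A flagged trend still needs confirmation by running the test longer, as described above, because some downward trends stabilize.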

Analyzing fragmentation

If there is a memory problem, but no reduction in free space after GC, check the charts for "Free Space Before AF" and "Size of Request that caused AF". AF means that an object needs heap space, but there is not enough contiguous space available in the JVM heap for it. Generally, AF occurs when the JVM heap is used up. However, AF also occurs if all the free space is fragmented, so that there is no contiguous space for this object. This problem is greatly magnified if there are large object allocations within the application because it becomes unlikely for the heap to have large contiguous space for these large objects. "Free Space before AF" means the size of free space when AF occurs. This space should be a small value because the heap size is nearly used up when AF occurs. Therefore, the red line is always near the bottom of the chart. Figure 5 shows normal usage without fragmentation problems.

Severe fragmentation causes frequent GC cycles, and thus performance degradation. Rising GC frequency is another indicator of fragmentation.

Figure 5. Normal "Free Space before AF" chart

Tuning the max heap size

If the free space after GC does not decline, check the GC cycle length and distribution, and the total GC pause time. If the time since the last AF in "GC cycle length and distribution" is not too small, but the completion time in "Total GC Pause Time" is high, GC is not very frequent, but each GC run takes a long time. Monitor the duration of each GC cycle; it should not exceed 10 seconds unless a compaction occurs within the cycle. In this situation, the heap is probably too large for the application and GC takes a long time to clean up objects in this large heap, so reduce the maximum heap size. In other cases, where GC frequency is too high, the heap is probably too small for the application and GC needs to run frequently, so increase the maximum heap size.
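The decision rule above can be summarized in a short sketch. The pause limit is the 10-second guideline from the text; the frequency cutoff is an assumed, illustrative threshold.

```python
def max_heap_advice(avg_gc_interval_s, avg_gc_pause_s,
                    pause_limit_s=10.0, min_interval_s=5.0):
    """Rule of thumb: very frequent GC suggests the heap is too small;
    infrequent GC with long pauses suggests it is too large.
    min_interval_s is an assumed cutoff, not an official value."""
    if avg_gc_interval_s <= min_interval_s:
        return "increase max heap"
    if avg_gc_pause_s > pause_limit_s:
        return "reduce max heap"
    return "heap size looks reasonable"

# Infrequent GC (once a minute) but 14-second pauses: heap likely too large.
advice_for_oversized = max_heap_advice(avg_gc_interval_s=60.0, avg_gc_pause_s=14.0)
# GC every 2 seconds: heap likely too small.
advice_for_undersized = max_heap_advice(avg_gc_interval_s=2.0, avg_gc_pause_s=1.0)
```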

Solution

To tune the JVM Max Heap size:

  1. Open the WebSphere Application Server administrative console, http://hostname:port/ibm/console, and log in.
  2. Expand Servers > Application servers > server1 > Java and Process Management > Process Definition > Java Virtual Machine.
  3. Change the Max Heap Size to a more appropriate value (larger or smaller, as indicated by the preceding analysis).
  4. Click Apply and click Save at the top of this page.
  5. Restart WebSphere Application Server.
  6. Try your test case again and see if the problem disappears.

Note that i5/OS® should have no maximum heap size according to the System i Tuning Guide because the allocation model is different from other platforms. After it is set to unlimited, it may take a couple of days to stabilize at 3 GB.

Tuning Xk/Xp

If the free space after GC does not decline, but the time since the last AF in the "GC cycle length and distribution" chart is always small, there might be some large objects or heap fragmentation. You can try to tune the Xk/Xp parameters to remove most of the fragmentation.

Solution

The solution is to group immovable objects together into pools so that they do not fragment the Java heap. In Java SDK 1.4.2, the GC allocates a kCluster as the first object at the bottom of the heap. A kCluster is an area of storage that is used exclusively for class blocks. The GC then allocates a pCluster as the second object on the heap. A pCluster is an area of storage that is used to allocate any pinned objects. When the default kCluster size is not sufficient to allocate all class blocks, you can use the -Xk and -Xp options to specify kCluster and pCluster sizes.

GC trace provides a guide for optimum Xk and Xp values in version 1.4.2. You can enable this trace by adding -verbosegc -Dibm.dg.trc.print=st_verify to the Generic JVM arguments field in the WebSphere Application Server administrative console. After you save the change and rerun the test case, you can find this line in native_stdout.log:

<GC(VFY-SUM): pinned=4265(classes=3955/freeclasses=0) 
dosed=10388 movable=1233792 free=5658>

The pinned and classes values indicate approximately the right size for the -Xk parameter. Adding 10% to the reported classes value (3955) is recommended.

The difference between pinned (=4265) and classes (=3955) provides a guide for the initial size of pCluster. You can specify the pCluster and pCluster overflow sizes using the -Xp command-line option:

-Xpiiii[K][,oooo[K]]

iiii specifies the size of the initial pCluster in KB, and oooo optionally specifies the size of overflow (subsequent) pClusters in KB. Default values are 16 KB for iiii and 2 KB for oooo.
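Given the VFY-SUM line shown above, starting values can be computed mechanically. This sketch applies the recommended 10% headroom to the classes value for -Xk and uses pinned minus classes as the guide for the initial pCluster size.

```python
import re

# The VFY-SUM line produced by -Dibm.dg.trc.print=st_verify (JDK 1.4.2),
# as shown in the log excerpt above.
VFY_LINE = ("<GC(VFY-SUM): pinned=4265(classes=3955/freeclasses=0) "
            "dosed=10388 movable=1233792 free=5658>")

def suggest_clusters(line):
    match = re.search(r"pinned=(\d+)\(classes=(\d+)", line)
    pinned, classes = int(match.group(1)), int(match.group(2))
    xk = int(classes * 1.1)          # classes value plus 10% headroom
    pcluster_guide = pinned - classes  # guide for the initial pCluster size
    return xk, pcluster_guide

xk, pcluster_guide = suggest_clusters(VFY_LINE)
```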

Figure 6 is an example of setting Xk and Xp, which specifies -Xk22000 -Xp64k,16k in Generic JVM arguments. If the problem persists, experiment with higher initial and overflow pCluster sizes. After this tuning, if fragmentation remains, you can suspect large object problems.

Figure 6. Tune Xp/Xk in WebSphere Application Server administrative console

Identifying by swprofiler

There are two situations in which you can suspect that unusually large objects exist. The first is if, after tuning the Xk and Xp parameters, there are still fragmentation problems. In this situation, the free space before AF has a large value and stays at a high level, as shown in Figure 7: the free space is larger than 500 MB even when AF occurs. In this case, suspect some unusually large objects.

Figure 7. Large free space when AF occurs

In the other situation, the free space after GC is declining, and the AF request size is not only large but also growing, as shown in Figure 8. In this case, you can also suspect large object problems.

Figure 8. Free space after GC declining with AF request space growing

Solution

To identify the large object problem, you need to use the swprofiler tool. This tool prints the stack information of an object allocation once you set the allocation limit and depth. To use this tool:

  1. Download the profiler.zip file, and unzip it. Note that later versions of the JDK include a built-in function that replaces swprofiler. For details, see Technote: How to identify the Java stack of a thread making an allocation request larger than a certain size.
  2. Get the proper lib file from the unzipped folder, and copy it to the bin path under the WebSphere Application Server installation folder. For example, copy this file to WAS_Home\java\jre\bin.
  3. For an AIX, Linux, Linux on zSeries, or Windows® platform, type -Xrunswprof in the Generic JVM arguments field in the WebSphere Application Server administrative console. If your platform is Solaris, this command is -Xrunallocprof.
  4. Restart WebSphere Application Server.

You see information about the swprofiler in native_stdout.log or native_stderr.log after restarting WebSphere Application Server, such as the following lines:

Swprofiler loaded OK
Allocation limit: XXXX, Depth: YYY

You can try to configure this allocation limit and depth. After that, when the JVM needs to allocate an object that requires space larger than the allocation limit, the tool records this allocation in the stderr log. In WebSphere Application Server V6, to set the limit and depth values:

  1. Open the WebSphere Application Server administrative console, http://hostname:port/ibm/console, and log in.
  2. Expand Servers > Application servers > server1 > Java and Process Management > Process Definition > Custom Properties.
  3. Click New and add two pairs of properties and values, for example:
    ALLOC_LIMIT 600000
    ALLOC_DEPTH 10
  4. Save your changes and restart the server.

After setting the allocation limit and depth (10 levels of thread stack in this example), you see the following information about the allocation stack printed in the stderr log:

<AF[591]: completed in 106 ms>
Large object allocated: size 22616428
at testalloc in class com/test/OOMTest 21616424
at service in class javax/servlet/http/HttpServlet

The final step is to locate the suspect large object by searching for "Large object" in native_stderr.log. If your problem is with growing objects, as illustrated in Figure 8, find and fix objects with increasing size. After fixing the large object problems, confirm that the new Free Space after GC, Free Space before AF, and AF Request Size graphs are normal. It is important to know that you cannot remove fragmentation entirely; you can only reduce it to a degree.
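Searching the log by hand works, but a small script can also group the reported allocations by site so that growing objects stand out. The sample log and the pattern below are simplified and modeled on the excerpt above; real swprofiler output may be laid out differently, so adjust the regex to what your JDK actually prints.

```python
import re
from collections import defaultdict

# Simplified swprofiler-style output, modeled on the excerpt above.
SAMPLE_STDERR = """\
Large object allocated: size 22616428
at testalloc in class com/test/OOMTest
Large object allocated: size 24818012
at testalloc in class com/test/OOMTest
Large object allocated: size 1200000
at render in class com/test/PageWriter
"""

ALLOC = re.compile(r"Large object allocated: size (\d+)\nat (\S+) in class (\S+)")

def large_allocations_by_site(text):
    """Group reported allocation sizes by allocation site; a site whose
    sizes keep growing matches the pattern shown in Figure 8."""
    sites = defaultdict(list)
    for size, method, cls in ALLOC.findall(text):
        sites[f"{cls}.{method}"].append(int(size))
    return dict(sites)

sites = large_allocations_by_site(SAMPLE_STDERR)
growing = [site for site, sizes in sites.items()
           if len(sizes) > 1 and sizes == sorted(sizes)]
```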

Figure 9. Free space before AF improves after removing suspected large objects

Tuning the cache size

If the free space after GC declines with no growing AF request size, then cache tuning is most likely causing the memory problems. Caching does not necessarily mean DynaCache; it includes any other types of cache in use. One solution is to reduce the in-memory cache size and let the overflow entries use the disk cache if possible.

Sometimes an excessive number of cached objects looks like a memory leak, since the cache grows as the application receives increasing load. Customers and testers need to find a stabilization point where the system does not generate OutOfMemory errors due to too many cached objects. Tune the cache size according to how much heap is being used. Figure 10 shows an example of an OutOfMemory exception. On further examination, you see that each cached page is 60-100 KB in size and the number of cache entries is set to 5000. Therefore, roughly half of this 1 GB heap is allocated to the cache!

Figure 10. Free space reaching zero because of increasing cache size
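The arithmetic behind this example is worth making explicit: a rough cache footprint estimate is simply the number of entries times the average entry size.

```python
def cache_heap_footprint_mb(entries, avg_entry_kb):
    """Rough in-memory footprint of a cache: entries * average entry size."""
    return entries * avg_entry_kb / 1024.0

# Numbers from the example above: 5000 entries of roughly 100 KB each.
footprint_mb = cache_heap_footprint_mb(5000, 100)
heap_mb = 1024  # a 1 GB heap
share = footprint_mb / heap_mb  # fraction of the heap consumed by the cache
```

With these figures, the cache consumes about 488 MB, close to half of the 1 GB heap, which matches the observation above.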

Solution

In most cases, you can adjust the DynaCache size to eliminate such problems. To tune the DynaCache size:

  1. Log in to the WebSphere Application Server administrative console.
  2. Expand Servers > Application servers > server1 > Container Services > Dynamic Cache Service > Cache size.
  3. Set the value of the cache size you want as shown in Figure 11.
  4. Save your changes and restart the application server.
Figure 11. Tuning the DynaCache size

Figure 12 shows the memory usage of the same test case after tuning the number of cache entries to 3000 and enabling disk offload to let the overflow entries use the disk cache. The application stabilized at 100 MB of free heap space after running for three times the duration of the first test, in which the OutOfMemory exception was encountered. In Figure 12, you can see that at the beginning the cache is warming up, causing a drop in free space. Towards the end, however, the free space stabilizes, which means the cache is now fully warmed.

Figure 12. Free space after reducing cache size

Performing the heap dump

If all the charts are normal except "Free Space after GC declining", suspect a memory leak. In this situation, performing a heap dump is recommended.

Solution

Some tools can help you perform this analysis. We used IBM® HeapDump, a utility shipped with the IBM JDK. It lets you dump all the live objects in the Java heap into a text file called a heapdump. The dump records the memory usage of every Java object, which is the first step in finding which objects are consuming JVM space.

Here is an example of setting up the heapdump in WebSphere Application Server V6.0. To configure IBM HeapDump, add the Name and Value pairs shown in Figure 13 in the WebSphere Application Server administrative console.

Figure 13. Name and Value pairs

To do this:

  1. In the administrative console, open Servers > Application Servers > server_name > Java and Process Management > Process Definition > Environment Entries > New. From there, you can set these name-value pairs. This setting is specific for WebSphere Application Server v6.0. If you do not set the IBM_HEAPDUMPDIR, the default output directory is the root directory of your application server.
  2. Save your modification and restart the application server. Record the JVM PID of your application server process.
  3. When you want to collect a heapdump, signal the JVM by running kill -3 JVM_PID. This generates files named heapdump.date.time.pid.txt and javacore.date.time.pid.txt under your IBM_HEAPDUMPDIR. Trigger several heapdumps after the memory leak has started; comparing successive dumps is important for analyzing the root cause of the leak.

The commands for generating the heapdump file are different on different operating systems:

  • On Solaris, use kill -HUP JVM_PID.
  • On most UNIX platforms, use kill -3 JVM_PID.
  • On a Windows® system, there is a sequence of commands that causes a heapdump:
    WAS_HOME\profiles\instance_name\bin\wsadmin.bat
    wsadmin>set jvm [$AdminControl completeObjectName  type=JVM,process=server1,*]
    wsadmin>$AdminControl invoke $jvm dumpThreads

With the generated heapdump and javacore files, you can analyze what caused this memory leak problem. Various analysis tools exist to help with this investigation. One useful tool is the HeapAnalyzer tool that you can download. This tool displays a list of leak candidates after analyzing the heap dump files.
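Tools such as HeapAnalyzer do this parsing for you, but the core aggregation they perform, totaling retained bytes per class to rank leak candidates, can be sketched as follows. The object listing here is simplified and purely illustrative; real IBM heapdump text files have their own format.

```python
from collections import Counter

# A simplified, illustrative object listing (address, size, class name);
# this only demonstrates the aggregation idea, not the real file format.
SAMPLE_OBJECTS = [
    ("0x10a0", 48, "java/util/HashMap$Entry"),
    ("0x10d0", 48, "java/util/HashMap$Entry"),
    ("0x1100", 8208, "char[]"),
    ("0x3110", 48, "java/util/HashMap$Entry"),
    ("0x3140", 8208, "char[]"),
    ("0x5150", 24, "java/lang/String"),
]

def leak_candidates(objects, top=2):
    """Total bytes per class, largest first; classes that dominate the
    heap are the first leak suspects."""
    totals = Counter()
    for _addr, size, cls in objects:
        totals[cls] += size
    return totals.most_common(top)

candidates = leak_candidates(SAMPLE_OBJECTS)
```

Comparing this ranking across two dumps taken at different times shows which classes are actually growing, not just which are large.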

Conclusion

This article described the process of problem determination for memory problems in a J2EE application based on WebSphere Application Server v6.0. This process helps system administrators, testers, and support staff to find and determine memory leak problems in a WebSphere Commerce v6.0 system. Initial problem determination from testers can help reduce the problem solving cycle, thereby reducing the time to find a solution.


Download

Sample code for this article: profiler.zip (12 KB)

