When there is a suspected memory leak in a Portal application, or in the Portal itself, I typically follow this process to collect the heapdumps necessary to properly debug the problem:
1. Configure the JVM to write a heapdump when it receives a signal. How to do this differs by JVM vendor and operating system, and instructions are easy to find on the Web.
2. Find a portlet's help JSP in the file system (under installedApps) and add a call to System.gc() somewhere in the JSP. This lets me "strongly suggest" to the JVM that it perform a GC whenever I bring up that portlet's Help screen in the portal. I say "strongly suggest" because exactly when the JVM performs the GC, and what kind of GC it is (full or partial), is up to the JVM.
3. Run load against the system long enough to warm up all the applications and reach a constant number of active users. Do not increase the number of users after this point.
4. Stop the test, leave Portal running, and let the system quiesce. Allow enough time for any remaining HTTP requests to complete and for idle HTTP sessions to expire (30 minutes, by default).
5. Click the Help link on that portlet to force a GC, then signal the JVM to take a heapdump. Yes, this requires a user session, but you could also put the portlet on a public page.
6. Restart the test without restarting Portal. Let it run long enough that the suspected leak has consumed enough additional memory for you to spot it when you analyze the heapdumps.
7. Repeat steps 4, 5 and 6.
8. Repeat steps 4 and 5.
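The help-JSP change above boils down to a single call to System.gc(). The standalone Java sketch below shows the same request with one addition that is not part of the original procedure: it reads heap occupancy from Runtime before and after the call, so a log line can show whether the JVM appears to have acted on the hint. The class and method names are hypothetical, and a drop in the "after" figure is suggestive rather than proof that a full GC ran.

```java
/**
 * Hypothetical helper illustrating the "force a GC from the help JSP" trick.
 * System.gc() is only a request: the JVM decides when, and what kind of, GC to run.
 */
public final class GcTrigger {

    private GcTrigger() {}

    /** Bytes currently occupied in the Java heap, as reported by Runtime. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Requests a garbage collection and returns {usedBefore, usedAfter} in bytes.
     * A lower usedAfter suggests the JVM honored the request, but it is not guaranteed.
     */
    public static long[] requestGc() {
        long before = usedHeapBytes();
        System.gc();
        long after = usedHeapBytes();
        return new long[] {before, after};
    }

    public static void main(String[] args) {
        long[] result = requestGc();
        System.out.printf("Heap used before GC request: %d KB, after: %d KB%n",
                result[0] / 1024, result[1] / 1024);
    }
}
```

Inside the JSP itself, the minimal change is a scriptlet such as `<% System.gc(); %>`; logging the before and after figures is optional.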
Now that you have three heapdumps spanning a relatively long period of time, you need to analyze them for possible leak suspects. I use HeapAnalyzer from alphaWorks (http://www.alphaworks.ibm.com/tech/heapanalyzer). Be warned, though: analyzing a heapdump taken from a heap of 1.5GB or more requires a LOT of memory on the system where you run HeapAnalyzer. I recommend running it on a 64-bit system, where you can give the tool itself a massive heap (7GB or more). The analysis will also take a long time.
Once the analysis finishes, the tool points out suspected memory leaks. Because the system was quiesced (no active requests or sessions) when each heapdump was taken, there should be a large disparity between the leak suspects and the other allocated "noise" in the heap, especially as you analyze the two older heapdumps.
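Why quiescing helps can be illustrated with a simple comparison across the three heapdumps. The Java sketch below is not how HeapAnalyzer works; it assumes you have already extracted, from each dump, a hypothetical map of class name to total bytes (oldest dump first), and it flags only the classes that grow between every consecutive pair of dumps. On a quiesced system, transient request and session objects should not grow steadily from dump to dump, so they tend to drop out, while a genuine leak keeps growing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration: flag classes whose total size grows across every
 * consecutive heapdump snapshot. Not HeapAnalyzer's actual algorithm.
 */
public final class SnapshotDiff {

    private SnapshotDiff() {}

    /** A class whose total size grew in every snapshot, plus its overall growth in bytes. */
    public static final class Suspect {
        private final String className;
        private final long growthBytes;

        Suspect(String className, long growthBytes) {
            this.className = className;
            this.growthBytes = growthBytes;
        }

        public String className() { return className; }
        public long growthBytes() { return growthBytes; }
    }

    /**
     * @param snapshots per-class total bytes for each heapdump, oldest first
     * @return classes that grew strictly between every pair of consecutive snapshots,
     *         largest overall growth first
     */
    public static List<Suspect> monotonicGrowth(List<Map<String, Long>> snapshots) {
        List<Suspect> suspects = new ArrayList<>();
        if (snapshots.size() < 2) {
            return suspects;
        }
        // For simplicity, only classes present in the oldest snapshot are considered.
        for (Map.Entry<String, Long> entry : snapshots.get(0).entrySet()) {
            String cls = entry.getKey();
            long first = entry.getValue();
            long previous = first;
            boolean growing = true;
            for (int i = 1; i < snapshots.size(); i++) {
                Long current = snapshots.get(i).get(cls);
                if (current == null || current <= previous) {
                    growing = false; // shrank, stayed flat, or disappeared: likely noise
                    break;
                }
                previous = current;
            }
            if (growing) {
                suspects.add(new Suspect(cls, previous - first));
            }
        }
        // Largest overall growth first.
        suspects.sort(Comparator.comparingLong(Suspect::growthBytes).reversed());
        return suspects;
    }
}
```

Requiring growth between every pair, rather than just first versus last, is a deliberately strict choice that trades some missed suspects for less noise.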
To tell whether you actually have a memory leak, rather than simply running out of memory because you are pushing too many requests through a single portal, look at the verboseGC output. The JVM heap fills up over time, sometimes quickly depending on traffic patterns, but once it reaches 90% capacity or so, the JVM should perform a full GC with compaction to defragment the heap and reclaim as much memory as possible. I call the heap occupancy it drops back to after that full GC the "low water mark". If, during a load test with a constant number of users, you see this low water mark creep upwards over time, you may have a memory leak. Ideally, it should return to about the same point each time.
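The "creeping low water mark" check can be made a little more objective by fitting a trend line. The Java sketch below assumes you have already read the heap occupancy after each full GC out of your verboseGC log (or a tool such as PMAT) into an array; it computes a least-squares slope per full-GC cycle. The class name, the tolerance parameter, and the sample values in main are hypothetical and made up for illustration.

```java
/**
 * Hypothetical helper: given heap occupancy measured right after each full GC
 * (the "low water marks"), estimate how fast they are creeping upward.
 */
public final class LowWaterMarkTrend {

    private LowWaterMarkTrend() {}

    /** Least-squares slope of the marks, in the input's unit (e.g. MB) per full-GC cycle. */
    public static double slopePerCycle(double[] lowWaterMarks) {
        int n = lowWaterMarks.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two full-GC samples");
        }
        double meanX = (n - 1) / 2.0; // x values are the cycle indices 0..n-1
        double meanY = 0;
        for (double y : lowWaterMarks) {
            meanY += y;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            num += dx * (lowWaterMarks[i] - meanY);
            den += dx * dx;
        }
        return num / den;
    }

    /** True if the marks rise faster than the chosen tolerance per cycle (tolerance is your call). */
    public static boolean possibleLeak(double[] lowWaterMarks, double tolerancePerCycle) {
        return slopePerCycle(lowWaterMarks) > tolerancePerCycle;
    }

    public static void main(String[] args) {
        double[] steady = {512, 518, 509, 515, 511};   // made-up MB values
        double[] creeping = {512, 540, 571, 598, 630}; // made-up MB values
        System.out.printf("steady slope=%.1f, creeping slope=%.1f%n",
                slopePerCycle(steady), slopePerCycle(creeping));
    }
}
```

With these made-up series the slope is about -0.5 MB per cycle for the steady run and about 29.4 MB per cycle for the creeping one. A regression is used instead of comparing only the first and last marks so that one unusually high or low full GC does not dominate the verdict.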
I have used alphaWorks' PMAT tool (http://www.alphaworks.ibm.com/tech/pmat) to chart the GC cycles graphically. The chart makes the pattern easy to see, so you can quickly tell whether the low water mark is creeping upwards over time.