The Memory Analyzer Tool (MAT) is the recommended tool for analyzing heapdumps and system dumps, whether to find the cause of an OutOfMemoryError (OOM) or memory leak, to size a JVM, or for other types of debugging. The tool is available for free with the IBM Support Assistant Workbench (ISA) or it can be downloaded from eclipse.org and extended with updates to support IBM dump types.
One common memory issue is a Java memory leak. This is not like a traditional memory leak in C/C++ where memory is mis-managed, but rather that the application... [More]
Information Centers are one place where IBM publishes official product documentation. A list of all information centers can be found here: http://www.ibm.com/support/publications/us/library/ . The WAS InfoCenters are: WAS 8: http://publib.boulder.ibm.com/infocenter/wasinfo/v8r0/index.jsp WAS 7: http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp WAS 6.1: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp WAS 6.0: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp Each WAS InfoCenter is split up by product... [More]
If you often have to create new application servers, you should consider application server templates: http://www14.software.ibm.com/webapp/wsbroker/redirect?version=compass&product=was-nd-mp&topic=trun_create_templates The process is simple: create one application server, add your customizations (JVM arguments, data sources, etc.), then click Templates... > New, and select that server. Starting in WAS 7, you can even edit the template after it has been created. Then, when you create a new application server, either through the... [More]
I was recently at a customer who believed that they had a Java memory leak. They compared heapdumps and couldn't find anything. They had experienced production OutOfMemoryErrors (OOMs) before (for a different reason), and they were so worried about what they perceived, that they increased the maximum heap size to 4GB so that the JVM could handle a day's worth of work, and then they put in a process to restart the JVMs every night.
At a first glance of verbose garbage collection, I agreed with them (loaded in the wonderful Garbage... [More]
Garbage Collection and Memory Visualizer (GCMV) is a great tool to visualize verbosegc and it is available for free in the IBM Support Assistant. One of its lesser-known features is the ability to compare verbosegc from two different JVM runs. This is particularly useful if you changed something and you want to see the effect. First, load the baseline verbosegc as normal. Next, right click anywhere in the plot area and click 'Compare File...': Next, ensure that the X-axis uses a relative format such as hours, instead of date. Otherwise,... [More]
IBM Research recently made its WAIT tool (Whole-system Analysis of Idle Time) available to the public: https://wait.researchlabs.ibm.com/
The WAIT tool takes javacores and operating system statistics snapshots for a period of time as input, and produces a rich webpage as output that visualizes this data. Since WAIT only uses javacores and operating system scripts, you don't need to install anything or even restart the Java processes to use WAIT. For administrators familiar with the performance, hang, and high CPU MustGathers , the... [More]
The IBM Extensions for Memory Analyzer (IEMA) are a set of free plugins that extend the powerful Memory Analyzer Tool with product-specific knowledge when analyzing heapdumps and system dumps. For example, IEMA can list out all HTTP sessions, attributes such as their JSESSIONIDs and user names, and session attributes in a nice tree view for WAS dumps (note that much of the additional information is only available with system dumps on IBM JVMs). For more information on WAS-related extensions in IEMA, see... [More]
The IBM Java HealthCenter is a low overhead agent shipped with the JVM that can provide deep insight into JVM activity for IBM Java 5 and above: http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/ . HealthCenter is similar to VisualVM . HealthCenter can provide information on classloading, JVM arguments, garbage collection, file I/O, locking, native memory, threads, and profiling. The last item, profiling, is my favorite feature of HealthCenter -- it is a very low overhead, sampling profiler that can pinpoint high CPU issues or... [More]
Update : ActiveCount has been made a part of the PMI Basic collection (enabled by default) in WAS 188.8.131.52 and 184.108.40.206 .
WebSphere Application Server comes with the "Basic" level enabled in the Performance Monitoring Infrastructure (PMI) on every server by default. This usually has an overhead of about 2-3%. The Basic level comes with one statistic for all thread pools: PoolSize (see the "Level" column in the link). This is defined as "the average number of threads in pool." This is a bit confusing: it is... [More]
WebSphere Application Server comes with a built-in HTTP(S) server. This post covers different methods of printing the response times of HTTP(S) requests. If all you need are averages, then the built-in Performance Monitoring Infrastructure (PMI) provides average statistics for HTTP(S) response times. However, if you need information on particular requests, then averages may not help. The most robust solution is to use a monitoring product such as ITCAM. This article will cover the basic capabilities that are built into WAS. Method #0: Web... [More]
I'm not sure how this applies to modern computer chips and operating systems, but here is interesting research from Liedtke in 1995 and 1997 showing the overhead of system calls:
For measuring the system-call overhead, getpid, the shortest Linux system call, was examined. To measure its cost under ideal circumstances, it was repeatedly invoked in a tight loop. Table 2 shows the consumed cycles and the time per invocation derived from the cycle numbers. The numbers were obtained using the cycle counter register of the Pentium processor. Linux... [More]
Update: For WAS 8, see https://www.ibm.com/developerworks/mydeveloperworks/blogs/kevgrig/entry/using_visual_configuration_explorer_with_was_8
Visual Configuration Explorer (VCE) is a free tool available in the IBM Support Assistant . The major features I like about it are: 1) it can visualize parts of a WAS configuration such as servers, core groups, etc., 2) it can run the configuration against a database of common issues, and 3) it can compare parts of a configuration to each other (e.g. compare two nodes or servers), or compare two or... [More]
This has nothing to do with WebSphere, but I recently had some local fan failures and did some research: The maximum operating temperatures for an IDE or ATA spinning disk hard drive are 0 to +60°C (+32 to +140°F). Hard drives have internal temperature sensors (which computer programs can also read and report) and some may shut themselves off after hitting +65°C: Warnings are issued when the temperature exceeds the customer set threshold (or the default value 60°C). While this threshold can be disabled by the customer, the shutdown threshold,... [More]
The DynaCache MBean is a scriptable interface to interact with DynaCache caches at runtime: http://publib.boulder.ibm.com/infocenter/wasinfo/v8r0/index.jsp?topic=%2Fcom.ibm.websphere.javadoc.doc%2Fweb%2FmbeanDocs%2FDynaCache.html For example, to clear a DynaCache cache instance on a server at runtime, use the following Jython wsadmin code: AdminControl.invoke(AdminControl.completeObjectName("type=DynaCache,node=MYNODE,process=MYSERVER,*"), "clearCache", "MYCACHEINSTANCE") The documentation for clearCache says... [More]
Update: I discussed this issue with Nigel Griffiths (the 'n' in nmon) and he has posted information on mpstat -d which is much easier to use and gives basically the same (and in some senses more) information about process and memory affinity: https://www.ibm.com/developerworks/mydeveloperworks/blogs/aixpert/entry/mpstat_d_and_the_undocumented_stats133?lang=en . You should also read his multi-part series on local, near & far memory: https://www.ibm.com/developerworks/mydeveloperworks/blogs/aixpert/tags/entitlement?lang=en .
When using... [More]
The WAS Data Replication Service (DRS) provides a mechanism to share data across JVMs. For example, this can be used by DynaCache for servlet caching, or by applications in an object cache using the DistributedMap API. There are a lot of knobs including security, invalidations, disk offload, different mechanisms and topologies for sharing, dependencies, timeouts, ring buffers, tuning such as batching, and more. DRS works on top of the High Availability Manager and DCS.
The sharing mechanisms are: Both Push and Pull... [More]
Wireshark (formerly called Ethereal) is a great, free, open source tool to analyze network packet captures. To gather packet captures on various operating systems, see http://www-01.ibm.com/support/docview.wss?uid=swg21175744 . Wireshark supports decrypting some types of SSL traffic: http://wiki.wireshark.org/SSL . The following developerWorks article covers the basics of using this feature: http://www.ibm.com/developerworks/web/tutorials/wa-tomcat/section4.html . In this post, I'll cover some WAS specific tips.
By default, WAS uses... [More]
Wireshark supports automation through a scripting language called Lua. I've created a simple Lua script which extracts all HTTP requests and responses to a file. For example:
["Dec 8, 2011 10:33:30.267266000 PST" src:127.0.0.1:37204 dst:127.0.0.1:9094 stream:0] HTTP Request: POST localhost:9094/ibm/console/j_security_check
["Dec 8, 2011 10:33:30.292939000 PST" src:127.0.0.1:9094 dst:127.0.0.1:37204 stream:0] HTTP Response: 302 Found
Here's how to run the script:
tshark -r $FILE.pcap -X lua_script:printhttp.lua... [More]
The IBM HTTP Server (IHS) ships with a cool little command line utility called Apache Bench (ab) because IHS is based on httpd. At its simplest, you pass the number of requests you want to send (-n), at what concurrency (-c) and the URL to benchmark. ab will return various statistics on the responses (mean, median, max, standard deviation, etc.). This is really useful when you want to "spot check" backend server performance or compare two different environments, because you do not need to install complex load testing software, and... [More]
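As a rough illustration of the simplest invocation described above, here is a small Python sketch that assembles an ab command line from the number of requests (-n), the concurrency (-c), and the URL. The helper name and the localhost URL are hypothetical, and the sketch only builds the command rather than running it.

```python
import shlex


def build_ab_command(url, requests, concurrency):
    """Build the argument list for a basic Apache Bench (ab) run.

    url, requests and concurrency map to the URL argument, -n and -c.
    """
    if requests < 1 or concurrency < 1:
        raise ValueError("requests and concurrency must be positive")
    if concurrency > requests:
        # ab does not accept a concurrency level above the total request count.
        raise ValueError("concurrency cannot exceed the number of requests")
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


# Hypothetical target: a server listening on localhost port 9080.
command = build_ab_command("http://localhost:9080/", requests=100, concurrency=10)
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be handed to subprocess.run(command) on a machine where ab is on the PATH.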
The Memory Analyzer Tool (MAT) creates index files after parsing a dump the first time so that reloading a dump is much faster. These index files are saved to the same directory as the dump file. In fact, you can load a dump on a beefier machine and then copy over the index files to your slower machine and get the benefits of loading a dump faster.
One issue I recently hit was that every time I would reload a dump, I would get the message "Reparsing heap dump file due to out of date index file," and the dump would be completely reloaded.... [More]
Update (May 30, 2013): Note that it is preferable to use IBM Java's capabilities to produce system dumps instead of gcore. See why.
The Linux gcore command (also part of gdb) is a very useful command to create a coredump of a running process without killing the process. This is particularly useful starting with WAS 220.127.116.11 and 18.104.22.168 as a core dump can be read directly by the Memory Analyzer Tool, without running jextract, so that core dumps become as easy to use as PHD heapdumps, while providing much more information.
In a previous post, I covered some of the ways to log per-HTTP response time information in WAS. Good news -- a new and more standard approach has been added in WAS 22.214.171.124 through PM46717. WAS has had NCSA logging -- which is most commonly known through the LogFormat that httpd/IHS uses by default in its access/error logs -- but the format in WAS was not customizable, and did not have response times. WAS 126.96.36.199 introduces a customizable format which includes %D: "The elapsed time of the request. Millisecond accuracy, microsecond... [More]
Recent versions of both the IBM and Oracle JVMs support late attach functionality. This allows you to "inject" either a native JVMTI or Java agent into a running JVM which can run arbitrary code without having to restart the JVM. For IBM Java, the versions are Java 5 >= SR10 (disabled by default) [WAS >= 188.8.131.52], Java 6 >= SR6 (enabled by default on non-z/OS platforms) [WAS >= 184.108.40.206], Java 6 R26 [WAS 8], and Java 7.
If you're in a really sticky situation where you can't restart a server, but you need to do something... [More]
Someone asked for a quick reference on native eye catchers, and I had a hard time finding a comprehensive reference through the search engines, so here's a quick rundown.
Eye-catchers are generally used to aid in tracking down native memory leaks or native OutOfMemoryErrors. After you've checked all the obvious culprits, at some point you may have to manually page through a hexdump. An eye-catcher, as its name suggests, is some sequence of bytes that has a low probability of randomly appearing in memory. If you see one of your eye-catchers,... [More]
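To make the idea concrete, here is a minimal Python sketch that scans a raw memory image for an eye-catcher and reports the offsets where it appears. The byte sequence and the function name are made up for illustration; a real eye-catcher is whatever sequence your native code writes in front of its allocations.

```python
def find_eye_catcher(data, eye_catcher):
    """Return every offset at which eye_catcher occurs in data."""
    if not eye_catcher:
        raise ValueError("eye_catcher must not be empty")
    offsets = []
    pos = data.find(eye_catcher)
    while pos != -1:
        offsets.append(pos)
        # Resume one byte later so overlapping occurrences are also reported.
        pos = data.find(eye_catcher, pos + 1)
    return offsets


# Hypothetical eye-catcher: a byte sequence unlikely to occur by chance.
EYE_CATCHER = b"\xDE\xAD\xBE\xEF\xCA\xFE"

# Stand-in for a memory image; in practice this would be read from a dump file.
sample = b"\x00" * 16 + EYE_CATCHER + b"payload" + b"\x00" * 8 + EYE_CATCHER
print(find_eye_catcher(sample, EYE_CATCHER))
```

Many hits at regularly growing offsets across successive dumps would be a hint that the allocations tagged with that eye-catcher are the ones accumulating.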
Starting in WAS 220.127.116.11 running on IBM JVMs, the default artifacts produced on an OutOfMemoryError have changed. In addition to the old defaults (first four OOMs create a PHD, javacore, and snapdump), a system dump will be produced on the first OOM: http://publib.boulder.ibm.com/infocenter/wasinfo/v8r0/topic/com.ibm.websphere.base.doc/info/aes/ae/ctrb_java626.html .
Some customers have become used to system dumps meaning crashes, so don't be worried. System dumps are a richer and alternative artifact to understand OOMs using the Memory... [More]
I recently saw an interesting situation where it turned out that a network packet was truly lost in transmission (root cause yet to be determined, but probably in the operating system or some security software). This happened between IHS and WAS and it caused IHS to mark the WAS server down. The symptom in the logs was a connection reset error. There are two points I thought were interesting that I'll cover: 1) If the customer had only gathered network trace from the IHS side, they might have concluded the wrong thing, and 2) It may be... [More]
When building a custom Fedora Linux kernel, after you've created a kernel patch, added PatchN/ApplyPatch lines to kernel.spec, and run rpmbuild, you may get a prompt that asks 'File to patch:'. In my case, this was because I added the ApplyPatch line after the last ApplyPatch line in kernel.spec, but this turned out to be in the wrong directory. Instead, search for "# END OF PATCH APPLICATIONS" and add the ApplyPatch line before this comment (confusingly, the comment does not mark the last ApplyPatch line!).
This error can occur on Windows, particularly around socket operations. The error is translated from the Winsock error code WSAENOBUFS, 10055. The most common cause of this error is that Windows is configured for the default maximum of 5,000 in-use ports. This can be monitored by watching netstat or perfmon and can be changed with the MaxUserPort registry parameter.
A more advanced cause for this error is non-paged pool exhaustion. The paged and nonpaged pools are areas of memory for certain Windows kernel-mode allocations such as the... [More]
This is a simple wsadmin script that can start or stop binary PMI/TPV logs (of infinite duration and up to 200MB rolled over 20 files) across a cell. The two parameters required are -action (start, stop or list) and -userprefs with your user name (I couldn't find a way to query the user name that was passed in). With no other arguments, this will start PMI logging (at whatever level is configured, which by default is PMI Basic) on every application server in the cell. If you pass -node, it will only do it on the application servers on that... [More]
A recent customer on RedHat Enterprise Linux 6 (RHEL6) was running WAS 8, 64-bit. We noticed that the virtual size of the process was over 14GB (ps -p $PID -o vsz,rss). The maximum heap (-Xmx) was 5.5GB, so we were concerned there was a native memory leak. In IBM Java 626 (which ships with WAS 8), javacores have a NATIVEMEMINFO structure which tracks most JVM native allocations, including -Xmx itself, classes, classloaders, JIT, shared classes, and even some SDK native allocations like DirectByteBuffers. These are the most common things to leak... [More]
For platforms that run the IBM JVM, the strategic direction for memory analysis is to move away from PHDs and towards system dumps because they have so much more information. System dumps are the operating system core dumps usually produced with crashes (AIX/Linux=core, z/OS=SVCDUMP, Windows=minidump). In recent versions of the IBM JVM, a system dump can be loaded directly into the Memory Analyzer Tool without running jextract on it. The versions are: Java 5 >= SR12, Java 6 >= SR9, Java 626, and Java 7. The matching WAS versions are WAS >=... [More]
On the IBM JVM, the various environment variables used to change dump parameters (e.g. IBM_JAVACOREDIR, etc.) have been deprecated in favor of the -Xdump generic JVM command line argument.
Someone asked how to change the directory where all the dump artifacts go using -Xdump. Here it is for *nix, and just change /tmp/ in each argument to the desired directory (on Windows, use the Windows path syntax):
-Xdump:java:file=/tmp/javacore.%Y%m%d.%H%M%S.%pid.%seq.txt -Xdump:heap:file=/tmp/heapdump.%Y%m%d.%H%M%S.%pid.%seq.phd... [More]
In recent versions of IHS and httpd, the timestamp in the access log is the time the request arrived in IHS, not when the response completed. Originally, httpd used the time when the request completed. The code is in mod_log_config in the log_request_time function: httpd 1.3; httpd 2.0. This is the primary reason why you may see "out of order" timestamps in the access log. To figure out when the response completed, you'll need %D or %T in the LogFormat and add them to the timestamp. Here's a comment in the code that explains... [More]
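The arithmetic of adding %D to the timestamp can be sketched in a few lines of Python. This assumes the default httpd %t timestamp layout (day/month/year:time zone-offset, without the surrounding brackets) and a %D value in microseconds, as in httpd; the sample values and the function name are invented.

```python
from datetime import datetime, timedelta


def completion_time(timestamp, elapsed_micros):
    """Approximate when the response completed.

    timestamp is the access log time of request arrival, for example
    "08/Dec/2011:10:33:30 -0800"; elapsed_micros is the %D value.
    Month names are parsed with the current locale (English assumed here).
    """
    arrived = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S %z")
    return arrived + timedelta(microseconds=elapsed_micros)


# Invented sample: a request that arrived at 10:33:30 and took 2.5 seconds.
done = completion_time("08/Dec/2011:10:33:30 -0800", 2500000)
print(done.isoformat())
```

If the LogFormat has %T instead, the same approach applies with the value treated as whole seconds, at the cost of sub-second precision.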
I couldn't find a better way to select attributes of a static class instance. Using classof() doesn't help because there could be zero instances of a class. The following example checks if the JVM is a z/OS control region. The trailing space character within the double quotes is important for accuracy.
SELECT c.controller FROM INSTANCEOF java.lang.Class c WHERE c.@displayName.contains("class com.ibm.ws.management.util.PlatformHelperImpl ")
WAS exposes a JVM MBean for each process that has methods to create thread dumps, heap dumps, and system dumps. For example, to produce a thread dump on server1, use this wsadmin command (-lang jython):
The dumpThreads functionality is different depending on the operating system:
POSIX (AIX, Linux, Solaris, etc.): kill(pid, SIGQUIT)
z/OS: In recent versions, produces a javacore,... [More]
I'll be presenting a WebSphere Technical Exchange on May 1st @ 11AM Eastern . The topic will be a deep dive on IBM Java Health Center, primarily around its low-overhead, production-ready profiling capabilities to understand CPU issues on IBM JVMs. The slides are already available here: http://www-01.ibm.com/support/docview.wss?uid=swg27024833&aid=1 .
Whether you're using the older listener ports or the newer activation specifications, tuning the relevant thread pools is a key aspect of MDB performance. Thread pools are configured at a server level, while the number and concurrency of the MDBs are configured independently. Therefore, if your maximum thread pool size is too small, messages may queue unnecessarily. Below are equations for the most common setups which define how to set up the relevant thread pool maximums to avoid queuing. The x=1..n items are all the items of that type... [More]
In computing history, there was a famous crusade by Larry Tesler -- a titan of the industry; he worked at Xerox PARC (Smalltalk), Apple, Amazon, and Yahoo -- which he called "no modes." He said, for example, that you shouldn't have to enter a "mode" just to type text. You should be able to click and type. It was a revolutionary idea back then. There are still some popular modal programs today such as vi and Emacs, and the mode combinations make their users look like wizards, but in general, modes are dead.
However, I have... [More]
A customer recently gave me a very large heapdump which I tried to open in the Memory Analyzer Tool. It chugged for a while and then my computer overheated and suddenly shut itself off. I suspect this is a bug in the Linux kernel which improperly ramps the i7 processor and the fan can't keep up (or maybe my fan just needs a cleaning). The workaround for this on Linux was to reduce the maximum speed of my processors before opening the dump:
for i in `ls /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq`; do sudo bash -c "echo 1500000... [More]
A common aspect to a problem is that an application worked and then the environment (WAS, etc.) was upgraded and the application stopped working. Many customers then say, "therefore, the product is the root cause." It is easy to show that this is a fallacy (neither necessary nor sufficient) with a real world example: A recent customer upgraded from WAS 6.1 to WAS 7 without changing the application and it started to throw various exceptions. It turned out that the performance improvements in WAS 7 and Java 6 exposed existing... [More]
A previous post covered an older way of gathering configuration for Visual Configuration Explorer (VCE) using the VCE Headless Runtime, exported from ISA. The newer and preferred approach is to use the IBM Support Assistant Lite data collector, and this will work with WAS 8 (I've also tested this on 7). It does not work on V8.5.
Download the ISA Lite script: https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-isalite&S_PKG=wasunixwin
Extract the ISA Lite script into the <WAS> root directory.
A recent customer was comparing performance between WAS and Tomcat. Tomcat was performing much better. The application used temporary files intensively. After investigating thread dumps, we found that the sampled WAS threads showed much more temporary file I/O activity than Tomcat threads. Next, we discovered that Tomcat changes Java's default temporary directory to the "temp" subdirectory of the Tomcat installation using the -Djava.io.tmpdir system property.
It turned out that Tomcat happened to be installed on a faster disk than... [More]
One simple and very useful indicator of process health and load is its TCP activity. The following script takes a set of ports and summarizes how many TCP sockets are established, opening, and closing for each port. It has been tested on Linux and AIX. Example output:
$ portstats.sh 80 443
PORT ESTABLISHED OPENING CLOSING
80 3 0 0
443 10 0 2
Total 13 0 2
echo "usage:... [More]
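For readers who prefer Python, here is a rough sketch of the same kind of summary, working from captured `netstat -an` output. It is not the original script. The grouping of TCP states into "opening" and "closing" is my assumption, as is matching on the local port only, and the sample lines are invented.

```python
import re

# Assumed grouping of TCP states; state names differ slightly between Linux and AIX.
OPENING = {"SYN_SENT", "SYN_RECV", "SYN_RCVD"}
CLOSING = {"FIN_WAIT1", "FIN_WAIT2", "FIN_WAIT_1", "FIN_WAIT_2",
           "CLOSE_WAIT", "CLOSING", "LAST_ACK", "TIME_WAIT"}


def port_stats(netstat_output, ports):
    """Count established, opening and closing TCP sockets per local port."""
    stats = {port: {"ESTABLISHED": 0, "OPENING": 0, "CLOSING": 0} for port in ports}
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, state = fields[3], fields[5]
        # Linux separates the port with ':', AIX with '.'.
        match = re.search(r"[:.](\d+)$", local)
        if not match or int(match.group(1)) not in stats:
            continue
        counts = stats[int(match.group(1))]
        if state == "ESTABLISHED":
            counts["ESTABLISHED"] += 1
        elif state in OPENING:
            counts["OPENING"] += 1
        elif state in CLOSING:
            counts["CLOSING"] += 1
    return stats


# Invented sample resembling netstat -an output (Linux and AIX styles mixed).
SAMPLE = """\
tcp        0      0 10.0.0.1:80      10.0.0.9:51000   ESTABLISHED
tcp        0      0 10.0.0.1:443     10.0.0.9:51001   TIME_WAIT
tcp        0      0 10.0.0.1:443     10.0.0.9:51002   ESTABLISHED
tcp        0      0 0.0.0.0:80       0.0.0.0:*        LISTEN
tcp4       0      0 10.0.0.1.80      10.0.0.9.51003   SYN_RCVD
"""

for port, counts in port_stats(SAMPLE, [80, 443]).items():
    print(port, counts["ESTABLISHED"], counts["OPENING"], counts["CLOSING"])
```

Listening sockets are deliberately ignored, since they say nothing about load.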
It is generally a malpractice for an application to call System.gc() or Runtime.gc() (hereafter referring to both as System.gc(), since the former simply calls the latter). By default, these calls instruct the JVM to perform a full garbage collection, including tenured spaces and a full compaction. These calls may be unnecessary and may increase the proportion of time spent in garbage collection beyond what would have occurred if the garbage collector had been left alone.
The generic JVM arguments -Xdisableexplicitgc (IBM) and... [More]
Below is a simple wsadmin script that calculates the approximate start time of a set of servers using the UpTime PMI statistic. For example:
$ ./wsadmin.sh -lang jython -username wsadmin -password wsadmin -f uptime.py -server server1
WASX7209I: Connected to process "dmgr" on node localhostCellManager11 using SOAP connector; The type of process is: DeploymentManager
WASX7303I: The following options are passed to the scripting environment and are available as arguments that are stored in the argv variable: "[-server, server1]"
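The calculation itself is simple: subtract the uptime from the time at which the statistic was sampled. Here is a minimal Python sketch of that step only (it is not the wsadmin script). It assumes the UpTime value has already been retrieved and converted to seconds, and the function name and sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def approximate_start_time(sampled_at, uptime_seconds):
    """Estimate the server start time from an uptime sample.

    The result is approximate because of the delay between the server
    computing the statistic and the client recording sampled_at.
    """
    if uptime_seconds < 0:
        raise ValueError("uptime cannot be negative")
    return sampled_at - timedelta(seconds=uptime_seconds)


# Hypothetical sample: the statistic was read at noon UTC and reported one hour.
sampled = datetime(2012, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
print(approximate_start_time(sampled, 3600).isoformat())
```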
Some applications use native libraries (e.g. JNI; .so, .dll, etc.) to perform functions in native code (e.g. C/C++) rather than through Java code. This may involve allocating native memory outside of the Java heap (e.g. malloc, mmap). These libraries have to do their own garbage collection and application errors can cause native memory leaks, which can ultimately cause crashes, paging, etc. These problems are one of the most difficult classes of problems, and they are made even more difficult by the fact that native libraries are often... [More]
I'll be presenting a WebSphere Technical Exchange on October 16th @ 11AM Eastern . It is free for all to join and includes a remote presentation and a question & answer session. The topic will be AIX Native Memory Problem Determination Techniques and Tools for WebSphere Application Server. Call-in information and slides are available here: http://www-304.ibm.com/support/docview.wss?uid=swg27036053 .
The following is my attempt at understanding how much memory has been malloc'ed at the time of a Linux coredump using gdb. I'm not aware of any built-in way to get this from gdb, surprisingly (or even better, the total virtual memory used by the process). This investigation is assuming a recent version of Linux and the default glibc malloc implementation, the latest source code of which is located here: http://sourceware.org/git/?p=glibc.git;a=blob_plain;f=malloc/malloc.c;hb=HEAD . You'll need the debuginfo packages of glibc and glibc-common... [More]
In my last post, I wrote about understanding how much has been malloc'ed in a coredump through gdb, which was successful. In this post, I'll describe my investigations into total virtual memory usage in a core, which was unsuccessful.
First, I used the same program as before which calls malloc with various sizes and I changed it to sleep at the end. While it was sleeping, I took OS statistics and a core dump.
The first thing I ran was 'ps -o pid,vsz,rss -p 14062':
PID VSZ RSS
14062 44648 42508
VSZ is the total virtual... [More]
One common type of OutOfMemoryError (OOM) occurs when an application thread accumulates too much memory. When looking at a PHD, however, you may only see a large object (and potentially uninteresting child objects) as GC roots, so it's hard to figure out the root cause. Even if there are clear package names in the large object, that still may be insufficient information for the developers to figure out the problem. For example, here is a PHD heapdump as seen in the Memory Analyzer Tool and HeapAnalyzer:
In both tools, all we... [More]