Before WAS 8.5, the "thread ID" printed in WAS logs (the hexadecimal number after the timestamp) came from the java/util/logging/LogRecord.getThreadID method. This number was not in javacores, so there was no easy way to correlate javacores with log and trace messages. Moreover, this thread ID was different from java/lang/Thread.getId, which might be printed by other components, and that thread ID wasn't in javacores either. (Note: see Mapping Underlying Java Thread Identifiers to those in Logging and Trace for a method to map these values using... [More]
IBM Support Assistant version 5 (Team Server) has officially launched. Unlike IBM Support Assistant version 4, which was a heavy desktop application, ISA5 is fully web-based: it either runs from a local application server or you can install its EARs into an existing application server. In addition to a platform for installing and launching tools (full tools list), ISA5 comes with a very strong log analysis engine. For example, you can upload a set of WAS logs, click Scan this Case, and it will search for warnings, errors, and other... [More]
Someone asked me how to find who is sending a SIGABRT to a process and below is one technique. If there are better techniques, please leave a comment! Attach to the process using gdb and immediately "continue." When the SIGABRT hits the process, gdb will break execution and leave you at a prompt. Then, simply handle the particular signal you want and print $_siginfo._sifields._kill.si_pid and detach:
$ java HelloWorld
Hello World. Waiting indefinitely...
$ ps -elf | grep HelloWorld | grep -v grep
0 S kevin 23947 ...
$ gdb... [More]
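To illustrate what the si_pid field holds, here is a minimal Python sketch. It is not the gdb technique described above: it is an in-process illustration, assuming Linux and Python 3.3 or later, in which the process signals itself and then reads the sender PID that the kernel recorded. The function name send_and_identify and the use of SIGUSR1 instead of SIGABRT are choices made for this example only.

```python
import os
import signal


def send_and_identify(signum=signal.SIGUSR1):
    """Send signum to this process and return the sender PID recorded in siginfo."""
    # Block the signal first so it stays pending instead of running its default action.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        # Stand-in for the unknown sender: here the process signals itself.
        os.kill(os.getpid(), signum)
        # Dequeue the pending signal together with its siginfo structure.
        info = signal.sigwaitinfo({signum})
        return info.si_pid
    finally:
        # Restore the signal mask that was in effect before the call.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    print("sender pid:", send_and_identify(), "own pid:", os.getpid())
```

Because the process sends the signal to itself, the reported sender PID should match its own PID. With an external sender, the same field would carry that sender's PID, which is the value the gdb expression prints.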
I'll be presenting a WebSphere Technical Exchange on November 12th at 11 AM Eastern. It is free for all to join and includes a remote presentation and a question & answer session. The topic will be Linux Native Memory Problem Determination Techniques and Tools for WebSphere Application Server. Slides are available here: http://www-01.ibm.com/support/docview.wss?uid=swg27039764&aid=1
Our SWAT team's common customer malpractices document has been refreshed and published as an IBM Redpaper: http://www.redbooks.ibm.com/redpieces/abstracts/redp5033.html?Open . After most of our engagements, we consider what customer practices caused or contributed to the critical situation. The biggest one? No test environment is equal to the production environment: 35.1%. Please read this paper and pass it on to incentivize management to invest in the right areas to help avoid critical situations.
It may be useful to understand what PID is sending a kill signal to a process on AIX. You can use this kernel trace:
Log in as root
# rm -rf /tmp/aixtrace; mkdir /tmp/aixtrace/; cd /tmp/aixtrace/
# trace -C all -a -T 10M -L 20M -n -j 134,139,465,14e,46c -o ./trc
... Reproduce the problem ... e.g. kill -3 7667754
# cp /etc/trcfmt .
# trcnm -a > trace.nm
# LDR_CNTRL=MAXDATA=0x80000000 gensyms > trace.syms
# LDR_CNTRL=MAXDATA=0x80000000 gennames -f > gennames.out
# pstat -i > trace.inode
# ls -al /dev >... [More]
The latest version of the Interactive Diagnostic Data Explorer (IDDE) tool, available in the IBM Support Assistant, supports a subset of the IBM Extensions for Memory Analyzer plugins (if installed) by typing Ctrl+Space a second time after typing the first !, and it also supports the old DumpAnalyzer plugins by typing Ctrl+Space a second time. For example, running the old DumpAnalyzer basic analyzer:
This is part five in the series on understanding total virtual memory usage from a core dump on Linux (part 1, part 2, part 3, part 4). In part 4, I alluded to the fact that the gcore command in gdb actually walks the memory regions itself when writing the core dump. This got me thinking: what if this is different than how the kernel writes the core? To get the kernel to produce a core, the only way seems to be to destructively kill the process (e.g. kill -6 or kill -11). By default, IBM Java will catch these signals before the process... [More]
This is part four in the series on understanding total virtual memory usage from a core dump on Linux (part 1, part 2, part 3). Previously, I recommended using a script to add a timestamp to a core file name when using gcore so that MAT could know when the dump was taken. Now, I'll make the case for a stronger recommendation to completely avoid using gcore and use IBM Java's built-in capabilities of taking a core dump instead. Gcore is simply a shell script available on some flavors of Linux:
$ whereis gcore
We are working to add PDB symbols for WAS native libraries and we ran into a subtle issue. As Microsoft states: PDB files are generated if a project is built by using the /Zi or /ZI (Produce PDB Information) compiler switch... Generating PDB files for release executables does not affect any optimizations, or significantly alter the size of the generated files... For this reason, you should always produce PDB files... We added /Zi to the compiler (cl.exe) and it produced a PDB file; however, when we grabbed a dump of the process and loaded it... [More]
At a recent customer, we improved throughput by 50% simply by restarting with the AIX environment variable MALLOCOPTIONS=multiheap. This only applies to situations where there is heavy, concurrent malloc usage, and in many cases of WAS/Java, this is not the case. The multiheap option does have costs, particularly increased virtual and physical memory usage. The primary reason is that each heap's free tree is independent, so fragmentation is more likely. There is also some additional metadata overhead. malloc is often a bottleneck for... [More]
In what will probably become a 10-part series before we figure things out, I wanted to share more information I've learned about Linux core dumps. Part 1 successfully found how much was malloc'ed in a core, and part 2 failed to find the total virtual memory usage in a core. A colleague recently pointed me to more information in the most obvious of places: 'man core'! Since kernel 2.6.23, the Linux-specific /proc/PID/coredump_filter file can be used to control which memory segments are written to the core dump file in the event... [More]
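As a rough illustration of how the coredump_filter value can be read, the Python sketch below decodes the hexadecimal bitmask into the mapping types it selects. The bit meanings follow the 'man core' page (bits 7 and 8 exist only on newer kernels); the function and constant names are hypothetical and chosen for this example.

```python
from pathlib import Path

# Bit positions as documented in 'man core' for /proc/PID/coredump_filter.
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def decode_coredump_filter(mask):
    """Return the names of the mapping types selected by the bitmask."""
    return [
        name
        for bit, name in sorted(COREDUMP_FILTER_BITS.items())
        if mask & (1 << bit)
    ]


def read_coredump_filter(pid="self"):
    """Read the filter for a process; the kernel reports it as a hexadecimal string."""
    text = Path("/proc/{}/coredump_filter".format(pid)).read_text().strip()
    return int(text, 16)


if __name__ == "__main__":
    mask = read_coredump_filter()
    print(hex(mask), decode_coredump_filter(mask))
```

For example, decode_coredump_filter(0x33) selects the anonymous private and shared mappings, ELF headers, and private huge pages, which leaves file-backed mappings out of the core.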
If you are using a HotSpot-based JVM, and you are producing HPROF heapdumps that have more than 4GB of object data, and you are using the Memory Analyzer Tool to analyze those heapdumps, then make sure you check the Error Log for any warnings when first loading the dump. We recently discovered that some HotSpot JVMs write an incorrect length field in the HPROF file. MAT will end up reading only part of the heapdump, but other than the warning, there's no sign that you're only looking at a subset of the dump. You can find more details in... [More]
In a previous post, I discussed the importance of symbols for native libraries. Not only are they needed, but you may be actively deceived by the guesses of stack walkers if you don't have them. On Linux, although it is recommended to simply compile executables and libraries with symbols and ship them unstripped, even the operating system vendors do not do this (the simple reason is the size of the symbol data). For example, Fedora and Red Hat do not ship binaries unstripped, but instead they separate the symbols into matching debuginfo... [More]