| |  | {tip:title=For discussions or questions...} |
| | To start a discussion or get a question answered, consider posting on the [Linux for Power Architecture forum|http://www.ibm.com/developerworks/forums/forum.jspa?forumID=375]. |
| | |
| | Additional Linux on Power performance information is available on the [Performance page|http://www.ibm.com/developerworks/wikis/display/LinuxP/Performance]. |
| | {tip} |
| | |
| | *Contents* |
| | {toc:minLevel=2} |
| | |
| | \\ |
| | h2. An example problem - who's throwing signals? |
| | |
| | A customer reported significant performance problems with their Java implementation on a new level of the software base. Performance analysis teams were pulled into the discussions because initial indications showed that CPU utilization had soared to 100% busy. |
| | |
| | During problem determination, the technical team found a large number of unexpected SIGUSR2 signals being generated, which caused the Java engine to do a lot of unnecessary work, in particular repeated garbage collection. In essence, the system was stuck in garbage collection mode. |
| | |
| | In the course of the root cause discussions, concerns were raised that a new rogue process, or even the hardware system itself, was generating the SIGUSR2 signals. Before calling in hardware technicians, the team decided to use pre-canned SystemTap scripts to determine who might be generating the SIGUSR2 signals. SystemTap allows the end user to dynamically "hook into" code running in the system and produce readable output. We explain how the script works later on this page. |
| | |
| | So we went to the SystemTap Examples page at http://sourceware.org/systemtap/examples/ and searched for "signal". We found an example SystemTap script named sig_by_pid.stp at http://sourceware.org/systemtap/examples/process/sig_by_pid.stp |
| | |
| | We downloaded the script and ran it with SystemTap while the problem was occurring. The resulting output showed: |
| | |
| | {noformat} |
| | |
| | # stap sig_by_pid.stp |
| | Collecting data... Type Ctrl-C to exit and display results |
| | SPID     SENDER           RPID  RECEIVER         SIGNAME          COUNT
| | 29561    java             29561 java             SIGUSR2          412
| | 21656    java             21656 java             61               230
| | 29561    java             29561 java             SIGTRAP          58
| | 19348    java             19348 java             SIGTRAP          54
| | 19348    java             19348 java             61               53
| | 12571    java             12571 java             SIGTRAP          51
| | 10078    java             10078 java             SIGTRAP          51
| | 9172     java             9172  java             SIGTRAP          45
| | 18140    java             18140 java             SIGTRAP          45
| | 22725    java             22725 java             SIGTRAP          43
| | 10078    java             10078 java             61               41
| | 21580    java             21580 java             SIGTRAP          41
| | 9610     java             9610  java             SIGTRAP          40
| | 5825     java             5825  java             SIGTRAP          40
| | 12571    java             12571 java             61               39
| | 14925    java             14925 java             SIGTRAP          39
| | 27406    java             27406 java             SIGTRAP          39
| | 6117     java             6117  java             SIGTRAP          39
| | {noformat} |
| | |
| | With this result, the teams were able to easily see that one of the Java processes (pid #29561) was sending itself a lot of SIGUSR2 signals. This was the clue they were looking for, and the Java team went on to find the defect in their code. |
| | |
| | The nice part is that using SystemTap is easy and non-intrusive. Many example scripts are available to try, and the scripting language is easy to modify if the technical teams need slight variants of the examples provided. |
| | |
 |  | \\ |
| | h2. Considerations for setting up a system for SystemTap |
| | |
| | SystemTap ships with both RHEL 5.2 and SLES 10 SP2. |
| | |
| | SystemTap depends on the kernel-debuginfo package, so you need to install it before you can run SystemTap scripts. It is a large rpm package, but it does not add overhead to a running system. |
| | |
| | |
 | | \\ |
| | h2. What does the sig_by_pid.stp script do? |
| | |
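| | The script is shown below. For every signal sent on the system, it records the sending PID, the receiving PID, and the signal number, and counts how often each combination occurs. When you stop it with Ctrl-C, it prints the counts in descending order. |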
| | |
| | {noformat} |
| | #! /usr/bin/env stap |
| | |
| | # Copyright (C) 2006 IBM Corp. |
| | # |
| | # This file is part of systemtap, and is free software. You can |
| | # redistribute it and/or modify it under the terms of the GNU General |
| | # Public License (GPL); either version 2, or (at your option) any |
| | # later version. |
| | |
| | # |
| | # Print signal counts by process IDs in descending order. |
| | # |
| | |
| | global sigcnt, pid2name, sig2name |
| | |
| | probe begin {
| |   print("Collecting data... Type Ctrl-C to exit and display results\n")
| | }
| | 
| | probe signal.send
| | {
| |   snd_pid = pid()
| |   rcv_pid = sig_pid
| | 
| |   sigcnt[snd_pid, rcv_pid, sig]++
| | 
| |   if (!(snd_pid in pid2name)) pid2name[snd_pid] = execname()
| |   if (!(rcv_pid in pid2name)) pid2name[rcv_pid] = pid_name
| |   if (!(sig in sig2name)) sig2name[sig] = sig_name
| | }
| | 
| | probe end
| | {
| |   printf("%-8s %-16s %-5s %-16s %-16s %s\n",
| |          "SPID", "SENDER", "RPID", "RECEIVER", "SIGNAME", "COUNT")
| | 
| |   foreach ([snd_pid, rcv_pid, sig_num] in sigcnt-) {
| |     printf("%-8d %-16s %-5d %-16s %-16s %d\n",
| |            snd_pid, pid2name[snd_pid], rcv_pid, pid2name[rcv_pid],
| |            sig2name[sig_num], sigcnt[snd_pid, rcv_pid, sig_num])
| |   }
| | }
| | {noformat} |
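| | 
| | As an illustration of how little needs to change to get a variant, the sketch below narrows the script above to SIGUSR2 only. It is a hypothetical adaptation written for this page, not one of the published examples: the array name {{usr2cnt}} is made up, and it assumes that {{sig_name}} carries the string "SIGUSR2" for that signal, as the SIGNAME column in the output above suggests. It reuses only the probe point and variables that sig_by_pid.stp already uses. |
| | 
| | {noformat}
| | #! /usr/bin/env stap
| | 
| | # Hypothetical variant of sig_by_pid.stp: count only SIGUSR2 signals,
| | # keyed by sender PID and receiver PID.
| | 
| | global usr2cnt, pid2name
| | 
| | probe begin {
| |   print("Counting SIGUSR2... Type Ctrl-C to exit and display results\n")
| | }
| | 
| | probe signal.send
| | {
| |   # Skip every signal except SIGUSR2 (assumes sig_name is "SIGUSR2")
| |   if (sig_name != "SIGUSR2") next
| | 
| |   snd_pid = pid()
| |   rcv_pid = sig_pid
| | 
| |   usr2cnt[snd_pid, rcv_pid]++
| | 
| |   # Remember process names so the report can print them
| |   if (!(snd_pid in pid2name)) pid2name[snd_pid] = execname()
| |   if (!(rcv_pid in pid2name)) pid2name[rcv_pid] = pid_name
| | }
| | 
| | probe end
| | {
| |   printf("%-8s %-16s %-5s %-16s %s\n",
| |          "SPID", "SENDER", "RPID", "RECEIVER", "COUNT")
| | 
| |   # Trailing "-" sorts by count, highest first, as in the original script
| |   foreach ([snd_pid, rcv_pid] in usr2cnt-) {
| |     printf("%-8d %-16s %-5d %-16s %d\n",
| |            snd_pid, pid2name[snd_pid], rcv_pid, pid2name[rcv_pid],
| |            usr2cnt[snd_pid, rcv_pid])
| |   }
| | }
| | {noformat}
| | 
| | Saved under a name of your choosing, it would be run the same way as the original, with {{stap}} followed by the file name. |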
| | |