It is difficult to determine the root cause when high CPU issue happens in a customer's environment. To narrow down the problematic area to thread level, this document will introduce some useful utilities and commands for various distributed platforms.
Windbg and Process Explorer utilities are very good utilities to debug Windows programs. You can download them from the following links.
and install Debugging Tools for Windows 32-bit Version then download the Process Explorer from http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
and install the v11.33.
When you start the Process Explorer by clicking the procexp.exe that you installed, then you will see a window like the following in which all Windows processes are listed.
Select “Configure Symbols...” in the Options menu:
Select the dbghelp.dll in the folder where you installed the Windbg tool and select the local symbol path where you put the program symbols there (eg for MQ agent these symbol files are .pdb files which should be shipped when product releases). Click OK. Note that you should select the dbghelp.dll in the directory where you installed the Windbg tool instead of the one in %systemroot%\system32 because this latter copy of dbghelp.dll does not include the functionality required. Wait until you see the high CPU happens again and locate to the kmqlocal.exe process and click “Properties...”.
Change to tab Threads and locate to the thread (TID) which is consuming much CPU and click “Stack” button to show its call stack.
You will see something like:
Select all lines in the “Stack for thread” window and click the Copy button to save the call stack information and send it to IBM support for review.
1. When high CPU happens, run "top -H > top.out", top.out will show the process CPU utilization.
2. Run "ps H -e -o pid,tid,pcpu,cmd --sort=pcpu | grep kmqlocal-PID > ps.out"
Note that kmqlocal-PID is the process id of the high CPU process for MQ agent, the tid column show the thread id of the process. Write down the tid number which uses the most CPU usage.
3. Use gdb to identify the high CPU thread and take screenshots for the output and send them to IBM Support for review.
1) Start gdb
2) Attach to the kmqlocal process by
(gdb) attach PID
3) Run "info threads" to get all the thread info. Remember the thread number at the beginning of every line.
(gdb) info threads
25 Thread 4135922592 (LWP 31507) 0xffffe410 in __kernel_vsyscall ()
24 Thread 4127529888 (LWP 31508) 0xffffe410 in __kernel_vsyscall ()
23 Thread 4118920096 (LWP 31509) 0xffffe410 in __kernel_vsyscall ()
4) Switch to the thread and check its back trace (BT):
(gdb) thread 25
[Switching to thread 2 (Thread 4060576672 (LWP 3361))]#0 0xffffe410 in __kernel_vsyscall ()
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xf79f47e6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xf76d9206 in service_thread () from /opt/IBM/ITML3WLJ/tmaitm6/lx8263/lib/libkde.so
#3 0xf79f134b in start_thread () from /lib/libpthread.so.0
#4 0xf785c65e in clone () from /lib/libc.so.6
Use tprof to analyze the CPU usage of each application thread.
1. Determine high CPU thread
The tprof command, an AIX operating system provided utility, uses sampling at the processor level to determine which process, thread and routine is using CPU time. tprof periodically samples which thread and instructions are being run on each processor and maps this back to the process and routine. The information is not 100% accurate, but is representative and gathered with very little performance overhead. The output from the "tprof -skeux sleep 60" command produces a file called sleep.prof. The tprof command uses sampling at the processor level to determine which process, thread and routine is using the processors.
-s Enables shared library profiling.
-k Enables kernel profiling.
-e Turns on kernel extension profiling.
-u Enables user mode profiling.
-x program Specifies the program to be executed by tprof.
Run the trpof and find the function which consumes maximum CPU.
Total % For ./kmqagent thread 2166785 (USER) = 7.04
Subroutine % Source
========== ====== ======
In most case this command output will show the subroutines’ name then you are able to check the code in those subroutines. However in my case when this article is written, tprof does not work correctly with my application as you can see that it shows HEX values. If you run into same symptom, you may need take additional steps to map it to actual subroutine name by dbx before this issue gets fixed in future.
-bash-3.2# dbx -a 856302 (pid of kmqagent)
Waiting to attach to process 856302 ...
Successfully attached to kmqlocal.
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g
stopped in _event_sleep at 0xd04a1c54 ($t5)
(dbx) listi 0x1005B5D4
0x1005b5d4 (at__10RWIsvSlistCFUl+0x168) 8061009c lwz r3,0x9c(r1)
0x1005b5d8 (at__10RWIsvSlistCFUl+0x16c) 3863ffff addi r3,-1(r3)
0x1005b5dc (at__10RWIsvSlistCFUl+0x170) 9061009c stw r3,0x9c(r1)
0x1005b5e0 (at__10RWIsvSlistCFUl+0x174) 80610074 lwz r3,0x74(r1)
0x1005b5e4 (at__10RWIsvSlistCFUl+0x178) 80630000 lwz r3,0x0(r3)
0x1005b5e8 (at__10RWIsvSlistCFUl+0x17c) 90610074 stw r3,0x74(r1)
0x1005b5ec (at__10RWIsvSlistCFUl+0x180) 8061009c lwz r3,0x9c(r1) 0x1005b5f0 (at__10RWIsvSlistCFUl+0x184) 28030000 cmpli cr0,0x0,r3,0x0
0x1005b5f4 (at__10RWIsvSlistCFUl+0x188) 4082ffe0 bne
0x1005b5f8 (at__10RWIsvSlistCFUl+0x18c) 8061009c lwz r3,0x9c(r1)
(dbx) map at__10RWIsvSlistCFUl