IBM Support

AIX: How to find what killed/sent a signal to a process

How To


Summary

A process is sometimes terminated by a signal, and it is not obvious what sent that signal. If this happens repeatedly, you can catch it by running a circular kernel trace with hooks that record the sending and receiving of signals.

Steps

Using a Kernel Trace
 
The steps to do this are:
 
1) Start a trace. Only a limited set of trace hooks is enabled, so the trace grows much more slowly than a regular full system trace. It is also a circular trace, which gathers data continually (wrapping around and overwriting older data as it fills up) until you stop it. This creates /tmp/trace.raw and a number of /tmp/trace.raw-X files, one for each CPU on the system, so you may want to direct them to a dedicated subdirectory:
 
trace -anl -C all -J tidhk -j 11E,119 -L100M -T50M -o /tmp/trace.raw
 

2) Recreate the problem. Make sure you know the PID of the process that is getting killed so that you can find it in the trace report.
 
3) As soon as the problem occurs, stop the trace:
 
trcstop
 
 
4) Create a trace report. If the report is being created on the same system the trace was taken on, run something like:
 
trcrpt -Oexec=on,pid=on,svc=on,timestamp=1 -C all -o trace.report /tmp/trace.raw
 
 
If you plan on creating the report on a different system, first save the kernel name list and take a copy of the trace format file on the system where the trace was taken:
 
trcnm > trcnm.out
cp /etc/trcfmt trcfmt
 
 
Then, after copying those files and the trace.raw* files to the system where you intend to generate the report, run:
 
trcrpt -Oexec=on,pid=on,svc=on,timestamp=1 -n trcnm.out -t trcfmt -C all -o trace.report trace.raw
 

5) Grep the trace report for pidsig and issig. You'll see something like this:
119  ksh            17432792                  8.405181                   pidsig:   pid=15990802 signal=SIGTERM lr=E5D44
119  ksh            17432792                  8.405186                   pidsig:   pid=15990802 signal=SIGCONT lr=E5D44
11E  auditpr        15990802                  8.405340                   issig:   pid=15990802 tid=30802147 t_cursig=SIGTERM
11E  auditpr        15990802                  8.405340                   issig:   pid=15990802 tid=30802147 t_cursig=SIGTERM
 
In this example, I ran 'kill' from ksh to kill auditpr. The pidsig entries show that ksh (PID 17432792) sent SIGTERM to PID 15990802, and the issig entries show that the target process, auditpr, received the signal.
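
As a rough illustration of step 5, the filter below pulls the pidsig entries for one target PID out of a report and prints the sender's name, the sender's PID, and the signal. The names signals_sent_to and sample_report are made up for this sketch, and the field positions are an assumption based on the column layout of the sample lines above. A report generated with different -O options may need different field numbers.

```shell
# Sample report lines, copied from the example output in this article.
sample_report() {
cat <<'EOF'
119  ksh            17432792                  8.405181                   pidsig:   pid=15990802 signal=SIGTERM lr=E5D44
119  ksh            17432792                  8.405186                   pidsig:   pid=15990802 signal=SIGCONT lr=E5D44
11E  auditpr        15990802                  8.405340                   issig:   pid=15990802 tid=30802147 t_cursig=SIGTERM
EOF
}

# Read a trace report on stdin and print "sender-name sender-pid signal"
# for every pidsig entry aimed at the PID given as $1.
# Assumed layout: $2 = process name, $3 = PID, $5 = event, $6 = pid=..., $7 = signal=...
signals_sent_to() {
  awk -v target="pid=$1" '
    $5 == "pidsig:" && $6 == target {
      sub(/^signal=/, "", $7)
      print $2, $3, $7
    }'
}

sample_report | signals_sent_to 15990802
```

With the sample data this prints "ksh 17432792 SIGTERM" and "ksh 17432792 SIGCONT". Against a real report, the same function would be fed from the file, for example: signals_sent_to 15990802 < trace.report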
 
Sometimes you might need to grep the trace report for "exec: " also; it shows you which commands are being exec'd during the course of the trace, and could give you a better idea what is happening as you follow which process exec's another, which then may be the one sending the signal.

Also, one thing to note - if you are looking in particular for someone running the 'kill' command, just running 'kill' from ksh executes a ksh builtin function called kill - it doesn't call the actual /usr/bin/kill executable. The full path would have to be given to run /usr/bin/kill. So, in my example above where I ran 'kill' from ksh, it doesn't show /usr/bin/kill being exec'd - only that ksh has sent a signal to another process.

Since it may not be feasible to manually keep an eye on the process and run 'trcstop', you can set up a simple script to do so:
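#!/usr/bin/ksh

PID=$1  # Pass the PID of the process to watch as the first argument

if [[ -z "$PID" ]]; then
  print -u2 "Usage: $0 <pid>"
  exit 1
fi

while true
do
  # ps -p fails once the PID is gone. Grepping 'ps -ef' for the number
  # can also match PPIDs or longer PIDs that contain the same digits.
  ps -p "$PID" >/dev/null 2>&1
  if [[ $? -ne 0 ]]; then
    trcstop
    exit
  fi
  sleep 5
done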
 
It could be beneficial to gather 'ps -ef' output in that loop also, so that you have a record of what processes were active on that system, in case you need to find the parent of any PID seen in the trace.
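
One possible shape for such a loop is sketched below. It appends a timestamped 'ps -ef' snapshot to a log file on every pass, uses 'ps -p' to test whether the watched PID still exists, and runs the stop command once it does not. The function name watch_and_stop and the INTERVAL and STOP_CMD variables are illustrative names introduced here, not part of any AIX tooling. STOP_CMD defaults to trcstop.

```shell
# Sketch only: watch_and_stop, INTERVAL, and STOP_CMD are illustrative names.
watch_and_stop() {
  pid=$1                          # PID of the process to watch
  log=$2                          # file that collects the ps snapshots
  interval=${INTERVAL:-5}         # seconds between checks
  stop_cmd=${STOP_CMD:-trcstop}   # command run once the PID is gone

  while true
  do
    # Snapshot first, so the last entry shows the state at detection time
    { date; ps -ef; } >> "$log"
    ps -p "$pid" >/dev/null 2>&1 || break
    sleep "$interval"
  done
  $stop_cmd
}

# Example call (hypothetical PID and log path):
# watch_and_stop 15990802 /tmp/ps_snapshots.log
```

The log grows on every pass, so choose the interval with the expected run time in mind.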

Depending on your needs, you might instead run a non-circular (default) trace. The command is the same as above, but without the -l flag:
 
trace -an -J tidhk -j 11E,119 -L100M -T50M -C all -o /tmp/trace.raw
 
 
In this case, the trace will not wrap around and overwrite itself when it is full - so rather than needing to monitor the process that you are concerned about and immediately stop the trace when it dies, you can just let this trace run until it is full. You can adjust the log and buffer sizes as needed, if you would like it to be able to run for longer.
 
 
 
Using probevue
 

probevue can also be used to help find what is sending a signal to a process.


To monitor a process with PID 6160860, run:

echo '@@sysproc:sendsig:6160860 {printf ("Source=%llu Target=%llu sig=%llu\n",__sigsendinfo->spid,__sigsendinfo->tpid,__sigsendinfo->signo);}'  | probevue


To monitor all processes, change the PID to '*':

echo '@@sysproc:sendsig:* {printf ("Source=%llu Target=%llu sig=%llu\n",__sigsendinfo->spid,__sigsendinfo->tpid,__sigsendinfo->signo);}'  | probevue


If you want to redirect the output to a file:

echo '@@sysproc:sendsig:* {printf ("Source=%llu Target=%llu sig=%llu\n",__sigsendinfo->spid,__sigsendinfo->tpid,__sigsendinfo->signo);}'  | probevue -o /tmp/pv.out


The output will look like this. For example, when srcmstr (PID 3866952) is used to stop a subsystem with PID 6554076, you would see:

Source=3866952 Target=6554076 sig=30
Source=6554076 Target=6554076 sig=9


This reports only the PID that sent the signal, not the process name, but it is much easier to set up and to read than a kernel trace.
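
Because the probevue output carries only numbers, a small filter can make it easier to read. The sketch below appends the signal name to each output line by using 'kill -l <number>', which prints the name for a signal number on the local system. The function name annotate_signals is made up for this example. Signal numbering is platform-specific, so run it on the system where the output was captured.

```shell
# Read probevue output lines of the form
#   Source=<pid> Target=<pid> sig=<number>
# on stdin and append the local signal name to each line.
annotate_signals() {
  while read -r src tgt sig
  do
    num=${sig#sig=}
    # kill -l <number> prints the signal name without the SIG prefix
    name=$(kill -l "$num" 2>/dev/null) || name=unknown
    echo "$src $tgt $sig (SIG$name)"
  done
}

echo 'Source=6554076 Target=6554076 sig=9' | annotate_signals
```

For the sig=9 line this prints "Source=6554076 Target=6554076 sig=9 (SIGKILL)". While the sending process is still running, 'ps -fp <PID>' on the Source PID shows its command name.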

Document Location

Worldwide


Document Information

Modified date:
12 September 2025

UID

ibm10887097