It can be useful to find out which PID is sending a kill signal to a process on AIX. The following kernel trace captures this:
Log in as root
# rm -rf /tmp/aixtrace; mkdir /tmp/aixtrace/; cd /tmp/aixtrace/
# trace -C all -a -T 10M -L 20M -n -j 134,139,465,14e,46c -o ./trc
... Reproduce the problem ... e.g. kill -3 7667754
# trcstop
# cp /etc/trcfmt .
# trcnm -a > trace.nm
# LDR_CNTRL=MAXDATA=0x80000000 gensyms > trace.syms
# LDR_CNTRL=MAXDATA=0x80000000 gennames -f > gennames.out
# pstat -i > trace.inode
# ls -al /dev > trace.maj_min2lv
Either zip and send these files to a PMR or analysis machine, or run these commands directly to process the trace:
# trcrpt -C all -r -o trc.tr trc
# trcrpt -C all -t trcfmt -n trace.nm -x -O pid=on,tid=on,svc=on,exec=on,cpuid=on,PURR=on -o trc.txt trc.tr
Make sure the trace buffers did not wrap:
# grep WRAP trc.txt
If there are no results, the trace is complete; otherwise, if you see lines such as:
006 --1- -1 -1 -1 963.205627656 0.002912 963.205627656 TRACEBUFFER WRAPAROUND 0003
005 -4916246- -1 4916246 113967573 963.205627656* 963.205627656 LOGFILE WRAPAROUND 0002
Then either increase the buffer sizes (-T and -L), or reduce your test case, the system load, or the number of tracepoints in -j.
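The wraparound check can be scripted so that a wrapped trace is not analyzed by mistake. This is a minimal sketch: check_wrap is a hypothetical helper name (not an AIX command), and it assumes the formatted report uses the WRAPAROUND wording shown in the sample lines above.

```shell
# check_wrap FILE
# Hypothetical helper: succeeds (returns 0) only if the formatted trace
# report contains no WRAPAROUND entries. FILE defaults to trc.txt.
check_wrap() {
    report=${1:-trc.txt}
    if grep -q "WRAPAROUND" "$report"; then
        echo "WRAPPED: trace data was lost in $report"
        return 1
    fi
    echo "OK: no wraparound entries in $report"
}
```

Run it as "check_wrap trc.txt && echo continue" so that later steps only proceed on a complete trace.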
Finally, search for the signal:
# grep -Ei "^(14e|46c)" trc.txt | grep -E "signal 3|SIGQUIT"
14E ksh 0 10879036 62128373 28.157542500 0.128249 28.157542500 kill: signal SIGQUIT to process ? java
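To pull out only the sender details from matching lines, the fields can be selected with awk. This is a sketch under an assumption: the column order (hook ID, process name, CPU, PID, TID, elapsed seconds) is inferred from the sample line above, and would shift if a process name contained spaces or different trcrpt -O options were used. signal_senders is a hypothetical helper name.

```shell
# signal_senders FILE
# Hypothetical helper: prints the sending PID, its process name, and the
# elapsed time for each 14E (kill) entry that mentions SIGQUIT.
# Assumed columns: $1=hook ID, $2=process name, $4=PID, $6=elapsed seconds.
signal_senders() {
    awk 'toupper($1) == "14E" && /signal SIGQUIT/ {
        printf "pid=%s exec=%s elapsed=%s\n", $4, $2, $6
    }' "${1:-trc.txt}"
}
```

For the sample line above, "signal_senders trc.txt" would print pid=10879036 exec=ksh elapsed=28.157542500.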
The time of the signal is the ELAPSED_SEC column added to the date at the top of trc.txt:
# head -2 trc.txt
Wed Aug 21 05:10:28 2013
Thus the kill was sent at 05:10:56 (05:10:28 plus 28 elapsed seconds) by PID 10879036 (ksh). If the sender is a long-running process, then you can look it up in ps.out for more details. The entry may not print the PID the signal was sent to (notice the question mark), but you should be able to work that out from other artifacts produced at that time, such as javacores.
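The time arithmetic above can be done with a small awk helper instead of by hand. This is a minimal sketch: signal_time is a hypothetical name, it assumes the header date has the time of day in its fourth field (as in the sample), it truncates the elapsed value to whole seconds, and it wraps past midnight without adjusting the date.

```shell
# signal_time "HEADER_DATE" ELAPSED_SEC
# Hypothetical helper: adds the elapsed seconds to the time of day in the
# first line of trc.txt and prints the result as HH:MM:SS.
signal_time() {
    echo "$1" | awk -v elapsed="$2" '{
        split($4, t, ":")                       # $4 is HH:MM:SS
        total = t[1] * 3600 + t[2] * 60 + t[3] + int(elapsed)
        total = total % 86400                   # wrap past midnight
        printf "%02d:%02d:%02d\n", int(total / 3600), int((total % 3600) / 60), total % 60
    }'
}
```

For example, signal_time "Wed Aug 21 05:10:28 2013" 28.157542500 prints 05:10:56, matching the result above.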