Question & Answer
Question
How can the AIX trace command be used to find out whether a process is sending a signal to another process?
Answer
Sometimes a process terminates unexpectedly and leaves no clues to explain why. If no core dump is created and there is no CORE_DUMP entry in the error report, the process may simply have exited in the normal way, but unexpectedly, possibly because of a software defect. It is also possible that a signal was generated internally or sent by another process, and that the receiving process is programmed to intercept the signal and terminate without writing a core. See the tech note Why a Core File is Not Created for additional information about this possibility. In addition, some signals, such as SIGTERM and SIGKILL, terminate a process without creating a core dump by default, and SIGKILL cannot be caught or ignored. One way to determine whether another process is sending a SIGTERM or SIGKILL is to collect a system trace while the event occurs.
The following trace command can be used to record information about any process that receives a signal causing it to terminate. This includes SIGTERM, which requests a normal termination, as well as signals such as SIGSEGV or SIGILL, which usually cause a process to terminate abnormally. The -L and -T options require root user authority.
trace -a -o trcfile -L16000000 -T8000000 -J tidhk -j 14e
The -o option specifies the location where the trace file will be written. Use a file system that has at least 500MB of free space.
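The free-space requirement can be checked before the trace is started. This is a minimal sketch, assuming a POSIX shell and a df that supports the POSIX -P -k output format, in which the fourth field of the second line is the available space in 1024-byte blocks. The function name has_free_mb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is on a file system with at
# least $2 MB free. Relies on the POSIX "df -P -k" output format, where
# field 4 of the second line is the available space in 1024-byte blocks.
has_free_mb() {
  dir=$1
  need_mb=$2
  avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((need_mb * 1024)) ]
}

# Example: check the current directory, where trcfile will be written.
if has_free_mb . 500; then
  echo "at least 500 MB free for trcfile"
else
  echo "less than 500 MB free; choose another file system for -o" >&2
fi
```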
In the following example, we will run a sleep command in the background and then run the kill command to send the sleep process a SIGTERM. Immediately after sending the signal, we will stop the trace so that the circular trace buffer is not overwritten with irrelevant data.
Note: The trcstop command must be executed immediately after a process has terminated. If the delay is too long, even a few seconds, the trace data that recorded the event might be overwritten and lost. There are several ways to detect when a process has terminated; one is a simple shell script that checks for the process once per second inside a loop.
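The once-per-second polling approach mentioned in the note can be sketched as follows. This is a minimal example, assuming a POSIX shell run as root (which trace already requires); the function name wait_then_stop and the STOP_CMD override, which exists only so the logic can be exercised without trcstop, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: poll once per second until PID $1 no longer exists,
# then stop the trace. STOP_CMD is a hypothetical override that defaults
# to trcstop.
wait_then_stop() {
  watch_pid=$1
  # "kill -0" sends no signal; it only reports whether the PID can be
  # signaled. Running as root prevents a permission error from being
  # mistaken for the process having exited. Note that a zombie still
  # counts as existing until its parent reaps it.
  while kill -0 "$watch_pid" 2>/dev/null; do
    sleep 1
  done
  ${STOP_CMD:-trcstop}
}
```

For example, after starting the trace, running wait_then_stop 4587570 & watches that PID and stops the trace within about a second of the process disappearing.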
Because we will run the kill command from the login shell, record the PID of the login shell now; we will look for it later in the trace report.
# ps
     PID    TTY  TIME CMD
 4915414  pts/0  0:00 -ksh
 5439604  pts/0  0:00 ps
The PID for the login shell is 4915414.
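In ksh and other POSIX shells, the special parameter $$ expands to the PID of the current shell, so the login shell PID can also be recorded without reading the ps output. A minimal sketch; the variable name login_pid is arbitrary:

```shell
# $$ expands to the process ID of the current shell
login_pid=$$
echo "login shell PID: $login_pid"
```

Type these lines directly at the login shell prompt rather than putting them in a script, because a script runs in its own shell with a different PID.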
Start the trace:
# trace -a -o trcfile -L16000000 -T8000000 -J tidhk -j 14e
Run the sleep command in the background.
# sleep 1000 &
[1] 4587570
The PID for the sleep command is 4587570.
Send a SIGTERM to the sleep process and then immediately stop the trace:
# kill -s TERM 4587570 ; trcstop
[1] + Terminated sleep 1000 &
When the sleep process receives the SIGTERM, it terminates.
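The same steps can be scripted so that the PID does not have to be copied by hand: the special parameter $! holds the PID of the most recent background job, and wait reports how that job ended. This is a sketch for a POSIX shell; on the system being traced, trcstop would follow the kill immediately, as in the example above, but it is left as a comment here so the sketch can run anywhere.

```shell
#!/bin/sh
# Start the target in the background; $! is the PID of the most recent
# background job.
sleep 1000 &
sleep_pid=$!
echo "sleep PID: $sleep_pid"

# Send SIGTERM. When tracing on AIX, run trcstop right after this line.
kill -s TERM "$sleep_pid"

# wait returns the job's exit status; POSIX shells report a value greater
# than 128 when the job was terminated by a signal.
status=0
wait "$sleep_pid" || status=$?
echo "sleep exit status: $status"
```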
Next, run the trcrpt command to generate a trace report from the binary trace file trcfile. The trace report is written to the text file trace.out.
# trcrpt -O pid=on,svc=on,exec=on,tid=on trcfile > trace.out
Search the trace.out file for SIG to find all signals recorded in the trace:
# grep SIG trace.out
14E ksh 4915414 16777459 35.243000285 0.214795 kill: signal SIGTERM to process 4587570 sleep
In this example, we see the process ksh with PID 4915414 (the login shell) sent a SIGTERM signal to the process sleep with PID 4587570.
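When the trace covers a longer period, the grep output can contain many signal records. The following sketch condenses each record into one readable line. It assumes the record layout shown in the example above: hook ID, process name, and PID as the first three fields, followed somewhere by signal NAME to process PID name. The function name summarize_signals is hypothetical, and the layout should be checked against an actual trace.out before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: summarize 14E "kill:" records from a trcrpt report.
# ASSUMPTION: fields 1-3 are hook ID, sender name, and sender PID, and the
# record contains "signal <NAME> to process <PID> <name>".
summarize_signals() {
  awk '$1 == "14E" {
    for (i = 4; i <= NF - 5; i++) {
      if ($i == "signal" && $(i+2) == "to" && $(i+3) == "process") {
        printf "%s (PID %s) sent %s to %s (PID %s)\n", $2, $3, $(i+1), $(i+5), $(i+4)
      }
    }
  }' "$@"
}
```

Run as summarize_signals trace.out, the record above would be printed as: ksh (PID 4915414) sent SIGTERM to sleep (PID 4587570)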
Document Information
Modified date: 17 June 2018
UID: isg3T1020498