Signals are software interrupts that send information about an occurrence of an asynchronous event to an executing program, or process. Signals are generated for a variety of reasons such as timer expiration. Most hardware traps -- illegal instructions, access to invalid addresses, and so on -- are converted to signals.
Signals can be generated by the process itself, or they can be sent from one process to another. A variety of signals can be generated or delivered, and they have many uses for programmers. (To see a complete list of signals in the Linux® environment, use the command kill -l.)
While the principles described in this article are general, the sample programs were compiled with gcc version 3.3.3 and the SUSE Linux Enterprise Server 9 (PPC version) operating system.
When debugging, perhaps 90 percent of your time is spent just finding the problem. You can use signals to reduce this time. Signals provide a great deal of information to or about a user space process. You can design applications to use signal information to decide a course of action, giving applications full control within their execution context.
A process can ignore a signal by setting its disposition to SIG_IGN; an ignored signal is not delivered to the process. Listing 1 shows how to ignore SIGINT. (Because the process ignores SIGINT, you'll need to use Ctrl-Z to stop the process or Ctrl-\ to quit it.)
Listing 1. Sample program to ignore a SIGINT signal
#include <stdio.h>
#include <signal.h>
int main(void)
{
    /* From here on, SIGINT (Ctrl-C) is discarded instead of ending the process */
    signal(SIGINT,SIG_IGN);
    while(1)
        printf("You can't kill me with SIGINT anymore, dude\n");
    return 0;
}
When a signal is delivered to a process, two types of actions are possible:
- Default actions, where the kernel handles the signal itself. Each signal has a default action defined in the kernel; for most signals the default is to terminate the process, although some signals are ignored by default and others stop or continue the process.
- User-defined actions, where the signal is handled by a user-defined signal handler.
Let's focus on user-space signal handlers.
A signal handler is a piece of code that is executed at the time of signal delivery. It is part of the user-space program code and is executed in the user-space context. Signal handlers provide information on the action to be taken when the signal occurs. Signal handlers can also be written to ignore the signal.
The user process is not allowed to install handlers for all signals; for instance, installing handlers for SIGKILL and SIGSTOP is not permitted. If the process goes out of control, someone (at least the kernel) needs to be able to kill or stop the process. If the operating system allowed the process to register handlers for these signals and if the handlers were designed to ignore them, there would be virtually nothing to stop the process except a hard reboot.
Listing 2 shows one way to register a signal handler:
Listing 2. Registering a signal handler
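struct sigaction mysig_act;
sigemptyset(&mysig_act.sa_mask); /* block no extra signals while the handler runs */
mysig_act.sa_flags = SA_SIGINFO;
mysig_act.sa_sigaction = mysig_handler; /* void mysig_handler(int, siginfo_t *, void *) */
if(sigaction (<signal number>,&mysig_act,(struct sigaction *)NULL)) {
    printf("Sigaction returned error = %d\n", errno);
    exit(1); /* nonzero status reports the failure */
}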
The sigaction system call takes three parameters:
- The signal number
- The pointer to the new sigaction structure
- The pointer to the old sigaction structure
The sigaction structure might be defined like so:
Listing 3. sigaction structure
struct sigaction {
void (*sa_handler)(int); /* func pointer */
void (*sa_sigaction)(int, siginfo_t *, void *); /*func pointer */
sigset_t sa_mask;
int sa_flags;
void (*sa_restorer)(void);
};
When sa_flags is set to SA_SIGINFO, the signal handler function should be set to sa_sigaction. SA_SIGINFO invokes the signal handler with three parameters:
- Signal number
- Signal information
- Snapshot of the hardware context
mysig_handler is the handler function that will be invoked at the time of signal delivery. mysig_act is the sigaction structure containing all this information.
In UNIX®, every signal has a unique signal number. As mentioned earlier, kill -l lists all the signals with their corresponding signal number.
The second parameter is the signal information structure, siginfo_t. The kernel populates this structure based on the signal generated. It can be used to get the sender's pid and uid, the fault address, and other information. The structure also provides an error number (si_errno) and a signal code (si_code). The header file that contains the structure definition is bits/siginfo.h.
The third parameter is the ucontext structure. This structure (short for user context) contains or points to various other structures such as mcontext_t, sigset_t, and so on. mcontext_t holds the values of the registers at the time of the fault, and these values are passed on to the process when the signal is delivered. The kernel maintains a context structure for every process in the system; it needs this information for effective context switching among processes.
The kernel provides only limited information back to the user program in the pt_regs and mcontext_t structures. These structures contain data that includes almost all registers, including general purpose registers (GPRs), floating-point registers (FPRs), VMX registers (if available), and special purpose registers (SPRs).
But remember, pt_regs is an architecture-specific structure. The header files that contain this information are sys/ucontext.h and asm/ptrace.h.
Listing 4. pt_regs structure definition <asm-ppc64/ptrace.h>
#define PPC_REG unsigned long
struct pt_regs {
PPC_REG gpr[32];
PPC_REG nip;
PPC_REG msr;
PPC_REG orig_gpr3; /* Used for restarting system calls */
PPC_REG ctr;
PPC_REG link;
PPC_REG xer;
PPC_REG ccr;
PPC_REG softe; /* Soft enabled/disabled */
PPC_REG trap; /* Reason for being here */
PPC_REG dar; /* Fault registers */
PPC_REG dsisr;
PPC_REG result; /* Result of a system call */
};
Important registers to look for when debugging through signals are the GPRs, instruction pointer (NIP), machine state register (MSR), trap, data address register (DAR), and so on. Not all of the registers are relevant for every signal, however. In the case of SIGILL, the DAR might not provide useful data, because this register is used to store the fault address in the case of SIGSEGV.
Now that you have some background on signals, let's look at how to use them. The following sample program uses the SIGTERM signal.
Listing 5. Program to handle SIGTERM
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <errno.h>
#include <ucontext.h>
/* SA_SIGINFO handlers receive the siginfo_t by pointer and the context as void * */
static void myhandler (int sn, siginfo_t *si, void *sc)
{
    /* printf is not async-signal-safe; it is used here only for demonstration */
    printf(" signal number = %d, signal errno = %d, signal code = %d\n",
           si->si_signo,si->si_errno,si->si_code);
    printf(" senders' pid = %d, sender's uid = %d, \n",
           (int)si->si_pid,(int)si->si_uid);
}
int main(void)
{
    struct sigaction s;
    sigemptyset(&s.sa_mask); /* block no additional signals in the handler */
    s.sa_flags = SA_SIGINFO;
    s.sa_sigaction = myhandler;
    if(sigaction (SIGTERM,&s,(struct sigaction *)NULL)) {
        printf("Sigaction returned error = %d\n", errno);
        exit(1);
    }
    while(1); /* spin until the process is killed */
    return 0;
}
The above sample program registers a signal handler for SIGTERM. In the handler, it prints the pid and uid of the sending process; because the handler simply returns, the signal is effectively ignored and execution continues. Here's the output:
Listing 6. Output of Listing 5
> ./fin &
[2] 7375
> ps -ef | grep 7375
maddy 7375 7063 90 16:51 pts/0 00:00:24 ./fin
maddy 7377 7063 0 16:52 pts/0 00:00:00 grep 7375
> kill 7375
signal number = 15, signal errno = 0, signal code = 0
senders' pid = 7063, sender's uid = 1001,
> kill -9 7375
> ps -ef | grep 7375
maddy 7379 7063 0 16:52 pts/0 00:00:00 grep 7375
[2]+ Killed ./fin
This signal handling data can be vital in some cases. Using this data, a process that receives a SIGTERM in the middle of a critical part of its code can finish that part before terminating itself. This can be achieved by setting a global flag in the handler code and checking the flag after the critical part completes. You can also save the sender's pid and print it in a dump file to figure out who sent the signal.
Let's look at a more serious case. Consider a SIGILL signal. SIGILL is generated for the execution of an illegal instruction. It is generated under a variety of conditions, such as an illegal opcode, illegal operand, privileged opcode, and so on.
This program will try to execute a privileged opcode:
Listing 7. Program to handle SIGILL
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <errno.h>
#include <ucontext.h>
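static void myhandler (int sn, siginfo_t *sip, void *ctx)
{
/* SA_SIGINFO handlers receive the siginfo_t by pointer and the context
   as void *; local copies keep the rest of the handler unchanged. */
siginfo_t si = *sip;
ucontext_t *sc = ctx;
int i,j;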
printf(" Signal number = %d, Signal errno = %d\n"
,si.si_signo,si.si_errno);
switch(si.si_code)
{
case 1: printf(" SI code = %d (Illegal opcode)\n",si.si_code);
break;
case 2: printf(" SI code = %d (Illegal operand)\n",si.si_code);
break;
case 3: printf(" SI code = %d (Illegal addressing mode)\n",
si.si_code);
break;
case 4: printf(" SI code = %d (Illegal trap)\n",si.si_code);
break;
case 5: printf(" SI code = %d (Privileged opcode)\n",si.si_code);
break;
case 6: printf(" SI code = %d (Privileged register)\n",si.si_code);
break;
case 7: printf(" SI code = %d (Coprocessor error)\n",si.si_code);
break;
case 8: printf(" SI code = %d (Internal stack error)\n",si.si_code);
break;
default: printf("SI code = %d (Unknown SI Code)\n",si.si_code);
break;
}
printf(" Machine State Register = %x \n",
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->msr));
printf(" Link register pointing to location = 0x%x, \
Opcode at the location = 0x%x \n",
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->link),
*(unsigned int *) \
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->link));
for(i=20,j=5;i>0;i-=4,j--)
printf(" Op-Code [nip - %d] = 0x%x at address = 0x%x \n"
,j,*(unsigned int *)(si.si_addr - i)
,(si.si_addr - i) );
printf(" Failed Op-code = 0x%x at address = 0x%x \n",
*(unsigned int*)(si.si_addr), (si.si_addr));
printf(" Op-Code [nip + 1] = 0x%x at address = 0x%x \n",
*(unsigned int *)(si.si_addr + 4), (si.si_addr + 4));
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->nip) += 4;
}
void my(void)
{
__asm__ volatile ("add 4,5,6 \n\t":);
__asm__ volatile ("add 7,8,9 \n\t":);
__asm__ volatile ("mfmsr 3 \n\t":);
__asm__ volatile ("add 4,5,6 \n\t":);
__asm__ volatile ("add 7,8,9 \n\t":);
}
int main(void)
{
struct sigaction s;
sigemptyset(&s.sa_mask); /* start with an empty handler mask */
s.sa_flags = SA_SIGINFO;
s.sa_sigaction = (void *)myhandler;
if(sigaction (SIGILL,&s,(struct sigaction *)NULL)) {
printf("Sigaction returned error = %d\n", errno);
exit(0);
}
my();
return 0;
}
Some instructions are not allowed to execute from the user space, such as those that try to access MSR or SRR0/SRR1 (save restore registers). To execute these instructions, you need to switch to the kernel context.
The program in Listing 7 tries to execute an instruction that moves the value of the MSR to a GPR. Reading the MSR is a privileged operation, so the expected result is a SIGILL. Here is the output:
Listing 8. Output of Listing 7
> ./mysigill
Signal number = 4, Signal errno = 0
SI code = 5 (Privileged opcode)
Machine State Register = 4d032
Link register pointing to location = 0x10000830, Opcode at the location = 0x38000000
Op-Code [nip - 5] = 0x9421ffe0 at address = 0x10000788
Op-Code [nip - 4] = 0x93e1001c at address = 0x1000078c
Op-Code [nip - 3] = 0x7c3f0b78 at address = 0x10000790
Op-Code [nip - 2] = 0x7c853214 at address = 0x10000794
Op-Code [nip - 1] = 0x7ce84a14 at address = 0x10000798
Failed Op-code = 0x7c6000a6 at address = 0x1000079c
Op-Code [nip + 1] = 0x7c853214 at address = 0x100007a0
As expected, the program has received a SIGILL (signal number 4) with an si code of 5, which is set when a privileged opcode is executed by a user-space program.
As shown in Listing 8, the program has dumped seven consecutive instructions, including the failed one. To uncover the failed instruction in the code, do an object dump of the executable using objdump, which lists the instructions generated by the compiler. (The manpage for objdump has more information about this tool.)
Listing 9. The objdump command
> objdump -S mysigill >> /tmp/mdmp
The /tmp/mdmp file will have the object dump of the executable mysigill. First, search for the opcode/instruction that has failed. In this case, the failed opcode is 7c6000a6.
Listing 10. Object dump file
<Search the output for the opcode "7c6000a6">
10000788 <my>:
10000788: 94 21 ff e0 stwu r1,-32(r1)
1000078c: 93 e1 00 1c stw r31,28(r1)
10000790: 7c 3f 0b 78 mr r31,r1
10000794: 7c 85 32 14 add r4,r5,r6
10000798: 7c e8 4a 14 add r7,r8,r9
1000079c: 7c 60 00 a6 mfmsr r3 <== Bingo!!!
100007a0: 7c 85 32 14 add r4,r5,r6
100007a4: 7c e8 4a 14 add r7,r8,r9
100007a8: 7c 03 03 78 mr r3,r0
100007ac: 81 61 00 00 lwz r11,0(r1)
100007b0: 83 eb ff fc lwz r31,-4(r11)
100007b4: 7d 61 5b 78 mr r1,r11
100007b8: 4e 80 00 20 blr
If the program has more than one occurrence of the opcode, try to find the sequence of opcodes printed by the handler code in the dump file. This lets you isolate the function in the program that contains the failing instruction. When the source is compiled with the -g option, the dump file will have the source line by line, each followed by the instructions that implement it.
Now let's explore a way to debug a failure generated by a signal that programmers see frequently. The SIGSEGV signal is generated under a variety of conditions, such as when a process tries to load from or store to an unallocated memory region or when a program tries to write to read-only memory. The sample program here is a typical example of a segmentation fault.
Listing 11. Program to handle SIGSEGV
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <ucontext.h>
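static void seghandler (int sn, siginfo_t *sip, void *ctx)
{
/* SA_SIGINFO handlers receive the siginfo_t by pointer and the context
   as void *; local copies keep the rest of the handler unchanged. */
siginfo_t si = *sip;
ucontext_t *sc = ctx;
unsigned int mnip;
int i;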
mnip=*(unsigned int *)(((struct pt_regs *) \
((&(sc->uc_mcontext))->regs))->nip);
printf(" Signal number = %d, Signal errno = %d\n",
si.si_signo,si.si_errno);
switch(si.si_code)
{
case 1: printf(" SI code = %d (Address not mapped to object)\n",
si.si_code);
break;
case 2: printf(" SI code = %d (Invalid permissions for \
mapped object)\n",si.si_code);
break;
default: printf("SI code = %d (Unknown SI Code)\n",si.si_code);
break;
}
printf(" Intruction pointer = %x \n",mnip);
printf(" Fault addr = 0x%x \n",si.si_addr);
printf(" dar = 0x%x \n",
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->dar));
printf(" trap = 0x%x \n",
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->trap));
printf(" Op-Code [nip - 4] = 0x%x at address = 0x%x \n",
*(unsigned int *)\
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->nip-4),
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->nip-4) );
printf(" Failed Op-code = 0x%x at address = 0x%x \n",
*(unsigned int *)\
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->nip),
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->nip));
printf(" Op-Code [nip + 1] = 0x%x at address = 0x%x \n",
*(unsigned int *) \
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->nip+4),
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->nip + 4));
printf("***GPR values are the time of fault*** \n");
for (i=0;i<11;i++)
printf(" Gpr[%d] = 0x%x \n",i, \
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->gpr[i]));
(((struct pt_regs *)((&(sc->uc_mcontext))->regs))->nip)+=4;
}
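int main(void)
{
struct sigaction m;
char *p = NULL, *q, arr[]="Ma"; /* p deliberately points to no valid memory */
q=arr;
sigemptyset(&m.sa_mask); /* start with an empty handler mask */
m.sa_flags = SA_SIGINFO;
m.sa_sigaction = (void *)seghandler;
sigaction (SIGSEGV,&m,(struct sigaction *)NULL);
*p++ = *q++; /* store through the invalid pointer: raises SIGSEGV */
return 0;
}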
This program tries to store to an unallocated memory location by copying a byte from arr through the pointer p, which does not point to valid memory. The expected result is a SIGSEGV signal, as you can see:
Listing 12. Output of Listing 11
> ./sigsegv
Signal number = 11, Signal errno = 0
SI code = 1 (Address not mapped to object)
Intruction pointer = 98080000
Fault addr = 0x0
dar = 0x0
trap = 0x300
Op-Code [nip - 4] = 0x88090000 at address = 0x10000760
Failed Op-code = 0x98080000 at address = 0x10000764
Op-Code [nip + 1] = 0x396b0001 at address = 0x10000768
***GPR values are the time of fault***
Gpr[0] = 0x4d
Gpr[1] = 0xffffe070
Gpr[2] = 0x4001ee20
Gpr[3] = 0x0
Gpr[4] = 0xffffdf30
Gpr[5] = 0x0
Gpr[6] = 0xffffe110
Gpr[7] = 0xffffe114
Gpr[8] = 0x0
Gpr[9] = 0xffffe120
Gpr[10] = 0x0
The sample program has also dumped the general purpose register values in this case. One way of debugging this failure is to do an object dump of the executable and save it in a file, then search for the failed instruction (in this case, the failed opcode is 98080000).
Listing 13. Object dump file
<Search output for the opcode "98080000">
10000744: 48 01 07 21 bl 10010e64 <__bss_start+0x48>
*p++ = *q++;
10000748: 38 df 00 a0 addi r6,r31,160
1000074c: 81 46 00 00 lwz r10,0(r6)
10000750: 38 ff 00 a4 addi r7,r31,164
10000754: 81 67 00 00 lwz r11,0(r7)
10000758: 7d 48 53 78 mr r8,r10
1000075c: 7d 69 5b 78 mr r9,r11
10000760: 88 09 00 00 lbz r0,0(r9)
10000764: 98 08 00 00 stb r0,0(r8) <==Failed instruction
10000768: 39 6b 00 01 addi r11,r11,1
1000076c: 91 67 00 00 stw r11,0(r7)
10000770: 39 4a 00 01 addi r10,r10,1
10000774: 91 46 00 00 stw r10,0(r6)
return 0;
10000778: 38 00 00 00 li r0,0
}
Since the program has been compiled with the -g option, the object dump contains the source listing. The failed instruction here is stb. The process tried to store a byte from the register r0 to a memory location pointed to by the register r8, but the register r8 has a value of 0x0 -- which can be seen from the gpr value dump by the handler code and which is the cause of this signal generation.
Madhavan Srinivasan has a Bachelors of Engineering degree in Electrical and Electronics from Madras University, India. He has been working for IBM Global Services (Software Labs), India, since November 2003. As a member of the PowerPC Tools Development Team at IBM, he concentrates on designing and developing diagnostic tools for PowerPC server processors on Linux and AIX. Other areas of interest are PowerPC architecture and operating system internals.





