
Signals as a Linux debugging tool

Intelligent signal handling finds bugs faster

Madhavan Srinivasan (masriniv@in.ibm.com), Developer, PowerPC Tools Development, IBM
Madhavan Srinivasan has a Bachelors of Engineering degree in Electrical and Electronics from Madras University, India. He has been working for IBM Global Services (Software Labs), India, since November 2003. As a member of the PowerPC Tools Development Team at IBM, he concentrates on designing and developing diagnostic tools for PowerPC server processors on Linux and AIX. Other areas of interest are PowerPC architecture and operating system internals.

Summary:  By focusing on the analysis of data captured using signal handlers, you can speed up the most time-consuming part of debugging: finding the bug. This article provides background on Linux® signals, with examples tested on PPC Linux, and then shows how to design your handlers to output information that lets you quickly home in on the failing portions of code.

Date:  29 Nov 2005
Level:  Introductory

Signals are software interrupts that notify an executing program, or process, that an asynchronous event has occurred. Signals are generated for a variety of reasons, such as timer expiration, and most hardware traps (illegal instructions, accesses to invalid addresses, and so on) are converted into signals.

Signals can be generated by the process itself, or they can be sent from one process to another. A variety of signals can be generated or delivered, and they have many uses for programmers. (To see a complete list of signals in the Linux® environment, use the command kill -l.)

While the principles described in this article are general, the sample programs were compiled with gcc version 3.3.3 and the SUSE Linux Enterprise Server 9 (PPC version) operating system.

Signals as a debugging tool

When debugging, perhaps 90 percent of your time is spent just finding the problem, and signals can help reduce that time. A signal carries information about the event that caused it, and its handler has access to the state of the process at the moment it was interrupted. Applications can use this information to decide on a course of action, all within their own execution context.

You can ignore a signal by setting its disposition to SIG_IGN; the kernel then discards the signal instead of delivering it to the process. Listing 1 shows how to ignore SIGINT. (Because the process ignores SIGINT, which is what Ctrl-C sends, you'll need to use Ctrl-Z to stop the process or Ctrl-\ to quit it.)


Listing 1. Sample program to ignore a SIGINT signal

#include <stdio.h>
#include <signal.h>

int main(void)
{
   /* Discard SIGINT (sent by Ctrl-C) instead of terminating. */
   signal(SIGINT, SIG_IGN);
   while (1)
      printf("You can't kill me with SIGINT anymore, dude\n");
   return 0;
}

When a signal is delivered to a process, two types of actions are possible:

  • Default actions, where the kernel handles the signal according to its default disposition. Depending on the signal, the default is to terminate the process (SIGTERM, for example), terminate it and dump core (SIGSEGV, SIGILL), stop it (SIGTSTP), continue a stopped process (SIGCONT), or ignore the signal (SIGCHLD).
  • User-defined actions, where the signal is handled by a user-defined signal handler.

Let's focus on user-space signal handlers.


User-space signal handlers

A signal handler is a function that is executed when a signal is delivered. It is part of the user-space program code and runs in the user-space context. The handler determines what the program does in response to the signal. A handler can even do nothing at all, which effectively ignores the signal.
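A handler can interrupt the program between almost any two instructions, so POSIX allows it to call only async-signal-safe functions. printf(), which the listings in this article use for brevity, is not one of them, but write() is. The following is a minimal sketch, assuming a POSIX system, of a hex dumper built only on write(). The helper names format_hex and safe_write_hex are hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: format value as "0x..." into buf without stdio.
   buf must hold at least 2 + 2 * sizeof(unsigned long) + 1 bytes.
   Returns the length written, not counting the terminating NUL. */
static size_t format_hex(unsigned long value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(unsigned long)];
    size_t n = 0, len = 0;

    do {                                /* emit digits, least significant first */
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)                       /* copy them back in reading order */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* Hypothetical helper: write "<label>0x<value>\n" to fd using write() only. */
static void safe_write_hex(int fd, const char *label, unsigned long value)
{
    char buf[2 + 2 * sizeof(unsigned long) + 1];
    size_t label_len = 0, len = format_hex(value, buf);
    ssize_t rc;

    while (label[label_len] != '\0')    /* strlen() done by hand */
        label_len++;
    rc = write(fd, label, label_len);
    rc = write(fd, buf, len);
    rc = write(fd, "\n", 1);
    (void)rc;                           /* nothing useful to do on failure here */
}
```

Inside a handler you could then call, for example, safe_write_hex(STDERR_FILENO, "fault addr = ", (unsigned long)si->si_addr).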

A process is not allowed to install handlers for every signal. In particular, SIGKILL and SIGSTOP can be neither caught nor ignored. If a process goes out of control, someone (the user, an administrator, or at least the kernel) must be able to kill or stop it. If a process could register handlers for these signals that simply ignored them, there would be virtually nothing to stop it short of a hard reboot.

Listing 2 shows one way to register a signal handler:


Listing 2. Registering a signal handler

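struct sigaction mysig_act;

memset(&mysig_act, 0, sizeof(mysig_act));   /* clear unused fields */
sigemptyset(&mysig_act.sa_mask);            /* block no extra signals in the handler */
mysig_act.sa_flags = SA_SIGINFO;            /* use the three-argument handler */
mysig_act.sa_sigaction = mysig_handler;     /* void mysig_handler(int, siginfo_t *, void *) */
if (sigaction(<signal number>, &mysig_act, NULL) == -1) {
   perror("sigaction");
   exit(EXIT_FAILURE);
}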

The sigaction system call takes three parameters:

  • The signal number
  • A pointer to the new sigaction structure
  • A pointer to a sigaction structure that receives the previous action (or NULL if you don't need it)

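The sigaction structure is defined roughly as follows. (On Linux, sa_handler and sa_sigaction may share storage in a union, so set only one of them.)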


Listing 3. sigaction structure

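struct sigaction {
   void (*sa_handler)(int);                          /* simple handler, or SIG_DFL / SIG_IGN */
   void (*sa_sigaction)(int, siginfo_t *, void *);   /* handler used when SA_SIGINFO is set */
   sigset_t sa_mask;                                 /* signals blocked while the handler runs */
   int sa_flags;                                     /* SA_SIGINFO, SA_RESTART, ... */
   void (*sa_restorer)(void);                        /* obsolete; not for application use */
};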

When sa_flags includes SA_SIGINFO, the handler must be stored in sa_sigaction (not sa_handler), and it is invoked with three parameters:

  • Signal number
  • Signal information
  • Snapshot of the hardware context

mysig_handler is the handler function that will be invoked at the time of signal delivery. mysig_act is the sigaction structure containing all this information.

The first parameter is the signal number. In UNIX®, every signal has a unique signal number; as mentioned earlier, kill -l lists all the signals with their corresponding numbers.

The second parameter is a pointer to the signal information structure, siginfo_t, which the kernel fills in according to the signal generated. You can use it to get the sender's pid and uid, the fault address, and other information. It also carries an error number (si_errno) and a code (si_code) that describes why the signal was generated. The structure is defined in bits/siginfo.h, which applications pick up by including signal.h.

The third parameter is a pointer to the ucontext structure (short for user context). It is passed as void * and normally cast to ucontext_t *. This structure contains other structures such as mcontext_t and sigset_t. mcontext_t holds the machine context: the register values of the process at the moment the signal interrupted it. The sigset_t is the signal mask that was in effect at that moment. The kernel already maintains this context for every process, because it needs it to switch between processes. When it delivers a signal, it copies the context onto the process's stack so that the handler can examine it.

The kernel does not pass back everything it knows, but the pt_regs and mcontext_t structures together cover almost all of the user-visible registers: general purpose registers (GPRs), floating-point registers (FPRs), VMX registers (if available), and special purpose registers (SPRs).

Keep in mind that pt_regs is an architecture-specific structure. The relevant definitions are in the header files sys/ucontext.h and asm/ptrace.h.


Listing 4. pt_regs structure definition <asm-ppc64/ptrace.h>

#define PPC_REG unsigned long
struct pt_regs {
        PPC_REG gpr[32];
        PPC_REG nip;
        PPC_REG msr;
        PPC_REG orig_gpr3;      /* Used for restarting system calls */
        PPC_REG ctr;
        PPC_REG link;
        PPC_REG xer;
        PPC_REG ccr;
        PPC_REG softe;          /* Soft enabled/disabled */
        PPC_REG trap;           /* Reason for being here */
        PPC_REG dar;            /* Fault registers */
        PPC_REG dsisr;
        PPC_REG result;         /* Result of a system call */
};

The registers most useful when debugging through signals are the GPRs, the instruction pointer (NIP), the machine state register (MSR), trap (the exception vector that was taken), and the data address register (DAR). Not every register is meaningful for every signal. For a SIGSEGV, for example, the DAR holds the faulting data address, but for a SIGILL it probably contains nothing useful.
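Reading the saved instruction pointer from the context is the core of this technique, but where it is stored depends on the architecture. The following sketch hides that behind a hypothetical context_pc() helper. The PowerPC branch uses the same uc_mcontext.regs access as the listings in this article and is untested here. The x86-64 branch (REG_RIP, which glibc exposes only with _GNU_SOURCE) and the AArch64 branch are assumptions about glibc's layout, added for comparison. Other targets simply return 0.

```c
#define _GNU_SOURCE               /* exposes REG_RIP on x86-64 glibc */
#include <signal.h>
#include <string.h>
#include <ucontext.h>
#if defined(__powerpc__) || defined(__powerpc64__)
#include <asm/ptrace.h>           /* struct pt_regs on PowerPC */
#endif

/* Hypothetical helper: program counter saved in the signal context. */
static unsigned long context_pc(const ucontext_t *uc)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    return uc->uc_mcontext.regs->nip;
#elif defined(__x86_64__) && defined(REG_RIP)
    return (unsigned long)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__aarch64__)
    return (unsigned long)uc->uc_mcontext.pc;
#else
    (void)uc;
    return 0;                     /* layout unknown for this target */
#endif
}

static volatile sig_atomic_t handled;
static volatile unsigned long saved_pc;

static void pc_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)si;
    saved_pc = context_pc((const ucontext_t *)ctx);
    handled = 1;
}

/* Hypothetical helper: install pc_handler for sig with SA_SIGINFO. */
static int install_pc_probe(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = pc_handler;
    return sigaction(sig, &sa, NULL);
}
```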

Now that you have some background on signals, let's look at how to use them. The following sample program uses the SIGTERM signal.


Listing 5. Program to handle SIGTERM

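#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>

/* With SA_SIGINFO, the kernel passes a pointer to siginfo_t and a pointer
   to the interrupted context (as void *). printf() is not
   async-signal-safe; it is used here only to keep the example short. */
static void myhandler(int sn, siginfo_t *si, void *ctx)
{
   (void)sn;
   (void)ctx;
   printf(" signal number = %d, signal errno = %d, signal code = %d\n",
          si->si_signo, si->si_errno, si->si_code);
   printf(" sender's pid = %d, sender's uid = %d, \n",
          (int)si->si_pid, (int)si->si_uid);
}

int main(void)
{
   struct sigaction s;

   memset(&s, 0, sizeof(s));
   sigemptyset(&s.sa_mask);
   s.sa_flags = SA_SIGINFO;
   s.sa_sigaction = myhandler;
   if (sigaction(SIGTERM, &s, NULL)) {
      printf("Sigaction returned error = %d\n", errno);
      exit(1);
   }
   while (1)
      ;   /* spin until killed; SIGTERM is now caught, not fatal */
   return 0;
}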

The program above registers a handler for SIGTERM. The handler prints the signal number, errno value, and code, along with the pid and uid of the sending process, and then returns. The signal is therefore effectively ignored, and execution continues. Here's the output:


Listing 6. Output of Listing 5

> ./fin &
[2] 7375
> ps -ef | grep 7375
maddy     7375  7063 90 16:51 pts/0    00:00:24 ./fin
maddy     7377  7063  0 16:52 pts/0    00:00:00 grep 7375
> kill 7375
 signal number = 15, signal errno = 0, signal code = 0
 sender's pid = 7063, sender's uid = 1001,
> kill -9 7375
> ps -ef | grep 7375
maddy     7379  7063  0 16:52 pts/0    00:00:00 grep 7375
[2]+  Killed                  ./fin

This signal-handling data can be vital in some cases. Suppose a SIGTERM arrives while the process is in the middle of a critical section of code. Instead of terminating immediately, the handler can set a global flag. The process checks the flag after the critical section completes and then terminates cleanly. The handler can also save the sender's pid so that the process can write it to a dump file, which tells you who sent the signal.

Let's look at a more serious case. Consider a SIGILL signal. SIGILL is generated for the execution of an illegal instruction. It is generated under a variety of conditions, such as an illegal opcode, illegal operand, privileged opcode, and so on.

This program will try to execute a privileged opcode:


Listing 7. Program to handle SIGILL

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <ucontext.h>
#include <asm/ptrace.h>   /* struct pt_regs (PowerPC) */

/* PowerPC-specific: the saved registers are reached through
   uc_mcontext.regs, a pointer to struct pt_regs. */
#define REGS(uc) ((uc)->uc_mcontext.regs)

static void myhandler(int sn, siginfo_t *si, void *ctx)
{
   ucontext_t *sc = ctx;
   unsigned char *addr = si->si_addr;   /* address of the failing instruction */
   int i, j;

   (void)sn;
   printf(" Signal number = %d, Signal errno = %d\n",
          si->si_signo, si->si_errno);
   switch (si->si_code) {
   case ILL_ILLOPC: printf(" SI code = %d (Illegal opcode)\n", si->si_code); break;
   case ILL_ILLOPN: printf(" SI code = %d (Illegal operand)\n", si->si_code); break;
   case ILL_ILLADR: printf(" SI code = %d (Illegal addressing mode)\n", si->si_code); break;
   case ILL_ILLTRP: printf(" SI code = %d (Illegal trap)\n", si->si_code); break;
   case ILL_PRVOPC: printf(" SI code = %d (Privileged opcode)\n", si->si_code); break;
   case ILL_PRVREG: printf(" SI code = %d (Privileged register)\n", si->si_code); break;
   case ILL_COPROC: printf(" SI code = %d (Coprocessor error)\n", si->si_code); break;
   case ILL_BADSTK: printf(" SI code = %d (Internal stack error)\n", si->si_code); break;
   default:         printf("SI code = %d (Unknown SI Code)\n", si->si_code); break;
   }

   printf(" Machine State Register = %lx \n", (unsigned long)REGS(sc)->msr);
   printf(" Link register pointing to location = 0x%lx, "
          "Opcode at the location = 0x%x \n",
          (unsigned long)REGS(sc)->link,
          *(unsigned int *)REGS(sc)->link);
   /* Dump the five instructions before the failing one ... */
   for (i = 20, j = 5; i > 0; i -= 4, j--)
      printf(" Op-Code [nip - %d] = 0x%x at address = 0x%lx \n",
             j, *(unsigned int *)(addr - i), (unsigned long)(addr - i));
   /* ... the failing one, and the one after it. */
   printf(" Failed Op-code    = 0x%x at address = 0x%lx \n",
          *(unsigned int *)addr, (unsigned long)addr);
   printf(" Op-Code [nip + 1] = 0x%x at address = 0x%lx \n",
          *(unsigned int *)(addr + 4), (unsigned long)(addr + 4));

   /* Every PowerPC instruction is 4 bytes long: skip the failing one so
      that the process continues instead of trapping again. */
   REGS(sc)->nip += 4;
}

void my(void)
{
   __asm__ volatile ("add 4,5,6" ::: "r4");
   __asm__ volatile ("add 7,8,9" ::: "r7");
   __asm__ volatile ("mfmsr 3"   ::: "r3");   /* privileged in user mode: SIGILL */
   __asm__ volatile ("add 4,5,6" ::: "r4");
   __asm__ volatile ("add 7,8,9" ::: "r7");
}

int main(void)
{
   struct sigaction s;

   memset(&s, 0, sizeof(s));
   sigemptyset(&s.sa_mask);
   s.sa_flags = SA_SIGINFO;
   s.sa_sigaction = myhandler;
   if (sigaction(SIGILL, &s, NULL)) {
      printf("Sigaction returned error = %d\n", errno);
      exit(1);
   }
   my();
   return 0;
}

Some instructions, such as those that access the MSR or the save/restore registers SRR0 and SRR1, are privileged. They can be executed only in supervisor (kernel) mode, not from user space.

The program in Listing 7 tries to execute mfmsr, an instruction that copies the value of the MSR into a GPR. Reading the MSR is a privileged operation, so the expected result is a SIGILL. Here is the output:


Listing 8. Output of Listing 7

> ./mysigill
 Signal number = 4, Signal errno = 0
 SI code = 5 (Privileged opcode)
 Machine State Register = 4d032
 Link register pointing to location = 0x10000830, Opcode at the location = 0x38000000
 Op-Code [nip - 5] = 0x9421ffe0 at address = 0x10000788
 Op-Code [nip - 4] = 0x93e1001c at address = 0x1000078c
 Op-Code [nip - 3] = 0x7c3f0b78 at address = 0x10000790
 Op-Code [nip - 2] = 0x7c853214 at address = 0x10000794
 Op-Code [nip - 1] = 0x7ce84a14 at address = 0x10000798
 Failed Op-code    = 0x7c6000a6 at address = 0x1000079c
 Op-Code [nip + 1] = 0x7c853214 at address = 0x100007a0

As expected, the program has received a SIGILL (signal number 4) with an si code of 5, which is set when a privileged opcode is executed by a user-space program.

As shown in Listing 8, the handler has dumped seven consecutive instructions: the five before the failing one, the failing one itself, and the one after it. To locate the failed instruction in the code, disassemble the executable with objdump, which lists the instructions generated by the compiler. (The objdump man page has more information about this tool.)


Listing 9. The objdump command

> objdump -S mysigill > /tmp/mdmp

The /tmp/mdmp file will have the object dump of the executable mysigill. First, search for the opcode/instruction that has failed. In this case, the failed opcode is 7c6000a6.
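objdump prints each instruction word as separate bytes (7c 60 00 a6 in Listing 10), so you need to split the opcode reported by the handler before you search the dump. The following sketch performs that conversion. opcode_to_objdump is a hypothetical helper. It assumes big-endian byte order, as on the PPC system used here; on a little-endian target, objdump typically shows the bytes in reverse order.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: render a 32-bit opcode the way objdump prints it
   on big-endian PowerPC, e.g. 0x7c6000a6 -> "7c 60 00 a6".
   out must hold at least 12 bytes (11 characters plus the NUL). */
static void opcode_to_objdump(uint32_t op, char out[12])
{
    snprintf(out, 12, "%02x %02x %02x %02x",
             (unsigned)(op >> 24) & 0xffu, (unsigned)(op >> 16) & 0xffu,
             (unsigned)(op >> 8) & 0xffu, (unsigned)op & 0xffu);
}
```

You can then search the dump for the resulting string, for example with grep -n "7c 60 00 a6" /tmp/mdmp.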


Listing 10. Object dump file

<Search the output for the opcode "7c6000a6">

10000788 <my>:
10000788:       94 21 ff e0     stwu    r1,-32(r1)
1000078c:       93 e1 00 1c     stw     r31,28(r1)
10000790:       7c 3f 0b 78     mr      r31,r1
10000794:       7c 85 32 14     add     r4,r5,r6
10000798:       7c e8 4a 14     add     r7,r8,r9
1000079c:       7c 60 00 a6     mfmsr   r3               <== Bingo!!!
100007a0:       7c 85 32 14     add     r4,r5,r6
100007a4:       7c e8 4a 14     add     r7,r8,r9
100007a8:       7c 03 03 78     mr      r3,r0
100007ac:       81 61 00 00     lwz     r11,0(r1)
100007b0:       83 eb ff fc     lwz     r31,-4(r11)
100007b4:       7d 61 5b 78     mr      r1,r11
100007b8:       4e 80 00 20     blr

If the opcode occurs more than once in the program, search the dump for the whole sequence of instructions printed by the handler instead. This isolates the function that contains the failing instruction. When the source is compiled with the -g option, objdump -S interleaves each line of source with the instructions generated for it.
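Suppose you have already extracted the instruction words from the dump, for example from the byte columns of objdump's output. Finding the run printed by the handler is then a simple sequence search over 32-bit words. The following sketch uses a hypothetical find_opcode_run() helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the first position where the m words of
   seq appear consecutively in text (n words), or -1 if they never do. */
static long find_opcode_run(const uint32_t *text, size_t n,
                            const uint32_t *seq, size_t m)
{
    if (m == 0 || m > n)
        return -1;
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m && text[i + j] == seq[j])
            j++;
        if (j == m)               /* every word of the run matched */
            return (long)i;
    }
    return -1;
}
```

For example, take the words of my() from Listing 10 and search for the run 7ce84a14, 7c6000a6, 7c853214. The function returns index 4 (address 10000798), and the failing mfmsr is the second word of that run.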

Now let's explore how to debug a failure caused by a signal that programmers see frequently. The SIGSEGV signal is generated under a variety of conditions, such as when a process tries to load from or store to an unallocated memory region, or when it tries to write to read-only memory. The sample program here is a typical example of a segmentation fault.


Listing 11. Program to handle SIGSEGV

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <ucontext.h>
#include <asm/ptrace.h>   /* struct pt_regs (PowerPC) */

/* PowerPC-specific access to the registers saved at the time of the fault. */
#define REGS(uc) ((uc)->uc_mcontext.regs)

static void seghandler(int sn, siginfo_t *si, void *ctx)
{
   ucontext_t *sc = ctx;
   unsigned long nip = REGS(sc)->nip;         /* address of the faulting instruction */
   unsigned int mnip = *(unsigned int *)nip;  /* the faulting opcode itself */
   int i;

   (void)sn;
   printf(" Signal number = %d, Signal errno = %d\n",
          si->si_signo, si->si_errno);
   switch (si->si_code) {
   case SEGV_MAPERR:
      printf(" SI code = %d (Address not mapped to object)\n", si->si_code);
      break;
   case SEGV_ACCERR:
      printf(" SI code = %d (Invalid permissions for mapped object)\n",
             si->si_code);
      break;
   default:
      printf("SI code = %d (Unknown SI Code)\n", si->si_code);
      break;
   }
   printf(" Instruction at NIP = %x \n", mnip);
   printf(" Fault addr = 0x%lx \n", (unsigned long)si->si_addr);
   printf(" dar = 0x%lx \n", (unsigned long)REGS(sc)->dar);
   printf(" trap = 0x%lx \n", (unsigned long)REGS(sc)->trap);
   printf(" Op-Code [nip - 4] = 0x%x at address = 0x%lx \n",
          *(unsigned int *)(nip - 4), nip - 4);
   printf(" Failed Op-code    = 0x%x at address = 0x%lx \n", mnip, nip);
   printf(" Op-Code [nip + 1] = 0x%x at address = 0x%lx \n",
          *(unsigned int *)(nip + 4), nip + 4);
   printf("***GPR values at the time of fault*** \n");
   for (i = 0; i < 11; i++)
      printf(" Gpr[%d] = 0x%lx \n", i, (unsigned long)REGS(sc)->gpr[i]);

   /* Skip the faulting 4-byte store so that execution can continue. */
   REGS(sc)->nip += 4;
}

int main(void)
{
   struct sigaction m;
   char *p = NULL;           /* deliberately invalid destination */
   char *q, arr[] = "Ma";

   q = arr;
   memset(&m, 0, sizeof(m));
   sigemptyset(&m.sa_mask);
   m.sa_flags = SA_SIGINFO;
   m.sa_sigaction = seghandler;
   if (sigaction(SIGSEGV, &m, NULL)) {
      printf("Sigaction returned error = %d\n", errno);
      exit(1);
   }
   *p++ = *q++;              /* store through an invalid pointer: SIGSEGV */
   return 0;
}

This program copies the first character of arr through the pointer p, which does not point to any allocated memory. The store therefore fails, and the expected result is a SIGSEGV signal, as you can see:


Listing 12. Output of Listing 11

> ./sigsegv
 Signal number = 11, Signal errno = 0
 SI code = 1 (Address not mapped to object)
 Instruction at NIP = 98080000
 Fault addr = 0x0
 dar = 0x0
 trap = 0x300
 Op-Code [nip - 4] = 0x88090000 at address = 0x10000760
 Failed Op-code    = 0x98080000 at address = 0x10000764
 Op-Code [nip + 1] = 0x396b0001 at address = 0x10000768
***GPR values at the time of fault***
 Gpr[0] = 0x4d
 Gpr[1] = 0xffffe070
 Gpr[2] = 0x4001ee20
 Gpr[3] = 0x0
 Gpr[4] = 0xffffdf30
 Gpr[5] = 0x0
 Gpr[6] = 0xffffe110
 Gpr[7] = 0xffffe114
 Gpr[8] = 0x0
 Gpr[9] = 0xffffe120
 Gpr[10] = 0x0

The sample program has also dumped the general purpose register values in this case. One way of debugging this failure is to do an object dump of the executable and save it in a file, then search for the failed instruction (in this case, the failed opcode is 98080000).


Listing 13. Object dump file

<Search output for the opcode "98080000">

10000744:       48 01 07 21     bl      10010e64 <__bss_start+0x48>
 *p++ = *q++;
10000748:       38 df 00 a0     addi    r6,r31,160
1000074c:       81 46 00 00     lwz     r10,0(r6)
10000750:       38 ff 00 a4     addi    r7,r31,164
10000754:       81 67 00 00     lwz     r11,0(r7)
10000758:       7d 48 53 78     mr      r8,r10
1000075c:       7d 69 5b 78     mr      r9,r11
10000760:       88 09 00 00     lbz     r0,0(r9)
10000764:       98 08 00 00     stb     r0,0(r8)       <==Failed instruction
10000768:       39 6b 00 01     addi    r11,r11,1
1000076c:       91 67 00 00     stw     r11,0(r7)
10000770:       39 4a 00 01     addi    r10,r10,1
10000774:       91 46 00 00     stw     r10,0(r6)
 return 0;
10000778:       38 00 00 00     li      r0,0
}

Because the program was compiled with the -g option, the object dump interleaves the source listing with the instructions. The failed instruction here is stb. The process tried to store the byte in register r0 to the memory location held in register r8. The GPR dump printed by the handler shows that r8 contains 0x0, which matches the fault address of 0x0. That store to address 0 is what caused the signal.

