Skip to main content

skip to main content

developerWorks  >  AIX and UNIX  >

Solutions for tracing UNIX applications

developerWorks
Go to the previous pagePage 3 of 10 Go to the next page

Document options
PDF format - Fits A4 and Letter

PDF - Fits A4 and Letter
66 KB (24 pages)

Get Adobe® Reader®

Sample code


My developerWorks needs you!

Connect to your technical community


Rate this tutorial

Help us improve this content


Using truss and strace

The truss tool is available on Solaris and AIX and provides the ability to trace system calls and signals within an application. The strace tool, available on Linux® provides similar functionality. On different systems, there are also tools that provide similar information, including ktrace (FreeBSD) and trace.

Overview of truss/strace

The truss and strace tools both provide similar levels of information, although the command-line options may be slightly different in each case. With both tools, the standard way to use them is to prefix the tool to the command you would ordinarily execute.

For example, Listing 3 shows the output of truss on the ageindays program mentioned earlier in this tutorial.


Listing 3. Output of truss

$ truss ./ageindays 24/1/1980 26/3/2009
execve("ageindays", 0x08047BBC, 0x08047BCC)  argc = 3
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) 
   = 0xFEFB0000
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
getcwd("/root", 1013)                           = 0
resolvepath("/root/ageindays", "/root/ageindays", 1023) = 15
xstat(2, "/root/ageindays", 0x08047880)         = 0
open("/var/ld/ld.config", O_RDONLY)             = 3
fxstat(2, 3, 0x08047760)                        = 0
mmap(0x00000000, 144, PROT_READ, MAP_SHARED, 3, 0) = 0xFEFA0000
close(3)                                        = 0
sysconfig(_CONFIG_PAGESIZE)                     = 4096
xstat(2, "/usr/lib/libc.so.1", 0x08046FA0)      = 0
resolvepath("/usr/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/usr/lib/libc.so.1", O_RDONLY)            = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) 
    = 0xFEF90000
mmap(0x00010000, 1413120, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0)
    = 0xFEE30000
mmap(0xFEE30000, 1302809, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) 
    = 0xFEE30000
mmap(0xFEF7F000, 30862, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|
    MAP_INITDATA, 3, 1306624) = 0xFEF7F000
mmap(0xFEF87000, 4776, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 
   -1, 0) = 0xFEF87000
munmap(0xFEF6F000, 65536)                       = 0
memcntl(0xFEE30000, 187632, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3)                                        = 0
mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, 
   -1, 0)
    = 0xFEE20000
munmap(0xFEF90000, 32768)                       = 0
getcontext(0x080475D0)
getrlimit(RLIMIT_STACK, 0x080475C8)             = 0
getpid()                                        = 15691 [15690]
lwp_private(0, 1, 0xFEE22A00)                   = 0x000001C3
setustack(0xFEE22A60)
sysi86(SI86FPSTART, 0xFEF879BC, 0x0000133F, 0x00001F80) = 0x00000001 
ioctl(1, TCGETA, 0x08046C20)                    = 0
fstat64(1, 0x08046B80)                          = 0
You have been alive 10654 days
write(1, " Y o u   h a v e   b e e".., 31)      = 31
You were born on 24/1/1980 which is a Thursday
write(1, " Y o u   w e r e   b o r".., 47)      = 47
_exit(134511508)

By comparison, Listing 4 shows the output from strace on Linux.


Listing 4. Output from strace

$ strace ./ageindays 24/1/1980 26/3/2009
execve("./ageindays", ["./ageindays", "24/1/1980", "26/3/2009"], 
[/* 50 vars */]) = 0
brk(0)                                  = 0x602000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
    = 0x7f47db185000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
    = 0x7f47db184000
access("/etc/ld.so.preload", R_OK)      
    = -1 ENOENT (No such file or directory)
open("/usr/lib/tls/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/lib/tls/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls", 0x7fffe31858f0)    
    = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/lib/libc.so.6", O_RDONLY)    
    = -1 ENOENT (No such file or directory)
stat("/usr/lib", {st_mode=S_IFDIR|0755, st_size=53248, ...}) = 0
open("/usr/local/lib/tls/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/tls/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/lib/tls/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/tls", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/lib/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/usr/local/qt/lib/tls/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/qt/lib/tls/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/qt/lib/tls/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/qt/lib/tls", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/qt/lib/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/qt/lib/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/qt/lib/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/qt/lib", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/X11R6/lib/tls/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/X11R6/lib/tls/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/X11R6/lib/tls/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/X11R6/lib/tls", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/X11R6/lib/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/X11R6/lib/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/X11R6/lib/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/X11R6/lib", 0x7fffe31858f0)  
    = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=85050, ...}) = 0
mmap(NULL, 85050, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f47db16f000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\345\1\0\0\0\0\0@"...,
    832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1330352, ...}) = 0
mmap(NULL, 3437208, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) 
    = 0x7f47dac24000
mprotect(0x7f47dad63000, 2093056, PROT_NONE) = 0
mmap(0x7f47daf62000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE,
    3, 0x13e000) = 0x7f47daf62000
mmap(0x7f47daf67000, 17048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 
   -1, 0) = 0x7f47daf67000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
    = 0x7f47db16e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
    = 0x7f47db16d000
arch_prctl(ARCH_SET_FS, 0x7f47db16d6f0) = 0
mprotect(0x7f47daf62000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f47db186000, 4096, PROT_READ) = 0
munmap(0x7f47db16f000, 85050)           = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
    = 0x7f47db183000
write(1, "You have been alive 10654 days\n"..., 31You have been alive 10654 days
) = 31
write(1, "You were born on 24/1/1980 which "..., 47You were born on 24/1/1980 
   which is a Thursday
) = 47
exit_group(0)  

In both cases, each line of output equates to the application executing one function call and shows the arguments to the function and the return value of the function call. Unlike the debugging example, each function call listed is a function within the system or system libraries and therefore represents a lower-level interface to the functions called. For example, opening a file within an application may use the fpopen() function within C or C++, but this is in fact a wrapper to the more primitive open() function.

You don't need to understand the specifics of each function to get a general idea of what the applications are doing. Many of the lines of the output are related to the initialization that the operating system applies in order to load and execute the program. The fundamentals of the two traces are the same:

  • The execve() function is called to start a new program.
  • Libraries for the program are loaded. With the Solaries output, the libraries first look for the use of resolvepath() and then open using open(). For Linux, stat() is used to check whether the library exists, and then open() is used to open it.
  • Some memory is reserved and allocated for the process. Some of this will be the stack space reserved for the application, some will be used to hold the program, and others to hold the variables used by the program.
  • Finally, the program is executed, and the write() function is called to print out the age and day of the birth information.

If you perform a trace and want to understand the specifics of each step, use the man command to read the manual page for each function.



Back to top


Identifying application startup issues

A typical problem when starting an application is that the program fails to initialize properly, but terminates with an incomplete or misleading error message. Running a trace on the application can often highlight the problem. For example, Listing 5 shows how a test application has failed.


Listing 5. Application fails

$ ./errnoacc 
ERROR: Application failed to initialize

The error message provides no specific information about why the application failed to start. In this case, the problem has been deliberately introduced, but the same issue could occur with any command or application you were using, and the error message could be equally misleading or sometimes non-existent.

Running a trace on the application might give us some more clues (see Listing 6).


Listing 6. Running a trace

$ truss ./errnoacc
execve("errnoacc", 0x08047B20, 0x08047B28)  argc = 1
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) 
    = 0xFEFB0000
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
getcwd("/export/home/mc", 1014)                 = 0
resolvepath("/export/home/mc/errnoacc", "/export/home/mc/errnoacc", 1023) = 24
xstat(2, "/export/home/mc/errnoacc", 0x080477E4) = 0
open("/var/ld/ld.config", O_RDONLY)             = 3
fxstat(2, 3, 0x080476C4)                        = 0
mmap(0x00000000, 144, PROT_READ, MAP_SHARED, 3, 0) = 0xFEFA0000
close(3)                                        = 0
sysconfig(_CONFIG_PAGESIZE)                     = 4096
xstat(2, "/usr/lib/libc.so.1", 0x08046F04)      = 0
resolvepath("/usr/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/usr/lib/libc.so.1", O_RDONLY)            = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEF90000
mmap(0x00010000, 1413120, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, 
    -1, 0) = 0xFEE30000
mmap(0xFEE30000, 1302809, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) 
    = 0xFEE30000
mmap(0xFEF7F000, 30862, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|
    MAP_INITDATA, 3, 1306624) = 0xFEF7F000
mmap(0xFEF87000, 4776, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 
    -1, 0) = 0xFEF87000
munmap(0xFEF6F000, 65536)                       = 0
memcntl(0xFEE30000, 187632, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3)                                        = 0
mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN,
    -1, 0) = 0xFEE20000
munmap(0xFEF90000, 32768)                       = 0
getcontext(0x08047534)
getrlimit(RLIMIT_STACK, 0x0804752C)             = 0
getpid()                                        = 15727 [15726]
lwp_private(0, 1, 0xFEE22A00)                   = 0x000001C3
setustack(0xFEE22A60)
sysi86(SI86FPSTART, 0xFEF879BC, 0x0000133F, 0x00001F80) = 0x00000001
open("/etc/shadow", O_RDONLY)                   Err#13 EACCES [file_dac_read]
ioctl(1, TCGETA, 0x08046BB0)                    = 0
fstat64(1, 0x08046B10)                          = 0
ERROR: Application failed to initialize
write(1, " E R R O R :   A p p l i".., 40)      = 40
_exit(0)

The offending issue is in this line: open("/etc/shadow", O_RDONLY) Err#13 EACCES [file_dac_read].

Here the application is trying to open a file to which the user running the application has no access because the file permissions have secured the file. The application then terminates because the file cannot be opened, writing the error message in the process.



Back to top


Tracing live applications

Often when you want to trace an application, it is because the application has already been started, and you want to find out why the application is not working. Like the initialization example, often the error message or other information provided by the application may not describe the exact problem.

Locks, or attempting to access a resource that is currently in use by another process, can cause an application to apparently freeze and become unresponsive.

Both strace and truss provide the ability to "attach" to an already-running process. Attaching to the process works like running the process from the command line, thus producing a list of the system functions that are being executed by the program. The actual trace starts from the function being executed when the trace was started; a program that has "hung" during execution should show you the function the program is waiting to complete.

To trace a running program, you need to specify the Process ID (PID) of the process that you want to trace. For example, in Listing 6, the program being traced has stopped but not reported an error. Here, the ps tool has been used to determine the running processes (see Listing 7).


Listing 7. The ps tool is used to determine the running processes

$ ps -ef|grep errlock
      mc 15779 15747   0 18:26:59 pts/2       0:00 ./errlock
      mc 15742   680   0 18:26:36 pts/3       0:00 ./errlock
      mc 15817 15784   0 18:28:44 pts/4       0:00 grep errlock
      mc 15734   680   0 18:25:00 pts/3       0:01 /usr/bin/emacs-nox errlock.c
$ truss -p 15779
fcntl(3, F_SETLKW, 0x08047AC4)  (sleeping...)

From this output, you can see that the fcntl() function has been called to a set a lock on a file. In this case, the function has been set to wait until the lock has been set before continuing. Unfortunately, another process already has locked this file, so the second application will wait until the first application has finished with the file and has released the lock.

The limitation of truss in this instance is that you cannot tell what file is locked, or what file is currently locked and is holding up the execution of the second program. This is because the tracing process did not start until after the function call that opened the file had already been called. Both truss and strace only trace the functions as they are being executed; they cannot go back in time and work out the functions already called.



Back to top


Getting a stack trace

As you have seen, truss can be useful when you want to monitor an entire program, but may have limited use when a program has already been started. If you are using a SVR4-based UNIX, such as Solaris or AIX, the pstack command may help.

The pstack command is actually part of a wider set of commands that output information on running processes. Other tools in the set include pfiles, which outputs the list of files used by a process, and psig, which show the list of signals and signal handlers.

To use each command, you specify the PID of the process. The pstack command outputs the call stack for a running process, showing the list of functions called before the process reached the current function. For example, using pstack on the process waiting on a locked file produces the following from Listing 8.


Listing 8. Using pstack on the process waiting on a locked file

$ pstack 15828
15828:  ./errlock
 feef0877 fcntl    (3, 7, 8047ac4)
 feedcd49 fcntl    (3, 7, 8047ac4, 8050e74) + 91
 08050f10 main     (1, 8047b24, 8047b2c) + d8
 08050cdc _start   (1, 8047c08, 0, 8047c12, 8047c7d, 8047c8e) + 80

In this case, it doesn't provide you with the information you need. Now try pfiles (see Listing 9).


Listing 9. Using pfiles

$ pfiles 15856
15856:  ./errlock
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0620 dev:292,0 ino:989038936 uid:101 gid:7 rdev:24,3
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/3
   1: S_IFCHR mode:0620 dev:292,0 ino:989038936 uid:101 gid:7 rdev:24,3
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/3
   2: S_IFCHR mode:0620 dev:292,0 ino:989038936 uid:101 gid:7 rdev:24,3
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/3
   3: S_IFREG mode:0666 dev:182,65545 ino:198 uid:101 gid:10 size:0
      O_RDWR
      advisory write lock set by process 15828
      /export/home/mc/lockdemo

This time the output is more useful. You can see that the file that has been opened by the process is called lockdemo, and since truss showed that the file is waiting for a lock, it is probably this file causing the problem.

Both truss and strace are examples of passive tracing. You can watch the functions as they are executed, but you can't extract more detail about what is going on, or make more detailed decisions about what to trace, and what information to output when the trace occurs.



Back to top



Go to the previous pagePage 3 of 10 Go to the next page