Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Solutions for tracing UNIX applications

Martin Brown (mc@mcslp.com), Professional writer, Freelance
Martin Brown has been a professional writer for over eight years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux, BeOS, Mac OS/X and more -- as well as Web programming, systems management and integration. Martin is a regular contributor to ServerWatch.com, LinuxToday.com and IBM developerWorks, and a regular blogger at Computerworld, The Apple Blog and other sites, as well as a Subject Matter Expert (SME) for Microsoft. He can be contacted through his Web site at http://www.mcslp.com.

Summary:  If you are developing a UNIX® application, then you can trace and debug the running application and extract the information you need from it. But what if you want to know what is going on inside a UNIX application and you don't have access to the source code?This tutorial looks at some systems that enable you to trace the execution of applications and work out what they are doing without having to make any modifications to the source code, and even without having to stop and restart the application.

Date:  31 Mar 2009
Level:  Intermediate PDF:  A4 and Letter (66 KB | 24 pages)Get Adobe® Reader®

Activity:  32633 views
Comments:  

Using truss and strace

The truss tool is available on Solaris and AIX and provides the ability to trace system calls and signals within an application. The strace tool, available on Linux® provides similar functionality. On different systems, there are also tools that provide similar information, including ktrace (FreeBSD) and trace.

Overview of truss/strace

The truss and strace tools both provide similar levels of information, although the command-line options may be slightly different in each case. With both tools, the standard way to use them is to prefix the tool to the command you would ordinarily execute.

For example, Listing 3 shows the output of truss on the ageindays program mentioned earlier in this tutorial.


Listing 3. Output of truss

$ truss ./ageindays 24/1/1980 26/3/2009
execve("ageindays", 0x08047BBC, 0x08047BCC)  argc = 3
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) 
   = 0xFEFB0000
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
getcwd("/root", 1013)                           = 0
resolvepath("/root/ageindays", "/root/ageindays", 1023) = 15
xstat(2, "/root/ageindays", 0x08047880)         = 0
open("/var/ld/ld.config", O_RDONLY)             = 3
fxstat(2, 3, 0x08047760)                        = 0
mmap(0x00000000, 144, PROT_READ, MAP_SHARED, 3, 0) = 0xFEFA0000
close(3)                                        = 0
sysconfig(_CONFIG_PAGESIZE)                     = 4096
xstat(2, "/usr/lib/libc.so.1", 0x08046FA0)      = 0
resolvepath("/usr/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/usr/lib/libc.so.1", O_RDONLY)            = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) 
    = 0xFEF90000
mmap(0x00010000, 1413120, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0)
    = 0xFEE30000
mmap(0xFEE30000, 1302809, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) 
    = 0xFEE30000
mmap(0xFEF7F000, 30862, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|
    MAP_INITDATA, 3, 1306624) = 0xFEF7F000
mmap(0xFEF87000, 4776, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 
   -1, 0) = 0xFEF87000
munmap(0xFEF6F000, 65536)                       = 0
memcntl(0xFEE30000, 187632, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3)                                        = 0
mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, 
   -1, 0)
    = 0xFEE20000
munmap(0xFEF90000, 32768)                       = 0
getcontext(0x080475D0)
getrlimit(RLIMIT_STACK, 0x080475C8)             = 0
getpid()                                        = 15691 [15690]
lwp_private(0, 1, 0xFEE22A00)                   = 0x000001C3
setustack(0xFEE22A60)
sysi86(SI86FPSTART, 0xFEF879BC, 0x0000133F, 0x00001F80) = 0x00000001 
ioctl(1, TCGETA, 0x08046C20)                    = 0
fstat64(1, 0x08046B80)                          = 0
You have been alive 10654 days
write(1, " Y o u   h a v e   b e e".., 31)      = 31
You were born on 24/1/1980 which is a Thursday
write(1, " Y o u   w e r e   b o r".., 47)      = 47
_exit(134511508)

By comparison, Listing 4 shows the output from strace on Linux.


Listing 4. Output from strace

$ strace ./ageindays 24/1/1980 26/3/2009
execve("./ageindays", ["./ageindays", "24/1/1980", "26/3/2009"], 
[/* 50 vars */]) = 0
brk(0)                                  = 0x602000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
    = 0x7f47db185000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
    = 0x7f47db184000
access("/etc/ld.so.preload", R_OK)      
    = -1 ENOENT (No such file or directory)
open("/usr/lib/tls/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/lib/tls/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls", 0x7fffe31858f0)    
    = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/lib/libc.so.6", O_RDONLY)    
    = -1 ENOENT (No such file or directory)
stat("/usr/lib", {st_mode=S_IFDIR|0755, st_size=53248, ...}) = 0
open("/usr/local/lib/tls/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/tls/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/lib/tls/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/tls", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/lib/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/lib/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/usr/local/qt/lib/tls/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/qt/lib/tls/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/qt/lib/tls/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/qt/lib/tls", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/qt/lib/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/qt/lib/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/local/qt/lib/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/local/qt/lib", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/X11R6/lib/tls/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/X11R6/lib/tls/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/X11R6/lib/tls/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/X11R6/lib/tls", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/X11R6/lib/x86_64/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/X11R6/lib/x86_64", 0x7fffe31858f0) 
    = -1 ENOENT (No such file or directory)
open("/usr/X11R6/lib/libc.so.6", O_RDONLY) 
    = -1 ENOENT (No such file or directory)
stat("/usr/X11R6/lib", 0x7fffe31858f0)  
    = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=85050, ...}) = 0
mmap(NULL, 85050, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f47db16f000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\345\1\0\0\0\0\0@"...,
    832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1330352, ...}) = 0
mmap(NULL, 3437208, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) 
    = 0x7f47dac24000
mprotect(0x7f47dad63000, 2093056, PROT_NONE) = 0
mmap(0x7f47daf62000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE,
    3, 0x13e000) = 0x7f47daf62000
mmap(0x7f47daf67000, 17048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 
   -1, 0) = 0x7f47daf67000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
    = 0x7f47db16e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
    = 0x7f47db16d000
arch_prctl(ARCH_SET_FS, 0x7f47db16d6f0) = 0
mprotect(0x7f47daf62000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f47db186000, 4096, PROT_READ) = 0
munmap(0x7f47db16f000, 85050)           = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
    = 0x7f47db183000
write(1, "You have been alive 10654 days\n"..., 31You have been alive 10654 days
) = 31
write(1, "You were born on 24/1/1980 which "..., 47You were born on 24/1/1980 
   which is a Thursday
) = 47
exit_group(0)  

In both cases, each line of output equates to the application executing one function call and shows the arguments to the function and the return value of the function call. Unlike the debugging example, each function call listed is a function within the system or system libraries and therefore represents a lower-level interface to the functions called. For example, opening a file within an application may use the fpopen() function within C or C++, but this is in fact a wrapper to the more primitive open() function.

You don't need to understand the specifics of each function to get a general idea of what the applications are doing. Many of the lines of the output are related to the initialization that the operating system applies in order to load and execute the program. The fundamentals of the two traces are the same:

  • The execve() function is called to start a new program.
  • Libraries for the program are loaded. With the Solaries output, the libraries first look for the use of resolvepath() and then open using open(). For Linux, stat() is used to check whether the library exists, and then open() is used to open it.
  • Some memory is reserved and allocated for the process. Some of this will be the stack space reserved for the application, some will be used to hold the program, and others to hold the variables used by the program.
  • Finally, the program is executed, and the write() function is called to print out the age and day of the birth information.

If you perform a trace and want to understand the specifics of each step, use the man command to read the manual page for each function.


Identifying application startup issues

A typical problem when starting an application is that the program fails to initialize properly, but terminates with an incomplete or misleading error message. Running a trace on the application can often highlight the problem. For example, Listing 5 shows how a test application has failed.


Listing 5. Application fails

$ ./errnoacc 
ERROR: Application failed to initialize

The error message provides no specific information about why the application failed to start. In this case, the problem has been deliberately introduced, but the same issue could occur with any command or application you were using, and the error message could be equally misleading or sometimes non-existent.

Running a trace on the application might give us some more clues (see Listing 6).


Listing 6. Running a trace

$ truss ./errnoacc
execve("errnoacc", 0x08047B20, 0x08047B28)  argc = 1
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) 
    = 0xFEFB0000
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
getcwd("/export/home/mc", 1014)                 = 0
resolvepath("/export/home/mc/errnoacc", "/export/home/mc/errnoacc", 1023) = 24
xstat(2, "/export/home/mc/errnoacc", 0x080477E4) = 0
open("/var/ld/ld.config", O_RDONLY)             = 3
fxstat(2, 3, 0x080476C4)                        = 0
mmap(0x00000000, 144, PROT_READ, MAP_SHARED, 3, 0) = 0xFEFA0000
close(3)                                        = 0
sysconfig(_CONFIG_PAGESIZE)                     = 4096
xstat(2, "/usr/lib/libc.so.1", 0x08046F04)      = 0
resolvepath("/usr/lib/libc.so.1", "/lib/libc.so.1", 1023) = 14
open("/usr/lib/libc.so.1", O_RDONLY)            = 3
mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEF90000
mmap(0x00010000, 1413120, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, 
    -1, 0) = 0xFEE30000
mmap(0xFEE30000, 1302809, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) 
    = 0xFEE30000
mmap(0xFEF7F000, 30862, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|
    MAP_INITDATA, 3, 1306624) = 0xFEF7F000
mmap(0xFEF87000, 4776, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 
    -1, 0) = 0xFEF87000
munmap(0xFEF6F000, 65536)                       = 0
memcntl(0xFEE30000, 187632, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3)                                        = 0
mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN,
    -1, 0) = 0xFEE20000
munmap(0xFEF90000, 32768)                       = 0
getcontext(0x08047534)
getrlimit(RLIMIT_STACK, 0x0804752C)             = 0
getpid()                                        = 15727 [15726]
lwp_private(0, 1, 0xFEE22A00)                   = 0x000001C3
setustack(0xFEE22A60)
sysi86(SI86FPSTART, 0xFEF879BC, 0x0000133F, 0x00001F80) = 0x00000001
open("/etc/shadow", O_RDONLY)                   Err#13 EACCES [file_dac_read]
ioctl(1, TCGETA, 0x08046BB0)                    = 0
fstat64(1, 0x08046B10)                          = 0
ERROR: Application failed to initialize
write(1, " E R R O R :   A p p l i".., 40)      = 40
_exit(0)

The offending issue is in this line: open("/etc/shadow", O_RDONLY) Err#13 EACCES [file_dac_read].

Here the application is trying to open a file to which the user running the application has no access because the file permissions have secured the file. The application then terminates because the file cannot be opened, writing the error message in the process.


Tracing live applications

Often when you want to trace an application, it is because the application has already been started, and you want to find out why the application is not working. Like the initialization example, often the error message or other information provided by the application may not describe the exact problem.

Locks, or attempting to access a resource that is currently in use by another process, can cause an application to apparently freeze and become unresponsive.

Both strace and truss provide the ability to "attach" to an already-running process. Attaching to the process works like running the process from the command line, thus producing a list of the system functions that are being executed by the program. The actual trace starts from the function being executed when the trace was started; a program that has "hung" during execution should show you the function the program is waiting to complete.

To trace a running program, you need to specify the Process ID (PID) of the process that you want to trace. For example, in Listing 6, the program being traced has stopped but not reported an error. Here, the ps tool has been used to determine the running processes (see Listing 7).


Listing 7. The ps tool is used to determine the running processes

$ ps -ef|grep errlock
      mc 15779 15747   0 18:26:59 pts/2       0:00 ./errlock
      mc 15742   680   0 18:26:36 pts/3       0:00 ./errlock
      mc 15817 15784   0 18:28:44 pts/4       0:00 grep errlock
      mc 15734   680   0 18:25:00 pts/3       0:01 /usr/bin/emacs-nox errlock.c
$ truss -p 15779
fcntl(3, F_SETLKW, 0x08047AC4)  (sleeping...)

From this output, you can see that the fcntl() function has been called to a set a lock on a file. In this case, the function has been set to wait until the lock has been set before continuing. Unfortunately, another process already has locked this file, so the second application will wait until the first application has finished with the file and has released the lock.

The limitation of truss in this instance is that you cannot tell what file is locked, or what file is currently locked and is holding up the execution of the second program. This is because the tracing process did not start until after the function call that opened the file had already been called. Both truss and strace only trace the functions as they are being executed; they cannot go back in time and work out the functions already called.


Getting a stack trace

As you have seen, truss can be useful when you want to monitor an entire program, but may have limited use when a program has already been started. If you are using a SVR4-based UNIX, such as Solaris or AIX, the pstack command may help.

The pstack command is actually part of a wider set of commands that output information on running processes. Other tools in the set include pfiles, which outputs the list of files used by a process, and psig, which show the list of signals and signal handlers.

To use each command, you specify the PID of the process. The pstack command outputs the call stack for a running process, showing the list of functions called before the process reached the current function. For example, using pstack on the process waiting on a locked file produces the following from Listing 8.


Listing 8. Using pstack on the process waiting on a locked file

$ pstack 15828
15828:  ./errlock
 feef0877 fcntl    (3, 7, 8047ac4)
 feedcd49 fcntl    (3, 7, 8047ac4, 8050e74) + 91
 08050f10 main     (1, 8047b24, 8047b2c) + d8
 08050cdc _start   (1, 8047c08, 0, 8047c12, 8047c7d, 8047c8e) + 80

In this case, it doesn't provide you with the information you need. Now try pfiles (see Listing 9).


Listing 9. Using pfiles

$ pfiles 15856
15856:  ./errlock
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0620 dev:292,0 ino:989038936 uid:101 gid:7 rdev:24,3
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/3
   1: S_IFCHR mode:0620 dev:292,0 ino:989038936 uid:101 gid:7 rdev:24,3
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/3
   2: S_IFCHR mode:0620 dev:292,0 ino:989038936 uid:101 gid:7 rdev:24,3
      O_RDWR|O_NOCTTY|O_LARGEFILE
      /dev/pts/3
   3: S_IFREG mode:0666 dev:182,65545 ino:198 uid:101 gid:10 size:0
      O_RDWR
      advisory write lock set by process 15828
      /export/home/mc/lockdemo

This time the output is more useful. You can see that the file that has been opened by the process is called lockdemo, and since truss showed that the file is waiting for a lock, it is probably this file causing the problem.

Both truss and strace are examples of passive tracing. You can watch the functions as they are executed, but you can't extract more detail about what is going on, or make more detailed decisions about what to trace, and what information to output when the trace occurs.

3 of 9 | Previous | Next

Comments



Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=AIX and UNIX
ArticleID=379556
TutorialTitle=Solutions for tracing UNIX applications
publish-date=03312009
author1-email=mc@mcslp.com
author1-email-cc=mmccrary@us.ibm.com

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.