Skip to main content

Dynamic tracing facility

Manjunath S. Karikatti (mkarikat@in.ibm.com), Software Engineer, IBM
Manjunath Karikatti is a Software Engineer IBM at the IBM India Software Labs. He is currently working on the TXSeries development team.

Summary:  Problem determinations of the failures that occur in production environment pose a difficult challenge because of the complex environment and numerous different software stacks that interact with each other.This article provides a brief introduction to the Solaris dynamic trace facility and the ProbeVue facility on AIX® and illustrates its use with examples.

Date:  21 Jul 2009
Level:  Introductory PDF:  A4 and Letter (31KB | 10 pages)Get Adobe® Reader®
Activity:  2966 views

Introduction

When problems cannot be replicated in non-production environments, it becomes important to find the root cause by applying various problem determination techniques without affecting the performance and stability of live system.

Various tools and techniques are available on different UNIX® platforms, but not all of these can be used to diagnose the problems on live production systems because of performance and stability concerns.

Solaris and AIX platforms provide the dynamic tracing facility, which is intended for problem diagnosis on production systems.


Dtrace dynamic trace facility on Solaris

Dtrace, the dynamic tracing facility introduced in Solaris 10, provides a debugging tool to debug the problems that are difficult to diagnose using traditional problem determination mechanisms.

Dtrace enables the user to dynamically modify the operating system kernel and user processes to record additional data. The recorded data can be used by the administrators and developers to examine both applications as well as the operating system to track the cause of abnormal behavior of the live system.

Terminologies

  • Probe: Location of user interest either in kernel or in application from where data needs to be recorded.
  • Provider: Kernel module that performs necessary instrumentation to create probes.
  • Predicate: Logical expression that modifies the control flow to conditionally trace the data.

D language

Dtrace dynamic tracing facility uses a language, the D language, that users can associate action with probe points and specify when and what additional data to record when a probe is fired. The dtrace command interprets the scripts written using the D language.

The language supports C-like data objects like integer, char, short, long, and more. It also provides a superset of ANSI-C operators that can be used to manipulate the data objects.

D language structure

The following example shows the structure of D language:

probe1
/ predicate /
{  
 action
}
probe2
/ predicate /
{
   action  
}

The D language structure consists of one or more probe clauses. Each clause can have certain actions to be performed by the probe, upon satisfying the predicate condition. Actions and predicate are optional.

Probes are scattered all over the Solaris kernel at various locations. In order to collect data from a location of interest, a user has to enable the required probe at that location. Any actions that are specified are recorded when the probe fires.

A probe can be enabled using a command-line utility called dtrace and a script written in the D language.

Probe is made of unique four tuples, as shown below:

provider:module:function:name
  • name: is the name of the probe
  • function: is the function being probed
  • module: is the module to which the function belongs
  • provider: is the kernel module that instruments the probe

The dtrace –l command lists all the available probes on the system.

# dtrace -l  
   ID   PROVIDER   MODULE            FUNCTION NAME
    1     dtrace    -                    BEGIN
    2     dtrace    -                    END
    3     dtrace    -                    ERROR
    4     vminfo    fasttrap        fasttrap_uwritesoftlock
    5     vminfo    fasttrap        fasttrap_uread softlock
    6        fbt    pfil            pfil_list_add entry
    7        fbt    pfil            pfil_list_add return

The dtrace utility interprets the scripts written in D language, enables the probes specified, and records the data of interest, as mentioned in the script when the probe gets fired.

A D script can be executed using the dtrace command as

dtrace -s mydscript.d or script can be directly executed by inserting the line

#!/usr/sbin/dtrace -s at the beginning to the script.


ProbeVue dynamic trace facility on AIX

ProbeVue is a dynamic tracing facility introduced in AIX 6.1. It is equivalent to the dtrace facility available in the Solaris operating system.

Terminologies

probe: Interrupting normal system action to obtain information about current context.

probe point: Identifies a point during normal system activity capable of being probed.

Enabling/disable probe: Attach/remove probe to a probe point.

Probe Location: Location in user/kernel space where tracing action is to be performed.

Probe Event: Perform tracing action when this event occurs.

Predicate: Condition to be satisfied to perform the tracing action.

The probe manager: Software code that defines and provides a set of probe points of the same probe type, such as the system call probe manager.

The ProvbVue facility uses a programming language called 'Vue' and a the ProbeVue command, which starts a trace session and interprets vue scripts.

The Vue script or Vue program can be used to :

  • Identify the probe points where a probe is to be dynamically enabled.
  • Specify the conditions, if any, that must be satisfied for the actions to be executed when a probe fires.
  • Specify the actions to be executed, including what trace data to capture.

Overview of the Vue language

A probe script is made up of a declaration section (optional), followed by one or more clauses. Probe script layout is:

declaration statement 1;  /* declaration */
declaration statement 2; /* declaration */
@@BEGIN		/* BEGIN clause optional */
{
     statement 1;
     statement 2;
}
/* Probe clause  start */
<probe point tuple>,<proble point tuple>  /* probe point specification statement*/
when(<predicate>)   /* optional predicate */
{
      statement 1;     /* action block */
      statement 2;
}
/* Probe clause end */

The declaration clause contains declarative statements that declare local and global variables to be used in subsequent clauses.

It may contain:

  • _ Type, structure, union, and enum definitions.
  • _ Function prototype declarations for functions being probed.
  • _ Declarations for global and kernel variables accessed in the script.

The Vue language supports most of the C operations, C-89 data types, has its own built-in variables(_pid, _tid) and functions(stacktrace, timestamp).

For example:

int count;     /* user defined global counter variable */

Each probe clause consists of a probe point specification statement, action block, and an optional predicate.

The probe point specification identifies a code location whose execution or an event whose occurrence should trigger the probe actions.

AIX 6.1.0 supports the following probe types:

  • User Function Entry probes (or uft probes)
  • System Call Entry/Exit probes (or syscall probes)
  • Probes that fire at specific time intervals (or interval probes)

Probe point specification format
@@<probetype>:<one or more probetype-specific fields separated by colons>

Each probe point tuple is at least a 3-tuple, that is, it has a minimum of three fields.

  • The first field always identifies the probe type and thus its probe manager.
  • For probe managers that support process-specific tracing, the second field must be a process ID.
  • For probe managers that support function entry or exit probes, the location field (the last field) must use the "entry or "exit keyword.
  • Fields are separated by the colon (":") symbol.
  • An asterisk or the "*" symbol for a field in the probe point tuple indicates that it matches any possible value for that field.

For example:

- @@uft:34568:*:foo:entry

Probe at entry into any function called foo() in process with ID = 34568. The asterisk in the third field indicates that the function foo() is to be probed if it exists in any module of the process.

- @@syscall:*:read:exit

Probe at exit of read system call. The asterisk indicates that the read system call for all processes are to be probed.

- @@interval:*:clock:500

Probe to fire every 500 milliseconds (wall clock time). The asterisk is a placeholder for supporting finer probe points in the future.


Scenarios depicting usage of Solaris dtrace

Scenario 1

The customer has reported a memory leak while running his application (assume a C application). This particular issue can be hard to recreate for the technical support team because it is difficult to replicate the customer environment in the labs.

The following example shows the use of a D script, which can assist in narrowing down the cause of the memory leak without modifying the application or system environment and without bringing down the live system. The example can be used with any C application that uses malloc and free routines to allocate and free the process memory.

To use dtrace to find the probable memory leak in the application, create a D script called malloc.d with the following contents:

pid$target::malloc:entry
{
    printf("malloc: PID %d requested %d bytes\n",$target,arg0);
}
pid$target::malloc:return
{
    printf("malloc: PID %d returned address 0X%x\n",$target,arg1);
}
pid$target::free:entry
{
    printf("free: PID %d freeing address 0X%x \n",$target,arg0);
}

Invoke the application (myapp, source myapp.c) with the dtrace utility:

dtrace –c ./myapp –s malloc.d

The output will look as below:

dtrace: script 'malloc.d' matched 20 probes
dtrace: pid 6592 exited with status 1
CPU     ID       FUNCTION:NAME
  0  39972       malloc:entry malloc: PID 6592 requested 100 bytes

  0  39974       malloc:return malloc: PID 6592 returned address 0X20fa0

  0  39972       malloc:entry malloc: PID 6592 requested 100 bytes

  0  39974       malloc:return malloc: PID 6592 returned address 0X21010

  0  39981       free:entry free: PID 6592 freeing address 0X20fa0

 0  39981       free:entry free: PID 6592 freeing address 0X21010

The output can be redirected and parsed using a script to check if the application has not freed any memory that was allocated. The malloc.d script captures all malloc calls executed by the application. Further, by adding the ustack() function in the entry probe in the previous D script displays the stack trace back for the malloc call, which helps to identify location in code that is causing the leak.

This example uses the PID provider. The name of the PID probe provider is formed by suffixing process ID to the string 'pid.' So, for process 1234, the PID provider will be pid1234.

arg0 is the argument passed to the function being probed for entry probe.

arg1 is the return value from the function being probed for return probe.

In the malloc.d script, the $target will be replaced by the process ID of the application myapp at runtime.

If the user needs to track malloc and free calls in a particular user function, the malloc.d script can be modified as shown below. If the probe specifier of the module part of tuple is not specified, the probe is applicable for all modules linked to the application.

pid$target::functionXX:entry
{
    traceflag=1;
}
pid$target::functionXX:return
{
    traceflag=0;
}
pid$target::malloc:entry
/traceflag != 0/
{
    printf("malloc: PID %d requested %d bytes\n",$target,arg0);
}
pid$target::malloc:return
/traceflag != 0/
{
    printf("malloc: PID %d returned address 0X%x\n",$target,arg1);
}
pid$target::free:entry
/traceflag != 0/
{
    printf("free: PID %d freeing address 0X%x \n",$target,arg0);
}

Scenario 2

In a middleware environment (for example, WebSphere® or TXSeries®), the business logic of the application is run in an application server. Quite often, customers running their application in such a complex environment complain about slower response time. In order to identify any performance bottlenecks, it is important to know the time spent by software that the application is interacting with while servicing an end-user request.

Middleware software as the hosting environment can provide the necessary code to track the time spent in different software stacks. However, this could introduce an overhead.

The dtrace facility can be used for this purpose without affecting the performance and without modifying the application or middleware software. It can be used to track the entry and exit timestamps of various software modules that the application will interact with in order to narrow down the software causing the bottleneck.

For example, if an application is interacting with a DB2® database and response is found to be poor, the user can write a D script to track the time spent in the application and time spent while doing database operations.

A D script similar to following can be used for this purpose:

pid$target::sqlastrt:entry
{
self->time=timestamp;
}
pid$target::sqlastop:entry
/self->time != 0/
{
  printf("Time spent in sql call %d nsecs\n",timestamp - self->time);
  self->time=0;
}

Here the self identifier declares a thread local variable time. Each thread in the application will have its own local variable time, thus enabling each application thread to track its time.


Examples of AIX ProbeVue usage

The following Probe script uses the User Function Entry probe to track malloc routines and display the size of allocation.

      #!/usr/bin/probevue
      char * malloc(int size);
      @@uft:$1:*:malloc:entry
      {
          printf("malloc(%d)\n",__arg1);
      }

Invoke this script-passing process ID of the process for which you wish to trace malloc calls.

__arg1,__arg2,__arg3 and so on are the built-in entry class variables that refer to parameters passed to the function being traced. $1 gets replaced by the process ID passed to the script.

#malloc.d <process id>

To find out the call stack for the malloc call, stktrace() probe function can be called in the previous script.


Conclusion

This article provided a very brief introduction of dynamic tracing facilities with examples demonstrating the use of a few probes. There are many more probes available in Solaris and AIX systems that are helpful in diagnosing problems on production systems. The complete list of all the probes and their usage can be found in the respective manuals.


Resources

About the author

Manjunath Karikatti is a Software Engineer IBM at the IBM India Software Labs. He is currently working on the TXSeries development team.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=AIX and UNIX
ArticleID=415664
ArticleTitle=Dynamic tracing facility
publish-date=07212009
author1-email=mkarikat@in.ibm.com
author1-email-cc=mmccrary@us.ibm.com

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers