Level: Intermediate
Satish Chandra Gupta (satish.gupta@in.ibm.com), Programmer, RAD/PurifyPlus, Rational Software, IBM
Anand Gaurav (anand.gaurav@in.ibm.com), Programmer, PurifyPlus, IBM
26 Feb 2008
IBM® Rational® Purify® is a tool to accurately
detect memory corruption errors, which are otherwise very difficult to analyze and
fix. In this article, you will learn how to use Purify options and directives
to customize Purify to suit the needs of your application.
IBM® Rational® Purify® is an advanced memory error-detection
tool that helps you quickly and accurately isolate memory corruption errors. After
you instrument your application with Purify and run the instrumented
application, Purify scrutinizes every memory access and reports any corruption
error before it occurs. You can learn about various types of memory errors
and how to use Purify to detect them in this previous article:
Navigating
"C" in a "leaky" boat? Try Purify.
In addition to creating a memory error report, Purify includes several advanced
features that enable you to fine-tune and customize how you use Purify in a
complex software development environment. In this article, you will first get an
overview of Purify options and directives. Then you will learn how to use them to
customize the following aspects of Purify:
- Cache management: Management and sharing of instrumented binary files
- Partial Purify: Instrumenting only part of your application to reduce
execution time overhead and scope of error detection
- Heap memory management: Limiting memory overhead when running an instrumented
application and customizing the heap use report
Purify options
Purify provides a rich set of options that give you fine-grained control in
using Purify. Options are of two types:
- Build-time options
- Runtime options
Build-time options must be used at the time of instrumentation. For example, if
you don't want Purify to do buffer overrun checking on the static data in
your program, you can use the -static-checking option
when instrumenting your program:
$ purify -static-checking=no cc your_prog.c
The runtime options affect, as the name suggests, the runtime behavior of the
instrumented program. For example, you can tell Purify to show longer stack call
chains in the errors being reported:
$ purify -chain-length=10 cc your_prog.c
In addition to appearing as flags in the link line, both build-time and runtime
options can be specified by using PUREOPTIONS and PURIFYOPTIONS environment
variables.
- The PUREOPTIONS environment variable applies to Purify, as well as other
PurifyPlus products, namely Quantify and PureCoverage.
- The PURIFYOPTIONS environment variable applies only to Purify.
You can set the previously mentioned
options as follows in sh, ksh, or bash shells:
$ export PURIFYOPTIONS="-static-checking=no -chain-length=10"
On csh or tcsh shells,
you can use the following command:
% setenv PURIFYOPTIONS "-static-checking=no -chain-length=10"
For build-time options, values specified in the link line override the values of
the same option in the environment variable. Purify stores any runtime options
specified during instrumentation (through the link line or environment variable) in
the instrumented program, and they are used when you run the program. At the
time of execution, any runtime option specified in the environment variable
overrides the value of the same option stored in the instrumented program (unless
the -ignore-runtime-environment build-time option has been used).
Alternatively, you can specify runtime options through the Purify GUI, if you prefer. In
that case, these values override the values of the same options specified through
the environment
variables and link line.
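The override order for a runtime option can be modeled in a few lines of shell. This is only a sketch of the rules stated above, not Purify's own logic: the lookup helper, the variable names, and the option values are hypothetical and chosen for illustration.

```shell
#!/bin/sh
# Toy model of the precedence rules described above; this is not Purify code.
# lookup NAME OPTION... prints the value of the last -NAME=VALUE argument,
# so option sources listed later override sources listed earlier.
lookup() {
    name=$1
    shift
    value=
    for opt in "$@"; do
        case $opt in
            "-$name="*) value=${opt#*=} ;;
        esac
    done
    printf '%s\n' "$value"
}

# Hypothetical values: what was stored in the instrumented program at build
# time, and what PURIFYOPTIONS holds when the program runs.
stored="-chain-length=10"
runtime_env="-chain-length=20"

# The option strings are left unquoted on purpose so the shell splits them.
echo "no runtime override:  $(lookup chain-length $stored)"
echo "with PURIFYOPTIONS:   $(lookup chain-length $stored $runtime_env)"
```

The first line reports 10 (the stored value) and the second reports 20, because the value from the runtime environment is listed last and therefore wins.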
Purify offers several build-time options to access Help and version strings:
$ purify -version
$ purify -usage
$ purify -help
$ purify -onlinehelp
Another handy build-time option is -print-home-dir,
which prints the name of the directory where Purify is installed. You can use it,
for example, to run the purify_what_options script to
find out what options were used during instrumentation:
$ `purify -print-home-dir`/purify_what_options <your_prog.pure>
Another use of this option is compiling a program that includes
purify.h to access the Purify APIs. The header file resides
in the Purify home directory, and this command adds that directory to the
compiler's include-file search path:
$ cc -c -I`purify -print-home-dir` your_prog.c
Purify directives
In addition to command-line options, Purify allows you to specify various
directives to fine-tune instrumentation and error reporting. Directives help you
customize Purify for your project. For example, suppose that there is a
memory error (say, an uninitialized memory read, or UMR) in one of the functions
(foo, for example) in a third-party library (such as libfoo) that you use in your project. You
can't fix that error, and perhaps can't do anything more than
report the error to the vendor and wait for the fix. If you don't want to
see that error in the Purify report because you can't fix it, you can
specify a suppress directive:
This instructs Purify not to show any UMR in the foo
function. To see errors that are hidden by
suppress directives in the Purify GUI, click Suppressed in the View
menu. Purify allows you to be as specific
or as generic as you want. Here are a few examples:
- To suppress a UMR only in a specific call chain, you can provide a full call
chain:
suppress umr printf; foo; bar; main
- To suppress UMRs with call chain matching a partially specified call chain,
where the ellipsis (...) means any number of occurrences of functions without
any restriction on the name of the function:
suppress umr ...; foo; ...; main
- To suppress all UMRs in a (third-party) library, where the asterisk (*) means
any number of occurrences of any character:
- To suppress all UMRs occurring anywhere (which is obviously very risky):
Purify processes directives specified in files named .purify in these
locations and in this order:
- <purify-installation-home>/.purify
- <users-home-dir>/.purify
- <current-working-directory>/.purify
Depending on whether your customization is for all Purify users or for all of your
projects or only one of your projects, you can edit one of those files to add your
directives.
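As a minimal sketch of the per-project case, the following shell snippet appends the two call-chain suppressions shown earlier to a .purify file. The project directory is a hypothetical scratch directory created only for this demonstration; in practice it would be the working directory from which you run the instrumented program.

```shell
#!/bin/sh
# Hypothetical project directory, created here only for the demonstration.
project_dir=$(mktemp -d)

# Append the two call-chain suppressions shown earlier in this article
# to the project-level directive file.
cat >> "$project_dir/.purify" <<'EOF'
suppress umr printf; foo; bar; main
suppress umr ...; foo; ...; main
EOF

# Show the resulting directive file.
cat "$project_dir/.purify"
```

Appending with >> rather than overwriting keeps any directives that are already in the file.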
Alternatively, you can select one of the errors in the Purify GUI that you want
to suppress, right-click to get the pop-up menu, and then select
Suppress from the menu. That will display the dialog shown in Figure 1,
where you can specify error type, call chain, and so forth. You can make a
suppression permanent by specifying the appropriate .purify file location and
clicking the button labeled Make permanent.
Figure 1. Purify Suppression dialog
Another directive, kill, works exactly like suppress, except that the
messages are not available in the GUI, even if you use the View > Suppressed
Messages menu. If an error occurs many times (thousands or tens of
thousands), the program will run faster if you kill it instead of suppressing it.
Cache management
When you instrument your program, Purify also instruments all libraries that
your program depends on directly or indirectly (such
as libc), and binds your program to these instrumented
libraries instead of the original libraries. If the directory containing the
library is writable, Purify creates the instrumented version of the library in the
same directory. Otherwise, it creates it in a default cache directory in the
Purify installation area.
There are various build-time options related to the cache. For example, if you
don't want the instrumented libraries to be scattered across the various
directories where the originals reside, and, instead, you want them to be stored
in the cache directory, you can use the -always-use-cache-dir option. You can also specify
a cache directory of your choice instead of using the default directory by using
the -cache-dir=<dir-name> option. By
using these options, you can remove all instrumented libraries from the cache by
simply deleting the cache directory that you have specified.
When you instrument a program, Purify instruments a library only if it has been
modified since the last instrumentation or if it can't find the
library's instrumented version. This helps reduce the instrumentation
time. If you want to discard previously instrumented versions of the libraries,
you can use the -force-rebuild build-time option, and
Purify will reinstrument all needed libraries. This is helpful when you decide to
use advanced instrumentation-time options (such as -static-checking-guardzone) and want to force
reinstrumentation.
In general, you cannot instrument a program on one machine and then run the
instrumented program on another machine. The reason is very simple. Typically,
your program depends on system libraries, such as libc,
and these libraries could be different on different machines, based on the patch
level of various operating system packages. To be sure that each program gets the
instrumented libraries corresponding to the original libraries on the same system,
Purify's caching mechanism organizes instrumented libraries based on host name.
In addition, your program may also depend on some of your libraries or third-party
libraries that are the same on all of your machines or even accessed from the same
network location. In such a situation, you may not want the cache to have multiple
instrumented versions of these libraries for each host name, because these copies
will be identical and will simply waste storage capacity. You can avoid this by
using the repure mechanism of Purify on
IBM® AIX®, Linux® and Solaris® UNIX® platforms, provided
that you take care of a few things during instrumentation:
1. First, pick a path that exists on all machines, yet the actual storage for
the path is local to each machine. For example, the /tmp directory exists
on all UNIX® machines, but it is local to each machine and not shared between
two machines. Tip: Do not use paths such as your home directory or
other locations that are visible to other machines through the Network
Information Service (NIS) or the Network File System (NFS).
2. Use the -local-cache-dir option to tell Purify to
store system-specific instrumented libraries in a path such as this one:
$ purify -always-use-cache-dir -local-cache-dir=/tmp -cache-dir=./cache \
cc -g test.c -o test.pure
3. You can run the instrumented program on the same machine:
4. If you want to run the instrumented program on another machine, just
"repurify" the program on the other machine by using repure:
5. After using repure, you can run the program on the
other machine also:
6. Repeat the repure step (Step 4) for each machine on
which you wish to run the instrumented program.
You can clean up old instrumented files by using the
pure_remove_old_files script located in the Purify
installation's home directory:
$ `purify -print-home-dir`/pure_remove_old_files <path> <days>
This command removes only instrumented copies of libraries: files whose names
contain _pure_ and end with shared-library extensions
like .so or .sl.
For example, to remove all instrumented files that are older than 14 days,
wherever they are stored in the file system:
$ pure_remove_old_files / 14
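Before deleting anything, it can be useful to preview which files would qualify. The sketch below approximates the selection rule described above with the standard find utility. It is an assumption based only on that description, not the script's actual logic, and the list_old_pure_files function and the demonstration file names are hypothetical.

```shell
#!/bin/sh
# Approximation of the selection rule described above, for previewing only.
# It lists files and deletes nothing; the real script's logic may differ.
# Usage: list_old_pure_files DIR DAYS
list_old_pure_files() {
    find "$1" -type f -name '*_pure_*' \
        \( -name '*.so' -o -name '*.sl' \) \
        -mtime +"$2" -print
}

# Demonstration on a scratch directory with made-up file names.
demo_dir=$(mktemp -d)
touch -t 200802260000 "$demo_dir/libfoo_pure_old.so"   # last modified in 2008
touch "$demo_dir/libfoo_pure_new.so"                   # modified just now
touch -t 200802260000 "$demo_dir/libfoo.so"            # old, but not instrumented

list_old_pure_files "$demo_dir" 14
```

Only libfoo_pure_old.so is listed: the second file is too recent, and the third does not contain _pure_ in its name.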
Partial Purify
Purify works by instrumenting code and inserting new instructions at interesting
locations to check for invalid memory accesses. Checking and maintaining the data
needed for checking results in performance overhead. Sometimes, you may be
interested in finding memory errors only in selected components of your
application. For example, you may not be interested in memory errors in a
third-party shared library that your application uses. On such occasions, you can
selectively instrument your application and exclude shared libraries that are not
of interest to you. This mechanism lets you instrument the important
components of your code when your application is very large and you want to
reduce the run-time overhead of the instrumented application.
Note:
Selective instrumentation works only on HP-UX and AIX
platforms. There is limited support for it on Solaris SPARC and none on Solaris
x86 or Linux.
When you exclude a shared library, Purify still detects heap management errors,
such as memory leaks, for the memory allocated in excluded shared libraries.
Here is how you can exclude a library by using selective instrumentation:
$ purify -selective -exclude-libs=libfoo1.so:libfoo2.so cc -g app.c \
-o a.out.pure -lfoo1 -lfoo2 -lbar
You can use the -exclude-libs option to provide a colon-separated
list of libraries that you want to exclude. Alternatively, you can use an
exclude directive in the .purify directive file:
As explained earlier, the directive interprets the asterisk (*) as any
number of occurrences of any character.
The -selective option causes Purify to use more robust
algorithms to detect and eliminate spurious errors that may arise due to exclusion
of some libraries. For example:
- Let's say that main() calls
foo(), and foo() calls bar(), and you exclude foo() from instrumentation.
Let's also assume that main() allocates memory, foo() initializes that
memory, and bar() uses that memory. Given that foo() is not instrumented, Purify will
not know that the memory has been initialized in foo(), and when the memory is
used in bar(), it may report an uninitialized memory read (UMR) error. The
-selective option tells Purify to work harder to eliminate
such spurious errors.
- Excluding a library may also result in Purify missing some errors, even in the
instrumented code. Let's say that main() calls foo() and passes an
uninitialized buffer to it. Normally, Purify would report a UMR for every use of the buffer
in function foo(). Now let's say that foo() is in a
shared library, libfoo.so, and
you exclude it. In that case, because foo() is not instrumented, it will not have
any memory checking code that Purify inserts, and no error will be reported, even
though the error lies in the instrumented code, because it is the instrumented
function main() that has erroneously passed an
uninitialized buffer to foo().
These two examples highlight the tradeoff between reducing the execution time
overhead in running the instrumented version and the comprehensiveness of error detection.
Therefore, you should judiciously decide which shared libraries to exclude by carefully
examining the role that a library plays in your application and the nature of its
memory-related interaction with the application.
Purify normally reports errors as and when they occur, but because it doesn't get the
opportunity to insert checks in uninstrumented code, it cannot report an error there
at the moment it happens. It can, however, detect buffer overrun errors in
uninstrumented code later, when the buffer is freed. If
you use the -late-detect-logic option, Purify will perform
additional checks when a heap memory block is freed. If it detects a buffer
overrun, it will report an Array Bound Write Late (ABWL) error.
Heap memory management
Purify offers several options to control memory and time overhead in running
an instrumented program. When your program frees a memory block, Purify
does not free that memory immediately. Instead, it appends the freed block to a
First In-First Out (FIFO) queue. This lets Purify detect and report dangling
pointers where your program accesses memory after it gets freed (Free Memory
Read/Write, or FMR/FMW errors). If Purify allowed the block to be freed directly back
to the heap, the memory might be reused right away and Purify wouldn't
be able to tell the difference between an invalid access to the old block and a
valid access to the new block.
The default value of the queue length is 100. When the queue gets full, Purify
frees the first block in the queue and adds the newly freed block to it. You can
change the queue length by using the -free-queue-length=<value> option.
A longer queue length extends the time that dangling pointers to freed memory will
be reliably detected and reported. However, a longer free queue also causes your
program to use more memory (real or virtual) because free-memory actions are delayed. Thus,
longer queue lengths increase the likelihood of detecting dangling pointers, but at the
cost of higher memory overhead.
The free queue is used only for small blocks, meaning those under a certain threshold.
Large blocks are freed directly back to the heap to avoid the risk of using too
much memory. You can use the -free-queue-threshold=<value> option to
specify the threshold value for putting a memory block in the queue. The default
value is 10000 bytes. Any memory block larger than this value is freed
immediately.
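The interaction between the queue length and the threshold can be illustrated with a toy model in shell. This sketch only mimics the behavior described above and is not how Purify is implemented: the simulate_free_queue function, the block names, and the sizes are hypothetical, and a queue length of 2 is used so that the effect is visible with only a few blocks.

```shell
#!/bin/sh
# Toy model of the free queue described above; not Purify's implementation.
# Usage: simulate_free_queue QUEUE_LENGTH THRESHOLD_BYTES SIZE...
# Each SIZE is the size in bytes of one freed block, in the order freed.
simulate_free_queue() {
    max_len=$1
    threshold=$2
    shift 2
    queue=""
    count=0
    n=0
    for size in "$@"; do
        n=$((n + 1))
        if [ "$size" -gt "$threshold" ]; then
            # Large blocks bypass the queue entirely.
            echo "block$n ($size bytes): returned to the heap immediately"
            continue
        fi
        if [ "$count" -ge "$max_len" ]; then
            # Queue is full: release the oldest block (front of the queue).
            oldest=${queue%% *}
            queue=${queue#* }
            count=$((count - 1))
            echo "$oldest: released from the queue, no longer tracked"
        fi
        queue="${queue}block$n "
        count=$((count + 1))
    done
    echo "still queued: ${queue% }"
}

simulate_free_queue 2 10000 64 20000 128 256
```

In this run, block2 exceeds the threshold and never enters the queue, block1 is pushed out when block4 arrives, and block3 and block4 remain queued. A dangling access to block1 or block2 after that point is the kind of access that, per the description above, might no longer be distinguishable from a valid access to reused memory.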
Purify reports all memory leaks at the end of the program. If you also want to
see all memory blocks that are in use, you can specify that by using the
-inuse-at-exit=yes option. Similarly, to find out which file descriptors are in use
at the end of the program, you can use the -fds-inuse-at-exit=yes option.
On AIX systems, if you are interested only in memory leaks, you can use the
-memory-leaks-only option, and Purify will do very lightweight
instrumentation, merely to detect memory leaks and other heap management errors, such
as the Freeing Mismatched Memory (FMM) error. Your program will run much faster
because Purify is not doing its usual verification on every memory access.
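The in-use reports described in this section can be requested for a single run through the PURIFYOPTIONS environment variable, without rebuilding. The following sketch assumes that both options are honored when set in the environment at run time, as the general description of PURIFYOPTIONS earlier in this article suggests. The program name your_prog.pure is a placeholder.

```shell
#!/bin/sh
# Request the in-use reports for a single run through the environment.
# your_prog.pure is a placeholder for your own instrumented program.
report_opts="-inuse-at-exit=yes -fds-inuse-at-exit=yes"

if [ -x ./your_prog.pure ]; then
    # The variable is set only for this one command.
    PURIFYOPTIONS="$report_opts" ./your_prog.pure
else
    echo "would run: PURIFYOPTIONS=\"$report_opts\" ./your_prog.pure"
fi
```

Setting the variable on the command line, rather than exporting it, keeps the extra reporting from affecting later runs in the same shell.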
Summary
In this article, you learned about Purify options and directives and how to use
them to manage your Purify cache, execution time overhead, and the heap memory
footprint of an instrumented application. This will help you in customizing Purify
to serve the needs of your application.
Resources
Learn
- Learn debugging with Purify: Advanced features of IBM Rational Purify: Debugging with Purify
(IBM® developerWorks®, February 2008).
- Read Goran Begic's article, An introduction to runtime analysis with Rational PurifyPlus
(IBM® developerWorks®, November 2003).
- Learn about different types of memory errors
and how to use Rational Purify to detect them.
About the authors
Satish Chandra Gupta is a developer in the IBM Rational® PurifyPlus® group in Bangalore, India. His interests include compilers, programming languages, runtime analysis, Java memory leaks, type theory, software engineering, and software development environments. His research has been published in ACM/IEEE conferences. He received a B.Tech. from the Indian Institute of Technology in Kanpur (India), and an M.S. from the University of Wisconsin in Milwaukee (USA).
Anand Gaurav is a developer in the IBM Rational PurifyPlus group in Bangalore, India. His interests are in the areas of runtime analysis, object-oriented design, data structures, and algorithms. He earned his B.E. from PESIT (VTU), Karnataka, India.