Advanced features of IBM Rational Purify: Customizing Purify instrumentation and reporting

Using Purify options and directives to customize it for your application

IBM® Rational® Purify® is a tool that accurately detects memory corruption errors, which are otherwise very difficult to analyze and fix. In this article, you will learn how to use Purify options and directives to customize Purify to suit the needs of your application.

Satish Chandra Gupta (satish.gupta@acm.org), Senior Software Engineer, IBM

Satish Chandra Gupta is a programmer and loves building software engineering and programming tools. His interests include performance (CPU time, memory, energy) profiling tools, compilers, programming languages, type theory, software engineering, and software development environments. While at IBM, he was architect for UML Action Language tooling in Rational Software Architect, and tech lead for Rational PurifyPlus on AIX and Java leak detection tooling in Rational Application Developer. You can keep up with him through his Twitter and Google+ feeds.



Anand Gaurav (anand.gaurav@in.ibm.com), Programmer, PurifyPlus, IBM Japan

Anand Gaurav is a developer in the IBM Rational PurifyPlus group in Bangalore, India. His interests are in the areas of runtime analysis, object-oriented design, data structures, and algorithms. He earned his B.E. from PESIT (VTU), Karnataka, India.



26 February 2008

IBM® Rational® Purify® is an advanced memory error-detection tool that helps you quickly and accurately isolate memory corruption errors. After you instrument your application by using Purify, when you run the instrumented application, Purify scrutinizes every memory access and reports any corruption error before it occurs. You can learn about various types of memory errors and how to use Purify to detect them in this previous article: Navigating "C" in a "leaky" boat? Try Purify.

In addition to creating a memory error report, Purify includes several advanced features that enable you to fine-tune and customize how you use Purify in a complex software development environment. In this article, you will first get an overview of Purify options and directives. Then you will learn how to use them to customize the following aspects of Purify:

  • Cache management: Management and sharing of instrumented binary files
  • Partial Purify: Instrumenting only part of your application to reduce execution time overhead and scope of error detection
  • Heap memory management: Limiting the memory overhead of running an instrumented application and customizing the heap use report

Purify options

Purify provides a rich set of options that give you fine-grained control in using Purify. Options are of two types:

  • Build-time options
  • Runtime options

Build-time options must be used at the time of instrumentation. For example, if you don't want Purify to do buffer overrun checking on the static data in your program, you can use the -static-checking option while instrumenting your program:

$ purify -static-checking=no cc your_prog.c

The runtime options affect, as the name suggests, the runtime behavior of the instrumented program. For example, you can tell Purify to show longer stack call chains in the errors being reported:

$ purify -chain-length=10 cc your_prog.c

In addition to appearing as flags in the link line, both build-time and runtime options can be specified by using PUREOPTIONS and PURIFYOPTIONS environment variables.

  • The PUREOPTIONS environment variable applies to Purify, as well as other PurifyPlus products, namely Quantify and PureCoverage.
  • The PURIFYOPTIONS environment variable applies only to Purify.

You can set the previously mentioned options as follows in sh, ksh, or bash shells:

$ export PURIFYOPTIONS="-static-checking=no -chain-length=10"

In csh or tcsh shells, use the following command instead:

% setenv PURIFYOPTIONS "-static-checking=no -chain-length=10"

For build-time options, values specified in the link line override the values of the same option in the environment variable. Purify stores any runtime options specified during instrumentation (through the link line or environment variable) in the instrumented program, and they are used when you run the program. At the time of execution, any runtime option specified in the environment variable overrides the value of the same option stored in the instrumented program (unless the -ignore-runtime-environment build-time option has been used).
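
These precedence rules can be summarized in a few lines of C. This is only an illustrative model, not Purify's implementation: the function names, and the convention that NULL means "not specified", are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the precedence rules described in the text.
 * A NULL value means "this source did not specify the option". */

/* Build-time option: the link line wins over the environment variable. */
const char *resolve_build_option(const char *link_line_value,
                                 const char *env_value,
                                 const char *default_value)
{
    assert(default_value != NULL);
    if (link_line_value != NULL)
        return link_line_value;
    if (env_value != NULL)
        return env_value;
    return default_value;
}

/* Runtime option: the environment at execution time wins over the value
 * stored in the instrumented program, unless the program was built with
 * -ignore-runtime-environment (modeled by the flag below). */
const char *resolve_runtime_option(const char *stored_value,
                                   const char *runtime_env_value,
                                   int ignore_runtime_environment,
                                   const char *default_value)
{
    assert(default_value != NULL);
    if (!ignore_runtime_environment && runtime_env_value != NULL)
        return runtime_env_value;
    if (stored_value != NULL)
        return stored_value;
    return default_value;
}
```

For instance, if -chain-length=10 was stored at instrumentation time and PURIFYOPTIONS contains -chain-length=20 when the program runs, the model picks 20. Passing a nonzero flag, which stands for -ignore-runtime-environment, makes it pick 10 instead.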

Alternatively, you can specify runtime options through the Purify GUI, if you prefer. In that case, these values override the values of the same options specified through the environment variables and link line.

Purify offers several build-time options to access Help and version strings:

$ purify -version
$ purify -usage
$ purify -help
$ purify -onlinehelp

Another handy build-time option is -print-home-dir, which prints the name of the directory where Purify is installed. You can use it, for example, to run the purify_what_options script to find out what options were used during instrumentation:

$ `purify -print-home-dir`/purify_what_options <your_prog.pure>

The option is also useful when you compile a program that includes purify.h to access Purify APIs. The header file is located in the Purify home directory, and this command adds that directory to the compiler's include-file search path:

$ cc -c -I`purify -print-home-dir` your_prog.c

Purify directives

In addition to command-line options, Purify allows you to specify various directives to fine-tune instrumentation and error reporting. Directives help you customize Purify for your project. For example, consider that there is a memory error (say, an uninitialized memory read, or UMR) in one of the functions (foo, for example) in a third-party library (such as libfoo) that you use in your project. You can't fix that error, and perhaps can't do anything more than report the error to the vendor and wait for the fix. If you don't want to see that error in the Purify report because you can't fix it, you can specify a suppress directive:

suppress umr foo

This instructs Purify not to show any UMR in the foo function. To see errors that are hidden by suppress directives, in the Purify GUI, from the View menu, click Suppressed. Purify allows you to be as specific or as generic as you want. Here are a few examples:

  • To suppress a UMR only in a specific call chain, you can provide a full call chain:
    suppress umr printf; foo; bar; main
  • To suppress UMRs with call chain matching a partially specified call chain, where the ellipsis (...) means any number of occurrences of functions without any restriction on the name of the function:
    suppress umr ...; foo; ...; main
  • To suppress all UMRs in a (third-party) library, where the asterisk (*) means any number of occurrences of any character:
    suppress umr "libfoo*"
  • To suppress all UMRs occurring anywhere (which is obviously very risky):
    suppress umr *

Purify processes directives specified in files named .purify in these locations and in this order:

  1. <purify-installation-home>/.purify
  2. <users-home-dir>/.purify
  3. <current-working-directory>/.purify

Depending on whether your customization is for all Purify users or for all of your projects or only one of your projects, you can edit one of those files to add your directives.

Alternatively, you can select one of the errors in the Purify GUI that you want to suppress, right-click to get the pop-up menu, and then select Suppress from the menu. That will display the dialog shown in Figure 1, where you can specify error type, call chain, and so forth. You can make a suppression permanent by specifying the appropriate .purify file location and clicking the button labeled Make permanent.

Figure 1. Purify Suppression dialog

Another directive, kill, works exactly like suppress, except that the messages are not available in the GUI, even if you use the View > Suppressed Messages menu. If an error occurs many times (thousands or tens of thousands), the program will run faster if you kill it instead of suppressing it.

Cache management

When you instrument your program, Purify also instruments all libraries that your program depends on directly or indirectly (such as libc), and binds your program to these instrumented libraries instead of the original libraries. If the directory containing the library is writable, Purify creates the instrumented version of the library in the same directory. Otherwise, it creates it in a default cache directory in the Purify installation area.

There are various build-time options related to the cache. For example, if you don't want the instrumented libraries to be littered in the various directories where the originals appear, and, instead, you want these to be stored in the cache directory, you can use the -always-use-cache-dir option. You can also specify a cache directory of your choice instead of using the default directory by using this option: -cache-dir=<dir-name>. By using these options, you can remove all instrumented libraries from the cache by simply deleting the cache directory that you have specified.

When you instrument a program, Purify instruments a library only if it has been modified since the last instrumentation or if it can't find the library's instrumented version. This helps reduce the instrumentation time. If you want to discard previously instrumented versions of the libraries, you can use the -force-rebuild build-time option, and Purify will reinstrument all needed libraries. This is helpful when you decide to use advanced instrumentation-time options (such as -static-checking-guardzone) and want to force reinstrumentation.

In general, you cannot instrument a program on one machine and then run the instrumented program on another machine. The reason is simple. Typically, your program depends on system libraries, such as libc, and these libraries could be different on different machines, based on the patch level of various operating system packages. To be sure that each program gets the instrumented libraries corresponding to the original libraries on the same system, Purify's caching mechanism organizes instrumented libraries based on host name. In addition, your program may also depend on some of your own libraries or third-party libraries that are the same on all of your machines or even accessed from the same network location. In such a situation, you may not want the cache to have multiple instrumented versions of these libraries for each host name, because these copies will be identical and will simply waste storage capacity. You can avoid this by using the repure mechanism of Purify on IBM® AIX®, Linux® and Solaris® UNIX® platforms, provided that you take care of a few things during instrumentation:

  1. First, pick a path that exists on all machines, yet the actual storage for the path is local to each machine. For example, the /tmp directory exists on all UNIX® machines, but it is local to each machine and not shared between two machines. Tip: Do not use paths such as your home directory or other locations that are visible to other machines through the Network Information Service (NIS) or the Network File System (NFS).
  2. Use the -local-cache-dir option to tell Purify to store system-specific instrumented libraries in a path such as this one:
    $ purify -always-use-cache-dir -local-cache-dir=/tmp -cache-dir=./cache \
          cc -g test.c -o test.pure
  3. You can run the instrumented program on the same machine:
    $ ./test.pure
  4. If you want to run the instrumented program on another machine, just "repurify" the program on the other machine by using repure:
    $ repure ./test.pure
  5. After using repure, you can run the program on the other machine also:
    $ ./test.pure
  6. Repeat the repure step (Step 4) for each machine on which you wish to run the instrumented program.

You can clean up old instrumented files by using the pure_remove_old_files script located in the Purify installation's home directory:

$ `purify -print-home-dir`/pure_remove_old_files <path> <days>

This command removes only instrumented copies of libraries: files whose names contain _pure_ and end with shared-library extensions like .so or .sl. For example, you can remove all instrumented files that are older than 14 days, wherever they are stored in the file system, by running:

$ pure_remove_old_files / 14

Partial Purify

Purify works by instrumenting code and inserting new instructions at interesting locations to check for invalid memory accesses. Performing these checks and maintaining the data that they need results in performance overhead. Sometimes, you may be interested in finding memory errors only in selected components of your application. For example, you may not be interested in memory errors in a third-party shared library that your application uses. On such occasions, you can selectively instrument your application and exclude the shared libraries that are not of interest to you. This mechanism lets you instrument the important components of your code, even if your application is very large, while reducing the time overhead of running the instrumented application.

Note:
Selective instrumentation works only on HP-UX and AIX platforms. There is limited support for it on Solaris SPARC and none on Solaris x86 or Linux.

When you exclude a shared library, Purify still detects heap management errors, such as memory leaks, for the memory allocated in excluded shared libraries.

Here is how you can exclude a library by using selective instrumentation:

$ purify -selective -exclude-libs=libfoo1.so:libfoo2.so cc -g app.c \
      -o a.out.pure -lfoo1 -lfoo2 -lbar

You can use the -exclude-libs option to provide a colon-separated list of libraries that you want to exclude. Alternatively, you can use an exclude directive in the .purify directive file:

exclude libfoo*

As explained earlier, the directive interprets the asterisk (*) as any number of occurrences of any character.
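
As a rough illustration of that wildcard rule, the following C sketch matches a name against a pattern in which * stands for any run of characters, including none. The function name star_match is hypothetical and this is not Purify's matching code. Whether Purify applies the pattern to the bare library name or to its full path is not addressed here, so the sketch simply compares two strings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if name matches pattern, where '*' in the
 * pattern matches any number of occurrences of any character; otherwise 0.
 * All other pattern characters must match literally. */
int star_match(const char *pattern, const char *name)
{
    const char *star = NULL;    /* position of the most recent '*' */
    const char *resume = NULL;  /* where in name that '*' started matching */

    assert(pattern != NULL && name != NULL);

    while (*name != '\0') {
        if (*pattern == '*') {
            /* Remember the star and first try matching it against nothing. */
            star = pattern++;
            resume = name;
        } else if (*pattern == *name) {
            pattern++;
            name++;
        } else if (star != NULL) {
            /* Mismatch: let the last star absorb one more character. */
            pattern = star + 1;
            name = ++resume;
        } else {
            return 0;
        }
    }

    /* Any trailing stars can match the empty string. */
    while (*pattern == '*')
        pattern++;
    return *pattern == '\0';
}
```

With this model, the pattern libfoo* matches both libfoo1.so and libfoo2.so from the earlier command line, but not libbar.so.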

The -selective option causes Purify to use more robust algorithms to detect and eliminate spurious errors that may arise due to exclusion of some libraries. For example:

  • Let's say that main() calls foo(), and foo() calls bar(), and you exclude foo() from instrumentation. Let's also assume that main() allocates memory, foo() initializes that memory, and bar() uses that memory. Given that foo() is not instrumented, Purify will not know that the memory has been initialized in foo(), and when the memory is used in bar(), it may report an uninitialized memory read (UMR) error. The -selective option tells Purify to work harder to eliminate such spurious errors.
  • Excluding a library may also result in Purify missing some errors, even in the instrumented code. Let's say that main() calls foo() and passes an uninitialized buffer to it. If foo() were instrumented, Purify would report a UMR for every use of the buffer in foo(). Now let's say that foo() is in a shared library, libfoo.so, and you exclude it. In that case, because foo() is not instrumented, it will not have any of the memory checking code that Purify inserts, and no error will be reported, even though the error lies in the instrumented code: it is the instrumented function main() that has erroneously passed an uninitialized buffer to foo().
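
The first scenario can be written down as a small C fragment. The function names follow the text, but the buffer size, its contents, and the run_scenario() stand-in for main() are made up for illustration. The code itself is correct, because the buffer is initialized before it is read. The comments mark where a spurious UMR could originate if the library containing foo() were excluded from instrumentation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 16  /* arbitrary size chosen for this illustration */

/* Part of the application, so it is instrumented. It only reads the buffer.
 * If the checker never saw the initialization, this read is where a
 * spurious UMR could be reported. */
size_t bar(const char *buf)
{
    return strlen(buf);
}

/* Imagine that this function lives in an excluded shared library.
 * It initializes the buffer and then calls bar(). */
size_t foo(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    memset(buf, 'x', size - 1);
    buf[size - 1] = '\0';
    return bar(buf);
}

/* Stand-in for main(): allocates the memory and hands it to foo(). */
size_t run_scenario(void)
{
    char *buf = malloc(BUF_SIZE);
    size_t length;

    if (buf == NULL)
        return 0;
    length = foo(buf, BUF_SIZE);
    free(buf);
    return length;
}
```

The second scenario is the mirror image: if run_scenario() skipped the allocation's initialization and foo() only read the buffer, the faulty read would happen inside the excluded library, where no checks exist.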

These two examples highlight the tradeoff between reducing the execution time overhead of the instrumented version and the comprehensiveness of error detection. Therefore, you should decide judiciously which shared libraries to exclude, by carefully examining the role that a library plays in your application and the nature of its memory-related interaction with the application.

Purify can detect buffer overrun errors even in uninstrumented code, although not when the error occurs but later, when the buffer is freed. Normally, Purify reports errors as they occur, but because it doesn't get the opportunity to insert checks in uninstrumented code, it cannot report an error there at the moment it happens. If you use the -late-detect-logic option, Purify will perform additional checks when a heap memory block is freed. If it detects a buffer overrun, it will report an Array Bound Write Late (ABWL) error.
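
One common way to detect an overrun at free time is to place a guard zone with a known byte pattern after each block and verify it when the block is released. The C sketch below models that general idea only. It is not Purify's implementation, and the names guarded_malloc and guarded_free, the guard size, and the pattern value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES   8     /* size of the guard zone (arbitrary) */
#define GUARD_PATTERN 0xAB  /* byte value written into the guard zone */

/* Header stored in front of each block. The union keeps the user pointer
 * suitably aligned for any object type. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

/* Allocates size bytes followed by a guard zone filled with the pattern.
 * (A production version would also check the size sum for overflow.) */
void *guarded_malloc(size_t size)
{
    block_header *header = malloc(sizeof *header + size + GUARD_BYTES);
    unsigned char *user;

    if (header == NULL)
        return NULL;
    header->size = size;
    user = (unsigned char *)(header + 1);
    memset(user + size, GUARD_PATTERN, GUARD_BYTES);
    return user;
}

/* Releases a block obtained from guarded_malloc(). Returns 1 if the guard
 * zone was overwritten (a late-detected overrun), 0 if it is intact. */
int guarded_free(void *ptr)
{
    block_header *header;
    const unsigned char *guard;
    int overrun = 0;
    size_t i;

    assert(ptr != NULL);
    header = (block_header *)ptr - 1;
    guard = (const unsigned char *)ptr + header->size;
    for (i = 0; i < GUARD_BYTES; i++) {
        if (guard[i] != GUARD_PATTERN)
            overrun = 1;
    }
    free(header);
    return overrun;
}
```

In this model, a write that runs one byte past the end of a block goes unnoticed when it happens, and guarded_free() returns 1 only once the block is released. That is the same trade-off the text describes: the error is found, but late.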

Heap memory management

Purify offers several options to control memory and time overhead in running an instrumented program. When your program frees a memory block, Purify does not free that memory immediately. Instead, it appends the freed block to a First In-First Out (FIFO) queue. This lets Purify detect and report dangling pointers where your program accesses memory after it gets freed (Free Memory Read/Write, or FMR/FMW errors). If Purify allowed the block to be freed directly back to the heap, the memory might be reused right away and Purify wouldn't be able to tell the difference between an invalid access to the old block and a valid access to the new block.

The default value of the queue length is 100. When the queue gets full, Purify frees the first block in the queue and adds the newly freed block to it. You can change the queue length by using the -free-queue-length=<value> option. A longer queue length extends the time that dangling pointers to freed memory will be reliably detected and reported. However, a longer free queue also causes your program to use more memory (real or virtual) because free-memory actions are delayed. Thus, longer queue lengths increase the likelihood of detecting dangling pointers, but at the cost of higher memory overhead.

The free queue is used only for small blocks, meaning those under a certain threshold. Large blocks are freed directly back to the heap to avoid the risk of using too much memory. You can use the -free-queue-threshold=<value> option to specify the threshold value for putting a memory block in the queue. The default value is 10000 bytes. Any memory block larger than this value is freed immediately.
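
The interplay of queue length and threshold can be modeled with a small circular buffer. The C sketch below is a simplified illustration of the behavior described above, not Purify's actual data structure. The type free_queue and the functions free_queue_init, deferred_free, and free_queue_destroy are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a FIFO free queue. */
typedef struct {
    void **slots;      /* circular buffer of blocks waiting to be freed */
    size_t length;     /* maximum number of queued blocks */
    size_t count;      /* number of blocks currently queued */
    size_t head;       /* index of the oldest queued block */
    size_t threshold;  /* blocks larger than this are never queued */
} free_queue;

/* Returns 0 on success, -1 if the queue storage cannot be allocated. */
int free_queue_init(free_queue *q, size_t length, size_t threshold)
{
    assert(q != NULL && length > 0);
    q->slots = malloc(length * sizeof *q->slots);
    if (q->slots == NULL)
        return -1;
    q->length = length;
    q->count = 0;
    q->head = 0;
    q->threshold = threshold;
    return 0;
}

/* Called in place of free(). Returns 1 if the block was queued (its release
 * is deferred), or 0 if it was larger than the threshold and freed at once. */
int deferred_free(free_queue *q, void *block, size_t size)
{
    assert(q != NULL);
    if (size > q->threshold) {
        free(block);
        return 0;
    }
    if (q->count == q->length) {
        /* Queue is full: release the oldest block and reuse its slot. */
        free(q->slots[q->head]);
        q->slots[q->head] = block;
        q->head = (q->head + 1) % q->length;
    } else {
        q->slots[(q->head + q->count) % q->length] = block;
        q->count++;
    }
    return 1;
}

/* Releases every block that is still queued, then the queue storage. */
void free_queue_destroy(free_queue *q)
{
    assert(q != NULL);
    while (q->count > 0) {
        free(q->slots[q->head]);
        q->head = (q->head + 1) % q->length;
        q->count--;
    }
    free(q->slots);
    q->slots = NULL;
}
```

Calling free_queue_init(&q, 100, 10000) mirrors the default length and threshold mentioned above. While a block sits in the queue, its memory cannot be handed out again, which is what lets a checker tell a stale access from a valid one. The cost is that up to length blocks of at most threshold bytes each stay allocated longer than the program asked for.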

Purify reports all memory leaks at the end of the program. If you also want to see all memory blocks that are in use, you can specify that by using the -inuse-at-exit=yes option. Similarly, to find out which file descriptors are in use at the end of the program, you can use the -fds-inuse-at-exit=yes option.

On the AIX system, if you are interested only in memory leaks, you can use the -memory-leaks-only option, and Purify will do very lightweight instrumentation, merely to detect memory leaks and other heap management errors, such as the Freeing Memory Mismatch (FMM) error. Your program will run much faster because Purify is not doing its usual verification on every memory access.

Summary

In this article, you learned about Purify options and directives and how to use them to manage your Purify cache, execution time overhead, and the heap memory footprint of an instrumented application. This will help you in customizing Purify to serve the needs of your application.
