Memory profiling for C/C++ with IBM Rational Test RealTime and IBM Rational PurifyPlus RealTime

IBM Rational Test RealTime and IBM Rational PurifyPlus RealTime are runtime analysis tools that can be tailored to fit any OS/compiler/CPU combination. This article outlines the memory-profiling aspects of the tools and how they work with custom memory management.

Jeff Campbell, Senior IT Specialist, IBM

Jeff Campbell is a Software Engineering Specialist for the IBM Software Group at Rational software. He focuses on helping customers with modeling and testing solutions for the embedded space.



27 April 2004

This article was published in July 2003 and refers to IBM® Rational® Test RealTime™ and IBM® Rational® PurifyPlus™ v2003.

Application testing to identify memory corruption and memory leaks, among other problems, is a must today for companies that are developing software for embedded applications. Faced with challenges having to do with the runtime environment for their software -- such as difficulty finding testing tools supporting their chosen OS/compiler/CPU combination or their customized memory management system -- many of these companies take on the task of implementing their own custom memory-profiling solutions. But there's an opportunity cost to doing this: the development effort and ongoing maintenance require skilled resources that could otherwise be focused on revenue-generating feature-development activities.

Enter the IBM® Rational® Test RealTime™ and IBM® Rational® PurifyPlus RealTime™ software development tools. These tools, which can be tailored to fit virtually any OS/compiler/CPU combination, provide a capability called runtime analysis. With this capability, you can run your application on your embedded target and obtain detailed reports regarding memory usage, as well as performance, source code coverage, and UML sequence diagram tracing. Test RealTime, which offers a superset of the capabilities found in PurifyPlus RealTime, also provides component-testing and system-testing capabilities. This article focuses on the memory-profiling aspects of the tools, outlining their basic operation and how you can tailor them to fit into your own memory management scheme. For more information on the other capabilities and report details, please refer to the product information Web site.

How do PurifyPlus RealTime and Test RealTime work?

The runtime analysis technology works by instrumenting your application source code to capture the details necessary for reporting. Generally, your makefile rules are modified to add a step that instruments the code before compilation. This instrumented code (which is created as temporary files, so that the original source code isn't modified) is then compiled and linked with the tool routines that provide the runtime analysis services.

Suppose your original compilation make rule looks something like this:

$(CC) $(CFLAGS) *.c

Then the modified rule might look like this:

attolcc -BLOCK -MEMPRO -PERFPRO -TRACE -- $(CC) $(CFLAGS) *.c

Usually the modifications to the makefile rules are minor, involving only the compilation and final link rules. The makefile rules may not even need to be modified, in fact. For example, if the version of make accepts macro definitions as command line arguments, you could do something like this:

make CC="attolcc -BLOCK -MEMPRO -PERFPRO -TRACE -- cc"....

The modified rules take care of calling the instrumentation engine (attolcc1), running it against the source code, and calling your chosen compiler.

During the instrumentation stage, application source files are first preprocessed, producing new temporary source files. These preprocessed source files are then instrumented with calls into the tool routines. If your source code contains any memory allocation/deallocation calls such as malloc or free, these calls are replaced by calls to Rational's own utilities _atp_malloc and _atp_free, which enable the tools to track all memory allocated off the heap. (If you have Test RealTime installed and want to see the implementation of _atp_malloc, you can look at the file ../Test RealTime/Targets/<target port directory>/lib/atp.c.) Likewise, if your C++ code has any new or delete expressions, these are replaced by overloaded new and delete operators in the Rational utilities.
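
To make the replacement idea concrete, here is a minimal sketch of what a rewritten temporary source file might look like. It is not the implementation of _atp_malloc or _atp_free: the names demo_tracked_malloc, demo_tracked_free, demo_live_block_count, and demo_app_fill_buffer are hypothetical, and the only bookkeeping shown is a count of live blocks.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: number of blocks allocated and not yet freed. */
static long demo_live_blocks = 0;

/* Stand-in for a tracking allocator: note the allocation, then defer to malloc. */
void *demo_tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        demo_live_blocks++;
    }
    return p;
}

/* Stand-in for a tracking deallocator: note the release, then defer to free. */
void demo_tracked_free(void *p)
{
    if (p != NULL) {
        demo_live_blocks--;
    }
    free(p);
}

/* Report how many tracked blocks are still outstanding. */
long demo_live_block_count(void)
{
    return demo_live_blocks;
}

/* Application routine as it might appear in the temporary, rewritten source.
   In the original file the two calls read malloc(len) and free(buf). */
int demo_app_fill_buffer(size_t len)
{
    char *buf = demo_tracked_malloc(len);
    size_t i;

    if (buf == NULL) {
        return -1;
    }
    for (i = 0; i < len; i++) {
        buf[i] = (char)('a' + i % 26);
    }
    demo_tracked_free(buf);
    return 0;
}
```

Because every allocation and release passes through the wrappers, a nonzero value from demo_live_block_count at analysis time points to blocks that are still outstanding.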

Once the application has been successfully built, you deploy it to the target platform and run it as you normally would. For example, you may run your regular series of regression tests against the target. Data captured during this execution is transferred back to the host development machine through whatever means are available. If the target machine supports a file system and FTP services, for instance, FTP could be used to ship the file back to the host; or data could be transferred back via a debugger. There are many possibilities, and this technology can readily be adapted to fit them all. The data is stored in a file and postprocessed on the host to produce the detailed report information.


When is memory profiling actually performed?

You decide when and how frequently the memory-profiling analysis is performed. If the application terminates at some point, the default behavior is to perform memory profiling just before exiting the application. But quite often, embedded applications are never ending, and in some cases memory profiling may need to be performed periodically. You can handle this by instructing the tool during instrumentation to perform the analysis upon entry or exit from a specific routine or routines.

For example, if an application has routines called Foo and Bar and you want the memory profiling to occur just before either routine is called, you can include a DUMPINCOMING instrumentation parameter in the instrumentation/compilation rule, like this:

attolcc -DUMPINCOMING=Foo,Bar -BLOCK -MEMPRO -PERFPRO -TRACE -- $(CC) $(CFLAGS) *.c

Similarly, if you want the analysis to occur after calling Foo or Bar, you can use the parameter DUMPRETURNING:

attolcc -DUMPRETURNING=Foo,Bar -BLOCK -MEMPRO -PERFPRO -TRACE -- $(CC) $(CFLAGS) *.c

You can also use a combination of both parameters, or you can manually invoke the analysis from a debugger or target shell interface by calling the following routine:

_atl_obstools_dump


What memory errors and warnings are returned?

When the current memory profile is analyzed, information related to errors and warnings is returned. The types of application errors that memory profiling can identify are as follows:

  • FFM (freeing freed memory) -- The application is attempting to free already freed memory.
  • FUM (freeing unallocated memory) -- The application is attempting to free a block that wasn't allocated.
  • ABWL (array bounds write, late detect) -- Array bounds corruption has been detected.
  • FMWL (free memory write, late detect) -- The application is attempting to write to previously freed memory.
  • MAF (memory allocation failure)

The following are listed as warnings:

  • MLK (memory leak) -- No reference to an allocated block was found, and the block is considered leaked.
  • MPK (potential memory leak) -- A reference to an allocated block was found, but the reference points to somewhere within the block, not its head.
  • MIU (memory in use) -- The block has been allocated and not freed, and a handle to it has been found. All such blocks are listed.

How do the tools determine if memory is leaked?

As the application allocates new blocks from the heap, the tools store information about these blocks in a hash table. Later during the memory analysis, the block address(es) found in the hash table are searched for in the following areas:

  • stack data for each thread or task
  • global and statically allocated data
  • other heap-allocated blocks

If no handle to an allocated block is found in any of these areas, the block is considered "leaked."

Incidentally, when it comes to determining memory leaks, it's best to instrument as much of the application as possible to avoid reporting false positives, as you'll see in the following discussion of the areas searched. Applications that are highly componentized may be able to avoid full instrumentation and just instrument individual developer components, but this will depend on the application's architecture.

Searching the stack data

When a routine that's been instrumented is called, the tool captures the value of the current stack pointer. If it's the first time that any instrumented routine has been called during this execution run, this value is considered the top of the stack (assuming the stack grows down). During analysis, the area between the current bottom of the stack and this recorded top is searched for pointers that fall within any of the heap-allocated blocks. If a reference is found, the block is marked as referenced and won't be considered leaked.

Note that if you only partially instrument your application, you won't necessarily be searching the entire stack space for references to allocated blocks, and it's possible that some false leaks will be reported.

Searching the global and static data

When source code is instrumented, any global or static variables found in the instrumented source files are added to a list and searched later for references to heap-allocated blocks. During analysis, if data in a global or static variable matches the address of a block, the block is marked as referenced and won't be considered leaked.

Note that if you only partially instrument your application, you won't necessarily be searching all the defined global and static variables for references to allocated blocks, and some false leaks may be reported.

Searching other heap-allocated blocks

During the memory analysis, the allocated blocks listed in our hash table are searched for references to other allocated blocks. If a reference is found, the block is marked as referenced and won't be considered leaked. Again, if you only partially instrument your source code base, blocks may be allocated from the heap without having their corresponding information captured in the hash table, and this could also result in false leaks being reported.
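
The search described in these three subsections can be sketched as a word-by-word scan of a memory region. This is an illustration under stated assumptions, not the tools' implementation: the struct demo_block record, the demo_scan_region routine, and the demo_warning_for helper are hypothetical names, a plain array stands in for the hash table, and pointers are assumed to have the same size and representation as uintptr_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Strongest kind of reference found to a block so far. */
enum demo_ref_state {
    DEMO_UNREFERENCED = 0,   /* nothing points at the block          */
    DEMO_INTERIOR_REF = 1,   /* something points inside the block    */
    DEMO_HEAD_REF     = 2    /* something points at the block's head */
};

/* Hypothetical record for one tracked block. */
struct demo_block {
    uintptr_t start;             /* address of the first byte */
    size_t size;                 /* block size in bytes       */
    enum demo_ref_state state;   /* result of the scans       */
};

/* Scan one region (stack, globals, or another heap block) one word at a
   time and record every value that looks like a pointer into a block. */
void demo_scan_region(const void *region, size_t region_bytes,
                      struct demo_block *blocks, size_t block_count)
{
    const unsigned char *bytes = region;
    size_t offset, i;

    for (offset = 0; offset + sizeof(uintptr_t) <= region_bytes;
         offset += sizeof(uintptr_t)) {
        uintptr_t candidate;

        /* memcpy avoids alignment and aliasing problems. */
        memcpy(&candidate, bytes + offset, sizeof candidate);

        for (i = 0; i < block_count; i++) {
            struct demo_block *b = &blocks[i];

            if (candidate == b->start) {
                b->state = DEMO_HEAD_REF;
            } else if (candidate > b->start &&
                       candidate - b->start < b->size &&
                       b->state == DEMO_UNREFERENCED) {
                b->state = DEMO_INTERIOR_REF;
            }
        }
    }
}

/* Map a scan result onto the warning codes listed earlier in the article. */
const char *demo_warning_for(enum demo_ref_state state)
{
    switch (state) {
    case DEMO_HEAD_REF:     return "MIU";
    case DEMO_INTERIOR_REF: return "MPK";
    default:                return "MLK";
    }
}
```

Calling demo_scan_region once per searched area (each stack, the global and static data, and each tracked block) leaves every record with the strongest reference found. The sketch also shows why partial instrumentation produces false leaks: a region that is never scanned cannot contribute references.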


How do the tools detect free queue errors?

Sometimes when an application releases memory back to the heap, not all pointers to the released block have been set to NULL, and the freed block is later improperly accessed by the application. There are two possibilities: the application could attempt to write to this freed memory area, or it could attempt to read from it. Test RealTime and PurifyPlus RealTime can help to uncover these problems. I'll describe how the tools work and what happens in either situation.

When an application attempts to release a block, the instrumentation causes the block to be placed on a free queue and painted with a known pattern -- for example, 0xdeadbeef. The size of the free queue is customizable, in terms of both bytes and number of elements. As an example, the maximum free queue size could be set to 1 MB and 1000 elements. When either limit is reached, the oldest block(s) will be released back into the heap until the free queue size is once again below the threshold.

If the application attempts to write to a block on the free queue, it will most likely disturb the painted pattern in the block. Later, during the memory analysis, an FMWL (free memory write, late detect) error will be reported to indicate that this block has been illegally written, along with the call stack recorded for the free operation.

If the application attempts to read from a block that's been placed on the free queue, the application could crash because the block has been filled with our pattern -- for example, 0xdeadbeef 0xdeadbeef. In the event that the application does a core dump, you could open the core file with a debugger and check the value of accessed variables or pointers to see if it matches the fill pattern. If it matches, you know the application is likely accessing an invalid freed memory address.
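
The free queue mechanism can be sketched in a few routines. This is a simplified illustration, not the tools' code: all demo_ names and the limits are hypothetical, a single paint byte stands in for a word pattern such as 0xdeadbeef, and the caller passes the block size explicitly, whereas a real tool would look it up in its own records.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAINT_BYTE  0xDD   /* hypothetical fill value              */
#define DEMO_QUEUE_SLOTS 4      /* maximum number of queued blocks      */
#define DEMO_QUEUE_BYTES 256    /* maximum total bytes held in the queue */

struct demo_freed {
    unsigned char *ptr;
    size_t size;
};

static struct demo_freed demo_queue[DEMO_QUEUE_SLOTS];  /* oldest block first */
static size_t demo_count = 0;   /* blocks currently queued */
static size_t demo_bytes = 0;   /* bytes currently queued  */

/* Release the oldest queued block back to the heap for real. */
static void demo_evict_oldest(void)
{
    demo_bytes -= demo_queue[0].size;
    free(demo_queue[0].ptr);
    memmove(&demo_queue[0], &demo_queue[1],
            (demo_count - 1) * sizeof demo_queue[0]);
    demo_count--;
}

/* Deferred free: make room if a limit would be exceeded, then paint the
   block and place it on the queue instead of releasing it. */
void demo_deferred_free(void *ptr, size_t size)
{
    if (ptr == NULL) {
        return;
    }
    if (size > DEMO_QUEUE_BYTES) {   /* too large to hold back at all */
        free(ptr);
        return;
    }
    while (demo_count > 0 &&
           (demo_count == DEMO_QUEUE_SLOTS ||
            demo_bytes + size > DEMO_QUEUE_BYTES)) {
        demo_evict_oldest();
    }
    memset(ptr, DEMO_PAINT_BYTE, size);
    demo_queue[demo_count].ptr = ptr;
    demo_queue[demo_count].size = size;
    demo_count++;
    demo_bytes += size;
}

/* Late detection: count queued blocks whose paint has been disturbed. */
size_t demo_count_free_memory_writes(void)
{
    size_t i, j, hits = 0;

    for (i = 0; i < demo_count; i++) {
        for (j = 0; j < demo_queue[i].size; j++) {
            if (demo_queue[i].ptr[j] != DEMO_PAINT_BYTE) {
                hits++;
                break;
            }
        }
    }
    return hits;
}

/* Number of blocks currently held on the queue. */
size_t demo_queued_blocks(void)
{
    return demo_count;
}
```

The sketch also shows the limit of late detection: once a block has been evicted from the queue and returned to the heap, a stale write to it can no longer be recognized, which is one reason the queue size is worth tuning.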


How do the tools track and help detect corruption?

Test RealTime and PurifyPlus RealTime track heap memory usage and can identify corruption on the heap, but these tools aren't intended to identify corruption on the stack. With that caveat, I'll describe how the tools track corruption on the heap and also suggest some strategies for using the tools to detect corruption.

When the application needs to allocate a block and one of Rational's allocation routines is called, the routine reserves a little extra memory at the beginning and the end of the allocated block, in what we call the red zones, like this:

  • Red Zone
  • User Data
  • Red Zone

Then the red zones are monitored to ensure that your application isn't corrupting memory by writing outside the allotted block space. The red zone size can be tuned for your application. If you have a large amount of heap available, you may want to set the red zone to a relatively large value in order to increase the likelihood that if the application corrupts data, it corrupts a red zone area. If there's not much extra heap space available, you'll have to set the red zone to a lower value.
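
As a rough illustration of the red zone layout, the following sketch pads each allocation with a painted guard area on both sides and checks the paint afterward. The names, the 16-byte zone size, and the fill value are all hypothetical choices for this example. They are not the values or routines used by the tools.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, tunable guard size. A multiple of 16 keeps the user data
   suitably aligned on common platforms. */
#define DEMO_RED_ZONE_BYTES 16
#define DEMO_RED_ZONE_FILL  0xAB   /* hypothetical guard pattern */

/* Allocate user_size bytes with a painted red zone on each side.
   Returns a pointer to the user data, or NULL on failure. */
void *demo_guarded_alloc(size_t user_size)
{
    unsigned char *raw = malloc(user_size + 2 * DEMO_RED_ZONE_BYTES);

    if (raw == NULL) {
        return NULL;
    }
    memset(raw, DEMO_RED_ZONE_FILL, DEMO_RED_ZONE_BYTES);
    memset(raw + DEMO_RED_ZONE_BYTES + user_size,
           DEMO_RED_ZONE_FILL, DEMO_RED_ZONE_BYTES);
    return raw + DEMO_RED_ZONE_BYTES;
}

/* Late-detect check: returns 1 if either red zone has been overwritten. */
int demo_guarded_is_corrupt(const void *user_ptr, size_t user_size)
{
    const unsigned char *user = user_ptr;
    const unsigned char *lead = user - DEMO_RED_ZONE_BYTES;
    const unsigned char *trail = user + user_size;
    size_t i;

    for (i = 0; i < DEMO_RED_ZONE_BYTES; i++) {
        if (lead[i] != DEMO_RED_ZONE_FILL || trail[i] != DEMO_RED_ZONE_FILL) {
            return 1;
        }
    }
    return 0;
}

/* Release the whole enlarged block, including both red zones. */
void demo_guarded_free(void *user_ptr)
{
    if (user_ptr != NULL) {
        free((unsigned char *)user_ptr - DEMO_RED_ZONE_BYTES);
    }
}
```

A write one byte past the end of a 10-byte block lands in the trailing red zone and is reported the next time demo_guarded_is_corrupt runs. A write that jumps beyond the red zone altogether goes unnoticed, which is why a larger zone raises the odds of detection.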

Block corruption is checked for during the memory analysis stage, and as such it's a late detect style -- that is, the corruption is identified during the analysis, after it's occurred. This is a compromise between providing detailed corruption information and maintaining a relatively small application size and high speed. If these tools were to detect corruption as it occurred, this extremely intrusive behavior would greatly increase the application footprint and slow overall performance. While this might be tolerable on a workstation or server where you can add more memory and/or run on a faster box, embedded targets generally don't have those luxuries.

Finding the source of corruption is a hard job, but it's made a lot easier with the information that Test RealTime and PurifyPlus RealTime can collect from just a single run. A useful strategy is to start by calling the analysis routine _atl_obstools_dump several times during application execution to set bounds around the corruption occurrence. If you can identify a point during the execution when no corruption has occurred and later on another point where there's corrupted data, you can look more closely to understand what's happening between those points. You can further subdivide this range with the analysis routine to pinpoint where the corruption occurred.

The code coverage and tracing capabilities of Test RealTime and PurifyPlus RealTime come in very handy here as well. With tracing enabled, the tools produce detailed UML sequence diagrams during the execution that show the functions and/or methods invoked in the application. Think of the tracing as a pitchfork to help dig in a haystack. Once you identify candidate routines requiring further investigation, you can employ the code coverage reports.

In "test-by-test" mode, you can select which test data is included in a coverage report. Here, a "test" can refer to an invocation of the analysis routine _atl_obstools_dump. This means you can deselect all the coverage analysis data from application startup to the last analysis dump before corruption, as well as the coverage data from analysis dumps after the corruption occurred. This will focus attention on just the code that was executed between the known stable period and the first analysis dump following the corruption.


How do the tools enable custom memory management?

What if your application isn't using standard interfaces like malloc, free, or new and delete? Sometimes developers do a large malloc early during initialization and then manage the parceling of memory themselves. Other common scenarios involve organizing the heap into memory pools or allocating blocks in fixed-size chunks.

A set of APIs in Test RealTime and PurifyPlus RealTime provides you with a tailored memory-profiling solution. You can create a new set of memory allocation/deallocation routines and insert calls to our APIs within those routines.

For example, let's suppose that you wanted to track the memory allocated by your proprietary routine:

MemAlloc(int size, int mempool)

You could create a similarly named routine -- for example, RTRT_MemAlloc(int size, int mempool) -- that would contain calls to the RealTime APIs to track the memory usage as well as a call to the proprietary MemAlloc routine. Anywhere the application attempted to call MemAlloc, that would be replaced with a call to RTRT_MemAlloc. This replacement would be done during the instrumentation. Some #define statements would be included in the instrumented application code, as shown:

#define MemAlloc   RTRT_MemAlloc
#define MemFree   RTRT_MemFree

With these definitions in place, here's a sample of what could be linked into the final executable:

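void *RTRT_MemAlloc(int requested_size, int mempool)
{
   void *offset_block_ptr, *grossed_up_block_ptr;
   int  grossed_up_size;

   /* Enlarge the request to leave room for the red zones. */
   grossed_up_size = _PurifyLTHeapActualSize(requested_size);

   /* Allocate the enlarged block with the proprietary allocator. */
   grossed_up_block_ptr = MemAlloc(grossed_up_size, mempool);

   /* Record the block and obtain the pointer to hand back to the caller. */
   offset_block_ptr = _PurifyLTHeapAction(_PurifyLT_API_ALLOC,
                                          grossed_up_block_ptr, requested_size, 0);

   return offset_block_ptr;
}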

The routines _PurifyLTHeapAction and _PurifyLTHeapActualSize used above are a couple of the RealTime APIs. The call to _PurifyLTHeapAction is responsible for storing information about the allocated block. Later we'll perform some analysis to understand if this block has been corrupted or leaked. We increased the size of the block to be allocated with the call to _PurifyLTHeapActualSize in order to add red zones around it.

Similarly, any call to MemFree in the application code would be replaced with a call to the RTRT_MemFree routine. Here's a sample of what could be linked into the final executable:

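STATUS RTRT_MemFree(int mempool, char *usr_block_ptr)
{
   /* Drop the block's record, recover the start of the grossed-up block,
      and release it with the proprietary deallocator. */
   return MemFree(mempool,
                  (char *)_PurifyLTHeapAction(_PurifyLT_API_FREE, usr_block_ptr, 0, 0));
}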

In this code excerpt, the call to _PurifyLTHeapAction is responsible for removing information about the block from our internal hash table and offsetting back to the beginning of the grossed-up block before releasing with the call to MemFree.


Conclusion

The memory-profiling capabilities provided with Test RealTime and PurifyPlus RealTime can be very helpful in identifying cases of memory corruption, memory leaks, and improper accesses to freed memory. The technology has been developed with flexibility in mind, enabling it to adapt to a variety of instrumentation and execution challenges.

If this is compelling, consider how the same features in Test RealTime can be applied to unit testing software components. Waiting until integration testing to find problems can result in costly delays, and it may be very difficult to test all the paths of execution in your application. Gain greater confidence in the code by finding the problems early, while they are still relatively cheap to fix.
