Using dbx malloc subcommands on AIX to pinpoint a memory leak

Problem:

A memory leak is a condition where a process allocates memory but does not free it when it is done with it. This becomes a problem for a long-running process such as a daemon, because the excess memory usage builds up over time.
This can result in core dumps or unexpected behaviour of the process, or it can cause a performance issue if the excessive memory usage makes the system page memory out to paging space.

Problem Solution:

If the process generates a core dump, we need to look at the malloc statistics in dbx.

In this example, I have a program called 'leaktest' that is designed to leak memory. The program simply allocates 1MB repeatedly without freeing it:

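---SAMPLE PROGRAM---
#include <stdlib.h>   /* malloc */
#include <string.h>   /* strcpy */

void func1(void);
void func2(void);

int main(void) {
    while (1) func1();   /* loop forever; every pass leaks 1MB */
}

void func1(void) {
    func2();
}

void func2(void) {
    char *str;
    str = (char *) malloc(1024*1024);   /* never freed, result not checked */
    strcpy(str, "testing");
}
---END---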

Running that, the process cores rather quickly. Then load the core in dbx:

# dbx ./leaktest ./core
Type 'help' for help.
[using memory image in ./core]
reading symbolic information ...

Segmentation fault in . at 0xfc04
0x0000fc04 warning: Unable to access address 0xfc04 from core

Inside dbx, run the 'malloc' subcommand to check memory usage statistics:

(dbx) malloc
The following options are enabled:

Implementation Algorithm........ Default Allocator (Yorktown)

Statistical Report on the Malloc Subsystem:
Heap 0
heap lock held by................ UNLOCKED
bytes acquired from sbrk()....... 267456272
bytes in the freespace tree...... 65312
bytes held by the user........... 267390960
allocations currently active..... 255
allocations since process start.. 255

The Process Heap
Initial process brk value........ 0x20000680
current process brk value........ 0x2ff11590
sbrk()s called by malloc......... 241

That tells me that the process currently has ~256MB of memory allocated on the heap (see 'bytes held by the user'), and that there is not much left on the freespace tree for new allocations. Also, since the start of this process, 255 memory allocations have been made, and they are all currently active; none of them have been freed.

To determine whether this amount of memory usage is excessive, you need to know the expected memory usage of the process you are examining; some processes need to allocate and hold a large amount of memory. You also need to know the process limits and the process MAXDATA value in order to determine whether the core dump was likely due to running out of memory on the heap.

The process limits can be checked inside of dbx with 'proc rlimit':

(dbx) proc rlimit
rlimit name: rlimit_cur rlimit_max (units)
RLIMIT_CPU: (unlimited) (unlimited) sec
RLIMIT_FSIZE: (unlimited) (unlimited) bytes
RLIMIT_DATA: (unlimited) (unlimited) bytes
RLIMIT_STACK: 4294967296 4294967296 bytes
...etc...

In this case, the data limit is set to unlimited, so there is no constriction on usable heap memory there.

The MAXDATA value can be checked outside of dbx, with the 'dump' command:

# dump -Xany -ov ./leaktest

./leaktest:

***Object Module Header***
# Sections  Symbol Ptr  # Symbols  Opt Hdr Len  Flags
         5  0x00000b2c        149           72  0x1002
Flags=( EXEC DYNLOAD DEP_SYSTEM )
Timestamp = "Aug 21 12:03:48 2018"
Magic = 0x1df (32-bit XCOFF)

***Optional Header***
Tsize       Dsize       Bsize       Tstart      Dstart
0x0000041c  0x00000110  0x00000004  0x10000150  0x2000056c

SNloader    SNentry     SNtext      SNtoc       SNdata
0x0004      0x0002      0x0001      0x0002      0x0002

TXTalign    DATAalign   TOC         vstamp      entry
0x0007      0x0003      0x20000634  0x0001      0x20000620

maxSTACK    maxDATA     SNbss       magic       modtype
0x00000000  0x00000000  0x0003      0x010b      1L

In this case, we see that 'leaktest' is a 32-bit process with a MAXDATA value of 0x00000000.
MAXDATA of 0x00000000 is the default setting, which means for a 32-bit process, the heap and stack share a single 256MB memory segment (the 0x2 segment). The heap starts at the lower end of the 0x2 segment, at around 0x20000000, and works its way up. The stack starts at the upper end of the segment, around 0x2fffffff, and works its way down. They are in danger of colliding if either grows too large.
MAXDATA of 0x10000000 means the heap and stack are separated into different segments (stack in 0x2 and heap in 0x3), and that the heap is allowed a full 256MB segment to itself. MAXDATA of 0x20000000 means that the heap and stack are separated, and that the heap gets two 256MB segments to itself (0x3 and 0x4), and so on.

For this example, with the MAXDATA value for 'leaktest' at the default, I can see that the process has used up the entire 256MB available in segment 0x2 while allocating memory, and either 1) the heap has crashed into the stack and cored because of that, or 2) it failed to allocate more memory once it reached the limit, and cored when trying to work on the resulting null memory pointer. As long as this process is not expected to use this much memory, I can be confident that a leak has occurred.

64-bit processes do not face the same constraints or need to set MAXDATA, since the addressing space is much larger. The process data limit would be the constricting factor in memory usage.

If I want to view the memory allocation table for this process, which shows all of the active (not yet freed) allocations that the process has made, I can do so in dbx with the 'malloc allocation' subcommand.

For example:

(dbx) malloc allocation
Allocations Held by the Process:

ADDRESS         SIZE       HEAP    ALLOCATOR
0x20000688      1048584    0       YORKTOWN
0x20100698      1048584    0       YORKTOWN
0x202006a8      1048584    0       YORKTOWN
0x203006b8      1048584    0       YORKTOWN
0x204006c8      1048584    0       YORKTOWN
0x205006d8      1048584    0       YORKTOWN
0x206006e8      1048584    0       YORKTOWN
...etc...

That tells me that a memory allocation exists at address 0x20000688, that it is 1MB in size, that it is on Heap 0, and that it was allocated using the default Yorktown memory allocator. Another allocation exists at 0x20100698, and so on.
This information is not particularly useful in our diagnosis; all it does is make us aware that there were many 1MB allocations active at the time of the core dump. It does not tell me what made these allocations.

More useful information can be found by turning on MALLOCDEBUG memory allocation logging.

The settings I typically use are:

export MALLOCDEBUG=log:extended,stack_depth:20
<start process>
unset MALLOCDEBUG

The 'stack_depth' can be adjusted as needed - a larger value requires more memory overhead, but is able to show more elements of the stack, in case the process has the potential for a deep stack. The 'extended' option is not necessarily needed, but provides extra information such as the time the allocation was made and the ID of the thread that made the allocation, which could be useful in a multithreaded application.

With my MALLOCDEBUG settings shown above, the dbx 'malloc allocation' output gives much more useful information.

(dbx) malloc allocation
Allocations Held by the Process:

ADDRESS        SIZE  HEAP      PID   PTHREAD_T   CLOCKTIME  SEQ  STACK TRACEBACK
0x2001f688  1048576     0  6619390  0x00000000  1534872877    0  0xd01b1854 malloc_common_debugging
                                                                 0xd01111ec init_malloc
                                                                 0xd0112f4c malloc
                                                                 0x1000039c func2
                                                                 0x10000410 func1
                                                                 0x10000450 main
                                                                 0x100001bc __start
0x2011f698  1048576     0  6619390  0x00000000  1534872877    1  0xd01b1854 malloc_common_debugging
                                                                 0x1000039c func2
                                                                 0x10000410 func1
                                                                 0x10000450 main
                                                                 0x100001bc __start
0x2021f6a8  1048576     0  6619390  0x00000000  1534872877    2  0xd01b1854 malloc_common_debugging
                                                                 0x1000039c func2
                                                                 0x10000410 func1
                                                                 0x10000450 main
                                                                 0x100001bc __start
0x2031f6b8  1048576     0  6619390  0x00000000  1534872877    3  0xd01b1854 malloc_common_debugging
                                                                 0x1000039c func2
                                                                 0x10000410 func1
                                                                 0x10000450 main
                                                                 0x100001bc __start
...etc...

This tells me that each of these 1MB allocations was made when the main() function called func1(), which then called func2(), which then performed the memory allocation. Knowing this, we can now decide: should func2() be freeing the memory that it has allocated? Or does it pass this memory address back to one of the previous functions in the stack, which is then responsible for freeing the memory once it is done with it?

This is a very simple example, since there are allocations happening in only one spot, and we can identify the culprit with a quick glance. In reality, more effort will be needed in order to identify the offending stack trace or traces responsible for a leak.

To make my example a little more life-like, here is the new test program:

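---SAMPLE PROGRAM---
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* strcpy */

void func1(void);
void func2(void);

int main(void) {
    char *a, *b, *c, *d;
    a = (char *) malloc(128);
    b = (char *) malloc(64);
    c = (char *) malloc(128);
    d = (char *) malloc(256);
    while (1) func1();
    /* Never reached: the loop above does not exit. */
    free(a); free(b); free(c); free(d);
}

void func1(void) {
    func2();
}

void func2(void) {
    char *str;
    str = (char *) malloc(1024*1024);   /* leaked: never freed */
    strcpy(str, "testing");
}
---END---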

This is the same as above, but I've added four 'non-leaked' allocations (a, b, c, and d), which would be freed if we were to get past the 'while' loop. This is still a very simple example, but it will work to get my point across.

Reference:

https://www.ibm.com/developerworks/aix/library/au-mallocdebug.html

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Author: Tilak Nayak
Operating System: AIX and VIOS
Hardware: Power
Feedback: aix_feedback@wwpdl.vnet.ibm.com,tilak-r.nayak@in.ibm.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
