Avoiding memory leaks in POSIX thread programming

Tips for detecting and avoiding POSIX thread memory leaks

POSIX thread (pthread) programming defines a standard set of C programming language types, functions, and constants—and pthreads provide a powerful tool for thread management. To use pthreads to the fullest, you'll want to avoid the common mistakes. One common mistake is forgetting to join joinable threads, which can create memory leaks and cause extra work. In this tips-oriented article, learn the basics of POSIX threads, see how to identify and detect thread memory leaks, and get solid advice for avoiding them.

Wei Dong Xie (xieweid@cn.ibm.com), IBM Systems Director Product Engineer, IBM

For the past 3 years, Wei Dong has worked as a Product Engineer for IBM Systems Director with the responsibility of fixing issues reported by customers. Before he joined IBM, Wei Dong did a 10-month internship at Intel as a Linux developer. In 2007, he graduated from Nanjing University, China, with an MS degree.



25 August 2010


Introduction to POSIX threads

The main reason to use threads is to boost program performance. Threads can be created and managed with less operating system overhead and fewer system resources than processes. All threads within a process share the same address space, which makes communication among threads more efficient and easier to implement than communication among processes. Threads also let work overlap: if one thread is waiting for an input/output system call to complete, the others can be working on CPU-intensive tasks. With threads, important tasks can be scheduled to take precedence over, and even interrupt, lower-priority tasks. Infrequent and sporadic tasks can be sandwiched in between regularly scheduled tasks, creating scheduling flexibility. And finally, pthreads are ideal for parallel programming on multiple-CPU machines.

And the main reason to use POSIX threads, or pthreads, is even simpler: As part of the standardized C language threads programming interface, they are highly portable.

POSIX thread programming has many benefits, but if you're not clear about some basic rules, you run the risk of writing hard-to-debug code and creating memory leaks. Let's start by reviewing POSIX threads, which can be either joinable threads or detached threads.

Joinable threads

If you want to produce a new thread and you need to know how it is terminated, then you need a joinable thread. For joinable threads, the system allocates private storage to store thread termination status. The status is updated after the thread terminates. To retrieve the thread termination status, call pthread_join(pthread_t thread, void** value_ptr).

The system allocates underlying storage for each thread, including stack, thread ID, thread termination status, and so on. This underlying storage remains in the process space (and is not recycled) until the thread has terminated and has been joined by another thread.

Detached threads

Most of the time, you just create a thread, assign some task to it, and then continue to process other affairs. In these cases, you don't care how the thread terminates, and a detached thread is a good choice.

For detached threads, the system recycles their underlying resources automatically after each thread terminates.


Recognizing leaks

If you create a joinable thread but forget to join it, its resources or private memory are always kept in the process space and never reclaimed. Always join the joinable threads; by not joining them, you risk serious memory leaks.

For example, a thread on Red Hat Enterprise Linux (RHEL4) needs a 10MB stack, which means at least 10MB is leaked if you haven't joined it. Say you design a manager-worker program to process incoming requests. More and more worker threads are created, perform their individual tasks, and then terminate. If they are joinable threads and you haven't called pthread_join() to join them, each thread leaks a sizeable amount of memory (at least 10MB for its stack) after it terminates. The amount of leaked memory keeps growing as more worker threads are created and terminated without being joined. Eventually, the process fails to create new threads because no memory is left for them.

Listing 1 shows the serious memory leak created if you forget to join joinable threads. You can also use this code to check the maximum number of thread bodies that can co-exist in one process space.

Listing 1. Creating a memory leak
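#include <stdio.h>
#include <pthread.h>

/* Thread start routine: terminates immediately. */
void *run(void *arg) {
   (void)arg;
   pthread_exit(0);
}

int main(void) {
   pthread_t thread;
   int rc;
   long count = 0;
   while (1) {
      /* Default attributes: each new thread is joinable, and none is ever joined. */
      if ((rc = pthread_create(&thread, NULL, run, NULL)) != 0) {
         printf("ERROR, rc is %d, so far %ld threads created\n", rc, count);
         perror("Fail:");   /* prints the message for errno */
         return -1;
      }
      count++;
   }
   return 0;
}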

In Listing 1, pthread_create() is called to create a new thread with the default thread attributes. By default, the newly created thread is joinable. The program keeps creating joinable threads until a failure occurs, and then prints the error code and the failure reason.

When you compile the code in Listing 1 on Red Hat Enterprise Linux Server release 5.4 with this command: [root@server ~]# cc -lpthread thread.c -o thread, and then run the resulting program, you get the results shown in Listing 2:

Listing 2. Memory leak results
[root@server ~]# ./thread
ERROR, rc is 12, so far 304 threads created
Fail:: Cannot allocate memory

After the code created 304 threads, it failed to create more. The error code is 12 (ENOMEM on Linux), which means no more memory is available.

As demonstrated in Listings 1 and 2, joinable threads are produced but never joined, so each terminated joinable thread still occupies the process space, leaking the process memory.

A POSIX thread on RHEL has a private stack with a size of 10MB. In other words, the system allocates at least 10MB of private storage for each pthread. In our example, 304 threads were produced before the process stopped; these threads occupy 304 × 10MB = 3,040MB, or around 3GB. A 32-bit process has 4GB of virtual address space, with one quarter of it reserved for the Linux kernel, which leaves 3GB for user space. Thus, the entire 3GB of user space is consumed by dead threads. That's a serious memory leak. And it's easy to see how it happened so quickly.

You can fix the leak by adding code to call pthread_join(), which joins each joinable thread.


Detecting leaks

Just as in other memory leaks, the problem may not be obvious when the process is started. So here's a way to detect such problems without needing to access source code:

  1. Count the number of thread stacks in the process. That includes the number of running active threads and terminated threads.
  2. Count the number of active running threads in the process.
  3. Compare the two. If the number of existing thread stacks is greater than the number of active running threads, and the difference between the two numbers keeps increasing as the program continues running, then memory is leaking.

And most likely, such a memory leak is caused by a failure to join the joinable threads.

Use pmap to count thread stacks

In a running process, the number of thread stacks is equal to the number of thread bodies in the process. Thread bodies consist of active running threads and dead joinable threads.

pmap is a Linux tool used to report on the process memory. Combine the following commands to get the number of thread stacks:

[root@server ~]# pmap PID | grep 10240 | wc -l

(10240KB is the default stack size on Red Hat Enterprise Linux Server release 5.4.)

Use /proc/PID/task to count active threads

Every time a thread is created and running, an entry is populated into /proc/PID/task. When the thread terminates, whether joinable or detached, the entry is removed from /proc/PID/task. So the number of active threads can be obtained by running:

[root@server ~]# ls /proc/PID/task | wc -l

Compare outputs

Check the output of pmap PID | grep 10240 | wc -l and compare it to the output of ls /proc/PID/task | wc -l. If the number of thread stacks is greater than the number of active threads, and the difference keeps growing as the program runs, you can conclude that the program is leaking thread memory.


Preventing leaks

Always join the joinable threads you create. If your program creates joinable threads, call pthread_join(pthread_t, void**) on each one so that the system can recycle the private storage allocated to it. Otherwise, you'll introduce serious memory leaks.

After programming and during the test phase, you can use pmap and /proc/PID/task to detect whether such leaks exist. If a leak exists, check the source code to confirm that every joinable thread is joined.

And that's it. A small amount of prevention will save you later work and embarrassing memory leaks.
