Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

IBM Linux Scholar Challenge: Phil and Matt's story

Threadpool: A reusable pool of general purpose threads

Phil Allen (allenpd@swirve.com), Student, Clarkson University
Phil Allen grew up in Malone, NY, and enrolled at Clarkson University in the fall of 1998. At Clarkson, he was a member of numerous organizations including student government, the school paper, and the Clarkson University Honors Program, and was one of the winners of the IBM Linux Scholar Challenge in 2001. He graduated in the Spring of 2002 and will be living in Seattle working as a programmer. You can contact Phil at allenpd@swirve.com.
Matthew Finlayson (finlayms@clarkson.edu), Student, Clarkson University
Matthew Finlayson is a graduate of the Computer Science program at Clarkson University. Currently he is interning at IBM in the Linux Testing and Integration Center for the summer before returning to Clarkson University to pursue his Master's Degree in Computer Science. Aside from his four-year long obsession with the GNU/Linux operating system, he participates in several other hobbies. These include going to the gym, hiking, beating his roommates at Tony Hawk3, playing basketball, and playing the occasional game of chess. You can contact Matt at finlayms@clarkson.edu.

Summary:  This article describes Phil Allen and Matthew Finlayson's winning entry to the IBM Linux Scholar Challenge: a development library that implements a thread pool. This thread pool lets you more quickly create a 'work crew' in a multithreaded application. It takes generally short-lived thread uses and amortizes the cost of creating and destroying the thread over many uses. This article shows that, though programming with threads is tough, some parts can be made relatively easy.

Date:  01 Jun 2002
Level:  Introductory

Activity:  2743 views
Comments:  

Taking the plunge!

Matt: Before going to Clarkson University, neither of us had experience with Linux. At a Clarkson ACM open house, I saw a Linux demo. At the time, I had no idea what it really was. After I started attending Clarkson, I installed and began using Linux. Beginning with basic skills and then moving to more difficult portions, I became proficient in many parts of Linux and played around with things that interested me. Sometime during our freshman year, Phil stopped by my room and saw Linux for the first time. One night I stopped by his room and installed it for him, allowing him to experiment with it.

We both took computer science courses and were in Operating Systems in the spring of our junior year. The class covers a lot of concepts that are fundamental to understanding how a computer works, and it is among the most necessary courses for a computer scientist. Near the end of the course, one of the professors proposed holding a graduate-level seminar in the fall of 2001 relating to operating systems. Phil and I joined the class, along with a few others.

Advanced Topics in Operating Systems was both challenging and rewarding. We had to read many articles about the development of computing, starting with systems like Mach, Hydra, and Multics and ending with Windows NT, VMWare, and AFS. The class covered operating systems, communications, distributed systems, file systems, and others. As a part of the course, all of us signed up for the IBM Linux Scholar Challenge, and eventually we all submitted entries.


Diving into the thread pool

One of the most difficult things about the IBM Linux Scholar Challenge was deciding on a topic. Making modifications to AFS was a possibility; user resource tracking, an excellent idea, was already taken. In the end, Phil and I ended up taking an old homework assignment and revamping it. Instead of tackling a huge problem, we tackled a small one -- but made sure we did it well.

The original assignment was to develop a thread pool, which is a data structure that maintains different threads of execution. A set number of threads are generated when the thread pool is created, and then tasks are sent to the pool through function calls. Each task sent to the thread pool (with a call of dispatch) is executed by one of the threads. If there are more tasks than threads, the requests are placed in a queue in order to be serviced. If there are fewer tasks than threads, idle threads wait on a signal.

This makes a thread pool a 'work crew' of sorts, where each thread is a member of the crew, and the work crew, as a whole, does work for the main thread of the program. The threads themselves do not die, even when they finish a task. This is an advantage because there is a time cost to creating a thread; instead of paying that cost once for each task, the cost is spread over many tasks.


The thread pool library

We built the thread pool library to satisfy the following requirements:

  • It must execute all dispatched functions exactly once. The quality of service requirement (QoS) is that each request be serviced at some point.
  • When dispatch is called, the work to be done is placed into a queue and dispatch returns. The only time that dispatch blocks is when the queue uses at least 64k of heap space and is full; in that case, the act of adding information to the queue will block. This prevents the program from crashing due to lack of heap space.
  • The thread pool cannot have deadlock conditions. Threads can cause deadlock, but the thread pool package itself should not.
  • The thread pool should have no busy-wait loops. A busy-wait loop has the tendency to be slow; in some cases, a two-phase wait is faster than blocking (see Resources); however, programming around busy-wait loops is just as effective.
  • It must include a simple, clean API to allow rapid and painless adoption by programmers.

The only additional requirement on the thread pool is that the programmer cannot execute thread pool functions through the dispatch of thread pool. Any such function calls will immediately return with no code executed.

We used Linux, the C programming language, and the pthreads library to build our library of functions. The functions can be used by other programmers who would like a thread pool in their program, but don't want to write their own. All of the synchronization issues are already taken care of if you use our thread pool, plus it's already written and tested. If you are writing a program that uses a group of threads as a work crew, using our thread pool could save you considerable time and headache. The actual API can be found below (some lines in the example have been split).


The thread pool API

typedef void *threadpool;
typedef void (*dispatch_fn)(void *);

/**
 * create_threadpool creates a fixed-sized thread
 * pool.  If the function succeeds, 
 * it returns a (non-NULL)
 * "threadpool", else it returns NULL.
 */
threadpool create_threadpool(int num_threads_in_pool);

/**
 * dispatch sends a thread off to do some work.  If
 * all threads in the pool are busy, dispatch will
 * block until a thread becomes free and is dispatched.
 * 
 * Once a thread is dispatched, this function returns
 * immediately, except in cases of a heavily loaded server.
 * 
 * The dispatched thread calls into the function
 * "dispatch_to_here" with argument "arg".
 */
void dispatch(threadpool from_me, 
    dispatch_fn dispatch_to_here, void * arg);

/**
 * Works the same as dispatch, but enables the user to
 *  define cleanup
 * handlers in cases of immediate cancel.
 * The cleanup handler function
 * (cleaner_func) is executed automatically
 * with cleaner_arg as the argument
 * after the dispatch_to_here function is done;
 * in the case that
 * destroy_threadpool_immediately is called,
 * cleaner_func is
 * also called before the threadpool exits.
 */
void dispatch_with_cleanup(threadpool from_me,
     dispatch_fn dispatch_to_here,
     void * arg, dispatch_fn cleaner_func,
     void* cleaner_arg);

/**
 * destroy_threadpool kills the threadpool, causing
 * all threads in it to commit suicide, and then
 * frees all the memory associated with the threadpool.
 */
void destroy_threadpool(threadpool destroyme);

/**
 * destroy_threadpool_immediately cancels all threads
 * in the threadpool
 * immediately.  It is potentially dangerous to use with
 * libraries that are not
 * specifically asynchronous cancel thread safe.
 * 
 * Also, without cleanup
 * handlers, any dynamic memory or system resource in use
 * could potentially be left without a reference to it.
 */
void destroy_threadpool_immediately(threadpool destroymenow);


Using the thread pool

The single best example of using the thread pool is a multithreaded server. It fits the use of a work crew the best: it can service many short requests, and there can be many of them. What is typical in a single-threaded server is a code block that looks like (in pseudocode):


Single-threaded server example

while(server_is_running)
{
  requestnum=wait_for_request	
  process_requst(requestnum)
}

There is a lot of waste in this, though. I/O is typically very slow, so running multiple requests in parallel can be advantageous. A typical server that simply spawns threads would have a code block like this:


Multithreaded server example

while(server_is_running)
{
  requestnum=wait_for_request()
  thread_dispatch(process_requst,requestnum)
}

To convert a server like this to a server using a thread pool is a matter of only a few steps:

  1. At the top of the file, add #include "threadpool.h".
  2. At the top of the function running the while loop shown above, declare a thread pool.
  3. Add a line just before the while loop that calls create_threadpool(N), where N is the number of threads to exist in the thread pool.
  4. Change the line that states thread_dispatch to state dispatch(name_of_threadpool, process_request, requestnum).
  5. After the while loop, add a line that calls destroy_threadpool to clean up the thread pool.

There are two distinct advantages to this. First, the cost of creating each thread is amortized over the life of the server. Second, because there is a limit on the number of threads that will then exist, a lot of time will be wasted with context switches between the threads. The code example from above, converted to use the thread pool, is:


Thread pool server example

threadpool tp;
.
.
.
tp = create_threadpool(10);
while(server_is_running)
{
  requestnum = wait_for_request
  dispatch(tp,process_requst,requestnum);
}

destroy_threadpool(tp);


Testing

We tested the thread pool using a very basic server application; it accepted TCP/IP requests, did an amount of work set at the server creation, and then responded to the request. The clients were simple applications that made requests to the server and then waited for a response. If the server dropped a request, it would close down its connection and start a new request. The network connecting the test machines was a switched ethernet running at a minimum of 100 Mbps (we conducted tests at night to minimize the effect of other traffic). The time to send one packet across this connection was 2.230ms.

Our tests ranged from I/O-intensive to computation-intensive applications. In all cases, the addition of threads to the thread pool showed superior performance to the monolithic implementation.


Test data: Results with varying computational and I/O loads

Loops:      1  Threads:  1 Dispatches/sec:    526.660
Loops:    100  Threads:  1 Dispatches/sec:    533.225
Loops:   1000  Threads:  1 Dispatches/sec:    267.247
Loops:  10000  Threads:  1 Dispatches/sec:     55.794
Loops: 100000  Threads:  1 Dispatches/sec:      6.596
Loops: 500000  Threads:  1 Dispatches/sec:      1.325
Loops:      1  Threads:  2 Dispatches/sec:    666.329
Loops:    100  Threads:  2 Dispatches/sec:     16.175
Loops:   1000  Threads:  2 Dispatches/sec:    247.148
Loops:  10000  Threads:  2 Dispatches/sec:     64.766
Loops: 100000  Threads:  2 Dispatches/sec:      6.664
Loops: 500000  Threads:  2 Dispatches/sec:      1.328
Loops:      1  Threads:  4 Dispatches/sec:    575.043
Loops:    100  Threads:  4 Dispatches/sec:    966.688
Loops:   1000  Threads:  4 Dispatches/sec:    381.330
Loops:  10000  Threads:  4 Dispatches/sec:     65.346
Loops: 100000  Threads:  4 Dispatches/sec:      6.775
Loops: 500000  Threads:  4 Dispatches/sec:      1.375
Loops:      1  Threads:  8 Dispatches/sec:    616.021
Loops:    100  Threads:  8 Dispatches/sec:    339.300
Loops:   1000  Threads:  8 Dispatches/sec:    304.173
Loops:  10000  Threads:  8 Dispatches/sec:     67.703
Loops: 100000  Threads:  8 Dispatches/sec:      6.781
Loops: 500000  Threads:  8 Dispatches/sec:      1.433
Loops:      1  Threads: 16 Dispatches/sec:    617.017
Loops:    100  Threads: 16 Dispatches/sec:    496.465
Loops:   1000  Threads: 16 Dispatches/sec:    346.666
Loops:  10000  Threads: 16 Dispatches/sec:     61.634
Loops: 100000  Threads: 16 Dispatches/sec:      6.854
Loops: 500000  Threads: 16 Dispatches/sec:      1.505
Loops:      1  Threads: 32 Dispatches/sec:   1053.896
Loops:    100  Threads: 32 Dispatches/sec:    381.208
Loops:   1000  Threads: 32 Dispatches/sec:    313.846
Loops:  10000  Threads: 32 Dispatches/sec:     89.955
Loops: 100000  Threads: 32 Dispatches/sec:      8.458
Loops: 500000  Threads: 32 Dispatches/sec:      2.130

For a complete description of the machines used for testing and the results of our tests, see Resources.


Conclusion

Our thread pool provides a simple interface for an application writer to create multithreaded applications. People tend to implement their own thread pools when they need them. However, it is very difficult to get a thread pool right without extensive testing and proofs of correctness. Reimplementing complicated data structures can be a quick and easy way of introducing bugs to any program. Therefore, we believe a solid, fully tested version of a thread pool is a valuable asset to the open-source community.


Acknowledgements

We would like to thank Steve Gribble of the University of Washington for many things within our project; the APIs, Makefile, and testing framework are based upon projects used in his CSE451 class.

We would also like to thank Dwight Tuinstra for his indispensible aid in implementing the thread pool; his attention to detail is astounding, and, someday, we hope to code like him.

We would also like to thank Matt Sabins and Jeanna Matthews (also mentioned below) for their suggestion of queueing the requests. This queueing enabled the thread pool to be faster than the monolithic model.

Further, we would like to thank the entire CS644 class at Clarkson University for reviewing the code, giving us ideas, making jokes, and generally making the course an interesting and enjoyable learning experience.

Finally, we would like to thank Jeanna Matthews for offering the CS644 course; informing the class about the IBM Linux Scholar Challenge (where 66% of the class won); showing us the idea for a thread pool; and simply being Jeanna.


Resources

About the authors

Phil Allen grew up in Malone, NY, and enrolled at Clarkson University in the fall of 1998. At Clarkson, he was a member of numerous organizations including student government, the school paper, and the Clarkson University Honors Program, and was one of the winners of the IBM Linux Scholar Challenge in 2001. He graduated in the Spring of 2002 and will be living in Seattle working as a programmer. You can contact Phil at allenpd@swirve.com.

Matthew Finlayson is a graduate of the Computer Science program at Clarkson University. Currently he is interning at IBM in the Linux Testing and Integration Center for the summer before returning to Clarkson University to pursue his Master's Degree in Computer Science. Aside from his four-year long obsession with the GNU/Linux operating system, he participates in several other hobbies. These include going to the gym, hiking, beating his roommates at Tony Hawk3, playing basketball, and playing the occasional game of chess. You can contact Matt at finlayms@clarkson.edu.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Sample IT projects
ArticleID=10204
ArticleTitle=IBM Linux Scholar Challenge: Phil and Matt's story
publish-date=06012002
author1-email=allenpd@swirve.com
author1-email-cc=
author2-email=finlayms@clarkson.edu
author2-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).