Matt: Before going to Clarkson University, neither of us had experience with Linux. At a Clarkson ACM open house, I saw a Linux demo. At the time, I had no idea what it really was. After I started attending Clarkson, I installed and began using Linux. Beginning with basic skills and then moving to more difficult portions, I became proficient in many parts of Linux and played around with things that interested me. Sometime during our freshman year, Phil stopped by my room and saw Linux for the first time. One night I stopped by his room and installed it for him, allowing him to experiment with it.
We both took computer science courses and were in Operating Systems in the spring of our junior year. The class covers a lot of concepts that are fundamental to understanding how a computer works, and it is among the most necessary courses for a computer scientist. Near the end of the course, one of the professors proposed holding a graduate-level seminar in the fall of 2001 relating to operating systems. Phil and I joined the class, along with a few others.
Advanced Topics in Operating Systems was both challenging and rewarding. We had to read many articles about the development of computing, starting with systems like Mach, Hydra, and Multics and ending with Windows NT, VMWare, and AFS. The class covered operating systems, communications, distributed systems, file systems, and others. As a part of the course, all of us signed up for the IBM Linux Scholar Challenge, and eventually we all submitted entries.
One of the most difficult things about the IBM Linux Scholar Challenge was deciding on a topic. Making modifications to AFS was a possibility; user resource tracking, an excellent idea, was already taken. In the end, Phil and I ended up taking an old homework assignment and revamping it. Instead of tackling a huge problem, we tackled a small one -- but made sure we did it well.
The original assignment was to develop a thread pool, which is a data structure that maintains different threads of execution. A set number of threads are generated when the thread pool is created, and then tasks are sent to the pool through function calls. Each task sent to the thread pool (with a call of dispatch) is executed by one of the threads. If there are more tasks than threads, the requests are placed in a queue in order to be serviced. If there are fewer tasks than threads, idle threads wait on a signal.
This makes a thread pool a 'work crew' of sorts, where each thread is a member of the crew, and the work crew, as a whole, does work for the main thread of the program. The threads themselves do not die, even when they finish a task. This is an advantage because there is a time cost to creating a thread; instead of paying that cost once for each task, the cost is spread over many tasks.
We built the thread pool library to satisfy the following requirements:
- It must execute all dispatched functions exactly once. The quality of service requirement (QoS) is that each request be serviced at some point.
- When dispatch is called, the work to be done is placed into a queue and dispatch returns. The only time that dispatch blocks is when the queue uses at least 64k of heap space and is full; in that case, the act of adding information to the queue will block. This prevents the program from crashing due to lack of heap space.
- The thread pool cannot have deadlock conditions. Threads can cause deadlock, but the thread pool package itself should not.
- The thread pool should have no busy-wait loops. A busy-wait loop has the tendency to be slow; in some cases, a two-phase wait is faster than blocking (see Resources); however, programming around busy-wait loops is just as effective.
- It must include a simple, clean API to allow rapid and painless adoption by programmers.
The only additional requirement on the thread pool is that the programmer cannot execute thread pool functions through the dispatch of thread pool. Any such function calls will immediately return with no code executed.
We used Linux, the C programming language, and the pthreads library to build our library of functions. The functions can be used by other programmers who would like a thread pool in their program, but don't want to write their own. All of the synchronization issues are already taken care of if you use our thread pool, plus it's already written and tested. If you are writing a program that uses a group of threads as a work crew, using our thread pool could save you considerable time and headache. The actual API can be found below (some lines in the example have been split).
The thread pool API
typedef void *threadpool;
typedef void (*dispatch_fn)(void *);
/**
* create_threadpool creates a fixed-sized thread
* pool. If the function succeeds,
* it returns a (non-NULL)
* "threadpool", else it returns NULL.
*/
threadpool create_threadpool(int num_threads_in_pool);
/**
* dispatch sends a thread off to do some work. If
* all threads in the pool are busy, dispatch will
* block until a thread becomes free and is dispatched.
*
* Once a thread is dispatched, this function returns
* immediately, except in cases of a heavily loaded server.
*
* The dispatched thread calls into the function
* "dispatch_to_here" with argument "arg".
*/
void dispatch(threadpool from_me,
dispatch_fn dispatch_to_here, void * arg);
/**
* Works the same as dispatch, but enables the user to
* define cleanup
* handlers in cases of immediate cancel.
* The cleanup handler function
* (cleaner_func) is executed automatically
* with cleaner_arg as the argument
* after the dispatch_to_here function is done;
* in the case that
* destroy_threadpool_immediately is called,
* cleaner_func is
* also called before the threadpool exits.
*/
void dispatch_with_cleanup(threadpool from_me,
dispatch_fn dispatch_to_here,
void * arg, dispatch_fn cleaner_func,
void* cleaner_arg);
/**
* destroy_threadpool kills the threadpool, causing
* all threads in it to commit suicide, and then
* frees all the memory associated with the threadpool.
*/
void destroy_threadpool(threadpool destroyme);
/**
* destroy_threadpool_immediately cancels all threads
* in the threadpool
* immediately. It is potentially dangerous to use with
* libraries that are not
* specifically asynchronous cancel thread safe.
*
* Also, without cleanup
* handlers, any dynamic memory or system resource in use
* could potentially be left without a reference to it.
*/
void destroy_threadpool_immediately(threadpool destroymenow);
|
The single best example of using the thread pool is a multithreaded server. It fits the use of a work crew the best: it can service many short requests, and there can be many of them. What is typical in a single-threaded server is a code block that looks like (in pseudocode):
Single-threaded server example
while(server_is_running)
{
requestnum=wait_for_request
process_requst(requestnum)
}
|
There is a lot of waste in this, though. I/O is typically very slow, so running multiple requests in parallel can be advantageous. A typical server that simply spawns threads would have a code block like this:
Multithreaded server example
while(server_is_running)
{
requestnum=wait_for_request()
thread_dispatch(process_requst,requestnum)
}
|
To convert a server like this to a server using a thread pool is a matter of only a few steps:
- At the top of the file, add
#include "threadpool.h". - At the top of the function running the while loop shown above, declare a thread pool.
- Add a line just before the while loop that calls
create_threadpool(N), where N is the number of threads to exist in the thread pool. - Change the line that states
thread_dispatch to state dispatch(name_of_threadpool, process_request, requestnum). - After the while loop, add a line that calls
destroy_threadpoolto clean up the thread pool.
There are two distinct advantages to this. First, the cost of creating each thread is amortized over the life of the server. Second, because there is a limit on the number of threads that will then exist, a lot of time will be wasted with context switches between the threads. The code example from above, converted to use the thread pool, is:
Thread pool server example
threadpool tp;
.
.
.
tp = create_threadpool(10);
while(server_is_running)
{
requestnum = wait_for_request
dispatch(tp,process_requst,requestnum);
}
destroy_threadpool(tp);
|
We tested the thread pool using a very basic server application; it accepted TCP/IP requests, did an amount of work set at the server creation, and then responded to the request. The clients were simple applications that made requests to the server and then waited for a response. If the server dropped a request, it would close down its connection and start a new request. The network connecting the test machines was a switched ethernet running at a minimum of 100 Mbps (we conducted tests at night to minimize the effect of other traffic). The time to send one packet across this connection was 2.230ms.
Our tests ranged from I/O-intensive to computation-intensive applications. In all cases, the addition of threads to the thread pool showed superior performance to the monolithic implementation.
Test data: Results with varying computational and I/O loads
Loops: 1 Threads: 1 Dispatches/sec: 526.660 Loops: 100 Threads: 1 Dispatches/sec: 533.225 Loops: 1000 Threads: 1 Dispatches/sec: 267.247 Loops: 10000 Threads: 1 Dispatches/sec: 55.794 Loops: 100000 Threads: 1 Dispatches/sec: 6.596 Loops: 500000 Threads: 1 Dispatches/sec: 1.325 Loops: 1 Threads: 2 Dispatches/sec: 666.329 Loops: 100 Threads: 2 Dispatches/sec: 16.175 Loops: 1000 Threads: 2 Dispatches/sec: 247.148 Loops: 10000 Threads: 2 Dispatches/sec: 64.766 Loops: 100000 Threads: 2 Dispatches/sec: 6.664 Loops: 500000 Threads: 2 Dispatches/sec: 1.328 Loops: 1 Threads: 4 Dispatches/sec: 575.043 Loops: 100 Threads: 4 Dispatches/sec: 966.688 Loops: 1000 Threads: 4 Dispatches/sec: 381.330 Loops: 10000 Threads: 4 Dispatches/sec: 65.346 Loops: 100000 Threads: 4 Dispatches/sec: 6.775 Loops: 500000 Threads: 4 Dispatches/sec: 1.375 Loops: 1 Threads: 8 Dispatches/sec: 616.021 Loops: 100 Threads: 8 Dispatches/sec: 339.300 Loops: 1000 Threads: 8 Dispatches/sec: 304.173 Loops: 10000 Threads: 8 Dispatches/sec: 67.703 Loops: 100000 Threads: 8 Dispatches/sec: 6.781 Loops: 500000 Threads: 8 Dispatches/sec: 1.433 Loops: 1 Threads: 16 Dispatches/sec: 617.017 Loops: 100 Threads: 16 Dispatches/sec: 496.465 Loops: 1000 Threads: 16 Dispatches/sec: 346.666 Loops: 10000 Threads: 16 Dispatches/sec: 61.634 Loops: 100000 Threads: 16 Dispatches/sec: 6.854 Loops: 500000 Threads: 16 Dispatches/sec: 1.505 Loops: 1 Threads: 32 Dispatches/sec: 1053.896 Loops: 100 Threads: 32 Dispatches/sec: 381.208 Loops: 1000 Threads: 32 Dispatches/sec: 313.846 Loops: 10000 Threads: 32 Dispatches/sec: 89.955 Loops: 100000 Threads: 32 Dispatches/sec: 8.458 Loops: 500000 Threads: 32 Dispatches/sec: 2.130 |
For a complete description of the machines used for testing and the results of our tests, see Resources.
Our thread pool provides a simple interface for an application writer to create multithreaded applications. People tend to implement their own thread pools when they need them. However, it is very difficult to get a thread pool right without extensive testing and proofs of correctness. Reimplementing complicated data structures can be a quick and easy way of introducing bugs to any program. Therefore, we believe a solid, fully tested version of a thread pool is a valuable asset to the open-source community.
We would like to thank Steve Gribble of the University of Washington for many things within our project; the APIs, Makefile, and testing framework are based upon projects used in his CSE451 class.
We would also like to thank Dwight Tuinstra for his indispensible aid in implementing the thread pool; his attention to detail is astounding, and, someday, we hope to code like him.
We would also like to thank Matt Sabins and Jeanna Matthews (also mentioned below) for their suggestion of queueing the requests. This queueing enabled the thread pool to be faster than the monolithic model.
Further, we would like to thank the entire CS644 class at Clarkson University for reviewing the code, giving us ideas, making jokes, and generally making the course an interesting and enjoyable learning experience.
Finally, we would like to thank Jeanna Matthews for offering the CS644 course; informing the class about the IBM Linux Scholar Challenge (where 66% of the class won); showing us the idea for a thread pool; and simply being Jeanna.
- Read about the other Clarkson entries in our series.
- To learn more about coscheduling, see Andrea C. Arpaci-Dusseau's "Implicit Coscheduling: Scheduling with Implicit Information in Distributed Systems" (ACM Transactions on Computer Systems, Vol. 19, No 3, 2001, pp.283-331).
- Read John Ousterhout's ideas about threads in Why Threads Are A Bad Idea (for most purposes). Invited Talk at the 1996 USENIX Technical Conference (January 25, 1996).
- Visit the Advanced Topics in Operating Systems class homepage.
- View Steve Gribble's Operating Systems course at the University of Washington.
- For complete testing information, go tohttp://www.clarkson.edu/~allenpd/threadpool/testing.html.
- Read Phil and Matt's thread pool paper.
- Get more Linux articles from the developerWorks
Linux zone.
- Are you a student or a teacher? Visit the IBM Student Portal or the IBM Faculty Portal to find information and offers tailored to your needs.
Phil Allen grew up in Malone, NY, and enrolled at Clarkson University in the fall of 1998. At Clarkson, he was a member of numerous organizations including student government, the school paper, and the Clarkson University Honors Program, and was one of the winners of the IBM Linux Scholar Challenge in 2001. He graduated in the Spring of 2002 and will be living in Seattle working as a programmer. You can contact Phil at allenpd@swirve.com.
Matthew Finlayson is a graduate of the Computer Science program at Clarkson University. Currently he is interning at IBM in the Linux Testing and Integration Center for the summer before returning to Clarkson University to pursue his Master's Degree in Computer Science. Aside from his four-year long obsession with the GNU/Linux operating system, he participates in several other hobbies. These include going to the gym, hiking, beating his roommates at Tony Hawk3, playing basketball, and playing the occasional game of chess. You can contact Matt at finlayms@clarkson.edu.
