Server clinic: Concurrency for grown-ups

It's not just about threading

Cameron Laird (claird@phaseit.net), Vice president, Phaseit, Inc.
Cameron is a full-time consultant for Phaseit, Inc., who writes and speaks frequently on open source and other technical topics.

Summary:  Concurrency -- multi-processing -- is widely misunderstood. This month's "Server clinic" introduces the basic concurrency concepts you need to conduct your business in the server closets safely.

Date:  15 Aug 2002
Level:  Introductory

Many mistakes are made in the name of multi-processing. Most academic programs and many programming texts explain the concepts of concurrency clearly, but it's a difficult topic, and nearly all of us can use a refresher.

Concurrency labels situations where more than one "application" is running at a time. I quote "application" here because the meaning is context dependent. Linux hosts always fill their process tables with a bunch of more-or-less simultaneous programs: network protocol daemons, cron managers, the kernel itself, and often much more. Linux is a multi-tasking operating system. It's built for such duty.

On a typical uniprocessor host, tasks don't really execute simultaneously. The part of the kernel called the scheduler swaps jobs in and out so that they all get a turn. Your browser downloads a file during the same interval in which you edit a program source and play a music track. Concurrency most often has to do with this appearance of simultaneity.

Concurrency's two aspects

Keep in mind the "user view" or "programming model" of concurrency as a matter of scheduling access to a unitary resource. Complementing this, though, is a second, "back end," meaning of concurrency. People out for raw performance emphasize this different aspect. In their context, "multi-processing" generally means dividing up a single task into parts on which different central processing units (CPUs) can collaborate. The idea is to finish a job in a shorter time as measured by an external clock, even if at the cost of more hardware and programming complexity.

Both aspects of concurrency have to do with scheduling, or assignment of tasks to CPUs. Both bear on usability. Confusing the two aspects, though, is a common and troublesome error. Beginning programmers seem particularly prone to false beliefs about one of the most important concurrency methods, "multi-threading," often abbreviated as "threading." Common misconceptions include the ideas that:

  • Threading makes programs run faster.
  • Threading is the only concurrency construct, or the only practical one.
  • N-way hosts work approximately N times as fast as uni-processing hosts.

Just a little conceptual clarity helps correct these mistakes quickly.

Naive developers often ask, "My program is too slow; how can I make it threaded so it'll be faster?" The answer often is, "You can't." A straightforward transformation of an existing single-tasking application to break it into multi-tasking parts always demands more computation: the original work still has to be done, and the parts must now also be created, scheduled, and synchronized. In general, "threading" such a program makes it take longer.

There's a reason the falsehood persists, of course. Many programs can be factored into parts in a way that eases bottlenecks. A compute-intensive job -- simulation of a Space Shuttle re-entry, say -- that can be spread over eight CPUs rather than one probably will finish much faster. Even more common is to restructure a program to avoid input/output (I/O) "blocks." If your consumer-level application can do useful work while waiting for keyboard input, for data to swap in from disk, or for messages to arrive through the network, it will appear to have "sped up" for free.
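To make the I/O case concrete, here is a minimal Python sketch. The language and every name in it are illustrative choices, not part of the original column. time.sleep stands in for a blocking read, so each thread spends its time waiting rather than computing. The waits overlap, which is the only reason the threaded run finishes sooner.

```python
import threading
import time


def fake_io(delay, results, index):
    """Stand-in for a blocking read: the caller sleeps and uses no CPU."""
    time.sleep(delay)
    results[index] = f"reply {index}"


def run_serial(n, delay):
    """Perform n simulated reads one after another."""
    results = [None] * n
    start = time.perf_counter()
    for i in range(n):
        fake_io(delay, results, i)
    return results, time.perf_counter() - start


def run_threaded(n, delay):
    """Perform n simulated reads in n threads so that the waits overlap."""
    results = [None] * n
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, serial_time = run_serial(4, 0.2)
    _, threaded_time = run_threaded(4, 0.2)
    print(f"serial:   {serial_time:.2f} s")
    print(f"threaded: {threaded_time:.2f} s")
```

Replace the sleep with a loop that keeps one CPU busy and the advantage disappears on a uniprocessor; the threads then only add switching overhead.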


The limits of threading

It is hazardous, though, to credit threading for these accelerations. They all depend on a deeper analysis; speed-up is possible only when an under-utilized resource is available. Moreover, threading isn't the only way to achieve these concurrencies, and frequently it's not the best one.
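One standard piece of that deeper analysis is Amdahl's law, which this column does not name but which bounds the gain from adding CPUs: if a fraction p of a job can be spread over N processors, the best possible speed-up is 1 / ((1 - p) + p / N). The Python sketch below only evaluates that formula. The function name is hypothetical, and the model ignores communication and scheduling overhead, so real results are usually worse.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Upper bound on speed-up when only part of a job can run in parallel.

    parallel_fraction: share of the single-CPU run time that can be
                       divided among processors (0.0 to 1.0).
    cpus:              number of processors available (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


if __name__ == "__main__":
    for p in (0.5, 0.9, 1.0):
        print(f"p = {p:.1f}: at most {amdahl_speedup(p, 8):.2f}x on 8 CPUs")
```

With only half of a job parallelizable, eight CPUs give a speed-up of less than two, which is one reason an N-way host is rarely N times as fast.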

Academic literature studies at least a dozen concurrency models important enough to go into production. Along with threading, you're likely to have heard of multi-processing (in a programmatic sense), co-routines, event-based programming, and perhaps continuations, generators, and several of the more esoteric constructs. All of these methods have a rough formal equivalence in the sense that if you have a language that supports, say, generators but not threads, you can write an emulation for threads in terms of generators (and vice-versa).
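As a rough illustration of that equivalence, here is a minimal Python sketch that emulates thread-like tasks with generators. All names are hypothetical. The emulation is cooperative: a task runs until it reaches a yield, so it shows the idea of the equivalence rather than the pre-emptive behavior of real threads.

```python
from collections import deque


def worker(name, steps, log):
    """A 'thread' body: each yield hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def run_round_robin(tasks):
    """Give each generator-based task one turn at a time until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished; do not reschedule it
        ready.append(task)      # task still has work; back of the queue


log = []
run_round_robin([worker("A", 3, log), worker("B", 2, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'A:2']
```

Because a task gives up control only at a yield, there is no moment at which another task can interrupt a half-finished update, which is much of what makes this model easier to reason about.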

Programming appropriateness is different from abstract equivalence, though. There are real differences between the concurrency models when you're working to deliver reliable applications on schedule. Threading, for example, has frailties that have been known for many years. It's a relatively low-level programming construct. It's hard to program safely; programs that manipulate threads are prone to inconsistent data, deadlocks, unscalable locking, and inverted priorities. Java recently abandoned its initial intent to support only multi-threading as a core concurrency concept because of the performance problems threading has. Thread-savvy debuggers have been notoriously expensive.
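The inconsistent-data hazard is the easiest of these to show. In the Python sketch below (names are hypothetical), several threads update one counter. The read-modify-write in increment has to be guarded by a lock; without the lock, updates from different threads can interleave and some can be lost, and nothing in the language warns you about it.

```python
import threading


class Counter:
    """A shared counter whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Reading, adding, and storing must happen as one unit, so the
        # lock keeps other threads out until the update is complete.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    """Increment the shared counter repeatedly."""
    for _ in range(times):
        counter.increment()


def run(n_threads, times):
    """Start n_threads workers on one counter and return the final count."""
    counter = Counter()
    threads = [
        threading.Thread(target=hammer, args=(counter, times))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run(4, 10000))  # 40000
```

The lock buys correctness at the price of serializing the threads at that point, which is where the unscalable locking mentioned above comes from.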

Not all the news is bad, though. If you make the time to understand basic concepts clearly, you can work with threads as reliably as you do XML, LDAP, or any other specialized domain. More immediately, there are safer -- and sometimes faster! -- concurrency models for many situations.

In many, many situations, the best way to multi-task an application is to decompose it into collaborating processes rather than threads. Programmers commonly resist this reality. One reason is history: processes used to be far "heavier" than threads, and, under most flavors of Windows, still are. With modern Linux, though, a context switch between distinct processes might take only 15% more time than the corresponding context switch between same-process threads. What you gain for that cost in time is a far better understood and more robust programming model. Many programmers can safely write independent processes. Relatively few are safe with threads.

When is it good to multi-process rather than multi-thread? Suppose, for example, that you have a "control panel" graphical user interface (GUI) that monitors results of several large calculations, retrieves and updates database records, and perhaps even reports on the status of external physical devices. You could put all this in one process, with a separate thread for each task. That's often the preferred course under Windows.

My development practice, though, generally is to put each task in its own process, communicating through sockets, pipes, or occasionally shared memory. This enormously simplifies unit testing, as you can use all your usual command-line tools for automation of the separate processes. A crash in one process doesn't harm any of the others. Performance is usually about as good as with multi-threading, and, depending on hardware and programming details, occasionally better.
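A minimal Python sketch of that arrangement follows. It assumes a hypothetical worker that reads numbers on its standard input and writes their squares on its standard output; the worker's source is kept in a string only so the example is self-contained. The parent talks to the worker through pipes, and a failure in a child shows up as an exit status rather than taking the parent down.

```python
import subprocess
import sys

# Hypothetical worker program: one number per input line, one square out.
WORKER_SOURCE = """
import sys
for line in sys.stdin:
    n = int(line)
    print(n * n, flush=True)
"""


def square_in_child(numbers):
    """Send numbers to a separate worker process over pipes; collect replies."""
    payload = "".join(f"{n}\n" for n in numbers)
    done = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload,          # written to the child's stdin pipe
        capture_output=True,    # child's stdout and stderr come back on pipes
        text=True,
        check=True,             # raise if the worker exits with an error
    )
    return [int(line) for line in done.stdout.split()]


def run_faulty_child():
    """Run a child that fails; the parent only sees a non-zero exit status."""
    done = subprocess.run(
        [sys.executable, "-c", "raise RuntimeError('boom')"],
        capture_output=True,
        text=True,
    )
    return done.returncode


if __name__ == "__main__":
    print(square_in_child([1, 2, 3]))   # [1, 4, 9]
    print(run_faulty_child() != 0)      # True
```

Because the worker is an ordinary program, you can also exercise it by hand from a shell, feeding it a file of test input, which is the unit-testing convenience described above.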


Other concurrency models

Such a multi-process implementation frequently depends on event-based programming. Events are a distinct concurrency concept useful for managing I/O and related multi-tasking responsibilities. Events relate asynchronous "externalities" to programmed callbacks (also called signals, bindings, and so on). Think of the GUI control panel; a high-performance way to program this with Unix is to update the display only when the select() system call detects arriving data. C-oriented programmers often label event-based methods in terms of select.
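Here is a minimal sketch of that select-driven pattern, written in Python, whose select.select wraps the same system call. The display dictionary, the labels, and the socket pairs that stand in for worker processes are all assumptions made for illustration.

```python
import select
import socket


def on_data(name, data, display):
    """Callback: record the latest report from one source on the display."""
    display[name] = data.decode()


def event_loop(sources, display, timeout=1.0):
    """Wait on all sockets at once and run the callback for the ready ones.

    sources maps each socket to a label. The loop ends when every source
    has closed or when nothing arrives within timeout seconds.
    """
    pending = dict(sources)
    while pending:
        readable, _, _ = select.select(list(pending), [], [], timeout)
        if not readable:
            break                       # nothing arrived in time
        for sock in readable:
            data = sock.recv(4096)
            if data:
                on_data(pending[sock], data, display)
            else:
                del pending[sock]       # empty read: the peer closed


if __name__ == "__main__":
    # Socket pairs stand in for connections to two worker processes.
    calc_in, calc_out = socket.socketpair()
    db_in, db_out = socket.socketpair()
    calc_out.sendall(b"42% done")
    db_out.sendall(b"17 rows updated")
    calc_out.close()
    db_out.close()

    display = {}
    event_loop({calc_in: "calculation", db_in: "database"}, display)
    calc_in.close()
    db_in.close()
    print(display)
```

The single loop does no work while every source is quiet, and it needs no locks, because only one callback runs at a time.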

You might regard "co-routines" or "generators" as classroom exotica. They were built into the definitions of such languages as Modula and Icon, though, because they make for multi-tasking programming that is expressively powerful while remaining comprehensible and therefore safe. If you have complex performance requirements, if your applications are best modeled in terms of hundreds of subtasks, and especially if your server room is home to a large number of multi-way hosts, you ought to study more of the range of concurrency models. You'll find that each one has applications where it's ideal. Some of these might match your own needs.

Also, be aware that you can probably find support for any of the models you might want to use with Linux. The references below point, among other resources, to implementations and experiments with a wide variety of concurrency models.


Multiway puzzles

One final caution: don't assume that your multitasking software works sensibly on your multiprocessing (often "symmetric multiprocessing" -- SMP) hardware. Especially with older versions of Linux, it often took expertise to get useful results from an SMP box. A default Linux 2.4 installation does a good job of using up to four (and sometimes more) processors for distinct processes. Threads within a process, though, can be bottlenecked on a single processor, while other processors sit idle. Other concurrency methods sometimes suffer similarly.

How to avoid this waste depends on the details of your platform. With Linux 2.4 and popular multi-way hardware, you can reasonably expect default "kernel threads" (see the Linux threading FAQ in Resources) to be scheduled properly, that is, shared among the different CPUs. Use top and other system management tools to verify that scheduling appears to be correct, and ask your Linux vendor or users' group specific questions about practical thread scheduling.
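A first sanity check can also be made from inside a program. The Python sketch below (the function name is hypothetical) reports how many CPUs the scheduler is allowed to use for the current process. It relies on os.sched_getaffinity, which exists only on some platforms, Linux among them, and falls back to the total CPU count elsewhere. It says nothing about how threads are actually being spread across those CPUs; for that you still need top or a similar tool.

```python
import os


def usable_cpus():
    """Number of CPUs on which this process may be scheduled.

    Uses the process's CPU affinity mask where the platform exposes it,
    otherwise the total CPU count reported by the operating system.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))   # 0 means the calling process
    return os.cpu_count() or 1                # cpu_count() may return None


if __name__ == "__main__":
    print(f"this process may run on {usable_cpus()} CPU(s)")
```

If the answer is 1 on a multi-way host, no choice of concurrency model inside the process will put the other processors to work.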

Much of your programming is likely to have a natural decomposition into distinct logical tasks. Understand basic concurrency concepts clearly, and you can apply them to meet your own requirements. Remember that concurrency has both a forward-facing and a backward-facing aspect: the "user view" or "programming model" controls the functionality of how you interact with an application, while the "back end" manages assignment of tasks to hardware. Rigorously distinguish your functional and performance requirements. Finally, keep in mind that there's more to concurrency than just threading. You often can make best use of your servers by programming with a model that is not overtly multi-threading.


Resources

  • Participate in the discussion forum.

  • Check out the other installments of Server clinic.

  • I have stylistic disputes with the comp.programming.threads FAQ. There's no doubt, though, that it's a valuable resource, particularly if you're already elbow-deep in practical threaded programming challenges.



  • My first major complaint about the previous FAQ is that it's so narrowly oriented toward C/C++ programming that it doesn't acknowledge other languages or concurrency models. Quite a few other languages, including Java, have their own language-specific threading FAQs.



  • The Linux Threads Home Page does recognize other languages than C/C++. Its major problem is that it's been nearly unmaintained for a couple of years. Still, it has valuable discussions of user- and kernel-level multi-threading, co-operative vs. pre-emptive scheduling, and more.



  • "Modern Concurrency Abstractions for C#" illustrates the ferment that still exists in concurrency theory. Researchers and engineers continue to invent and apply new models for better multi-tasking.



  • Several members of the IBM Linux Technology Center are working on the Next Generation POSIX Threading project. More information is available at the NGPT home page.



  • Find more Linux articles in the developerWorks Linux zone.
