The area of microthreads -- at least in Python -- has been an exotic enhancement reserved for Stackless Python. The topic of Stackless, and the changes it has been undergoing recently, probably merits a whole column by itself. But the very short story is that under the "new Stackless," continuations are apparently out-the-window, but microthreads remain the raison d'être of the project. It's complicated...
Let's begin by backing up a bit though. Just what is a microthread?
Basically, a microthread is a process that can run with minuscule inherent
resource requirements -- and that runs within a single instance of the
Python interpreter (in a common memory space, and so on). With
microthreads, it is possible to run tens of thousands of parallel
processes on a moderately capable modern PC, and to switch between
contexts hundreds of thousands of times every second. Calls to fork() or standard OS threading calls do not even
come close to this! Even so-called "lightweight" threading libraries have
threads that are orders of magnitude "heavier" than those presented
here.
The semantics of the weightless threads I introduce in this column are
a bit different from those of OS threads. For that matter, they are also
not quite the same as what Stackless provides. In most respects
weightless threads are quite a bit simpler than most variants; most issues
about semaphores, locking, and the like disappear. The price of the
simplicity is that I present a form of "cooperative multithreading;"
adding preemption within the framework of standard Python does not appear
feasible to me (at least with non-Stackless Python 2.2 -- no one knows
what the __future__ might bring).
In a way, weightless threads recall the cooperative multitasking of older Windows and MacOS versions (but within a single application). In another sense, however, weightless threads are merely another way of expressing flow in a program; everything that weightless threads do could, at least in principle, be accomplished with the "really big if/elif block" technique (the last resort of the brute-force programmer).
Another installment of this column presents a mechanism for simulating
coroutines with simple generators. At its core, the mechanism is
amazingly simple. A set of generator objects is wrapped in a scheduler() function that controls delegation of
control flow to an appropriate branch. These are not true
coroutines in the sense that they control only branches to and from the
scheduler() function. But for practical
purposes, you can accomplish the same thing with very little extra code.
This is what scheduler() looked like:
def scheduler(gendct, start):
global cargo
coroutine = start
while 1:
(coroutine, cargo) = gendct[coroutine].next()
|
The thing to notice about this wrapper is that each
generator/coroutine yields a tuple that contains its intended branch
destination. Basically, a generator/coroutine exits with a GOTO target.
For convenience, I also let the generators yield a standard cargo container as a way of formalizing the data that
is passed between coroutines -- but you could also simply use agreed-upon
global variables or callback setter/getter functions to pass data. Raymond
Hettinger has written a Python Enhancement Proposal (PEP) that is intended
to allow better encapsulation of passed data; perhaps future Pythons will
include it.
For weightless threads, the requirements are slightly different from
those for coroutines. But we can still use a scheduler() function at its heart. The difference is
that the scheduler itself should decide branch targets rather than receive
them from the generator/coroutines. Let me show you a complete test program
and sample:
from __future__ import generators
import sys, time
threads = []
TOTALSWITCHES = 10**6
NUMTHREADS = 10**5
def null_factory():
def empty():
while 1: yield None
return empty()
def quitter():
for n in xrange(TOTALSWITCHES/NUMTHREADS):
yield None
def scheduler():
global threads
try:
while 1:
for thread in threads: thread.next()
except StopIteration:
pass
if __name__ == "__main__":
for i in range(NUMTHREADS):
threads.append(null_factory())
threads.append(quitter())
starttime = time.clock()
scheduler()
print "TOTAL TIME: ", time.clock()-starttime
print "TOTAL SWITCHES:", TOTALSWITCHES
print "TOTAL THREADS: ", NUMTHREADS |
This is about the simplest weightless thread scheduler you could choose. Every thread is entered in fixed order, and each has an identical priority. Next, let's look at how to complicate these details. As with the prior installment's coroutines, weightless threads should be written to obey a few conventions.
Most of the time, a weightless thread's generator should be bracketed in a
while 1: loop. The way the scheduler is set up
here causes the whole scheduler to stop when one thread does so. This is
less "robust" in a sense than OS threads -- but it would not require much
extra machinery to catch exceptions inside
scheduler()'s loop rather than outside the loop. Moreover, a
thread can be removed from the threads list without reaching
a termination (either by itself, or by some other thread). We haven't
really provided the bookkeeping to make removal easy; but a natural
extension would be to store threads in a dictionary or some other
structure, instead of a list.
The example illustrates one reasonable technique for eventual
termination of the scheduler loop. A special generator/thread, quitter(), monitors some condition (in this case,
just a count of context switches) and throws a StopIteration when it is satisfied (other exceptions
are not caught in the example). Notice that after termination, all the
other generators are still intact, and can be resumed later, if desired
(either in a microthread scheduler or otherwise). Obviously, you could
delete these generator/threads if desired.
The example in question uses particularly pointless threads. They do nothing, and do so in the most minimal possible form. The example is set up this way to illustrate a point -- the inherent overhead of weightless threads is quite low. Creating 100,000 weightless threads is no problem on an older Windows 98 Pentium II laptop with only 64 MB of memory (at one million threads, long disk churning occurs). Try that with OS threads! Moreover, on this fairly slow 366 MHz chip, one million context switches can be performed in about 10 seconds (the number of threads involved is not particularly significant to the timing). Obviously, realistic weightless threads should do something, and this will use some greater resources proportionate to the task. But the threading itself earns the "weightless" name.
Switching between weightless threads is cheap, but it is still not quite free. To test the case, I constructed an example that performs some work, but about the least you might reasonably perform in a thread. Since the thread scheduler really amounts to instructions to "do A, then do B, then do C, etc." it was not difficult to create an entirely parallel case in a main function:
from __future__ import generators import time TIMES = 100000 |
It turns out that the weightless thread version runs for slightly more than twice as long as the straight loop version -- a little less than 3 seconds versus just over 6 on the mentioned machine. Obviously, if each unit of work were two, or ten, or one hundred times as large as a single string method call, then the proportion of time spent in thread overhead would be correspondingly small.
Weightless threads can, and usually should, be larger-scale than a single conceptual operation. A thread, of whatever sort, is used to represent the amount of flow context necessary to describe one particular task or activity. But a task can be larger than the time/size one would want to spend inside a single thread context. Preemption handles this issue automatically, without any specific intervention by an application developer. Unfortunately, weightless thread users need to pay attention to "playing nice" with other threads.
At a minimum, a weightless thread should be considerate enough to
yield whenever it completes a conceptual
operation. The scheduler will come back to it for the next step. For
example:
def nicethread():
while 1:
...operation A...
yield None
...operation B...
yield None
...operation C...
yield None
|
A lot of times, a good design will yield
more often even than at the boundaries between basic operations. Often
something conceptually "basic" nonetheless involves looping over a large
collection. When that is the case (depending on just how time consuming
the loop body is), it probably makes sense to put a yield or two in the loop body (perhaps recurrent
after a certain number of loop iterations). Unlike with preemptive
threads, an ill-behaved weightless thread can grab an indefinite amount of
exclusive processor attention.
The examples so far have presented only the most basic styles of thread schedulers. There is a lot more that you can potentially do (this issue is fairly independent of designing a good generator/thread). Let me present several likely enhancements in passing.
A simple threads list makes it easy to add
generator/threads for the scheduler to handle. But this data structure
does nothing to make it easy to remove or suspend threads that are no
longer relevant. A dictionary or a class is likely to be a better data
structure for thread management. As a quick example, below is a class that
can (almost) drop in for the threads list in
the examples:
class ThreadPool:
"""Enhanced threads list as class
threads = ThreadPool()
threads.append(threadfunc) # not generator object
if threads.query(num) <<has some property>>:
threads.remove(num)
"""
def __init__(self):
self.threadlist = []
self.threaddict = {}
self.avail = 1
def __getitem__(self, n):
return self.threadlist[n]
def append(self, threadfunc, docstring=None):
# Argument is the generator func, not the gen object
# Every threadfunc should contain a docstring
docstring = docstring or threadfunc.__doc__
self.threaddict[self.avail] = (docstring, threadfunc())
self.avail += 1
self.threadlist = [p[1] for p in self.threaddict.values()]
return self.avail-1 # return the threadID
def remove(self, threadID):
self.threadlist = [p[1] for p in self.threaddict.values()]
def query(self, threadID):
"Information on thread, if it exists (otherwise None)
return self.threaddict.get(threadID,[None])[0]
|
You could do more, but this is a good start.
In the simple examples, all threads get equal attention from the
scheduler. There are at least two general approaches to a more fine-tuned
thread priority system. One priority system could simply devote more
attention to "high-priority" threads than to low-priority ones. A
straightforward way to do this would be to create a new class PriorityThreadPool(ThreadPool) that returned more
important threads more often during the thread iteration. The simplest
such approach might return some threads multiple times successively in its
.__getitem__() method. A high-priority thread
could then receive two, or ten, or a hundred successive "time-slices"
rather than just one this way. A (very weak) "real-time" variant on this
might go as far as returning the multiple copies of important threads
scattered throughout the thread list. This would increase the actual
frequency of servicing high-priority threads, not only the total attention
they receive.
A perhaps more sophisticated approach to thread priorities is not
easily available in pure Python (but it is with some third-party
OS/processor-specific libraries). Rather than simply give high-priority
threads an integer number of time-slices, the scheduler could measure the
time actually spent in each weightless thread, and dynamically adjust the
thread scheduling to be more "fair" to under-serviced threads (fairness
being perhaps related to thread priority). Unfortunately, Python's time.clock() and its family are not nearly high
enough resolution timers to make this approach effective. On the other
hand, there is nothing that prevents an underfed thread in the
"multiple-time-slices" approach from boosting its own priority
dynamically.
Combining microthreads with coroutines
In order to create a scheduler for weightless threads (microthreads), I
removed the coroutine logic for "please branch to here." Doing that was
not actually necessary. The weightless threads in the examples have always
yielded None rather than a jump target. But
there is no reason the two concepts could not be combined: If a
coroutine/thread yields a jump target, the scheduler can jump where
requested (unless overridden by thread priorities, perhaps). However, if a
coroutine/thread merely yields None, the
scheduler could make its own decision about the appropriate thread for the
next attention. There would be a bit of work involved in deciding (and
programming) exactly how an arbitrary jump would interact with a linear
thread queue, but nothing particularly mysterious about the work.
Fast and cheap -- what's not to like?
The microthread pattern (or "weightless threads") basically boils down to yet another exotic style for flow control in Python. Earlier installments of this column have already touched on several others. The beauty of having a good variety of control mechanisms is that it lets a developer isolate code functionality into its logical components and maximize contextual relevance of code.
It doesn't take much, actually, to enable the possibility of doing everything possible (a "loop" and an "if"). For a class of problems easily broken into many small "agents," "servers," or "processes," weightless threads may be the clearest model for expressing the underlying "business logic" of an application. Of course, it does not hurt matters that weightless threads have the potential to be blazingly fast in comparison to more well-known flow mechanisms.
- Read the previous installments of Charming Python.
- Early on in this column, David presented an abstract pattern for a broad class of state machines in Python. The installment, "Using state machines" (developerWorks, August 2000), is the touchstone from which he explains the new concepts in this article.
- David's interview with Stackless Python creator Christian Tismer (developerWorks, October 2000) touched on microthreads briefly, along with other exotic flow structures like continuations, coroutines, and generators. The installment hints at some of the issues, before Python 2.2 introduced generators.
- More recently, David introduced iterators and simple generators (developerWorks, September 2001) during Python 2.2's alpha period.
- Good reading for anyone interested in microthreads -- or the complicated history of Stackless Python -- is the Stackless Python Web site. Even though David's suggestions may encroach on the Stackless conceptual space, he believes that Stackless continues to do a lot that his framework doesn't (at the price of requiring a patched fork of the standard Python).
- Find more Linux articles in the developerWorks Linux zone.

David Mertz would like to write, with Nietzsche, that these are the musings of an old philologist, but that untruth would unmask itself. But perhaps his (gratuitously plugged) forthcoming book, Text Processing in Python, will someday be mistaken for a cybernetic variant of philology. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcome.





