Make Python run as fast as C with Psyco
Using Psyco, the Python specializing compiler
This content is part # of # in the series: Charming Python
This content is part of the series:Charming Python
Stay tuned for additional content in this series.
Python is usually fast enough for what you want it to do. Ninety percent of the concerns that novice programmers express about the execution speed of an interpreted/byte-compiled language like Python are simply naive. On modern hardware, most non-optimized Python programs run as fast as they need to, and there is really no point in spending extra programming effort to make an application run faster.
In this installment, therefore, I'm only interested in that other ten percent. Once in while, Python programs (or programs in other languages) run impracticably slowly. What is needed for a given purpose varies greatly; shaving off milliseconds is rarely compelling, but speeding up tasks that run for minutes, hours, days, or weeks is often worthwhile. Moreover, it should be noted that not everything that runs slowly is CPU-bound. If a database query takes hours to complete, for example, it makes little difference whether the resulting dataset takes one or two minutes to process. This installment is also not about I/O issues.
There are a number of ways to speed up Python programs. The first technique that should come to every programmer's mind is improving the algorithms and data structures used. Micro-optimizing the steps of an inefficient algorithm is a fool's errand. For example, if the complexity order of the current technique is O(n**2), speeding up the steps by 10x is a lot less helpful than finding an O(n) substitute. This moral applies even when considering an approach as extreme as a rewrite in assembly: the right algorithm in Python will often go faster than the wrong algorithm in hand-tuned assembly.
The second technique you ought to consider first is to profile your Python application, with an eye to rewriting key portions as C extension modules. Using an extension wrapper like SWIG (see Related topics), you can create a C extension that executes the most time consuming elements of your program as C code. Extending Python in this manner is relatively straightforward, but require some time to learn (and knowledge of C). Very often you will find that the large majority of the time spent in executing your Python application is spent in just a handful of functions, so considerable gains are possible.
A third technique builds on the second. Greg Ewing has created a language called Pyrex, which melds Python and C. In particular, to use Pyrex, you write functions in a Python-like language that adds type-declarations to selected variables. Pyrex (the tool) processes ".pyx" files into ".c" extensions. Once compiled with a C compiler, these Pyrex (the language) modules can be imported into and used in your regular Python applications. Since Pyrex uses almost the same syntax as Python itself (including loop, branch and exception statements, assignment forms, block indentation, etc.) a Pyrex programmer need not learn C to write extensions. Moreover, Pyrex allows more seamless mixing of C-level variables and Python-level variables (objects) within the same code than does an extension written directly in C.
A final technique is the subject of this installment. The extension
module Psyco can plug in to the guts of the Python interpreter, and
selectively substitute optimized machine code for portions of Python's
interpreted bytecode. Unlike the other techniques described, Psyco
operates strictly at Python runtime. That is, Python source code is
compiled by the
python command to bytecode in
exactly the same manner as before (except for a couple
import statements and function calls added to invoke
Psyco). But while the Python interpreter is running an application, Psyco
sometimes checks to see if it can substitute some specialized machine code
for regular Python bytecode actions. This specialized compilation is
both very similar to what Java just-in-time compilers do (broadly
speaking, at least) and is architecture-specific. As of right now, Psyco
is only available for i386 CPU architectures. The nice thing about Psyco
is that you can use the very same Python code you have been writing all
along (literally!), but let it run much faster.
How Psyco works
To understand Psyco completely, you probably need to have a good grasp of
both the Python interpreter's
function and i386 Assembly. Unfortunately, I myself can claim neither
expertise -- but I think I can explain Psyco in outline without going too
In regular Python, the
eval_frame() function is
the inner loop of the Python interpreter. Basically, the
eval_frame() function looks at the current bytecode
in an execution context, and switches control out to a function
appropriate for implementing that bytecode. The specifics of what this
support function will do depend, in general, upon the states of various
Python objects held in memory. To make it simple, adding the Python
objects "2" and "3" produces a different result than adding the objects
"5" and "6," but both operations are dispatched in a similar way.
Psyco replaces the
eval_frame() function with a
compound evaluation unit. There are several ways that Psyco is able to
improve upon what Python does. In the first place, Psyco compiles
operations to somewhat optimized machine code; in itself this produces
only slight improvements, since what the machine code needs to accomplish
is the same as what Python's dispatched functions do. Moreover, what is
"specialized" in Psyco compilation is more than the choice of Python
bytecodes, Psyco also specializes over variable values that are known in
execution contexts. For example, in code like the below, the variable
x is knowable for the duration of the loop:
x = 5 l =  for i in range(1000): l.append(x*i)
An optimized version of this code need not multiply each
i by "the content of the x variable/object" -- it is
less expensive to simply multiply each
i by 5,
saving a lookup/dereference.
Aside from creating i386-specific codes for small operations, Psyco caches this compiled machine code for later reuse. If Psyco is able to recognize that a particular operation is the same as something that was performed (and "specialized") earlier, it can rely on this cached code, rather than need to recompile the segment. This saves a bit more time.
The real savings in Psyco, however, arise from Psyco's categorization of operations into three different levels. For Psyco, there are "run-time," "compile-time" and "virtual-time" variables. Psyco promotes and demotes variables between the levels as needed. Run-time variables are simply the raw bytecodes and object structures that the regular Python interpreter handles. Compile-time variables are represented in machine registers and directly-accessed memory locations, once the operations have been compiled by Psyco into machine code.
The most interesting level is the virtual-time variables. A Python variable is, internally, a complete structure, with lots of members -- even when the object only represents an integer. Psyco virtual-time variables represent Python objects that could potentially be built if the need arose, but whose details are omitted until they are. For example, consider an assignment like:
x = 15 * (14 + (13 - (12 / 11)))
Standard Python builds and destroys a number of objects to compute this
value. An entire integer object is built to hold the value of
(12/11); then a value is pulled out of the temporary
object's structure, and used to compute a new temporary object
(13-PyInt). Psyco skips the objects, and just
computes the values, knowing that an object can be created "if needed"
from the value.
Explaining Psyco is relatively difficult, but using Psyco is far easier. Basically, all there is to it is telling the Psyco module which functions/methods to "specialize." No code changes need be made to any of your Python functions and classes themselves.
There are a couple approaches to specifying what Psyco should do. The "shotgun" approach is to enable Psyco just-in-time operation everywhere. To do that, put the following lines at the top of your module:
import psyco ; psyco.jit() from psyco.classes import *
The first line tells Psyco to do its magic on all global functions. The second line (in Python 2.2 and above) tells Psyco to do the same with class methods. To target Psyco's behavior a bit more precisely, you can use the commands:
psyco.bind(somefunc) # or method, class newname = psyco.proxy(func)
The second form leaves
func as a standard
Python function, but optimizes calls involving
newname. In almost all cases other than testing and
psyco.bind() form is what you
As magic as Psyco is, using it still requires a little thought and testing. The main thing to understand is that Psyco is useful for handling blocks that loop many times, and it knows how to optimize operations involving integers and floating point numbers. For non-looping functions, and for operations on other types of objects, Psyco mostly just adds overhead for its analysis and internal-compilation. Moreover, for applications with large numbers of functions and classes, enabling Psyco application-wide adds a large burden in machine-code compilation and memory-usage for this caching. It is far better to selectively bind those functions that can benefit most from Psyco's optimizations.
I started my testing in a completely naive fashion. I simply considered what application I had run recently that I would not mind speeding up. The first example that came to mind was a text-manipulation program I use to convert drafts of my forthcoming book (Text Processing in Python) to LaTeX format. This application uses some string methods, some regular expressions, and some program logic driven mostly by regular expressions and string matches. It is actually a terrible candidate for Psyco, but I use it, so I started there.
On the first pass, all I did was add
psyco.jit() to the top of my script. Painless
enough. Unfortunately, the results were (expectedly) disappointing.
Where the script initially took about 8.5 seconds to run, after Psyco's
"speedup" it ran in about 12 seconds. Not so good! I guessed that the
just-in-time compilation probably had some startup overhead that swamped
the running time. So the next thing I tried was processing a much larger
input file (consisting of multiple copies of the original one). This gave
the very limited success of reducing running time from about 120 seconds
to 110 seconds. The speedup was consistent across several runs, but
fairly insignificant either way.
Second pass with my text processing candidate. Instead of adding a
psyco.jit() call, I added only the line
psyco.bind(main), since the
main() function does loop a number of times
(but only makes minimal use of integer arithmetic). The results here were
nominally better. This approach shaved a few tenths of a second off the
normal running time, and a few seconds off the large-input version. But
still nothing spectacular (but also no harm done).
For a more relevant test of Psyco, I dug up some neural network code that
I had written about in an earlier article (see Resources). This
"code_recognizer" application can be trained to recognize the likely
distribution of different ASCII values in different programming languages.
Potentially, something like this could be useful in guessing file types
(say of lost network packets); but the code is actually completely generic
as to what it is trained on -- it could learn to recognize faces, or
sounds, or tidal patterns just as easily. In any case, "code_recognizer"
is based on the Python library
which is also included (in modified form) as a test case with the Psyco
0.4 distribution. The important thing to know about "code_recognizer" for
this article is that it does a lot of looping floating point math, and it
takes a long time to run. We have got a good candidate for Psyco to work
After a little playing around, I established several details about how to use Psyco. For this application, with just a small number of classes and functions, it does not make too much difference whether you use just-in-time or targeted binding. But the best result, by a few percentage points, still comes about by selectively binding the best optimizable classes. More significantly, however, it is important to understand the scope of Psyco binding.
code_recognizer.py script contains lines
from bpnn import NN
class NN2(NN): # customized output methods, math core inherited
That is, the interesting stuff from Psyco's point of view is in the class
bpnn.NN. Adding either
psyco.bind(NN2) to the
code_recognizer.py script has little effect. To get
Psyco to do the desired optimization, you need to either add
code_recognizer.py or add
Contrary to what you might assume, just-in-time does not happen when an
instance is created, or methods run, but rather in the scope where the
class is defined. In addition, binding descendent classes does not
specialize their methods that are inherited from elsewhere.
Once the small details of proper Psyco binding are worked out, the resultant speedups are rather impressive. Using the same test cases and training regime the referenced article presented (500 training patterns, 1000 training iterations), neural net training time was reduced from about 2000 seconds to about 600 seconds -- better than a 3x speedup. Reducing the iterations as low as 10 showed proportional speedups (but worthless neural net recognition), as did intermediate numbers of iterations.
I find bringing running time down from more than 1/2 hour to about 10 minutes with two lines of new code to be quite remarkable. This speedup is still probably less than the speed of a similar application in C, and it is certainly less than the 100x speedup that a few isolated Psyco test cases exhibit. But this application is fairly "real life" and the improvements are enough to be significant in many contexts.
Psyco currently does not perform any sort of internal statistics or profiling, and does only minimal optimization of generated machine code. Potentially, a later version might know how to target those Python operations that could actually benefit most, and discard cached machine code for non-optimizable sections. In addition, perhaps a future Psyco could decide to perform more extensive (but more costly) optimizations on heavily-run operations. Such runtime analysis would be similar to what Sun's HotSpot technology does for Java. The fact that Java, unlike Python, has type-declarations is actually less significant than many people assume (but prior work in optimization of Self, Smalltalk, Lisp, and Scheme make this point also).
Although I suspect it will never actually happen, it would be exciting to have Psyco-type technology integrated into some future version of Python itself. A few lines for imports and bindings is not much to do, but letting Python just inherently run much faster would be even more seamless. We shall see.
- Read the previous installments of Charming Python.
- Find more information at Psyco's home page and project page at SourceForge.
- The Simplified Wrapper and Interface Generator (SWIG) is a very widely -- perhaps predominantly -- used tool for writing C/C++ modules for Python and other "scripting" languages.
- Greg Ewing has created the language Pyrex, which is used for writing Python extension modules. The idea behind Pyrex is to define a language that looks very close to Python itself, and that allows a mixture of Python and C datatypes to be combined, but which is ultimately transformed and compiled into a Python C-extension.
- David co-authored with Andrew Blais An
Introduction to neural networks (developerWorks, July 2001). In that
article, they provided some code based on Neil Schemenauer's Python module
bpnn. The current article utilizes that neural network code to demonstrate Psyco's capabilities.
- Find more Linux articles in the developerWorks Linux zone.