Pyrex is a language specially designed for writing Python extension modules. According to the Pyrex Web site, "It's designed to bridge the gap between the nice, high-level, easy-to-use world of Python and the messy, low-level world of C." Almost any piece of Python code is also valid Pyrex code, but you can add optional static type declarations to Pyrex code, making the declared objects run at C speed.
In some sense, Pyrex is just part of a growing family of Python-like languages: Jython, IronPython, Prothon, Boo, Vyper (now-defunct), Stackless Python (in a way), or the Parrot runtime (in another way). In language terms, Pyrex is essentially just Python with type declarations added in. The few other variations on the language do not amount to that much (although the extension to the for loop does have an elegance to it).
The reason you actually want to use Pyrex, however, is to write modules that run faster -- maybe a lot faster -- than pure Python can possibly run.
Internally, Pyrex generates a C program out of a Pyrex one. The intermediate module.c file remains available for hand tweaking, in the unlikely event you need to do that. For "normal" Pyrex users, however, there is no reason to modify the generated C module. Pyrex itself gives you access to all the C-level constructs that are most important for speed, while saving you from all the C gotchas of memory allocation and deallocation, pointer arithmetic, prototypes, and so on. Pyrex also seamlessly handles all the interfacing with Python-level objects; mostly it does this by declaring variables as PyObject structures where necessary and using Python C-API calls for memory handling and type conversions.
For the most part, Pyrex runs faster than Python by avoiding the need to continuously box and unbox variables of simple data types. An int in Python, for example, is an object with a bunch of methods. It has a tree of ancestors, itself having a computed "method resolution order" (MRO). It has allocators and dellocators for memory handling. And it knows when to promote itself to a long, and how to enter into numeric operations with values of other types. All those extra capabilities mean many levels of indirection and many more conditional checks when you do things with int objects. On the other hand, a C or Pyrex int variable is just a region in memory with some bits set to ones or zeros. Doing something with a C/Pyrex int does not involve any indirection or conditional checks. A CPU "add" operation is just performed in silicon.
In well-chosen cases, a Pyrex module can run 40-50 times faster than a Python version of the same module. But in contrast to writing the module in C, per se, the Pyrex version will hardly be any longer than the Python version, and the code will look much more like Python than like C.
Of course, when you start talking about making Python(-like) modules fast, Pyrex is not the only tool there is. Psyco also lives at the back of every Python developer's mind. Psyco -- to keep it very short -- is a just-in-time (JIT) compiler of Python code into (x86) machine code. Unlike Pyrex, Psyco does not exactly type variables but rather creates several speculative machine code versions of each Python block based on each hypothesis about what the data types might be. If the data turns out to be of a simple type such as int for the entire run of a given block, that block (especially if it loops) can run very quickly. So, for example, x can be an int within a loop that runs a million times but still be assigned a float value at the end of the loop. Psyco happily speeds up the million iterations by using essentially the same (speculative) unboxing that you can specify explicitly with Pyrex.
Although Pyrex is not difficult either, Psyco is childishly simple to use. Using Psyco amounts to nothing more than putting a few lines at the end of your module; in fact, if you use the right lines, the module will run identically (except more slowly), even when Psyco is not available:
Listing 1. Using Psyco only if available
# Import Psyco if available
try:
import psyco
psyco.full()
except ImportError:
pass
|
To use Pyrex, you do need to change a bit more in your code (but only just a bit), and you also need to have a C compiler available and properly configured on the system on which you generate the Pyrex module. You can distribute binary Pyrex modules, but for your module to work elsewhere, you must match the Python version, architecture, and optimization flags that the end user needs.
I recently created a pure-Python implementation of hashcash for the developerWorks article Beat spam using hashcash, but basically, hashcash is a technique for proving CPU work using SHA-1 challenges. Python has a standard module sha, which makes coding hashcash relatively simple.
Unlike 95 percent of the Python programs I write, the slow speed of my hashcash module bothers me, at least a little bit. By design, the protocol is meant to eat CPU cycles, so runtime efficiency really does matter. The ANSI C binary of hashcash.c runs about 10 times as quickly as my hashcash.py script. Moreover, the brilliantly optimized PPC/Altivec-enabled binary of hashcash.c runs another four times as fast as the generic ANSI C version (a G4/Altivec at 1 Ghz easy outpaces a Pentium4™/MMX at 3 Ghz in hashcash/SHA speed; a G5 is that much faster still). So testing on my TiPowerbook shows my module to be an embarrassing 40 times slower than the optimized C version (although the gap is far less on x86).
Because the module runs slowly, maybe Pyrex would be a good candidate for speeding it up. At least that was my thought. The first thing I did in "Pyrex-izing" hashcash.py (after installing Pyrex, of course) was to simply copy it to hashcash_pyx.pyx and try processing it like this:
$ pyrexc hashcash_pyx.pyx |
Running this command happily generates a file hashcash.c (once a few minor changes are made to the source file). Unfortunately, getting the gcc switches just right for my platform was a bit trickier, so I decided to take the recommended shortcut of letting distutils figure it out for me. A standard Python installation knows how to work with the local C compiler during module installations, and using distutils makes sharing a Pyrex module easier. I created a setup_hashcash.py script as follows:
Listing 2. The setup_hashcash.py script
from distutils.core import setup
from distutils.extension import Extension
from Pyrex.Distutils import build_ext
setup(
name = "hashcash_pyx",
ext_modules=[
Extension("hashcash_pyx", ["hashcash_pyx.pyx"], libraries = [])
],
cmdclass = {'build_ext': build_ext}
)
|
Running the following line fully builds a C-based extension module, hashcash:
$ python2.3 prime_setup.py build_ext --inplace |
I slightly overstated the ease I experienced in generating a C-based module out of hashcash.pyx. Actually, I had to make two changes to the source code; I found the locations by looking at where pyrexc complained. I used one unsupported list comprehension in my code, which I had to unroll into a regular for loop. Simple enough. I also had to change one augmented assignment from counter+=1 to counter=counter+1.
That's it. That was my first Pyrex module.
To easily test the speed of the incrementally improving modules I planned to develop, I wrote a small test harness to run different versions of the module:
Listing 3. Test harness, hashcash_test.py
#!/usr/bin/env python2.3
import time, sys, optparse
hashcash = __import__(sys.argv[1])
start = time.time()
print hashcash.mint('mertz@gnosis.cx', bits=20)
timer = time.time()-start
sys.stderr.write("%0.4f seconds (%d hashes per second)\n" %
(timer, hashcash.tries[0]/timer))
|
Excitedly, I decided to see just how much speed improvement I got just by compiling via Pyrex. Note that in all the samples below, the actual time varies widely and randomly. The figure to look at is the "hashes per second," which pretty reliably measures speed. So comparing native Python with Pyrex:
Listing 4. Native Python versus "barely Pyrex"
$ ./hashcash_test.py hashcash 1:20:041003:mertz@gnosis.cx::I+lyNUpV:167dca 13.7879 seconds (106904 hashes per second) $ ./hashcash_test.py hashcash_pyx > /dev/null 6.0695 seconds (89239 hashes per second) |
Oops! It went almost 20 percent slower by using Pyrex. Not really what I was hoping for. It's time to start analyzing the code for speed-up possibilities. Here's the short function that takes substantially all the time:
Listing 5. Worker function in hashcash.py
def _mint(challenge, bits):
"Answer a 'generalized hashcash' challenge'"
counter = 0
hex_digits = int(ceil(bits/4.))
zeros = '0'*hex_digits
hash = sha
while 1:
digest = hash(challenge+hex(counter)[2:]).hexdigest()
if digest[:hex_digits] == zeros:
tries[0] = counter
return hex(counter)[2:]
counter += 1
|
I need to take advantage of Pyrex variable declarations to get a speedup. Some variables are obviously integers, and others are obviously strings -- I'll specify that. And while I'm at it, I'll use Pyrex's enhanced for loop:
Listing 6. Minimally Pyrex-enhanced minter
cdef _mint(challenge, int bits):
# Answer a 'generalized hashcash' challenge'"
cdef int counter, hex_digits, i
cdef char *digest
hex_digits = int(ceil(bits/4.))
hash = sha
for counter from 0 <= counter < sys.maxint:
py_digest = hash(challenge+hex(counter)[2:]).hexdigest()
digest = py_digest
for i from 0 <= i < hex_digits:
if digest[i] != c'0': break
else:
tries[0] = counter
return hex(counter)[2:]
|
Pretty easy so far. I have just declared some variable types that I clearly know, and I used the cleanest Pyrex counter loop. A little trick is assigning py_digest (a Python string) to digest (a C/Pyrex string) in order to type it. Experimentally, I also found that a looping string comparison is faster than taking a slice. How much does all of this help?
Listing 7. Pyrex-ized speed results for minting
$ ./hashcash_test.py hashcash_pyx2 >/dev/null 20.3749 seconds (116636 hashes per second) |
This is better. I've managed a slight improvement from native Python, and a pretty good improvement over my initial Pyrex module. Still nothing very impressive though -- a few percent gain.
Something seems wrong here. Gaining a few percent in speed differs from gaining 40 times like the Pyrex home page -- and many Pyrex users -- boast about. It's time to see where my Python _mint() function is actually spending its time. A quick script (not shown) breaks out just what is going on in the complex compound operation sha(challenge+hex(counter)[2:]).hexdigest():
Listing 8. Timing aspects of hashcash minting
1000000 empty loops: 0.559 ------------------------------ 1000000 sha()s: 2.332 1000000 hex()[2:]s: 3.151 just hex()s: <2.471> 1000000 concatenations: 0.855 1000000 hexdigest()s: 3.742 ------------------------------ Total: 10.079 |
Clearly, I cannot eliminate the loop itself from the _mint() function. Pyrex's enhanced for probably speeds it up slightly, but the whole function is mainly the loop. And I cannot get rid of the call to sha() -- at least not unless I am willing to reimplement SHA-1 in Pyrex (I am far from confident that I could do better than the writers of the Python standard sha module even if I did this). Moreover, if I hope to get an actual hash out of the sha.SHA object, I have to call .hexdigest() or .digest(); the former is slightly faster, too.
All that is really left to address is the hex() conversion on the counter variable, and perhaps the slice taken from the result. I might be able to squeeze a little bit out of concatenating Pyrex/C strings rather than Python string objects, too. The only way I see to avoid the hex() conversion, however, is to manually build a suffix out of nested loops. Doing this can avoid any int->char conversion, but also makes for more code:
Listing 9. Fully Pyrex-optimized minter
cdef _mint(char *challenge, int bits):
cdef int hex_digits, i0, i1, i2, i3, i4, i5
cdef char *ab, *digest, *trial, *suffix
suffix = '******'
ab = alphabet
hex_digits = int(ceil(bits/4.))
hash = sha
for i0 from 0 <= i0 < 55:
suffix[0] = ab[i0]
for i1 from 0 <= i1 < 55:
suffix[1] = ab[i1]
for i2 from 0 <= i2 < 55:
suffix[2] = ab[i2]
for i3 from 0 <= i3 < 55:
suffix[3] = ab[i3]
for i4 from 0 <= i4 < 55:
suffix[4] = ab[i4]
for i5 from 0 <= i5 < 55:
suffix[5] = ab[i5]
py_digest = hash(challenge+suffix).hexdigest()
digest = py_digest
for i from 0 <= i < hex_digits:
if digest[i] != c'0': break
else:
return suffix
|
This Pyrex function still looks quite a bit easier to read than the corresponding C function would, but it is certainly more complicated than was my naively elegant pure-Python version. By the way, unrolling the suffix generation in pure Python has a slightly negative effect on overall speed versus the initial version. In Pyrex, as you would expect, these nested loops are pretty much free, and I save the cost of conversion and slicing:
Listing 10. Optimizing Pyrex-ized speed results for minting
$ ./hashcash_test.py hashcash_pyx3 >/dev/null 13.2270 seconds (166125 hashes per second) |
Better than I started with, certainly. But still well under a doubling of speed. The problem is that most of the time was -- and is -- spent in calls to the Python library, and nothing I might code around those calls prevents or speeds them up.
Getting a speedup of 50 to 60% still seems worthwhile. And I have not done that much coding to get it. But if you think about adding the two statements import psyco;psyco.bind(_mint) to the initial Python version, this speedup does not seem so impressive:
Listing 11. Psyco-ized speed results for minting
$ ./hashcash_test.py hashcash_psyco >/dev/null 15.2300 seconds (157550 hashes per second) |
In other words, Psyco does almost as much with just two generic lines of code. Of course, Psyco only works on x86 platforms, whereas Pyrex will work anywhere that has a C compiler. But for the particular example at issue, os.popen('hashcash -m '+options) will still be many times faster than either Pyrex or Psyco will get you (assuming the C utility hashcash is available, of course).
In the best case, Pyrex can indeed produce quite fast code. For example, the Pyrex home page prominently features a numeric-intensive function for calculating a list of initial prime numbers. All operations involved are performed on integers, so type declarations can speed up the algorithm quite substantially. This Pyrex function just barely differs from pure Python -- just a few declarations added:
Listing 12. Pyrex function for finding primes
def primes(int kmax):
# Calculate initial prime numbers
cdef int n, k, i
cdef int p[100000]
result = []
k = 0
n = 2
while k < kmax:
i = 0
while i < k and n % p[i] <> 0:
i = i + 1
if i == k:
p[k] = n
k = k + 1
result.append(n)
n = n + 1
return result
|
I also wrote the same function as actual Python, basically just by taking the declarations back out. Running the Pyrex and Python versions, and also a Psyco-ized Python version gives these times:
Listing 13. Times for finding primes in Python, Psyco, and Pyrex
$ ./prime_test.py Pure Python version First 10000 primes 60.30 seconds Psyco-ized Python First 10000 primes 7.62 seconds Pyrex version First 10000 primes 1.31 seconds |
So, in the best case, Pyrex does a lot better than Python, and still quite significantly better than the Psyco speedup. I have a hunch, however, that I might be able to improve Psyco's speed by fiddling with some algorithm details. Even so, Pyrex almost certainly represents the best you can do for this type of problem. The generated C code looks almost exactly the same as what you'd write if you simply started with C.
There are a few things that Pyrex does quite well. For code that works with simple numeric or character data in tight loops, Pyrex can produce significant speedups, maybe 50 times the speed in best cases. If a Python application encounters a significant bottleneck in numeric functions, creating Pyrex versions of those functions is very sensible. But the cases where you find significant gains are relatively constrained. Code that spends most of its time making library calls is just not going to benefit that much from Pyrex speeding up incidental loop overhead. Moreover, in many cases, a generic two lines enabling Psyco can produce an improvement similar to what you get though a moderate degree of rewriting from Python into Pyrex. Pyrex code is easy to write, but you have to write it, unlike with Psyco.
I will note that the efforts with hashcash in this article are not the best you might do. I am confident that (with much more work) it would be possible to modify the Python sha module a bit to enable direct calls to the C-level interface, thereby avoiding the Python-level calling overheads. It might also be possible to find some other optimized SHA-1 implementation in C. Pyrex code is perfectly able to utilize external C code, and calling a sha() function written in C will be faster than boxing and unboxing it in Python objects and namespaces. But then, it is not clear why this is worthwhile, given a quite good existing C implementation of hashcash.
Another option to think about, however, in writing specialized numeric functions using Pyrex is whether Numerical Python might be a suitable tool for your work. The numeric package is fairly general, and quite fast for what it does. Using numeric does not involve any non-Python code for its end user, just calls to appropriate library functions. The coverage of numeric is certainly not identical to those functions that can benefit from Pyrex, but there is certainly some overlap.
-
Visit the Pyrex Web site for manuals and a tutorial, as well as the module itself.
-
Get more info on Psyco in David's Charming Python installment Make Python run as fast as C with Psyco (developerWorks, October 2002).
-
Learn more about David's pure-Python implementation of hashcash in Beat spam using hashcash (developerWorks, November 2004).
-
Download the Python module that David wrote,
hashcash.py. -
David's Charming Python installment on Numerical Python (developerWorks, October 2003) covers
NumarrayandNumeric. -
For more on Python, read the author's Charming Python columns on developerWorks.
-
Find more resources for Linux™ developers in the developerWorks Linux zone.
-
Get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
You can download evaluation versions of the products at no charge, or select the Linux® or Windows® version of developerWorks' Software Evaluation Kit.
- See the latest development techniques and products in action at the complimentary
IBM developerWorks Live! technical briefings. If you're new to Linux, take a look at the half-day technical briefing on Migrating and developing new applications for Linux.
- Further build your development skills with On demand demos and On demand Webcasts.
-
Join the developerWorks community by participating in
developerWorks blogs.
- Browse for books on these and other technical topics.
David Mertz has a slow brain, and most of his programs still run slowly. For more on his life, see his personal Web page. He's been writing the developerWorks columns Charming Python and XML Matters since 2000. Check out his book Text Processing in Python.