Skip to main content

If you don't have an IBM ID and password, register here.

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. This profile includes the first name, last name, and display name you identified when you registered with developerWorks. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

Charming Python: Pyrex extends and speeds Python apps

For the right kinds of functions, Pyrex pays huge dividends

David Mertz (mertz@gnosis.cx), Developer, Gnosis Software, Inc.
David Mertz has a slow brain, and most of his programs still run slowly. For more on his life, see his personal Web page. He's been writing the developerWorks columns Charming Python and XML Matters since 2000. Check out his book Text Processing in Python.

Summary:  The author takes a stab at speeding up his pure-Python version of hashcash using Pyrex, a language for writing Python extension modules that lets you avoid having to use C for the job. He contrasts writing code in Pyrex -- generally for use with larger Python applications -- with speeding up Python applications using the Psyco compiler, which he has written about previously on developerWorks.

Date:  25 Jan 2005
Level:  Introductory

Comments:  

Pyrex is a language specially designed for writing Python extension modules. According to the Pyrex Web site, "It's designed to bridge the gap between the nice, high-level, easy-to-use world of Python and the messy, low-level world of C." Almost any piece of Python code is also valid Pyrex code, but you can add optional static type declarations to Pyrex code, making the declared objects run at C speed.

Making Python fast

In some sense, Pyrex is just part of a growing family of Python-like languages: Jython, IronPython, Prothon, Boo, Vyper (now-defunct), Stackless Python (in a way), or the Parrot runtime (in another way). In language terms, Pyrex is essentially just Python with type declarations added in. The few other variations on the language do not amount to that much (although the extension to the for loop does have an elegance to it).

The reason you actually want to use Pyrex, however, is to write modules that run faster -- maybe a lot faster -- than pure Python can possibly run.

Internally, Pyrex generates a C program out of a Pyrex one. The intermediate module.c file remains available for hand tweaking, in the unlikely event you need to do that. For "normal" Pyrex users, however, there is no reason to modify the generated C module. Pyrex itself gives you access to all the C-level constructs that are most important for speed, while saving you from all the C gotchas of memory allocation and deallocation, pointer arithmetic, prototypes, and so on. Pyrex also seamlessly handles all the interfacing with Python-level objects; mostly it does this by declaring variables as PyObject structures where necessary and using Python C-API calls for memory handling and type conversions.

For the most part, Pyrex runs faster than Python by avoiding the need to continuously box and unbox variables of simple data types. An int in Python, for example, is an object with a bunch of methods. It has a tree of ancestors, itself having a computed "method resolution order" (MRO). It has allocators and dellocators for memory handling. And it knows when to promote itself to a long, and how to enter into numeric operations with values of other types. All those extra capabilities mean many levels of indirection and many more conditional checks when you do things with int objects. On the other hand, a C or Pyrex int variable is just a region in memory with some bits set to ones or zeros. Doing something with a C/Pyrex int does not involve any indirection or conditional checks. A CPU "add" operation is just performed in silicon.

In well-chosen cases, a Pyrex module can run 40-50 times faster than a Python version of the same module. But in contrast to writing the module in C, per se, the Pyrex version will hardly be any longer than the Python version, and the code will look much more like Python than like C.

Of course, when you start talking about making Python(-like) modules fast, Pyrex is not the only tool there is. Psyco also lives at the back of every Python developer's mind. Psyco -- to keep it very short -- is a just-in-time (JIT) compiler of Python code into (x86) machine code. Unlike Pyrex, Psyco does not exactly type variables but rather creates several speculative machine code versions of each Python block based on each hypothesis about what the data types might be. If the data turns out to be of a simple type such as int for the entire run of a given block, that block (especially if it loops) can run very quickly. So, for example, x can be an int within a loop that runs a million times but still be assigned a float value at the end of the loop. Psyco happily speeds up the million iterations by using essentially the same (speculative) unboxing that you can specify explicitly with Pyrex.

Although Pyrex is not difficult either, Psyco is childishly simple to use. Using Psyco amounts to nothing more than putting a few lines at the end of your module; in fact, if you use the right lines, the module will run identically (except more slowly), even when Psyco is not available:


Listing 1. Using Psyco only if available

# Import Psyco if available
try:
    import psyco
    psyco.full()
except ImportError:
    pass

To use Pyrex, you do need to change a bit more in your code (but only just a bit), and you also need to have a C compiler available and properly configured on the system on which you generate the Pyrex module. You can distribute binary Pyrex modules, but for your module to work elsewhere, you must match the Python version, architecture, and optimization flags that the end user needs.


A (naive) first try for speed

I recently created a pure-Python implementation of hashcash for the developerWorks article Beat spam using hashcash, but basically, hashcash is a technique for proving CPU work using SHA-1 challenges. Python has a standard module sha, which makes coding hashcash relatively simple.

Unlike 95 percent of the Python programs I write, the slow speed of my hashcash module bothers me, at least a little bit. By design, the protocol is meant to eat CPU cycles, so runtime efficiency really does matter. The ANSI C binary of hashcash.c runs about 10 times as quickly as my hashcash.py script. Moreover, the brilliantly optimized PPC/Altivec-enabled binary of hashcash.c runs another four times as fast as the generic ANSI C version (a G4/Altivec at 1 Ghz easy outpaces a Pentium4™/MMX at 3 Ghz in hashcash/SHA speed; a G5 is that much faster still). So testing on my TiPowerbook shows my module to be an embarrassing 40 times slower than the optimized C version (although the gap is far less on x86).

Because the module runs slowly, maybe Pyrex would be a good candidate for speeding it up. At least that was my thought. The first thing I did in "Pyrex-izing" hashcash.py (after installing Pyrex, of course) was to simply copy it to hashcash_pyx.pyx and try processing it like this:

$ pyrexc hashcash_pyx.pyx

Creating a binary module

Running this command happily generates a file hashcash.c (once a few minor changes are made to the source file). Unfortunately, getting the gcc switches just right for my platform was a bit trickier, so I decided to take the recommended shortcut of letting distutils figure it out for me. A standard Python installation knows how to work with the local C compiler during module installations, and using distutils makes sharing a Pyrex module easier. I created a setup_hashcash.py script as follows:


Listing 2. The setup_hashcash.py script

from distutils.core import setup
from distutils.extension import Extension
from Pyrex.Distutils import build_ext
setup(
  name = "hashcash_pyx",
  ext_modules=[
    Extension("hashcash_pyx", ["hashcash_pyx.pyx"], libraries = [])
    ],
  cmdclass = {'build_ext': build_ext}
)

Running the following line fully builds a C-based extension module, hashcash:

$ python2.3 prime_setup.py build_ext --inplace

Code changes

I slightly overstated the ease I experienced in generating a C-based module out of hashcash.pyx. Actually, I had to make two changes to the source code; I found the locations by looking at where pyrexc complained. I used one unsupported list comprehension in my code, which I had to unroll into a regular for loop. Simple enough. I also had to change one augmented assignment from counter+=1 to counter=counter+1.

That's it. That was my first Pyrex module.


Testing speeds

To easily test the speed of the incrementally improving modules I planned to develop, I wrote a small test harness to run different versions of the module:


Listing 3. Test harness, hashcash_test.py

#!/usr/bin/env python2.3
import time, sys, optparse
hashcash = __import__(sys.argv[1])
start = time.time()
print hashcash.mint('mertz@gnosis.cx', bits=20)
timer = time.time()-start
sys.stderr.write("%0.4f seconds (%d hashes per second)\n" %
        (timer, hashcash.tries[0]/timer))

Excitedly, I decided to see just how much speed improvement I got just by compiling via Pyrex. Note that in all the samples below, the actual time varies widely and randomly. The figure to look at is the "hashes per second," which pretty reliably measures speed. So comparing native Python with Pyrex:


Listing 4. Native Python versus "barely Pyrex"

$ ./hashcash_test.py hashcash
1:20:041003:mertz@gnosis.cx::I+lyNUpV:167dca
13.7879 seconds (106904 hashes per second)

$ ./hashcash_test.py hashcash_pyx > /dev/null
6.0695 seconds (89239 hashes per second)

Oops! It went almost 20 percent slower by using Pyrex. Not really what I was hoping for. It's time to start analyzing the code for speed-up possibilities. Here's the short function that takes substantially all the time:


Listing 5. Worker function in hashcash.py

def _mint(challenge, bits):
    "Answer a 'generalized hashcash' challenge'"
    counter = 0
    hex_digits = int(ceil(bits/4.))
    zeros = '0'*hex_digits
    hash = sha
    while 1:
        digest = hash(challenge+hex(counter)[2:]).hexdigest()
        if digest[:hex_digits] == zeros:
            tries[0] = counter
            return hex(counter)[2:]
        counter += 1

I need to take advantage of Pyrex variable declarations to get a speedup. Some variables are obviously integers, and others are obviously strings -- I'll specify that. And while I'm at it, I'll use Pyrex's enhanced for loop:


Listing 6. Minimally Pyrex-enhanced minter

cdef _mint(challenge, int bits):
    # Answer a 'generalized hashcash' challenge'"
    cdef int counter, hex_digits, i
    cdef char *digest
    hex_digits = int(ceil(bits/4.))
    hash = sha
    for counter from 0 <= counter < sys.maxint:
        py_digest = hash(challenge+hex(counter)[2:]).hexdigest()
        digest = py_digest
        for i from 0 <= i < hex_digits:
            if digest[i] != c'0': break
        else:
            tries[0] = counter
            return hex(counter)[2:]

Pretty easy so far. I have just declared some variable types that I clearly know, and I used the cleanest Pyrex counter loop. A little trick is assigning py_digest (a Python string) to digest (a C/Pyrex string) in order to type it. Experimentally, I also found that a looping string comparison is faster than taking a slice. How much does all of this help?


Listing 7. Pyrex-ized speed results for minting

$ ./hashcash_test.py hashcash_pyx2 >/dev/null
20.3749 seconds (116636 hashes per second)

This is better. I've managed a slight improvement from native Python, and a pretty good improvement over my initial Pyrex module. Still nothing very impressive though -- a few percent gain.

Profiling

Something seems wrong here. Gaining a few percent in speed differs from gaining 40 times like the Pyrex home page -- and many Pyrex users -- boast about. It's time to see where my Python _mint() function is actually spending its time. A quick script (not shown) breaks out just what is going on in the complex compound operation sha(challenge+hex(counter)[2:]).hexdigest():


Listing 8. Timing aspects of hashcash minting

1000000 empty loops:     0.559
------------------------------
1000000 sha()s:          2.332
1000000 hex()[2:]s:      3.151
   just hex()s:         <2.471>
1000000 concatenations:  0.855
1000000 hexdigest()s:    3.742
------------------------------
Total:                  10.079

Clearly, I cannot eliminate the loop itself from the _mint() function. Pyrex's enhanced for probably speeds it up slightly, but the whole function is mainly the loop. And I cannot get rid of the call to sha() -- at least not unless I am willing to reimplement SHA-1 in Pyrex (I am far from confident that I could do better than the writers of the Python standard sha module even if I did this). Moreover, if I hope to get an actual hash out of the sha.SHA object, I have to call .hexdigest() or .digest(); the former is slightly faster, too.

All that is really left to address is the hex() conversion on the counter variable, and perhaps the slice taken from the result. I might be able to squeeze a little bit out of concatenating Pyrex/C strings rather than Python string objects, too. The only way I see to avoid the hex() conversion, however, is to manually build a suffix out of nested loops. Doing this can avoid any int->char conversion, but also makes for more code:


Listing 9. Fully Pyrex-optimized minter

cdef _mint(char *challenge, int bits):
    cdef int hex_digits, i0, i1, i2, i3, i4, i5
    cdef char *ab, *digest, *trial, *suffix
    suffix = '******'
    ab = alphabet
    hex_digits = int(ceil(bits/4.))
    hash = sha
    for i0 from 0 <= i0 < 55:
        suffix[0] = ab[i0]
        for i1 from 0 <= i1 < 55:
            suffix[1] = ab[i1]
            for i2 from 0 <= i2 < 55:
                suffix[2] = ab[i2]
                for i3 from 0 <= i3 < 55:
                    suffix[3] = ab[i3]
                    for i4 from 0 <= i4 < 55:
                        suffix[4] = ab[i4]
                        for i5 from 0 <= i5 < 55:
                            suffix[5] = ab[i5]
                            py_digest = hash(challenge+suffix).hexdigest()
                            digest = py_digest
                            for i from 0 <= i < hex_digits:
                                if digest[i] != c'0': break
                            else:
                                return suffix

This Pyrex function still looks quite a bit easier to read than the corresponding C function would, but it is certainly more complicated than was my naively elegant pure-Python version. By the way, unrolling the suffix generation in pure Python has a slightly negative effect on overall speed versus the initial version. In Pyrex, as you would expect, these nested loops are pretty much free, and I save the cost of conversion and slicing:


Listing 10. Optimizing Pyrex-ized speed results for minting

$ ./hashcash_test.py hashcash_pyx3 >/dev/null
13.2270 seconds (166125 hashes per second)

Better than I started with, certainly. But still well under a doubling of speed. The problem is that most of the time was -- and is -- spent in calls to the Python library, and nothing I might code around those calls prevents or speeds them up.

A disappointing comparison

Getting a speedup of 50 to 60% still seems worthwhile. And I have not done that much coding to get it. But if you think about adding the two statements import psyco;psyco.bind(_mint) to the initial Python version, this speedup does not seem so impressive:


Listing 11. Psyco-ized speed results for minting

$ ./hashcash_test.py hashcash_psyco >/dev/null
15.2300 seconds (157550 hashes per second)

In other words, Psyco does almost as much with just two generic lines of code. Of course, Psyco only works on x86 platforms, whereas Pyrex will work anywhere that has a C compiler. But for the particular example at issue, os.popen('hashcash -m '+options) will still be many times faster than either Pyrex or Psyco will get you (assuming the C utility hashcash is available, of course).


Where Pyrex shines

In the best case, Pyrex can indeed produce quite fast code. For example, the Pyrex home page prominently features a numeric-intensive function for calculating a list of initial prime numbers. All operations involved are performed on integers, so type declarations can speed up the algorithm quite substantially. This Pyrex function just barely differs from pure Python -- just a few declarations added:


Listing 12. Pyrex function for finding primes

def primes(int kmax):
    # Calculate initial prime numbers
    cdef int n, k, i
    cdef int p[100000]
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] <> 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result

I also wrote the same function as actual Python, basically just by taking the declarations back out. Running the Pyrex and Python versions, and also a Psyco-ized Python version gives these times:


Listing 13. Times for finding primes in Python, Psyco, and Pyrex

$ ./prime_test.py
Pure Python version
First 10000 primes
60.30 seconds

Psyco-ized Python
First 10000 primes
7.62 seconds

Pyrex version
First 10000 primes
1.31 seconds

So, in the best case, Pyrex does a lot better than Python, and still quite significantly better than the Psyco speedup. I have a hunch, however, that I might be able to improve Psyco's speed by fiddling with some algorithm details. Even so, Pyrex almost certainly represents the best you can do for this type of problem. The generated C code looks almost exactly the same as what you'd write if you simply started with C.


Conclusion

There are a few things that Pyrex does quite well. For code that works with simple numeric or character data in tight loops, Pyrex can produce significant speedups, maybe 50 times the speed in best cases. If a Python application encounters a significant bottleneck in numeric functions, creating Pyrex versions of those functions is very sensible. But the cases where you find significant gains are relatively constrained. Code that spends most of its time making library calls is just not going to benefit that much from Pyrex speeding up incidental loop overhead. Moreover, in many cases, a generic two lines enabling Psyco can produce an improvement similar to what you get though a moderate degree of rewriting from Python into Pyrex. Pyrex code is easy to write, but you have to write it, unlike with Psyco.

I will note that the efforts with hashcash in this article are not the best you might do. I am confident that (with much more work) it would be possible to modify the Python sha module a bit to enable direct calls to the C-level interface, thereby avoiding the Python-level calling overheads. It might also be possible to find some other optimized SHA-1 implementation in C. Pyrex code is perfectly able to utilize external C code, and calling a sha() function written in C will be faster than boxing and unboxing it in Python objects and namespaces. But then, it is not clear why this is worthwhile, given a quite good existing C implementation of hashcash.

Another option to think about, however, in writing specialized numeric functions using Pyrex is whether Numerical Python might be a suitable tool for your work. The numeric package is fairly general, and quite fast for what it does. Using numeric does not involve any non-Python code for its end user, just calls to appropriate library functions. The coverage of numeric is certainly not identical to those functions that can benefit from Pyrex, but there is certainly some overlap.


Resources

About the author

David Mertz has a slow brain, and most of his programs still run slowly. For more on his life, see his personal Web page. He's been writing the developerWorks columns Charming Python and XML Matters since 2000. Check out his book Text Processing in Python.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in

If you don't have an IBM ID and password, register here.


Forgot your IBM ID?


Forgot your password?
Change your password


By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. This profile includes the first name, last name, and display name you identified when you registered with developerWorks. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)


By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Linux
ArticleID=47190
ArticleTitle=Charming Python: Pyrex extends and speeds Python apps
publish-date=01252005
author1-email=mertz@gnosis.cx
author1-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).