Web development tips: Use antipool.py for threaded Python database access

Sharing and preserving database connections

Databases are happy to handle many parallel requests (doing so is almost in the definition of a database). However, active processes (threaded or forked) almost inevitably eat up a valuable resource: database connections. The antipool.py module in Martin Blais' antiorm library does a nice job of making the pooling and reuse of connections transparent to programmers, and in a RDBMS-agnostic fashion.

David Mertz, Ph.D (mertz@gnosis.cx), Author, Gnosis Software, Inc.

Photo of David MertzDavid Mertz is a big fan of concurrency, but is frustrated by contention. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/publish/. And buy his book at http://gnosis.cx/TPiP/.



21 August 2007

Also available in Russian Japanese

Introduction

Long-running Web server applications frequently utilize back-end RDBMSs for a wide variety of purposes, perhaps especially in transactional applications. In fact, I believe that more Web services should use databases than actually do, as developers often feel an unnecessary "burden" is involved in configuring an RDBMS (but that's a topic for a different article). For all their numerous advantages, database access has at least one bottleneck: connections.

Connections can create a bottleneck in two different ways. Straightforwardly, they require a little bit of bandwidth to negotiate over the wire, and a modicum of CPU and memory resources on the RDBMS machine to allocate each one. Those resources are minimal, but not negligible. More importantly, however, an RDBMS provides a finite number of connections. For a situation where the main client of an RDBMS is a Web application, it does not make much practical difference whether that limit is per-client or total, since one database client (the Web server) creates the bulk of activity.

In this tip, I use the excellent psycopg2 adaptor, but any Python database using DBAPI will work essentially identically. The one caveat here is that object-relational mappings (ORMs), to my mind, hide far too much of the details of what is actually happening with database access, impose artificial restrictions, and generally get in the way far more than they ease programming.

Without pools

In the ideal case (without connection pooling), every thread or process in which a Web server application connects to a database will quickly acquire a connection, create a cursor, then perform some reasonably speedy operations, commit or roll back the operations, then close the connection. Easy enough if everything goes well. For example:

Listing 1. Web server writing to an RDBMS
def AddData(foo, bar):
    "This function is generally called in its own thread"
    from psycopg import connect
    conn = connect("dbname=transact user=web host=server.mine")
    cur = conn.cursor()
    cur.execute("INSERT INTO userdata VALUES (%r, %r)" % (foo, bar))
    conn.commit()
    conn.close()

This example is not really the best code possible, since I might wrap a try/'except' around the connect() call, and perhaps a try/finally to assure conn.close() gets called. I also do not use DBAPI value escaping, as I might. Nonetheless, AddData() has the "shape" of a typical database call.

What goes wrong (and how to fix it)

A few things tend to go wrong in code like the above. One problem is that when many threads are launched at once some of them, in practice, will fail to obtain a connection, even if the database limit is not reached. In theory, this problem should not occur, but in reality it does. The other problem, however, is that if the database pool limit is reached—usually because threads take longer to complete than you expect or raise exceptions before closing their connection—the failure to obtain a connection is far more opaque than you would like.

By creating a local thread pool, both problems can be addressed. A connection pool is simply a collection of preallocated database connections that can be reused and released by each "virtual connection" thread. That is, the real database connections are closed relatively infrequently, but their pooled proxies are acquired and released with every transaction.

On the one hand, pulling preallocated connections from a local pool is an in-memory operation that is very unlikely to fail (other than by exhausting connections), since no network or database latency issues come up. On the other hand, using a pool lets you accurately and locally manage your connection limit. If you rely on the database to limit connections, you have no direct means of determining at an application level how many connections are used and available; the number of open threads might act as a crude approximation, but that is a fragile approach. The antipool.py module will increase and decrease its pooled (real) connections in the background to avoid holding real connections longer than necessary. To a user, however, a pooled connection looks the same as a real database connection, except availability of connections in the pool is known locally, and the maximum is a parameter of the connection pool rather than of the database.

Even using antipool.py, I find it convenient to thinly wrap the pooling further, rather than use all the options of ConnectionPool all the time. Moreover, for the most common case where a real connection limit is reached, a simple sleep() will usually suffice to eventually acquire a new one. If a lot of threads are genuinely ill-behaved (never closing), you will still have a problem, but it becomes easier to debug at least. Here is a wrapper I use:

Listing 2. webapp_pool.py
"""USAGE:
from webapp_pool import get_connection
conn, cur = get_connection()   # Might hang, but never raises
cur.execute(SQL)
conn.commit()
conn.release()    # 'conn.release()' not 'conn.close()'
"""
import psycopg2
from time import sleep
from antipool import ConnectionPool
from database import host, database, user, MAXCONNECTIONS
conn_pool = ConnectionPool(psycopg2,
                           host=host,
                           database=database,
                           user=user,
                           options={'maxconn':MAXCONNECTIONS})
def get_connection():
    got_connection = False
    while not got_connection:
        try:
            conn = conn_pool.connection()
            cur = conn.cursor()
            got_connection = True
        except psycopg2.OperationalError, mess:
            # Might log exception here
            sleep(1)
        except AttributeError, mess:
            # Might log exception here
            sleep(1)
    return conn, cur

Conclusion

In this tip, I have used the database-agnostic antipool.py module. I find antipool.py to be very flexible; it has additional well-thought-out features not discussed in this short tip. However, if you were to use the pooling capabilities built into psycopg2 or other tools, the same general principles for use of connection pooling remain the same.

Resources

Learn

  • Find Martin Blais' overall antiorm package.
  • IBM's DB2 RDBMS also has a Python adaptor, called (as you might expect) PyDB2. That module does not include connection pooling, but should work perfectly well with antipool.py.
  • If you wish to work with IBM's leading-edge DB2 RDBMS, be sure to check out the freely downloadable beta of DB2/Viper.
  • To obtain psycopg2, use its Trac home page. Be sure to use the 2.x series rather than the deprecated 1.x.
  • See all of developerWorks' featured trial downloads.
  • Subscribe to the developerWorks Web development newsletter.
  • Get your hands on more howto articles from the Web development zone's technical library.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Web development on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Web development
ArticleID=249412
ArticleTitle=Web development tips: Use antipool.py for threaded Python database access
publish-date=08212007