Skip to main content

Web development tips: Use antipool.py for threaded Python database access

Sharing and preserving database connections

David Mertz, Ph.D (mertz@gnosis.cx), Author, Gnosis Software, Inc.
Photo of David Mertz
David Mertz is a big fan of concurrency, but is frustrated by contention. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/publish/. And buy his book at http://gnosis.cx/TPiP/.

Summary:  Databases are happy to handle many parallel requests (doing so is almost in the definition of a database). However, active processes (threaded or forked) almost inevitably eat up a valuable resource: database connections. The antipool.py module in Martin Blais' antiorm library does a nice job of making the pooling and reuse of connections transparent to programmers, and in a RDBMS-agnostic fashion.

Date:  21 Aug 2007
Level:  Introductory
Activity:  1662 views

Introduction

Long-running Web server applications frequently utilize back-end RDBMSs for a wide variety of purposes, perhaps especially in transactional applications. In fact, I believe that more Web services should use databases than actually do, as developers often feel an unnecessary "burden" is involved in configuring an RDBMS (but that's a topic for a different article). For all their numerous advantages, database access has at least one bottleneck: connections.

Connections can create a bottleneck in two different ways. Straightforwardly, they require a little bit of bandwidth to negotiate over the wire, and a modicum of CPU and memory resources on the RDBMS machine to allocate each one. Those resources are minimal, but not negligible. More importantly, however, an RDBMS provides a finite number of connections. For a situation where the main client of an RDBMS is a Web application, it does not make much practical difference whether that limit is per-client or total, since one database client (the Web server) creates the bulk of activity.

In this tip, I use the excellent psycopg2 adaptor, but any Python database using DBAPI will work essentially identically. The one caveat here is that object-relational mappings (ORMs), to my mind, hide far too much of the details of what is actually happening with database access, impose artificial restrictions, and generally get in the way far more than they ease programming.

Without pools

In the ideal case (without connection pooling), every thread or process in which a Web server application connects to a database will quickly acquire a connection, create a cursor, then perform some reasonably speedy operations, commit or roll back the operations, then close the connection. Easy enough if everything goes well. For example:


Listing 1. Web server writing to an RDBMS
def AddData(foo, bar):
    "This function is generally called in its own thread"
    from psycopg import connect
    conn = connect("dbname=transact user=web host=server.mine")
    cur = conn.cursor()
    cur.execute("INSERT INTO userdata VALUES (%r, %r)" % (foo, bar))
    conn.commit()
    conn.close()

This example is not really the best code possible, since I might wrap a try/'except' around the connect() call, and perhaps a try/finally to assure conn.close() gets called. I also do not use DBAPI value escaping, as I might. Nonetheless, AddData() has the "shape" of a typical database call.

What goes wrong (and how to fix it)

A few things tend to go wrong in code like the above. One problem is that when many threads are launched at once some of them, in practice, will fail to obtain a connection, even if the database limit is not reached. In theory, this problem should not occur, but in reality it does. The other problem, however, is that if the database pool limit is reached—usually because threads take longer to complete than you expect or raise exceptions before closing their connection—the failure to obtain a connection is far more opaque than you would like.

By creating a local thread pool, both problems can be addressed. A connection pool is simply a collection of preallocated database connections that can be reused and released by each "virtual connection" thread. That is, the real database connections are closed relatively infrequently, but their pooled proxies are acquired and released with every transaction.

On the one hand, pulling preallocated connections from a local pool is an in-memory operation that is very unlikely to fail (other than by exhausting connections), since no network or database latency issues come up. On the other hand, using a pool lets you accurately and locally manage your connection limit. If you rely on the database to limit connections, you have no direct means of determining at an application level how many connections are used and available; the number of open threads might act as a crude approximation, but that is a fragile approach. The antipool.py module will increase and decrease its pooled (real) connections in the background to avoid holding real connections longer than necessary. To a user, however, a pooled connection looks the same as a real database connection, except availability of connections in the pool is known locally, and the maximum is a parameter of the connection pool rather than of the database.

Even using antipool.py, I find it convenient to thinly wrap the pooling further, rather than use all the options of ConnectionPool all the time. Moreover, for the most common case where a real connection limit is reached, a simple sleep() will usually suffice to eventually acquire a new one. If a lot of threads are genuinely ill-behaved (never closing), you will still have a problem, but it becomes easier to debug at least. Here is a wrapper I use:


Listing 2. webapp_pool.py
"""USAGE:
from webapp_pool import get_connection
conn, cur = get_connection()   # Might hang, but never raises
cur.execute(SQL)
conn.commit()
conn.release()    # 'conn.release()' not 'conn.close()'
"""
import psycopg2
from time import sleep
from antipool import ConnectionPool
from database import host, database, user, MAXCONNECTIONS
conn_pool = ConnectionPool(psycopg2,
                           host=host,
                           database=database,
                           user=user,
                           options={'maxconn':MAXCONNECTIONS})
def get_connection():
    got_connection = False
    while not got_connection:
        try:
            conn = conn_pool.connection()
            cur = conn.cursor()
            got_connection = True
        except psycopg2.OperationalError, mess:
            # Might log exception here
            sleep(1)
        except AttributeError, mess:
            # Might log exception here
            sleep(1)
    return conn, cur

Conclusion

In this tip, I have used the database-agnostic antipool.py module. I find antipool.py to be very flexible; it has additional well-thought-out features not discussed in this short tip. However, if you were to use the pooling capabilities built into psycopg2 or other tools, the same general principles for use of connection pooling remain the same.


Resources

Learn

  • Find Martin Blais' overall antiorm package.

  • IBM's DB2 RDBMS also has a Python adaptor, called (as you might expect) PyDB2. That module does not include connection pooling, but should work perfectly well with antipool.py.

  • If you wish to work with IBM's leading-edge DB2 RDBMS, be sure to check out the freely downloadable beta of DB2/Viper.

  • To obtain psycopg2, use its Trac home page. Be sure to use the 2.x series rather than the deprecated 1.x.

  • See all of developerWorks' featured trial downloads.

  • Subscribe to the developerWorks Web development newsletter.

  • Get your hands on more howto articles from the Web development zone's technical library.

Discuss

About the author

Photo of David Mertz

David Mertz is a big fan of concurrency, but is frustrated by contention. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/publish/. And buy his book at http://gnosis.cx/TPiP/.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Web development
ArticleID=249412
ArticleTitle=Web development tips: Use antipool.py for threaded Python database access
publish-date=08212007
author1-email=mertz@gnosis.cx
author1-email-cc=nora@us.ibm.com

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers