Long-running Web server applications frequently utilize back-end RDBMSs for a wide variety of purposes, perhaps especially in transactional applications. In fact, I believe that more Web services should use databases than actually do, as developers often feel an unnecessary "burden" is involved in configuring an RDBMS (but that's a topic for a different article). For all their numerous advantages, database access has at least one bottleneck: connections.
Connections can create a bottleneck in two different ways. Straightforwardly, they require a little bit of bandwidth to negotiate over the wire, and a modicum of CPU and memory resources on the RDBMS machine to allocate each one. Those resources are minimal, but not negligible. More importantly, however, an RDBMS provides a finite number of connections. For a situation where the main client of an RDBMS is a Web application, it does not make much practical difference whether that limit is per-client or total, since one database client (the Web server) creates the bulk of activity.
In this tip, I use the excellent psycopg2 adaptor, but any
Python database using DBAPI will work essentially identically. The
one caveat here is that object-relational mappings (ORMs), to my mind,
hide far too much of the details of what is actually happening with
database access, impose artificial restrictions, and generally get in
the way far more than they ease programming.
In the ideal case (without connection pooling), every thread or process in which a Web server application connects to a database will quickly acquire a connection, create a cursor, then perform some reasonably speedy operations, commit or roll back the operations, then close the connection. Easy enough if everything goes well. For example:
Listing 1. Web server writing to an RDBMS
def AddData(foo, bar):
"This function is generally called in its own thread"
from psycopg import connect
conn = connect("dbname=transact user=web host=server.mine")
cur = conn.cursor()
cur.execute("INSERT INTO userdata VALUES (%r, %r)" % (foo, bar))
conn.commit()
conn.close()
|
This example is not really the best code possible, since I might wrap
a try/'except' around the connect() call, and perhaps a
try/finally to assure conn.close() gets called. I also do not
use DBAPI value escaping, as I might. Nonetheless, AddData() has
the "shape" of a typical database call.
What goes wrong (and how to fix it)
A few things tend to go wrong in code like the above. One problem is that when many threads are launched at once some of them, in practice, will fail to obtain a connection, even if the database limit is not reached. In theory, this problem should not occur, but in reality it does. The other problem, however, is that if the database pool limit is reached—usually because threads take longer to complete than you expect or raise exceptions before closing their connection—the failure to obtain a connection is far more opaque than you would like.
By creating a local thread pool, both problems can be addressed. A connection pool is simply a collection of preallocated database connections that can be reused and released by each "virtual connection" thread. That is, the real database connections are closed relatively infrequently, but their pooled proxies are acquired and released with every transaction.
On the one hand, pulling preallocated connections from a local pool
is an in-memory operation that is very unlikely to fail (other than by
exhausting connections), since no network or database latency issues
come up. On the other hand, using a pool lets you accurately and
locally manage your connection limit. If you rely on the database to
limit connections, you have no direct means of determining at an
application level how many connections are used and available; the
number of open threads might act as a crude approximation, but that is
a fragile approach. The antipool.py module will increase and
decrease its pooled (real) connections in the background to avoid
holding real connections longer than necessary. To a user, however, a
pooled connection looks the same as a real database connection, except
availability of connections in the pool is known locally, and the
maximum is a parameter of the connection pool rather than of the
database.
Even using antipool.py, I find it convenient to thinly wrap the
pooling further, rather than use all the options of ConnectionPool
all the time. Moreover, for the most common case where a real
connection limit is reached, a simple sleep() will usually suffice
to eventually acquire a new one. If a lot of threads are genuinely
ill-behaved (never closing), you will still have a problem, but it
becomes easier to debug at least. Here is a wrapper I use:
Listing 2. webapp_pool.py
"""USAGE:
from webapp_pool import get_connection
conn, cur = get_connection() # Might hang, but never raises
cur.execute(SQL)
conn.commit()
conn.release() # 'conn.release()' not 'conn.close()'
"""
import psycopg2
from time import sleep
from antipool import ConnectionPool
from database import host, database, user, MAXCONNECTIONS
conn_pool = ConnectionPool(psycopg2,
host=host,
database=database,
user=user,
options={'maxconn':MAXCONNECTIONS})
def get_connection():
got_connection = False
while not got_connection:
try:
conn = conn_pool.connection()
cur = conn.cursor()
got_connection = True
except psycopg2.OperationalError, mess:
# Might log exception here
sleep(1)
except AttributeError, mess:
# Might log exception here
sleep(1)
return conn, cur
|
In this tip, I have used the database-agnostic antipool.py module.
I find antipool.py to be very flexible;
it has additional well-thought-out features not discussed in this short tip. However, if you
were to use the pooling
capabilities built into psycopg2 or
other tools, the same general
principles for use of connection pooling remain the same.
Learn
- Find Martin Blais' overall antiorm package.
- IBM's DB2 RDBMS also has a Python adaptor, called (as you might
expect) PyDB2. That module does not
include connection pooling, but should work perfectly well with
antipool.py. - If you wish to work with IBM's leading-edge DB2 RDBMS, be sure to
check out the freely downloadable beta of DB2/Viper.
- To obtain
psycopg2, use its Trac home page. Be sure to use the 2.x series rather than the deprecated 1.x. - See all of developerWorks'
featured trial downloads.
-
Subscribe to
the developerWorks Web development newsletter.
-
Get your hands on more howto articles from the Web
development zone's technical library.
Discuss
-
Get involved in the developerWorks
community: blogs, forums, and so much
more.

David Mertz is a big fan of concurrency, but is frustrated by contention. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/publish/. And buy his book at http://gnosis.cx/TPiP/.





