The main reason for the success of the Web is its flexibility. You find almost as many ways to design, develop, and deploy Web sites and applications as there are developers. With a huge wealth of choices, a Web developer often chooses a unique combination of Web design tools, page style, content language, Web server, middleware, and DBMS technology, using different implementation languages and accessory toolkits. To make all of these elements work together to offer maximum flexibility, Web functionality should be provided through components as much as possible. These components should perform a limited number of focused tasks competently and work well with each other. This is easy to say, but in practice it's very difficult to achieve because of the many different approaches to Web technology.
The best hope to keep your sanity is the growth of standards for Web component interoperability. Some of these important standards are already developed, and the most successful Web development platforms have them as their backbone. Prominent examples include the Java servlet API and the Ruby on Rails framework. Some languages long popular for Web programming are only recently being given the same level of componentization and have learned from the experience of preceding Web framework component standards. One example is the Zend Framework for PHP (see Resources). Another is Web Server Gateway Interface (WSGI) for Python.
Many people have complained that the popular Python programming language has too many Web frameworks, from well-known entrants such as Zope to under-the-radar frameworks such as SkunkWeb. Some have argued that this diversity can be a good thing, as long as there is some underpinning standardization. Python and Web expert Phillip J. Eby took on the task of such standardization. He authored Python Enhancement Proposal (PEP) 333, which defines WSGI.
The goal of WSGI is to allow for greater interoperability between Python frameworks. WSGI's success brings about an ecosystem of plug-in components you can use with your favorite frameworks to gain maximum flexibility. In this article, I'll introduce WSGI, and focus on its use as a reusable Web component architecture. In all discussions and sample code, I'll assume that you're using Python 2.4 or a more recent version.
WSGI was developed under fairly strict constraints, the most important of which was the need for a reasonable amount of backward compatibility with the Web frameworks preceding it. This constraint means WSGI unfortunately isn't as neat and transparent as Python developers are used to. Usually the only developers who have to deal directly with WSGI are those who build frameworks and reusable components. Most regular Web developers will pick a framework for its ease of use and be insulated from WSGI details.
If you want to develop reusable Web components, you have to understand WSGI, and the first thing you need to understand about it is how Web applications are structured in the WSGI world view. Figure 1 illustrates this structure.
Figure 1. Illustration of how HTTP request-response passes through the WSGI stack
The Web server, also called the gateway, is very low-level code for basic communication with the request client (usually the user's browser). The application layer handles the higher-level details that interpret requests from the user and prepare response content. The application interface to WSGI itself is usually just the more basic layer of an even higher level of application framework providing friendly facilities for common Web patterns such as Ajax techniques or content template systems. Above the server or gateway layer lies WSGI middleware. This important layer comprises components that can be shared across server and application implementations. Common Web features such as user sessions, error handling, and authentication can be implemented as WSGI middleware.
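To make the bottom and top of this stack concrete, here is a minimal sketch of a WSGI application being called the way a server or gateway would call it. It is written for a current Python 3 interpreter, where response bodies are byte strings, unlike the Python 2 listings later in this article. The names hello_app and call_app are illustrative assumptions, not part of WSGI.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A minimal WSGI application: a callable that takes the request
    environment and the server's start_response function."""
    body = b"Hello, WSGI!\n"
    headers = [
        ("Content-Type", "text/plain; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # The return value is an iterable of response body blocks.
    return [body]


def call_app(app, path="/"):
    """Play the role of the server for one request (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a plausible WSGI environment
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = response_headers
        return lambda data: None  # stand-in for the legacy write() callable

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(hello_app)
print(status)
print(body)
```

Middleware sits between these two roles: it is handed hello_app at set-up time and is itself handed to the server in place of the application.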
WSGI middleware is the most natural layer for reusable components. WSGI middleware looks like an application to the lower layers, and like a server to the higher layers. It watches the state of requests, responses, and the WSGI environment in order to add some particular features. Unfortunately, the WSGI specification offers a very poor middleware example, and many of the other examples you can find are too simplistic to give you a feel for how to quickly write your own middleware. I'll give you a feel for the process WSGI middleware undertakes with the following broad outline. It ignores matters that most WSGI middleware authors won't need to worry about. In Python, where I use the word function, I mean any callable object.
- Set-up phase. A set-up phase occurs once each time the Web server starts up. It accepts an instance of the middleware, which wraps the application function.
- Handling a client request. Handling a client request occurs each time the Web server receives a request.
- Server calls the middleware function with the environment and start_response parameters.
- Middleware processes the environment and calls the application callable, passing on the environment and a wrapped start_response function.
- The application executes; first it prepares the response headers, then it calls the wrapped start_response function.
- Middleware processes response headers and calls the server's start_response function.
- Server passes control back to the middleware and then back to the application, which starts yielding response body blocks (as strings).
- For each response body block, middleware makes any modifications and passes on some corresponding string to the server.
- Once all blocks from the application have been processed, middleware returns control to the server, finished for the current request.
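The outline above can be sketched as a toy middleware. This is only an illustration under stated assumptions: it targets Python 3 (byte-string bodies), and ShoutingMiddleware, demo_app, run, the shouting.active key, and the X-Shouting header are all hypothetical names invented for the example.

```python
from wsgiref.util import setup_testing_defaults


class ShoutingMiddleware(object):
    """Hypothetical toy middleware: upper-cases every response body block."""

    def __init__(self, app):
        # Set-up phase: remember the application being wrapped.
        self.wrapped_app = app

    def __call__(self, environ, start_response):
        # The server has called the middleware with the environment and
        # the server's own start_response function.
        environ["shouting.active"] = True  # example of touching the environment

        def start_response_wrapper(status, response_headers, exc_info=None):
            # The application calls this wrapper once its headers are ready;
            # the middleware adjusts them and calls the server's function.
            headers = list(response_headers) + [("X-Shouting", "yes")]
            return start_response(status, headers, exc_info)

        # Call the application with the environment and the wrapped function.
        iterable = self.wrapped_app(environ, start_response_wrapper)
        try:
            for block in iterable:
                # Pass on one block for each block the application yields.
                yield block.upper()
        finally:
            # WSGI requires close() to be called if the iterable has one.
            if hasattr(iterable, "close"):
                iterable.close()


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello ", b"world"]


def run(app):
    """Stand in for the server for a single request (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, headers

    body = b"".join(app(environ, start_response))
    return seen, body


seen, body = run(ShoutingMiddleware(demo_app))
print(seen["status"], seen["headers"], body)
```

Because __call__ is written as a generator here, the application is not invoked until the server starts iterating over the response. Listing 1 takes a different route and returns an explicit iterator instead.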
Many component technologies are rather complex, so the best examples for instruction are simple throwaway toys. This isn't the case with WSGI, and, in fact, I'll present a very practical example. Many developers prefer to serve XHTML Web pages because XML technologies are easier to manage than "tag soup" HTML, and emerging Web trends favor sites that are easier for automatons to read. The problem is that not all Web browsers support XHTML properly. Listing 1 (safexhtml.py) is a WSGI middleware module that checks incoming requests to see if the browser supports XHTML and, if not, translates any XHTML responses to plain HTML. You can use such a module so all of your main application code produces XHTML and the middleware takes care of any needed translation to HTML. Review Listing 1 carefully and try to combine it with the general outline of WSGI middleware execution from the previous section. I've provided enough comments so you can identify the different stages in the code.
Listing 1 (safexhtml.py). WSGI middleware to translate XHTML to HTML for browsers that can't handle it
import cStringIO
from itertools import chain
from xml import sax

from Ft.Xml import CreateInputSource
from Ft.Xml.Sax import SaxPrinter
from Ft.Xml.Lib.HtmlPrinter import HtmlPrinter

XHTML_IMT = "application/xhtml+xml"
HTML_CONTENT_TYPE = 'text/html; charset=UTF-8'

#This class is not specific to the safexhtml example middleware and
#can be reused as-is in other WSGI middleware implementations
#It's part of the wsgi.xml library (http://uche.ogbuji.net/tech/4suite/wsgixml)
#You can install it by using easy_install wsgixml and then using
#from wsgixml.util import iterwrapper
class iterwrapper:
    """
    Wraps the response body iterator from the application to meet WSGI
    requirements.
    """
    def __init__(self, wrapped, response_chunk_handler):
        """
        wrapped - the iterator coming from the application
        response_chunk_handler - a callable for any processing of a
            response body chunk before passing it on to the server.
        """
        #iter() lets the application return any iterable, such as a list
        self._wrapped = iter(wrapped)
        self._response_chunk_handler = response_chunk_handler
        if hasattr(wrapped, 'close'):
            self.close = wrapped.close

    def __iter__(self):
        return self

    def next(self):
        return self._response_chunk_handler(self._wrapped.next())


class safexhtml(object):
    """
    Middleware that checks for XHTML capability in the client and translates
    XHTML to HTML if the client can't handle it
    """
    def __init__(self, app):
        #Set-up phase
        self.wrapped_app = app
        return

    def __call__(self, environ, start_response):
        #Handling a client request phase.
        #Called for each client request routed through this middleware

        #Does the client specifically say it supports XHTML?
        #Note saying it accepts */* or application/* will not be enough
        xhtml_ok = XHTML_IMT in environ.get('HTTP_ACCEPT', '')

        #Specialized start_response function for this middleware
        def start_response_wrapper(status, response_headers, exc_info=None):
            #Assume response is not XHTML; do not activate transformation
            environ['safexhtml.active'] = False
            #Check for response content type to see whether it is XHTML
            #that needs to be transformed
            for name, value in response_headers:
                #content-type value is a media type, defined as
                #media-type = type "/" subtype *( ";" parameter )
                #Headers are left alone if the client can handle XHTML
                if ( not xhtml_ok
                     and name.lower() == 'content-type'
                     and value.split(';')[0] == XHTML_IMT ):
                    #Strip content-length if present (needs to be
                    #recalculated by server)
                    #Also strip content-type, which will be replaced below
                    response_headers = [ (name, value)
                        for name, value in response_headers
                            if ( name.lower()
                                 not in ['content-length', 'content-type'])
                    ]
                    #Put in the updated content type
                    response_headers.append(('content-type', HTML_CONTENT_TYPE))
                    #Response is XHTML, so activate transformation
                    environ['safexhtml.active'] = True
                    break

            #We ignore the return value from start_response
            start_response(status, response_headers, exc_info)
            #Replace any write() callable with a dummy that gives an error
            #The idea is to refuse support for apps that use write()
            def dummy_write(data):
                raise RuntimeError('safexhtml does not support the deprecated write() callable in WSGI clients')
            return dummy_write

        #Get the iterator from the application that will yield response
        #body fragments
        iterable = self.wrapped_app(environ, start_response_wrapper)

        #Gather output strings for concatenation
        #(only used if HTML translation is required)
        response_blocks = []

        #This function processes each chunk of output (simple string) from
        #the app, returning the modified chunk to be passed on to the server
        def handle_response_chunk(chunk):
            if xhtml_ok:
                #The client can handle XHTML, so nothing for this
                #middleware to do; pass the chunk through unchanged
                return chunk
            else:
                if environ['safexhtml.active']:
                    response_blocks.append(chunk)
                    return '' #Obey buffering rules for WSGI
                else:
                    return chunk

        #After the application has finished sending its response body
        #fragments, if HTML translation is required, it is necessary
        #to send one more chunk, with the fully translated XHTML
        #This is handled by the following function, a generator.
        #If HTML translation is not required the generator produces nothing
        def produce_final_output():
            if not xhtml_ok and environ['safexhtml.active']:
                #Need to convert response from XHTML to HTML
                xhtmlstr = ''.join(response_blocks) #First concatenate response

                #Now use 4Suite to transform XHTML to HTML
                htmlstr = cStringIO.StringIO() #Will hold the HTML result
                parser = sax.make_parser(['Ft.Xml.Sax'])
                handler = SaxPrinter(HtmlPrinter(htmlstr, 'UTF-8'))
                parser.setContentHandler(handler)
                #Don't load the XHTML DTDs from the Internet
                parser.setFeature(sax.handler.feature_external_pes, False)
                parser.parse(CreateInputSource(xhtmlstr))
                yield htmlstr.getvalue()

        return chain(iterwrapper(iterable, handle_response_chunk),
                     produce_final_output())
safexhtml is the full middleware implementation. Each instance is a callable object because the class defines the special __call__ method. You pass an instance of the class to the server, passing the application you are wrapping to the initializer __init__. The wrapped application might also be another middleware instance if you are chaining safexhtml to other middleware. When the middleware is invoked as a result of a request to the server, the class first checks the Accept header sent by the client to see whether it includes the official XHTML media type. If so (the xhtml_ok flag), it's safe to send XHTML and the middleware doesn't do anything meaningful for that request.
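As a rough illustration of this check, the following sketch tests the Accept header item by item. It is a slightly stricter variant of the substring test used in Listing 1, not the article's own code, and it still ignores quality values such as q=0. The function name client_accepts_xhtml is an assumption made for the example.

```python
XHTML_IMT = "application/xhtml+xml"


def client_accepts_xhtml(environ):
    """Return True only if the Accept header names the XHTML media type.

    Wildcards such as */* or application/* deliberately do not count,
    and q-values are ignored in this simplified sketch.
    """
    accept = environ.get("HTTP_ACCEPT", "")
    for item in accept.split(","):
        # Drop any parameters (for example ";q=0.9") before comparing.
        media_type = item.split(";")[0].strip().lower()
        if media_type == XHTML_IMT:
            return True
    return False


print(client_accepts_xhtml({"HTTP_ACCEPT": "application/xhtml+xml,text/html"}))
print(client_accepts_xhtml({"HTTP_ACCEPT": "*/*"}))
```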
When the client can't handle XHTML, the class relies on the specialized nested function start_response_wrapper, whose job is to check the response headers from the application to see whether the response is XHTML. If so, the response needs to be translated to plain HTML, a fact flagged as safexhtml.active in the environment. One reason to use the environment for this flag is that it takes care of scoping issues in communicating the flag back to the rest of the middleware code. Remember that start_response_wrapper is called asynchronously at a time the application chooses, and it can be tricky to manage the needed state in the middleware.
Another reason to use the environment is to communicate down the WSGI stack that the content has been modified. If the response body needs to be translated, not only does start_response_wrapper set the safexhtml.active flag, but it also changes the response media type to text/html and removes any Content-Length header, because the translation will almost certainly change the length of the response body, and it will have to be recalculated downstream, probably by the server.
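The header handling described here can be isolated into a small function for experimentation. The sketch below is an assumption-laden restatement, not the code from Listing 1: rewrite_headers is a hypothetical name, and it returns the flag instead of storing it in the environment.

```python
XHTML_IMT = "application/xhtml+xml"
HTML_CONTENT_TYPE = "text/html; charset=UTF-8"


def rewrite_headers(response_headers):
    """Return (headers, active) for a list of (name, value) pairs.

    active is True when the response was declared as XHTML, in which case
    Content-Type is replaced and Content-Length is dropped so that it can
    be recalculated downstream.
    """
    for name, value in response_headers:
        # A media type may carry parameters: type "/" subtype *( ";" parameter )
        media_type = value.split(";")[0].strip().lower()
        if name.lower() == "content-type" and media_type == XHTML_IMT:
            kept = [
                (n, v) for n, v in response_headers
                if n.lower() not in ("content-length", "content-type")
            ]
            kept.append(("Content-Type", HTML_CONTENT_TYPE))
            return kept, True
    # Not XHTML: leave the headers untouched.
    return list(response_headers), False


headers, active = rewrite_headers([
    ("Content-Type", "application/xhtml+xml; charset=UTF-8"),
    ("Content-Length", "321"),
    ("Cache-Control", "no-cache"),
])
print(active, headers)
```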
WSGI applications produce the HTTP response body by passing back an iterator that yields a sequence of chunks of plain text to build the response. The wrapper class iterwrapper processes each fragment according to the rules for the middleware, and ensures that WSGI rules are obeyed. For example, if the iterator from the application has a close method, the server, and thus this middleware, must call it. iterwrapper takes care of this by exposing the application iterator's close method as its own. The handle_response_chunk nested function does the actual work of processing each fragment from the application. If translation to HTML is needed, this function gathers up the fragments in the response_blocks list. For simplicity of the code, safexhtml runs the translation mechanism only against a complete XHTML document, which might have to be assembled from the fragments yielded by the application. WSGI rules, however, stipulate that the middleware must pass on something to the server every time the application yields a fragment. It's okay to pass on an empty string, and that's what handle_response_chunk does. Once the application is finished, safexhtml might need to yield one more fragment, containing the results of the translation to HTML. The produce_final_output generator handles this, stitching together the response body and running it through the translation code, finally yielding the entire output in a single string. The itertools.chain function is used to append this final fragment (if it exists) for sending to the server.
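The buffer-then-append pattern can be tried out without 4Suite. In the sketch below, which assumes Python 3 byte strings, buffer_and_translate is a hypothetical helper and the translate argument stands in for the XHTML-to-HTML conversion.

```python
from itertools import chain


def buffer_and_translate(app_iterable, translate):
    """Yield an empty block for each application block, then one final
    block holding the translated, concatenated body."""
    blocks = []

    def placeholders():
        for block in app_iterable:
            blocks.append(block)
            yield b""  # something must be passed on for every block received

    def final_output():
        # Runs only after placeholders() is exhausted, so blocks is complete.
        yield translate(b"".join(blocks))

    return chain(placeholders(), final_output())


result = list(buffer_and_translate([b"<p>", b"hi", b"</p>"], bytes.upper))
print(result)
```

Note that itertools.chain objects do not have a close method, so a sketch like this one would need extra work to forward close() to the application iterable.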
Listing 2 (wsgireftest.py) is server code to test the middleware. It uses wsgiref, which includes a very simple WSGI server. The module will be included in the Python 2.5 standard library.
Listing 2 (wsgireftest.py). Server code for testing Listing 1
import sys
from wsgiref.simple_server import make_server
from safexhtml import safexhtml

XHTML = open('test.xhtml').read()
XHTML_IMT = "application/xhtml+xml"
HTML_IMT = "text/html"
PORT = 8000

def app(environ, start_response):
    print "using IMT", app.current_imt
    start_response('200 OK', [('Content-Type', app.current_imt)])
    #Swap the IMT used for response (alternate between XHTML_IMT and HTML_IMT)
    app.current_imt, app.other_imt = app.other_imt, app.current_imt
    return [XHTML]

app.current_imt=XHTML_IMT
app.other_imt=HTML_IMT

httpd = make_server('', PORT, safexhtml(app))
print 'Starting up HTTP server on port %i...'%PORT

# Respond to requests until process is killed
httpd.serve_forever()
Listing 2 reads a simple XHTML file, given in Listing 3 (test.xhtml), and serves it up with alternating media types. It uses the standard XHTML media type for the first request, the HTML media type for the second, back to XHTML for the third, and so on. This exercises the middleware's capability to leave a response alone if it isn't flagged as XHTML.
Listing 3 (test.xhtml). Simple XHTML file used by the sample server in Listing 2
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>
You should be able to see the effect of this middleware if you run Listing 2 and view it in an XHTML-aware browser like Firefox and then an XHTML-challenged browser like Microsoft Internet Explorer. Make the request twice in a row for each browser to see the effect of the response media type on the operation of the middleware. Use View Source to see the resulting response body and the Page Info feature to see the reported response media type. You can also test the example using the command-line HTTP tool cURL:
curl -H 'Accept: application/xhtml+xml,text/html' http://localhost:8000/
to simulate an XHTML-savvy browser, and
curl -H 'Accept: text/html' http://localhost:8000/
to simulate the opposite case. If you want to see the response headers, use the -D <filename> option and inspect the given file after each cURL invocation.
You've now learned about Python's WSGI and how to use it to implement a middleware service that you can plug into any WSGI server and application chain. You could easily chain this article's example middleware with middleware for caching or debugging. These all become components that let you quickly add well-tested features into your project regardless of what WSGI implementations you choose.
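Chaining works because each middleware looks like an application to whatever wraps it. Here is a minimal sketch of that idea, written for Python 3; add_header, app, and call are hypothetical names, and the two header-adding layers merely stand in for real caching or debugging middleware.

```python
from wsgiref.util import setup_testing_defaults


def add_header(name, value):
    """Return a middleware factory (hypothetical) that appends one header."""
    def factory(app):
        def middleware(environ, start_response):
            def wrapper(status, headers, exc_info=None):
                return start_response(status, list(headers) + [(name, value)], exc_info)
            return app(environ, wrapper)
        return middleware
    return factory


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


# The innermost layer wraps the application; the outermost faces the server.
stack = add_header("X-Outer", "1")(add_header("X-Inner", "2")(app))


def call(wsgi_callable):
    """Drive one request through the stack, standing in for the server."""
    environ = {}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, headers

    body = b"".join(wsgi_callable(environ, start_response))
    return seen["status"], seen["headers"], body


print(call(stack))
```

The response headers pass through the inner layer first and the outer layer last, which is why the order in which you nest middleware can matter.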
WSGI is a fairly young specification, but compatible servers, middleware, and utilities are emerging rapidly to completely revamp the Python Web frameworks landscape. The next time you have a major Web project to develop in Python, be sure to adopt WSGI by using existing WSGI components, and perhaps creating your own either for private use or for contribution back to your fellow Web developers.
- WSGI community site: Find out all you need to know about WSGI. You probably do not need to read the full WSGI specification (PEP 333).
- Zend Framework: If you're a PHP developer, learn about this new, open-source framework for developing Web applications and services.
- XHTML, step-by-step (Uche Ogbuji, developerWorks, Sep 2005): In the tutorial, learn more about XHTML, including some deployment issues.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- Sockets programming in Python (Hariprasad Nellitheertha, developerWorks, October 2005): Learn about the basic API for writing sockets as well as a module that eases socket server development, then build a simple chat server using these techniques.
- Charming Python series (Dr. David Mertz, developerWorks): Learn more advanced techniques for programming in Python.
- Discover Python series (Robert Brunner, developerWorks): Learn all about Python.
- "Python Web frameworks, Part 1: Develop for the Web with Django and Python" and "Python Web frameworks, Part 2: Web development with TurboGears and Python" (Ian Maurer, developerWorks, June and July 2006): Get your feet wet with Python Web frameworks reading.
- developerWorks Web Architecture zone: Expand your site development skills with articles and tutorials that specialize in Web technologies.
Get products and technologies
- 4Suite: Grab this application used to transform XHTML to HTML in this article's example. The author is a lead developer of 4Suite. The easiest way to get it is to use easy_install and run
- cURL: Get the ultimate tool for Web testing and script integration.
- The WSGI Reference Library (wsgiref): It is to be included with Python 2.5, but if you are on Python 2.4, the easiest way to get it is to use easy_install and run easy_install wsgiref. Get the module documentation from the Python 2.5 documentation sandbox. wsgiref also ships with a module, named validate, that allows you to check for WSGI conformance in your own code.
- developerWorks blogs: Get involved in the developerWorks community.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can find more about Mr. Ogbuji at his Weblog Copia.