
Mix and match Web components with Python WSGI

Learn about the Python standard for building Web applications with maximum flexibility

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can find more about Mr. Ogbuji at his Weblog Copia.

Summary:  Learn to create and reuse components in your Web server using Python. The Python community created the Web Server Gateway Interface (WSGI), a standard for creating Python Web components that work across servers and frameworks. It provides a way to develop Web applications that take advantage of the many strengths of different Web tools. This article introduces WSGI and shows how to develop components that contribute to well-designed Web applications.

Date:  22 Aug 2006
Level:  Introductory

The main reason for the success of the Web is its flexibility. You find almost as many ways to design, develop, and deploy Web sites and applications as there are developers. With a huge wealth of choices, a Web developer often chooses a unique combination of Web design tools, page style, content language, Web server, middleware, and DBMS technology, using different implementation languages and accessory toolkits. To make all of these elements work together to offer maximum flexibility, Web functionality should be provided through components as much as possible. These components should perform a limited number of focused tasks competently and work well with each other. This is easy to say, but in practice it's very difficult to achieve because of the many different approaches to Web technology.

The best hope to keep your sanity is the growth of standards for Web component interoperability. Some of these important standards are already developed, and the most successful Web development platforms have them as their backbone. Prominent examples include the Java servlet API and the Ruby on Rails framework. Some languages long popular for Web programming are only recently being given the same level of componentization and have learned from the experience of preceding Web framework component standards. One example is the Zend Framework for PHP (see Resources). Another is Web Server Gateway Interface (WSGI) for Python.

Many people have complained that the popular Python programming language has too many Web frameworks, from well-known entrants such as Zope to under-the-radar frameworks such as SkunkWeb. Some have argued that this diversity can be a good thing, as long as there is some underpinning standardization. Python and Web expert Phillip J. Eby took on the task of such standardization. He authored Python Enhancement Proposal (PEP) 333, which defines WSGI.

The goal of WSGI is to allow for greater interoperability between Python frameworks. WSGI's success brings about an ecosystem of plug-in components you can use with your favorite frameworks to gain maximum flexibility. In this article, I'll introduce WSGI, and focus on its use as a reusable Web component architecture. In all discussions and sample code, I'll assume that you're using Python 2.4 or a more recent version.

The basic architecture of WSGI

WSGI was developed under fairly strict constraints, the most important being the need for a reasonable amount of backward compatibility with the Web frameworks preceding it. This constraint means WSGI unfortunately isn't as neat and transparent as Python developers are used to. Usually the only developers who have to deal directly with WSGI are those who build frameworks and reusable components. Most regular Web developers will pick a framework for its ease of use and be insulated from WSGI details.

If you want to develop reusable Web components, you have to understand WSGI, and the first thing you need to understand about it is how Web applications are structured in the WSGI world view. Figure 1 illustrates this structure.


Figure 1. Illustration of how HTTP request-response passes through the WSGI stack

The Web server, also called the gateway, is very low-level code for basic communication with the request client (usually the user's browser). The application layer handles the higher-level details that interpret requests from the user and prepare response content. The application interface to WSGI itself is usually just the more basic layer of an even higher level of application framework providing friendly facilities for common Web patterns such as Ajax techniques or content template systems. Above the server or gateway layer lies WSGI middleware. This important layer comprises components that can be shared across server and application implementations. Common Web features such as user sessions, error handling, and authentication can be implemented as WSGI middleware.
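To make the application layer concrete, here is a minimal sketch of a WSGI application callable, with a direct call standing in for the server. The names hello_app and fake_start_response are hypothetical. The sketch assumes Python 3, where PEP 3333 requires response body blocks to be bytes; the listings in this article target Python 2, where they are plain strings.

```python
def hello_app(environ, start_response):
    """A minimal WSGI application (hypothetical example)."""
    # PATH_INFO comes from the environment dictionary the server supplies.
    body = b"Hello from " + environ.get("PATH_INFO", "/").encode("latin-1")
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    # Report status and headers before returning any body blocks.
    start_response("200 OK", headers)
    # The return value is an iterable of body blocks.
    return [body]


# Stand in for the server: record what the application reports.
captured = {}

def fake_start_response(status, response_headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = response_headers

result = b"".join(hello_app({"PATH_INFO": "/demo"}, fake_start_response))
```

A real server builds a much fuller environment dictionary, but the calling convention is the same: one callable, two arguments, an iterable of body blocks in return.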

Code in the middle

WSGI middleware is the most natural layer for reusable components. WSGI middleware looks like an application to the lower layers, and like a server to the higher layers. It watches the state of requests, responses, and the WSGI environment in order to add some particular features. Unfortunately, the WSGI specification offers a very poor middleware example, and many of the other examples you can find are too simplistic to give you a feel for how to quickly write your own middleware. I'll give you a feel for the process WSGI middleware undertakes with the following broad outline. It ignores matters that most WSGI middleware authors won't need to worry about. In Python, where I use the word function, I mean any callable object.

  • Set-up phase. A set-up phase occurs once each time the Web server starts up. The server accepts an instance of the middleware, which wraps the application function.
  • Handling a client request. Handling a client request occurs each time the Web server receives a request.
    1. Server calls the middleware function with the environment and server.start_response parameters.
    2. Middleware processes the environment and calls the application callable, passing on the environment and a wrapped function middleware.start_response.
    3. The application executes; first it prepares the response headers, then it calls middleware.start_response.
    4. Middleware processes response headers and calls server.start_response.
    5. Server passes control back to the middleware and then back to the application, which starts yielding response body blocks (as strings).
    6. For each response body block, middleware makes any modifications and passes on some corresponding string to the server.
    7. Once all blocks from the application have been processed, middleware returns control to the server, finished for the current request.
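The outline above can be sketched as a small pass-through middleware. This is an illustration only, not code from the WSGI specification: the class AddHeaderMiddleware, the X-Handled-By header, and demo_app are hypothetical names, and the sketch assumes Python 3 (bytes body blocks).

```python
class AddHeaderMiddleware(object):
    """Hypothetical middleware: adds one response header, passes the body through."""

    def __init__(self, app):
        # Set-up phase: wrap the application once.
        self.wrapped_app = app

    def __call__(self, environ, start_response):
        # Step 1: the server calls us with the environment and its start_response.
        def middleware_start_response(status, response_headers, exc_info=None):
            # Step 4: process the response headers, then call the server's function.
            response_headers = list(response_headers) + [("X-Handled-By", "sketch")]
            return start_response(status, response_headers, exc_info)

        # Steps 2-3: call the application with the wrapped start_response.
        iterable = self.wrapped_app(environ, middleware_start_response)
        return self._pass_through(iterable)

    @staticmethod
    def _pass_through(iterable):
        try:
            # Steps 5-6: hand each body block to the server (unchanged here).
            for block in iterable:
                yield block
        finally:
            # Step 7: finished; honor the application's close() if it has one.
            if hasattr(iterable, "close"):
                iterable.close()


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"one", b"two"]

seen = {}

def server_start_response(status, headers, exc_info=None):
    seen["status"] = status
    seen["headers"] = headers

blocks = list(AddHeaderMiddleware(demo_app)({}, server_start_response))
```

The middleware looks like an application to the caller (it takes environ and start_response) and like a server to demo_app (it supplies a start_response function).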

A bold step toward XHTML

Many component technologies are rather complex, so the best examples for instruction are simple throwaway toys. This isn't the case with WSGI, and, in fact, I'll present a very practical example. Many developers prefer to serve XHTML Web pages because XML technologies are easier to manage than "tag soup" HTML, and emerging Web trends favor sites that are easier for automatons to read. The problem is that not all Web browsers support XHTML properly. Listing 1 (safexhtml.py) is a WSGI middleware module that checks incoming requests to see if the browser supports XHTML and, if not, translates any XHTML responses to plain HTML. You can use such a module so all of your main application code produces XHTML and the middleware takes care of any needed translation to HTML. Review Listing 1 carefully and try to combine it with the general outline of WSGI middleware execution from the previous section. I've provided enough comments so you can identify the different stages in the code.


Listing 1 (safexhtml.py). WSGI middleware to translate XHTML to HTML for browsers that can't handle it
import cStringIO
from itertools import chain
from xml import sax
from Ft.Xml import CreateInputSource
from Ft.Xml.Sax import SaxPrinter
from Ft.Xml.Lib.HtmlPrinter import HtmlPrinter

XHTML_IMT = "application/xhtml+xml"
HTML_CONTENT_TYPE = 'text/html; charset=UTF-8'


#This class is not specific to the safexhtml example middleware and
#Can be reused as-is in other WSGI middleware implementations
#It's part of the wsgi.xml library (http://uche.ogbuji.net/tech/4suite/wsgixml)
#You can install it by using easy_install wsgixml and then using
#from wsgixml.util import iterwrapper
class iterwrapper:
    """
    Wraps the response body iterator from the application to meet WSGI
    requirements.
    """
    def __init__(self, wrapped, response_chunk_handler):
        """
        wrapped - the iterator coming from the application
        response_chunk_handler - a callable for any processing of a
            response body chunk before passing it on to the server.
        """
        #Accept any iterable (a list, for example), not only an iterator
        self._wrapped = iter(wrapped)
        self._response_chunk_handler = response_chunk_handler
        if hasattr(wrapped, 'close'):
            self.close = wrapped.close

    def __iter__(self):
        return self

    def next(self):
        return self._response_chunk_handler(self._wrapped.next())


class safexhtml(object):
    """
    Middleware that checks for XHTML capability in the client and translates
    XHTML to HTML if the client can't handle it
    """
    def __init__(self, app):
        #Set-up phase
        self.wrapped_app = app
        return

    def __call__(self, environ, start_response):
        #Handling a client request phase.
        #Called for each client request routed through this middleware

        #Does the client specifically say it supports XHTML?
        #Note saying it accepts */* or application/* will not be enough
        xhtml_ok = XHTML_IMT in environ.get('HTTP_ACCEPT', '')

        #Specialized start_response function for this middleware
        def start_response_wrapper(status, response_headers, exc_info=None):
            #Assume response is not XHTML; do not activate transformation
            environ['safexhtml.active'] = False
            #Check for response content type to see whether it is XHTML
            #That needs to be transformed
            for name, value in response_headers:
                #content-type value is a media type, defined as
                #media-type = type "/" subtype *( ";" parameter )
                #Only rewrite headers when the client cannot handle XHTML
                if ( not xhtml_ok
                     and name.lower() == 'content-type'
                     and value.split(';')[0] == XHTML_IMT ):
                    #Strip content-length if present (needs to be
                    #recalculated by server)
                    #Also strip content-type, which will be replaced below
                    response_headers = [ (name, value)
                        for name, value in response_headers
                            if ( name.lower()
                                 not in ['content-length', 'content-type'])
                    ]
                    #Put in the updated content type
                    response_headers.append(('content-type', HTML_CONTENT_TYPE))
                    #Response is XHTML, so activate transformation
                    environ['safexhtml.active'] = True
                    break

            #We ignore the return value from start_response
            start_response(status, response_headers, exc_info)
            #Replace any write() callable with a dummy that gives an error
            #The idea is to refuse support for apps that use write()
            def dummy_write(data):
                raise RuntimeError('safexhtml does not support the deprecated '
                                   'write() callable in WSGI clients')
            return dummy_write

        #Get the iterator from the application that will yield response
        #body fragments
        iterable = self.wrapped_app(environ, start_response_wrapper)

        #Gather output strings for concatenation
        #(only used if HTML translation is required)
        response_blocks = []

        #This function processes each chunk of output (simple string) from
        #the app, returning The modified chunk to be passed on to the server
        def handle_response_chunk(chunk):
            if xhtml_ok:
                #The client can handle XHTML, so pass the chunk on unchanged
                return chunk
            else:
                if environ['safexhtml.active']:
                    response_blocks.append(chunk)
                    return '' #Obey buffering rules for WSGI
                else:
                    return chunk

        #After the application has finished sending its response body
        #fragments, if HTML translation is required, it is necessary
        #to send one more chunk, with the fully translated XHTML
        #This is handled by the following function, a generator.
        #If HTML translation is not required the generator produces nothing
        def produce_final_output():
            if not xhtml_ok and environ['safexhtml.active']:
                #Need to convert response from XHTML to HTML 
                xhtmlstr = ''.join(response_blocks) #First concatenate response

                #Now use 4Suite to transform XHTML to HTML
                htmlstr = cStringIO.StringIO()  #Will hold the HTML result
                parser = sax.make_parser(['Ft.Xml.Sax'])
                handler = SaxPrinter(HtmlPrinter(htmlstr, 'UTF-8'))
                parser.setContentHandler(handler)
                #Don't load the XHTML DTDs from the Internet
                parser.setFeature(sax.handler.feature_external_pes, False)
                parser.parse(CreateInputSource(xhtmlstr))
                yield htmlstr.getvalue()

        return chain(iterwrapper(iterable, handle_response_chunk),
                     produce_final_output())

The class safexhtml is the full middleware implementation. Each instance is a callable object because the class defines the special __call__ method. You create an instance by passing the application you are wrapping to the initializer __init__, and then pass that instance to the server. The wrapped application might also be another middleware instance if you are chaining safexhtml to other middleware. When the middleware is invoked as a result of a request to the server, it first checks the Accept header sent by the client to see whether it includes the official XHTML media type. If so (the xhtml_ok flag), it's safe to send XHTML and the middleware doesn't do anything meaningful for that request.
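The Accept check is a plain substring test, which a short sketch makes easy to try in isolation. The function name client_accepts_xhtml and the sample header values are hypothetical; the test itself mirrors the xhtml_ok line in Listing 1.

```python
XHTML_IMT = "application/xhtml+xml"

def client_accepts_xhtml(environ):
    """Return True only if the Accept header names the XHTML media type explicitly."""
    # Servers expose the Accept request header as HTTP_ACCEPT in the environment.
    return XHTML_IMT in environ.get("HTTP_ACCEPT", "")

# Sample environments (hypothetical header values).
firefox_like = {"HTTP_ACCEPT": "application/xhtml+xml,text/html;q=0.9,*/*;q=0.8"}
wildcard_only = {"HTTP_ACCEPT": "text/html,*/*"}
no_header = {}

results = [client_accepts_xhtml(e) for e in (firefox_like, wildcard_only, no_header)]
```

Note that a substring test ignores quality values, so a client that sends application/xhtml+xml;q=0 would still be treated as XHTML-capable. A stricter check would have to parse the header.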

To handle clients that can't accept XHTML, the class defines the specialized nested function start_response_wrapper, whose job is to check the response headers from the application to see whether the response is XHTML. If so, the response needs to be translated to plain HTML, a fact flagged as safexhtml.active in the environment. One reason to use the environment for this flag is that it takes care of scoping issues in communicating the flag back to the rest of the middleware code. Remember that start_response_wrapper is called asynchronously at a time the application chooses, and it can be tricky to manage the needed state in the middleware.

Another reason to use the environment is to communicate down the WSGI stack that the content has been modified. If the response body needs to be translated, start_response_wrapper not only sets safexhtml.active, but also changes the response media type to text/html and removes any Content-Length header, because the translation will almost certainly change the length of the response body, and it will have to be recalculated downstream, probably by the server.
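The header handling can be pulled out into a standalone function for experimentation. The following is a sketch under my own naming (rewrite_headers is hypothetical and does not appear in Listing 1); it follows the same rules: detect the XHTML media type, drop Content-Length and Content-Type, and append the HTML content type.

```python
XHTML_IMT = "application/xhtml+xml"
HTML_CONTENT_TYPE = "text/html; charset=UTF-8"

def rewrite_headers(response_headers):
    """Return (new_headers, was_xhtml) for a list of (name, value) header pairs."""
    # The media type is whatever precedes the first ";" in the Content-Type value.
    is_xhtml = any(
        name.lower() == "content-type"
        and value.split(";")[0].strip() == XHTML_IMT
        for name, value in response_headers
    )
    if not is_xhtml:
        # Not XHTML: leave the headers alone.
        return list(response_headers), False
    # Drop the headers that translation invalidates, then add the new type.
    kept = [(name, value) for name, value in response_headers
            if name.lower() not in ("content-length", "content-type")]
    kept.append(("Content-Type", HTML_CONTENT_TYPE))
    return kept, True

headers = [("Content-Type", "application/xhtml+xml; charset=UTF-8"),
           ("Content-Length", "321"),
           ("Cache-Control", "no-cache")]
new_headers, was_xhtml = rewrite_headers(headers)
```

Unrelated headers such as Cache-Control pass through untouched, and the returned flag plays the role that safexhtml.active plays in the middleware.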

WSGI applications produce the HTTP response body by passing back an iterator that yields a sequence of strings to build the response. The wrapper class iterwrapper processes each fragment according to the rules for the middleware, and ensures that WSGI rules are obeyed. For example, if the iterator from the application has a close method, the server, and thus this middleware, must call it. iterwrapper exposes that close method for this purpose.

The handle_response_chunk nested function does the actual work of processing each fragment from the application. If translation to HTML is needed, this function gathers up the fragments into the response_blocks list. For simplicity of the code, safexhtml runs the translation mechanism only against a complete XHTML document, which might have to be assembled from the fragments yielded by the application. WSGI rules, however, stipulate that the middleware must pass on something to the server every time the application yields a fragment. It's okay to pass on an empty string, and that's what handle_response_chunk does. Once the application is finished, safexhtml might need to yield one more fragment, containing the results of the translation to HTML. The produce_final_output generator handles this, stitching together the response body and running it through the translation code, finally yielding the entire output in a single string. The itertools.chain function is used to append this final fragment (if it exists) for sending to the server.
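The buffer-then-emit pattern is independent of XHTML, so it can be sketched with a trivial transformation in place of the 4Suite translation. The function buffer_and_transform is a hypothetical name, and the sketch assumes Python 3, so body blocks are bytes rather than the Python 2 strings of Listing 1.

```python
from itertools import chain

def buffer_and_transform(app_iterable, transform):
    """Yield an empty block per application block, then one transformed block."""
    collected = []

    def swallow():
        for block in app_iterable:
            collected.append(block)
            # Pass something on for every block received, even if it is empty.
            yield b""

    def final_block():
        # This body runs only after swallow() is exhausted, so collected is complete.
        yield transform(b"".join(collected))

    # chain() is lazy: final_block() produces nothing until swallow() is done.
    return chain(swallow(), final_block())

# bytes.upper stands in for a real whole-document transformation.
output = list(buffer_and_transform([b"<p>", b"hi", b"</p>"], bytes.upper))
```

The server sees three empty blocks followed by the complete transformed body, which is the same shape of output that safexhtml produces when translation is active.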

Listing 2 (wsgireftest.py) is server code to test the middleware. It uses wsgiref, which includes a very simple WSGI server. The module will be included in the Python 2.5 standard library.


Listing 2 (wsgireftest.py). Server code for testing Listing 1
import sys
from wsgiref.simple_server import make_server
from safexhtml import safexhtml

XHTML = open('test.xhtml').read()
XHTML_IMT = "application/xhtml+xml"
HTML_IMT = "text/html"
PORT = 8000


def app(environ, start_response):
    print "using IMT", app.current_imt
    start_response('200 OK', [('Content-Type', app.current_imt)])
    #Swap the IMT used for response (alternate between XHTML_IMT and HTML_IMT)
    app.current_imt, app.other_imt = app.other_imt, app.current_imt
    return [XHTML]

app.current_imt=XHTML_IMT
app.other_imt=HTML_IMT

httpd = make_server('', PORT, safexhtml(app))
print 'Starting up HTTP server on port %i...'%PORT

# Respond to requests until process is killed
httpd.serve_forever()



Listing 2 reads a simple XHTML file, given in Listing 3 (test.xhtml), and serves it up with alternating media types. It uses the standard XHTML media type for the first request, the HTML media type for the second, back to XHTML for the third, and so on. This exercises the middleware's capability to leave a response alone if it isn't flagged as XHTML.


Listing 3 (test.xhtml). Simple XHTML file used by the sample server in Listing 2
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>



You should be able to see the effect of this middleware if you run Listing 2 and view it in an XHTML-aware browser like Firefox and then an XHTML-challenged browser like Microsoft Internet Explorer. Make the request twice in a row for each browser to see the effect of the response media type on the operation of the middleware. Use View Source to see the resulting response body and the Page Info feature to see the reported response media type. You can also test the example using the command-line HTTP tool cURL: curl -H 'Accept: application/xhtml+xml,text/html' http://localhost:8000/ to simulate an XHTML-savvy browser, and curl -H 'Accept: text/html' http://localhost:8000/ to simulate the opposite case. If you want to see the response headers, use the -D <filename> option and inspect the given file after each cURL invocation.
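If you prefer to exercise a WSGI callable without starting a server at all, you can call it in-process. The sketch below uses wsgiref.util.setup_testing_defaults to fill in the required environment keys. The helper call_app and the stand-in application echo_accept_app are hypothetical names, and the sketch assumes Python 3; to test safexhtml itself you would pass safexhtml(app) instead, with 4Suite installed.

```python
from wsgiref.util import setup_testing_defaults

def call_app(app, accept):
    """Call a WSGI callable in-process and return (status, headers, body)."""
    environ = {"HTTP_ACCEPT": accept}
    # Fills in the required WSGI keys without overwriting ones already set.
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(response_headers)

    iterable = app(environ, start_response)
    try:
        body = b"".join(iterable)
    finally:
        # A server must call close() on the iterable if it exists.
        if hasattr(iterable, "close"):
            iterable.close()
    return captured["status"], captured["headers"], body

def echo_accept_app(environ, start_response):
    # Hypothetical stand-in for the wrapped application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["HTTP_ACCEPT"].encode("ascii")]

status, headers, body = call_app(echo_accept_app, "text/html")
```

Calling the helper twice with different Accept values gives the same comparison as the two cURL commands, with the status and headers available directly as Python objects.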


Wrap-up

You've now learned about Python's WSGI and how to use it to implement a middleware service that you can plug into any WSGI server and application chain. You could easily chain this article's example middleware with middleware for caching or debugging. These all become components that let you quickly add well-tested features into your project regardless of what WSGI implementations you choose.
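Chaining is nothing more than nesting callables. As a rough sketch, with hypothetical names throughout (tag_middleware, base_app) and assuming Python 3, two trivial middleware layers can be stacked like this:

```python
def tag_middleware(label):
    """Hypothetical middleware factory: appends label to each body block."""
    def wrap(app):
        def wrapped(environ, start_response):
            # Simplification: close() on the inner iterable is not forwarded.
            return (block + label for block in app(environ, start_response))
        return wrapped
    return wrap

def base_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"body"]

# The innermost wrapper touches the response first, the outermost last.
stack = tag_middleware(b"+outer")(tag_middleware(b"+inner")(base_app))
blocks = list(stack({}, lambda status, headers, exc_info=None: None))
```

The order of wrapping matters: a caching layer placed outside safexhtml would store the translated HTML, while one placed inside would store the original XHTML.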

WSGI is a fairly young specification, but compatible servers, middleware, and utilities are emerging rapidly to completely revamp the Python Web frameworks landscape. The next time you have a major Web project to develop in Python, be sure to adopt WSGI by using existing WSGI components, and perhaps creating your own either for private use or for contribution back to your fellow Web developers.


Resources

Get products and technologies

  • 4Suite: Grab this application used to transform XHTML to HTML in this article's example. The author is a lead developer of 4Suite. The easiest way to get it is to use easy_install and run easy_install 4Suite.

  • cURL: Get the ultimate tool for Web testing and script integration.

  • The WSGI Reference Library (wsgiref): It is to be included with Python 2.5, but if you are on Python 2.4, the easiest way to get it is to use easy_install and run easy_install wsgiref. Get the module documentation from the Python 2.5 documentation sandbox. wsgiref also ships with a module, named validate, that allows you to check for WSGI conformance in your own code.

  • WebSphere Application Server Version 6.0: Download a free trial version.

  • IBM trial software: Build your next development project with software available for download directly from developerWorks.
