Tip: Make your CGI scripts available via XML-RPC

Providing a programmatic interface to Web services

For a large class of CGI scripts, it is both easy and useful to provide an alternate XML-RPC interface to the same calculation or lookup. If you do this, other developers can quickly utilize the information you provide within their own larger applications. This tip shows you how.


David Mertz (mertz@gnosis.cx), Interfacer, Gnosis Software, Inc.

David Mertz knows a little bit about a lot of things, but a lot about fewer things than he once did. The smooth overcomes the striated. David can be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/.

30 April 2003

Many CGI scripts are, at their heart, just a form of remote procedure call. A user specifies some information, perhaps in an HTML form, and your Web server returns a formatted page that contains an answer to their inquiry. The data on this return page is surrounded by some HTML markup, but basically it is the data that is of interest. Examples of data-oriented CGI interfaces are search engines, stock price checks, weather reports, personal information lookup, catalog inventory, and so on.

A Web browser is a fine interface for human eyes, but a returned HTML page is not an optimal format for integration within custom applications. To use the data that comes back from CGI queries, programmers often resort to screen-scraping the returned pages -- that is, they look for identifiable markup and contents, and pull data elements from the text. But screen-scraping is error-prone: page layout might change over time or might depend on the specific results. A more formal API is better for programmatic access to your CGI functionality.

XML-RPC is specifically designed to enable application access to queryable results over an HTTP channel. Its sibling, SOAP, can do a similar job, but SOAP's XML format is more complicated than most purposes require. An ideal system is one where people can make queries in a Web browser, while custom applications can make the same queries using XML-RPC. The underlying server can do almost exactly the same thing in either case.
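To make the idea of a call carried over HTTP more concrete, here is a minimal sketch of the XML payload a client would POST. It uses the Python 3 module name xmlrpc.client (the Python 2 listings in this tip use xmlrpclib for the same library). The method name anonym and its two arguments are borrowed from the listings later in this tip; nothing is sent over the network.

```python
import xmlrpc.client

# Marshal a call to a method named "anonym" with two string arguments.
# This only builds the XML text; it does not contact any server.
request_xml = xmlrpc.client.dumps(("perm", "mertz@gnosis.cx"),
                                  methodname="anonym")
print(request_xml)

# Unmarshal it again, roughly as a server-side library would.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

The printed document is a methodCall element holding the method name and one param element per argument, which is all a server needs to dispatch the call.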

An example

I have created a service within my Web site that enables users to send e-mail to anonymized recipients. Rather than a traceable address, recipients can create a local anonym where they can get mail. You can read about the goals and architecture of Gnosis-Anon at its home page (see Resources). At the same URL, you can enter a query into an HTML form, and in return be presented with an HTML page informing you of an anonym. From there, you need to either write down the information or cut-and-paste the information into a tool other than your Web browser.

Suppose you want to utilize the anonym automatically in an application such as a Mail User Agent (MUA) or Mail Transport Agent (MTA). You might do some screen-scraping like the following:

Listing 1. get-anonym-cgi.py
#!/usr/bin/env python
from urllib import urlencode, urlopen
from sys import argv

base_url = 'http://gnosis.cx/cgi-bin/encode_address.cgi'
query = urlencode({'duration':argv[1], 'email':argv[2]})
html_answer = urlopen(base_url+'?'+query).readlines()
result = "NO ANONYM FOUND!"
for line in html_answer:
    if line.find("<dt>Anonym:</dt>") >= 0:
        start = line.find('<dd>')+4
        end = line.find('</dd>')
        result = line[start:end]
        break
print result

You can run this with a command line like the following:

Listing 2. Running get-anonym-cgi
% get-anonym-cgi.py perm mertz@gnosis.cx

This works if I do not change the format of the HTML -- but that's a big if. A more robust (and simpler) client application might look like this:

Listing 3. get-anonym-xmlrpc.py
#!/usr/bin/env python
import sys, xmlrpclib

server = xmlrpclib.Server("http://gnosis.cx:8000")
print server.anonym(sys.argv[1], sys.argv[2])

This XML-RPC application behaves exactly the same as the CGI-based one -- except that it will not break if the layout of the Web page changes slightly.
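How fragile the scraping approach is can be seen without touching the network. The sketch below applies the same line-by-line search as Listing 1 to two in-memory pages. Both page layouts and the anonym value are made up for illustration; the real page's markup may differ.

```python
def scrape_anonym(html_lines):
    """Same search as Listing 1, applied to lines already in memory."""
    result = "NO ANONYM FOUND!"
    for line in html_lines:
        if line.find("<dt>Anonym:</dt>") >= 0:
            start = line.find("<dd>") + 4
            end = line.find("</dd>")
            result = line[start:end]
            break
    return result

# Hypothetical layouts: a definition list, then a redesign using a table.
old_layout = ["<dl>",
              "<dt>Anonym:</dt><dd>placeholder@example.invalid</dd>",
              "</dl>"]
new_layout = ["<table>",
              "<tr><th>Anonym</th><td>placeholder@example.invalid</td></tr>",
              "</table>"]

print(scrape_anonym(old_layout))   # finds the value
print(scrape_anonym(new_layout))   # falls through to the default message
```

The data is the same in both pages; only the markup around it changed, and that alone is enough to defeat the scraper.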

Setting up the XML-RPC server

Writing an XML-RPC server is not much different from writing a CGI script. The actual calculation or lookup code is identical; you only need to change the format of the output and do a little extra work parsing the inputs for CGI. My CGI script looks something like this:

Listing 4. encode_address.py
#!/usr/bin/env python
import cgi

query = cgi.FieldStorage()
email = query.getvalue('email','test@test.lan')
duration = query.getvalue('duration', 'Unknown')
anonym = FIND_THE_ANONYM(duration, email)
html_template = open('template').read()
html = html_template % (email, anonym, duration)
print "Content-Type: text/html"
print    # blank line separates the HTTP headers from the body
print html

This leaves out the details of how FIND_THE_ANONYM() works and what the HTML template looks like, but those details are unimportant here. An XML-RPC server is even easier to program:

Listing 5. anonym-xmlrpc-server.py
#!/usr/bin/env python
from SimpleXMLRPCServer import SimpleXMLRPCServer

class Anonym:
    def anonym(self, duration, email):
        return FIND_THE_ANONYM(duration, email)
    def container_test(self):
        return {'spam':'eggs', 'bacon':'toast'}

server = SimpleXMLRPCServer(('', 8000))
server.register_instance(Anonym())
server.serve_forever()

As you can see, the same lookup function is used; its return value is what is returned to a remote call to the .anonym() method. On the wire, return values are encoded as XML-RPC, but Python's xmlrpclib module automatically translates XML-RPC encoded structures back into native data structures, as do analogous libraries in other languages. The method .container_test() in Listing 5 can be called remotely as well, in which case the client will see a Python dictionary.
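To try the round trip on a single machine, something like the following sketch may help. It is written for Python 3, where the modules are named xmlrpc.server and xmlrpc.client. The find_the_anonym() function is a hypothetical stand-in for FIND_THE_ANONYM(), whose real implementation is not shown in this tip, and the server binds to a throwaway local port rather than port 8000.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def find_the_anonym(duration, email):
    # Hypothetical stand-in for FIND_THE_ANONYM(); not the real lookup.
    return "anonym-for-%s-%s" % (duration, email)

class Anonym:
    def anonym(self, duration, email):
        return find_the_anonym(duration, email)
    def container_test(self):
        return {'spam': 'eggs', 'bacon': 'toast'}

# Port 0 asks the OS for any free port, so nothing real is disturbed.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_instance(Anonym())
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side: method calls on the proxy become XML-RPC requests.
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d" % port)
result = proxy.anonym("perm", "mertz@gnosis.cx")
container = proxy.container_test()
print(result)
print(type(container).__name__, container)

server.shutdown()
server.server_close()
```

The second call comes back as an ordinary dict, which is the automatic translation of XML-RPC structures into native data described above.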

A few notes

These code examples use Python, but implementations of both XML-RPC clients and servers exist for a large number of programming languages. Moreover, XML-RPC itself is completely language-neutral; multiple clients written in different languages can call the same server, and none of them will care what language the server was written in.
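One way to see the language neutrality is to skip the client library and POST a hand-written request. The sketch below (Python 3, against a throwaway local server) sends plain XML with the standard HTTP client; any language that can issue the same POST could act as the client. The anonym() function is again a hypothetical stand-in, and the reply is decoded with xmlrpc.client only to show what came back.

```python
import http.client
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def anonym(duration, email):
    # Hypothetical stand-in for the real lookup, which this tip does not show.
    return "anonym-for-%s-%s" % (duration, email)

# Throwaway local server on an OS-chosen port; no real service is contacted.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(anonym)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A hand-written request body, with no XML-RPC library involved.
body = (b"<?xml version='1.0'?>"
        b"<methodCall><methodName>anonym</methodName><params>"
        b"<param><value><string>perm</string></value></param>"
        b"<param><value><string>mertz@gnosis.cx</string></value></param>"
        b"</params></methodCall>")

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("POST", "/RPC2", body, {"Content-Type": "text/xml"})
response = conn.getresponse()
status = response.status
reply_xml = response.read()
conn.close()
server.shutdown()
server.server_close()

# Decode the methodResponse document that the server sent back.
values, _ = xmlrpc.client.loads(reply_xml)
print(status, values[0])
```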

There is a difference in the way a CGI script runs and the way this XML-RPC server runs. The XML-RPC server is its own process (and uses its own port). CGI scripts, on the other hand, are launched automatically by a general-purpose HTTP server each time a request arrives. But requests to both still travel over HTTP (or HTTPS) layers, so any issues with firewalls, statefulness, and the like remain identical. Moreover, some general-purpose HTTP servers support XML-RPC internally. But if, like me, you do not control the configuration of your Web host, it is easier to write a stand-alone XML-RPC server like the eight-line version in Listing 5.


