There are a number of messaging technologies available to Web services developers; some offer high performance, others ease of use and human readability. In this installment of The Python Web Services Developer, you will look at four different messaging technologies and write a simple benchmark application for each. When you are done, you will gather some basic statistics from each application for a simple comparison between the technologies. The four technologies that you will examine in this column are SOAP, CORBA, XML-RPC, and good, old-fashioned low-level sockets.
In this article, we will demonstrate a simple client-server application that sends three different messages. The first message is in the form of a string sent from client to server; from this, you can gather information on timing and message overhead. The second message that you create is the opposite of the first: you receive a string from the server. Lastly, you will send an integer to the server. This will allow us to gather timing information as well as information on message overhead for binary-based messages.
Be careful not to draw too broad a conclusion from the analysis presented in this article. The four technologies that we'll examine, while all suited for a message-based client-server application, differ greatly in their strengths and weaknesses. To choose the messaging technology that's right for you, you need to do more than just compare messaging overhead directly. As we look at each technology in more detail, we will point out some of their important differences so that developers can make a better-informed choice about which technology to use.
To gather the timing information, you will simply use the time.time() function call available with Python. As you review the statistical data generated, keep in mind that time.time()'s accuracy is platform dependent and may in some cases have no better than one second of precision.
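As a rough illustration of this kind of measurement, here is a minimal sketch of a timing helper built on time.time(). The names time_call and fake_send are hypothetical and are not taken from the article's sample code; fake_send merely stands in for "send one message". The sketch uses Python 3 syntax.

```python
import time

def time_call(func, repeat=1):
    """Return the average wall-clock seconds taken by one call of func()."""
    start = time.time()
    for _ in range(repeat):
        func()
    elapsed = time.time() - start
    return elapsed / repeat

def fake_send():
    """Hypothetical stand-in for sending one message to a server."""
    sum(range(1000))

if __name__ == "__main__":
    print("average seconds per call: %.6f" % time_call(fake_send, repeat=100))
```

Averaging over many calls helps when the clock is coarse. On current Python versions, time.perf_counter() is a higher-resolution alternative to time.time() for measuring short intervals.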
As you execute an application with each different technology, you will want a standard way to see how much information is actually sent over your TCP connection. To do this, you will use a tool called
tcpdump (see the Resources section below for a link). The process for downloading and installing
tcpdump will vary based on your OS, so we will not cover that here. The documentation on the site is decent, and at any rate, you won't need
tcpdump to actually run the example code, just to test for message overhead by gathering the messages sent to a specific TCP connection.
When you are testing the applications, you will use the following options on
tcpdump to siphon the messages off into a temporary file:
tcpdump -i lo -ae -s 0 -w /tmp/packets.txt
There are many other options that you can use (you could filter by port or hostname, for example), but for this simple case these will suffice. The -i lo option tells tcpdump what interface to listen on (the loopback device, in this case). The -a option attempts to convert addresses to names. The -e option includes link-level header information in the output. The -s 0 option tells tcpdump that you want all data in every packet body. (If you want tcpdump to keep only a set number of bytes per packet, enter the number of bytes desired in place of zero here.) The last option, -w, tells tcpdump to write the raw packets it captures to a file called /tmp/packets.txt. Note that the command line above is for a Linux machine, and that you may need to run tcpdump with root or administrator permissions.
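Once you have a capture file, you need some way to total up the bytes in it. The following is a minimal sketch of one way to do that, assuming the file is in the classic pcap format that tcpdump's -w option writes. The function name summarize_capture is hypothetical and is not part of the article's sample code.

```python
import struct

PCAP_GLOBAL_HEADER = 24   # bytes in the classic pcap file header
PCAP_RECORD_HEADER = 16   # bytes in each per-packet record header

def summarize_capture(path):
    """Return (packet_count, total_bytes_on_wire) for a classic pcap file."""
    with open(path, "rb") as f:
        header = f.read(PCAP_GLOBAL_HEADER)
        if len(header) < PCAP_GLOBAL_HEADER:
            raise ValueError("file too short to be a pcap capture")
        magic = header[:4]
        # The magic number reveals the byte order used by the writer.
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("not a classic pcap file")
        count = 0
        total = 0
        while True:
            record = f.read(PCAP_RECORD_HEADER)
            if len(record) < PCAP_RECORD_HEADER:
                break  # end of file
            # Fields: timestamp seconds, timestamp fraction,
            # bytes stored in the file, original bytes on the wire.
            _sec, _frac, incl_len, orig_len = struct.unpack(endian + "IIII", record)
            f.seek(incl_len, 1)  # skip over the stored packet bytes
            count += 1
            total += orig_len
        return count, total
```

Calling summarize_capture("/tmp/packets.txt") after a test run gives a packet count and a byte total. The total includes link-level, IP, and TCP headers and counts traffic in both directions, so treat it as an upper bound on the message payload rather than an exact figure.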
There are a few third-party packages you need to have running on your system before you can run the test suite. We'll cover each briefly; if you already have these packages installed, you can skip the following sections.
The CORBA ORB that you are going to use is called omniORB (see Resources for a link where you can download it). For this example you'll use omniORB version 3.04 and omniORBPy version 1.4. For non-Unix platforms and some versions of Unix there are binaries available. If you are building from source, then the appropriate README file will help; the basic steps follow. Note that if you are installing from binaries then you will still need to update your
PATH environment variables as shown in steps 7 and 8.
- Download the source code and untar it
- Uncomment your platform in
cd omni/src && make export
- Download omniORBPy (from the same site; see Resources)
- Untar it in the
cd omni/src/lib/omniORBPy && make export
- Update your
PYTHONPATH environment variable to include
- Update your
With omniORB installed, you need to generate the stub and skeleton files for the example. Download and unarchive the example from here. Then change into the
src/omni directory and execute:
omniidl -bpython server.idl
If you are using Python version 2.1 or later, then you can skip this section. However, if you are using an earlier version of Python, or you just want to get the latest
xmlrpclib, then you should download that package; see Resources for a link. Installation is simple: just create a directory and unzip the archive to a location on your Python path. Don't forget to add a
__init__.py to the new directory so it can be imported as a Python module.
[molson@penny python]$ mkdir XmlRpc
[molson@penny python]$ cd XmlRpc/
[molson@penny XmlRpc]$ unzip ~/downloads/xmlrpc-0.9.8-990621.zip
Archive: /home/molson/downloads/xmlrpc-0.9.8-990621.zip
  inflating: README
  inflating: xmlrpclib.py
  inflating: xmlrpcserver.py
  inflating: xmlrpc_handler.py
[molson@penny XmlRpc]$ touch __init__.py
[molson@penny XmlRpc]$
The last third-party library that you'll need is a SOAP implementation. For your benchmarking purposes you will use ZSI; see Resources for a link to the latest version. Once you've downloaded it, unarchive it and execute
python setup.py install, and you'll be ready to use it.
Now that your system is prepared, you can use your benchmark applications to test each technology in turn.
Let's look at raw Python sockets first. You can consider this your control, as there is no protocol overhead (except what you add yourself). That's the good part about programming raw sockets; the downside is that you'll have many other nontrivial issues that you need to worry about, such as:
- Much more code to develop and maintain.
- Lack of data integrity over the wire (though only CORBA provides more than TCP itself).
- Low-level error conditions (for example, a wrong message protocol or a dropped connection).
So, let's get into the application. All of the example code for this application is available in the Resources section. After you have downloaded the archive and expanded it, you will have four subdirectories. The first one to look at is called
python. One of the benefits of writing your own protocol for TCP communications is that there are no additional third-party libraries that you need to install. This can minimize distribution headaches.
In the example distribution, the file
python/server.py is the server you will use to receive raw TCP requests. It is based on the standard
SocketServer module that comes with Python. There are a few things to note when you look at the code. The first is that there is absolutely no error checking. For every message type (and for nonexistent message types as well) the client needs to thoroughly check every byte received from the server. The second thing to note is that the handler is designed to handle a single request. This will put your little socket server at a disadvantage when you start sending a lot of integers really quickly because the client will have to create a new connection for every integer that it sends. You could have written this server to handle multiple requests in a single connection, but then you would have had to add additional message type support to your server in the form of
stop messages so that the server would know when to stop listening on a connection. The last thing to note about the server file is that there is no form of connection timeout. With the way that this server is configured, a malicious client could connect to the server and send one extremely large string, effectively bringing the entire server to its knees.
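To make the design described above concrete, here is a minimal sketch of a one-request-per-connection handler. It is not the article's server.py. The one-byte message types (S, R, I), the length prefix, and the one-byte acknowledgement are assumptions made up for illustration, and the sketch uses the Python 3 module name socketserver. Like the server described in the text, it does no error checking and has no timeout.

```python
import socket
import socketserver
import struct
import threading

class OneShotHandler(socketserver.StreamRequestHandler):
    """Handle exactly one message per connection (hypothetical protocol)."""

    def handle(self):
        kind = self.rfile.read(1)
        if kind == b"S":      # client sends a length-prefixed string
            (length,) = struct.unpack("!I", self.rfile.read(4))
            self.rfile.read(length)
            self.wfile.write(b"K")
        elif kind == b"R":    # client asks the server for a string
            payload = b"x" * 10
            self.wfile.write(struct.pack("!I", len(payload)) + payload)
        elif kind == b"I":    # client sends one 32-bit integer
            struct.unpack("!i", self.rfile.read(4))
            self.wfile.write(b"K")
        # Unknown types and short reads are not handled: no error checking.

def send_integer(address, value):
    """Open a new connection, send one integer, and return the server's ack."""
    with socket.create_connection(address) as sock:
        sock.sendall(b"I" + struct.pack("!i", value))
        return sock.recv(1)

def demo():
    """Start a server on a free local port, send one integer, shut down."""
    with socketserver.TCPServer(("127.0.0.1", 0), OneShotHandler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        ack = send_integer(server.server_address, 42)
        server.shutdown()
    return ack

if __name__ == "__main__":
    print(demo())
```

Because the handler returns after one message, every integer costs a full connection setup and teardown, which is the disadvantage the text describes.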
To start the server, simply execute the script
server.py in one window and the test client, called
time-client.py, in a second. This will print out timing information to the terminal that executed the client script. When the client is done, you can press Ctrl-C to stop the server. To test the message overhead, you will use
tcpdump; run it in a third window, then run the script
size-client.py (with the server still running). The results of these two scripts are shown in Table 1 below.
The second technology we'll look at, SOAP, should be the most familiar to readers of this column. SOAP is the newcomer in this set of technologies, however, having come onto the scene only within the last couple of years. It (generally) uses HTTP for its transport, creating XML-based messages that are sent as the HTTP body. From the get-go you can see that there will be some overhead in SOAP messages, from both the HTTP and the XML. However, this should not always be considered a bad thing, as it tends to make the messages more readable to humans.
In the example distribution, there is a
soap subdirectory. It contains the files
size-client.py. These are executed exactly as their counterparts in the raw socket application. Again, you'll look at the results in Table 1.
Of SOAP, CORBA, and XML-RPC, CORBA has been around the longest and has tackled some of the more difficult issues of distributed programming: concurrency, transactions, security, and authentication. SOAP has made starts on these and XML-RPC requires that you, the programmer, handle these issues. (Actually, CORBA itself does not provide these features; however, the protocol is designed to support these features, and most ORBs will come with some or all of these features implemented through the CORBA Services specification.) The second benefit to CORBA (over SOAP or XML-RPC) is that its message format is binary, though this is in some ways a hindrance as well. It is nice because CORBA messages are much smaller than SOAP or XML-RPC messages and have much less overhead. The downside is that if you ever need to debug a CORBA message, you're in trouble: only a GIOP server can understand it.
In the example distribution there is a
CORBA subdirectory. Execute the scripts here just as you executed their counterparts in the previous two examples. The results have been tabulated in Table 1 below.
Lastly, let's look at the XML-RPC protocol. Many people have commented to us that XML-RPC can fill most of the roles that SOAP does. However, we think it falls short in several aspects, including Unicode support and flexibility of data typing. XML-RPC is a much simpler protocol than SOAP, which makes it easier to use; however, SOAP is working towards solving many of the issues that CORBA has already solved (concurrency, transactions, etc.) while XML-RPC seems to be content where it is at the moment, as most XML-RPC developers are now working on SOAP.
In the example distribution there is an
xml-rpc subdirectory; the examples are run as before and the results are all tabulated in the next section.
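To get a feel for where XML-RPC's message overhead comes from before you look at the numbers, you can marshal a single integer with Python's standard library and compare it with a packed binary field. This is a sketch only: the method name sendInteger is hypothetical, the measurement covers the XML request body and not the HTTP headers that would also travel over the wire, and it uses xmlrpc.client, the Python 3 name for the xmlrpclib module.

```python
import struct
import xmlrpc.client

def xmlrpc_body_size(value, method="sendInteger"):
    """Bytes in the XML-RPC request body for one call carrying value."""
    body = xmlrpc.client.dumps((value,), methodname=method)
    return len(body.encode("utf-8"))

def packed_size(value):
    """Bytes needed to carry the same integer as a 32-bit binary field."""
    return len(struct.pack("!i", value))

if __name__ == "__main__":
    print("XML-RPC body:", xmlrpc_body_size(42), "bytes")
    print("binary field:", packed_size(42), "bytes")
```

The XML body carries the method name and markup around the value, so it is many times larger than the four-byte binary field even before any transport headers are added.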
The data from your tests of all four messaging technologies are summarized in Table 1.
|Technology|Connect time|Send string (21,000 characters)|Receive string (22,000 characters)|Send 5,000 integers|Client LOC|Server LOC|Actual message size sending 1,000 characters|Actual message size sending 100 integers|
Wow! Human readability sure does come at a high price. The SOAP and XML-RPC messages are just over 14 times as large as the binary CORBA messages. This is not to say that large messages are horrible in today's world, where T1s connect most businesses and 256 Kbps DSL connections connect many households; however, as you can see from the test where you sent 5,000 integers to the server, SOAP and XML-RPC took 882 and 66 times longer than CORBA on the same machine, respectively.
One statistic that did come as a surprise at first is that CORBA had smaller messages and was faster than the raw sockets implementation. The reason for this (as mentioned earlier) is that with raw sockets you need to establish a new connection for every request that you make to the server. Opening a TCP connection takes a three-way handshake (a SYN from the client, a SYN-ACK from the server, and a final ACK from the client), and closing the connection takes several more segments. When you do this 5,000 times, these little messages add up. CORBA (and most current HTTP servers) get around this by keeping a connection open and reusing it for multiple requests.
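The cost of per-request connections can be seen with a small experiment. The sketch below is not the article's benchmark. It assumes a made-up protocol in which the client sends 4-byte integers and the server answers each one with a single byte, and it times the same workload twice: once with a new connection per integer and once over a single reused connection. All function and class names are hypothetical.

```python
import socket
import socketserver
import struct
import threading
import time

class AckHandler(socketserver.StreamRequestHandler):
    """Acknowledge every 4-byte integer until the client disconnects."""

    def handle(self):
        while True:
            data = self.rfile.read(4)
            if len(data) < 4:      # EOF: the client closed the connection
                break
            self.wfile.write(b"K")

def send_with_new_connections(address, count):
    """Open a fresh TCP connection for every integer."""
    for value in range(count):
        with socket.create_connection(address) as sock:
            sock.sendall(struct.pack("!i", value))
            sock.recv(1)

def send_on_one_connection(address, count):
    """Reuse a single TCP connection for all integers."""
    with socket.create_connection(address) as sock:
        for value in range(count):
            sock.sendall(struct.pack("!i", value))
            sock.recv(1)

def compare(count=200):
    """Return (seconds_with_new_connections, seconds_with_one_connection)."""
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), AckHandler) as server:
        server.daemon_threads = True
        threading.Thread(target=server.serve_forever, daemon=True).start()
        timings = []
        for sender in (send_with_new_connections, send_on_one_connection):
            start = time.time()
            sender(server.server_address, count)
            timings.append(time.time() - start)
        server.shutdown()
    return tuple(timings)

if __name__ == "__main__":
    per_request, reused = compare()
    print("new connection per integer: %.4f s" % per_request)
    print("one reused connection:      %.4f s" % reused)
```

The absolute numbers depend on your machine and operating system, so run it yourself rather than relying on any particular ratio.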
There are other considerations to take into account that are not easily quantified in a statistical analysis. Each messaging technology has different attributes associated with it that make it more appealing in some situations and less in others. Take raw sockets, for instance. In general, raw sockets require the programmer to do the most work in creation, debugging, and maintenance of the communication code. However, if the application that you are creating is very sensitive to overhead, or if you are reluctant to leave the implementation of the communications layer in the hands of a third party (by using a messaging technology library), then the amount of work required to create and maintain a custom communications layer becomes less and less of an obstacle.
Although CORBA is the fastest of the protocols that you looked at in this column, it also has the steepest learning curve, and will most likely have the largest memory footprint (depending on the CORBA implementation that you use, of course). Also, even though there are many open source implementations of the CORBA specification, to get many of CORBA's must-have features, like CosTransactions, CosSecurity, and CosConcurrency, you will either have to implement them yourself, or purchase a commercial CORBA implementation; the latter will likely cost tens or hundreds of thousands of dollars.
XML-RPC and SOAP fall roughly into the same category. Both feature a great amount of message overhead, and their performance is quite a bit slower than CORBA's; however, for many applications you really don't care about these factors. With the cost of bandwidth, processors, memory, and disk space dropping daily, it is pretty easy to justify spending an additional $5,000 to $10,000 on additional equipment to compensate for any drop in performance. In the process, you would spend less than you would on a commercial ORB, but still have the benefit of human-readable messages. One last thing to note about SOAP and XML-RPC is that they are just based on plain XML. This means that if you support an in/outbound XML-based message structure in your application, no matter where the technology goes in the future, you will have a much easier time of moving your communications infrastructure to another format, as XML can be transformed into just about anything with a bit of XSLT.
In the next installment of this column we will look in more detail at the XML-RPC library that comes standard in Python 2.2. We will write a simple client and server application to show you how to use the library, and compare some of the features of XML-RPC with similar features in SOAP.
- Download this article's sample code as a .zip file or a .tgz file.
- Read the previous installment of The Python Web Services Developer.
- Download omniORB from the omniORB home page.
- The tcpdump home page includes documentation and information on this tool; download it here.
- Download the latest version of
- You can download ZSI from SourceForge.
- The Python Web Services Developer covered ZSI in some detail in "Python SOAP libraries, Part 2" (developerWorks, February 2002).
- Python.org includes some excellent Python socket documentation.
Mike Olson is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, and 4Suite Server, open source platforms for XML middleware. You can contact Mr. Olson at firstname.lastname@example.org.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, and 4Suite Server, open source platforms for XML middleware. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at email@example.com.