XML-RPC is a remote function invocation protocol with a great virtue: It is worse than all of its competitors. Compared to Java RMI or CORBA or COM, XML-RPC is impoverished in the type of data it can transmit and obese in its message size. XML-RPC abuses the HTTP protocol to circumvent firewalls that exist for good reasons, and as a consequence transmits messages lacking statefulness and incurs channel bottlenecks. Compared to SOAP, XML-RPC lacks both important security mechanisms and a robust object model. As a data representation, XML-RPC is slow, cumbersome, and incomplete compared to native programming language mechanisms like Java's serialization, Python's pickle, Perl's Data::Dumper, or similar modules for Ruby, Lisp, PHP, and many other languages.
In other words, XML-RPC is the perfect embodiment of Richard Gabriel's "worse-is-better" philosophy of software design (see Resources). I can hardly write more glowingly on XML-RPC than I did in the previous paragraph, and I think the protocol is a perfect match for a huge variety of tasks. To understand why, it's worth quoting the tenets of Gabriel's "worse-is-better" philosophy:
- Simplicity: The design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.
- Correctness: The design must be correct in all observable aspects. It is slightly better to be simple than correct.
- Consistency: The design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.
- Completeness: The design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.
Writing years before the specific technology existed, Gabriel identifies the virtues of XML-RPC perfectly.
I have written a moderately popular module for Python called xml_pickle. The purpose of this module (discussed previously in this column, see Resources) is to serialize Python objects, using an interface that's mostly the same as those of the standard cPickle and pickle modules. The only difference is that in my module, the representation is in XML. My intention all along with xml_pickle was to create a very lightweight format that could also be read from other programming languages (and across Python versions). A DTD accompanies the module for users who want to validate XML pickles, but feedback from users has suggested that formal validation is rarely a concern.
A recurrent question I have received from users of xml_pickle is whether XML-RPC would be a better choice, given its more widespread use and existing implementations in many programming languages. While the answer to the narrow question probably favors xml_pickle, the comparison is worthwhile -- and it raises some points about data-type richness.
On first pass, XML-RPC seems to do something different from xml_pickle: XML-RPC calls remote procedures and gets results back. The typical usage example in Listing 1 appears at the XML-RPC Web site and in the Programming Web Services with XML-RPC book (see Resources):
Listing 1. Python shell example of XML-RPC usage
>>> import xmlrpclib
>>> betty = xmlrpclib.Server("http://betty.userland.com")
>>> print betty.examples.getStateName(41)
South Dakota
By contrast, xml_pickle creates string representations of local in-memory objects. These may not seem the same, but in order to call a remote procedure, XML-RPC first needs to convert its arguments to suitable XML representations (in other words, pickle/serialize the parameters). Similarly, a return value from an XML-RPC call can contain a nested data structure, which the server must serialize and the client must restore. Moreover, the xmlrpclib.dumps() function shares its name with a method of xml_pickle (both are inspired by several standard modules), and it does the same thing -- it writes the XML serialization without performing an actual call.
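For readers on Python 3, where xmlrpclib has become xmlrpc.client (and Server survives as an alias for ServerProxy), the serialization half of Listing 1 can be reproduced offline. This sketch builds the request body that the getStateName(41) call would send, plus a matching response body, and parses both back. No server is contacted; the response is constructed by hand rather than captured from betty.userland.com.

```python
import xmlrpc.client

# The <methodCall> body that Listing 1's getStateName(41) call would POST.
request = xmlrpc.client.dumps((41,), "examples.getStateName")

# A matching <methodResponse> body, built locally as a server would build it.
response = xmlrpc.client.dumps(("South Dakota",), methodresponse=True)

# loads() returns (params, methodname); no network traffic is involved.
args, method = xmlrpc.client.loads(request)
result, _ = xmlrpc.client.loads(response)

print(request)
print(args, method, result)
```

The point is only that "calling" and "pickling" share the same first step: dumps() produces the wire format, and the HTTP transport is a separate layer.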
On first examination, xml_pickle and xmlrpclib appear to be functionally interchangeable, at least if one only cares about the serialization aspect. But as we will see, a closer look reveals some differences.
Let's create an object, then serialize it using two different approaches. Some contrasts will come to the fore:
Listing 2. Python shell example of XML-RPC serialization
>>> import xmlrpclib
>>> class C: pass
...
>>> c = C()
>>> c.bool, c.int, c.tup = (xmlrpclib.True, 37, (11.2, 'spam') )
>>> print xmlrpclib.dumps((c,),'PyObject')
<?xml version='1.0' ?>
<methodCall>
<methodName>PyObject</methodName>
<params>
<param>
<value><struct>
<member>
<name>tup</name>
<value><array><data>
<value><double>11.2</double></value>
<value><string>spam</string></value>
</data></array></value>
</member>
<member>
<name>bool</name>
<value><boolean>1</boolean></value>
</member>
<member>
<name>int</name>
<value><int>37</int></value>
</member>
</struct></value>
</param>
</params>
</methodCall>
You should note a few things already. First, the whole XML document has a root <methodCall> element, which is irrelevant to our current purposes. Apart from costing a few extra bytes, however, the additional enclosing element is harmless. Likewise, the <methodName> is superfluous, but the example gives it a name that indicates the role of the document. Moreover, a call to xmlrpclib.dumps() accepts a tuple of objects, but we are only interested in "pickling" one (if there were others, each would get its own <param> element). Wrapping aside, the attributes of our object are neatly contained within the <member> elements of the <struct> element.
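The same observations can be checked in Python 3 with xmlrpc.client. This sketch repeats Listing 2 (with the built-in True in place of xmlrpclib.True), then serializes the object a second time without a method name; in that case dumps() omits the XML declaration, <methodCall>, and <methodName> and returns only the <params> element. Member order may differ from Listing 2, since it follows the instance's attribute dictionary.

```python
import xmlrpc.client


class C:
    pass


c = C()
c.bool, c.int, c.tup = True, 37, (11.2, "spam")

# With a method name, dumps() adds the header and the <methodCall> wrapping.
call_xml = xmlrpc.client.dumps((c,), "PyObject")

# Without one, only the <params> element is produced.
bare_xml = xmlrpc.client.dumps((c,))

# Parsing back: the instance's attributes return as a plain dict.
params, method = xmlrpc.client.loads(call_xml)
print(params[0])
```

Note that the tuple attribute comes back as a list, a limitation taken up below.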
Now let's look at what xml_pickle does (the object is the same as above):
Listing 3. Python shell example of xml_pickle serialization
>>> from xml_pickle import XML_Pickler
>>> print XML_Pickler(c).dumps()
<?xml version="1.0" ?>
<!DOCTYPE PyObject SYSTEM "PyObjects.dtd">
<PyObject class="C" id="1840428">
<attr name="bool" type="PyObject" class="Boolean" id="1320396">
<attr name="value" type="numeric" value="1" />
</attr>
<attr name="int" type="numeric" value="37" />
<attr name="tup" type="tuple" id="1130924">
<item type="numeric" value="11.199999999999999" />
<item type="string" value="spam" />
</attr>
</PyObject>
There is both less and more to the xml_pickle version (the actual sizes of both are comparable). Notice that even though Python does not have a built-in Boolean type, when you use a class to represent a new type, xml_pickle adjusts readily (albeit more verbosely). XML-RPC, by contrast, is limited to serializing its eight data types, and nothing else. Of course, two of those types, <array> and <struct>, are themselves collections and can be compound. In addition, later versions of xml_pickle can point multiple collection members to the same underlying object, something XML-RPC omits by design. As a small matter, xml_pickle uses only a single numeric type attribute, but the actual pattern of the value attribute allows for decoding to integer, float, complex, and so on. No real generality is lost or gained by either strategy, although the XML-RPC style will appeal aesthetically to programmers working with statically typed languages.
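To make the single numeric type concrete, here is a hypothetical decoder, not xml_pickle's actual code, that recovers a specific Python type from the text of a value attribute by trying progressively wider literal syntaxes. The function name decode_numeric is invented for illustration, and the sketch is written for Python 3.

```python
def decode_numeric(text):
    """Hypothetical decoder for an xml_pickle-style type="numeric" value.

    Tries int, then float, then complex, and returns the first conversion
    that succeeds, so the lexical pattern of the value alone decides the type.
    """
    for convert in (int, float, complex):
        try:
            return convert(text)
        except ValueError:
            continue
    raise ValueError("not a numeric literal: %r" % (text,))


# Values shaped like those in Listing 3:
print(type(decode_numeric("37")))                  # <class 'int'>
print(type(decode_numeric("11.199999999999999")))  # <class 'float'>
print(type(decode_numeric("(1+2j)")))              # <class 'complex'>
```

A statically typed unpickler would have to run this kind of inspection on every numeric value, which is the cost the XML-RPC style avoids by naming <int> and <double> in the markup.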
The problem with XML-RPC as an object-serialization format is that it just plain does not have enough types to handle the objects in most high-level programming languages. Listing 4 illustrates this shortcoming.
Listing 4. Python shell example of XML-RPC overloading
>>> c = C()
>>> c.foo = 'bar'
>>> d = {'foo':'bar'}
>>> print xmlrpclib.dumps((c,d),'PyObjects')
<?xml version='1.0' ?>
<methodCall>
<methodName>PyObjects</methodName>
<params>
<param>
<value><struct>
<member>
<name>foo</name>
<value><string>bar</string></value>
</member>
</struct></value>
</param>
<param>
<value><struct>
<member>
<name>foo</name>
<value><string>bar</string></value>
</member>
</struct></value>
</param>
</params>
</methodCall>
In Listing 4, two things are serialized -- an object instance and a dictionary. While it is fair to say that Python objects are particularly dictionary-like, you lose a lot of information by representing a dictionary and an object in exactly the same way. Additionally, the excessively generic meaning for <struct> in XML-RPC affects pretty much any OOP language, or at least any language that has native hash/dictionary constructs; it is not a Python quirk here. On the other hand, failing to distinguish Python tuples and lists within the <array> type of XML-RPC is a fairly Python-specific limitation.
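Python 3's xmlrpc.client behaves the same way, which makes the loss easy to confirm. In this sketch (the round_trip helper is invented for illustration), the instance and the dictionary from Listing 4 come back as indistinguishable plain dicts, and a tuple and a list both come back as lists.

```python
import xmlrpc.client


class C:
    pass


def round_trip(*values):
    """Serialize values as XML-RPC params and parse them straight back."""
    xml = xmlrpc.client.dumps(values, "PyObjects")
    params, _method = xmlrpc.client.loads(xml)
    return params


c = C()
c.foo = "bar"
d = {"foo": "bar"}

# Both are now ordinary dicts; nothing records that one was a C instance.
obj_back, dict_back = round_trip(c, d)

# Both <array> values are restored as lists.
tuple_back, list_back = round_trip((11.2, "spam"), [11.2, "spam"])
```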
xml_pickle handles all the Python types much better (including data types defined by user classes, as we saw). Actually, there is no direct pickling of dictionaries in xml_pickle, basically because no one has asked for this (it would be easy to add). But dictionaries that are object attributes get pickled, as shown in Listing 5.
Listing 5. Python shell example of xml_pickle dictionaries
>>> c, c2 = C(), C()
>>> c2.foo = 'bar'
>>> d = {'foo':'bar'}
>>> c.c, c.d = c2, d
>>> print XML_Pickler(c).dumps()
<?xml version="1.0" ?>
<!DOCTYPE PyObject SYSTEM "PyObjects.dtd">
<PyObject class="C" id="1917836">
<attr name="c" type="PyObject" class="C" id="1981484">
<attr name="foo" type="string" value="bar" />
</attr>
<attr name="d" type="dict" id="1917900">
<entry>
<key type="string" value="foo" />
<val type="string" value="bar" />
</entry>
</attr>
</PyObject>
Another virtue of the xml_pickle approach, implied by the fact that each <key> element carries its own type attribute, is that dictionary keys need not be strings. In XML-RPC <struct> elements, <name> keys are always strings. However, Perl, PHP, and many other languages are closer to the XML-RPC model in this respect.
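The string-only rule for <name> is enforced in practice. In this Python 3 sketch, xmlrpc.client refuses to marshal a dictionary with an integer key and raises TypeError; the helper name can_marshal is invented for illustration.

```python
import xmlrpc.client


def can_marshal(value):
    """Return True if xmlrpc.client can serialize value as one XML-RPC param."""
    try:
        xmlrpc.client.dumps((value,))
    except TypeError:
        return False
    return True


print(can_marshal({"foo": "bar"}))  # True: string keys map onto <name>
print(can_marshal({1: "one"}))      # False: an int key has no <struct> form
```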
Unfortunately, xml_pickle lacks some types that many programming languages have. If our goal is not simply to save and restore Python objects, but to exchange objects across languages, then xml_pickle is not currently quite adequate. The issue of floats and integers is not really important in principle; but designing an "unpickler" for, say, Java would be easier if the XML parser could determine the type needed directly, rather than deferring that decision until the format of the value attribute has been analyzed.
Of more serious concern for cross-language pickling are the <boolean> and <dateTime.iso8601> types, which XML-RPC has but Python lacks as built-in types. Even though I claimed that xml_pickle handles user classes that define custom data types easily and well, this is less true in the cross-language case. For example, the fragment of an xml_pickle representation in Listing 6 describes an iso8601 Date/Time:
Listing 6. xml_pickle version of an iso8601 Date/Time
<attr name="dte" type="PyObject" class="DateTime" id="1984076">
<attr name="value" type="string" value="20011122T17:28:55" />
</attr>
Two issues make it difficult to use this data in, say, Perl or REBOL or PHP. One is the namespace of the restored class. In Python, the namespace of the restored xmlrpclib.DateTime becomes, by default, xml_pickle.DateTime (but the namespaces can be manipulated manually prior to unpickling). Because of the way Python's instantiation and namespaces work, little rests on this fact, at least if we are interested in the instance's attributes rather than its methods. But various languages handle scoping matters in very different ways.
The second and far more important issue is that this custom class cannot easily be recognized as a native type in languages where it is one. Perl and PHP do not have a native DateTime type anyway, so nothing is really lost as long as unpicklers in those languages restore the value instance attribute. REBOL, by contrast, has many more native data types -- not just dates, but also exotic types like e-mail addresses and URLs. These are lost in the xml_pickle process. Of course, XML-RPC also loses those data types. Either way, we are left with a plain string type to represent something more specific. (Binary data fares similarly: XML-RPC carries it in a <base64> element, while xml_pickle handles it by escaping high-bit values, for example "\xff".)
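XML-RPC's <dateTime.iso8601> element at least gives parsers in other languages a typed hook that xml_pickle's generic class attribute lacks. In Python 3, xmlrpc.client.DateTime keeps the same ISO 8601 text seen in Listing 6 in its value attribute, and loads() returns a native datetime when called with use_builtin_types=True. A short sketch:

```python
import datetime
import xmlrpc.client

when = datetime.datetime(2001, 11, 22, 17, 28, 55)

# The wrapper stores the date/time as the string "20011122T17:28:55",
# the same text xml_pickle wrote into the value attribute in Listing 6.
stamp = xmlrpc.client.DateTime(when)

# On the wire the value gets a dedicated XML-RPC element, so a parser in
# another language can recognize it without knowing any Python class names.
xml = xmlrpc.client.dumps((stamp,), "PyObject")

# Python can restore a native datetime.datetime from that element.
(restored,), _method = xmlrpc.client.loads(xml, use_builtin_types=True)
```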
Conclusion: Where to go from here?
Neither XML-RPC nor xml_pickle is entirely satisfactory as a means of representing the object instances of popular programming languages. But both come pretty close. Let me suggest some approaches to closing the short gap between these protocols and a general object serialization format.
"Fixing" xml_pickle is actually amazingly simple -- just add more types to the format. For
example, since xml_pickle was first developed, the UnicodeType has been added to Python. Adding complete support for it took exactly four lines of new code (although this was simplified slightly by the fact that XML is natively Unicode). Or again, at the request of users, the numeric module's ArrayType was added with little more work. Even if a type is not present in Python, a custom class can be defined within xml_pickle to add the behavior of that type -- for example, REBOL's "e-mail address" type may be supported with a fragment like this:
<attr name="my_address" type="email" value="mertz@gnosis.cx" />
On unpickling, xml_pickle could either just treat "email" as a synonym for "string," or we could implement an EmailAddress class with some useful behaviors. If we took the latter route, one such behavior would be pickling back into the above xml_pickle fragment.
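The latter route might look like the following Python 3 sketch. Everything in it is hypothetical: EmailAddress, its to_attr() method, and the restore_attr() helper are invented for illustration, and the fragment is built and parsed with the standard xml.etree.ElementTree module rather than with xml_pickle itself. The rich flag switches between the two options just described.

```python
import xml.etree.ElementTree as ET


class EmailAddress(str):
    """Hypothetical class standing in for REBOL's e-mail address type."""

    @property
    def user(self):
        return self.partition("@")[0]

    @property
    def host(self):
        return self.partition("@")[2]

    def to_attr(self, name):
        """Pickle back into an xml_pickle-style <attr type="email"> element."""
        elem = ET.Element("attr", {"name": name, "type": "email", "value": str(self)})
        return ET.tostring(elem, encoding="unicode")


def restore_attr(fragment, rich=True):
    """Restore one <attr> fragment; "email" becomes a plain str unless rich."""
    elem = ET.fromstring(fragment)
    kind, value = elem.get("type"), elem.get("value")
    if kind == "email":
        return EmailAddress(value) if rich else value
    if kind == "string":
        return value
    raise ValueError("unsupported type: %r" % (kind,))


fragment = '<attr name="my_address" type="email" value="mertz@gnosis.cx" />'
address = restore_attr(fragment)
print(address.user, address.host)
```

Subclassing str keeps the richer object usable anywhere a plain string is expected, so code that ignores the extra type loses nothing.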
"Fixing" XML-RPC is more difficult. It would be easy to suggest simply adding a bunch of new data types, and from a purely technical point of view there would be no particular problem with this. But as a social matter, XML-RPC's success makes it difficult to introduce incompatible changes: A hypothetical "data-enhanced" XML-RPC would not play nice with all the existing implementations and installations. Actually, some implementors have felt sufficiently bothered by the lack of a "nil" type that they have added a nonstandard (or at best semi-standard) type to correspond to Java null, Python None, Perl undef, SQL NONE, and the like. But the addition of many more types that only some programming languages use is not going to fly.
One approach to enhancing XML-RPC as an object serializer is to co-opt the <struct> element to do double duty. Everything that standard XML-RPC types incompletely could be wrapped in a <struct> with a single <member>, where the <name> indicates the special type. While existing XML-RPC libraries do not do this, the XML-RPC protocol and DTD are so simple that adding this behavior is fairly trivial (but in most cases requires that the libraries be modified, not just wrapped).
For example, XML-RPC cannot natively describe the difference between Python lists and tuples. So the fragment in Listing 7 is incomplete as a description of a Python object.
Listing 7. XML-RPC fragment for either list or tuple
<array>
<data>
<value>
<double>11.2</double>
</value>
<value>
<string>spam</string>
</value>
</data>
</array>
One could substitute the following representation, which is valid XML-RPC and which a suitable implementation could restore to a specific Python object:
Listing 8. XML-RPC fragment for a tuple
<struct>
<member>
<name>NEWTYPE:tuple</name>
<value>
<array>
<data>
<value>
<double>11.2</double>
</value>
<value>
<string>spam</string>
</value>
</data>
</array>
</value>
</member>
</struct>
A genuine <struct> then needs to be distinguishable from these wrappers, which can be arranged in two (or more) ways. First, every true <struct> can be wrapped in another <struct> (perhaps with the <name> OLDTYPE:struct, or the like). For Python, this is probably best anyway, since dictionaries and object instances are both NEWTYPEs. Second, the namespace-like prefix NEWTYPE: can be reserved for this special usage (accidental collision seems unlikely).
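The convention can be sketched in Python 3 as a thin layer around xmlrpc.client. This version combines both routes described above: tuples become NEWTYPE:tuple wrappers, and every real dictionary is itself wrapped as NEWTYPE:dict, so a genuine struct is never mistaken for a type wrapper. It transforms plain Python values before dumps() and after loads() instead of modifying the library; that suffices for tuples and dictionaries, but faithfully handling class instances (recording the class name, say) would need more work. The names wrap, unwrap, dumps_typed, and loads_typed are invented for illustration.

```python
import xmlrpc.client

PREFIX = "NEWTYPE:"  # reserved member-name prefix, per the convention above


def wrap(obj):
    """Rewrite values XML-RPC cannot type precisely into one-member structs."""
    if isinstance(obj, tuple):
        return {PREFIX + "tuple": [wrap(x) for x in obj]}
    if isinstance(obj, list):
        return [wrap(x) for x in obj]
    if isinstance(obj, dict):
        # Wrap every real dictionary too (keys must still be strings),
        # so an unwrapped struct can never look like a type wrapper.
        return {PREFIX + "dict": {k: wrap(v) for k, v in obj.items()}}
    return obj


def unwrap(obj):
    """Inverse of wrap(), applied to the output of xmlrpc.client.loads()."""
    if isinstance(obj, list):
        return [unwrap(x) for x in obj]
    if isinstance(obj, dict):
        if len(obj) == 1:
            (name, value), = obj.items()
            if name == PREFIX + "tuple":
                return tuple(unwrap(x) for x in value)
            if name == PREFIX + "dict":
                return {k: unwrap(v) for k, v in value.items()}
        # Not a recognized wrapper: keep it as an ordinary struct.
        return {k: unwrap(v) for k, v in obj.items()}
    return obj


def dumps_typed(value):
    return xmlrpc.client.dumps((wrap(value),), "PyObject")


def loads_typed(xml_text):
    params, _method = xmlrpc.client.loads(xml_text)
    return unwrap(params[0])


data = {"tup": (11.2, "spam"), "lst": [1, (2, 3)], "nested": {"k": ("v",)}}
print(loads_typed(dumps_typed(data)) == data)
```

The documents it produces remain valid XML-RPC, so a standard peer can still read them; it simply sees extra <struct> layers whose meaning it does not know.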
By design, xml_pickle is more naturally extensible for representing new data types than is XML-RPC. Moreover, extensions to xml_pickle maintain good backward compatibility across versions. As its designer, I am happy with the flexibility I have included for xml_pickle. However, the fact is that XML-RPC is far more widely used and implemented. Fortunately, with only slight extra layering -- and without breaking the underlying DTD -- XML-RPC can also be adapted to represent arbitrary data types. The mechanism is somewhat less elegant, but XML-RPC is well thought out enough to allow compatibility with existing implementations after these adaptations.
- UserLand's XML-RPC home page (http://xmlrpc.com) is, naturally, the place to start investigating XML-RPC. Many useful resources can be found there.
- While at the XML-RPC home page, it is particularly worthwhile to investigate the tutorial and article links they provide (http://www.xmlrpc.com/directory/1568/tutorialspress).
- Kate Rhodes has written a nice comparison called "XML-RPC vs. SOAP" (http://weblog.masukomi.org/writings/xml-rpc_vs_soap.htm). In it, she points to a number of details that belie SOAP's description as a "lightweight" protocol.
- Richard P. Gabriel wrote the rather famous paper "Lisp: Good News, Bad News, How to Win Big" (http://www.ai.mit.edu/docs/articles//good-news/good-news.htm). What everyone reads and refers to is the section called "The Rise of 'Worse is Better'".
- The O'Reilly title Programming Web Services with XML-RPC (http://www.oreilly.com/catalog/progxmlrpc/), by Simon St. Laurent, Joe Johnston, and Edd Dumbill, is quite excellent. Its spirit matches that of XML-RPC itself.
- xml_pickle can be found at: http://gnosis.cx/download/xml_pickle.py.
- The associated DTD lives at: http://gnosis.cx/download/PyObjects.dtd.
- Secret Labs' xmlrpc Python module can be found at: http://www.pythonware.com/products/xmlrpc/index.htm.
- If you want to know how IBM's WebSphere Application Server (WAS) supports XML development, see this technical background info on XML in the WAS Advanced Edition 3.5 online help.
- Find out more on the WebSphere Developer Domain Studio zone.
- Find other articles in David Mertz's XML Matters column.

David Mertz puts the "lite" in "lightweight." David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/.