Like almost anything that is related to XML, the Simple Object Access Protocol (SOAP) has received plenty of press lately. It may come as a surprise to you that while SOAP's window dressing is new, what's present under the hood dates back years, even decades. In this article, I cut through the hype surrounding SOAP and look at what it's supposed to be, what it actually is, and how it stacks up to similar technologies. As always with my articles, the bottom line is to determine whether this technology works for you, and here I'll try to get beyond the buzzword-mania SOAP comes with and identify the value it can bring to your applications.
I'll start with a quick look at the acronym soup that makes up SOAP, including its less-than-auspicious origins in RPC (remote procedure calls), and its use of XML to solve some of RPC's early problems. Next I'll address the features SOAP brings to the table that normal XML-RPC toolkits do not deliver, and why these additions are, or aren't, important. From there, I'll go on to compare SOAP, and RPC in general, with one of its biggest competitors, remote method invocation (RMI). I'll discuss the RPC model, the RMI information-flow model, and advantages of using XML in this context. I'll also take a look at how to make SOAP work for you. Finally, I'll cover the actual practicalities versus future promises of SOAP, and whether the underlying XML is the complete answer for your communication needs, or just part of a larger equation.
SOAP is YAA--"yet another acronym." (Yes, I'm using acronyms for acronyms; if you are confused, look up sarcasm in the dictionary!) As mentioned above, "SOAP" stands for Simple Object Access Protocol, and like most acronyms these days, it doesn't mean much on its own. Just another four letters to remember, right? It's actually more complicated than that. Under the hood, SOAP requires knowledge of several more acronyms. It's based on remote procedure calls, or RPC, and it's layered over the more traditional XML-RPC. Because SOAP assumes a basic knowledge of how remote procedure calls work, you also have to stir into the soup the acronyms XDR (the External Data Representation standard) and HTTP (Hypertext Transfer Protocol). That's quite a lot to remember, wouldn't you say? To understand the true nature of SOAP, it's necessary to take a look at its underpinnings.
This discussion on RPC will clarify the aspects of SOAP that are intrinsic to SOAP specifically. Here you may discover that the things you like about SOAP aren't related to SOAP directly but, in fact, are features of XML-RPC. This brings up an important point in using any technology: Make sure you don't use an overly complex package to address what may be your rather simple needs. As I'll say many times in this article, use what you need, and throw out what you don't. There's no place for kitchen sinks in your applications. With that said, it's time to take a stroll down memory lane.
The road to SOAP starts with RPC, which is an entirely different model from what you may be used to using if you've been doing object-oriented programming for a while. RPC is a message-based means of making requests over a network. Rather than try to explain this thing in the abstract, I'd like to look at a practical example.
Think about applications that perform work on the human genome project. These applications allow scientists to enter information and receive computed results while the core of the application performs massive computations. As these computations often take an incredibly long time, additional information can come in before an equation is finished. So the ideal scenario is the following: A scientist enters data into the application. The computation starts. The scientist, and the application, continue to execute. The program doesn't pause and wait for these huge computations to finish before accepting new data; it merely launches the process. (Think about it as starting a thread in Java.) More information becomes available, it is entered into the program, the calculations are modified, and the process continues (in that thread in the background). At some point, without user intervention, the program completes the calculation and renders the results.
RPC processes versus OO interactions
The interaction described in the RPC example is very different from OO-type interactions in which you invoke a method, that method completes, and the result is returned. In the RPC situation I just described, data is sent to the machine that performs the complex calculations, and that machine sends back a response equivalent to "OK, I've started." The computations go, and go, and go. Meanwhile the application continues forward, without waiting on the results of the computations. So how does this work? Well, it's the basis of what RPC actually is. RPC is message-based: A message was sent to the big server, and, at some point, a message saying "OK, I'm done" will be received from that big server. Are you beginning to see the difference in methodology here between RPC and the OO approach you are probably used to?
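To make the message-based flow a little more concrete, here is a minimal Python sketch of it. Everything in it is an assumption made for illustration: the names compute_server and run_client, the message tuples, and the use of in-process queues in place of a real network. It only shows the shape of the exchange: a request goes out, an "OK, I've started" comes back, the client carries on, and an "I'm done" message arrives later.

```python
import queue
import threading


def compute_server(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Hypothetical 'big server': acknowledge the request, then report later."""
    kind, numbers = inbox.get()
    if kind != "start":
        raise ValueError(f"unexpected message kind: {kind}")
    # Reply immediately so the caller does not have to wait for the result.
    outbox.put(("ack", "OK, I've started"))
    # Stand-in for the long-running computation.
    result = sum(n * n for n in numbers)
    outbox.put(("done", result))


def run_client() -> list:
    """Send one request and record the messages seen, in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=compute_server, args=(inbox, outbox))
    worker.start()

    log = []
    inbox.put(("start", [1, 2, 3, 4]))
    log.append(outbox.get(timeout=5))                # the acknowledgement
    log.append(("client", "still accepting input"))  # client keeps working
    log.append(outbox.get(timeout=5))                # the eventual result
    worker.join()
    return log


if __name__ == "__main__":
    for entry in run_client():
        print(entry)
```

The queues here stand in for the network; in a real system each put and get would be an encoded message crossing the wire.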
Now, in RPC, the data sent to the machine performing the computations has to be encoded, since that machine is usually reached over a network. The data has to be converted to some format that can be transmitted easily across this network. So the arguments (the data) -- whether they are floating-point numbers, integers, character strings, or complex objects -- are encoded into a format that can be transferred to the RPC receiver. In RPC, this format is the External Data Representation (XDR) standard I mentioned earlier. The receiver then decodes the message from XDR and does something with the data. A return message follows the same process in reverse, moving from the server to the client. Classic RPC systems typically carry these messages directly over TCP or UDP; the XML-based variants discussed in this article usually send them across HTTP, since it is easy to use. So if all you want is this procedural means of communication, then RPC may be for you: SOAP doesn't provide this functionality; the RPC base that underlies SOAP does. Keep in mind that nothing here is specific to SOAP; I'm talking about the basic underpinnings of any RPC system.
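If you have never looked at XDR, a small sketch may help. The Python below hand-encodes three arguments the way the XDR standard lays them out: a 4-byte big-endian integer, an 8-byte IEEE double, and a string written as a length followed by the bytes, padded to a multiple of four. The function names and the three-argument call are assumptions of mine, and a real RPC message would also carry call headers that I leave out here.

```python
import struct


def xdr_int(value: int) -> bytes:
    """XDR signed integer: 4 bytes, big-endian."""
    return struct.pack(">i", value)


def xdr_double(value: float) -> bytes:
    """XDR double: 8 bytes, IEEE 754, big-endian."""
    return struct.pack(">d", value)


def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte length, the bytes, zero padding to a 4-byte boundary."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def encode_call(sample_id: int, weight: float, label: str) -> bytes:
    """Encode the arguments of a hypothetical three-argument call."""
    return xdr_int(sample_id) + xdr_double(weight) + xdr_string(label)


def decode_call(message: bytes) -> tuple:
    """Reverse encode_call: read the int, the double, then the string."""
    sample_id, weight, length = struct.unpack_from(">idI", message, 0)
    label = message[16:16 + length].decode("ascii")
    return sample_id, weight, label


if __name__ == "__main__":
    wire = encode_call(7, 2.5, "hello")
    print(wire.hex())
    print(decode_call(wire))
```

Notice that nothing in the byte stream says what the fields mean; sender and receiver have to agree on the layout ahead of time.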
You may already have figured out the problem with RPC: the encoding. As I've shown, RPC uses the XDR encoding format. Of course, if you're like me, the first time you saw XDR you said "What the heck is that?" Well, it was created specifically for RPC. Initially, this encoding was useless in any other context. Because it remained too closely tied to RPC, developers used XDR only for RPC in nearly all their applications. It was simply not suitable for use in any other form of communication or data representation. In 1998, however, with XML on the rise, developers began to connect the dots, realizing that XML might be able to replace XDR as the RPC data encoding format. They also realized that XML could be used in many other places within an application, such as for database storage and presentation. XML-RPC is the realization of this idea: it is nothing more than ordinary RPC, but with XML -- instead of XDR -- as the encoding format.
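Here is roughly what that swap looks like in practice. This sketch uses the xmlrpc.client module from Python's standard library to encode a call as XML and decode it again; the method name genome.startComputation and the argument values are made up for the example.

```python
import xmlrpc.client

# Encode a call: the arguments become XML elements instead of XDR bytes.
request_xml = xmlrpc.client.dumps(
    (7, 2.5, "hello"),
    methodname="genome.startComputation",  # hypothetical method name
)

# The receiver turns the XML back into native values.
params, method = xmlrpc.client.loads(request_xml)

# A reply is encoded the same way, flagged as a method response.
response_xml = xmlrpc.client.dumps(("OK, I've started",), methodresponse=True)
reply, _ = xmlrpc.client.loads(response_xml)

if __name__ == "__main__":
    print(request_xml)
    print(method, params)
    print(reply)
```

Printing request_xml shows a methodCall document with the method name and one typed value element per argument, which is readable by people and by any XML parser.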
XML is a standard language, and it is very easy to write encoders and decoders for it: converting a character stream into XML, and parsing and understanding that XML, is easy as pie. But XML buys you a whole lot more than just ease of encoding and decoding. It is increasingly common for data to be stored as XML, so your XML data can be sent, relatively unchanged, into an XML-RPC call with little to no overhead. Also, XML transformations (XSLT) allow XML-RPC systems with disparate formats to convert from one to the other easily. Finally, picture an XML-RPC sender at one place of business sending an XML-RPC payload to another business. Rather than dealing with it strictly as RPC, the business that takes that call uses that information in a B2B fashion (sorry, another acronym, this one for "business-to-business"), and then returns an RPC-style response. You suddenly find yourself writing B2B applications using nothing other than XML-RPC, a library you can get for free for almost any programming language!
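As a rough illustration of how little code such a library asks of you, the sketch below runs a complete XML-RPC exchange over HTTP using only Python's standard library. The function start_computation, the method name genome.start, and the choice of a throwaway port on 127.0.0.1 are all assumptions for the example; a real deployment would look different.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def start_computation(sample_id, values):
    """Hypothetical remote procedure: acknowledge the submitted data."""
    return {"sample": sample_id, "status": "started", "count": len(values)}


def call_over_http():
    """Start a local XML-RPC server, make one call, and shut it down."""
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(start_computation, "genome.start")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/")
        # Looks like a local call, but travels as XML over HTTP.
        return proxy.genome.start(7, [1.5, 2.5])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_over_http())
```

The two sides here happen to live in one process, but nothing changes if the server sits at another business: the caller only needs the URL and the method name.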
I'd say that close to 75 percent of those who are enamored with SOAP actually are enamored with simple XML-RPC. If you find yourself thinking: "I can send XML remote calls and use simple libraries to do it? Wow, that's great! I'm going to start using SOAP today," then hold your horses on SOAP. You can skip the SOAP download and just stick with simple XML-RPC (see the links in the Resources section). You'll be happier with stable code, simpler interfaces, and much less overhead. In suggesting that you can use less processing power to do the same job, I know I'm treading on sacred ground here. But remember the mantra, "use what you need, and throw out what you don't." With that in mind, this next section covers what you do get with SOAP.
The big question is: "What does SOAP add to the equation?" The answer is fairly simple. The note on SOAP submitted to the W3C defines two major items that SOAP contains in addition to its basic XML-RPC features. The first is an envelope, which carries information about the included message. The second is a set of rules for encoding application-specific data types. Let's look at these items.
The SOAP envelope is analogous to the envelope of an actual letter. It supplies information about the message that is being encoded in a SOAP payload, including data relating to the recipient and sender, as well as details about the message itself. For example, the header of the SOAP envelope can specify exactly how a message must be processed. This means that before an application goes forward with processing a message, it can determine information about that message, including whether it can process the message at all. Unlike a standard XML-RPC call, a SOAP message is actually interpreted by the recipient to determine something about the message before it is handled. A typical SOAP message also can include the encoding style, which assists the recipient in interpreting the message. Listing 1 shows the SOAP envelope, complete with the specified encoding.
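To see the envelope idea from the receiving side, here is a small Python sketch of my own. It builds a SOAP 1.1 style envelope with the standard envelope and encoding namespaces, then checks the header before touching the body. The application namespace urn:example:genome, the Priority header, the StartComputation call, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
APP_NS = "urn:example:genome"  # hypothetical application namespace


def q(namespace: str, name: str) -> str:
    """Build the {namespace}name form that ElementTree uses."""
    return f"{{{namespace}}}{name}"


def build_envelope(sample_id: int) -> str:
    """Wrap a hypothetical call in an envelope with one mandatory header."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(q(SOAP_ENV, "Envelope"),
                          {q(SOAP_ENV, "encodingStyle"): SOAP_ENC})
    header = ET.SubElement(envelope, q(SOAP_ENV, "Header"))
    priority = ET.SubElement(header, q(APP_NS, "Priority"),
                             {q(SOAP_ENV, "mustUnderstand"): "1"})
    priority.text = "high"
    body = ET.SubElement(envelope, q(SOAP_ENV, "Body"))
    call = ET.SubElement(body, q(APP_NS, "StartComputation"))
    ET.SubElement(call, "sampleId").text = str(sample_id)
    return ET.tostring(envelope, encoding="unicode")


def can_process(envelope_xml: str, understood_headers: set) -> bool:
    """Inspect the header first: refuse if a mandatory entry is unknown."""
    root = ET.fromstring(envelope_xml)
    header = root.find(q(SOAP_ENV, "Header"))
    if header is None:
        return True
    for entry in header:
        mandatory = entry.get(q(SOAP_ENV, "mustUnderstand")) == "1"
        if mandatory and entry.tag not in understood_headers:
            return False
    return True


if __name__ == "__main__":
    message = build_envelope(7)
    print(message)
    print(can_process(message, {q(APP_NS, "Priority")}))
    print(can_process(message, set()))
```

A receiver that knows the Priority header goes ahead; one that does not can decline before it ever looks at the body, which is the kind of up-front inspection plain XML-RPC does not give you.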
A simple set of encoding rules
The second major element that SOAP brings to the table is a simple means of encoding user-defined data types. In RPC (and XML-RPC), encoding can only occur for a predefined set of data types. Encoding other types requires modifying the actual RPC server and client themselves. With SOAP, however, XML schemas can be used to easily specify new data types (using the complexType structure), and those new types then can be easily represented in XML as part of a SOAP payload. Getting into exactly how this works is beyond the scope of this article and would require me to go into great detail about both SOAP and XML schemas. For the purposes of this article, it is sufficient to say that any data type you can logically describe in an XML schema can be encoded in a SOAP message.
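Without going into the real SOAP encoding rules, the following Python sketch hints at the idea: a schema describes a new type with complexType, and both sides use that description to write and read instances of it. The GeneSample type, its fields, the urn:example:genome namespace, and the tiny converter table are all my own assumptions, and the code reads only the small slice of XML Schema it needs.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema"
APP_NS = "urn:example:genome"  # hypothetical application namespace

# A hypothetical user-defined type, described once in a schema.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:genome">
  <xsd:complexType name="GeneSample">
    <xsd:sequence>
      <xsd:element name="id" type="xsd:int"/>
      <xsd:element name="label" type="xsd:string"/>
      <xsd:element name="weight" type="xsd:double"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
""".strip()

# Only the three schema types used above are handled in this sketch.
CONVERTERS = {"xsd:int": int, "xsd:string": str, "xsd:double": float}


@dataclass
class GeneSample:
    id: int
    label: str
    weight: float


def schema_fields(type_name: str) -> list:
    """Return (name, type) pairs for the named complexType, in schema order."""
    root = ET.fromstring(SCHEMA)
    for ctype in root.iter(f"{{{XSD}}}complexType"):
        if ctype.get("name") == type_name:
            return [(e.get("name"), e.get("type"))
                    for e in ctype.iter(f"{{{XSD}}}element")]
    raise KeyError(type_name)


def encode(sample: GeneSample) -> str:
    """Write one child element per schema field."""
    elem = ET.Element(f"{{{APP_NS}}}GeneSample")
    for name, _ in schema_fields("GeneSample"):
        ET.SubElement(elem, name).text = str(getattr(sample, name))
    return ET.tostring(elem, encoding="unicode")


def decode(xml_text: str) -> GeneSample:
    """Read the fields back, converting each by its declared schema type."""
    elem = ET.fromstring(xml_text)
    values = {name: CONVERTERS[xsd_type](elem.findtext(name))
              for name, xsd_type in schema_fields("GeneSample")}
    return GeneSample(**values)


if __name__ == "__main__":
    wire = encode(GeneSample(7, "BRCA1", 2.5))
    print(wire)
    print(decode(wire))
```

Adding a field means editing the schema (and the matching class), not rewriting the encoder, which is the contrast with the fixed type list of plain RPC and XML-RPC.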
From the discussion so far, you can see that SOAP is made up of not just one but three acronyms. If all you need is a means to send messages across the wire, stick with XML-RPC. But if you find yourself constantly fretting over trying to communicate using complex, user-defined types, if your applications must inspect a message for instructions prior to processing the message, or if you want to subscribe to the latest trend, then SOAP is for you. If I sound a bit skeptical about the uses of SOAP, I am. If you are sending complex data types across a network, you're probably spending as much time encoding and decoding these structures into XML as you would working with a technology like RMI (remote method invocation). Additionally, all things being equal, RMI is almost always recommended over SOAP. RMI has been around a lot longer, which means it has fewer bugs, has had more time to mature, and has gained general acceptance in programming communities. Does that mean I'd never use SOAP or never recommend it? Certainly not! But as with any new technology, caution and a good reason to use a technology will save you the embarrassment of trying to explain to your pointy-haired boss that your desire to use new technology outweighed common sense.
In the next article, I'll continue my in-depth look at SOAP. First, I'll pick up where this first part leaves off, comparing and contrasting (sound like a high-school English assignment?) RPC and RMI. Not surprisingly, there will be times when SOAP and RPC make sense and times when RMI does, so my next article in this series will help identify the tradeoffs involved. Next I'll examine what it means to use SOAP today, taking a look at what it buys you, what's still missing, and what's in the future for the emerging standard. Through it all, I'll keep a rather skeptical eye out to be sure that this SOAP evaluation avoids the hype the protocol has attracted.
I hope I've demonstrated that SOAP is not the magic bullet that some people believe it to be. Even more importantly, I hope you can see that many of SOAP's "features," rather than being unique to SOAP, actually are parts of RPC and XML-RPC. The features that do belong to SOAP itself are the SOAP envelope and the rules for encoding user-defined data types.
Is it possible to draw any conclusions at this point about the value and feasibility of using SOAP in your own work? Absolutely! First, and this is nothing new, you should always keep your eye firmly on business needs, not technology needs. While SOAP is lots of fun to play with and very chic among all your geek friends, the fact is, if it doesn't offer you a way to solve your problems, it's probably going to waste a lot of time. Also, and this is an important point, it's very possible that the task for which you've chosen to use SOAP could be accomplished more easily using XML-RPC. So don't be fooled by the hype. SOAP has arrived, but it's not a stranger in a strange land. Instead, SOAP is just the big, sometimes bloated, brother of technologies that have been around for quite a while and often are easier to use. I'll see you next time, when I'll carve even deeper into SOAP and talk more about what it can do for you.
- Bone up on SOAP by reading the note at the W3C: SOAP 1.1 Note.
- Get the implementation from XML Apache at xml.apache.org.
- Compare IBM's SOAP implementation with Microsoft's in these two articles: MS SOAP SDK vs IBM SOAP4J and MS SOAP SDK vs. IBM/Apache XML-SOAP.
- Get the lowdown on XML-RPC at the Userland XML-RPC Web site.
- Read more from Brett on XML and related technologies in his book, Java and XML.
- A page on IBM's SOAP security extensions describes proposals (including a link to the Note to the W3C) for adding security safeguards to SOAP implementations.

Brett McLaughlin specializes in distributed systems architecture. He is author of Java and XML (O'Reilly). He is involved in technologies such as Java servlets, Enterprise JavaBeans technology, XML, and business-to-business applications. Along with Jason Hunter, he recently founded the JDOM project, which provides a simple API for manipulating XML from Java applications. He is also an active developer on the Apache Cocoon project and the EJBoss EJB server, and a co-founder of the Apache Turbine project. You can contact him at brett@oreilly.com.




