Level: Introductory
Graham Glass (graham-glass@mindspring.com), CEO/Chief Architect, The Mind Electric
01 Jan 2001
This article provides an explanation of how SOAP works, including information about its on-the-wire protocol and how messages are processed. It also explains how objects can be passed by value between Web services, and touches on performance and security issues.
Welcome to the third installment of my column that focuses on the evolutionary
and revolutionary aspects of Web services technology. In the second installment,
I showed how to build, deploy, and invoke a simple Web service using
the Apache Simple Object Access Protocol (SOAP) implementation. In this installment, I'll explain how SOAP
works under the hood, which is technically interesting and will help to
demystify the standards that we'll discuss in future installments, such as Web Services Description Language (WSDL) and Universal Description, Discovery and Integration (UDDI) (see Resources).
Tools and installation
No new software needs to be installed since we are using the same tools as in the previous installment. Additionally, we will use the examples from the previous installment, so be sure you still have the \demo1 directory with the IExchange.java, Exchange.java, and Client.java source files present.
One new software tool I will introduce this time around
is the TCP tunneling GUI, which allows you to see the TCP messages move
between the client and server. This tool will allow you to examine
the SOAP on-the-wire protocol and even check that the various competing
SOAP implementations are standards-compliant.
Sniffing SOAP
The TCP tunneling GUI allows you to "sniff" SOAP messages by acting
as a TCP router. To see it in action, first start the Apache Tomcat server
(see last issue in Resources) in the \demo1 directory
by typing:
> cd \demo1
> tomcat run
Tomcat starts up on port 8080 by default. Then start the TCP tunneler
in a separate window by typing:
> java org.apache.soap.util.net.TcpTunnelGui 8070 localhost 8080
This starts a TCP server on port 8070 that acts as an intermediary between
any client and port 8080 on the local host, which in this case is serviced
by Tomcat. The GUI (see Figure 1) will display outgoing
messages in the left pane and incoming replies in the right pane.
Figure 1: The TCP Tunneler Interface
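To make the idea of a TCP router concrete, here is a minimal sketch of a tunnel written with plain java.net sockets. It is not the Apache TcpTunnelGui source; the class name MiniTunnel and its methods are made up for illustration. It takes the same three arguments (listen port, target host, target port), echoes client-to-server bytes to standard output and server-to-client bytes to standard error, and assumes the server closes the connection after each reply.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

public class MiniTunnel {

    // Copies every byte from 'in' to 'out' and mirrors it to 'log' so the
    // traffic can be inspected; returns the number of bytes relayed.
    public static long pump(InputStream in, OutputStream out, OutputStream log) {
        byte[] buffer = new byte[4096];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                out.flush();
                log.write(buffer, 0, n);
                log.flush();
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Relays one direction, then signals end-of-stream to the receiving side.
    static void copy(Socket from, Socket to, OutputStream log) {
        try {
            pump(from.getInputStream(), to.getOutputStream(), log);
            to.shutdownOutput();
        } catch (IOException | UncheckedIOException e) {
            // The connection was dropped; there is nothing left to relay.
        }
    }

    // Handles one client: requests go to System.out, replies to System.err.
    static void handle(Socket client, String host, int port) {
        try (Socket c = client; Socket server = new Socket(host, port)) {
            Thread replies = new Thread(() -> copy(server, c, System.err));
            replies.start();
            copy(c, server, System.out);
            replies.join();
        } catch (IOException e) {
            System.err.println("tunnel connection failed: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java MiniTunnel listenPort targetHost targetPort");
            return;
        }
        int listenPort = Integer.parseInt(args[0]);
        String host = args[1];
        int port = Integer.parseInt(args[2]);
        try (ServerSocket listener = new ServerSocket(listenPort)) {
            while (true) {
                Socket client = listener.accept();
                new Thread(() -> handle(client, host, port)).start();
            }
        }
    }
}
```

Running java MiniTunnel 8070 localhost 8080 would stand in for the tunneler command shown earlier, minus the two-pane GUI.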

Now edit the Client.java code from the previous example and change the
Web service URL to use port 8070 instead of 8080. The first line of the
client program should now read:
URL url = new URL( "http://localhost:8070/soap/servlet/rpcrouter" );
Then compile and run the client. The GUI should display Figure 2.
Figure 2: Running the Tunneler

As you can see, the left-hand pane shows an outgoing SOAP request composed
of five lines of standard HTTP header followed by an XML document that represents
the client service invocation. The right-hand pane shows the resultant
SOAP response consisting of six lines of standard HTTP header followed by
an XML document that is the server response. Even without an explanation
of the SOAP format, you can probably figure out what most of it means.
Contrast this with the CORBA and DCOM protocols, which are binary, not
self-describing, and tough to trace. I know this firsthand, having written a CORBA ORB in a previous
lifetime.
Anatomy of a SOAP Request
The first thing to mention is that although this particular setup uses HTTP to deliver
SOAP messages, SOAP can ride on any other transport protocol. You can use
SMTP, the Internet email protocol, for example, to deliver SOAP messages. The header
differs between transport layers, but the XML payload remains the same.
A complete SOAP/HTTP request, with the XML content indented for clarity,
is shown in Listing 1.
A SOAP request is sent as an HTTP POST with the content type set to
text/xml and a field called SOAPAction set to either
an empty string or the name of the SOAP method. The SOAPAction
field allows a receiving Web server to detect that it's an incoming SOAP
message and potentially route or filter it.
The XML part of the SOAP request consists of three main portions:
- The Envelope defines the various namespaces that are used by the rest of the SOAP message, typically including xmlns:SOAP-ENV (SOAP Envelope namespace), xmlns:xsi (XML Schema for Instances) and xmlns:xsd (XML Schema for DataTypes).
- The Header is an optional element for carrying auxiliary information for authentication, transactions, and payments. Any element in a SOAP processing chain can add or delete items from the Header; elements can also choose to ignore items if they are unknown. If a Header is present, it must be the first child of the Envelope. Because our example is simple and does not involve routers, the Header is absent.
- The Body is the main payload of the message. When SOAP is used to perform an RPC call, the Body contains a single element that contains the method name, arguments, and Web service target address. The namespace of the element is equal to the target address, and the base name is the method name. In this example, ns1:getRate indicates that the target address is urn:demo1:exchange (the expanded form of ns1) and the method name is getRate. If a Header is present, the Body must be its immediate sibling, otherwise it must be the first child of the Envelope.
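The structure described above can be sketched in a few lines of Java that assemble a request by hand. This is only an illustration, not the output of Apache SOAP: the class SoapRequestSketch is hypothetical, the choice of five header lines is a guess at what the tunneler shows, the argument names are borrowed from the getRate signature, the argument values are invented, and the 1999 XML Schema namespace URIs are an assumption about tools of this generation.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SoapRequestSketch {

    // The envelope and encoding URIs come from SOAP 1.1; the 1999 XML Schema
    // URIs are an assumption and may differ between implementations.
    static final String ENV = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String ENC = "http://schemas.xmlsoap.org/soap/encoding/";
    static final String XSI = "http://www.w3.org/1999/XMLSchema-instance";
    static final String XSD = "http://www.w3.org/1999/XMLSchema";

    // Escapes the characters that are not allowed in XML element content.
    public static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds Envelope > Body > ns1:method, sending every argument as a
    // typed xsd:string child element. No Header is emitted.
    public static String buildEnvelope(String targetUri, String method,
                                       Map<String, String> args) {
        StringBuilder xml = new StringBuilder();
        xml.append("<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"").append(ENV).append("\"")
           .append(" xmlns:xsi=\"").append(XSI).append("\"")
           .append(" xmlns:xsd=\"").append(XSD).append("\">\n");
        xml.append("  <SOAP-ENV:Body>\n");
        xml.append("    <ns1:").append(method)
           .append(" xmlns:ns1=\"").append(targetUri).append("\"")
           .append(" SOAP-ENV:encodingStyle=\"").append(ENC).append("\">\n");
        for (Map.Entry<String, String> arg : args.entrySet()) {
            xml.append("      <").append(arg.getKey()).append(" xsi:type=\"xsd:string\">")
               .append(escape(arg.getValue()))
               .append("</").append(arg.getKey()).append(">\n");
        }
        xml.append("    </ns1:").append(method).append(">\n");
        xml.append("  </SOAP-ENV:Body>\n");
        xml.append("</SOAP-ENV:Envelope>\n");
        return xml.toString();
    }

    // Prefixes the envelope with an HTTP POST header block.
    public static String buildHttpRequest(String host, String path, String envelope) {
        int length = envelope.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: text/xml; charset=utf-8\r\n"
             + "Content-Length: " + length + "\r\n"
             + "SOAPAction: \"\"\r\n"
             + "\r\n"
             + envelope;
    }

    public static void main(String[] unused) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("country1", "USA");    // hypothetical argument values
        args.put("country2", "japan");
        String envelope = buildEnvelope("urn:demo1:exchange", "getRate", args);
        System.out.print(buildHttpRequest("localhost:8070", "/soap/servlet/rpcrouter", envelope));
    }
}
```

Because the payload is plain text, the same envelope string could be handed to a different transport unchanged; only buildHttpRequest is specific to HTTP.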
When using SOAP as a remote procedure call (RPC) system, the SOAP parameters can be typed or untyped. The current version of Apache SOAP only accepts typed arguments, although there is a version in the works that will properly allow untyped arguments as well. The default SOAP encoding scheme uses the xsi:type attribute to indicate an XSD type. XSD defines several basic types, including int, byte, short, boolean, string, float, double, date, time and URL. It also specifies a format for sending arrays and blocks of opaque data.
Because SOAP is intended to be platform and language neutral, XSD does not define formats for encoding objects or structures unique to a single language. Later in this article, I'll show you how to send Java objects between machines that are both running Apache SOAP 2.0.
The SOAP/HTTP message is typically accepted by a servlet running on a Web server. In this example, Apache SOAP 2.0 installs a servlet into Tomcat that accepts HTTP requests on soap/servlet/rpcrouter. When the servlet gets a request, it first checks that the request has a SOAPAction field; if it does, the servlet forwards the request to the Apache SOAP engine. The engine then parses the XML payload and uses the target Web service address to perform a lookup in its local registry (populated at startup when it reads the DeployedServices.ds file). It then uses reflection to locate and invoke the specified method on the service and get the result.
The next section describes the format of the SOAP/HTTP response that is used to return the result back to the client.
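The lookup-then-reflect step can be sketched with nothing more than a map and java.lang.reflect. The following is a simplified model, not the Apache SOAP engine: DispatchSketch, deploy, and dispatch are hypothetical names, the registry is filled in code rather than read from DeployedServices.ds, and methods are matched by name and argument count only.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class DispatchSketch {

    // Stand-in for the Exchange service from the previous installment.
    public static class Exchange {
        public float getRate(String country1, String country2) {
            return 144.52F; // always return the same value for now
        }
    }

    // Local registry: target address (URN) -> deployed service object.
    private final Map<String, Object> registry = new HashMap<>();

    public void deploy(String urn, Object service) {
        registry.put(urn, service);
    }

    // Looks up the service by its target address, then uses reflection to
    // find a public method with a matching name and argument count.
    public Object dispatch(String urn, String methodName, Object... args) {
        Object service = registry.get(urn);
        if (service == null) {
            throw new IllegalArgumentException("no service deployed at " + urn);
        }
        for (Method method : service.getClass().getMethods()) {
            if (method.getName().equals(methodName)
                    && method.getParameterCount() == args.length) {
                try {
                    return method.invoke(service, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("invocation failed", e);
                }
            }
        }
        throw new IllegalArgumentException("no method " + methodName + " at " + urn);
    }

    public static void main(String[] args) {
        DispatchSketch engine = new DispatchSketch();
        engine.deploy("urn:demo1:exchange", new Exchange());
        Object rate = engine.dispatch("urn:demo1:exchange", "getRate", "USA", "japan");
        System.out.println("getRate returned " + rate);
    }
}
```

A real engine would also convert the typed XML arguments into Java values before the call and encode the result afterwards; those steps are left out here.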
Anatomy of a SOAP Response
A SOAP/HTTP response (see Listing 2) is returned as an XML document within a standard HTTP reply whose content type is set to text/xml. The XML document is structured just like the request except that the Body contains the encoded method result. The namespace of the result is the original target object URI and the base name is derived from the name of the method that was invoked (by convention, the method name with Response appended). The XSI/XSD tagging scheme is optionally used to denote the type of the result (see Resources).
The SOAP standard does not specify what should be returned from a void method, although most implementations simply omit the <return> part of the Body.
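On the client side, pulling the result out of such a response takes only a few lines with the DOM parser that ships with Java. The sketch below is an assumption-laden illustration rather than Apache SOAP client code: ResponseParseSketch is a hypothetical class, the SAMPLE document is hand-written to match the description above rather than copied from a real trace, and the return value is left as text instead of being converted according to its xsi:type.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ResponseParseSketch {

    // Hand-written sample response; element names follow the description
    // in the text, and the 1999 schema URIs are an assumption.
    public static final String SAMPLE =
          "<SOAP-ENV:Envelope"
        + " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
        + " xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\""
        + " xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\">"
        + "<SOAP-ENV:Body>"
        + "<ns1:getRateResponse xmlns:ns1=\"urn:demo1:exchange\">"
        + "<return xsi:type=\"xsd:float\">144.52</return>"
        + "</ns1:getRateResponse>"
        + "</SOAP-ENV:Body>"
        + "</SOAP-ENV:Envelope>";

    // Returns the text of the first return element, or null when the Body
    // has none (as most implementations do for a void method).
    public static String parseReturn(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList returns = doc.getElementsByTagName("return");
            return returns.getLength() == 0 ? null : returns.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable SOAP response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("result = " + parseReturn(SAMPLE));
    }
}
```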
SOAP Exceptions
If an exception occurs at any time during the processing of a message, a SOAP exception is thrown. SOAP exceptions are encoded in a manner similar to regular SOAP responses, except that the Body is used to contain information about the exception. To illustrate this, edit the Exchange.java file that describes our currency exchange Web service and force it to throw an exception as in Listing 3.
Listing 3: A SOAP Exception
public class Exchange implements IExchange
{
  public float getRate( String country1, String country2 )
  {
    throw new RuntimeException( "cannot calculate rate" );
    /*
    System.out.println( "getRate( " + country1 + ", " + country2 + " )" );
    return 144.52F; // always return the same value for now
    */
  }
}
Then shut down and restart Tomcat so that it uses the new version of
the Web service. Then invoke the service using the Client program. The output from the TCP tunneling GUI is shown in Figure 3.
Figure 3: Tunneler output on SOAP Exception

The standard HTTP reply header indicates an exception by using status
code 500. The XML payload contains an Envelope and Body just like a regular
response, except that the content of the Body is a Fault structure whose
fields are defined as follows:
- faultcode is a code that indicates the type of the fault. The valid values are SOAP-ENV:Client (the message was incorrectly formed or lacked required information), SOAP-ENV:Server (the server could not process an otherwise valid message), SOAP-ENV:VersionMismatch (invalid namespace for the Envelope element) and SOAP-ENV:MustUnderstand (a Header entry marked as mandatory was not understood or obeyed).
- faultstring is a human-readable description of the fault.
- faultactor is an optional field that indicates the URI of the source of the fault.
- detail is application-specific XML that contains detailed information about the fault.
Some SOAP implementations use the detail element to encode information
about remote exceptions such as their type, data, and stack trace so that
they can be rethrown automatically on the client. This allows Remote Method Invocation (RMI)-style
remote exceptions to be implemented using SOAP. Of course, if the client
and server are not using the same SOAP implementation, the feature is automatically
disabled.
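As a rough illustration of how a client might turn a Fault back into a Java exception, the sketch below reads the faultcode and faultstring fields and wraps them in an exception object. It is a simplified stand-in for the RMI-style rethrow mentioned above: FaultSketch and SoapFaultException are hypothetical names, the sample Fault is hand-written rather than captured from Apache SOAP, and the detail element is ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FaultSketch {

    // Hypothetical client-side exception carrying the fault fields.
    public static class SoapFaultException extends RuntimeException {
        private final String faultCode;

        public SoapFaultException(String faultCode, String faultString) {
            super(faultString);
            this.faultCode = faultCode;
        }

        public String getFaultCode() {
            return faultCode;
        }
    }

    // Hand-written sample Fault; a real server's faultstring may differ.
    public static final String SAMPLE_FAULT =
          "<SOAP-ENV:Envelope"
        + " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        + "<SOAP-ENV:Body><SOAP-ENV:Fault>"
        + "<faultcode>SOAP-ENV:Server</faultcode>"
        + "<faultstring>cannot calculate rate</faultstring>"
        + "</SOAP-ENV:Fault></SOAP-ENV:Body>"
        + "</SOAP-ENV:Envelope>";

    private static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Returns an exception describing the Fault, or null if the message
    // carries no faultcode (that is, it is a regular response).
    public static SoapFaultException toException(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable SOAP message", e);
        }
        String code = firstText(doc, "faultcode");
        if (code == null) {
            return null;
        }
        return new SoapFaultException(code, firstText(doc, "faultstring"));
    }

    public static void main(String[] args) {
        SoapFaultException fault = toException(SAMPLE_FAULT);
        System.out.println(fault.getFaultCode() + ": " + fault.getMessage());
    }
}
```

An implementation that encodes the exception type in the detail element could go one step further and instantiate that class on the client, which only works when both sides agree on the encoding.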
Performance
Now that you've seen how SOAP messages are passed back and forth using HTTP and XML, it is interesting to contemplate performance issues.
CORBA, DCOM, and RMI use binary encoding for arguments and return values. In addition, they assume that both the sender and the receiver have full knowledge of the message context and do not encode any meta-information such as the names or types of the arguments. This approach results in good performance, but makes it hard for intermediaries to process messages. And since each system uses a different binary encoding, it's hard to build systems that interoperate.
Because SOAP uses XML to encode messages, it's very easy to process messages at every step of the invocation process. In addition, the ease of debugging SOAP messages is leading to a quick convergence of the various SOAP implementations, which is important because large-scale interoperability is what SOAP is all about.
On the surface, it seems that an XML-based scheme would be intrinsically slower than a binary one, but it's not as straightforward as that. First, when SOAP is used for sending messages across the Internet, the time to encode/decode the messages at each endpoint is tiny compared with the time to transfer the bytes between endpoints, so the cost of using XML in this case is not significant.
Second, when SOAP is used to send messages between endpoints in a closed environment, such as between departments within the same company, it's likely that the endpoints will be running the same implementation of SOAP. In this case, there are opportunities for optimizations that are unique to that particular implementation. For example, a SOAP client could add an HTTP header tag to a SOAP request that indicates that it supports a particular optimization. If the SOAP server also supports that optimization, it could return an HTTP header tag in the first SOAP response that tells the client that it's okay to use that optimization in subsequent communications. At that point, both the client and server could start using the optimization.
In my experiments with Apache SOAP, I normally get about 30 round trip messages per second between Java programs on the same machine. I have been evaluating another SOAP implementation that gets about 700 round trip messages per second in the same configuration, so obviously there is a lot of room for improvement.
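The header-based negotiation described in this section can be modeled in a few lines. The sketch below is purely illustrative: the header name X-Soap-Optimization and the value binary-args are invented for the example and are not part of SOAP, HTTP, or any implementation I know of, and the HTTP headers are represented as plain maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class OptimizationHandshake {

    // Hypothetical header name, invented for this sketch.
    public static final String HEADER = "X-Soap-Optimization";

    // Server side: echo the offered optimization back only if supported.
    public static Map<String, String> serverReplyHeaders(Map<String, String> requestHeaders,
                                                         Set<String> supported) {
        Map<String, String> reply = new HashMap<>();
        String offered = requestHeaders.get(HEADER);
        if (offered != null && supported.contains(offered)) {
            reply.put(HEADER, offered);
        }
        return reply;
    }

    // Client side: switch the optimization on only after the server
    // has confirmed it in a response.
    public static boolean confirmed(String offered, Map<String, String> replyHeaders) {
        return offered.equals(replyHeaders.get(HEADER));
    }

    public static void main(String[] args) {
        Map<String, String> request = Collections.singletonMap(HEADER, "binary-args");
        Map<String, String> fromPeer =
                serverReplyHeaders(request, Collections.singleton("binary-args"));
        Map<String, String> fromOther =
                serverReplyHeaders(request, Collections.<String>emptySet());
        System.out.println("same implementation: " + confirmed("binary-args", fromPeer));
        System.out.println("other implementation: " + confirmed("binary-args", fromOther));
    }
}
```

A server that does not recognize the header simply never echoes it, so the client keeps sending standard SOAP and interoperability is preserved.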
Passing objects
Once the initial excitement of building and deploying a simple service has worn off, many developers wish to start sending objects between clients and servers. There are at least two approaches for performing this task.
If you can guarantee that both the client and the server are configured with the same proprietary extensions, then you could pass objects between the machines using whatever approach you wanted. For example, if the client and server are both running Java, they could agree to send objects using Java serialization.
On the other hand, if you want your system to work with any combination of client and server, start off by specifying XML schemas for the objects that you wish to send. Then ensure that both client and server are configured to recognize XML documents conforming to these schemas and can convert them to or from objects as necessary. For example, if a C++ client wanted to send an object to a Java server, the client should be able to convert C++ objects to or from the schema, and the server should be able to convert Java objects to or from the same schema.
Apache SOAP 2.0 includes a flexible scheme for passing objects between SOAP engines by including a special mapping registry. You can register custom serializers/deserializers that are used to encode/decode data. The current documentation is pretty sparse on how to create your own serializers, but fortunately a useful default Java BeanSerializer is included that can serialize/deserialize any class conforming to the JavaBean specification: it must include a public default constructor and declare public set/get methods for all its attributes.
The following program illustrates how you can send and receive objects. A Purchasing Web service is deployed on the Tomcat server and then the Client sends an Invoice object by value. The object is displayed when it arrives on the server and then redisplayed when it is returned back to the client. You can also pass the object by reference in SOAP, as I will explain in a future installment of this column.
The sample source code for this Purchasing Web service is given in Listings 4 through 7.
Listing 4. Code for IPurchasing.java
public interface IPurchasing
{
  Invoice receive( Invoice invoice );
}
Listing 5. Code for Purchasing.java
public class Purchasing implements IPurchasing
{
  public Invoice receive( Invoice invoice )
  {
    System.out.println( "got invoice " + invoice );
    return invoice;
  }
}
Listing 6. Code for Invoice.java
public class Invoice
{
  String name;
  int amount;

  public Invoice()
  {
  }

  public Invoice( String name, int amount )
  {
    this.name = name;
    this.amount = amount;
  }

  public String toString()
  {
    return "Invoice( " + name + ", " + amount + " )";
  }

  public void setName( String name )
  {
    this.name = name;
  }

  public String getName()
  {
    return name;
  }

  public void setAmount( int amount )
  {
    this.amount = amount;
  }

  public int getAmount()
  {
    return amount;
  }
}
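To give a feel for what a bean serializer has to do with a class like Invoice, here is a small sketch that walks a JavaBean's get/set pairs with java.beans.Introspector and writes each property as a typed child element. It is not the BeanSerializer source and makes no claim to match its exact output; BeanXmlSketch, toXml, and xsdType are hypothetical names, only int and String properties are handled, and the nested Invoice class is a copy made so the example compiles on its own.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class BeanXmlSketch {

    // Self-contained copy of the Invoice bean so the sketch stands alone.
    public static class Invoice {
        private String name;
        private int amount;

        public Invoice() { }

        public Invoice(String name, int amount) {
            this.name = name;
            this.amount = amount;
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAmount() { return amount; }
        public void setAmount(int amount) { this.amount = amount; }
    }

    // Maps the two Java types used here to XSD type names.
    static String xsdType(Class<?> type) {
        if (type == int.class || type == Integer.class) return "xsd:int";
        if (type == String.class) return "xsd:string";
        throw new IllegalArgumentException("unsupported property type " + type);
    }

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Writes every readable and writable property as a typed child element.
    public static String toXml(String elementName, String typeName, Object bean) {
        StringBuilder xml = new StringBuilder();
        xml.append("<").append(elementName)
           .append(" xsi:type=\"").append(typeName).append("\">");
        try {
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            for (PropertyDescriptor property : info.getPropertyDescriptors()) {
                if (property.getReadMethod() == null || property.getWriteMethod() == null) {
                    continue; // not a full get/set pair, so skip it
                }
                Object value = property.getReadMethod().invoke(bean);
                xml.append("<").append(property.getName())
                   .append(" xsi:type=\"").append(xsdType(property.getPropertyType())).append("\">")
                   .append(escape(String.valueOf(value)))
                   .append("</").append(property.getName()).append(">");
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not read bean properties", e);
        }
        xml.append("</").append(elementName).append(">");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("invoice", "ns1:Invoice", new Invoice("widgets", 100)));
    }
}
```

The requirement for a public default constructor comes from the opposite direction: a deserializer has to create an empty object first and then call the set methods with the decoded values.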
Listing 7. Code for Client2.java
To run the program, perform the following steps:
- Copy the following Java files into a new directory called \demo2 and add \demo2 to your CLASSPATH.
- Compile the source files.
- Run the TCP tunneling GUI as before, listening to port 8070 and routing to/from port 8080.
- Start Tomcat in the \demo2 directory.
- Open a browser and enter the URL http://localhost:8080/soap
- Use the admin interface to deploy a Purchasing service. Enter the following values:
  - ID=urn:demo2:purchasing
  - Scope=Request
  - Methods=receive
  - Provider Type=Java
  - Provider Class=Purchasing
  - Static=No
  - Number of Mappings=1
  - Encoding Style=SOAP
  - Namespace URI=urn:my_encoding
  - Local Part=Invoice
  - Java Type=Invoice
  - Java To XML Serializer=org.apache.soap.encoding.soapenc.BeanSerializer
  - XML To Java Deserializer=org.apache.soap.encoding.soapenc.BeanSerializer
- Run the Client.
The Namespace URI and Local Part values are used by the Apache SOAP engine for the xsi:type attribute value. It is not particularly important what these values are, as long as they don't clash with any other mappings you use.
Assuming all goes well, you should see the results in the TCP GUI in Figure 4, and the client output in Figure 5.
Figure 4: Running the Purchasing Web service

Figure 5: The Purchasing Web service client output window

Security
Security is a complex topic, and the current efforts to secure SOAP seem to be taking a layered approach.
At the lowest level, SOAP messages can be passed over HTTPS. Since HTTPS uses SSL for its transport, this ensures that the contents of the message are encrypted to prevent snooping, and that the client and server can verify each other's identity. SOAP solutions based on SSL are available today, but are not particularly easy to install and configure.
Although HTTPS solves the problem of shielding messages from eavesdroppers, it doesn't help much with the kind of finer-grained security that is necessary to authenticate particular users of specific Web services. Many Web services will require a user to obtain some kind of user/password combination during initial registration and then use this authentication information when accessing the Web service in the future.
The UDDI Web services registries that are hosted online by IBM, Microsoft, and Ariba are examples of Web services that require a user to register before being able to use the publish service. There is no standard yet for how a Web service supports registration and authentication, but such a standard will no doubt emerge soon. Microsoft and Verisign recently announced their work on a standard called XML Key Management Specification (XKMS) that they will submit to the standards bodies for possible incorporation into SOAP and other systems (see Resources).
I intend to devote more time to the topic of security in a future installment of this column, when hopefully the security standards have progressed.
Next installment
In the next installment, I'm going to introduce WSDL (Web Services Description
Language) and UDDI (Universal Description, Discovery and Integration),
two important standards that will help move Web services technology into
the mainstream.
Resources
About the author
Graham Glass is founder, CEO, and Chief Architect of The Mind Electric, which designs, builds and licenses forward-thinking distributed computing infrastructure. He believes that the evolution of the Internet will mirror that of a biological mind, and that architectures for helping people and businesses to network effectively will provide insight into those that wire together the human brain. He can be reached at graham-glass@mindspring.com.