We live in a time that offers software developers the greatest choice ever of software development tools, application servers, and connectivity. Each choice that you make affects the scalability and reliability of your finished application, especially if you're building Web services. For example, in a recent study of SOAP encoding styles in a specific client environment, I found a 30-fold performance improvement by choosing one SOAP encoding style over the others. By understanding the performance impact of SOAP encoding styles, Web service development tools, application servers, and platforms, you can greatly improve system performance.
SOAP uses XML to marshal data that is transported to a software application. Most of the time, SOAP moves data between software objects, but the SOAP specification was intended to be useful for legacy systems as well as modern object-oriented systems. Consequently, SOAP defines more than one encoding method to convert data from a software object into XML format and back again. The SOAP-encoded data is packaged into the body of a message and sent to a host. The host then decodes the XML-formatted data back into a software object.
Since SOAP's introduction, three SOAP encoding styles have become popular and are reliably implemented across software vendors and technology providers:
- SOAP Remote Procedure Call (RPC) encoding, also known as Section 5 encoding, which is defined by the SOAP 1.1 specification
- SOAP Remote Procedure Call Literal encoding (SOAP RPC-literal), which uses RPC methods to make calls but uses a do-it-yourself XML method for marshaling the data
- SOAP document-style encoding, which is also known as message-style or document-literal encoding
There are other encoding styles, but software developers have not widely adopted them, mostly because their promoters disagree on a standard. For example, Microsoft is promoting Direct Internet Message Encapsulation (DIME) to encode binary file data, while the rest of the world is promoting SOAP with Attachments. SOAP RPC encoding, RPC-literal, and document-style SOAP encoding have emerged as the encoding styles that a software developer can count on.
Before I discuss SOAP encoding style's impact on performance, you should understand the differences between the three styles. SOAP RPC is the encoding style that offers you the most simplicity. You make a call to a remote object, passing along any necessary parameters. The SOAP stack serializes the parameters into XML, moves the data to the destination using transports such as HTTP and SMTP, receives the response, deserializes the response back into objects, and returns the results to the calling method. Whew! SOAP RPC handles all the encoding and decoding, even for very complex data types, and binds to the remote object automatically.
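To make the serialization step concrete, here is a minimal Python sketch of what an RPC-encoded request might look like on the wire. It is not taken from any particular SOAP stack: the method name `getResponse`, the parameter names `size` and `delay`, and the `urn:example-test-service` namespace are hypothetical. The point to notice is that every parameter carries its own `xsi:type` annotation, which a stack has to write on the way out and interpret on the way in.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 and XML Schema namespace URIs.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical service namespace, used only for this illustration.
SERVICE_NS = "urn:example-test-service"

# Deliberately tiny mapping from Python types to XML Schema type names.
XSD_TYPES = {int: "xsd:int", float: "xsd:double", str: "xsd:string"}


def build_rpc_encoded_request(method, params):
    """Return a SOAP 1.1 envelope with one typed element per parameter."""
    ET.register_namespace("soapenv", SOAP_ENV)
    ET.register_namespace("xsi", XSI)

    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    # The xsd prefix appears only inside attribute values, so it has to be
    # declared by hand; ElementTree does not declare it automatically.
    envelope.set("xmlns:xsd", XSD)

    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    call.set(f"{{{SOAP_ENV}}}encodingStyle", SOAP_ENC)

    for name, value in params.items():
        param = ET.SubElement(call, name)
        # Section 5 encoding: each value is annotated with its type.
        param.set(f"{{{XSI}}}type", XSD_TYPES[type(value)])
        param.text = str(value)

    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_rpc_encoded_request("getResponse", {"size": 600, "delay": 0}))
```

With only two integer parameters the type annotations look cheap; the cost becomes relevant when a response carries many nested, typed elements.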
Now, imagine that you have some data already in XML format. SOAP RPC also allows literal encoding of the XML data as a single field that is serialized and sent to the Web service host. This is what's referred to as RPC-literal encoding. Since there is only one parameter -- the XML tree -- the SOAP stack only needs to serialize one value. The SOAP stack still deals with the transport issues to get the request to the remote object. The stack binds the request to the remote object and handles the response.
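As a rough illustration of the single-parameter idea, the following Python sketch wraps an existing XML fragment in a method element without adding any per-value type annotations. The method name `submitOrder`, the `<order>` fragment, and the service namespace are all hypothetical, and real stacks differ in how they name the wrapper and its parts.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical service namespace, used only for this illustration.
SERVICE_NS = "urn:example-test-service"


def build_rpc_literal_request(method, payload_xml):
    """Wrap an already-built XML fragment as the single RPC parameter."""
    ET.register_namespace("soapenv", SOAP_ENV)

    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")

    # The XML tree travels as-is: no type attributes are added to it.
    call.append(ET.fromstring(payload_xml))

    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_rpc_literal_request(
        "submitOrder", "<order><item>book</item></order>"))
```

The call is still bound to a named method, which is what keeps this style in the RPC family.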
Lastly, in a SOAP document-style call, the SOAP stack sends an entire XML document to a server without even requiring a return value. The message can contain any sort of XML data that is appropriate to the remote service. In SOAP document-style encoding, the developer handles everything, including determining the transport (e.g., HTTP, MQ, SMTP), marshaling and unmarshaling the body of the SOAP envelope, and parsing the XML in the request and response to find the needed data.
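The "developer handles everything" part can be sketched in a few lines of Python. This is only an illustration of hand-parsing a document-style message, assuming a SOAP 1.1 envelope; the function name `extract_body_values` and the `<report>`, `<entry>`, and `<word>` elements are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def extract_body_values(envelope_xml, wanted_tag):
    """Return the text of every element named wanted_tag inside the Body."""
    root = ET.fromstring(envelope_xml)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None:
        raise ValueError("no SOAP Body element found")
    # The developer, not the SOAP stack, decides which elements matter.
    return [element.text for element in body.iter(wanted_tag)]


if __name__ == "__main__":
    sample = (
        '<soapenv:Envelope xmlns:soapenv="' + SOAP_ENV + '">'
        "<soapenv:Body><report>"
        "<entry><word>alpha</word></entry>"
        "<entry><word>beta</word></entry>"
        "</report></soapenv:Body></soapenv:Envelope>"
    )
    print(extract_body_values(sample, "word"))
```

Because the code knows exactly which elements it wants, it can skip everything else in the document, which is the efficiency argument for this style.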
The three encoding systems are compared in Figure 1.
Figure 1. Which encoding style is right for you?
SOAP RPC encoding is easiest for the software developer; however, all that ease comes with a scalability and performance penalty. In SOAP RPC-literal encoding, you are more involved with handling XML parsing, but the SOAP stack still has some overhead to deal with. SOAP document-literal encoding is most difficult for the software developer, but consequently requires little SOAP overhead.
Why is SOAP RPC easier? With this encoding style, you only need to define the public object method in your code once; the SOAP stack unmarshals the request parameters into objects and passes them directly into the method call of your object. Otherwise, you are stuck with the task of parsing through the XML tree to find the data elements you need before you get to make the call to the public method.
There is an argument for parsing the XML data yourself: since you know the data in the XML tree best, your code will parse that data more efficiently than generalized SOAP stack code. The scalability and performance measurements later in this article bear this out.
But before going into that further, let's look at how enterprise information systems managers are coming to grips with SOAP encoding styles and scalability. (For more details on the different encoding styles, see Resources.)
Elsevier is the leading research content publisher for the science, technology, and medical industries. (See Resources for more information.) Elsevier's next-generation content publishing platform uses SOAP to build application programming interfaces. Elsevier's information managers needed to know if their choices of SOAP encoding style would scale and perform to handle millions of transactions every day. Their decisions would affect how Elsevier would invest capital in new infrastructure. Over time, these managers will need to know how new releases of their own software, new releases of application server software, and platform changes will affect scalability and performance.
Elsevier asked me to build a new test environment based on my free, open-source TestMaker utility (see Resources) to answer these scalability and performance questions. The Elsevier test environment that I delivered, illustrated in Figure 2, includes a Test Web Service (TWS) that handles RPC, RPC-literal, and document-style SOAP messages and installs on a variety of application servers. The environment is completed with a set of intelligent test agents to check TWS for scalability and performance.
Figure 2. The Elsevier test environment
TestMaker checks Web services for scalability, performance, and reliability. Software developers, QA analysts, and IT managers use TestMaker to build intelligent test agents that implement archetypal user behavior. The agents drive a Web service using native protocols (HTTP, HTTPS, SOAP, XML-RPC, SMTP, POP3, IMAP) just as a real user would. Running multiple intelligent test agents concurrently creates near-production-level loads to check a system's scalability and performance.
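The idea of running agents concurrently can be sketched with nothing but the Python standard library. This is not TestMaker's API; `run_agent`, `run_load`, and the `send_request` callable are hypothetical names, and a real agent would make a network call where this sketch uses a stand-in function.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_agent(send_request, requests_per_agent):
    """Issue requests one after another; record (seconds, succeeded) pairs."""
    results = []
    for _ in range(requests_per_agent):
        start = time.perf_counter()
        try:
            send_request()
            succeeded = True
        except Exception:
            # Any exception stands in for a transport or SOAP fault here.
            succeeded = False
        results.append((time.perf_counter() - start, succeeded))
    return results


def run_load(send_request, agents, requests_per_agent):
    """Run several agents at once and collect all of their results."""
    with ThreadPoolExecutor(max_workers=agents) as pool:
        futures = [
            pool.submit(run_agent, send_request, requests_per_agent)
            for _ in range(agents)
        ]
        return [result for future in futures for result in future.result()]


if __name__ == "__main__":
    def fake_request():
        time.sleep(0.01)  # stand-in for a real call to a Web service

    outcome = run_load(fake_request, agents=4, requests_per_agent=5)
    print(len(outcome), "requests,", sum(ok for _, ok in outcome), "succeeded")
```

Varying `agents` while holding the request function fixed is the simplest way to see how throughput changes with concurrency.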
In addition to checking SOAP encoding scalability, the Elsevier test environment provides a benchmark specific to Elsevier's systems to show a performance comparison for a variety of application servers and platforms. For example, TWS is currently implemented to run on IBM WebSphere Studio, BEA WebLogic, and the SunONE Application Server. I am confident that ports to The Mind Electric GLUE, Apache Axis, Systinet WASP, and other application servers would be straightforward.
I built the Elsevier test environment by customizing TestMaker to support SOAP RPC, SOAP RPC-literal, and SOAP document-style requests and by implementing TWS to respond to requests in these encoding styles. The request to TWS contains two parameters: the first defines the size of the response and the second defines a delay value before responding. TWS responds by creating a response document containing random gibberish words that appear in five response elements -- each element has one child element. A TestMaker test agent uses the Apache SOAP library to make requests to TWS. The test agent varies the number of concurrent requests to TWS and the payload size of the response. The test agent logs the results to a delimited log file, which is subsequently summarized by a tally script. The tally script determines the number of transactions per second (TPS) achieved in the test from the count and duration of successful transactions. Success is defined as the absence of transport or SOAP faults.
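A tally step of this kind can be sketched in a few lines of Python. The log layout below (start time in milliseconds, duration in milliseconds, status) is an assumption made for illustration, not the actual TestMaker log format, and the formula used here, successful transactions divided by elapsed wall-clock seconds, is one plausible reading of the TPS calculation rather than the original script's.

```python
import csv
import io


def tally_tps(log_text, delimiter=","):
    """Compute successful transactions per second from a delimited log.

    Assumed row layout (hypothetical): start_ms, duration_ms, status,
    where a status of "ok" means no transport or SOAP fault occurred.
    """
    successes = 0
    first_start_ms = None
    last_end_ms = None

    for row in csv.reader(io.StringIO(log_text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        start_ms = int(row[0])
        end_ms = start_ms + int(row[1])
        status = row[2].strip()

        if first_start_ms is None or start_ms < first_start_ms:
            first_start_ms = start_ms
        if last_end_ms is None or end_ms > last_end_ms:
            last_end_ms = end_ms
        if status == "ok":
            successes += 1

    if first_start_ms is None or last_end_ms == first_start_ms:
        return 0.0  # empty log or zero elapsed time
    elapsed_seconds = (last_end_ms - first_start_ms) / 1000.0
    return successes / elapsed_seconds


if __name__ == "__main__":
    log = "0,500,ok\n200,800,ok\n400,600,fault\n"
    print(tally_tps(log))
```

Faulted transactions still extend the elapsed time in this sketch, so a run with many faults reports a lower TPS rather than hiding them.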
With Sun Microsystems support, I ran the tests on Sun Solaris E4500 servers with 6 CPUs and 4 GB of RAM. The TWS uses the SOAP stack provided by the underlying application server. For example, WebSphere Studio provides Apache SOAP, BEA WebLogic provides its own implementation, which uses the JAX-RPC APIs, and the SunONE Application Server uses the Java 1.4 JAX-RPC library. On the client side, TestMaker uses the Apache SOAP library.
In the Elsevier project, I found that a developer's choice of encoding style largely determines the scalability and performance of a Web service. The SOAP implementations universally showed scalability problems when using SOAP RPC encoding, especially as payload sizes increased, as illustrated in Figure 3.
Figure 3. Scalability problems become noticeable with increased payload sizes
As Figure 3 shows, the test agent recorded 294 transactions per second when making requests where the response SOAP envelope measured 600 bytes of SOAP RPC-encoded data. As the test agent increased the response size, the transactions per second plummeted. When the response measured 96,000 bytes of SOAP RPC-encoded data, the agent measured only 9.5 transactions per second.
When the test environment used SOAP document-style encoding, the performance fared much better, as you can see in Figure 4.
Figure 4. Document-style encoding: Performance stays relatively stable with increased payload sizes
With 600 bytes of document-encoded data, the test agent measured 469 TPS. Recall that the SOAP RPC-encoded requests gave 294 TPS for requests of the same size. Additionally, when the test agent increased the response size, the TPS values did not degrade significantly when the test environment used document-style encoded responses.
When the test environment used SOAP RPC-literal encoding, I found an efficient middle ground, as you can see in Figure 5.
Figure 5. SOAP RPC-literal provides the performance benefits of SOAP document-style encoding with a little more work required to parse through the XML data
With 600 bytes of SOAP RPC-literal encoded data, the test agent measured 422 TPS. That is nearly the performance recorded for SOAP document-style requests. SOAP RPC-literal encoding also did not show the steep drop in TPS that SOAP RPC encoding showed as payload size increased.
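To put the reported figures side by side, this short Python sketch computes the ratios between them. It uses only the TPS values quoted in this article; the variable names are mine.

```python
# TPS figures quoted in the text, keyed by response payload size in bytes.
rpc_encoded_tps = {600: 294.0, 96000: 9.5}
document_style_tps_600 = 469.0
rpc_literal_tps_600 = 422.0

# How much SOAP RPC-encoded throughput fell between the two payload sizes.
rpc_drop = rpc_encoded_tps[600] / rpc_encoded_tps[96000]

# Document-style versus RPC-encoded at the same 600-byte payload.
doc_vs_rpc = document_style_tps_600 / rpc_encoded_tps[600]

# RPC-literal as a fraction of document-style at 600 bytes.
literal_vs_doc = rpc_literal_tps_600 / document_style_tps_600

print(f"RPC-encoded, 600 vs. 96,000 bytes: {rpc_drop:.1f}x slower")
print(f"Document-style vs. RPC-encoded at 600 bytes: {doc_vs_rpc:.2f}x")
print(f"RPC-literal relative to document-style: {literal_vs_doc:.0%}")
```

The first ratio comes out at roughly 31, which is consistent with the 30-fold difference mentioned at the start of this article; at 600 bytes, document-style is about 1.6 times faster than RPC encoding, and RPC-literal reaches about 90 percent of document-style throughput.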
You may be wondering how the various application servers compare in scalability and performance. In my experience every production environment is unique. So, rather than show you a comparison here, I urge you to download the test environment yourself and try it in your own production environment. I've made a generalized version of the Elsevier test environment available for free download for your immediate use (see Resources).
While the SOAP encoding styles provide a good range of power and flexibility, they also introduce interoperability problems. Most of the SOAP tools on the Java platform default to the SOAP RPC encoding style. For example, with IBM WebSphere Studio Application Developer, the default encoding style is set to SOAP RPC. On the other side of the divide, .NET development tools implement document-style SOAP calls by default. This is akin to watching two ships pass in the night. Both can be made to interoperate, but developers need to be wise to the different encoding styles to avoid problems.
In their attempt to make software developers' lives easier, these tools may be making decisions for you that affect scalability. This was highlighted when Microsoft and Sun debated the relative virtues of J2EE and .NET at a recent event that the Software Development Forum hosted in Silicon Valley. Microsoft made the argument that it serves developers best by being the sole supplier of a complete solution. On the other end of the spectrum, Sun posited that developers should have a choice of tools that they can assemble into a solution. This top-down versus bottom-up argument permeates both companies' development tools. For example, representatives from Sun and Microsoft were asked to explain why developers would choose SOAP RPC encoding over SOAP document-style encoding. Microsoft's reps gave a somewhat technical answer, but conceded that they thought the issue was moot, since developers should rely on their development tools to make decisions about encoding styles.
Software developers serve themselves best by making informed decisions on how helpful their development tools and environments should be. Understanding each tool's handling of SOAP encoding styles is an important factor if you plan to deliver reliable, high-performance software projects.
Another place where developers should watch for scalability impact is in SOAP stacks in application servers. While building the Elsevier test environment, I noticed that each Web service platform had its own implementation of a SOAP stack -- some even shipped with more than one stack. For example, BEA WebLogic Server comes with one SOAP stack that implements the JAX-RPC API and another that implements the Java Web Services (JWS) APIs. IBM WebSphere Studio V4 comes with the DOM-based Apache SOAP library, but WebSphere Studio V5 has the SAX-based Apache Axis library. I would love the opportunity to test the differences between all of these in some future project.
I also noticed that, while WSDL did a fair job of describing the interface to a SOAP service, the WSDL was often incomplete. I found that a network monitor was necessary to actually see what values were moved over the HTTP transport.
One thing that struck me while building the test environment for Elsevier was how quickly SOAP implementations change. In 2002, BEA and IBM announced major new versions of their Web service application servers. Consequently, the reliability and performance of SOAP application servers are a moving target. This further reinforces the chief lesson learned here: that you should use a dynamic test environment to check reliability and performance now -- and keep it handy to test new implementations as they become available.
- For more details on document-style encoding, read "Reap the benefits of document style Web services," by James McCarthy (developerWorks, June 2002).
- Gavin Bong gets into the details of RPC encoding in his series, "Apache SOAP type mapping" (developerWorks, April 2002).
- Find out more about Elsevier.
- Learn about the WebSphere Studio Application Server.
Frank Cohen is the go-to guy for enterprises that need to solve scalability and performance problems in information systems, especially Web services. Frank maintains TestMaker, a popular, free, and open-source utility that checks systems for scalability and reliability. Contact Frank at email@example.com.