Three easy ways to make your life miserable
As a consultant with IBM® Software Services and Support, I see customers use XML and Web services in many different ways. I’ve seen many users succeed with Web services, trimming costs and improving reuse, and I’ve also seen users dig themselves into a hole by using Web services and XML in inappropriate ways. Reflecting on these experiences, I have identified three situations that tend to cause developers great pain and suffering when using XML and Web services. I will take you through these three common pain points, explain why they occur, and examine some alternative approaches that just might solve the underlying problems. Let’s begin.
"My message ate my server!"
Sending extremely large messages, or -- even worse -- extremely large opaque (binary) messages over Web services transports.
Why does it happen?
I’ve seen this problem a number of times, and it always has the same set of symptoms. It begins with a user either calling IBM Support for one of the WebSphere products because they are experiencing out-of-memory errors, or engaging one of our consultants to help them solve "performance problems" with their Web services architecture. In this situation, we might find that the server processing the XML has extremely high processing loads, with CPU utilization hovering close to 100%. We might also see low throughput and high network latency.
When we begin to examine the structure of the Web services messages, we discover that the cause of these problems is that the messages are very large, often 50 megabytes or more. Looking deeper into the services, we usually find a common thread: these messages often contain large blocks of binary data, base64-encoded and embedded in the main body of the XML message.
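Base64 represents every 3 bytes of binary data as 4 ASCII characters, so an embedded payload grows by about a third before any XML markup is added. The minimal sketch below, which uses only the JDK's `java.util.Base64`, shows that arithmetic; the 3 MB and 50 MB sizes are arbitrary examples, not measurements from the cases described here.

```java
import java.util.Base64;

// Shows how much base64 inflates binary data embedded in an XML body.
class Base64Overhead {

    // Length of the padded output of the standard (non-MIME) base64 encoder:
    // every group of 3 input bytes (or final partial group) becomes 4 characters.
    static long encodedLength(long binaryBytes) {
        return 4 * ((binaryBytes + 2) / 3);
    }

    public static void main(String[] args) {
        // A 3 MB all-zero array stands in for a binary document.
        byte[] payload = new byte[3_000_000];
        String encoded = Base64.getEncoder().encodeToString(payload);

        System.out.println("binary bytes: " + payload.length);
        System.out.println("base64 chars: " + encoded.length());
        System.out.println("formula:      " + encodedLength(payload.length));
        System.out.println("50 MB binary: " + encodedLength(50_000_000L) + " chars");
    }
}
```

So a 50 MB binary file becomes roughly 66.7 million characters of XML text, all of which the receiving side has to parse.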
Now, why does this occur? Well, this situation often arises when a developer treats Web services as a one-for-one replacement for EDI or FTP. The underlying problem, however, is that the developer does not understand the limitations of the technology. XML processing is very useful for a lot of problems, but you have to realize that messages are parsed, and in most WebSphere products that means much or all of the message will be held in memory.
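The minimal JAXP sketch below illustrates the general cost. It is not how any particular WebSphere runtime is implemented; it simply uses the JDK's default DOM parser (`DocumentBuilderFactory`), and the `order`/`scan` message layout is hypothetical. Nothing in the application can touch the payload until `parse` has built the whole tree in memory, and decoding the base64 text then allocates the binary data yet again.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Illustrates that a DOM (JAXP) parse materializes the whole message,
// including any embedded base64 payload, before the application sees it.
class DomInMemory {

    // Hypothetical message layout: a small header plus a base64 "scan" element.
    // Base64 characters never need XML escaping, so plain concatenation is safe here.
    static String buildMessage(String orderId, byte[] binary) {
        String encoded = Base64.getEncoder().encodeToString(binary);
        return "<order><id>" + orderId + "</id><scan>" + encoded + "</scan></order>";
    }

    static byte[] extractScan(String xml) {
        try {
            // The entire document, all base64 text included, is built
            // as an in-memory tree by this call.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Only now can the application read the text and decode it,
            // which allocates the binary data a second time.
            String encoded = doc.getElementsByTagName("scan").item(0).getTextContent();
            return Base64.getDecoder().decode(encoded);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse message", e);
        }
    }

    public static void main(String[] args) {
        byte[] binary = new byte[1_000_000];
        String xml = buildMessage("42", binary);
        byte[] scan = extractScan(xml);
        System.out.println("message characters: " + xml.length());
        System.out.println("decoded bytes:      " + scan.length);
    }
}
```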
How do you fix it?
When you encounter this situation, there are two options you can try to improve it:
- Don’t send redundant information. When sending binary data, you might find that the messages are highly repetitive. If so, consider enabling compression at the HTTP level (for example, gzip content encoding) to reduce the number of bytes on the wire and therefore your network latency. While this won’t help with your processing load, it might help mitigate one of the problems.
- Don’t embed the binary information in the body of the XML message at all. This is the better solution, and there are a few different ways you can achieve it. For example, you could instead use SOAP with Attachments or the Message Transmission Optimization Mechanism (MTOM), which carry the binary data outside the XML body and so bypass the parsing overhead. However, the binary data still travels over the same HTTP connection, so this does little for network latency.
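For the compression option, one rough way to judge whether HTTP-level gzip is worth enabling is to gzip a representative payload and compare sizes. The minimal sketch below uses the JDK's `java.util.zip.GZIPOutputStream`; the two sample payloads are synthetic assumptions, one highly repetitive and one random (typical of content that is already compressed or encrypted).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Estimates how much "Content-Encoding: gzip" would shrink a payload on the wire.
class CompressionCheck {

    static int gzippedSize(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream (end of try block) writes the gzip trailer into 'out'.
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    public static void main(String[] args) {
        // Highly repetitive "binary" data: a short byte pattern repeated.
        byte[] repetitive = new byte[1_000_000];
        for (int i = 0; i < repetitive.length; i++) {
            repetitive[i] = (byte) (i % 16);
        }
        // Random bytes barely compress at all.
        byte[] random = new byte[repetitive.length];
        new Random(42).nextBytes(random);

        System.out.printf("repetitive: %d -> %d bytes%n", repetitive.length, gzippedSize(repetitive));
        System.out.printf("random:     %d -> %d bytes%n", random.length, gzippedSize(random));
    }
}
```

Compression changes only what crosses the network: the receiving server still has to decompress and parse the full message, which is why this option does nothing for processing load.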
An even better option might be not to send the large binary blob using SOAP at all. Instead, transmit it out of band, either through a managed file transfer system such as IBM WebSphere MQ File Transfer Edition or by applying the Claim Check pattern, so that large binary files never travel over SOAP and HTTP.
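The Claim Check pattern (Hohpe and Woolf, Enterprise Integration Patterns) stores the bulky data in a shared store and sends only a small ticket that the receiver can redeem later. The sketch below is a minimal illustration under stated assumptions: `PayloadStore` is a hypothetical in-memory stand-in for a real shared store (such as a managed file transfer destination), and `OrderMessage` is a hypothetical message type.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Minimal Claim Check sketch. The in-memory store is a hypothetical placeholder.
class ClaimCheckDemo {

    // The only thing that travels over SOAP/HTTP: business data plus a ticket.
    static final class OrderMessage {
        private final String orderId;
        private final String claimCheck;

        OrderMessage(String orderId, String claimCheck) {
            this.orderId = orderId;
            this.claimCheck = claimCheck;
        }

        String orderId() { return orderId; }
        String claimCheck() { return claimCheck; }

        @Override
        public String toString() {
            return "OrderMessage[orderId=" + orderId + ", claimCheck=" + claimCheck + "]";
        }
    }

    static final class PayloadStore {
        private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

        // Stores the payload and returns the claim check (ticket) for it.
        String checkIn(byte[] payload) {
            String ticket = UUID.randomUUID().toString();
            blobs.put(ticket, payload);
            return ticket;
        }

        // Redeems a ticket; this sketch deletes the payload once it is retrieved.
        Optional<byte[]> checkOut(String ticket) {
            return Optional.ofNullable(blobs.remove(ticket));
        }
    }

    // Sender: park the large payload in the store and send only the ticket.
    static OrderMessage send(PayloadStore store, String orderId, byte[] largePayload) {
        return new OrderMessage(orderId, store.checkIn(largePayload));
    }

    // Receiver: fetch the payload only when it is actually needed.
    static byte[] retrieve(PayloadStore store, OrderMessage message) {
        return store.checkOut(message.claimCheck())
                .orElseThrow(() -> new IllegalStateException("unknown claim check"));
    }

    public static void main(String[] args) {
        PayloadStore store = new PayloadStore();
        OrderMessage msg = send(store, "order-42", new byte[5_000_000]);
        System.out.println("sent: " + msg);
        System.out.println("retrieved bytes: " + retrieve(store, msg).length);
    }
}
```

Deleting on retrieval is only one design choice; a real store might instead expire payloads after a retention period so that several receivers can redeem the same ticket.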
"Pardon me, your data is showing."
Using Web services in the wrong place in an architecture: exposing low-level data access through Web services.
Why does it happen?
On occasion, we are brought in to deal with unspecified “performance problems” in an SOA architecture, or to consult on a Web services architecture that is not providing the reuse and maintenance benefits promised by SOA. When we begin digging, we often find that Web services are indeed being used, but at a very, very low level, often with each Web service corresponding to a single SQL statement. In the worst cases, the Web service doesn’t even reformat the data into a meaningful domain schema; it presents the data “raw,” exactly as it comes from the database. Why does this happen? In my opinion, it stems from a misunderstanding of SOA architectural principles.
One of the key principles of a good SOA architecture is that your services should be at a high enough level to be reusable. The Service-Oriented Modeling and Architecture (SOMA) method defines what it calls a Services Litmus Test, which can be used to assess each service against criteria such as:
- Business alignment
- Composability
- Externalized service description
- Redundancy elimination
It’s on the first two of these, business alignment and composability, that such low-level data services often fail. If a service corresponds to a single query, it usually does not align well with a particular business function; it’s simply too fine-grained to represent the business function in its entirety. Likewise, such low-level services often fail the composability test, since they encode low-level implementation details (like the structure of the database) into the description of the service.
How do you fix it?
It’s very common to see this type of service emerge in designs where the services are derived “bottom up” from existing code; often, developers will look at their existing architectural diagrams and decide to turn each layer in the architecture -- including the persistence layer -- into a set of services.
Instead, it would be better to apply coarse-grained Web services in the right place in an SOA architecture. Again, looking at a standard layered model of an architecture, there is often a well-defined layer that already encapsulates the business logic of a system. That layer can be wrapped using the Remote Facade pattern to expose model-based services in an appropriate manner.
Having identified more coarse-grained Web services, you can then develop the WSDL top-down to prevent data and programming-language specifics from leaking into the schema. Realistically, some mapping will likely be required between the reusable, “good” WSDL/schema and the backend service interface.
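The contrast between query-per-service access and a Remote Facade (Fowler, Patterns of Enterprise Application Architecture) can be sketched in plain Java, leaving out the Web services binding itself. Every type and method name below (`OrderDao`, `OrderHistoryFacade`, `getOrderSummary`, and so on) is hypothetical, and the in-memory DAO stands in for a real database. The facade offers one business-level operation, composes the fine-grained calls behind it, and maps database-shaped rows onto a domain type of the kind a top-down schema would define.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Query-per-service data access versus a coarse-grained Remote Facade (all names hypothetical).
class FacadeDemo {

    // Database-shaped rows: exposing these directly as services would leak
    // table and column names (CUST_NM, TOT_AMT_CENTS) into the schema.
    static final class CustomerRow {
        final int custId;
        final String custNm;
        CustomerRow(int custId, String custNm) { this.custId = custId; this.custNm = custNm; }
    }

    static final class OrderRow {
        final int ordId;
        final int custId;
        final long totAmtCents;
        OrderRow(int ordId, int custId, long totAmtCents) {
            this.ordId = ordId; this.custId = custId; this.totAmtCents = totAmtCents;
        }
    }

    // Persistence layer: roughly one method per SQL statement.
    interface OrderDao {
        CustomerRow findCustomer(int custId);
        List<OrderRow> findOrdersByCustomer(int custId);
    }

    // Domain type defined by the top-down schema, not by the database.
    static final class OrderSummary {
        final String customerName;
        final int orderCount;
        final BigDecimal totalSpent;
        OrderSummary(String customerName, int orderCount, BigDecimal totalSpent) {
            this.customerName = customerName; this.orderCount = orderCount; this.totalSpent = totalSpent;
        }
    }

    // Remote Facade: one business-level call that composes the fine-grained
    // calls and performs the mapping from rows to the domain type.
    static final class OrderHistoryFacade {
        private final OrderDao dao;
        OrderHistoryFacade(OrderDao dao) { this.dao = dao; }

        OrderSummary getOrderSummary(int customerId) {
            CustomerRow customer = dao.findCustomer(customerId);
            List<OrderRow> orders = dao.findOrdersByCustomer(customerId);
            long totalCents = orders.stream().mapToLong(o -> o.totAmtCents).sum();
            return new OrderSummary(customer.custNm, orders.size(), BigDecimal.valueOf(totalCents, 2));
        }
    }

    // In-memory stand-in for the database, for demonstration only.
    static OrderDao sampleDao() {
        Map<Integer, CustomerRow> customers =
                Collections.singletonMap(7, new CustomerRow(7, "Example Customer"));
        List<OrderRow> orders = Arrays.asList(new OrderRow(1, 7, 10_000), new OrderRow(2, 7, 2_345));
        return new OrderDao() {
            public CustomerRow findCustomer(int custId) { return customers.get(custId); }
            public List<OrderRow> findOrdersByCustomer(int custId) {
                return orders.stream().filter(o -> o.custId == custId).collect(Collectors.toList());
            }
        };
    }

    public static void main(String[] args) {
        OrderSummary s = new OrderHistoryFacade(sampleDao()).getOrderSummary(7);
        System.out.println(s.customerName + ": " + s.orderCount + " orders, total " + s.totalSpent);
    }
}
```

Only `getOrderSummary` would be published as a service operation; the DAO methods stay internal, so the database structure never appears in the WSDL.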
"Schema? We don’t need no stinking schema!"
Putting arbitrary, undocumented XML inside a SOAP envelope and calling the result a Web service.
Why does it happen?
This situation can arise when developers adopt Web services in a bottom-up manner. The symptoms are that the organization doesn’t see the promised benefits of SOA, and that maintaining its solutions seems harder (not easier) than before Web services were adopted. What we typically see here is an application originally built using a RESTful approach that exchanged XML over HTTP, or one that simply transferred XML documents over another transport, such as JMS. The issue is not how the XML is transmitted; the problem lies in the cavalier attitude sometimes taken toward documenting the structure of the XML itself.
This problem usually occurs when you try to reuse existing code that already generates and parses XML. In these cases, the code typically uses a JAXP-compatible XML parser directly and also accesses the Java HTTP classes directly to send and receive the XML documents. The problem comes when you then try to adapt this code to Web services. Often the approach taken is to create a WSDL document that uses the <xsd:any> schema element to let the XML pass through unimpeded, after which it is parsed by your existing code.
If the XML has no external schema, then it cannot be validated, either in the program itself or, more importantly, in any mediations (such as in an ESB) that might exist between the Web service provider and requestor. One of the points of Web services is that XML is not just human-readable, but machine-comprehensible as well. When the XML schema is undocumented, it becomes difficult (if not impossible) to write mediations that can operate effectively on that XML.
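The difference is easy to demonstrate with the JDK's `javax.xml.validation` API. In the minimal sketch below, the element names `payload`, `customerId`, and `amount` are hypothetical. A schema whose content model is just an `<xsd:any>` wildcard accepts any document, including a malformed one, while an explicit schema rejects it; a mediation holding the explicit schema could make the same check.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaValidationDemo {

    // Open-ended schema: <payload> may contain anything, so nothing is checked.
    static final String OPEN_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='payload'><xsd:complexType><xsd:sequence>"
      + "<xsd:any processContents='skip' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Explicit schema: documents exactly what a consumer or mediation can expect.
    static final String EXPLICIT_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='payload'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='customerId' type='xsd:int'/>"
      + "<xsd:element name='amount' type='xsd:decimal'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema itself is invalid", e);
        }
    }

    // The default error handler turns validation errors into SAXExceptions.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document does not conform to the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<payload><customerId>7</customerId><amount>12.50</amount></payload>";
        String bad = "<payload><custID>seven</custID></payload>";
        Schema open = compile(OPEN_XSD);
        Schema explicit = compile(EXPLICIT_XSD);
        System.out.println("open schema:     good=" + isValid(open, good) + ", bad=" + isValid(open, bad));
        System.out.println("explicit schema: good=" + isValid(explicit, good) + ", bad=" + isValid(explicit, bad));
    }
}
```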
Likewise, consumers of the Web service might prefer to work with strongly typed objects generated from the service description. If the WSDL they are given includes <xsd:any>, it is impossible to automatically create the programming-language-specific objects that correspond to the XML. Such an open-ended WSDL greatly reduces the value of Web services for the consumer.
How do you fix it?
The solution is simple: whenever you write Web services, whether using the WS-* standards or a REST approach, make sure that you create a complete and accurate XML schema that represents the structure of your documents. Again, referring back to SOMA, an externalized service description is an important part of a well-described service.
If you are building WS-* Web services, this schema should be included as part of the WSDL that describes your Web services. Even if you are following a REST approach, an easily accessible XML schema will encourage reuse of your services. I recommend storing and maintaining the schema in a repository such as IBM WebSphere Service Registry and Repository, which lets you manage different versions of your schema effectively and gives both your clients and your mediations a way to retrieve it.
In this brief article, I’ve examined three practices we encounter often that can really make your life more complicated than it needs to be. Hopefully, this information will help you avoid these issues and get the most out of your use of XML and Web services. However, if you do find yourself in one of these situations, then perhaps the suggestions presented here will help cut your misery short.
Thanks a million to Rachel Reinitz for her many helpful suggestions and improvements to this article.
- WebSphere MQ File Transfer Edition product information
- WebSphere Service Registry and Repository product information
- Enterprise Integration Patterns: Claim Check pattern
- Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Gregor Hohpe and Bobby Woolf, Addison-Wesley, 2003
- Patterns of Enterprise Application Architecture, Martin Fowler, Addison-Wesley, 2002
- IBM Systems Journal: SOMA: A method for developing service-oriented solutions
- IBM developerWorks WebSphere