Note: IBM was unable to get sufficient industry agreement around HTTPR and has stopped all work on it. The current direction for standards-based reliable messaging is WS-ReliableMessaging. You can also find a demo version of this in the Emerging Technologies Toolkit (ETTK) on alphaWorks.
Authors contributing to this document are Michael Conner, Richard King, Francis Parr, Stephen Todd and Karen Witting. The ideas reflect the work of a broader HTTPR team responsible for design of the protocol, prototype implementations, demonstrations, and toolkits. Key members of this team include Andrew Banks, Paul Clarke, Jim Challenger, Doug Davis, John Ibbotson, Richard King, Francis Parr, and Karen Witting.
The delivery of messages using a reliable transport mechanism is a fundamental component for middleware in e-business systems, and a needed technology in enterprise computing. However, in the wider context of the Internet, synchronous transport protocols such as HTTP do not currently provide those facilities. Reliable HTTP (HTTPR) addresses these deficiencies by proposing rules that make it possible to ensure that all messages are delivered to their destination in their exact form and only once. In cases where the message delivery fails, the protocol will reliably report the message as undeliverable.
Messaging agents can use HTTPR together with persistent storage capability to provide reliable messaging for applications. The specification of HTTPR does not include the design of a messaging agent, nor does it say what storage mechanisms a messaging agent should use; it does, however, specify which state information must be stored safely, and when to store it, so that a messaging agent can provide reliable delivery using HTTPR.
HTTP version 1.1 serves as the base upon which HTTPR builds its reliability. As such, all of the facilities of HTTP/1.1 (SSL, keep-alive, communication through proxies and firewalls, and so on) are available.
IBM is making the HTTPR specification available to the public to stimulate public discussion on reliable message delivery on the Internet. The authors welcome feedback on the technical contents of this article and the specification.
Developing a function that requires communication between two distributed components can be surprisingly difficult when you consider all the possible failure scenarios. Consider the case of a program sending a purchase request to another program over a network. If the message is not delivered and the sender is not made aware of the delivery failure, then the purchase request will be lost. If the message is delivered more than once and the target is unaware of these multiple deliveries, then too many purchases will occur.
Reliable messaging refers to the ability of a sender to deliver a message once and only once to its intended receiver. It is a necessary building block for most non-query communication between programs. The basic method for achieving reliable delivery is for the sender to send the message repeatedly to the target until the target acknowledges receipt of the message. The message must contain an identifier of some kind so that the target will discard any duplicates it receives. While this should be a simple task to perform, it is surprisingly difficult to achieve in the full context of possible failures and acceptable efficiency.
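The resend-and-discard method just described can be sketched in a few lines of Python. This is a minimal in-memory illustration, not part of HTTPR: the class and function names are hypothetical, the network is simulated by a channel that loses the first few acknowledgements, and nothing is persisted, which is precisely the weakness the following paragraphs address.

```python
import uuid


class Receiver:
    """Target that acknowledges every delivery but processes each message id once."""

    def __init__(self):
        self.seen_ids = set()  # identifiers of messages already processed
        self.processed = []    # payloads handed to the application, in order

    def deliver(self, message_id, payload):
        if message_id not in self.seen_ids:
            self.seen_ids.add(message_id)
            self.processed.append(payload)
        return True  # duplicates are acknowledged again but not reprocessed


class FlakyChannel:
    """Simulated network: the first `failures` acknowledgements are lost."""

    def __init__(self, receiver, failures):
        self.receiver = receiver
        self.failures = failures

    def send(self, message_id, payload):
        acked = self.receiver.deliver(message_id, payload)
        if self.failures > 0:
            self.failures -= 1
            return False  # the message arrived, but the sender never hears so
        return acked


def send_reliably(channel, payload, max_attempts=10):
    """Resend under one identifier until acknowledged, or give up."""
    message_id = str(uuid.uuid4())  # the same identifier is used on every resend
    for attempt in range(1, max_attempts + 1):
        if channel.send(message_id, payload):
            return attempt
    raise TimeoutError("no acknowledgement: report the message as undeliverable")
```

With `FlakyChannel(receiver, failures=2)`, the purchase request reaches the receiver three times but is processed only once, and `send_reliably` returns 3.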
For example, consider the task of resending a message. If the sender's server goes down, the sender may lose its copy of the message and, therefore, will not be able to resend it. For this reason, senders need to record the message in a reliable store until it is definitely delivered. Furthermore, the sender frequently needs to make a record of the fact that the request has been sent (that is, the sender needs to record the fact that the business process in which it is participating has moved to the next step).
To carry this task out reliably, the sender needs to send the message and update its record of the transmission in a single transaction. This is quite a difficult task to perform efficiently. The only simple way to do it is to hold the transaction open for the entire process, which lengthens the lock holding time at both the sending and receiving ends and reduces system throughput. It is also difficult to get the degree of system cooperation needed to achieve such a distributed transaction; further, it is quite inconsistent with the loose-coupling principles behind Web services.
The complexity of this process has given rise to a standard design pattern in which a component known as an end-point manager is employed at each end of the line of communication to coordinate message delivery. In this pattern, the sender delivers the message to the end-point manager via a synchronous request. Once the message is delivered to the end-point manager, the sender can be assured that the message will be delivered or that a definite event (a time-out) will be raised. The end-point manager participates in local transactions with other resource managers so that "queuing" the message with the end-point manager and, for example, recording the business process step in a database can be completed in one transaction.
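A minimal sketch of the single local transaction that the end-point manager pattern makes possible follows. As a simplifying assumption, the outbound queue and the business-process table live in one SQLite database, so that one `sqlite3` transaction covers both; a real end-point manager would be a separate resource manager enlisted in a local transaction. The table names, column names, and the `REQUEST_QUEUED` step are hypothetical.

```python
import sqlite3
import uuid


def open_store(path=":memory:"):
    """One database holding both the outbound queue and the process state."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS outbound_queue (
            message_id TEXT PRIMARY KEY,
            payload    TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS process_state (
            order_id TEXT NOT NULL PRIMARY KEY,
            step     TEXT NOT NULL
        );
        """
    )
    return conn


def submit_purchase(conn, order_id, payload):
    """Queue the request and record the business step in one local transaction."""
    message_id = str(uuid.uuid4())
    with conn:  # commits if both statements succeed, rolls back otherwise
        conn.execute(
            "INSERT INTO outbound_queue (message_id, payload) VALUES (?, ?)",
            (message_id, payload),
        )
        conn.execute(
            "INSERT INTO process_state (order_id, step) VALUES (?, ?)",
            (order_id, "REQUEST_QUEUED"),
        )
    return message_id
```

If the second INSERT fails (for example, because a `None` order identifier violates the NOT NULL constraint), the connection's context manager rolls back the queue insert too, so a message is never queued without the matching business-process record.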
Reliable message support is not a new technology. Messaging middleware products such as IBM MQSeries, Oracle Message Broker, and Microsoft Message Queuing have supported it for years and are widely deployed in enterprise computing environments. However, reliable messaging is currently supported via product-specific protocols. This article proposes how reliable messaging can be brought to communication over an open, standardized, and product-neutral protocol.
Consider how communication occurs using SOAP over HTTP. HTTP is a "reliable" protocol in that it delivers messages at most once, in order, and with a definite acknowledgement for each message delivery or delivery failure. That is, HTTP is reliable when nothing goes wrong. However, when something does go wrong then a number of problems can arise. Figure 1 below shows the sequence of events that occur in the delivery of a SOAP message under normal circumstances. Figure 1 distinguishes between application software and communication software as this makes it easier to discuss some of the failure scenarios. Consider what happens if the communication connection is lost after step 1 and before step 2. In this case, the requester will get a connection failure event and it will be "in doubt" as to the status of its request. Several cases might hold:
- The request might not have been delivered to the responder.
- The request might have been delivered to the responder and processed, but the responder might detect that the connection has been lost and "rollback" its processing.
- The request may have been delivered and processed successfully at the responder, but the reply confirming this is still waiting for delivery back to the requester.
- The request might have been delivered and processed and the reply could have been lost while being sent back to the requester, but the responder might not know about the connection failure or might not be able to rollback its processing at this point.
Figure 1: Synchronous, unreliable message delivery
If the request was a simple query request with no important side effects, then the requester could resend the message on a new connection (assuming one can be established). This is not too difficult, but it does require additional application logic to resend the request. If the request was an update request that should not be repeated (such as a purchase request) then the requester should send a message to the responder inquiring as to the status of the original request and recover appropriately. This requires considerable application logic including a way to identify the request and the logic to generate and process the message.
Figure 2: Synchronous, reliable message delivery
Figure 2 shows how communication is simplified when the communication software is assumed to support reliable delivery. It adds the operations of making messages persistent before and after sending them by saving them in a reliable store. Not shown in the figure, but an important part of reliable message delivery, is that all messages are given unique identifiers at the transport level so that a message sender can resend a message until it gets a positive acknowledgement that the message has been received and made persistent by the receiver.
In Figure 2, a pseudo-synchronous request interface provides reliable transport for a service request -- sending the request and receiving the reply from the requested service using a reliable messaging protocol. The communication support at the requester keeps a copy of the request until it is assured that this request has been received and made persistent at the responder. After processing at the responder node to provide the requested service, the communication support saves a copy of the reply until it is assured that this reply has been safely received back at the requesting node.
This also illustrates the "normal" case where no retransmissions are needed. Even in this case some simplifications have been made. Although the client knows that the request is completed at the end of this flow and has its reply, the responder is still in doubt as to whether the reply has been returned safely to the requester. This is not a problem since the responder has a persistent copy of the reply to resend on request, and the processing that provided the service believes this request to be complete and is holding no locks on its behalf. Uncertainty about safe delivery of the reply will be removed either by a piggybacked acknowledgement on the next request from this requester to this server or, if too much time elapses before such a request is needed, by a supporting standalone flow from the requester confirming delivery of the previous reply.
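The responder-side bookkeeping described above, which keeps each reply until its delivery is confirmed either by an acknowledgement piggybacked on the next request or by a standalone flow, might look like the following sketch. The names are hypothetical, and an in-memory dictionary stands in for the persistent reply store a real messaging agent would use.

```python
class Responder:
    """Keeps each reply until the requester confirms that it arrived."""

    def __init__(self, service):
        self.service = service     # function computing a reply from a request body
        self.unacked_replies = {}  # request_id -> reply kept for possible resend

    def acknowledge(self, request_ids):
        """Forget replies the requester has confirmed (piggybacked or standalone)."""
        for request_id in request_ids:
            self.unacked_replies.pop(request_id, None)

    def handle_request(self, request_id, body, piggybacked_acks=()):
        """Process a new request; it may carry acknowledgements of earlier replies."""
        self.acknowledge(piggybacked_acks)
        reply = self.service(body)
        self.unacked_replies[request_id] = reply
        return reply

    def resend_reply(self, request_id):
        """Return the saved reply if the requester asks for it again."""
        return self.unacked_replies.get(request_id)
```

Calling `acknowledge` directly corresponds to the standalone confirmation flow used when no further request arrives soon enough to carry the acknowledgement.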
In failure situations, additional flows would occur. For example, if a network failure caused the requester's communication support to be "in doubt" as to whether or not the request had been received and saved by the responder's communication support, a reliable messaging interaction would force the responder to report whether it had received and saved it. If the request was received at the responder, the remote service application would eventually process it and return a reply; hence the requester must not resend the request. If it did, the service would probably be provided twice, which, for a request to buy or sell something, would not be what the requesting application wanted. If the requester's communication support determines that the original request was never received at the responder, the requester should reconnect and resend the request. If the request for service is confirmed as received at the responder but no reply is returned, then the requester's communication support can safely "poll" to get the reply returned as soon as it is available. More sophisticated (and more scalable) reply retrieval schemes are also possible, but these require the requesting node to act as a "server" and receive unsolicited messages from the network. Scalability can be improved, but at the cost of a more complex requester.
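The requester's recovery decision in this paragraph reduces to a three-way branch on what the responder reports. In this sketch the status values and the `resend` and `fetch_reply` callables are hypothetical stand-ins for the communication support; how the status query itself travels over the network is not shown.

```python
from enum import Enum


class RemoteStatus(Enum):
    NOT_RECEIVED = "not received"
    RECEIVED_NO_REPLY = "received and saved, no reply yet"
    REPLY_READY = "reply available"


def recover_in_doubt_request(status, resend, fetch_reply):
    """Choose the recovery action from the responder's report on the request."""
    if status is RemoteStatus.NOT_RECEIVED:
        resend()  # safe: the responder has no saved copy of the request
        return ("resent", None)
    if status is RemoteStatus.RECEIVED_NO_REPLY:
        # Never resend here: the service would probably be provided twice.
        return ("poll later", None)
    return ("reply", fetch_reply())
```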
The advantage of this model for the requester of a remote service is that its application logic does not need to deal with communication failures either for simple queries or for update requests such as buy and sell commands. Without a reliable messaging service and communication support to provide reliable remote request delivery, the application logic at the requester would inherit this responsibility. After any failure of a remote service request, the application would have to include logic to assess the likely state of remote processing, decide whether or not to reissue requests, and so on. Having a reliable messaging service leads to simpler, more reliable applications.
The advantage of a reliable messaging system to service providers whose role is to respond to requests from other applications is that they can use transactional logic to protect the processing of the service requests in a way that promotes the scalability of the service provider. The transaction boundary is shown in Figure 2 as a shaded region. As shown, the processing application providing a service can dequeue the request from the persistent store in its communication support, process it (perhaps updating a database), and enqueue the reply for return to the requester, all in one transaction. Therefore, any failure during processing of the service request on the responder's side leaves either the request waiting for processing or the response waiting to go out, but no in-doubt states. Furthermore, as soon as this service processing transaction is complete, all locks associated with it can be released; there is no end-to-end transactional coordination involving network delay times.
However, there are still important failure scenarios that leave the requester in doubt. Consider the case where the requester's system crashes before it completes its processing of the response to the request. When the requester comes back up, it is in doubt about whether or not it has initiated the request. This failure scenario is not very serious for simple queries. It is also usually manageable for update requests that are directly driven by a human, as the human can drive the recovery. However, when the operation is part of an automated business process, we are back in the situation of having to write complex logic to resynchronize after this kind of failure, or possibly not attempt automated recovery, but rather let human experts repair the damage later.
Figure 3: Asynchronous, reliable message delivery
To overcome this final problem, Figure 3 illustrates how the scenario can be extended to leverage asynchronous messaging. In this scenario the requesting application sees the request processing as two stages: (1) sending the message and (2) receiving the response. This allows the requesting application to leverage transactional logic to remove the final in-doubt opportunities. Now, if the requester's system crashes at any time, the requester will know where to restart from the records it keeps in its local database. The asynchronous model has the additional benefit that it offers performance advantages in high transaction rate systems because the decoupled requests and responses can be batched. For example, the requester's communication software might send three or four requests in a single message (and also make these requests persistent in a single operation). This can reduce traffic and processing overhead considerably when the message volume is high.
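The batching optimization can be sketched as two steps: making several requests persistent with one commit, and packing several queued requests into one transmission. Encoding the batch as JSON is purely an assumption for illustration (HTTPR itself treats message contents as uninterpreted bytes), and the table and function names are hypothetical.

```python
import json
import sqlite3


def open_pending_store(path=":memory:"):
    """Hypothetical store for requests waiting to be sent."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending "
        "(request_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def enqueue_requests(conn, requests):
    """Make several (request_id, body) pairs persistent with a single commit."""
    with conn:
        conn.executemany(
            "INSERT INTO pending (request_id, body) VALUES (?, ?)", requests
        )


def build_batch_message(conn, max_batch=4):
    """Pack up to max_batch queued requests into one transmission."""
    rows = conn.execute(
        "SELECT request_id, body FROM pending ORDER BY rowid LIMIT ?", (max_batch,)
    ).fetchall()
    return json.dumps([{"id": rid, "body": body} for rid, body in rows])
```

One commit for the whole group means one trip to stable storage instead of several, and one transmission replaces several round trips; that is the saving the paragraph above describes for high message volumes.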
Our example shows how reliable messaging can be leveraged to reduce the complexity of program-to-program interactions. With this serving as context, the discussion below examines some of the issues surrounding how to provide reliable messaging support over an open, standard protocol.
A key issue that must be addressed is how to layer the support for reliable messaging. One possibility is to put it in the application. For example, applications that use Web services-based communication could extend the SOAP messaging formats and protocols to support reliable messaging. This would have the advantage of supporting reliable messaging over a whole range of transport protocols. However, doing this involves the application in low-level processes, such as message resend, complicating and confusing the definition of higher-level business processes. Also, this would make it very difficult to do things like the message batching optimizations discussed above. Furthermore, there are important existing transports that already support reliable messaging and encode the necessary formats and protocols at the transport level. If we required that reliable message delivery formats be added at the application level, it would make the use of these existing transports very inefficient in some cases.
The other obvious approach is to add support for reliability at the transport level. This keeps the concerns of reliable delivery out of the business process and allows for transport-appropriate optimizations. For these reasons, we feel this is the best approach. This also allows a number of higher-level models such as SOAP to leverage support for reliable messages without requiring each to define its own extensions for reliable message delivery.
To this end, we recommend adding a protocol layer on top of HTTP, which we call HTTPR, and similarly on top of HTTPS, which we call HTTPSR.
HTTPR is a protocol for the reliable transport of messages from one application program to another over the Internet, even in the presence of failures either of the network or the agents on either end. It is layered on top of HTTP. Specifically, HTTPR defines how metadata and application messages are encapsulated within the payload of HTTP requests and responses. HTTPR also provides protocol rules making it possible to ensure that each message is delivered to its destination application exactly once or is reliably reported as undeliverable.
Messaging agents use the HTTPR protocol and some persistent storage capability to provide reliable messaging for application programs. This specification of HTTPR does not include the design of a messaging agent, nor does it say what storage mechanisms a messaging agent should use; it does specify what information must be stored safely and when to store it, so that a messaging agent can provide reliable delivery using HTTPR.
HTTP/1.1 serves as the base on which HTTPR builds. All the facilities of HTTP/1.1 such as the Secure Sockets Layer, session keep-alives, proxy and firewall support, and so on, are thus available in HTTPR. One feature, the chunked transfer encoding, is especially convenient in the construction of batches of messages where the size of the entire batch is not known a priori. It should not be assumed, however, that this feature, or any other feature, is actually being used on any particular occasion; any correct use of HTTP/1.1, as defined in Internet RFC 2616 (see Resources), by one messaging agent should be acceptable to any other messaging agent.
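To see why chunked transfer encoding suits batches of unknown total size, here is a small encoder for the HTTP/1.1 chunked format defined in RFC 2616, section 3.6.1: each chunk is prefixed by its length in hexadecimal, and a zero-length chunk ends the body, so a sender can start streaming before it knows how many messages the batch will hold. How HTTPR frames individual messages inside the body is not shown; the byte strings are placeholders, and the matching decoder ignores chunk extensions and trailers.

```python
def encode_chunked(parts):
    """Yield an HTTP/1.1 chunked body (RFC 2616, section 3.6.1) for byte strings."""
    for part in parts:
        if part:  # skip empty parts: a zero-size chunk would end the body early
            yield b"%x\r\n" % len(part) + part + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk followed by an empty trailer


def decode_chunked(body):
    """Reassemble a chunked body (chunk extensions and trailers are ignored)."""
    out, pos = bytearray(), 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return bytes(out)
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
```

Because `encode_chunked` accepts any iterable, a sender can pass it a generator that yields messages as they become available and never needs the total length up front.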
Layering HTTPR on HTTP in this way has the additional benefit that HTTPR can be used for reliable messaging with enterprises whose only presence on the Internet is a Web server behind a firewall admitting only Web-related traffic.
Given the asymmetric behavior of HTTP, it is still convenient to use the terms "client" and "server" even though under HTTPR messaging agents regard themselves as peers. The agent initiating an HTTPR interaction (the client agent) does so by sending a POST command, in the HTTP sense, including with it a payload that identifies itself, specifies an HTTPR command, and, if the command asks the server to accept messages, includes a batch of messages. (A single message is handled as the special case of a batch with only one member.) The server sends back a response whose payload includes status information and, if the client requested, a batch of messages intended for that client. The messages, and any accompanying metadata, are uninterpreted bytes as far as HTTPR is concerned and are assigned no other meaning by it.
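The shape of this exchange can be modeled as shown here. This is a hypothetical, in-memory sketch rather than the HTTPR wire format: the field names and the two illustrative commands `send` and `pickup` are not taken from the specification, and the persistence and acknowledgement rules a conforming agent must follow are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HttprRequest:
    """Hypothetical model of the payload a client agent POSTs."""
    agent_id: str                   # identifies the client agent
    command: str                    # illustrative names only: "send" or "pickup"
    batch_id: Optional[str] = None  # assigned by the sender of a batch
    messages: List[bytes] = field(default_factory=list)  # opaque to HTTPR


@dataclass
class HttprResponse:
    """Hypothetical model of the server's response payload."""
    status: str
    messages: List[bytes] = field(default_factory=list)


class ServerAgent:
    def __init__(self):
        self.inbox: List[bytes] = []  # messages accepted from clients
        self.outboxes = {}            # agent_id -> messages waiting for that client

    def handle_post(self, req: HttprRequest) -> HttprResponse:
        if req.command == "send":
            self.inbox.extend(req.messages)  # stored as uninterpreted bytes
            return HttprResponse(status="accepted")
        if req.command == "pickup":
            # A real agent would keep these until the client confirms receipt.
            waiting = self.outboxes.pop(req.agent_id, [])
            return HttprResponse(status="ok", messages=waiting)
        return HttprResponse(status="unknown command")
```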
Each batch is assigned an identifier by its sender (either client or server) that is sent along as HTTPR metadata with the batch. Correctly functioning messaging agents will, in accordance with the specification, store this identifier and the state of their processing of that batch of messages in stable storage at the appropriate times. In the event of a failure, this information can be recovered from stable storage and used by the messaging agents, through specified interchanges of that state information, to resolve the status of the batch of messages, thereby achieving exactly-once delivery.
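The role of the batch identifier in recovery can be sketched as follows, under stated assumptions: the receiver stores each batch identifier in the same SQLite transaction as the batch's messages, and a sender that is in doubt after a failure asks for the batch's status before deciding whether to resend. The class, the method names, and the `received`/`unknown` status values are hypothetical and do not reproduce HTTPR's actual state exchange.

```python
import sqlite3


class ReceivingAgent:
    """Receiver that records each batch identifier in stable storage."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(
            """
            CREATE TABLE IF NOT EXISTS batches  (batch_id TEXT PRIMARY KEY);
            CREATE TABLE IF NOT EXISTS messages (batch_id TEXT, body BLOB);
            """
        )

    def accept_batch(self, batch_id, messages):
        """Store the messages and the batch id atomically; ignore a resent batch."""
        with self.db:
            if self.batch_status(batch_id) == "received":
                return "duplicate"
            self.db.execute("INSERT INTO batches VALUES (?)", (batch_id,))
            self.db.executemany(
                "INSERT INTO messages VALUES (?, ?)",
                [(batch_id, m) for m in messages],
            )
        return "received"

    def batch_status(self, batch_id):
        """Answer a sender's status query after a failure."""
        row = self.db.execute(
            "SELECT 1 FROM batches WHERE batch_id = ?", (batch_id,)
        ).fetchone()
        return "received" if row else "unknown"


def resolve_in_doubt(receiver, batch_id, messages):
    """Sender recovery: resend only if the receiver has no record of the batch."""
    if receiver.batch_status(batch_id) == "received":
        return "already delivered"
    return receiver.accept_batch(batch_id, messages)
```

Because the identifier and the messages are committed together, the receiver never holds messages without a record of their batch, or a record without its messages; the status query is therefore enough to decide whether a resend is needed.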
The HTTPR protocol places no constraints on the interface used by an application program to pass messages to its local messaging agent for reliable delivery to a partner application program. SOAP and JMS are two examples of application messaging interfaces for which reliable delivery using the HTTPR protocol can be provided.
SOAP messages transported over HTTPR have the same format as SOAP messages over HTTP. The additional information needed to correlate request and response in the HTTPR asynchronous (or pseudo-synchronous) environment is put into the HTTPR message context header. The SOAPAction parameter is carried in the HTTPR message context header as the type app-soap-action. When request-response style SOAP messages are used, the HTTPR rules for matching responses to specific requests must be followed. In particular, the message-id of the request message must be copied into the correlation-id of the response. Extensions to SOAP such as those in ebXML and SOAP-RP contain application-level correlation information that must also be carried in the HTTPR message context header for this protocol.
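The correlation rule can be illustrated with a small sketch. Python attribute names cannot contain hyphens, so `message_id`, `correlation_id`, and `app_soap_action` stand in for the header fields `message-id`, `correlation-id`, and `app-soap-action`. The classes themselves are hypothetical and say nothing about how the message context header is encoded on the wire.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MessageContext:
    """Hypothetical stand-in for fields of the HTTPR message context header."""
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    correlation_id: Optional[str] = None
    app_soap_action: Optional[str] = None


def make_response_context(request_ctx: MessageContext) -> MessageContext:
    """The response gets its own message-id; correlation-id copies the request's."""
    return MessageContext(correlation_id=request_ctx.message_id)


class PendingRequests:
    """Requester side: match asynchronous responses to outstanding requests."""

    def __init__(self):
        self.waiting = {}  # message_id of each request -> description

    def sent(self, ctx: MessageContext, description: str):
        self.waiting[ctx.message_id] = description

    def matched(self, response_ctx: MessageContext) -> Optional[str]:
        return self.waiting.pop(response_ctx.correlation_id, None)
```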
The example in Listing 1 shows a complete SOAP message as it might appear when transported over HTTPR. The semicolons (';') are used for comments and would not appear in the actual SOAP message.
A WSDL specification for HTTPR is almost exactly the same as one for HTTP. The following binding references the HTTPR binding:

    <soap:binding style="..." transport="http://JohnIbbotson/ToFillIn/httpr"/>

and the service port address location is an HTTPR URI. For example:

    <soap:address location="httpr://gateway.orgBig.com/gway/httprmq.jsp#SOAPQ@QM"/>
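An HTTPR address such as the one above can be taken apart with Python's standard `urllib.parse.urlsplit`, which separates scheme, host, path, and fragment for any scheme written in the `scheme://host/path` form. Interpreting the fragment `SOAPQ@QM` as a queue name plus a queue-manager name is an assumption based on the MQ-style naming, not something this article defines, as is treating the path as the HTTP resource that receives the POST requests. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def describe_httpr_address(address):
    """Split an httpr: URI into its parts (fragment meaning is an assumption)."""
    parts = urlsplit(address)
    if parts.scheme != "httpr":
        raise ValueError("not an HTTPR URI: " + address)
    # Assumption: the fragment has the form "queue@queue-manager".
    queue, _, queue_manager = parts.fragment.partition("@")
    return {
        "host": parts.hostname,  # urlsplit lowercases the host name
        "path": parts.path,      # assumed to be the resource receiving the POSTs
        "queue": queue,
        "queue_manager": queue_manager,
    }
```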
Listing 2 shows a full example of what a WSDL specification for a SOAP message using HTTPR would look like.
Listing 2: A WSDL specification for a SOAP over HTTPR service
<?xml version="1.0"?>
<definitions name="AddressService"
             targetNamespace="urn:show-address"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/1999/XMLSchema"
             xmlns:xsd1="http://www.addressbook.com/ns/ShowAddress"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/">
  <message name="AddressInput">
    <part name="theAddress" type="xsd1:address"/>
  </message>
  <portType name="AddressHandler">
    <operation name="printAddress">
      <input message="AddressInput"/>
    </operation>
  </portType>
  <binding name="AddressSoapBinding" type="AddressHandler">
    <soap:binding style="rpc"
                  transport="http://JohnIbbotson/ToFillIn/httpr"/>
    <operation name="printAddress">
      <soap:operation soapAction=""/>
      <input>
        <soap:body use="encoded"
                   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/ http://www.ibm.com/namespaces/xmi"
                   namespace="urn:show-address"/>
      </input>
    </operation>
  </binding>
  <service name="AddressService">
    <port name="AddressPort" binding="AddressSoapBinding">
      <soap:address location="httpr://gateway.orgBig.com/gway/httprmq.jsp#SOAPQ@QM"/>
    </port>
  </service>
  <types>
    <xsd:schema targetNamespace="http://www.addressbook.com/ns/ShowAddress"
                xmlns:xsd="http://www.w3.org/1999/XMLSchema">
      <xsd:complexType name="address">
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="city" type="xsd:string"/>
        <xsd:element name="state" type="xsd:string"/>
        <xsd:element name="zip" type="xsd:string"/>
      </xsd:complexType>
    </xsd:schema>
  </types>
</definitions>
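Because an HTTPR binding differs from an HTTP binding only in the transport URI and the address scheme, a client that wants to pick the HTTPR transport mainly needs to read those two values. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the WSDL 1.1 namespaces used in Listing 2, pulls them out for each SOAP port; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}


def httpr_endpoints(wsdl_text):
    """Return (port name, transport URI, address) for each SOAP port."""
    root = ET.fromstring(wsdl_text)
    # Map each binding name to the transport declared by its soap:binding.
    transports = {
        b.get("name"): b.find("soap:binding", NS).get("transport")
        for b in root.findall("wsdl:binding", NS)
        if b.find("soap:binding", NS) is not None
    }
    result = []
    for port in root.findall("wsdl:service/wsdl:port", NS):
        address = port.find("soap:address", NS)
        if address is None:
            continue  # not a SOAP port
        binding_name = port.get("binding").split(":")[-1]  # drop any prefix
        result.append(
            (port.get("name"), transports.get(binding_name), address.get("location"))
        )
    return result
```

Applied to the WSDL in Listing 2, this would return the single tuple `("AddressPort", "http://JohnIbbotson/ToFillIn/httpr", "httpr://gateway.orgBig.com/gway/httprmq.jsp#SOAPQ@QM")`.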
HTTPR provides the reliable messaging features that are currently lacking on the Web. In this article we have shown that this new protocol will not only fit into the current infrastructure of the Web without major redevelopment, but will also satisfy the needs of enterprise applications that require such features. We have also shown how SOAP can operate over HTTPR to allow Web services to make use of these reliability features.
- UPDATE: Version 1.1 of the HTTPR spec is now available from developerWorks.
- Internet Request for Comments (RFC) 2616 defines the HTTP/1.1 specification.
- The World Wide Web Consortium has just updated its draft implementation of SOAP into version
- Find more dW Web services resources.
Francis Parr is a staff member at the IBM Thomas J. Watson Research Center in Yorktown Heights, New York, and is currently responsible for technology transfer activities involving joint work between IBM Research and IBM Transaction and Messaging Products in Hursley, UK. He's been involved in e-commerce, messaging and integration products, parallel DB, scalable object technologies and distributed processing. Francis joined IBM Research in 1979. He can be reached at email@example.com.
Michael H. Conner, Ph.D., is a Distinguished Engineer and Member of the IBM Technical Academy. He is the Chief Technical Officer for Web services, where he is leading the IBM efforts to provide customers with tools for developing and deploying Web services. In his previous position as technical lead for software strategy, Dr. Conner helped set IBM's e-business direction, including leading the team that defined the architecture for the Framework for e-business. He can be reached at mconner.ibm.com.