In the second article in this series I selected a cross-section of the W3C XML specification stack and described the role (or roles) each XML language could potentially play in a W3C Multimodal Architecture. I then presented a speculative example of how four of these languages -- SCXML, XHTML, REX, and XML Events -- could be combined to form a complete, distributed multimodal application. The application was entirely declarative in that I declared all event notification, event processing, and interaction processing as XML.
In this final article in the series, I'll show you how to implement the W3C Multimodal Architecture as a generic template to define a multimodal Web service. As you will see, in this type of implementation the Web service defines the data types and Web service operations for the Multimodal Architecture's life-cycle API.
For the sake of example I'll build on the multimodal online coffee maker introduced in Part 2 of this series. The multimodal coffee maker is a virtual appliance connected to an actual coffee maker. The actual coffee maker has its own Web server, enabling a user to remotely start or stop the coffee maker. In the previous example I used XML Events and REX remote events to communicate interaction between the client browser and the interaction manager running on the server. For this example, the online coffee service, I'll replace the application's XML Events and REX components with a Web service running on the server and SOAP calls to the Web service interface on the browser. My ability to implement this solution relies on the support for SOAP in the latest Firefox browser.
I'll still need the interaction manager running on the server for this
example. I no longer require a client- or server-side adapter, however.
Recall that in the original coffee maker application I used adapters to map
XML Events and REX to <data> events
consumed by the application's SCXML. In this case, I replace the adapters with
Web service operations. The operations perform the mapping instead, with
the actual mapping implementation hidden from the client.
Data types for multimodal interaction
The core of a Web service is its WSDL (Web Services Description Language) document. The WSDL defines an interface by separating the abstract description of the messages and operations that can be performed by the service from a concrete specification of the data exchanged between the client and service. The WSDL then defines how the abstract operations are bound to the actual end points available to clients. The format of the data to be exchanged between client and service is defined in terms of an XML schema. The XML schema for the online coffee service is shown in Listing 1.
Listing 1. Data types for the online coffee service
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="urn:mycoffeemaker"
    xmlns="urn:mycoffeemaker"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:documentation>
      Coffee Maker 1.0 WSDL data types schema
    </xs:documentation>
  </xs:annotation>
  <xs:complexType name="tData">
    <xs:sequence>
      <xs:element ref="context"/>
      <xs:choice>
        <xs:element ref="event"/>
        <xs:element ref="coffeeStatus"/>
        <xs:element ref="grammar"/>
        <xs:element ref="start"/>
        <xs:element ref="prepare"/>
        <xs:element ref="done"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="tDone">
    <xs:sequence>
      <xs:element ref="context"/>
      <xs:element ref="status"/>
      <xs:element ref="errorInfo"/>
      <xs:element ref="data" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  ...
  <xs:element name="data" type="tData"/>
  <xs:element name="done" type="tDone"/>
  <xs:element name="startResponse" type="tStartResponse"/>
  <xs:element name="start" type="tStart"/>
  <xs:element name="prepareResponse" type="tPrepareResponse"/>
  <xs:element name="prepare" type="tPrepare"/>
  <xs:element name="newContextResponse" type="tNewContextResponse"/>
  <xs:element name="newContextRequest" type="tNewContextRequest"/>
</xs:schema>
Having defined the format of the XML exchanged between client and service, the next important part of a WSDL is the section that describes the input and output messages and operations that define the Web service interface. The WSDL is a contract between the client and the Web service, ensuring that the client knows both how to call the service and what to expect will be returned by the service.
Here the operations described are implementations of the W3C Multimodal
Architecture's life-cycle API. For example, the "DataOp" operation is an
implementation of the Multimodal Architecture's <data> event. However, the Multimodal
Architecture specifies <data> as an
asynchronous notification, while the WSDL specifies "DataOp" as a call that
sends and returns a data message.
Note that the data types defined in Listing 1 are imported at the beginning of the WSDL document shown in Listing 2, below. Note also that I've used WSDL Version 1.1 for this implementation because Mozilla's WSDL API currently supports only WSDL 1.1. The latest W3C WSDL recommendation is 2.0.
Listing 2. Initial WSDL for the online coffee service
<?xml version="1.0" encoding="utf-8"?>
<definitions name="CoffeeMaker"
    targetNamespace="urn:mycoffeemaker"
    xmlns:tns="urn:mycoffeemaker"
    xmlns:xsdl="urn:mycoffeemaker"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <documentation>
    This document describes the 'My Coffee Maker' web service.
  </documentation>
  <import namespace="urn:mycoffeemaker"
      location="http://example.com/schemas/coffeemaker.xsd"/>
  <message name="Data">
    <part name="body" element="xsdl:data"/>
  </message>
  <message name="NewContextRequest">
    <part name="body" element="xsdl:newContextRequest"/>
  </message>
  <message name="NewContextResponse">
    <part name="body" element="xsdl:newContextResponse"/>
  </message>
  <message name="PrepareResponse">
    <part name="body" element="xsdl:prepareResponse"/>
  </message>
  <message name="StartResponse">
    <part name="body" element="xsdl:startResponse"/>
  </message>
  <portType name="CoffeeMakerPortType">
    <operation name="DataOp">
      <input message="tns:Data"/>
      <output message="tns:Data"/>
    </operation>
    <operation name="NewContextOp">
      <input message="tns:NewContextRequest"/>
      <output message="tns:NewContextResponse"/>
    </operation>
    <operation name="PrepareResponseOp">
      <input message="tns:PrepareResponse"/>
      <output message="tns:Data"/>
    </operation>
    <operation name="StartResponseOp">
      <input message="tns:StartResponse"/>
      <output message="tns:Data"/>
    </operation>
  </portType>
</definitions>
Listing 2 only describes the input and output messages and operations that define the Web service interface; the complete WSDL is shown below in Listing 3.
The complete Coffee Maker Web service
The messages and port types described in Listing 2 are
imported at the beginning of the document shown in Listing 3. Below the
import statement are the bindings and service definitions for the Web
service. The service definition specifies that SOAP calls for this Web
service should be made to the Web address
http://example.com/mycoffeemaker.
Listing 3. The complete WSDL for the online coffee service
<?xml version="1.0" encoding="utf-8"?>
<definitions name="CoffeeMaker"
    targetNamespace="urn:mycoffeemaker"
    xmlns:tns="urn:mycoffeemaker"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:defs="urn:mycoffeemaker/definitions"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <documentation>
    This document describes the 'My Coffee Maker' web service.
  </documentation>
  <import namespace="urn:mycoffeemaker/definitions"
      location="http://example.com/wsdl/11/coffeemaker.wsdl"/>
  <binding name="CoffeeMakerBinding"
      type="defs:CoffeeMakerPortType">
    <soap:binding style="rpc"
        transport="http://schemas.xmlsoap.org/soap/http"/>
    <operation name="DataOp">
      <soap:operation soapAction="DataOp" style="rpc"/>
      <input>
        <soap:body use="encoded" namespace="urn:mycoffeemaker"
            encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
      </input>
      <output>
        <soap:body use="encoded" namespace="urn:mycoffeemaker"
            encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
      </output>
    </operation>
    <operation name="NewContextOp">
      <soap:operation soapAction="NewContextOp" style="rpc"/>
      <input>
        <soap:body use="encoded" namespace="urn:mycoffeemaker"
            encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
      </input>
      <output>
        <soap:body use="encoded" namespace="urn:mycoffeemaker"
            encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
      </output>
    </operation>
    ...
  </binding>
  <service name="CoffeeMakerService">
    <port name="CoffeeMakerPortType" binding="tns:CoffeeMakerBinding">
      <soap:address location="http://example.com/mycoffeemaker"/>
    </port>
  </service>
</definitions>
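As a rough illustration of what the rpc binding in Listing 3 implies on the wire, the following sketch builds a SOAP 1.1 request envelope for the DataOp operation in plain JavaScript. The envelope layout is an assumption: it wraps a "context" parameter and one payload parameter in an element named after the operation, and it leaves out the encodingStyle and type attributes that an encoded binding would normally carry. The helper names buildDataOpEnvelope and escapeXml are hypothetical and are not part of the sample code.

```javascript
// Sketch only: the envelope layout is an assumption based on the rpc binding
// in Listing 3, not a capture of what the real service exchanges.

var SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"; // SOAP 1.1
var SERVICE_NS = "urn:mycoffeemaker"; // namespace used by the binding

// Escape the characters that must not appear raw in XML character data.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a DataOp request: one <context> parameter followed by one payload
// parameter whose element name says what kind of data it is (e.g. "event").
function buildDataOpEnvelope(contextId, name, value) {
  // Reject payload names that could not be used as a simple XML element name.
  if (!/^[A-Za-z_][A-Za-z0-9_.-]*$/.test(name)) {
    throw new Error("Invalid element name: " + name);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<env:Envelope xmlns:env="' + SOAP_ENV_NS + '">',
    '  <env:Body>',
    '    <m:DataOp xmlns:m="' + SERVICE_NS + '">',
    '      <context>' + escapeXml(contextId) + '</context>',
    '      <' + name + '>' + escapeXml(value) + '</' + name + '>',
    '    </m:DataOp>',
    '  </env:Body>',
    '</env:Envelope>'
  ].join("\n");
}

// Example: the message a client might send when the user selects "On".
console.log(buildDataOpEnvelope("ctx-1", "event", "on"));
```

A client that uses a SOAP library never builds this string by hand, but seeing the envelope makes it easier to relate the binding's operation name and namespace to what the service receives.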
With the server side set up, your next step is to establish the client. While Web browsers do not support the W3C life-cycle events as asynchronous notifications, SOAP calls can be invoked asynchronously. This enables the user to continue to interact with the client while the client waits for a response. A client should not poll for a response (because it can't know how often it should poll), but it can keep track of its own interaction state and make a new asynchronous call to the Web service whenever it receives a return message from the Web service. In this way, even without polling, the client is always waiting for a response from the Web service, which simulates an asynchronous notification.
Table 1 shows the interaction states on the client as it interacts with
the online coffee service. When the client receives a context response,
start response, or data from the Web service, it processes the message and
then makes a data request call to request more data. The data may contain
"coffee status," or a <start> or <done> life-cycle event. Because the Multimodal
Architecture allows <data> to contain
anything, the online coffee service includes the <start>, <prepare>, and <done> events as <data> event content.
Table 1. Client interaction states for making SOAP calls
| State | Transition | Next state |
|---|---|---|
| Begin | Send: context request | Context Response |
| Context Response | Receive: context response; Send: data request | Data Request |
| Data Request | Receive: data/start; Send: start response | Data Request |
| Data Request | Receive: data/event; Send: data request | Data Request |
| Data Request | Receive: data/coffee status; Send: data request | Data Request |
| Data Request | Receive: data/done | End |
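The transitions in Table 1 can be sketched as a small lookup table. This is only an illustration of the state logic, with no SOAP calls involved; the state and message labels are copied from the table, while the names INITIAL, TRANSITIONS, step, and run are hypothetical and do not appear in the sample code.

```javascript
// Sketch of Table 1 as data. Each entry says what the client sends after
// receiving a message in a given state, and which state it moves to.

// The Begin state has no incoming message: the client just sends a request.
var INITIAL = { send: "context request", next: "Context Response" };

var TRANSITIONS = {
  "Context Response": {
    "context response": { send: "data request", next: "Data Request" }
  },
  "Data Request": {
    "data/start":         { send: "start response", next: "Data Request" },
    "data/event":         { send: "data request",   next: "Data Request" },
    "data/coffee status": { send: "data request",   next: "Data Request" },
    "data/done":          { send: null,             next: "End" }
  }
};

// Look up one transition; an unexpected message is treated as an error.
function step(state, received) {
  if (state === "Begin") {
    return INITIAL;
  }
  var row = TRANSITIONS[state] && TRANSITIONS[state][received];
  if (!row) {
    throw new Error("No transition from '" + state + "' on '" + received + "'");
  }
  return row;
}

// Replay a scripted list of service replies and record what the client sends.
function run(replies) {
  var transition = step("Begin");
  var sent = [transition.send];
  var state = transition.next;
  for (var i = 0; i < replies.length && state !== "End"; i++) {
    transition = step(state, replies[i]);
    if (transition.send) {
      sent.push(transition.send);
    }
    state = transition.next;
  }
  return { state: state, sent: sent };
}

var result = run(["context response", "data/coffee status", "data/event", "data/done"]);
console.log(result.state, result.sent);
```

Every row except the last one sends a new request, which is how the client always has a call outstanding until it receives the done event.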
The client implementation uses JavaScript functions to set up, send, and
process the Mozilla SOAP calls to the online coffee service. The callCoffeeMaker function creates a new SOAPCall object and asynchronously invokes the Web
service method at transport URI http://example.com/mycoffeemaker. The specified
call-back function handles the response from the Web service.
One of the call-back functions is dataResponse. The dataResponse function handles on and off events by
checking the radio button associated with the coffee maker's current state.
The "coffee status" data event is handled by updating the text area with the
"box" ID with the contents of the event. If the event is <start> and its contentURL field is not empty, then a document cookie
is set with the context ID and the application moves to a new XHTML page by
setting the href property of the location object
with the new URI. After the browser has moved to the new page the
application can get the contextId property from
the cookie and, if it exists, send a startResponse message to the interaction manager.
Listing 4. JavaScript functions for calling the Mozilla SOAP API
// Send a data message: the context ID plus one named payload value
function data(name, value) {
  // create the parameters array
  var params = new Array();
  params[0] = new SOAPParameter(contextId, "context");
  params[1] = new SOAPParameter(value, name);
  callCoffeeMaker("DataOp", params, dataResponse);
}

// Ask the interaction manager for a new context
function newContext() {
  var date = new Date();
  requestId = "coffeemaker" + date.getTime();
  // create the parameters array
  var params = new Array();
  params[0] = new SOAPParameter(requestId, "requestID");
  params[1] = new SOAPParameter("text/xhtml", "media");
  callCoffeeMaker("NewContextOp", params, contextResponse);
}

// Encode the call and invoke it asynchronously; callback receives the result
function callCoffeeMaker(method, params, callback) {
  try {
    netscape.security.PrivilegeManager.enablePrivilege("UniversalBrowserRead");
  }
  catch (e) {
    alert(e);
    return false;
  }
  var soapCall = new SOAPCall();
  soapCall.transportURI = "http://example.com/mycoffeemaker";
  soapCall.encode(0, method, "urn:mycoffeemaker", 0, null, params.length, params);
  soapCall.asyncInvoke(
    function (response, soapcall, error) {
      var r = handleSOAPResponse(response, soapcall, error);
      callback(r);
    }
  );
}
...
function dataResponse(result) {
  if (result) {
    var params = result.getParameters(false, {});
    if (params[1].name == "event") {
      if (params[1].value == "on") {
        document.getElementById("on").checked = true;
      }
      else if (params[1].value == "off") {
        document.getElementById("off").checked = true;
      }
      data("event", "request");
    }
    else if (params[1].name == "start") {
      // getElementsByTagName returns a node list, so read the text of the first match
      var urls = params[1].element.getElementsByTagName("contentURL");
      var url = urls.length > 0 ? urls[0].textContent : "";
      if (url) {
        document.cookie = "contextId=" + contextId;
        location.href = url;
      }
      else {
        startResponse("Failure");
      }
    }
    // element name as declared in the Listing 1 schema
    else if (params[1].name == "coffeeStatus") {
      document.getElementById("box").innerHTML = params[1].value;
      data("event", "request");
    }
    else if (params[1].name != "done") {
      data("event", "request");
    }
  }
}

var contextId; // Global context ID
var requestId; // Global request ID
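Listing 4 stores the context ID in a cookie before navigating, but the code that reads it back on the new page is not shown. The following is a minimal sketch of that step, assuming the cookie was written unencoded as in Listing 4. The readCookie helper is hypothetical; in a browser the first argument would be document.cookie.

```javascript
// Sketch only: parse a cookie string such as "a=1; b=2" and return the value
// stored under the given name, or null when no such cookie exists.
function readCookie(cookieString, name) {
  var pairs = cookieString ? cookieString.split(";") : [];
  for (var i = 0; i < pairs.length; i++) {
    // Browsers separate cookies with "; ", so drop the leading space.
    var pair = pairs[i].replace(/^\s+/, "");
    var eq = pair.indexOf("=");
    if (eq === -1) {
      continue; // not a name=value pair
    }
    if (pair.substring(0, eq) === name) {
      return pair.substring(eq + 1);
    }
  }
  return null;
}

// In the browser: var contextId = readCookie(document.cookie, "contextId");
var restored = readCookie("theme=dark; contextId=ctx-42", "contextId");
console.log(restored);

// If a context ID was found, the new page would go on to send the start
// response described in the text (that function is elided in Listing 4).
```

Matching on the full cookie name, rather than searching for the substring "contextId=", avoids picking up a different cookie whose name merely ends the same way.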
Listing 5 shows the XHTML document for the online coffee service. The
document loads the JavaScript in Listing 4 and then calls the newContext function to initiate interaction with the
Web service. When the user clicks on either the On or Off
radio button, the data function is called to send
the on or off data
event to the Web service.
Listing 5. XHTML for the online coffee service
<?xml version="1.0" encoding="ISO-8859-1" ?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>My Coffee Maker</title>
    <style>
      h2 { background: #ffffff;
           color : #0000a0;
           font-weight: bold;
           font-size : 18pt;
           font-family : Arial }
      body { color : #000000;
             background: #ffffff;
             font-size : 14pt;
             margin: 30px 0px 0px 30px;
             font-family : Comic Sans MS }
      textarea { width: 400px;
                 height: 300px;
                 border: 1px solid #0000a0;
                 padding: 5px;
                 margin: 2px 2px 2px 2px;
      }
    </style>
    <script type="text/javascript" src="coffeemaker.js"/>
    <script type="text/javascript">
      newContext();
    </script>
  </head>
  <body>
    <h2>My Coffee Maker</h2>
    <p></p>
    <form action=".">
      <input name="OnOff" type="radio" id="on"
          value="On" onfocus="data('event','on')"/> On<br/>
      <input name="OnOff" type="radio" id="off"
          value="Off" onfocus="data('event','off')"/> Off<br/>
      <p></p><p></p>
      <p>Status Messages<br/>
        <textarea id="box" name="box"></textarea>
      </p>
    </form>
  </body>
</html>
Figure 1 shows the online coffee service running in Firefox. The screenshot shows the text area updated after Firefox has received and processed a SOAP message from the Web service containing the coffee maker's status.
Figure 1. The coffee service running in Firefox

Because the online coffee service implements the Multimodal
Architecture's life-cycle API, its operations are available to other
modality components besides XHTML. A voice component could send and receive
<data> events, which in turn would contain
the life-cycle <prepare> event for
pre-fetching resources, and a grammar event for loading a new grammar.
Because the voice modality would reside on a server it would support
asynchronous notifications and would not need to send the <data> request event to the interaction
manager.
Modality components in the Multimodal Architecture are loosely coupled,
so they cannot know about each other. This leads to the question of how to
initiate the SIP (Session Initiation Protocol) call to the voice modality
component. One solution would be to use ActiveX components and Java applets
(which are available to the Web browser) to manage the SIP call, as well as
call the start listening and stop listening functions necessary for Push-to-Talk
(PTT). If the SIP component were not accessible from the XHTML, the voice
component would have to initiate the call. In this case the voice component
would have to assume that the client SIP component was listening at the
default SIP port number 5060, which is what is assumed in the sample application.
In this article I concluded my analysis of the W3C Multimodal Architecture by showing how multimodal interaction, as the architecture specifies it, can be implemented today as a Web service. I presented a simple example of an online coffee service, including the WSDL that defined the data types, operations, and service bindings needed by the client. Finally, I showed how a client can call the Web service operations using the Mozilla SOAP API.
In the first article of this series I pointed out several challenges faced by implementations of the W3C Multimodal Architecture. A Web service implementation of the architecture overcomes these challenges by providing a well-defined interface and data exchange format for a distributed application. This is important because the W3C Multimodal Architecture is best used to develop a distributed multimodal application. In addition, a Web service implementation allows for the definition of additional operations for querying the capabilities of multimodal components. It also provides SOAP and the SOAP client API as the multimodal protocol and JavaScript interface. Implementing a distributed multimodal application as a Web service turns out to be an excellent way to leverage the strengths of the W3C Multimodal Architecture and its generic life-cycle API, while also working around some of its weaknesses.
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample code from this article | webservice.zip | 4KB | HTTP |
Learn
- The W3C Multimodal Architecture series (Gerald McCobb, developerWorks, May 2007): A three-part introduction to the emerging architecture for multimodal application development.
- "Multimodal interaction and the mobile Web, Part 1: Multimodal auto-fill" (Gerald McCobb, developerWorks, November 2005): Get started with multimodal application development.
- "Call SOAP Web services with Ajax" (James Snell, developerWorks, October 2005): Learn more about using
XMLHttpRequest to send and receive data in a Web service implementation.
- "Developing RESTful SIP services for a high availability environment" (Yutaka Obuchi and Erik Burckhart, developerWorks, February 2007): Introduces a REST-based design pattern to provide SIP services for an active SIP dialog.
- New to SOA and Web services: A developerWorks resource for learning more about Web services and service-oriented architectures.
- The World Wide Web Consortium (W3C): Home to all the XML language and interface specifications in the W3C specification stack.
- W3C's Multimodal Interaction Activity page: Home of the Multimodal Architecture and Interfaces working draft; the Multimodal Application Developer Feedback working group note; and the EMMA: Extensible MultiModal Annotation markup language working draft.
- XML Web Services at the Mozilla Developer Center: Learn more about Mozilla's support for SOAP and Web services.
- XUL Planet Scriptable Objects: Learn about all the JavaScript object interfaces available to Mozilla developers, including SOAP and
XMLHttpRequest.
Get products and technologies
- IBM developerWorks SCXML page: Get the IBM Modeling and Integration Tools for State Chart XML and try the plug-ins for Rational Software Architect and Mozilla.
Discuss
- developerWorks blogs: Get involved in the developerWorks community.
Gerald McCobb has worked for IBM for over 15 years. He currently works in WebSphere Business Activity Monitor development. He was IBM's representative to the W3C Multimodal Interaction Working Group from 2002 to May 2007.