In my previous column, I introduced my mission to investigate and deploy SyncML. Increasingly, people are becoming users of multiple devices, depending on location and occupation. When they travel or change devices, they want their data to come with them. This is the central function of the SyncML XML protocol, which is rapidly becoming a checkbox item on the feature lists of today's mobile phones.
Last time, I gave a high-level overview of SyncML, and showed what happened when I captured the first SyncML message sent from my Ericsson R520m mobile phone to my Web server. The most surprising thing about this message was that it was encoded not in XML, but in Wireless Binary XML (WBXML). WBXML is a standard developed by the WAP Forum, and is intended to provide a space- and CPU-efficient XML representation.
Many XML developers have never encountered WBXML before, as it is largely used in proprietary cell phone networks. However, supporting SyncML requires the ability to handle this encoding, as well as the straight XML encoding. In this installment, I give a brief overview of the WBXML encodings, the steps involved in processing the WBXML encoding of SyncML into XML, and what is required to go back the other way, from XML to WBXML. I also introduce the main elements of the SyncML protocol, to set the stage for creating a SyncML server.
The topic of binary notations for XML is one of the more enduring permathreads that have continued through the five or so years of XML developer discussion. At a high level, opinion is divided into two camps: The first of these favors a custom encoding, such as that used by WBXML; the second maintains that compressing normal XML will achieve similar space savings. To me, the second approach has always seemed preferable, as it provides the ability to re-use common and well-known software components.
However, the WAP Forum decided to pursue a custom encoding scheme -- WBXML. Along with several other technical decisions made by the forum, this has received its share of criticism over time. (See Resources for links to critiques of these specifications.)
As Figure 1 from the previous column might indicate to you, WBXML takes a tokenizing approach to encoding XML. The most common constructs -- such as tags, attributes, and attribute values -- are reduced to one-byte tokens, with some literal text left in the clear. WBXML also allows common strings to be replaced by references into a string table, which is sent as part of the document preamble.
WBXML implements the equivalent of XML namespaces through code pages. A tag token's identity occupies only the low six bits of its byte (the top two bits flag whether the element has attributes and content), and the values 0 through 4 are reserved for global tokens, so only 59 element tokens are available per code page. Complex vocabularies therefore need to be multiplexed by organizing token sets into separate code pages. Switching code pages is analogous to switching the default namespace. The WBXML encoding of SyncML uses a code page for each of the DTDs used in the protocol: SyncML, SyncML Meta Information, and SyncML Device Information.
Processing WBXML is reasonably simple: It is a matter of reading the document preamble, selecting a set of appropriate token tables (specific to the DTD of the application), and then consuming the tokens of the document. Possible tokens include start/end element, switch code page, entity, processing instruction, tokenized string, literal string, various extension tokens, and opaque data. Opaque data is the rough WBXML equivalent of XML's CDATA. The extension tokens are used in different ways by different WBXML applications: The SyncML encoding doesn't use them at all, but WML uses them for including tokenized body-text strings from a string table sent in the preamble. It should be obvious by now that WBXML isn't a general-purpose binary encoding of XML. Every application requires at least a token table to perform lookups, and often needs code to interpret the extension tokens as well.
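To make that decoding loop concrete, here is a minimal Python sketch of a WBXML-to-XML converter. It handles only what an attribute-free vocabulary like SyncML needs (tags with and without content, inline and string-table strings, opaque data, entities, and code-page switches) and rejects everything else. The global token values come from the WBXML specification, but the tag table is an assumed subset of SyncML code page 0 that should be checked against the SyncML WBXML specification; the function names are hypothetical, and namespace declarations for code pages other than 0 are not emitted.

```python
import io
from xml.sax.saxutils import escape

# Global tokens from the WBXML specification (only those this sketch handles).
SWITCH_PAGE, END, ENTITY, STR_I = 0x00, 0x01, 0x02, 0x03
STR_T, OPAQUE = 0x83, 0xC3

# Assumed subset of SyncML code page 0 (tag identity -> element name).
# Verify these values against the SyncML WBXML specification before use.
TAG_TABLES = {
    0: {0x2B: "SyncBody", 0x2C: "SyncHdr", 0x2D: "SyncML",
        0x31: "VerDTD", 0x32: "VerProto"},
}


def read_mb_u_int32(buf: io.BytesIO) -> int:
    """Read a WBXML multi-byte integer: 7 bits per byte, high bit means 'more'."""
    value = 0
    while True:
        byte = buf.read(1)[0]
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value


def read_inline_string(buf: io.BytesIO) -> str:
    """Read a NUL-terminated STR_I payload (assumes the document is UTF-8)."""
    raw = bytearray()
    while (byte := buf.read(1)[0]) != 0:
        raw.append(byte)
    return raw.decode("utf-8")


def wbxml_to_xml(data: bytes) -> str:
    buf = io.BytesIO(data)
    buf.read(1)                            # version byte, e.g. 0x02 for WBXML 1.2
    if read_mb_u_int32(buf) == 0:          # publicid 0: a string-table index follows
        read_mb_u_int32(buf)
    read_mb_u_int32(buf)                   # charset MIBenum; UTF-8 (106) is assumed
    strtbl = buf.read(read_mb_u_int32(buf))
    page, open_tags, out = 0, [], []
    while tok := buf.read(1):
        t = tok[0]
        if t == SWITCH_PAGE:
            page = buf.read(1)[0]
        elif t == END:
            out.append(f"</{open_tags.pop()}>")
        elif t == STR_I:
            out.append(escape(read_inline_string(buf)))
        elif t == STR_T:                   # offset into the preamble's string table
            offset = read_mb_u_int32(buf)
            end = strtbl.index(0, offset)
            out.append(escape(strtbl[offset:end].decode("utf-8")))
        elif t == OPAQUE:                  # length-prefixed opaque bytes
            length = read_mb_u_int32(buf)
            out.append(escape(buf.read(length).decode("utf-8", "replace")))
        elif t == ENTITY:                  # numeric character entity
            out.append(f"&#{read_mb_u_int32(buf)};")
        elif (t & 0x3F) <= 0x04:           # LITERAL, PI, EXT_* and other globals
            raise NotImplementedError(f"global token 0x{t:02X} not handled")
        elif t & 0x80:                     # bit 7: attributes follow (unused by SyncML)
            raise NotImplementedError("attributes are not handled")
        else:
            name = TAG_TABLES[page][t & 0x3F]    # low six bits: tag identity
            if t & 0x40:                   # bit 6: element has content, END follows
                out.append(f"<{name}>")
                open_tags.append(name)
            else:
                out.append(f"<{name}/>")
    return "".join(out)
```

For example, the byte sequence 02 01 6A 00 6D 6C 71 03 "1.0" 00 01 01 2B 01 decodes to `<SyncML><SyncHdr><VerDTD>1.0</VerDTD></SyncHdr><SyncBody/></SyncML>`. Keeping a stack of open element names is what lets the single END token close whichever element is innermost.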
For these SyncML investigations, I propose to convert the incoming WBXML representation of SyncML into its XML representation, to make debugging and module separation more convenient. This still leaves the issue of how to go back the other way.
If you thought interpreting WBXML was awkward, I'm afraid you will view creating WBXML even less favorably. Again, although there is a general-purpose algorithm for converting XML into WBXML, this is complicated by the application-specific utilization of extensions. To give you a good idea of what is involved, read this extract from the WAP Forum's WBXML specification (WAP-192-WBXML-20010725-a Version 1.3 -- see Resources for a link to this document).
The process of tokenising an XML document MUST convert all markup and XML syntax (i.e., entities, tags, attributes, etc.) into their corresponding tokenised format. All comments, the XML declaration, and the document type declaration MUST be removed. Processing instructions intended for the tokeniser MAY be removed; all other processing instructions MUST be preserved. All text and character entities MUST be converted to string (e.g., STR_I) or entity (ENTITY) tokens. All character entities in the textual markup (e.g., &amp;) which can be represented in the target character encoding MUST be converted to string form when tokenised. All others (i.e., those which can not be represented in the target character encoding) MUST be encoded using the ENTITY token. XML parsed entities (both internal and external) MUST be resolved before tokenisation. XML notations and unparsed entities are resolved on an application basis (e.g., using inline opaque data). Attribute names MUST be converted to an attribute start token (which, if so defined, will also specify all or part of the attribute value) or MUST be represented by a single LITERAL token. Attribute values MUST NOT be encoded using a LITERAL token.
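For the attribute-free SyncML case, a minimal Python sketch of these tokenizing rules might look like the following. Python's ElementTree parser already discards comments, the XML declaration, and the document type declaration, and resolves character references, which covers several of the MUST clauses; however, it also silently drops processing instructions, which the specification generally requires to be preserved, and this sketch ignores code pages, so namespaced Meta Information elements are rejected. The token table and function names are hypothetical, and stripping whitespace-only text is a policy choice for data-oriented SyncML, not a specification requirement.

```python
import xml.etree.ElementTree as ET

END, STR_I = 0x01, 0x03

# Hypothetical token table (element name -> tag identity) for SyncML code page 0;
# check the values against the SyncML WBXML specification.
TOKENS = {"SyncML": 0x2D, "SyncHdr": 0x2C, "SyncBody": 0x2B,
          "VerDTD": 0x31, "VerProto": 0x32}


def _emit_text(text, out: bytearray) -> None:
    """Emit non-blank text as an inline string (STR_I), NUL-terminated UTF-8."""
    text = (text or "").strip()    # policy: drop whitespace-only text between elements
    if text:
        out.append(STR_I)
        out += text.encode("utf-8") + b"\x00"


def _encode(elem: ET.Element, out: bytearray) -> None:
    if elem.attrib:
        raise NotImplementedError("attribute tokens are not handled in this sketch")
    token = TOKENS[elem.tag]       # KeyError for unknown or namespaced elements
    has_content = bool((elem.text or "").strip()) or len(elem) > 0
    out.append(token | 0x40 if has_content else token)   # bit 6 marks content
    _emit_text(elem.text, out)
    for child in elem:
        _encode(child, out)
        _emit_text(child.tail, out)
    if has_content:
        out.append(END)


def xml_to_wbxml(xml_text: str) -> bytes:
    # The default parser drops comments, the XML declaration, the DOCTYPE and
    # processing instructions, and resolves character references.
    root = ET.fromstring(xml_text)
    # Preamble: WBXML 1.2, publicid 1 (unknown), charset 106 (UTF-8), empty string table.
    out = bytearray([0x02, 0x01, 0x6A, 0x00])
    _encode(root, out)
    return bytes(out)
```

A production encoder would also need SWITCH_PAGE handling for the Meta Information namespace and, if desired, a string table for repeated text.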
One significant consideration with the WBXML encoding of SyncML is the length of each document. Wireless devices have relatively small amounts of memory, and clearly cannot process responses of arbitrary length. The most common example of this restriction can be seen in WML that's used for WAP pages, where each deck of pages should not exceed approximately 1500 bytes. Obviously, the chunking of output into appropriate sizes is a matter of negotiation between the application and the encoding module, as only the application knows the most appropriate place to break.
Rather than leaving it to guesswork, SyncML allows each device to indicate the size of message it can handle. The Meta Information DTD's MaxMsgSize element is used for this purpose. For example, look at the extract in Listing 1, taken from the example.xml file in the accompanying download (see Resources).
Listing 1. Header information from a SyncML payload, showing meta information
<SyncHdr>
  <VerDTD>1.0</VerDTD>
  <VerProto>SyncML/1.0</VerProto>
  <SessionID>10</SessionID>
  <MsgID>1</MsgID>
  <Target><LocURI>sync.example.com</LocURI></Target>
  <Source><LocURI>520327511080721</LocURI></Source>
  <Cred>
    <Meta>
      <Format xmlns='syncml:metinf'>b64</Format>
      <Type xmlns='syncml:metinf'>syncml:auth-basic</Type>
    </Meta>
    <Data>ZDpk</Data>
  </Cred>
  <Meta><MaxMsgSize xmlns='syncml:metinf'>2700</MaxMsgSize></Meta>
</SyncHdr>
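On the server side, honoring MaxMsgSize comes down to reading the value from the incoming header and packing outgoing commands so that each message stays under it. The Python sketch below assumes commands have already been encoded to bytes and that the per-message overhead (header and wrapper elements) is known in advance; both helper names and the greedy packing strategy are illustrative choices, not something the SyncML specification prescribes. A single command larger than the budget would have to be split across messages, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

METINF = "{syncml:metinf}"   # namespace of the Meta Information DTD elements


def max_msg_size(header_xml: str) -> Optional[int]:
    """Return the first MaxMsgSize value found in a SyncML header, if any."""
    root = ET.fromstring(header_xml)
    for node in root.iter(METINF + "MaxMsgSize"):
        return int(node.text.strip())
    return None


def pack_commands(commands: List[bytes], limit: int, overhead: int) -> List[List[bytes]]:
    """Greedily group encoded commands so that each message fits within limit.

    overhead is the assumed per-message cost of the header and wrapper elements.
    """
    budget = limit - overhead
    messages: List[List[bytes]] = []
    current: List[bytes] = []
    used = 0
    for cmd in commands:
        if len(cmd) > budget:
            raise ValueError("command exceeds the message budget; splitting is not handled")
        if used + len(cmd) > budget:    # close the current message and start a new one
            messages.append(current)
            current, used = [], 0
        current.append(cmd)
        used += len(cmd)
    if current:
        messages.append(current)
    return messages
```

Given the header in Listing 1, `max_msg_size` returns 2700; packing five 1000-byte commands with an assumed 200-byte overhead then yields messages of two, two, and one command.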
Basic SyncML server requirements
That's enough bits and bytes for now. Let's conclude by looking at the basic features that a SyncML server is required to implement to provide useful data synchronization functionality.
At a minimum, the server must be able to understand the basic SyncML vocabularies. Additionally, it must support the vCard, vCalendar, vTodo, and RFC2822/RFC2045 specifications if it implements contacts, calendars, tasks, and e-mail respectively. (See Resources for links to these specifications.)
However, a server is not required to implement all of the SyncML protocol's functionality. Full details of conformance requirements can be found in section 7 of the SyncML Representation Protocol specification (see Resources for a link to the SyncML specifications). Table 1 describes the semantics of basic SyncML operations, and summarizes basic functionality.
Table 1: Description of minimum server commands for SyncML
| Command | Description in the context of a SyncML server |
|---|---|
| Add | Used to indicate to the server new additions made in the client's database (for example, a new entry in the phone book). |
| Alert | Used to carry notifications to the server. These are requests for synchronization that carry data about the state of the client's database. Refer to the Alerts with CmdID 2 and 3 in example.xml to see requests for synchronization of calendar and phone book. The associated code in the Data element specifies the type of request, in this case 201, which means "Slow Synchronization". A full list of codes can be found in the "Errata to SyncML Sync Representation" specification (see Resources). |
| Copy | Requests the creation of a copy of an item in a new location on the recipient's database. |
| Delete | Requests the permanent removal of an item from the server's database. |
| Get | Explicitly requests the retrieval of a data item with the requested URI from the server's database. It is used as a one-shot command outside of device synchronization. |
| Map | Used to maintain a map that correlates local resource identifiers to remote ones. For instance, an item on a phone might have a 2-byte identifier, while a 16-character string is used on the server as the same item's ID. |
| Put | Used to upload a data item to the specified URI on the server. For instance, in example.xml see the Put with CmdID 1, which requests that the server store the phone's capabilities (encoded using the SyncML Device Information DTD) at the relative URI ./devinf10. Put is used outside of device synchronization. |
| Replace | Requests the replacement of a specified object as part of synchronization. |
| Results | Used to carry the objects returned as a result of a request such as Get. |
| Status | Used to return status codes associated with requests. |
| Sync | Used to wrap a selection of commands (such as Add, Replace, and Delete) forming a synchronization. |
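A server's message handler is essentially a dispatcher over the children of SyncBody. The following Python sketch handles two of the commands in Table 1: It names the sync type requested by an Alert, and records the identifier pairs carried by a Map. The element paths are assumed from the SyncML 1.0 representation, the alert-code names cover only codes 200 to 205, and the function name, the `luid_to_guid` dictionary, and the `./Calendar` URI in the example are hypothetical; every other command is simply logged as unhandled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Sync types requested via the Alert command's Data element.
ALERT_CODES = {
    200: "two-way sync",
    201: "slow sync",
    202: "one-way sync from client only",
    203: "refresh sync from client only",
    204: "one-way sync from server only",
    205: "refresh sync from server only",
}


def handle_sync_body(body: ET.Element,
                     luid_to_guid: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Dispatch each SyncBody command; return (CmdID, command, note) tuples."""
    log = []
    for cmd in body:
        if cmd.tag == "Final":             # end-of-package marker, not a command
            continue
        cmd_id = cmd.findtext("CmdID", default="?")
        if cmd.tag == "Alert":
            code = int(cmd.findtext("Data"))
            target = cmd.findtext("Item/Target/LocURI")
            label = ALERT_CODES.get(code, "code %d" % code)
            log.append((cmd_id, "Alert", f"{target}: {label}"))
        elif cmd.tag == "Map":
            # Assumed layout: Source holds the client's local ID,
            # Target the server's ID for the same item.
            items = cmd.findall("MapItem")
            for item in items:
                luid_to_guid[item.findtext("Source/LocURI")] = item.findtext("Target/LocURI")
            log.append((cmd_id, "Map", f"{len(items)} item(s)"))
        else:
            log.append((cmd_id, cmd.tag, "not handled in this sketch"))
    return log
```

Keeping the LUID-to-GUID table outside the handler reflects the fact that mappings must outlive a single message, since later Replace and Delete commands from the client refer to items by their local IDs.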
The basic requirements for a SyncML client are similar to those for a server. I'll explore these further as I get deeper into implementing the protocol itself in future installments of XML Watch.
SyncML employs the semantics of URIs to indicate items on the local and remote databases. This means that a file system would serve as a reasonable substrate for a synchronization database. With this in mind, the next installment will focus on the construction of a basic server that is able to use either WBXML or XML-encoded SyncML.
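As a rough illustration of the file-system idea, the sketch below maps a relative SyncML LocURI such as ./devinf10 onto a path under a store directory, and implements Put- and Get-style storage on top of it. The function names and the rule that URIs must stay inside the store root are assumptions of this sketch, which also leaves out everything a real synchronization database needs, such as change tracking.

```python
import tempfile
from pathlib import Path


def uri_to_path(root: Path, loc_uri: str) -> Path:
    """Map a relative SyncML LocURI such as './devinf10' to a file under root."""
    root = root.resolve()
    relative = loc_uri[2:] if loc_uri.startswith("./") else loc_uri
    if not relative or relative.startswith("/") or "://" in relative:
        raise ValueError(f"not a relative SyncML URI: {loc_uri!r}")
    candidate = (root / relative).resolve()
    if root not in candidate.parents:      # rejects '..' tricks that escape the store
        raise ValueError(f"URI escapes the store root: {loc_uri!r}")
    return candidate


def put_item(root: Path, loc_uri: str, data: bytes) -> None:
    """Store data at loc_uri, creating intermediate directories as needed."""
    path = uri_to_path(root, loc_uri)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)


def get_item(root: Path, loc_uri: str) -> bytes:
    """Return the bytes previously stored at loc_uri."""
    return uri_to_path(root, loc_uri).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store:
        put_item(Path(store), "./devinf10", b"<DevInf/>")
        print(get_item(Path(store), "./devinf10"))   # b'<DevInf/>'
```

Resolving the path and checking it against the store root matters because LocURIs arrive from the network; without that check, a URI containing `..` could read or overwrite files outside the synchronization store.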
Resources
- Review the previous installment of XML Watch, which is an overview of SyncML. The accompanying download contains the file example.xml referred to in Listing 1 and Table 1.
- Find the formal specification of WBXML on the WAP Forum's list of specifications.
- See Rohit Khare's critique of the WAP Forum's protocols, "W* Effect Considered Harmful."
- Read Bilal Siddiqui's excellent overview of the WBXML encoding process, including a SyncML example, in "Compressing XML -- Part 1, Writing WBXML".
- Try Robin Cover's SyncML page for a great collection of SyncML-related articles.
- Visit the SyncML Web site, which provides a home for the SyncML specifications.
- SyncML applications often incorporate support for the following personal information related specifications: vCard (2.1, 3.0), vCalendar/vTodo (1.0), iCalendar, e-mail, and MIME (RFCs 822, 2822, and 2045).
- Read all of Edd Dumbill's previous XML Watch columns.
- Find more XML resources on the developerWorks XML zone.
- Rational Application Developer for WebSphere Software helps Java™ developers rapidly design, develop, assemble, test, profile and deploy high quality Java/J2EE, Portal, Web, Web services and SOA applications.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
Edd Dumbill is managing editor of XML.com and the editor and publisher of the XML developer news site XMLhack. He is co-author of O'Reilly's Programming Web Services with XML-RPC, and co-founder and adviser to the Pharmalicensing life sciences intellectual property exchange. Edd was also program chair of the XML Europe 2002 conference. You can contact Edd at edd@xml.com.