Level: Intermediate
Edd Dumbill (edd@xml.com), Editor and publisher, xmlhack.com
01 Mar 2003
In the second installment of his quest to make his data available wherever and whenever he wants by using SyncML, Edd Dumbill encounters Wireless Binary XML (WBXML) and examines the minimum functionality required for a SyncML server.
In my
previous column, I introduced my mission to investigate and
deploy SyncML. Increasingly, people are becoming users of multiple
devices, depending on location and occupation. When they travel
or change devices, they want their data to come with them. This is the
central function of the SyncML XML protocol, which is rapidly
becoming a checkbox item on the feature lists of today's mobile
phones.
Last time, I gave a high-level overview of SyncML, and showed what
happened when I captured the first SyncML message sent from my
Ericsson R520m mobile phone to my Web server. The most surprising
thing about this message was that it was encoded not in XML, but in
Wireless Binary XML (WBXML). WBXML is a standard developed by the
WAP Forum, and is intended to provide a space- and CPU-efficient XML
representation.
Many XML developers have never encountered WBXML before, as it is
largely used in proprietary cell phone networks. However,
supporting SyncML requires the ability to handle this
encoding, as well as the straight XML encoding. In this installment, I
give a brief overview of the WBXML encodings, the steps
involved in processing the WBXML encoding of SyncML into XML, and
what is required to go back the other way, from XML to WBXML.
I also introduce the main elements of the SyncML
protocol, to set the stage for creating a SyncML server.
WBXML overview
The topic of binary notations for XML is one of the more
enduring permathreads that have continued through the five or so years of XML
developer discussion. At a high level, opinion is divided into two
camps: The first of these favors a custom encoding, such as
that used by WBXML; the second maintains that compressing
normal XML will achieve similar space savings. To me, the second
approach has always seemed preferable, as it provides the ability
to re-use common and well-known software components.
However, the WAP Forum decided to pursue a custom encoding scheme -- WBXML.
Along with several other technical decisions made by the forum, this has received its share
of criticism over time. (See Resources for links to critiques of these specifications.)
As Figure 1 from the previous column might indicate to you,
WBXML takes a tokenizing approach to encoding XML. The most common
constructs -- such as tags, attributes, and attribute values -- are
reduced to one-byte tokens, with some literal text left in the
clear. WBXML also allows common strings to be reduced to tokens,
with the token table sent as part of the document
preamble.
WBXML implements the equivalent of XML namespaces through code pages.
As only 59 tokens are available for elements in each code page -- 6 bits
of tag identity, with the values 0 through 4 being reserved -- complex
vocabularies need to be multiplexed by organizing token sets into
separate code pages.
Switching code pages is analogous to switching the
default namespace. The WBXML encoding of SyncML uses a code page
for each of the DTDs used in the protocol: SyncML, SyncML Meta
Information, and SyncML Device Information.
Processing WBXML is reasonably simple: It is a matter of reading
the document preamble, selecting a set of appropriate token tables
(specific to the DTD of the application), and then consuming the
tokens of the document. Possible tokens include start/end
element, switch code page, entity, processing instruction,
tokenized string, literal string, various extension tokens, and
opaque data. The last of these, opaque data, is the WBXML equivalent of
XML's CDATA. The extension tokens are used in different ways by
different WBXML applications. The SyncML encoding doesn't use them
at all, but WML uses them for including tokenized body-text strings
from a string table sent in the preamble. It should be obvious by
now that WBXML isn't a general-purpose binary encoding of XML.
Every application requires at least a token table to perform
lookups, and often needs code to interpret the extension tokens
as well.
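The processing loop described above can be sketched in Python. This is a minimal illustration rather than a complete decoder: the `TAGS` and `NAMESPACES` tables are a small illustrative subset of the SyncML token tables (check the values against the SyncML specifications), the function names are my own, and only the switch-page, end, inline-string, and opaque tokens are handled. It assumes UTF-8 text and takes the low six bits of a tag byte as the tag identity, with bit 6 flagging content, as laid out in the WBXML specification.

```python
from xml.sax.saxutils import escape

# Global tokens defined by the WBXML specification (subset).
SWITCH_PAGE, END, STR_I, OPAQUE = 0x00, 0x01, 0x03, 0xC3

# Illustrative subset of the SyncML token tables, keyed by code page.
# Assumption: verify these values against the SyncML specifications.
TAGS = {
    0: {0x2C: "SyncHdr", 0x31: "VerDTD", 0x1A: "Meta"},
    1: {0x0C: "MaxMsgSize"},
}
NAMESPACES = {0: None, 1: "syncml:metinf"}


def read_mb_u_int32(data, pos):
    """Read a multi-byte integer: 7 bits per byte, high bit set means 'more'."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def wbxml_to_xml(data):
    """Convert a WBXML byte string into XML text (tags, strings, opaque only)."""
    pos = 1                                    # skip the version byte
    publicid, pos = read_mb_u_int32(data, pos)
    if publicid == 0:                          # public ID given as string table index
        _, pos = read_mb_u_int32(data, pos)
    _, pos = read_mb_u_int32(data, pos)        # charset (ignored; UTF-8 assumed)
    strtbl_len, pos = read_mb_u_int32(data, pos)
    pos += strtbl_len                          # skip the string table itself

    out, stack, page = [], [], 0
    while pos < len(data):
        token = data[pos]
        pos += 1
        if token == SWITCH_PAGE:
            page = data[pos]
            pos += 1
        elif token == END:
            out.append("</%s>" % stack.pop())
        elif token == STR_I:                   # inline, NUL-terminated string
            end = data.index(0, pos)
            out.append(escape(data[pos:end].decode("utf-8")))
            pos = end + 1
        elif token == OPAQUE:                  # length-prefixed raw bytes
            length, pos = read_mb_u_int32(data, pos)
            out.append(escape(data[pos:pos + length].decode("utf-8")))
            pos += length
        else:                                  # anything else is treated as a tag
            name = TAGS[page][token & 0x3F]    # low six bits: tag identity
            ns = NAMESPACES[page]
            attr = " xmlns='%s'" % ns if ns else ""
            if token & 0x40:                   # content bit: an END token follows later
                out.append("<%s%s>" % (name, attr))
                stack.append(name)
            else:
                out.append("<%s%s/>" % (name, attr))
    return "".join(out)


# Hand-built sample: preamble, then SyncHdr containing VerDTD and Meta/MaxMsgSize.
SAMPLE = (bytes([0x03, 0x01, 0x6A, 0x00, 0x6C, 0x71, 0x03]) + b"1.0\x00"
          + bytes([0x01, 0x5A, 0x00, 0x01, 0x4C, 0x03]) + b"2700\x00"
          + bytes([0x01, 0x00, 0x00, 0x01, 0x01]))
print(wbxml_to_xml(SAMPLE))
```

Entities, processing instructions, literal tags, attributes, and the extension tokens are left out on purpose; a token the tables do not know raises a `KeyError`, which is a reasonable failure mode while exploring captured messages.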
For these SyncML investigations, I propose to convert the incoming
WBXML representation of SyncML into its XML representation, to make
debugging and module separation more convenient. This still leaves the issue of how to go back the other way.
Converting back to WBXML
If you thought interpreting WBXML was awkward, I'm afraid you
will view creating WBXML even less favorably. Again, although there
is a general-purpose algorithm for converting XML into WBXML, this
is complicated by the application-specific utilization of
extensions. To give you a good idea of what is involved, read this
extract from the WAP Forum's WBXML specification
(WAP-192-WBXML-20010725-a Version 1.3 -- see Resources for a link to
this document).
The process of tokenising an XML document MUST convert all
markup and XML syntax (i.e., entities, tags, attributes, etc.) into
their corresponding tokenised format. All comments, the XML
declaration, and the document type declaration MUST be removed.
Processing instructions intended for the tokeniser MAY be removed;
all other processing instructions MUST be preserved. All text and
character entities MUST be converted to string (e.g., STR_I) or
entity (ENTITY) tokens. All character entities in the textual
markup (e.g., &amp;) which can be represented in the target
character encoding MUST be converted to string form when tokenised.
All others (i.e., those which can not be represented in the target
character encoding) MUST be encoded using the ENTITY token. XML
parsed entities (both internal and external) MUST be resolved
before tokenisation. XML notations and unparsed entities are
resolved on an application basis (e.g., using inline opaque data).
Attribute names MUST be converted to an attribute start token
(which, if so defined, will also specify all or part of the
attribute value) or MUST be represented by a single LITERAL token.
Attribute values MUST NOT be encoded using a LITERAL
token.
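To make those rules a little more concrete, here is a minimal Python sketch of the tokenising direction for element-only vocabularies such as SyncML. It is an illustration under stated assumptions, not a conforming tokeniser: the `TOKENS` table is an illustrative subset whose values should be checked against the SyncML specifications, the function names are hypothetical, attributes and mixed content are not handled, and every text node is written as an inline string (STR_I). Python's ElementTree parser already discards comments and resolves character entities, which covers two of the MUSTs above.

```python
import xml.etree.ElementTree as ET

SWITCH_PAGE, END, STR_I = 0x00, 0x01, 0x03   # global WBXML tokens
HAS_CONTENT = 0x40                           # bit 6 of a tag byte

# Illustrative subset: (namespace, local name) -> (code page, tag identity).
# Assumption: verify these values against the SyncML specifications.
TOKENS = {
    ("", "SyncHdr"): (0, 0x2C),
    ("", "VerDTD"): (0, 0x31),
    ("", "Meta"): (0, 0x1A),
    ("syncml:metinf", "MaxMsgSize"): (1, 0x0C),
}


def split_name(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def encode_element(elem, out, page):
    """Append the tokens for elem to out; return the code page now in force."""
    new_page, identity = TOKENS[split_name(elem.tag)]
    if new_page != page:
        out += bytes([SWITCH_PAGE, new_page])
        page = new_page
    text = (elem.text or "").strip()
    if not text and len(elem) == 0:
        out.append(identity)                 # empty element: no content bit, no END
        return page
    out.append(identity | HAS_CONTENT)
    if text:
        out += bytes([STR_I]) + text.encode("utf-8") + b"\x00"
    for child in elem:
        page = encode_element(child, out, page)
    out.append(END)
    return page


def xml_to_wbxml(xml_text):
    """Tokenise an XML string into WBXML bytes."""
    root = ET.fromstring(xml_text)
    # Preamble: version 1.3, unknown public ID, UTF-8 (MIBenum 106), empty string table.
    out = bytearray([0x03, 0x01, 0x6A, 0x00])
    encode_element(root, out, 0)
    return bytes(out)


print(xml_to_wbxml(
    "<SyncHdr><VerDTD>1.0</VerDTD><Meta>"
    "<MaxMsgSize xmlns='syncml:metinf'>2700</MaxMsgSize>"
    "</Meta></SyncHdr>").hex())
```

Note that the code page is switched only when an element from a different page is about to be written; END tokens are global, so no switch is needed just to close elements.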
One significant consideration with the WBXML encoding of SyncML
is the length of each document. Wireless devices have relatively
small amounts of memory, and clearly cannot process responses of
arbitrary length. The most common example of this restriction can
be seen in WML that's used for WAP pages, where each deck of pages
should not exceed approximately 1500 bytes. Obviously, the chunking
of output into appropriate sizes is a matter of negotiation between
the application and the encoding module, as only the application
knows the most appropriate place to break.
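One way to picture that division of labor: the application hands over units it is willing to see split apart, and a packing step groups them under a byte limit. The Python sketch below is only an assumption about how such a step might look; `pack_messages`, its parameters, and the fixed per-message `overhead` are hypothetical names and simplifications, not part of any specification.

```python
def pack_messages(items, limit, overhead=0):
    """Group already-encoded items into batches that each fit within limit bytes.

    items    -- byte strings; each one is a unit the application allows a break after
    limit    -- maximum size in bytes of one outgoing message
    overhead -- assumed fixed cost in bytes of the wrapper around each batch
    """
    budget = limit - overhead
    batches, current, used = [], [], 0
    for item in items:
        if len(item) > budget:
            raise ValueError("item of %d bytes cannot fit in any message" % len(item))
        if used + len(item) > budget:        # start a new message at an item boundary
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += len(item)
    if current:
        batches.append(current)
    return batches


# Four items packed under a 1500-byte limit with 100 bytes of assumed overhead.
items = [b"a" * 600, b"b" * 700, b"c" * 500, b"d" * 300]
print([len(batch) for batch in pack_messages(items, 1500, overhead=100)])
```

The greedy strategy never reorders items, so whatever ordering the application considers meaningful is preserved across messages.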
Rather than leaving it to guesswork, SyncML allows for each
device to indicate the size of the message it can handle. The Meta
Information DTD's MaxMsgSize element is used for this
purpose. For example, look at the extract in Listing 1, taken
from the example.xml file in the accompanying download
(see Resources).
Listing 1. Header information from a SyncML payload, showing meta information
<SyncHdr>
  <VerDTD>1.0</VerDTD>
  <VerProto>SyncML/1.0</VerProto>
  <SessionID>10</SessionID>
  <MsgID>1</MsgID>
  <Target><LocURI>sync.example.com</LocURI></Target>
  <Source><LocURI>520327511080721</LocURI></Source>
  <Cred>
    <Meta>
      <Format xmlns='syncml:metinf'>b64</Format>
      <Type xmlns='syncml:metinf'>syncml:auth-basic</Type>
    </Meta>
    <Data>ZDpk</Data>
  </Cred>
  <Meta>
    <MaxMsgSize xmlns='syncml:metinf'>2700</MaxMsgSize>
  </Meta>
</SyncHdr>
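Once the header is in XML form, pulling out the advertised message size is straightforward. The following Python sketch reads the values from Listing 1 using the standard library's ElementTree; the function name `read_header` and the shape of the returned dictionary are my own choices. It also decodes the credential, on the assumption that a `b64` format with the `syncml:auth-basic` type carries a base64-encoded `userid:password` pair.

```python
import base64
import xml.etree.ElementTree as ET

METINF = "{syncml:metinf}"   # ElementTree's notation for the Meta Information namespace

# The header from Listing 1.
HEADER = """<SyncHdr><VerDTD>1.0</VerDTD>
<VerProto>SyncML/1.0</VerProto>
<SessionID>10</SessionID>
<MsgID>1</MsgID>
<Target><LocURI>sync.example.com</LocURI></Target>
<Source><LocURI>520327511080721</LocURI></Source>
<Cred><Meta><Format xmlns='syncml:metinf'>b64</Format>
<Type xmlns='syncml:metinf'>syncml:auth-basic</Type>
</Meta>
<Data>ZDpk</Data>
</Cred>
<Meta><MaxMsgSize xmlns='syncml:metinf'>2700</MaxMsgSize>
</Meta>
</SyncHdr>"""


def read_header(xml_text):
    """Extract the source URI, maximum message size, and credentials from a SyncHdr."""
    hdr = ET.fromstring(xml_text)
    # "Meta/..." matches only the Meta that is a direct child of SyncHdr,
    # not the one nested inside Cred.
    max_size = hdr.findtext("Meta/" + METINF + "MaxMsgSize")
    credentials = None
    cred = hdr.find("Cred")
    if cred is not None and cred.findtext("Meta/" + METINF + "Format") == "b64":
        credentials = base64.b64decode(cred.findtext("Data")).decode("utf-8")
    return {
        "source": hdr.findtext("Source/LocURI"),
        "max_msg_size": int(max_size) if max_size else None,
        "credentials": credentials,
    }


print(read_header(HEADER))
```

A server would keep `max_msg_size` with the session state and consult it whenever it builds a response for this device.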
Basic SyncML server requirements
That's enough bits and bytes for now. Let's conclude by looking
at the basic features that a SyncML server is required to implement to provide useful data synchronization functionality.
At a minimum, the server must be able to understand the basic SyncML
vocabularies. Additionally, it must support the vCard,
vCalendar, vTodo, and RFC2822/RFC2045 specifications if it
implements contacts, calendars, tasks, and e-mail respectively. (See
Resources for links to these specifications.)
However, a server is not required to implement all of the SyncML
protocol's functionality. Full details of conformance requirements
can be found in section 7 of the SyncML Representation Protocol
specification (see Resources for a link to the SyncML
specifications). Table 1 describes the semantics
of basic SyncML operations, and summarizes basic
functionality.
Table 1: Description of minimum server commands for SyncML
| Command | Description in the context of a SyncML server |
|---|---|
| Add | Used to indicate to the server new additions made in the client's database (for example, a new entry in the phone book). |
| Alert | Used to carry notifications to the server. These are requests for synchronization that carry data about the state of the client's database. Refer to the Alerts with CmdID 2 and 3 in example.xml to see requests for synchronization of calendar and phone book. The associated code in the Data element specifies the type of request, in this case 201, which means "Slow Synchronization". A full list of codes can be found in the "Errata to SyncML Sync Representation" specification (see Resources). |
| Copy | Requests the creation of a copy of an item in a new location on the recipient's database. |
| Delete | Requests the permanent removal of an item from the server's database. |
| Get | Explicitly requests the retrieval of a data item with the requested URI from the server's database. Is used as a one-shot command outside of device synchronization. |
| Map | Used to maintain a map that correlates local resource identifiers to remote ones. For instance, an item on a phone might have a 2-byte identifier, while a 16-character string is used on the server as the same item's ID. |
| Put | Used to upload a data item to the server to the specified URI. For instance, in example.xml see the Put with CmdID 1. This requests the server to store the phone's capabilities (encoded using the SyncML Device Information DTD) at the relative URI ./devinf10. Put is used outside of device synchronization. |
| Replace | Requests the replacement of a specified object as part of synchronization. |
| Results | Used to carry the objects returned as a result of a request such as Get. |
| Status | Used to return status codes associated with requests. |
| Sync | Used to wrap a selection of commands (such as Add, Replace, and Delete) forming a synchronization. |
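As a small illustration of how a server might begin to act on these commands, the Python sketch below walks a message body and reports each Alert it finds. The sample body is hypothetical: it is modelled on the two Alerts described in Table 1, but the element layout and the `./calendar` and `./phonebook` URIs are assumptions of mine rather than a copy of example.xml. Only the 201 code mentioned in the table is given a name.

```python
import xml.etree.ElementTree as ET

# Only the code discussed in Table 1 is named here; see the SyncML errata for the rest.
ALERT_CODES = {"201": "Slow Synchronization"}

# Hypothetical body modelled on the Alerts with CmdID 2 and 3 described in Table 1.
SAMPLE_BODY = """<SyncBody>
  <Alert><CmdID>2</CmdID><Data>201</Data>
    <Item><Target><LocURI>./calendar</LocURI></Target></Item>
  </Alert>
  <Alert><CmdID>3</CmdID><Data>201</Data>
    <Item><Target><LocURI>./phonebook</LocURI></Target></Item>
  </Alert>
</SyncBody>"""


def list_alerts(xml_text):
    """Return (CmdID, target URI, meaning) for each Alert that is a child of the body."""
    body = ET.fromstring(xml_text)
    alerts = []
    for alert in body.findall("Alert"):
        code = alert.findtext("Data")
        meaning = ALERT_CODES.get(code, "unrecognized code %s" % code)
        alerts.append((alert.findtext("CmdID"),
                       alert.findtext("Item/Target/LocURI"),
                       meaning))
    return alerts


for cmd_id, target, meaning in list_alerts(SAMPLE_BODY):
    print(cmd_id, target, meaning)
```

The same pattern, one small function per command name, extends naturally to the other rows of the table, with a Status returned for each command processed.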
The basic requirements for a SyncML client are similar to those
for a server. I'll explore these further as I get deeper into
implementing the protocol itself in future installments of XML Watch.
SyncML employs the semantics of URIs to indicate
items on the local and remote databases. This means that a
file system would serve as a reasonable substrate for a
synchronization database. With this in mind, the next installment will
focus on the construction of a basic server that is able to use
either WBXML or XML-encoded SyncML.
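As a taste of what a file system substrate implies, here is a minimal Python sketch that maps a relative URI such as the ./devinf10 seen in Table 1 onto a location under a per-device directory. The function name, the directory layout, and the decision to reject absolute URIs and ".." segments are all assumptions of mine, not anything the SyncML specifications require.

```python
from pathlib import PurePosixPath


def uri_to_path(root, loc_uri):
    """Map a relative LocURI such as './devinf10' to a path below root.

    Absolute URIs and '..' segments are rejected so that a client cannot
    address anything outside its own database directory.
    """
    parts = []
    for part in PurePosixPath(loc_uri).parts:   # '.' segments are dropped by pathlib
        if part in ("/", ".."):
            raise ValueError("URI escapes the database root: %r" % loc_uri)
        parts.append(part)
    return PurePosixPath(root).joinpath(*parts)


# Hypothetical per-device root directory on the server.
print(uri_to_path("/var/sync/520327511080721", "./devinf10"))
```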
Resources
- Review the previous installment of XML Watch, which is an overview of SyncML. The accompanying download contains the file example.xml referred to in Listing 1 and Table 1.
- Find the formal specification of WBXML on the WAP Forum's list of specifications.
- For a collection of critiques on the WAP Forum's protocols, see "The Harm of the Wireless Application Protocol". In particular, Rohit Khare's "W* Effect Considered Harmful" is worth reading.
- Read Bilal Siddiqui's excellent overview of the WBXML encoding process, including a SyncML example, in "Compressing XML -- Part 1, Writing WBXML".
- Try Robin Cover's SyncML page for a great collection of SyncML-related articles.
- Visit the SyncML Web site, which provides a home for the SyncML specifications.
- SyncML applications often incorporate support for the following personal information related specifications: vCard (2.1, 3.0), vCalendar/vTodo (1.0), iCalendar, e-mail, and MIME (RFCs 822, 2822, and 2045).
- Read all of Edd Dumbill's previous XML Watch columns.
- Find more XML resources on the developerWorks XML zone, and check out the Wireless zone for more information on mobile technologies.
- IBM WebSphere Studio provides a suite of tools that automate XML development, both in Java and in other languages. It is closely integrated with the WebSphere Application Server, but can also be used with other J2EE servers.
- Find out how you can become an IBM Certified Developer in XML and related technologies.