In this section, you'll look at XMPP, its origins, and why it is a suitable protocol for real-time web communications. You will examine the components of an XMPP communication setting, and review some examples of how these can be used.
XMPP is an XML-based set of technologies for real-time applications. It was originally developed as a framework to support instant messaging and presence applications within enterprise environments. At the time, the existing instant-messaging networks were proprietary and largely inappropriate for enterprise use. AOL Instant Messenger, for example, could not be adapted for secure communications inside a company. Although commercial solutions existed, their fixed feature sets typically could not be adapted to meet an organization's needs. XMPP, then called Jabber, allowed organizations to build their own custom tools to facilitate real-time communication, as well as install ready-made third-party solutions.
XMPP is a decentralized communication network, which means that any XMPP user can message any other XMPP user, network infrastructure permitting. XMPP servers can also talk to one another with a specialized server-to-server protocol, providing interesting potential for decentralized social networks and collaboration frameworks, although that is beyond the scope of this tutorial.
As its full name, Extensible Messaging and Presence Protocol, suggests, XMPP is extensible: it can be used to meet a wide variety of time-sensitive feature requirements. Indeed, Google Wave, a large multiuser collaboration environment, uses XMPP as the basis for its federation protocol. Although XMPP emerged from a requirement for person-to-person instant messaging, it is by no means limited to that task.
To facilitate message delivery, each XMPP client user must have a universally unique identifier. For legacy reasons, these are referred to as Jabber IDs, or JIDs. Because of the protocol's distributed nature, it's important that a JID contain all the information required to contact a user: no central repository links users to the servers they connect to. The structure of a JID is similar to an email address (although there is no requirement for a JID to double as a valid email recipient).
Both client and server nodes, which I refer to collectively as XMPP entities, have JIDs. John Doe, an employee at SomeCorp, might have the JID John.Doe@somecorp.com. Here, somecorp.com is the address of SomeCorp's XMPP server, and John.Doe is the username for John Doe.
JIDs can also have a resource attached to them. A resource allows addressing granularity beyond an identifier for an XMPP entity: whereas John.Doe@somecorp.com can represent John Doe in totality, John.Doe@somecorp.com/Work might be used to send data to his work-related tools.
These resources can take any user-defined name, and an XMPP entity can have any number of resources. As well as being context-dependent, they can be bound to a device, tool, or workstation. For your Pingstream example, each visitor to the website will log into the XMPP server as the same user but have different resources.
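Because a JID carries everything needed to reach an entity, a client can split it mechanically into its parts. The following Python sketch does that for the username@server/resource form described above. `JID` and `parse_jid` are hypothetical names, and the sketch skips the normalization and length limits that the XMPP specifications impose, so treat it as an illustration rather than a validator.

```python
from typing import NamedTuple, Optional


class JID(NamedTuple):
    """A Jabber ID split into username, server domain, and resource (hypothetical type)."""
    local: Optional[str]     # username, e.g. "John.Doe"; absent for server JIDs
    domain: str              # XMPP server address, e.g. "somecorp.com"
    resource: Optional[str]  # e.g. "Work"; None when addressing the whole entity

    @property
    def bare(self) -> str:
        """The JID without its resource: the entity in totality."""
        return f"{self.local}@{self.domain}" if self.local else self.domain


def parse_jid(text: str) -> JID:
    """Split 'local@domain/resource' into parts (hypothetical helper).

    Simplified sketch: no normalization and no length checks.
    """
    # The resource is everything after the first '/', and may itself contain '/' or '@'.
    bare, slash, resource = text.partition("/")
    # The username, if present, is everything before the '@' in the bare part.
    local, at, domain = bare.rpartition("@")
    if not domain or (at and not local) or (slash and not resource):
        raise ValueError(f"malformed JID: {text!r}")
    return JID(local or None, domain, resource or None)


if __name__ == "__main__":
    jid = parse_jid("John.Doe@somecorp.com/Work")
    print(jid.local, jid.domain, jid.resource)  # John.Doe somecorp.com Work
    print(jid.bare)                             # John.Doe@somecorp.com
```

In the Pingstream case, every visitor's JID would share the same bare part and differ only in the resource.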
A real-time messaging system using XMPP comprises three broad categories of communication:
- Messaging, wherein data is transferred between parties
- Presence, which allows users to broadcast their online status and availability
- Info/query requests, which allow an XMPP entity to make a request and receive a reply from another entity
These are complementary. For example, there's often no point in sending data to a user, or making an info/query request of an entity, if the user or entity is offline (although in many use cases it is desirable for the server to hold messages for users until they return). Each of them is delivered as a complete XML stanza: a discrete item of information expressed in XML.
XMPP stanzas of all three types have the following attributes in common:
- from: The JID of the source XMPP entity
- to: The JID of the intended recipient
- id: An optional identifier, used, for example, to match a reply to the request that prompted it
- type: An optional subtype of the stanza
- xml:lang: If the content is human-readable, the language it is written in
Data transfer over XMPP takes place over XML streams, operating on port 5222 by default. These are effectively two complete XML documents, one for each direction of communication. To begin a session, each side opens a stream element, which envelops the entire communication document. Stanzas are then added as children of the stream element, at the second level of the document. Finally, once communication ends, the stream element is closed, completing the document.
For example, Listing 1 shows the stream element a client sends to open communication with the server:
Listing 1. Stream tag establishing communication from client to server
<stream:stream to="[server]" xmlns="jabber:client" xmlns:stream="http://etherx.jabber.org/streams" version="1.0">
The server answers with a stream element of its own, which carries a from attribute naming the server and a unique id for the stream.
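Treating the stream as one long document means a client has to pick out each second-level element as soon as it is complete, without waiting for the document to end. Here is a minimal sketch, assuming stream data arrives as text chunks that may split tags anywhere, built on the incremental `XMLPullParser` from Python's standard library. `read_stanzas` is a hypothetical helper; a real client must also handle stream features, encryption, authentication, and stream errors.

```python
import xml.etree.ElementTree as ET


def read_stanzas(chunks):
    """Yield each stanza in an XMPP stream as soon as it is complete (hypothetical helper).

    `chunks` is any iterable of strings received from the network; a chunk
    may end partway through an element. The <stream:stream> element is the
    document root, so stanzas are its direct children.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem  # the opening stream element
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a second-level element (a stanza) just closed
                    yield elem
                    root.remove(elem)  # keep the never-ending document small


if __name__ == "__main__":
    chunks = [
        '<stream:stream to="somecorp.com" xmlns="jabber:client" '
        'xmlns:stream="http://etherx.jabber.org/streams" version="1.0">',
        '<message to="John.Doe@somecorp.com"><bo',  # a stanza split mid-tag
        "dy>Hi</body></message>",
        "</stream:stream>",
    ]
    for stanza in read_stanzas(chunks):
        print(stanza.tag, stanza.findtext("{jabber:client}body"))
    # {jabber:client}message Hi
```

Because the stream declares jabber:client as its default namespace, the parsed stanza names carry that namespace.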
Once communication is established, the client can send a message to another user with the message element, which can contain any of the following child elements:
- subject: A human-readable string representing the message topic.
- body: A human-readable string representing the message body. Multiple body elements can be included if each one has a different xml:lang attribute. (xml:lang is the only attribute a body element can carry.)
- thread: A unique identifier representing a message thread. Client software can use this to string together related messages.
However, messages can be as simple as the one in Listing 2:
Listing 2. Sample message
<message from="sendinguser@somedomain" to="recipient@somedomain" xml:lang='en'> <body> Body of message </body> </message>
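In code, a message stanza is just a small XML tree. The following Python sketch builds one with the standard library's ElementTree. `build_message` is a hypothetical helper, not part of any XMPP library, and unlike Listing 2 it places xml:lang on each body element so that several languages can coexist.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute; it serializes as "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_message(sender, recipient, bodies, subject=None, thread=None):
    """Build a <message> stanza as a string (hypothetical helper).

    `bodies` maps a language code to the body text in that language, so each
    <body> automatically gets a distinct xml:lang, as the protocol requires.
    """
    # "from" is a Python keyword, so attributes are passed as a dict.
    msg = ET.Element("message", {"from": sender, "to": recipient})
    if subject is not None:
        ET.SubElement(msg, "subject").text = subject
    for lang, text in bodies.items():
        ET.SubElement(msg, "body", {XML_LANG: lang}).text = text
    if thread is not None:
        ET.SubElement(msg, "thread").text = thread
    return ET.tostring(msg, encoding="unicode")


if __name__ == "__main__":
    print(build_message("sendinguser@somedomain", "recipient@somedomain",
                        {"en": "Body of message"}))
```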
Message stanzas are the most useful stanzas for providing real-time web interfaces. The publish-subscribe model, an alternative to using messages to transmit data in real-time web applications, is described next.
Info/query stanzas can serve a wide variety of purposes. One example is the publish-subscribe model, wherein a publisher notifies the server of updates to a particular topic, called a node, and the server in turn notifies all XMPP users who have opted to subscribe to these notifications and are authorized to do so.
A series of items from the publisher are encoded as entries in the XML-based Atom publishing format. Each entry is contained within an item element; the items are grouped in a publish element that names the target node, then within a pubsub element, and finally sent as an info/query stanza. In Listing 3 (taken from the XMPP publish-subscribe specification), Shakespeare's Hamlet (with JID hamlet@denmark.lit/blogbot) is publishing his famous soliloquy to the princely_musings node of the pubsub.shakespeare.lit pubsub service:
Listing 3. Update to the princely_musings node on the pubsub.shakespeare.lit pubsub service
<iq type="set" from="hamlet@denmark.lit/blogbot" to="pubsub.shakespeare.lit" id="pub1">
  <pubsub xmlns="http://jabber.org/protocol/pubsub">
    <publish node="princely_musings">
      <item>
        <entry xmlns="http://www.w3.org/2005/Atom">
          <title>Soliloquy</title>
          <summary>
            To be, or not to be: that is the question:
            Whether 'tis nobler in the mind to suffer
            The slings and arrows of outrageous fortune,
            Or to take arms against a sea of troubles,
            And by opposing end them?
          </summary>
          <link rel="alternate" type="text/html" href="http://denmark.lit/2003/12/13/atom03"/>
          <id>tag:denmark.lit,2003:entry-32397</id>
          <published>2003-12-13T18:30:02Z</published>
          <updated>2003-12-13T18:30:02Z</updated>
        </entry>
      </item>
    </publish>
  </pubsub>
</iq>
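The nesting order (Atom entry inside item, inside publish, inside pubsub, inside iq) is easiest to see when the stanza is built programmatically. A minimal Python sketch with ElementTree; `build_publish` is a hypothetical helper, the JID and node in the usage come from the publish-subscribe specification's example, and entry fields are passed as a plain dict of element names to text, so Atom attributes such as a link's href are not supported.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"
ATOM_NS = "http://www.w3.org/2005/Atom"


def build_publish(sender, service, node, entry_fields, stanza_id):
    """Wrap an Atom entry in item > publish > pubsub > iq (hypothetical helper).

    `entry_fields` maps Atom child element names (title, summary, ...) to text.
    """
    iq = ET.Element("iq", {"type": "set", "from": sender,
                           "to": service, "id": stanza_id})
    # Namespaces are written as plain xmlns attributes so the output keeps the
    # default-namespace style of the listing instead of ns0: prefixes.
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    publish = ET.SubElement(pubsub, "publish", {"node": node})
    item = ET.SubElement(publish, "item")
    entry = ET.SubElement(item, "entry", {"xmlns": ATOM_NS})
    for name, text in entry_fields.items():
        ET.SubElement(entry, name).text = text
    return ET.tostring(iq, encoding="unicode")


if __name__ == "__main__":
    print(build_publish("hamlet@denmark.lit/blogbot", "pubsub.shakespeare.lit",
                        "princely_musings",
                        {"title": "Soliloquy",
                         "summary": "To be, or not to be: that is the question"},
                        "pub1"))
```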
Info/query stanzas are also used to request information about a particular XMPP entity. For example, in the stanza in Listing 4, boreduser@somewhere is looking for public items owned by friendlyuser@somewhereelse:
Listing 4. User looking for public items owned by friendlyuser@somewhereelse
<iq type="get" from="boreduser@somewhere" to="friendlyuser@somewhereelse" id="publicStuff"> <query xmlns="http://jabber.org/protocol/disco#items"/> </iq>
friendlyuser@somewhereelse responds with a list of items that can be subscribed to using publish-subscribe, as in Listing 5:
Listing 5. Response with a list of items
<iq type="result" from="friendlyuser@somewhereelse" to="boreduser@somewhere" id="publicStuff"> <query xmlns="http://jabber.org/protocol/disco#items"> <item jid="stuff.to.do" name="Things to do"/> <item jid="stuff.to.not.do" name="Things to avoid doing"/> </query> </iq>
Each item returned in the result info/query stanza in Listing 5 has a JID that can be subscribed to. Info/query stanzas also allow for a wide variety of server information requests that are outside the scope of this tutorial. Many of them are useful in a web application context for multiserver environments, or as the basis of sophisticated decentralized collaboration frameworks.
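Reading a reply like Listing 5 mostly means matching it to the request by its id and collecting the items. A Python sketch; `subscribable_items` is a hypothetical helper, and it assumes the whole stanza is already available as a string.

```python
import xml.etree.ElementTree as ET

DISCO_ITEMS_NS = "http://jabber.org/protocol/disco#items"


def subscribable_items(result_xml, request_id):
    """Return (jid, name) pairs from a disco#items result (hypothetical helper).

    The id must match the request's id; that is how a client pairs a reply
    with the question it asked.
    """
    iq = ET.fromstring(result_xml)
    if iq.get("type") != "result" or iq.get("id") != request_id:
        raise ValueError("not the result of this request")
    query = iq.find(f"{{{DISCO_ITEMS_NS}}}query")
    if query is None:
        return []
    return [(item.get("jid"), item.get("name"))
            for item in query.findall(f"{{{DISCO_ITEMS_NS}}}item")]


if __name__ == "__main__":
    listing5 = (
        '<iq type="result" from="friendlyuser@somewhereelse" '
        'to="boreduser@somewhere" id="publicStuff">'
        '<query xmlns="http://jabber.org/protocol/disco#items">'
        '<item jid="stuff.to.do" name="Things to do"/>'
        '<item jid="stuff.to.not.do" name="Things to avoid doing"/>'
        "</query></iq>"
    )
    print(subscribable_items(listing5, "publicStuff"))
    # [('stuff.to.do', 'Things to do'), ('stuff.to.not.do', 'Things to avoid doing')]
```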
Presence information is contained within a presence stanza. If the type attribute isn't present, the XMPP client application assumes that the user is online and available. Otherwise, type can be set to unavailable, or to one of the subscription-related values subscribe, subscribed, unsubscribe, and unsubscribed. It can also be error, or probe, which requests the presence information of another user.
A presence stanza can contain the following child elements:
- show: A machine-readable value indicating the overall category of online status to display. This can be away (temporarily away), chat (available and interested in communication), dnd (do not disturb), or xa (away for an extended time).
- status: A human-readable counterpart to show. The value is a user-definable string.
- priority: A number between -128 and 127 that determines how messages addressed to the user are routed among that user's resources. If the value is negative, the server won't deliver messages sent to the user's JID without a resource to that resource.
For example, the user boreduser@somewhere might use the stanza in Listing 6 to indicate a willingness to chat:
Listing 6. Sample presence notification
<presence xml:lang="en"> <show>chat</show> <status>Bored out of my mind</status> <priority>1</priority> </presence>
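The constraints on presence child elements (a fixed vocabulary for show, free text for status, a bounded integer for priority) are easy to enforce when building the stanza. A Python sketch that produces the stanza in Listing 6; `build_presence` is a hypothetical helper, and the show values it accepts are the four defined in RFC 6121 (away, chat, dnd, and xa).

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
SHOW_VALUES = {"away", "chat", "dnd", "xa"}  # the show values defined in RFC 6121


def build_presence(show=None, status=None, priority=None, lang="en"):
    """Build an availability <presence> stanza (hypothetical helper).

    No type attribute is set, so the stanza means "online and available".
    """
    presence = ET.Element("presence", {XML_LANG: lang})
    if show is not None:
        if show not in SHOW_VALUES:
            raise ValueError(f"unknown show value: {show!r}")
        ET.SubElement(presence, "show").text = show
    if status is not None:
        ET.SubElement(presence, "status").text = status  # free-form text
    if priority is not None:
        if not -128 <= priority <= 127:
            raise ValueError("priority must be between -128 and 127")
        ET.SubElement(presence, "priority").text = str(priority)
    return ET.tostring(presence, encoding="unicode")


if __name__ == "__main__":
    print(build_presence("chat", "Bored out of my mind", 1))
```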
Note the absence of a type attribute, which indicates that the user is online and available. Another user, friendlyuser@somewhereelse, might probe the status of boreduser@somewhere by sending the stanza in Listing 7:
Listing 7. Probing a user's status
<presence type="probe" from="friendlyuser@somewhereelse" to="boreduser@somewhere"/>
The server for boreduser@somewhere would then respond with a tailored presence response:
<presence xml:lang="en" from="boreduser@somewhere" to="friendlyuser@somewhereelse"> <show>chat</show> <status>Bored out of my mind</status> <priority>1</priority> </presence>
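The server's side of this exchange amounts to readdressing the user's last presence to whoever asked. A minimal Python sketch, assuming the server has stored the last presence stanza the user sent; `answer_probe` is a hypothetical helper, and it omits the check a real server makes that the prober is authorized to see the user's presence.

```python
import copy
import xml.etree.ElementTree as ET


def answer_probe(probe_xml, last_presence):
    """Answer a presence probe with the probed user's last presence (hypothetical helper).

    `last_presence` is the stored <presence> element the user last sent.
    The authorization check a real server performs is omitted here.
    """
    probe = ET.fromstring(probe_xml)
    if probe.tag != "presence" or probe.get("type") != "probe":
        raise ValueError("not a presence probe")
    reply = copy.deepcopy(last_presence)  # leave the stored copy untouched
    reply.set("from", probe.get("to"))   # the probed user
    reply.set("to", probe.get("from"))   # back to whoever asked
    return ET.tostring(reply, encoding="unicode")


if __name__ == "__main__":
    stored = ET.fromstring(
        '<presence xml:lang="en"><show>chat</show>'
        "<status>Bored out of my mind</status><priority>1</priority></presence>"
    )
    probe = ('<presence type="probe" from="friendlyuser@somewhereelse" '
             'to="boreduser@somewhere"/>')
    print(answer_probe(probe, stored))
```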
These presence values derive from person-to-person messaging software. Outside a literal chat application, it isn't clear how the value of the show element, normally used to choose a status icon to display to other users, would be used. Status values might find use in microblogging tools; for example, changes to the status field of a user in Google Talk (an XMPP chat service) can optionally be imported as microblog entries in Google Buzz.
Another possibility is to use status values as carriers for per-user application state data. Although the specification defines status as being human-readable, nothing stops you from storing any string there to suit your purposes. For some applications, it might not be human-readable, or it might carry a data payload in the form of microformats.
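As one way to carry a data payload in status, the following Python sketch swaps in JSON rather than microformats, simply because JSON round-trips easily with the standard library. The function names are hypothetical, and ordinary XMPP clients will likely show this string to people as-is, so it suits resources that only your own application reads.

```python
import json
import xml.etree.ElementTree as ET


def presence_with_state(state):
    """Encode application state as compact JSON inside <status> (hypothetical helper)."""
    presence = ET.Element("presence")
    ET.SubElement(presence, "status").text = json.dumps(
        state, separators=(",", ":"), sort_keys=True)
    return ET.tostring(presence, encoding="unicode")


def state_from_presence(presence_xml):
    """Recover the state, or None if <status> is missing or holds ordinary text."""
    status = ET.fromstring(presence_xml).findtext("status")
    if not status:
        return None
    try:
        return json.loads(status)
    except json.JSONDecodeError:
        return None  # e.g. a human-readable "Bored out of my mind"


if __name__ == "__main__":
    xml_text = presence_with_state({"page": "/home", "pings": 3})
    print(xml_text)
    print(state_from_presence(xml_text))
```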
You can set presence information independently for each resource owned by an XMPP entity, so only one user account is required to access and receive data for all the tools and contexts connected to a single user within an application. Each resource can be assigned an independent priority level; the XMPP server will try to deliver messages to resources with higher priorities first.
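Routing by priority can be modeled as choosing among a user's available resources. The sketch below assumes the RFC 6121 rule that a resource with negative priority does not receive messages sent to the user's JID without a resource, and it delivers to a single highest-priority resource; servers may apply other policies, such as delivering to every resource that shares the top priority. `pick_resource` is a hypothetical name.

```python
def pick_resource(priorities):
    """Pick the resource that should receive a message sent to a bare JID.

    Hypothetical helper. `priorities` maps each available resource name to
    its priority (-128 to 127). Returns None if no resource qualifies, in
    which case a server might store the message until the user returns.
    """
    # Resources with negative priority never get messages sent to the bare JID.
    eligible = {name: p for name, p in priorities.items() if p >= 0}
    if not eligible:
        return None
    # Highest priority wins; on a tie, the first resource listed is chosen.
    return max(eligible, key=eligible.get)


if __name__ == "__main__":
    # Hypothetical visitors sharing one account, one resource each.
    print(pick_resource({"visitor-1": 0, "visitor-2": 5, "phone": -1}))  # visitor-2
```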
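Browsers restrict scripts to making HTTP requests to the page's own origin (the same-origin policy), so if your web application is hosted at application.mydomain.com, any XMPP communication the browser initiates must also take place at that same host.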
Firewalls are an additional complication. Ideally, if you use XMPP as the basis of the real-time element of your web interface, you want it to work for users behind a firewall. But corporate firewalls usually leave open only a few ports for a small number of protocols, to allow web data, email, and similar communications to get through. By default, XMPP uses port 5222, which a corporate firewall might well block.
Suppose you know that the firewall your users might sit behind will allow HTTP on port 80 (the default protocol and port for accessing the web). Ideally, your XMPP communications would tunnel over HTTP on that port. However, HTTP isn't designed for continuous connections: it follows a request-response model, in which the server can send data only in reply to a client request. The architecture of the web is different from the communications architecture required for real-time data.
Enter the standard for Bidirectional-streams Over Synchronous HTTP (BOSH). It emulates a bidirectional stream using a series of synchronous HTTP request-response pairs. The client opens a long-lived HTTP request (lasting a minute or two) to a BOSH connection manager, often built into the XMPP server. If new data arrives during that time, the request returns with the data and closes; otherwise, it simply expires. Either way, once a request has closed, the client opens another one. Although the result is a series of repeated connections to a web server, it is far more efficient than Ajax polling: data is delivered as soon as it is available rather than at the next polling interval, empty round trips happen only when a request expires, and the connections are made to a specialized server rather than directly to the web application.
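Each BOSH request is an HTTP POST whose payload is a body element wrapping zero or more stanzas. The following Python sketch builds those wrappers without doing any HTTP. It includes only a subset of the XEP-0124 attributes (rid, sid, wait, hold, and a few others), leaves out the XMPP-specific attributes of XEP-0206 and authentication, and uses a 60-second wait as an illustrative default. `BoshRequests` is a hypothetical class.

```python
import random
import xml.etree.ElementTree as ET

HTTPBIND_NS = "http://jabber.org/protocol/httpbind"


class BoshRequests:
    """Builds the <body/> wrappers a BOSH client POSTs; sends nothing itself.

    Hypothetical class covering only a subset of XEP-0124 attributes.
    """

    def __init__(self, domain, wait=60, hold=1):
        self.domain = domain
        self.wait = wait  # longest time (s) the manager may hold a request open
        self.hold = hold  # how many requests the manager may keep waiting
        self.rid = random.randint(1, 2**32)  # random start, +1 per request
        self.sid = None  # session ID, taken from the manager's first response

    def _next_rid(self):
        rid = self.rid
        self.rid += 1
        return str(rid)

    def session_request(self):
        """First request: asks the connection manager to create a session."""
        body = ET.Element("body", {
            "xmlns": HTTPBIND_NS,
            "content": "text/xml; charset=utf-8",
            "to": self.domain,
            "rid": self._next_rid(),
            "wait": str(self.wait),
            "hold": str(self.hold),
        })
        return ET.tostring(body, encoding="unicode")

    def poll_request(self, stanzas=()):
        """Later requests: an empty body just waits for incoming data,
        while a non-empty one also carries outgoing stanza Elements."""
        if self.sid is None:
            raise RuntimeError("no session yet; send session_request() first")
        body = ET.Element("body", {"xmlns": HTTPBIND_NS,
                                   "rid": self._next_rid(), "sid": self.sid})
        body.extend(stanzas)
        return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    bosh = BoshRequests("somecorp.com")
    print(bosh.session_request())
    bosh.sid = "SID-FROM-SERVER"  # placeholder for the value the manager returns
    print(bosh.poll_request())
```

Incrementing rid by one per request lets the connection manager detect lost or reordered requests across the repeated HTTP connections.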
Now that you understand how XMPP fits into the real-time web, you're ready to download and set it up so that you can begin to create the Pingstream application.