XML Matters: Lighter than microformats: Picoformats

Ajax without X, Microformats without angle brackets

Dethe Elza (delza@livingcode.org), Technical Architect, Justsystems
Dethe Elza's favorite job title has been Chief Mad Scientist. Dethe can be reached at delza@livingcode.org. He keeps a blog mainly about Python and Mac OS X at http://livingcode.org/ and writes programs for his kids. Suggestions and recommendations on this column are welcome.
David Mertz, Ph.D (mertz@gnosis.cx), Author, Gnosis Software, Inc.
To David Mertz, all the world is a stage, and his career is devoted to providing marginal staging instructions. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcomed. Check out David's book Text Processing in Python.

Summary:  In a past installment of the XML Matters column, David Mertz explored reStructured Text, a lightweight markup language for formatting mostly text documents, and prior to that he looked at YAML, a lightweight markup language for mostly data documents. With the rise of Ajax and microformats, are these still useful, or are microformats "light" enough? As picograms are lighter than micrograms, we'll explore how formats that are lighter than "lightweight", namely JSON (lighter than YAML) and reStructured Text (lighter than HTML), can be used together with the lightweight MochiKit library for Ajax without the X and for generating microformats.

Date:  01 Aug 2006
Level:  Introductory

XML has been used to mark up both documents and structured data, which has been variously interpreted as one of its greatest strengths or failings, depending on your point of view. Where the lines blur between document and data, XML can be a winner, but as a general solution, XML can also be more complex than any given specific solution to a problem. David has previously discussed YAML (see Resources). YAML Ain't Markup Language, or alternatively, Yet Another Markup Language (YAML) is a dialect intended to be simpler than XML for transporting data (numbers, strings, lists, simple structures). In this article, we'll focus on JSON (JavaScript Object Notation), which is essentially a subset of YAML, but even easier to create and parse. In JavaScript and Python, if the JSON is from a trusted source, it can simply be evaluated by the scripting engine, but parsers exist for JSON that comes from less trusted sources.
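To make the trusted-versus-untrusted distinction concrete, here is a minimal sketch in present-day Python. It uses the standard library's json module rather than any parser named in this article, and the helper name parse_untrusted is our own invention for illustration. A real parser rejects anything that is not JSON, whereas eval would happily run it.

```python
import json

# A small JSON document, as it might arrive from a server.
trusted_text = (
    '{"title": "Vancouver Python Workshop Keynotes",'
    ' "start": "2006-08-04T19:00-0700"}'
)


def parse_untrusted(text):
    """Parse JSON with a real parser; return None if the text is not JSON.

    Hypothetical helper for illustration only.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


event = parse_untrusted(trusted_text)

# Something that is executable Python but not JSON is rejected
# instead of being run.
hostile = parse_untrusted('__import__("os").getcwd()')
```

The design point is simply that parsing never executes the input, so a malformed or malicious payload costs you an error rather than a compromised process.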

While we'll use JSON to approach from the data end of the document-to-data spectrum, you can use structured text formats to handle microformats with document-colored glasses on. There are three major structured text approaches (not counting the many flavours of Wiki markup): reStructured Text, Markdown, and Textile. We'll check each of these out to see how they can handle our microformat example, at least in theory.

What is in a microformat?

What kind of data does a microformat contain? The gist of a microformat is to put a (usually) small amount of data in a form that is easily processed by both human and machine. You might see some gains if you relax this constraint: encode the data for human or machine, but not both, and then process it to get the microformatted version. Common fields in an hCalendar event include the following pieces of information:

  • Summary/Title
  • Location
  • URL
  • Start date (and optionally time)
  • End date (and optionally time)
  • Timezone
  • Description

That's not really so much to encode. Listing 1 shows an example hCalendar event:


Listing 1. hCalendar event
                

<div class="vevent">
    <a class="url"
        href="http://www.vanpyz.org/conference/keynotes.html">
        <abbr class="dtstart" title="20060804T1900-0700">
            August 4, 2006 - 19:00
        </abbr> -
        <abbr class="dtend" title="20060804T2100-0700">
            21:00
        </abbr> -
        <span class="summary">
            Vancouver Python Workshop Keynotes
        </span> - at
        <span class="location">
            Fletcher Challenge Canada Theatre,
            SFU Harbour Center,
            Downtown Vancouver
        </span>
    </a>
    <div class="description">
        <p>The Vancouver Python Workshop keynote address is an
        opportunity to hear from leading members of the Python
        community. This year's speakers are Guido van Rossum of
        Google and Jim Hugunin from Microsoft.</p>
    </div>
</div>

If you strip that down to the bare essence of the data, you can encode the same event using JSON. Note that the dates, times, and time zone are encoded using ISO8601, a standard way of formatting dates and times, which is a microformat in itself (see Resources). The example in Listing 2 is valid code in JavaScript or Python (strict JSON on the wire would use double-quoted strings and drop the assignment):


Listing 2. Encoded using JSON
                

event = {
    'title':  'Vancouver Python Workshop Keynotes',
    'location': 'Fletcher Challenge Canada Theatre, \
                 SFU Harbour Center, Downtown Vancouver',
    'url': 'http://www.vanpyz.org/conference/keynotes.html',
    'start': '2006-08-04T19:00-0700',
    'end': '2006-08-04T21:00-0700',
    'description': 'The Vancouver Python Workshop keynote address\
     is an opportunity to hear from leading members of the Python\
     community. This year's speakers are Guido van Rossum of Google\
     and Jim Hugunin from Microsoft.'};

This format, as you will see further below, can also be passed around the Web quite easily. The plot thickens....
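As a rough illustration of what "passed around the Web" involves, the sketch below serializes a trimmed copy of the event to strict JSON text, reads it back, and interprets the ISO8601 timestamps. It relies only on Python's standard json and datetime modules; the names ISO_FORMAT and parse_iso are assumptions made for this example, and the format string covers only the exact timestamp shape used in Listing 2, not ISO8601 in general.

```python
import json
from datetime import datetime

# A trimmed copy of the event from Listing 2.
event = {
    "title": "Vancouver Python Workshop Keynotes",
    "url": "http://www.vanpyz.org/conference/keynotes.html",
    "start": "2006-08-04T19:00-0700",
    "end": "2006-08-04T21:00-0700",
}

# Matches only the YYYY-MM-DDTHH:MM+hhmm shape used above.
ISO_FORMAT = "%Y-%m-%dT%H:%M%z"


def parse_iso(stamp):
    """Turn one of the event's timestamps into an aware datetime."""
    return datetime.strptime(stamp, ISO_FORMAT)


wire_text = json.dumps(event)      # what would travel over HTTP
received = json.loads(wire_text)   # what the other side reconstructs

duration = parse_iso(received["end"]) - parse_iso(received["start"])
```

Because the dates stay as plain ISO8601 strings inside the JSON, either end can decide when (or whether) to turn them into native date objects.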


You got data in my documents!

Using JSON for microformat content represents the data approach. For the document approach we can come at microformats from the structured text side. What would hCalendar look like if you wrote it using reStructured Text? Well, reST allows you to create plugin extensions using directives, so a document that uses a directive to parse hCalendar-like data might look like this:


                

.. event::

    LOCATION: Fletcher Challenge Canada Theatre,
    SFU Harbour Center, Downtown Vancouver
    DTSTART: TZID=America/Vancouver:20060804T190000
    DTEND: TZID=America/Vancouver:20060804T210000
    SUMMARY: Vancouver Python Workshop Keynotes
    DESCRIPTION: The Vancouver Python Workshop keynote address
    is an opportunity to hear from leading members of the Python
    community. This year's speakers are Guido van Rossum of Google
    and Jim Hugunin from Microsoft.

One interesting thing about this picture: The hCalendar spec is a mapping of the earlier iCalendar standard into a subset of HTML. But what does this mysterious iCalendar look like? Listing 3 shows the same event in iCalendar:


Listing 3. The event in iCalendar
                

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Apple Computer\, Inc//iCal 2.0//EN
BEGIN:VEVENT
DURATION:PT2H
LOCATION:Fletcher Challenge Canada Theatre\,
  SFU Harbour Center\, Downtown Vancouver
DTSTAMP:20060615T034522Z
UID:FE31377A-AB78-4D99-BC25-3F09C99E5928
DTSTART;TZID=America/Vancouver:20060804T190000
SUMMARY:Vancouver Python Workshop Keynotes
DESCRIPTION:The Vancouver Python Workshop keynote address
  is an opportunity to hear from leading members of the Python
  community. This year's speakers are Guido van Rossum of Google
  and Jim Hugunin from Microsoft.
END:VEVENT
END:VCALENDAR

Yes, iCalendar is a text-based format that is relatively straightforward to parse. You can find libraries to parse it, and you can manipulate the results in the language of your choice. So while you might implement a directive for reStructured Text that takes all the data elements of hCalendar, you could instead modify the .. include:: directive to allow inclusion of iCalendar content directly, formatted as hCalendar. Since lots of existing tools, such as Apple's iCal, import and export iCalendar format, reusing the format directly might be pretty useful. We won't go into all the details of adding, or modifying, directives in reStructured Text, both in the interest of brevity, and because Dethe has already written a tutorial on that with reStructured Text lead developer David Goodger (see Resources).
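To give a feel for how little is needed to read simple iCalendar data, here is a hand-rolled sketch in Python. It is not the API of any of the iCalendar libraries listed in Resources, and the function names unfold and parse_event are our own. It assumes well-formed input in which continuation lines begin with a space or tab, it ignores property parameters such as TZID, and it only undoes the backslash-comma escape, so treat it as an illustration rather than a conforming parser.

```python
def unfold(text):
    """Join folded lines: a line that starts with a space or tab
    continues the previous line (the first such character is dropped)."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw:
            lines.append(raw)
    return lines


def parse_event(text):
    """Return {PROPERTY: value} for the first VEVENT in the text."""
    event = {}
    inside = False
    for line in unfold(text):
        head, _, value = line.partition(":")
        name = head.split(";")[0].upper()  # drop parameters such as TZID
        if name == "BEGIN" and value == "VEVENT":
            inside = True
        elif name == "END" and value == "VEVENT":
            break
        elif inside:
            event[name] = value.replace("\\,", ",")
    return event


SAMPLE = "\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "BEGIN:VEVENT",
    "LOCATION:Fletcher Challenge Canada Theatre\\,",
    "  SFU Harbour Center\\, Downtown Vancouver",
    "DTSTART;TZID=America/Vancouver:20060804T190000",
    "SUMMARY:Vancouver Python Workshop Keynotes",
    "END:VEVENT",
    "END:VCALENDAR",
])

parsed = parse_event(SAMPLE)
```

The resulting dictionary has the same shape as the JSON event shown earlier, which is what makes it plausible to feed either source into a single hCalendar template.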


Lighter than air, but harder to breathe

While JSON provides a data format that is simpler than either YAML or XML, reStructured Text can be a pretty complex tool. A reStructured Text document can be easier to read than XML, and can be simpler if you don't try to use all of its features. But when you start to add a lot of directives, tables, and other features, Dethe's Complexity Meter starts twitching. Since JSON gains simplicity by specialization (a common tradeoff, leading to Domain Specific Languages, or DSLs), you can further simplify structured text if you're willing to sacrifice some of the features and flexibility of reStructured Text. And it turns out that such lighter-than-lightweight document-centric formats exist.

We wanted to explore Textile and Markdown, two lighter-weight structured text options, to see whether they make it easier to create microformat data. Unfortunately, the same properties that make these tools good for creating document-centered content cause them to be less useful for more data-centered works such as hCalendar. Textile has no markup for <div>s or <abbr>s, or any way to add classes to <a>s. While it is quite flexible and makes it easy to add class, id, and style attributes, it has no mechanism to add title attributes. Markdown has no mechanism for adding attributes at all (although PyMarkdown has an extension to support them), and it has no structures for <div>s or <abbr>s either. Either system can pass through pre-formatted HTML, but then you lose the benefit of using a lightweight format in the first place.

The lack of support for microformats in these lightweight markup languages is understandable. Microformats are, at the heart of it, data, and both Textile and Markdown are very focused on being tools for writing for the Web. They try to make it easy to get the words you want, with a bit of formatting, but they ignore issues of inserting machine-readable data. This is one of the old issues with XML, that it is used for both data (machine-readable information) and content (human-legible text). Structured text formats come down firmly on the content side, while YAML and JSON come down firmly on the data side. Each might do the job as well as or better than XML, but in places where the border between data and content is blurred, as it can be with microformats, these languages can have trouble crossing the gap.


The REST of JSON

Many examples using JSON show how the browser can receive JSON messages from the server using GET. But the REST paradigm, as practiced over HTTP, has four main verbs: GET, POST, PUT, and DELETE, and they are there for a reason. You should use GET to retrieve a Web resource with no side effects (that is, GET should be safe, and therefore also idempotent), POST to create a new resource, PUT to update an existing resource, and DELETE to remove a resource. Since we want to be good RESTafarian citizens, let's put together a simple framework that moves JSON data using all four verbs.
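One way to see why the verbs differ is to ask what happens when a request is repeated. The toy in-memory store below is purely illustrative (the class NoteStore and its method names are invented for this sketch and involve no HTTP at all): repeating a POST creates a second resource, while repeating a PUT or DELETE leaves the store in the same state as doing it once, and GET changes nothing.

```python
class NoteStore:
    """Toy resource store used only to illustrate verb semantics."""

    def __init__(self):
        self.notes = {}
        self.next_id = 1

    def get(self, note_id):
        # Safe: reads state, never changes it.
        return self.notes.get(note_id)

    def post(self, body):
        # Not idempotent: every call creates a new resource.
        note_id = self.next_id
        self.next_id += 1
        self.notes[note_id] = body
        return note_id

    def put(self, note_id, body):
        # Idempotent: repeating the call leaves the same state.
        self.notes[note_id] = body

    def delete(self, note_id):
        # Idempotent: deleting twice is the same as deleting once.
        self.notes.pop(note_id, None)


store = NoteStore()
first = store.post("grail")
second = store.post("grail")      # same body, but a second resource

store.put(first, "shrubbery")
store.put(first, "shrubbery")     # repeat has no further effect

store.delete(second)
store.delete(second)              # repeat has no further effect
```

This matters in practice because clients, proxies, and browsers may retry or replay requests, and they can only do so safely for verbs with these guarantees.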

Below is the Python code for a CGI program that encapsulates the simplest skeleton for moving around bits of JSON code that we could imagine. It uses Bob Ippolito's simplejson library to make parsing the JSON safer than simply evaling a string received from the Web. There is, of course, much more to creating a REST interface than this (creating a URI for every resource is a biggie). See Resources for pointers to more information. See Listing 4 for the simplest REST/JSON server:


Listing 4. REST/JSON server
                

#!/usr/local/bin/python

'''
Minimal JSON REST server
'''

import os
import cgi
import cgitb
cgitb.enable()
import simplejson

sample_data = {'name': 'King Arthur',
    'quest': 'To seek the holy grail',
    'airspeed_of_unladen_European_swallow': '24 MPH'}




def post(args):
    '''
    Used to create a new resource
    '''
    new_uri = 'http://example.org/' # URI of newly created resource
    sample_data['result_uri'] = new_uri
    print simplejson.dumps(sample_data)


def put(args):
    '''
    Used to update an existing resource
    '''
    sample_data.update(args)
    print simplejson.dumps(sample_data)


def get(args):
    '''
    Used for any side-effect free request
    '''
    print simplejson.dumps(sample_data)

def delete(args):
    '''
    Used to remove an existing resource
    '''
    print simplejson.dumps({'content':'None'})


def main():
    method = os.environ.get('REQUEST_METHOD', 'GET')
    if method == 'GET':
        args = None
    else:
        json_value = cgi.FieldStorage().getfirst('json_value')
        args = simplejson.loads(json_value)
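    if method == 'POST':
        # Fix for browsers that don't send PUT or DELETE properly:
        # the real verb is tunnelled in the 'method' field. A plain
        # POST without that field stays a POST.
        method = args.get('method', 'POST')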

    # Start an HTTP response
    print "Content-type: text/plain"
    print ""

    # Python dispatch idiom using a dictionary vs. case-statement
    dict(POST=post,
         PUT=put,
         GET=get,
         HEAD=get,
         DELETE=delete)[method](args)

if __name__ == '__main__':
    main()
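Listing 4 is written for the Python of its day (print statements, the cgi module, and the simplejson package). For readers on a current Python, the following is a rough equivalent of the same idea, not the authors' code: it assumes a WSGI application instead of CGI and the standard json module instead of simplejson. The names read_args, app, and call are invented for the sketch, and call is only a test harness that fakes a request.

```python
import io
import json
from urllib.parse import parse_qs, urlencode

sample_data = {"name": "King Arthur", "quest": "To seek the holy grail"}


def read_args(environ):
    """Decode the json_value form field from the request body."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length).decode("utf-8")
    fields = parse_qs(body)
    return json.loads(fields.get("json_value", ["{}"])[0])


def app(environ, start_response):
    """Minimal WSGI application that dispatches on the HTTP verb."""
    method = environ.get("REQUEST_METHOD", "GET")
    args = {}
    if method not in ("GET", "HEAD"):
        args = read_args(environ)
    if method == "POST":
        # Browsers that cannot send PUT or DELETE tunnel the verb.
        method = args.pop("method", "POST")

    if method == "PUT":
        sample_data.update(args)
        result = sample_data
    elif method == "DELETE":
        result = {"content": None}
    elif method == "POST":
        result = dict(sample_data, result_uri="http://example.org/")
    else:
        result = sample_data

    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(result).encode("utf-8")]


def call(method, payload=None):
    """Test harness: invoke app() with a hand-built WSGI environ."""
    body = urlencode({"json_value": json.dumps(payload or {})}).encode("utf-8")
    environ = {
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, lambda status, headers: None)
    return json.loads(b"".join(chunks))


updated = call("POST", {"method": "PUT", "favorite_color": "blue"})
```

To try it against a real client, the standard library's wsgiref.simple_server.make_server can serve app on a local port; that step is left out here so the sketch stays self-contained.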

Listing 4 should handle the basics of server-side REST and JSON. For the client side, Dethe wrote a small bit of JavaScript which can be embedded in a Web page. This example also uses code from Bob Ippolito, in this case the MochiKit JavaScript library, which we're fond of. In Listing 5, you see code for the simplest JSON/REST client:


Listing 5. Simplest JSON/REST client
                

//
//   rest.js
//
//   Simplistic framework for bi-directional JSON transport via REST
//

var SERVER = 'http://localhost/rest_cgi.py';

function handle_success(result){
    alert('Success: ' + result.responseText);
}

function handle_error(result){
    logError(result.responseText);
}

var sample_content = {'name': 'Sir Lancelot',
         'quest': 'To seek the holy grail',
         'favorite_color': 'blue'};

function _sendVerb(verb, content, handler){
    var req = getXMLHttpRequest();
    var body = null;
    if (verb == 'GET'){
        // GET carries no payload
        req.open('GET', SERVER);
    }else{
        // Certain browsers don't actually support verbs besides GET
        // and POST, spoiling the party for the rest of us, so the
        // real verb travels inside the JSON payload
        content['method'] = verb;
        req.open('POST', SERVER);
        // The server reads json_value as a form field, so label
        // the body as form-encoded
        req.setRequestHeader('Content-Type',
            'application/x-www-form-urlencoded');
        // queryString is the easiest way to send data to the server
        body = queryString(['json_value'],
            [serializeJSON(content)]);
    }
    var deferred = sendXMLHttpRequest(req, body);
    deferred.addCallbacks(handler, handle_error);
}

function get(){
    _sendVerb('GET', null, handle_success);
}

function post(){
    _sendVerb('POST', sample_content, handle_success);
}

function put(){
    _sendVerb('PUT', sample_content, handle_success);
}

function test_delete(){
    _sendVerb('DELETE', sample_content, handle_success);
}
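The tunnelling convention used by the client (send everything except GET as a POST and carry the real verb in a method field of the JSON payload) is not tied to JavaScript. As a sketch under the same assumptions, here is how a Python script might build such requests with the standard urllib modules. The function name build_request is ours, and the code only constructs the request objects; nothing is sent.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

SERVER = "http://localhost/rest_cgi.py"


def build_request(verb, content=None):
    """Build (but do not send) a request using the tunnelling scheme."""
    if verb == "GET":
        return Request(SERVER, method="GET")
    # Carry the real verb inside the JSON payload.
    payload = dict(content or {}, method=verb)
    body = urlencode({"json_value": json.dumps(payload)}).encode("ascii")
    return Request(
        SERVER,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_request("DELETE", {"name": "Sir Lancelot"})

# What the server would find in the json_value form field.
sent = json.loads(parse_qs(req.data.decode("ascii"))["json_value"][0])
```

Passing the resulting object to urllib.request.urlopen would perform the call against a running server; a client that can issue real PUT and DELETE requests would not need the method field at all.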


JSON and the micronauts

Given the code in Listings 4 and 5, you can send bits of JSON around pretty freely, but you still need to do something with it. Given all that's come before, the natural thing is to convert it back into hCalendar, or iCalendar, depending on whether you're handling the client side or the server side and your goals. For instance, you can use MochiKit (especially the DOM shortcut functions) on the client side to create a simple template for the JSON data in Listing 2 and reformat it to hCalendar right in the browser, by replacing the handle_success() function as in Listing 6:


Listing 6. Replacing the handle_success() function


function handle_success(result){
    var data = evalJSON(result.responseText);
    // Convert json data to DOM nodes
    var hCalendarData = objectToCalendar(data);
    // In a real application we would add/replace this in our
    // Web page somewhere, but for this demo we'll just display it.
    alert(toHTML(hCalendarData));
}

var ABBR = createDOMFunc('abbr');

function pad2(number){
    if (number < 10){
        return '0' + number;
    }else{
        return '' + number;
    }
}

function time(date){
    return pad2(date.getHours()) +
        ':' +
        pad2(date.getMinutes());
}

function objectToCalendar(obj){
    var startDate = isoTimestamp(obj.start);
    var endDate = isoTimestamp(obj.end);
    var cal = DIV({'class':'vevent'},
                  A({'class':'url', 'href': obj.url},
                      ABBR({'class':'dtstart', 'title': obj.start},
                          startDate.toLocaleDateString(),
                          ' - ',
                          time(startDate)
                      ),
                      ' - ',
                      ABBR({'class': 'dtend', 'title': obj.end},
                          time(endDate)
                      ),
                      ' - ',
                      SPAN({'class': 'summary'}, obj.title),
                      ' - at ',
                      SPAN({'class': 'location'}, obj.location)
                  ),
                  DIV({'class':'description'},
                      P(obj.description)
                  )
              );
    return cal;
}

In case it isn't obvious what the code in Listing 6 does, MochiKit defines a lot of HTML DOM shortcut functions, which allow you to create the DOM structure of an hCalendar fragment quite easily. With no built-in shortcut for <abbr> elements, the code creates one of those as well. MochiKit does convert between ISO8601 strings and JavaScript Date objects, but the built-in string rendering of the Date object isn't quite what you want, so you can spend a little code doing it yourself. Nothing terribly fancy here, but if it is still not clear then we encourage you to check out the first-rate MochiKit documentation (see Resources).
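The same templating step can be done on the server if you prefer to ship finished hCalendar. The sketch below is a Python analogue of the idea, not a translation of MochiKit's API: it builds the markup as a string, escapes every value with the standard html.escape, and formats the visible dates itself. The names TEMPLATE, parse_iso, and event_to_hcalendar are assumptions for this example, and the date parsing handles only the timestamp shape used in Listing 2.

```python
from datetime import datetime
from html import escape

TEMPLATE = (
    '<div class="vevent">'
    '<a class="url" href="{url}">'
    '<abbr class="dtstart" title="{start_iso}">{start_text}</abbr> - '
    '<abbr class="dtend" title="{end_iso}">{end_text}</abbr> - '
    '<span class="summary">{title}</span> - at '
    '<span class="location">{location}</span>'
    '</a>'
    '<div class="description"><p>{description}</p></div>'
    '</div>'
)


def parse_iso(stamp):
    """Parse the YYYY-MM-DDTHH:MM+hhmm shape used in Listing 2."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M%z")


def event_to_hcalendar(event):
    """Render an event dictionary as an hCalendar HTML fragment."""
    start = parse_iso(event["start"])
    end = parse_iso(event["end"])
    fields = {
        "url": escape(event["url"]),
        "start_iso": escape(event["start"]),
        "end_iso": escape(event["end"]),
        "start_text": start.strftime("%B %d, %Y - %H:%M"),
        "end_text": end.strftime("%H:%M"),
        "title": escape(event["title"]),
        "location": escape(event["location"]),
        "description": escape(event["description"]),
    }
    return TEMPLATE.format(**fields)


event = {
    "title": "Vancouver Python Workshop Keynotes",
    "location": "Fletcher Challenge Canada Theatre",
    "url": "http://www.vanpyz.org/conference/keynotes.html",
    "start": "2006-08-04T19:00-0700",
    "end": "2006-08-04T21:00-0700",
    "description": "Keynote addresses from the Python community.",
}

html_out = event_to_hcalendar(event)
```

Whether you template in the browser or on the server is mostly a question of where you want the JSON to stop and the HTML to begin; the machine-readable title attributes are the same either way.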

You can now send JSON data to the browser using AJAX and format it into hCalendar using in-browser templating. Besides being fully buzzword-compliant, it approaches something that you might actually find useful.


Weeding in the Garden of Forking Paths

What has all of this gained you? Is it really any more useful to include iCalendar in your reStructured Text, or to send bits of JSON to the browser instead of hCalendar? Why not just encode it in binary, or XML, and be done with it?

Well, whether the techniques explored here are useful to you or not comes down to the old computer science adage: it depends. If you build REST-based Web services, then the transfer of data through JSON instead of XML can be faster and more convenient. If you use reStructured Text for easier (and more readable) HTML content creation, then the ability to include microformats directly, either in their HTML form (hCalendar) or a plaintext form (iCalendar), can make things easier. When you build a new system, both techniques are worth considering.

Ultimately the goal of microformats is to make data easier for humans to parse, while keeping the data friendly for machines to parse. The philosophy of microformats is to reuse existing semantic formats, especially HTML, as much as possible. AJAX, JSON, and REST all make building systems for microformats and other content easier and richer, while lightweight markup makes creating and editing content for such systems easier, faster, and more human-friendly. So, all of these techniques have a place, and you might notice a synergistic effect from using them together.



Download

Description: Example code
Name: x-matters46-example.zip
Size: 3KB
Download method: HTTP



Resources

Learn

  • "XML Matters: reStructured Text" (David Mertz, developerWorks, February 2003): Discover how lightweight markup can be an improvement over XML.

  • "XML Matters: YAML improves on XML" (David Mertz, developerWorks, October 2002): Explore the use of YAML where you might have used XML for data transfer.

  • "XML Matters: Pipestreaming microformats" (Dethe Elza, developerWorks, April 2006): Learn when to use microformats over arbitrary XML dialects.

  • "XML Matters: Up and Atom" (Dethe Elza, developerWorks, May 2006): Explore the Atom Publishing Protocol as an ideal REST interface.

  • "Introducing JSON:" Visit Douglas Crockford's site, which brought JSON to the world. Includes links to articles, examples, and libraries for using JSON with many different programming languages.

  • reStructured Text: Dig into the Python docutils project -- mostly on reStructured Text, which can generate not only HTML, but XML, PDF, slides, DocBook, and other formats. The authors used reStructured Text to write these articles, for instance.

  • Markdown: Check out John Gruber's entry in the lightweight markup sweepstakes. The authors find using it produces some of the clearest and most readable text, and use it for blogs, but that clarity comes with a price: it is hard to extend.

  • Textile: Check out Dean Allen's format for lightweight markup -- it is nearly as readable as Markdown (in the completely subjective opinion of the authors), but allows IDs, class attributes, and CSS styles to be embedded directly in the document (and the resulting HTML), which reduces readability somewhat in order to provide more flexibility.

  • ISO8601: Read Markus Kuhn's summary of the ISO8601 patterns for formatting dates, times, and time ranges in text.

  • Atom Publishing Protocol specification: Read the details of this new standard for content publishing and management.

  • "How to create a RESTful Protocol:" Find more detail about what issues you should think about when creating a REST interface in this article by Joe Gregorio.

  • "Calculating the average airspeed of an un-laden sparrow:" Wherein the question that has most perplexed generations of Arthurian scholars is revealed.

  • "Creating reStructured Text Directives:" Read this guide to adding new directives to the reStructured Text processor, by Dethe Elza, David Goodger, and Felix Wiemann.

  • IBM XML 1.1 certification: Find out how you can become an IBM Certified Developer in XML 1.1 and related technologies.

  • XML: See developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.

  • developerWorks technical events and webcasts: Stay current with technology in these sessions.

Get products and technologies

  • "Simple JSON:" Try this Python library for parsing and serializing JSON data safely.

  • MochiKit: Get MochiKit to normalize JavaScript and the DOM across browsers, and bring in many utilities, mostly with a Pythonic flavor.

  • PyTextile: Get the Python port of Dean Allen's Textile lightweight markup system. The link goes through archive.org's Wayback Machine since the developer's site seems to have vanished.

  • iCalendar for Python: Try the iCalendar package, a parser/generator of iCalendar files for use with Python. It follows the RFC 2445 (iCalendar) specification.

  • vobject: From the Chandler project, get a Python package for parsing and generating vCalendar and vCard files.

  • IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
