Process XML in the browser using jQuery

Navigate some major pitfalls to gain the benefits of the popular Web application API

The popular jQuery JavaScript library is best known for its use working with HTML, but you can also use it to process XML, if you're careful and aware of the pitfalls. This article shows how to use jQuery to process the Atom Web feed format. Web feed XML is perhaps the most pervasive XML format around, and the main fulfillment of the promise of XML on the Web. But most such formats use XML namespaces, which cause issues with many popular JavaScript libraries, including jQuery.

Uche Ogbuji (uche@ogbuji.net), Partner, Zepheira, LLC

Uche Ogbuji is a partner at Zepheira, LLC, a solutions firm specializing in the next generation of Web technologies. Mr. Ogbuji is lead developer of Akara, an open source platform for XML and other data integration services. He is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. He also writes poems and essays, and is Associate Poetry Editor at The Nervousbreakdown. You can find more about Mr. Ogbuji at his Weblog Copia.



08 December 2009


XML is SGML for the Web, but it hasn't made as big a splash on the Web as the XML community would like. The most prominent effort for XML on the Web, XHTML, has been dogged by politics and design-by-committee, and other ambitious, technically sound specs such as XForms and SVG have struggled with slow uptake. The success of XML on the Web has come in sometimes unexpected directions, including the popularity of Web feeds, XML formats such as the RSS flavors and Atom.

Frequently used acronyms

  • Ajax: Asynchronous JavaScript + XML
  • API: Application programming interface
  • CSS: Cascading Stylesheets
  • DOM: Document Object Model
  • HTML: Hypertext Markup Language
  • RSS: Really Simple Syndication
  • SGML: Standard Generalized Markup Language
  • SVG: Scalable Vector Graphics
  • URI: Uniform Resource Identifier
  • URL: Uniform Resource Locator
  • W3C: World Wide Web Consortium
  • XHTML: Extensible Hypertext Markup Language
  • XML: Extensible Markup Language

For XML on the Web, just as for any other technology on the Web, the browser is the centerpiece, but most discussion of processing XML on the Web focuses on the server side. In the Firefox and XML series here at developerWorks (see Resources), I covered various ways to work with XML in the Firefox browser. Unfortunately, processing XML cross-browser is even quirkier than processing HTML cross-browser, which is part of the reason so many treatments of XML on the Web stick to the relatively safe territory of the server side.

Many dynamic HTML developers are tired of cross-browser pain and scripting quirks across browsers. The emergence of several excellent JavaScript libraries makes life easier for developers. One of the most popular of these libraries is jQuery, which has been covered in several articles here on developerWorks. You can also use jQuery to process XML, if you learn how to drive around the monster potholes. In this article I'll demonstrate the use of jQuery with XML in a practical scenario, working with Atom Web feeds, introducing a useful pattern to establish XML processing in jQuery, and dealing with the unfortunate, practical problems. You will need basic familiarity with XML, XML Namespaces, HTML, JavaScript, and the jQuery library (see Resources for more introductory jQuery articles).

XML namespace misery

I'll start with the biggest problem. jQuery does not deal with XML namespaces at all. It's been a well-known problem for a long time, and people have come up with all sorts of unsatisfactory and outright broken workarounds. The ideal solution would be for jQuery to support CSS Level 3 namespace selectors (still a W3C working draft, see Resources), which add a new class of selectors as follows:

@namespace ex url(http://example.com);
ex|quote { font-weight: bold }

The first line is a namespace prefix declaration for the http://example.com namespace, and the second line is a type selector using the new namespace component, where a declared prefix is separated from the local name by the vertical bar character. Unfortunately, this is not supported in jQuery, so people have resorted to all sorts of hackery to deal with namespaces.
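To make the prefix-to-URI mapping concrete, here is a minimal sketch of how a selector such as ex|quote might be resolved against a table of declared prefixes. The function name resolveTypeSelector and the prefix table are hypothetical and exist only for illustration; they are not part of jQuery, CSS, or any plug-in. The sketch handles only the declared-prefix form and does not model default namespaces or the wildcard prefix.

```javascript
// Hypothetical helper: resolve a CSS3-style "prefix|local" type selector
// against a table of declared prefixes. Not part of jQuery or any plug-in.
function resolveTypeSelector(selector, prefixes) {
  var bar = selector.indexOf('|');
  if (bar === -1) {
    // No namespace component; this sketch does not model default namespaces
    return { namespaceURI: null, localName: selector };
  }
  var prefix = selector.substring(0, bar);
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
    // An undeclared prefix makes the selector unusable
    throw new Error('Undeclared namespace prefix: ' + prefix);
  }
  return {
    namespaceURI: prefixes[prefix],
    localName: selector.substring(bar + 1)
  };
}

// Plays the role of: @namespace ex url(http://example.com);
var declared = { ex: 'http://example.com' };
var resolved = resolveTypeSelector('ex|quote', declared);
console.log(resolved.namespaceURI + ' ' + resolved.localName);
// prints "http://example.com quote"
```

The point is that the prefix is only a local alias. What the selector actually matches on is the pair of namespace URI and local name.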

Pretending the prefix is significant

One of the most common hacks put forward for dealing with XML and namespaces in jQuery is to ignore the namespace, and select on the full qname (the prefix as well as the local part).

    $(xml).find("x\\:quote").each(function() {
      //process each node
    });

This code selects by jQuery's concept of the node's name, which is the DOM nodeName property. It contains a colon, which is a reserved character in jQuery selectors, and must be escaped using a backslash. The backslash is a reserved character in JavaScript strings and must be doubled. This hack does not work in the case of namespace-equivalent documents that use different prefixes.
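The following sketch shows why matching on the qname is fragile. It uses plain JavaScript objects as stand-ins for DOM elements, carrying only the nodeName, namespaceURI, and localName properties that real elements expose. The helper names byQName and byNamespace are assumptions made for this illustration and are not jQuery or DOM functions.

```javascript
// Stand-ins for an element from each of two namespace-equivalent documents:
// one uses the prefix "x", the other a default namespace declaration.
var prefixedDoc = [
  { nodeName: 'x:quote', namespaceURI: 'http://example.com', localName: 'quote' }
];
var defaultNsDoc = [
  { nodeName: 'quote', namespaceURI: 'http://example.com', localName: 'quote' }
];

// Illustrative helper: select by qname, as the prefix hack effectively does
function byQName(nodes, qname) {
  var matches = [];
  for (var i = 0; i < nodes.length; i++) {
    if (nodes[i].nodeName === qname) { matches.push(nodes[i]); }
  }
  return matches;
}

// Illustrative helper: select by namespace URI and local name
function byNamespace(nodes, namespaceURI, localName) {
  var matches = [];
  for (var i = 0; i < nodes.length; i++) {
    if (nodes[i].namespaceURI === namespaceURI &&
        nodes[i].localName === localName) {
      matches.push(nodes[i]);
    }
  }
  return matches;
}

console.log(byQName(prefixedDoc, 'x:quote').length);   // prints 1
console.log(byQName(defaultNsDoc, 'x:quote').length);  // prints 0
console.log(byNamespace(prefixedDoc, 'http://example.com', 'quote').length);  // prints 1
console.log(byNamespace(defaultNsDoc, 'http://example.com', 'quote').length); // prints 1
```

Both documents carry the same information, but only the namespace-aware comparison finds the element in each of them.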

Playing with attribute filters

Some people have reported success with variations of the following approach, using jQuery attribute filters on the pseudo-attribute nodeName:

    $(xml).find("[nodeName=x:quote]").each(function() {
      //process each node
    });

In versions earlier than jQuery 1.3.x, you would add @ right before nodeName. But, this has the same fundamental problem as the approach in the previous section, Pretending the prefix is significant. It will break in many real-world namespace scenarios. I tried the following variation, which makes more sense:

    $(xml).find("[namespaceURI='http://example.com'][localName='quote']")
    .each(function() {
      //process each node
    });

Unfortunately this does not work.

In search of a good plug-in

This mess is not entirely jQuery's fault. DOM provides efficient methods for finding nodes: getElementsByTagName and getElementsByTagNameNS. The latter is designed to be namespace-aware, accepting the namespace URI and ignoring the prefix, but unfortunately every browser supports it except for Microsoft® Internet Explorer®. Nevertheless, jQuery's ambition is to deal with such browser bumpiness so that people don't have to. One possible, weak excuse is that jQuery largely bases its selectors on CSS and that even W3C CSS Level 3 namespace selectors unfortunately haven't made it past working draft stage. jQuery bug #155, "Get Namespaced Elements in XML Documents" (see Resources), covers these problems, but hasn't been addressed in three years.

Ryan Kelly ran into this problem and made a valiant attempt to create a jQuery plug-in, jquery.xmlns.js, for XML Namespace Selectors (see Resources). It looks to support code such as the following.

$.xmlns["ex"] = "http://example.com";
$(doc).find("ex|quote").each(...);

The first line is a global namespace declaration for the plug-in—global because of limitations in the underlying jQuery machinery. It does provide for a non-global block in typical jQuery idiom for namespace scoping. Unfortunately, I've had very mixed success with this extension. I hope that it changes, and that it eventually makes its way into jQuery proper.

A simpler plug-in

The solution I finally chose was to create a simple plug-in that doesn't do anything special with jQuery selectors, but rather adds a new filter to which you can directly pass a namespace and local name to trim a result set to matching nodes. You use it as follows:

  $(xml).find('*').ns_filter('http://example.com', 'quote').each(function() {
    //process each node
  });

ns_filter is the special filter that I wrote. The need to do a separate find('*') might seem inelegant, and a simpler variant might be:

  $(xml).find('quote').ns_filter('http://example.com').each(function() {
    //process each node
  });

However, this is not feasible because you cannot trust jQuery to treat a query such as find('quote') in a namespace-neutral manner (that is, as a local-name selector). The implementation of my filter is provided in the next section as part of a general system for setting up jQuery to process XML. I tested it under Firefox 3.5.5 and Safari 4.0.4 on Mac OS X Snow Leopard, and under recent versions of Internet Explorer 7 and Internet Explorer 8 on Windows® XP.

The jQuery XML workbench

The namespace problems are just a symptom of the fact that, in the end, jQuery is an HTML tool. I've found that the most useful pattern for using jQuery with XML is to create an HTML workbench for XML documents, which invokes the script through reliably cross-browser means and then sets up any of the workarounds needed, such as for XML namespaces. You can use the workbench pattern to prepare and test the patterns and techniques for your browser-based XML processing, and you can even use the workbench as the basis of the browser-based app itself.

Listing 1 (quotes.html) is a simple example of HTML using the workbench. It loads some quotations dynamically from an XML file.

Listing 1 (quotes.html). HTML example using the jQuery XML workbench
<html>
    <head>
        <title>jQuery XML workbench</title>
        <script type="text/javascript" src="jquery.js"></script>
        <script type="text/javascript" src="workbench.js"></script>
        <script type="text/javascript" src="quotes.js"></script>
        <!-- Put the XML file or URL in the href attribute below: -->
        <link href="quotes1.xml" type="application/xml" rel="target_XML" />
    </head>
    <body>
        <h1>A few quotations for your enjoyment</h1>
        <div id="update-target"><ol></ol></div>
    </body>
</html>

You need script elements to load jQuery itself, the workbench JavaScript, and your application-specific script. You also need a link element that identifies the XML file to be pulled in using target_XML. If you need to work with more than one XML file, it's pretty easy to extend the workbench setup. Listing 2 (workbench.js) is the workbench script.

Listing 2 (workbench.js). jQuery XML workbench JavaScript
/*
workbench.js
*/
// The jQuery hook invoked once the DOM is fully ready
$(document).ready(function(){ 
    // Get the target XML file contents (Ajax call)
    var fileurl = $("link[rel='target_XML']").attr('href');
    $.ajax({
        url: fileurl,
        type: "GET",
        dataType: "xml",
        complete: xml_ready,
        error: error_func
     });
});

// Callback for when the Ajax call results in an error
function error_func(result) {
    alert(result.responseText);
}

// ns_filter, a jQuery extension for XML namespace queries.
(function($) {
  $.fn.ns_filter = function(namespaceURI, localName) {
    return $(this).filter(function() {
        var domnode = $(this)[0];
        return (domnode.namespaceURI == namespaceURI && domnode.localName == localName);
    });
  };

})(jQuery);
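The predicate inside ns_filter relies on the DOM Level 2 properties namespaceURI and localName. Some XML DOM implementations, notably the MSXML parser used by older Internet Explorer versions, expose the local part of the name as baseName rather than localName. If the filter ever returns an empty set in such an environment, a more defensive predicate along the following lines might help. This is a sketch under that assumption and is not part of the workbench as published. The names localPart and nodeMatches are hypothetical.

```javascript
// Hypothetical helper: find the local part of a node's name, trying
// localName first, then baseName, then falling back to stripping any
// prefix from nodeName.
function localPart(node) {
  if (node.localName) { return node.localName; }
  if (node.baseName) { return node.baseName; }
  var name = node.nodeName || '';
  var colon = name.indexOf(':');
  return colon === -1 ? name : name.substring(colon + 1);
}

// Hypothetical predicate: true if the node is in the given namespace
// and has the given local name.
function nodeMatches(node, namespaceURI, localName) {
  return node.namespaceURI === namespaceURI &&
         localPart(node) === localName;
}

// Plain objects standing in for DOM elements from two different parsers
var w3cStyle = { nodeName: 'x:q', namespaceURI: 'http://example.com', localName: 'q' };
var baseNameStyle = { nodeName: 'x:q', namespaceURI: 'http://example.com', baseName: 'q' };

console.log(nodeMatches(w3cStyle, 'http://example.com', 'q'));      // prints true
console.log(nodeMatches(baseNameStyle, 'http://example.com', 'q')); // prints true
```

Inside the filter callback you could then return nodeMatches(this, namespaceURI, localName) in place of the direct property comparison.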

The workbench code is well commented, but here are some additional notes. The namespace filter is the last function in the listing. The first function is the usual jQuery hook invoked once the main page DOM is fully ready. It retrieves a URL for the target XML and makes an Ajax call to load the file. Notice dataType: "xml", which instructs the Ajax machinery to parse the response document as XML. If there is an error, it invokes the error_func callback. When the request finishes, it invokes the xml_ready callback, which the user provides to implement the application behavior. Because xml_ready is registered as the complete handler, jQuery calls it after failed requests as well as successful ones, so on failure it runs after error_func. This callback takes the result structure from which you can pluck the XML in the form of the responseXML property. Listing 3 (quotes.js) is the application code for this case.

Listing 3. (quotes.js) Application code for dynamic quotations viewer
/*
quotes.js
*/
function xml_ready(result){
    var xml = result.responseXML;
    //Make sure the target area for inserting data is clear
    $("#update-target ol").empty();
    $(xml).find('*').ns_filter('http://example.com', 'q').each(function(){
        var quote_text = $(this).text();

        //Insert as text so characters such as < and & are not parsed as HTML
        $('<li></li>')
            .text(quote_text)
            .appendTo('#update-target ol');
    }); //close each(
}

Listing 4 (quotes1.xml) is the XML file with the list of quotations.

Listing 4. (quotes1.xml) XML file with a list of quotations
<?xml version="1.0" encoding="utf-8"?>
<x:quotes xmlns:x='http://example.com'>
  <x:q>Words have meaning and names have power</x:q>
  <x:q>Sticks and stones will break my bones, but names will never hurt me.</x:q>
  <x:q>The beginning of wisdom is to call things by their right names.</x:q>
  <x:q>Better to see the face than to hear the name. </x:q>
</x:quotes>

Notice that I use the x prefix, which means that, in theory, I might try one of the prefix-based hacks I mentioned above. But, if I did so, it would break if I substituted the quotes file with Listing 5 (quotes2.xml), which is 100% namespace equivalent to Listing 4, and the same Canonical XML (see Resources).

Listing 5. (quotes2.xml) Equivalent XML file to Listing 4, with a list of quotations
<?xml version="1.0" encoding="utf-8"?>
<quotes xmlns='http://example.com'>
  <q>Words have meaning and names have power</q>
  <q>Sticks and stones will break my bones, but names will never hurt me.</q>
  <q>The beginning of wisdom is to call things by their right names.</q>
  <q>Better to see the face than to hear the name. </q>
</quotes>

If you substitute quotes2.xml in Listing 1, you'll find it works just as well, which is a key test for namespace processing. Figure 1 is the browser display of quotes.html.

Figure 1. Quotes (from Listing 5) displayed using the jQuery XML workbench

Dynamic display of Atom XML

Once you crack XML namespace processing in jQuery you can deal with many more useful XML formats, including Web feed formats such as RSS and Atom. In this section I'll use the jQuery XML workbench to display the latest entries from an Atom feed on a Web page. Listing 6 is the page HTML.

Listing 6. (home.html) Web page that hosts dynamic XML
<html>
    <head>
        <title>jQuery XML workbench</title>
        <script type="text/javascript" src="jquery.js"></script>
        <script type="text/javascript" src="workbench.js"></script>
        <script type="text/javascript" src="home.js"></script>
        <!-- Put the XML file or URL in the href attribute below: -->
        <link href="atom1.xml" type="application/xml" rel="target_XML" />
    </head>
    <body>
        <h1>Caesar's home page</h1>
        <p>GALLIA est omnis divisa in partes tres, quarum unam incolunt Belgae,
    aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli
    appellantur. Hi omnes lingua, institutis, legibus inter se differunt.
    </p>

    <p>Gallos ab Aquitanis Garumna flumen, a Belgis Matrona et Sequana dividit.
    </p>

    <p>Horum omnium fortissimi sunt Belgae, propterea quod a cultu atque
    humanitate provinciae longissime absunt, minimeque ad eos mercatores saepe
    commeant atque ea quae ad effeminandos animos pertinent important,
    proximique sunt Germanis, qui trans Rhenum incolunt, quibuscum continenter
    bellum gerunt. Qua de causa Helvetii quoque reliquos Gallos virtute
    praecedunt, quod fere cotidianis proeliis cum Germanis contendunt, cum aut
    suis finibus eos prohibent aut ipsi in eorum finibus bellum gerunt.</p>

        <h2>My <a href="feed.xml">Web feed</a></h2>
        <div id="update-target"></div>
    </body>
</html>

Listing 7 (atom1.xml) is the referenced Atom file.

Listing 7. (atom1.xml) Sample Atom file
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xml:lang="en"
      xml:base="http://www.example.org">
  <id>http://www.example.org/myfeed</id>
  <title>My Simple Feed</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <link href="/blog" />
  <link rel="self" href="/myfeed" />
  <author><name>Uche Ogbuji</name></author>
  <entry>
    <id>http://www.example.org/entries/1</id>
    <title>A simple blog entry</title>
    <link href="/blog/2005/07/1" />
    <updated>2005-07-14T12:00:00Z</updated>
    <summary>This is a simple blog entry</summary>
  </entry>
  <entry>
    <id>http://www.example.org/entries/2</id>
    <title />
    <link href="/blog/2005/07/2" />
    <updated>2005-07-15T12:00:00Z</updated>
    <summary>This is simple blog entry without a title</summary>
  </entry>
</feed>

Listing 8 is home.js, which contains the dynamic application code mounted on the workbench.

Listing 8. (home.js) application code for the home page Web feed display
/*
home.js
*/
var ATOM_NS = 'http://www.w3.org/2005/Atom';

function xml_ready(result){
    var xml = result.responseXML;
    //Make sure the target area for inserting data is clear
    $("#update-target").empty();
    $(xml).find('*').ns_filter(ATOM_NS, 'entry').each(function(){
        var title_elem = $(this).find('*').ns_filter(ATOM_NS, 'title').clone();
        var link_text = $(this).find('[rel="alternate"]')
                            .ns_filter(ATOM_NS, 'link')
                            .attr('href');
        var summary_elem = $(this).find('*').ns_filter(ATOM_NS, 'summary').clone();

        //Deal with the case of a missing title
        if (!title_elem.text()){
            title_elem = '[No title]';
        }

        //Deal with the case where rel='alternate' is omitted
        if (!link_text){
            link_text = $(this).find('*')
                                .ns_filter(ATOM_NS, 'link')
                                .not('[rel]')
                                .attr('href');
        }

        //Update the target area with the entry information
        $('<p></p>')
            .append(
                $('<a href="' + link_text + '"></a>')
                .append(title_elem)
            )
            .append(' - ')
            .append(summary_elem.clone())
            .fadeIn('slow') //bonus animation
            .appendTo('#update-target');
    }); //close each(
}

Again I commented the file, but some points deserve special emphasis. Atom has many acceptable variations on its elements, many of which are optional. This means you have to do some handling of exceptional cases. I illustrate two such common cases: the optional rel="alternate" on the one required link, and the fact that titles are optional. As you can see, jQuery provides a lot of flexibility for dealing with such cases, so you should be able to deal with even such irregular XML formats. In some cases I copy constructs directly from the XML to the main document (the host HTML). This requires some care, and you'll notice where I use the clone() method to make sure I'm not grafting nodes from one document into another, an error that would be signaled by the DOM exception WRONG_DOCUMENT_ERR. As a bonus, I used the jQuery method fadeIn so that the added content visually fades in slowly. Figure 2 is the browser display of home.html.
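The two fallback rules in Listing 8 can be isolated from jQuery and tested on their own. The sketch below restates them over plain objects standing in for the attributes of each link element in an entry. The function names pickEntryHref and displayTitle are hypothetical and are not part of the listing. It relies on the Atom rule that a link with no rel attribute is interpreted as rel="alternate".

```javascript
// Hypothetical helper: choose the href to use for an entry.
// Each item in links is a plain object such as { rel: 'self', href: '/myfeed' }.
function pickEntryHref(links) {
  var i;
  // First choice: a link explicitly marked rel="alternate"
  for (i = 0; i < links.length; i++) {
    if (links[i].rel === 'alternate') { return links[i].href; }
  }
  // Fallback: a link with no rel attribute, which Atom treats as "alternate"
  for (i = 0; i < links.length; i++) {
    if (links[i].rel === undefined) { return links[i].href; }
  }
  // No usable link found
  return undefined;
}

// Hypothetical helper: substitute a placeholder for a missing or empty title
function displayTitle(titleText) {
  return titleText ? titleText : '[No title]';
}

// The links of the first entry in Listing 7, plus an extra "self" link
var links = [
  { rel: 'self', href: '/entries/1' },
  { href: '/blog/2005/07/1' }
];
console.log(pickEntryHref(links));   // prints "/blog/2005/07/1"
console.log(displayTitle(''));       // prints "[No title]"
```

Keeping such rules in small functions makes it easier to check the handling of irregular feeds before wiring them into the page.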

Figure 2. Caesar's home page (text from Listing 6) with dynamically added Web feed content

Wrap up

jQuery is all about packaging up all the tricks and workarounds for dealing with Web browser oddities, and the XML workbench I introduce in this article is a first step towards such a reusable tool for those needing to deal with XML. You've seen how one of the biggest problems is dealing with namespaces. Once you get past that hurdle, jQuery gives you the tools to deal with the many sorts of irregular documents so aptly expressed with XML. You'll discover how readily the techniques developed processing Web feeds can be applied to many other XML formats within the browser.

If you find jQuery and attendant workarounds unsuitable, another option is to use a JavaScript library more directly targeted at XML processing, such as Sarissa, which is worth an article of its own, but is not as widely used, nor as easy to deploy as jQuery.


Download

Description: Code listings for this article
Name: code.zip
Size: 6KB

Resources

Learn

Get products and technologies

  • jQuery: Get jQuery and many supporting resources to simplify HTML document traversing, event handling, animating, and Ajax interactions for rapid Web development.
  • jquery.xmlns.js plug-in: Keep an eye on Ryan Kelly's plug-in, which attempts to support CSS 3 XML Namespace Selectors, but, as of the time of this writing, perhaps needs a bit more work.
  • Sarissa: Consider using this JavaScript library targeted squarely at XML processing, including namespace support. It's not as widely used, nor as easy to deploy as jQuery, but an attractive option for XML-heavy applications.
  • IBM product evaluation versions: Download or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
