Before you start
About this tutorial
As advanced media- and data-rich web applications grow in popularity within the browser, technologies such as XML and jQuery become important components in their architecture due to their wide adoption and flexibility. In this tutorial, you explore DOM processing within the browser, narrowing the focus to how these paradigms apply to XML in particular and how the jQuery library can speed up development and increase robustness.
The tutorial covers these specific topics:
- Introduction to the DOM
- jQuery and XML
- Case study: LiveXMLEditor
Prerequisites
This tutorial assumes the following tools and background:
- Your favorite text editor for writing and editing code.
- The jQuery library. You can either download it and serve it locally, or include and serve it directly from the Google CDN.
- A good browser. While most browsers in use today are supported, review the jQuery Browser Compatibility page for recommended browsers. Many UI engineers choose Firefox for development due to its useful plug-ins, of which the most popular is Firebug. Firebug is not required for this tutorial, but it is highly recommended.
- Familiarity with a server-side language (PHP in particular) helps with specific sections, but it is not essential.
See Resources for links to all the tool downloads.
Introduction to the Document Object Model (DOM)
Before you dig into jQuery and XML, let's go over the basics of the concepts that you explore in this tutorial.
Assume that you are working with a very basic HTML document that looks like Listing 1.
Listing 1. A simple HTML document
<!DOCTYPE html>
<html>
  <head>
    <title>This is the page Title</title>
  </head>
  <body class="signed-out">
    <div id="header">
      <ul id="nav">
        <li><a href="/home" class="home">Home</a></li>
        <li><a href="/about" class="about">About</a></li>
        <li><a href="/auth" class="auth">Sign In</a></li>
      </ul>
    </div>
    <div id="article">
      <h1>A Plain DOM</h1>
      <h2>A sample <b>HTML</b> document.</h2>
      <div id="section"></div>
    </div>
  </body>
</html>
Listing 2. Getting a DOM element by id
The getElementById() method is the fastest way to retrieve a DOM element, because an id gives the browser direct access to that element. Browsers provide no warning for duplicate ids, and Microsoft® Internet Explorer® treats the name attribute as an id, so avoiding duplicates and watching out for the Internet Explorer collisions is your responsibility. That said, in practice these issues are generally simple to avoid and therefore not a big concern.
A second method of DOM element retrieval to look at is getElementsByTagName(). This more versatile method is essential for processing XML because with that format you do not have the luxury of relying on element id attributes. Look at how getElementsByTagName() works. To retrieve the contents of the H1 node you might execute the code in Listing 3.
Listing 3. Getting a set of elements by tag name
Note two interesting things here. First, getElementsByTagName() returns an array-like collection of elements (as the name implies). Because this example has only a single H1 element, you can retrieve it at index 0. That's almost never a safe assumption to make, though, because the element might not exist, in which case the code above would throw an error. Instead, always check that the element exists before you attempt to access its properties (or methods).
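The existence check can be wrapped in a small helper. This is a minimal sketch, not code from the tutorial: the name firstByTagName is hypothetical, and it only assumes that the root it receives offers getElementsByTagName().

```javascript
// Hypothetical helper: returns the first element with the given tag name
// under `root`, or null when nothing matches.
function firstByTagName(root, tagName) {
  var matches = root.getElementsByTagName(tagName);
  return matches.length > 0 ? matches[0] : null;
}

// Browser usage against the document in Listing 1:
//   var h1 = firstByTagName(document, "h1");
//   var title = h1 ? h1.innerHTML : "";
```

Returning null rather than throwing leaves the decision about a missing element to the caller.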
Second, you've likely noticed the innerHTML property. As the name implies, this property provides access to the contents of an element, which in the case above is just a string. Had the H1 element contained other elements (tags), its value would contain those as well, as part of the string (see Listing 4).
Listing 4. Using innerHTML to retrieve value with HTML elements
In addition to innerHTML, browsers also provide a property for retrieving only the text contents of an element. However, this property is named innerText in Internet Explorer and textContent in other browsers. To use it safely across all browsers you might do something similar to Listing 5.
Listing 5. Using innerText and textContent properties across different browsers
The headerText variable gets the value of textContent if it exists, and the value of innerText otherwise. A more sensible way to treat this task is to create a cross-browser function that does that, but as you see later, jQuery already provides one.
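Such a cross-browser function might look like the following sketch. The name getText is hypothetical, and the type checks are one possible way to decide which property exists; they treat an empty string as a valid value instead of falling through to the other property.

```javascript
// Hypothetical helper: reads an element's text through whichever
// property the browser supports.
function getText(element) {
  if (typeof element.textContent === "string") {
    return element.textContent; // standards-based browsers
  }
  if (typeof element.innerText === "string") {
    return element.innerText; // older Internet Explorer
  }
  return "";
}

// Browser usage:
//   var headerText = getText(document.getElementsByTagName("h1")[0]);
```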
The HTML page also has a Sign In link. Suppose that the user has logged in using a separate process, and you want to reflect that in the navigation by changing the Sign In label to Sign Out. In the previous example, you retrieved the text value of a node; you can update a node's contents just as easily by assigning a new value to its innerHTML property (see Listing 6).
Listing 6. Updating the innerHTML value of a DOM node
In addition to updating the values of existing nodes, you can create completely new DOM elements and append them to the DOM using an existing element (see Listing 7).
Listing 7. Creating and injecting a new DOM node
Although methods such as getElementsByTagName() are often executed on the document object, you can also (more efficiently, in fact) execute them on any other element and reduce the retrieval scope to the current element's descendants. This approach obviously assumes that the elements you are accessing are descendants of the element that the methods are called on. Keep this notion of context in mind as it comes up later when you look at processing XML with jQuery.
Removing elements from the DOM is also trivial. To remove a node, first retrieve it, then remove it by referencing it through its parent (see Listing 8). (See more on parentNode and related properties below.)
Listing 8. Removing an element from the DOM
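A minimal sketch of this step, assuming a hypothetical helper named removeNode that receives the node you already retrieved:

```javascript
// Hypothetical helper: detaches a node from its parent.
// Returns true when the node was removed, false when there was nothing to do.
function removeNode(node) {
  if (!node || !node.parentNode) {
    return false; // no node, or the node is not attached to a parent
  }
  node.parentNode.removeChild(node);
  return true;
}

// Browser usage against the document in Listing 1:
//   removeNode(document.getElementById("section"));
```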
Last, let's go over assigning, reading, and removing attributes, using methods such as setAttribute(), getAttribute(), and removeAttribute() (see Listing 9).
Listing 9. Setting and removing an element attribute
You have already seen parentNode in an earlier listing. Given an element, you can navigate from it to surrounding elements within the DOM using these basic references (see Listing 10).
firstChild lastChild nextSibling parentNode previousSibling
See Resources for a link to a complete node property listing.
A DOM representation of these elements, in reference to a given <node>, looks like Listing 11.
Listing 11. Relationship of related DOM elements
<parent_node>
  <previous_sibling/>
  <node>
    <first_child/>
    ...
    <last_child/>
  </node>
  <next_sibling/>
</parent_node>
Last, consider the tree representation of this relationship as shown in Figure 1. First is the parentNode, which includes the node plus its previousSibling and nextSibling elements. The node can contain one or more child nodes (with firstChild and lastChild elements).
Figure 1. Tree representation of adjacent nodes
By now, you should have a good understanding of the basics of DOM traversal and manipulation as they apply to HTML documents. In the next section, you'll look at how this applies to XML documents.
XML node types
Before digging into processing XML, let's go over the different XML node types and their named constants. While this is an easy topic to ignore when dealing with HTML, it is crucial when processing XML due to that format's extensible, and therefore unpredictable, structure. It is precisely this difference that requires the custom methods that I cover here for the XML processor.
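Here are the 12 different XML node types, with the numeric value of each named constant:
- ELEMENT_NODE (1)
- ATTRIBUTE_NODE (2)
- TEXT_NODE (3)
- CDATA_SECTION_NODE (4)
- ENTITY_REFERENCE_NODE (5)
- ENTITY_NODE (6)
- PROCESSING_INSTRUCTION_NODE (7)
- COMMENT_NODE (8)
- DOCUMENT_NODE (9)
- DOCUMENT_TYPE_NODE (10)
- DOCUMENT_FRAGMENT_NODE (11)
- NOTATION_NODE (12)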
To determine the type of a node, read its nodeType property and check its value. The function in Listing 13 returns true if the passed node is a comment node and false otherwise. Although this function has no jQuery dependencies, you'll explore it further when you look at parsing XML node values.
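A check of this kind can be sketched as follows. The value 8 is the standard nodeType for comment nodes; the second helper, withoutComments, is a hypothetical addition that shows one way the check might be used.

```javascript
var COMMENT_NODE = 8; // standard DOM nodeType value for comment nodes

// Returns true if the passed node is a comment node and false otherwise.
function isComment(node) {
  return Boolean(node) && node.nodeType === COMMENT_NODE;
}

// Hypothetical helper: copies an array-like list of nodes, leaving comments out.
function withoutComments(nodes) {
  var kept = [];
  for (var i = 0; i < nodes.length; i++) {
    if (!isComment(nodes[i])) {
      kept.push(nodes[i]);
    }
  }
  return kept;
}
```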
I do not go into details about each of the node types in this tutorial, but being familiar with the node types is essential for handling nodes and their values accordingly.
Assume that you are working with this simple XML file in Listing 14.
Listing 14. A simple XML file
<?xml version="1.0" encoding="UTF-8" ?> <item content_id="1" date_published="2010-05-25"> <description></description> <body></body> <related_items> <related_item content_id="2"></related_item> <related_item content_id="3"></related_item> </related_items> </item>
- Server-side rendering of XML into a textarea element
- Loading XML into the browser through Ajax
The detailed steps for each option are:
- Server-side rendering of XML into a textarea element
A slightly different approach consists of loading the XML into a <textarea> field (which does not need to be visible). Then you convert the field's string value into an XML DOM object in the browser.
You can output the PHP variable ($xmlFile) defined here to an HTML textarea field with an id for easy reference:
Let's also look at the function for reversing this process. Given an XML DOM object, the function in Listing 18 returns a string.
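The two conversions can be sketched with the standard DOMParser and XMLSerializer browser APIs. This is an assumption about the environment rather than the tutorial's own listings: older versions of Internet Explorer do not provide these objects and need a different mechanism. The function names and the textarea id in the usage comment are hypothetical.

```javascript
// Hypothetical helper: turns an XML string into an XML DOM document.
function parseXml(xmlString) {
  var doc = new DOMParser().parseFromString(xmlString, "application/xml");
  // Browsers report malformed XML by returning a document that
  // contains a <parsererror> element instead of throwing.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    throw new Error("The string is not well-formed XML");
  }
  return doc;
}

// Hypothetical helper: the reverse step, XML DOM document to string.
function serializeXml(xmlDoc) {
  return new XMLSerializer().serializeToString(xmlDoc);
}

// Browser usage, assuming a textarea with the hypothetical id "xml-source":
//   var xmlDoc = parseXml(document.getElementById("xml-source").value);
//   var xmlString = serializeXml(xmlDoc);
```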
- Loading XML into the browser through Ajax
Look more closely at this code. The getElementsByTagName() method, which you saw before, is essential for processing XML because it allows you to select all XML elements of a given name. (Again, keep in mind that XML processing is case sensitive.) You then safely retrieve the description value by first checking whether the descriptionNode has a firstChild. If so, you go on to access its nodeValue. When you try to access a specific node's text value, things start to get a little tricky. Although some browsers support the innerHTML property for XML documents, most do not. You first have to check whether the node has a firstChild (a text node, comment, or child node) and if it does, retrieve that child's nodeValue. If the value doesn't exist, you set it to an empty string. (You can ignore empty values and only store actual values, but for the purposes of this example let's maintain the number of items and keep the indexes in sync.)
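The firstChild check described above can be sketched as a small function. The name getFirstChildValue is hypothetical, and the function follows the same convention of falling back to an empty string.

```javascript
// Hypothetical helper: returns the nodeValue of a node's first child,
// or an empty string when there is no child or no value.
function getFirstChildValue(node) {
  if (node && node.firstChild && node.firstChild.nodeValue != null) {
    return node.firstChild.nodeValue;
  }
  return "";
}

// Browser usage, assuming xmlDoc holds the parsed XML from Listing 14:
//   var descriptionNode = xmlDoc.getElementsByTagName("description")[0];
//   var description = getFirstChildValue(descriptionNode);
```

An element child has a nodeValue of null, so the function also returns an empty string when the first child is an element and not text.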
Last, you see that attribute methods such as getAttribute() work as they did with an HTML file.
jQuery and XML
Likely the main reasons for jQuery's huge popularity are its fast and simple traversal engine and its slick selector syntax. (Excellent documentation also really helps.) And although its primary use is HTML processing, in this section you explore how it works and how to apply it to processing XML files as well.
DOM manipulation and traversal with jQuery
To access any of jQuery's features you first need to make sure that the file jquery.js is included on the page. Having done that, you simply call jQuery() or the shorthand version $() and pass it a selector as the first argument. A selector is usually a string that specifies an element, or a collection of elements if more than one element matches the given selector. Listing 20 shows some basic jQuery selectors.
Listing 20. Basic jQuery selectors
Listing 21. Basic jQuery operation with chained method calls
This code selects all images, sets padding and border on each of them, then wraps each in a DIV with class img-wrap. As you can tell, that's quite a bit of cross-browser functionality reduced to just a single line of code. For thorough information on jQuery selectors and methods, check out the excellent documentation on the jQuery website (see Resources).
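A chained call of the kind described might look like the sketch below. The padding and border values are assumptions chosen for illustration, and the jQuery function is passed in as a parameter so that the function does not depend on a global.

```javascript
// Hypothetical helper: selects all images, styles them, and wraps each
// one in a DIV with class img-wrap, as a single chained expression.
function wrapImages($) {
  return $("img")
    .css({ padding: "4px", border: "1px solid #ccc" }) // assumed values
    .wrap('<div class="img-wrap"></div>');
}

// Browser usage, with jQuery loaded on the page:
//   wrapImages(jQuery);
```

Chaining works because css() and wrap() each return the jQuery object they were called on.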
Listing 22 shows how jQuery simplifies examples from the previous section.
Listing 22. Creating and injecting a DOM node with jQuery
Processing XML with jQuery
I mentioned that the first argument passed to the $() function is the string selector. The less common second argument allows you to set the context, or starting node, for jQuery to use as a root when making the selection. By default, jQuery uses the document element as the context, but you can optimize code by restricting the context to a more specific (and therefore smaller) subset of the document. To process XML, you want to set the context to the root XML document (see Listing 23).
Listing 23. Retrieving values from an XML document with jQuery
That code cleans things up quite a bit. By passing the node name to the core jQuery $() function and setting the context, xmlData, you quickly get access to the node set you want. Getting the value of the node, though, is something that needs some exploration.
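To illustrate the context argument without touching node values yet, the following sketch collects attribute values from the sample XML file in Listing 14. The function name relatedContentIds is hypothetical, and xmlData is assumed to hold the parsed XML document.

```javascript
// Hypothetical helper: returns the content_id of every related_item
// node found inside the given XML context.
function relatedContentIds($, xmlData) {
  var ids = [];
  $("related_item", xmlData).each(function () {
    // Inside each(), `this` is the plain DOM node.
    ids.push(this.getAttribute("content_id"));
  });
  return ids;
}

// Browser usage:
//   var ids = relatedContentIds(jQuery, xmlData);
```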
Because the innerHTML property does not work for non-HTML documents, you cannot rely on jQuery's html() method to retrieve the contents of a node. jQuery also provides a method for cross-browser retrieval of the text of an HTML node. The text() method, as mentioned earlier, is a cross-browser wrapper for the innerText property, but even it behaves inconsistently across browsers when processing XML. Internet Explorer, for example, ignores what it considers empty node values (spaces, tabs, breaks) as the contents of a node. This approach might seem more intuitive than that of Firefox, which interprets the related_items element from the sample XML file as a set of text nodes along with the related_item nodes. To get around this inconsistency, create custom methods for treating text nodes consistently. In doing so (see Listing 24) you make use of a few handy jQuery methods.
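One way to treat text nodes consistently is sketched below. Note that this version uses plain DOM properties, not jQuery methods, so it is an alternative illustration of the idea and not the tutorial's Listing 24. The name getTextValue is hypothetical.

```javascript
var TEXT_NODE = 3;          // standard DOM nodeType values
var CDATA_SECTION_NODE = 4;

// Hypothetical helper: joins the values of a node's direct text and
// CDATA children and trims surrounding whitespace, so that
// whitespace-only text nodes yield an empty string in every browser.
function getTextValue(node) {
  var text = "";
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var type = children[i].nodeType;
    if (type === TEXT_NODE || type === CDATA_SECTION_NODE) {
      text += children[i].nodeValue;
    }
  }
  return text.replace(/^\s+|\s+$/g, "");
}
```

With this approach, an element such as related_items, whose only text is the indentation between its children, reports an empty value regardless of how the browser exposes whitespace.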
Now look at how to set the node value (see Listing 25). Keep two things in mind. First, this operation is potentially destructive, as setting the text value of the root node overwrites all of its children. Second, if a specific node has no prior text value, instead of setting it using node.textContent, set it with node["textContent"] because Internet Explorer doesn't like the first method (the property doesn't exist when blank).
DOM attributes and jQuery
Listing 26. Getting and setting DOM element attributes with jQuery
As you can see, jQuery's attr() method supports both the retrieval and setting of attributes. More importantly, jQuery provides excellent access to element retrieval by allowing attributes in selectors. In the example above, you selected the item with the content_id attribute set to 1.
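The getter and setter forms of attr(), combined with an attribute selector, can be sketched like this. The function name updatePublishedDate is hypothetical; the element and attribute names come from the sample XML file in Listing 14.

```javascript
// Hypothetical helper: changes the date_published attribute of the item
// whose content_id is 1 and returns the previous value.
function updatePublishedDate($, xmlData, newDate) {
  var item = $('item[content_id="1"]', xmlData);
  var previous = item.attr("date_published"); // one argument: get
  item.attr("date_published", newDate);       // two arguments: set
  return previous;
}

// Browser usage:
//   var oldDate = updatePublishedDate(jQuery, xmlData, "2010-06-01");
```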
Loading XML through Ajax with jQuery
As you probably already know, Ajax is a web technology for asynchronous retrieval of data, which uses the XMLHttpRequest (XHR) API to send a request to and receive a response from the server. In addition to providing excellent DOM traversal and manipulation methods, jQuery also offers thorough, cross-browser Ajax support. That said, the loading of XML through Ajax is as native as Ajax gets, so you're on familiar ground. The way this works in jQuery is shown in Listing 27.
Listing 27. Loading an external XML file with jQuery's Ajax method
The $.ajax() method has a number of additional options and can also be called indirectly through shortcut methods such as $.getJSON(), which loads a JSON data file and makes it available to the success script, and so on. When requesting a file of type XML, though, you're stuck with the core $.ajax() method, which has the advantage of requiring you to know only one syntax for any circumstance. In the example above, you simply request the file /path/to/data.xml, specifying that the dataType is "xml" and that the request method is GET. After the browser receives a response from the server, it triggers either the success or the error callback function accordingly. In this example, a success callback alerts the total number of nodes. jQuery's star selector (*) matches all nodes. The key point to note is that the success callback function receives the data from the server as the first argument. The name of the variable is up to you, and as described earlier, that value becomes the context passed to any jQuery call intended to process the XML.
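A request of this shape can be sketched as follows. Instead of calling alert(), this version hands the node count to a callback so the result can be used elsewhere; the function name countXmlNodes and both callback parameters are hypothetical.

```javascript
// Hypothetical helper: loads an XML file and reports how many nodes it
// contains. onCount receives the total; onError receives the status text.
function countXmlNodes($, onCount, onError) {
  $.ajax({
    type: "GET",
    url: "/path/to/data.xml",
    dataType: "xml",
    success: function (xml) {
      // `xml` is the parsed document; it becomes the selector context.
      onCount($("*", xml).length);
    },
    error: function (xhr, status) {
      onError(status);
    }
  });
}

// Browser usage:
//   countXmlNodes(jQuery, function (n) { alert(n); }, function (s) { alert(s); });
```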
An important thing to keep in mind when processing Ajax in general is the cross-domain restriction, which prevents retrieval of files from different domains. The previously covered methods of server-side XML retrieval might be viable alternatives in your application.
Processing external XHTML as XML
Because XHTML is a subset of valid XML, there's no reason why you can't process it the same way you process XML. Why exactly you would want to is a separate topic, but the point is that you could. For instance, scraping a (valid) XHTML page and extracting data from it is perfectly doable using this technique, even though I encourage a more robust approach.
While primarily intended for HTML DOM traversal and manipulation, jQuery can also be used for processing XML, though it requires the additional step of getting the file to the browser. The topics covered in this section explain the different ways to do that and provide the methods essential for processing the XML effectively.
Case Study: Live XML Edit
In this section, you apply what you learned to create a browser-based XML editor.
Live XML Edit
While I don't recommend editing XML by hand, I can think of too many cases where precisely that approach was taken. So, in part as an academic exercise and in part as a useful tool, I set out to build a browser-based XML editor. A primary goal was to process the XML directly, rather than convert it to a different format such as JSON, make the updates, then transform back to XML. Making the edits live ensures that the only affected parts of the file are those that were actually edited, which means less room for error and faster processing. The techniques covered in this tutorial were essential in putting this together. Take a closer look at how they applied.
Figure 2 shows Live XML Edit.
Figure 2. Live XML Edit
Uploading and loading XML through Ajax
LiveXMLEdit uses Ajax to get the XML into the page. The user is required to upload the XML file to edit, which is then saved on the server and brought in using the $.ajax() method described in Listing 27. A reference to the original XML object is saved and edited directly. This approach means that after the user is finished editing the file, no transformations are necessary as the updated DOM already exists.
Rendering collapsible and editable HTML tree representation of the XML
The XML document is traversed with the traverseDOM method (see Listing 11), and each of the nodes along with their attributes is rendered as nested unordered lists (UL). jQuery is then used to assign handlers for expanding and collapsing of elements, which simplifies the display and editing of larger documents. Also rendered are action buttons that provide editing features.
Adding methods to handle (and store) live edits
Along with rendering buttons for editing and deleting nodes and making updates for edited fields, the success handler of the Ajax call that loads the XML also assigns handlers for processing various user interactions and events. jQuery provides different means of assigning handlers, but for unpredictably large DOMs by far the most efficient is the .live() method or its younger (and even more performant) sibling, .delegate(). Rather than catch events at the target element, these methods handle events at the document or the specified element, respectively. This approach has a number of benefits: faster binding and unbinding, and support for existing as well as future nodes that match the selector (key in this case because users can create new XML nodes that should behave just like existing ones).
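Delegated binding of this kind might look like the sketch below. The container id xml-tree, the li.node selector, and the function name bindNodeToggle are all hypothetical and are not taken from the LiveXMLEdit source. In newer jQuery versions, the .on() method covers the same use case.

```javascript
// Hypothetical helper: binds one click handler on the tree container
// that also fires for matching nodes added after the call.
function bindNodeToggle($, onToggle) {
  $("#xml-tree").delegate("li.node", "click", function (event) {
    // `this` is the li.node element that was clicked.
    onToggle(this, event);
  });
}

// Browser usage:
//   bindNodeToggle(jQuery, function (li) { jQuery(li).toggleClass("collapsed"); });
```

Because the handler lives on the container, nodes that users create later need no extra binding.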
Server-side script to save the updated file
Although server-side processing is beyond the scope of this article, it is necessary for saving the edited file. For the code sample, check out the entire application on GitHub (see Resources), but as far as the browser processing is concerned, simply convert the updated XML DOM to a string and post it to a server script. The script itself retrieves the post and saves it as a file.
Tutorial source code: ProcessingXMLwithjQuery-sourceCode.zip (10KB)
Resources
- DOM objects and methods tutorial (Mark "Tarquin" Wilton-Jones, January 2009): Find a comprehensive listing of all properties, collections, and methods of the W3C DOM.
- The Mozilla Developer Center: Visit a great resource for web developers.
- element.getElementsByTagName: On this page, find a thorough overview of the getElementsByTagName covered in this tutorial.
- LiveXMLEditor: View more information about LiveXMLEditor.
- Process XML in the browser using jQuery (Uche Ogbuji, developerWorks, December 2009): Find key information for processing namespaced XML and navigate some major pitfalls to gain the benefits of the popular Web application API.
- Understanding DOM (Nicholas Chase, developerWorks, March 2007): In this tutorial, learn about the structure of a DOM document.
- XML area on developerWorks: Get the resources you need to advance your skills in the XML arena.
- My developerWorks: Personalize your developerWorks experience.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks. Also, read more XML tips.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- developerWorks on Twitter: Join today to follow developerWorks tweets.
- developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
- developerWorks on-demand demos: Watch demos ranging from product installation and setup for beginners to advanced functionality for experienced developers.
Get products and technologies
- jQuery Browser Compatibility: Visit this page for a list of recommended browsers.
- Firebug: Get the essential debugging tool for Firefox users.
- LiveXMLEditor: Try the XML editor created by the author of this tutorial.
- PHP: Hypertext Preprocessor: Get the widely-used scripting language that is well suited for web development and can be embedded into HTML. This tutorial uses PHP 5.2 or higher.
- IBM product evaluation versions: Download or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- XML zone discussion forums: Participate in any of several XML-related discussions.
- The developerWorks community: Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.