XMLStarlet is an open source XML toolkit that you can use on your UNIX®, Mac OS® X, or Microsoft® Windows® command line. You can use XMLStarlet to validate XML, to format it, to select portions of it, to transform it with XSLT, even to make edits. This means you can put XML utilities into your shell scripts without writing any custom code in a programming language like Perl or Java®.
To get started with XMLStarlet, you need to install it, and it depends on the libxml2 and libxslt libraries. On Windows, you don't need to install them separately because they come with the Win32 package: download the Win32 executable and put it somewhere on your path so it's easy to run from the command line. If you're running UNIX and your machine doesn't already have libxml2 and libxslt, you must download and install them first (see Resources).
Next, surf over to the XMLStarlet home page and download the latest build (see Resources). Run the ./configure script to set up the build scripts. Then run make install to build the package and install it. If you aren't the superuser, use sudo make install so the commands can be installed in the system directories.
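Once the install finishes, a quick sanity check is to ask the tool for its version. The sketch below assumes only that the binary is somewhere on your path. Some packaged builds name it xmlstarlet rather than xml, so the XML variable (my own helper, not part of the toolkit) picks whichever one exists.

```shell
# Find the XMLStarlet binary: a source build installs "xml", while some
# packages install "xmlstarlet" (an assumption about your system).
XML=$(command -v xmlstarlet || command -v xml)

if [ -n "$XML" ]; then
    # --version is one of the global options the tool documents.
    "$XML" --version
else
    echo "XMLStarlet not found on PATH" >&2
fi
```

If the second branch runs, the install directory is probably not on your path yet.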
You might also want to check out the XML, XSLT, and XML Path Language (XPath) pages to keep up with these three standards; they're critical to making the most of XMLStarlet (see Resources).
Now that it's installed, you can navigate around XMLStarlet. Start by running the xml command on its own (see Listing 1).
Listing 1. The XMLStarlet help page
% xml
XMLStarlet Toolkit: command-line utilities for XML
Usage: xml [<options>] <command> [<cmd-options>]
where <command> is one of:
ed (or edit) - Edit/Update XML document(s)
sel (or select) - Select data or query XML document(s) (XPATH, etc)
tr (or transform) - Transform XML document(s) using XSLT
val (or validate) - Validate XML document(s) (well-formed/DTD/XSD/RelaxNG)
fo (or format) - Format XML document(s)
el (or elements) - Display element structure of XML document
c14n (or canonic) - XML canonicalization
ls (or list) - List directory as XML
esc (or escape) - Escape special XML characters
unesc (or unescape) - Unescape special XML characters
pyx (or xmln) - Convert XML into PYX format (based on ESIS - ISO 8879)
p2x (or depyx) - Convert PYX into XML
<options> are:
--version - show version
--help - show help
Wherever file name mentioned in command help it is assumed
that URL can be used instead as well.
Type: xml <command> --help <ENTER> for command help
XMLStarlet is a command line toolkit to query/edit/check/transform
XML documents (for more information see http://xmlstar.sourceforge.net/)
The basic format of each command is xml <command> followed by some options. Getting help for any command is as easy as running xml <command> --help. For example, Listing 2 shows the help for the edit (ed) command.
Listing 2. Help for the edit command
% xml ed --help
XMLStarlet Toolkit: Edit XML document(s)
Usage: xml ed <global-options> {<action>} [ <xml-file-or-uri> ... ]
where
<global-options> - global options for editing
<xml-file-or-uri> - input XML document file name/uri (stdin otherwise)
<global-options> are:
-P (or --pf) - preserve original formatting
-S (or --ps) - preserve non-significant spaces
-O (or --omit-decl) - omit XML declaration (<?xml ...?>)
-N <name>=<value> - predefine namespaces (name without 'xmlns:')
ex: xsql=urn:oracle-xsql
Multiple -N options are allowed.
-N options must be last global options.
--help or -h - display help
where <action>
-d or --delete <xpath>
-i or --insert <xpath> -t (--type) elem|text|attr -n <name> -v (--value) <value>
-a or --append <xpath> -t (--type) elem|text|attr -n <name> -v (--value) <value>
-s or --subnode <xpath> -t (--type) elem|text|attr -n <name> -v (--value) <value>
-m or --move <xpath1> <xpath2>
-r or --rename <xpath1> -v <new-name>
-u or --update <xpath> -v (--value) <value>
-x (--expr) <xpath> (-x is not implemented yet)
XMLStarlet is a command line toolkit to query/edit/check/transform
XML documents (for more information see http://xmlstar.sourceforge.net/)
This help display looks complicated, but the good stuff is at the bottom; there, you learn about deleting XML nodes, inserting them, changing their value, and more.
To begin playing with XMLStarlet, you need XML. That brings you to your first command, xml ls, which gives a listing of the current directory in XML. Listing 3 shows an example.
Listing 3. A directory listing in XML
% xml ls
<xml>
<d p="rwxr-xr-x" a="2005.05.04 23:03:46"
» m="2004.03.24 16:21:02" s="374" n="."/>
<d p="rwxr-xr-x" a="2005.05.04 23:03:46"
» m="2005.05.04 22:13:41" s="1938" n=".."/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 01:13:43" s="6148" n=".DS_Store"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 00:41:46" s="173" n="build.xml"/>
<d p="rwxr-xr-x" a="2005.04.30 11:34:27"
» m="2004.03.24 01:13:43" s="544" n="docs"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.21 18:41:58" s="641" n="input.xml"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.23 23:41:15" s="3587" n="main.xsl"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 00:37:10" s="184" n="Makefile"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 00:36:41" s="3869" n="MyGenerator.class"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 00:36:33" s="5265" n="MyGenerator.java"/>
<d p="rwxr-xr-x" a="2005.04.30 11:34:25"
» m="2004.03.24 00:20:07" s="272" n="output"/>
</xml>
You may think this directory listing displays too much information. If so, you can (for example) remove the directory nodes, as shown in Listing 4.
Listing 4. Directory listing without directory nodes
% xml ls | xml ed -d "//d"
<?xml version="1.0"?>
<xml>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 01:13:43" s="6148" n=".DS_Store"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 00:41:46" s="173" n="build.xml"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.21 18:41:58" s="641" n="input.xml"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.23 23:41:15" s="3587" n="main.xsl"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 00:37:10" s="184" n="Makefile"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 00:36:41" s="3869" n="MyGenerator.class"/>
<f p="rw-r--r--" a="2005.03.24 17:53:52"
» m="2004.03.24 00:36:33" s="5265" n="MyGenerator.java"/>
</xml>
You use the edit command (ed) to remove the d nodes from the XML. The ls command outputs the directory to the standard output. The pipe (|) then redirects the standard output to the standard input of the edit command, which removes the d nodes from the listing. You specify the d nodes using the XPath expression //d, which matches a d node at any level in the tree. You can make this command more specific by using /xml/d.
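The difference between //d and /xml/d only shows up when d nodes are nested. Here is a minimal sketch of that, using a made-up one-line document rather than a real directory listing; the XML variable is a hypothetical helper for installs where the binary is named xmlstarlet.

```shell
XML=$(command -v xmlstarlet || command -v xml)   # binary name varies by install

# Made-up document: one d node under the root, one nested inside <sub>.
doc='<xml><d n="docs"/><f n="a.txt"/><sub><d n="nested"/></sub></xml>'

# //d removes every d node, wherever it sits in the tree.
all_gone=$(printf '%s\n' "$doc" | "$XML" ed -d "//d")

# /xml/d removes only d nodes that are direct children of the root.
top_gone=$(printf '%s\n' "$doc" | "$XML" ed -d "/xml/d")

printf '%s\n\n%s\n' "$all_gone" "$top_gone"
```

The first result should contain no d nodes at all, while the second should still contain the nested one.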
Now, suppose you want to remove the a, m, and p attributes as well (see Listing 5).
Listing 5. Directory listing without a, m, and p attributes
% xml ls | xml ed -d "//d" -d "//@a" -d "//@m" -d "//@p"
<?xml version="1.0"?>
<xml>
<f s="6148" n=".DS_Store"/>
<f s="173" n="build.xml"/>
<f s="641" n="input.xml"/>
<f s="3587" n="main.xsl"/>
<f s="184" n="Makefile"/>
<f s="3869" n="MyGenerator.class"/>
<f s="5265" n="MyGenerator.java"/>
</xml>
That's more workable. Your listing is down to just files, and within the file nodes you see only the size and name of the file. To keep the next commands short, you can redirect the result into a file called ls.xml. You can also use the rename edit function to change the f tag to file (see Listing 6).
Listing 6. Directory listing with file tags
% cat ls.xml | xml ed -r "//f" -v "file"
<?xml version="1.0"?>
<xml>
<file s="6148" n=".DS_Store"/>
<file s="173" n="build.xml"/>
<file s="641" n="input.xml"/>
<file s="3587" n="main.xsl"/>
<file s="184" n="Makefile"/>
<file s="3869" n="MyGenerator.class"/>
<file s="5265" n="MyGenerator.java"/>
</xml>
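Listing 6 reads from ls.xml. One way to create that file, sketched here, is to redirect the output of the Listing 5 pipeline. The XML variable is a hypothetical helper for installs where the binary is named xmlstarlet.

```shell
XML=$(command -v xmlstarlet || command -v xml)   # binary name varies by install

# Same pipeline as Listing 5, with standard output redirected into ls.xml.
"$XML" ls | "$XML" ed -d "//d" -d "//@a" -d "//@m" -d "//@p" > ls.xml

# Show what was saved.
cat ls.xml
```

Note that this overwrites any existing ls.xml in the current directory.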
In addition, instead of keeping short attribute names like s and n, you can rename them to size and name, respectively (see Listing 7).
Listing 7. Directory listing with size and name attributes
% cat ls.xml | xml ed -r "//f" -v "file" -r "//@s" -v "size" -r "//@n" -v "name"
<?xml version="1.0"?>
<xml>
<file size="6148" name=".DS_Store"/>
<file size="173" name="build.xml"/>
<file size="641" name="input.xml"/>
<file size="3587" name="main.xsl"/>
<file size="184" name="Makefile"/>
<file size="3869" name="MyGenerator.class"/>
<file size="5265" name="MyGenerator.java"/>
</xml>
That’s easy to read. And you haven’t written one line of XSLT, Perl, or Java code. Save this file as ls2.xml.
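Saving the renamed listing can be sketched with a redirect as well. So that it runs on its own, this example first writes a small two-entry ls.xml into a scratch directory (made-up sample data, not the real listing), and it passes the file name to xml ed directly instead of piping it through cat.

```shell
XML=$(command -v xmlstarlet || command -v xml)   # binary name varies by install
work=$(mktemp -d)                                # scratch directory for the demo

# Made-up stand-in for the ls.xml produced earlier.
cat > "$work/ls.xml" <<'EOF'
<?xml version="1.0"?>
<xml>
<f s="173" n="build.xml"/>
<f s="184" n="Makefile"/>
</xml>
EOF

# Chain the three renames and redirect the result into ls2.xml.
"$XML" ed -r "//f" -v "file" -r "//@s" -v "size" -r "//@n" -v "name" \
    "$work/ls.xml" > "$work/ls2.xml"

cat "$work/ls2.xml"
```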
The new directory listing is nice, but is it still valid? Listing 8 shows you how to determine that.
Listing 8. Checking the XML's well-formedness
% xml val ls2.xml
ls2.xml - valid
Yep, it's valid, at least in the sense that it is well formed: the tags are balanced, the characters are encoded properly, and that sort of thing. But it still may not have all of the required tags, or the right tags. To determine that, you need to know the proper structure of the file, which means you need a schema. Only when you have checked the XML document against a schema and found that it passes is it truly valid.
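To see what a failed well-formedness check looks like, you can feed val a document with an unbalanced tag. This is a sketch with a made-up file name (broken.xml) in a scratch directory; the XML variable is a hypothetical helper for installs where the binary is named xmlstarlet.

```shell
XML=$(command -v xmlstarlet || command -v xml)   # binary name varies by install
work=$(mktemp -d)

# A deliberately broken document: the file tag is never closed.
printf '<xml><file name="a.txt"></xml>\n' > "$work/broken.xml"

# With no schema option, val checks well-formedness only.
report=$("$XML" val "$work/broken.xml" 2>/dev/null)
echo "$report"
```

The report line should end in "- invalid". Adding the -e option, as the later listings do, also prints the parser's error messages.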
Listing 9 shows a basic RELAX NG schema for the XML directory listing file.
Listing 9. The RELAX NG schema
<?xml version="1.0" encoding="UTF-8"?>
<grammar ns="" xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<element name="xml">
<oneOrMore>
<element name="file">
<attribute name="name">
<data type="NMTOKEN"/>
</attribute>
<attribute name="size">
<data type="integer"/>
</attribute>
</element>
</oneOrMore>
</element>
</start>
</grammar>
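RELAX NG is easy to read. At the top, the outer element tag defines xml as the root tag. Inside it, the oneOrMore tag says that the xml tag must contain one or more file tags, and each file tag must carry a name attribute (an NMTOKEN) and a size attribute (an integer).
Is the ls2.xml file valid against this new schema? With the schema saved as ls.rng, Listing 10 shows the check.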
Listing 10. Checking against the schema
% xml val -e -r ls.rng ls2.xml
ls2.xml - valid
If you're like me, you aren't satisfied until you see it fail. So, add an attribute named someAttribute to one of the file items in a file called ls3.xml, and run it again (see Listing 11).
Listing 11. Checking a bad file against the schema
% xml val -e -r ls.rng ls3.xml
ls3.xml:4: element file: Relax-NG validity error :
» Invalid attribute someAttribute for element file
ls3.xml - invalid
As expected, it fails. So when a file does pass this check, you know not only that it is well formed, but also that it has all the right tags and attributes.
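The pass and fail cases can be reproduced side by side. The sketch below writes the Listing 9 schema and two tiny documents into a scratch directory; good.xml and bad.xml are made-up names and contents, and the XML variable is a hypothetical helper for installs where the binary is named xmlstarlet.

```shell
XML=$(command -v xmlstarlet || command -v xml)   # binary name varies by install
work=$(mktemp -d)

# The RELAX NG schema from Listing 9.
cat > "$work/ls.rng" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<grammar ns="" xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="xml">
      <oneOrMore>
        <element name="file">
          <attribute name="name"><data type="NMTOKEN"/></attribute>
          <attribute name="size"><data type="integer"/></attribute>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
EOF

# Made-up documents: one that fits the schema, one with a stray attribute.
printf '<xml><file size="173" name="build.xml"/></xml>\n' > "$work/good.xml"
printf '<xml><file size="173" name="build.xml" someAttribute="x"/></xml>\n' \
    > "$work/bad.xml"

# -r selects RELAX NG validation against the given schema file.
good=$("$XML" val -r "$work/ls.rng" "$work/good.xml" 2>/dev/null)
bad=$("$XML" val -r "$work/ls.rng" "$work/bad.xml" 2>/dev/null)
printf '%s\n%s\n' "$good" "$bad"
```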
You can also play with the selection functions, which let you extract elements of the data from the XML. The example in Listing 12 extracts the file names from the XML directory listing as plain text.
Listing 12. Extracting the file names
% xml sel -t -m "/xml/file" -v "concat(@name,'
')" ls2.xml
.DS_Store
build.xml
input.xml
main.xsl
Makefile
MyGenerator.class
MyGenerator.java
Look at two things here. First, the XPath to get to the file entries is the /xml/file specification. Second, the output specification using the -v option concatenates the name attribute on the file tag with a newline.
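As an alternative sketch of the same extraction, select templates also accept -n, which prints a newline, so the line break doesn't have to be embedded in the XPath string. The two-entry ls2.xml below is made-up sample data written to a scratch directory, and the XML variable is a hypothetical helper for installs where the binary is named xmlstarlet.

```shell
XML=$(command -v xmlstarlet || command -v xml)   # binary name varies by install
work=$(mktemp -d)

# Made-up stand-in for ls2.xml.
printf '%s\n' '<xml><file size="173" name="build.xml"/><file size="184" name="Makefile"/></xml>' \
    > "$work/ls2.xml"

# -m visits each file tag, -v prints its name attribute, -n ends the line.
names=$("$XML" sel -t -m "/xml/file" -v "@name" -n "$work/ls2.xml")
echo "$names"
```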
Now you can add the -s option to sort the files by the size attribute (see Listing 13). The A:N:- syntax tells XMLStarlet to use an ascending numerical sort. (This code also adds the size attribute to the concat call so you can confirm that the sort works.)
Listing 13. Sorting the list
% xml sel -t -m "/xml/file" -s A:N:- "@size" -v "concat(@name,':',@size,'
')" ls2.xml
build.xml:173
Makefile:184
input.xml:641
main.xsl:3587
MyGenerator.class:3869
MyGenerator.java:5265
.DS_Store:6148
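The first letter of the sort specification controls the direction, so swapping A for D should list the largest files first. This is a sketch against a made-up three-entry file, with -n used for the line breaks; the XML variable is a hypothetical helper for installs where the binary is named xmlstarlet.

```shell
XML=$(command -v xmlstarlet || command -v xml)   # binary name varies by install
work=$(mktemp -d)

# Made-up stand-in for ls2.xml, deliberately out of size order.
cat > "$work/ls2.xml" <<'EOF'
<xml>
<file size="641" name="input.xml"/>
<file size="173" name="build.xml"/>
<file size="3587" name="main.xsl"/>
</xml>
EOF

# D:N:- requests a descending numeric sort on the size attribute.
largest_first=$("$XML" sel -t -m "/xml/file" -s D:N:- "@size" \
    -v "concat(@name,':',@size)" -n "$work/ls2.xml")
echo "$largest_first"
```

The N matters here: a text sort would compare "3587" and "641" character by character rather than as numbers.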
To have some fun with the xml command, you can use it to parse a traffic report. Yahoo!® Maps provides a traffic service. You can use the curl command to download the latest traffic information through RSS; the -g option turns off curl's URL globbing, and the -s option keeps it from printing progress output. For example, in Listing 14, I specify my ZIP code by adding the ?csz=94101 argument, and the result is the latest San Francisco traffic report.
Listing 14. San Francisco traffic as RSS
% curl -g "http://maps.yahoo.com/traffic.rss?csz=94101" -s
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0">
<channel>
<title>Yahoo! Maps Traffic -- San Francisco, CA 94101</title>
<link>http://us.rd.yahoo.com/maps/mapresults/trfrssarea/*
» http://maps.yahoo.com/maps_result?csz=
» San+Francisco%2C++CA+94101&country=
» us&lat=37.775&lon=
» -122.4183&trf=1&mag=5</link>
<category>Traffic</category>
<description>Yahoo! Maps Traffic --
» San Francisco, CA 94101</description>
<language>en-us</language>
<ttl>3</ttl>
<lastBuildDate>Fri, 06 May 2005 16:33:59 -0700<
» /lastBuildDate>
<pubDate>Fri, 06 May 2005 18:31:27 CDT<
» /pubDate>
<copyright>Copyright (c) 2005 Yahoo! Inc.
» All rights reserved.</copyright>
<item>
<title> Incident, On I-580 At Seminary Ave </title>
<description> Traffic Collision, Severity: Major, Started: 04:20pm 05/06/05,
» Estimated End: 04:50pm 05/06/05,
» Last Updated: 04:25pm 05/06/05 </description>
<link>http://us.rd.yahoo.com/maps/mapresults/trfrssitem/*
» http://maps.yahoo.com/maps_result?csz=
» San+Francisco%2C++CA+94101&mlt=
» 37.778234&mln=-122.168438&lat=
» 37.775&lon=-122.4183&trf=
» 1&exctrf=1&mag=4</link>
<pubDate>Fri, 06 May 2005 16:20:00 -0700</pubDate>
<category>Incident </category>
<severity>Major</severity>
<endDate>Fri, 06 May 2005 16:50:00 -0700</endDate>
<updatedDate>Fri, 06 May 2005 16:25:00 -0700<
» /updatedDate>
</item>
...
Now you can pipe the output of the curl command through the XMLStarlet command to get just the descriptions (see Listing 15).
Listing 15. Traffic RSS piped through XMLStarlet
% curl -g "http://maps.yahoo.com/traffic.rss?csz=94101"
» -s | xml sel -t -m "/rss/channel/item/description" -v "."
Traffic Collision, Severity: Major, Started: 04:20pm 05/06/05,
» Estimated End: 04:50pm 05/06/05,
» Last Updated: 04:25pm 05/06/05
Disabled Vehicle, Severity: Moderate, Started: 04:20pm 05/06/05,
» Estimated End: 04:50pm 05/06/05,
» Last Updated: 04:25pm 05/06/05
Disabled Vehicle, Severity: Moderate, Started: 04:19pm 05/06/05,
» Estimated End: 04:49pm 05/06/05,
» Last Updated: 04:25pm 05/06/05
Pedestrian On The Roadway, Severity: Critical,
» Started: 04:17pm 05/06/05,
» Estimated End: 04:47pm 05/06/05,
» Last Updated: 04:25pm 05/06/05
Traffic Collision, Severity: Major, Started: 04:15pm 05/06/05,
» Estimated End: 04:45pm 05/06/05,
» Last Updated: 04:25pm 05/06/05
...
The -m option picks out the description of each item. Then, using the -v option, you can output only the text of the node by specifying a period (.).
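The same selection can be tried offline against a saved feed. In this sketch, traffic.rss is a made-up two-item file that imitates the structure of Listing 14 (it is not real feed data), and the XML variable is a hypothetical helper for installs where the binary is named xmlstarlet. It uses the XPath normalize-space() function instead of a bare period so that the whitespace around each description is trimmed, and -n to end each line.

```shell
XML=$(command -v xmlstarlet || command -v xml)   # binary name varies by install
work=$(mktemp -d)

# Made-up feed with the same item/description structure as Listing 14.
cat > "$work/traffic.rss" <<'EOF'
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Sample traffic feed</title>
<item>
<title> Incident, sample entry one </title>
<description>
Traffic Collision, Severity: Major
</description>
</item>
<item>
<title> Incident, sample entry two </title>
<description>
Disabled Vehicle, Severity: Moderate
</description>
</item>
</channel>
</rss>
EOF

# One line per item description, with surrounding whitespace removed.
descriptions=$("$XML" sel -t -m "/rss/channel/item/description" \
    -v "normalize-space(.)" -n "$work/traffic.rss")
echo "$descriptions"
```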
This article barely scratches the surface of this very powerful XML tool. If you have time, check out XMLStarlet's XSLT transformation functions, the handy escaping and unescaping functions, the XML formatting functions, and more.
- Visit the XMLStarlet home page.
- Check out the XML home page at the W3C site. Doug Tidwell's intro to XML tutorial is another great place to get started (developerWorks, August 2002).
- Read about the Saxon XSLT processor. Michael Kay, who developed Saxon, has also written two popular articles for developerWorks: "Saxon: Anatomy of an XSLT processor" and "What kind of language is XSLT?" (April 2005).
- Obtain the libxslt library and the libxml2 library.
- Visit the W3C page to learn about the XSL family.
- Go to the W3C's XPath page. Bertrand Portier's developerWorks tutorial "Get started with XPath" is also worth a read (May 2004).
- Find hundreds more XML resources on the developerWorks XML zone.
- Browse for books on these and other technical topics.
- Learn how you can become an IBM Certified Developer in XML and related technologies.
An engineer with more than 20 years of experience, Jack Herrington is currently editor in chief of the Code Generation Network. He is the author of Code Generation in Action. You can contact him at jack_d_herrington@codegeneration.net.





