Reading fields from Atom
Atom 1.0 is now published as Internet Engineering Task Force (IETF) RFC 4287. The specification is precise, so you can be confident about the structure of Atom documents and about your ability to extract information from them.
As with all XML documents, it is a good idea to include the XML declaration in an Atom document and to specify the encoding. The declaration is transparent to XSLT: by the time your transform runs, the parser has already decoded the document into Unicode characters, and you work only with constructs based on the XPath data model. All Atom 1.0 elements are in the namespace http://www.w3.org/2005/Atom. Throughout this tutorial, wherever a prefix is required for this namespace (for example, in XPath expressions), I'll use a as the prefix. In many of the code examples, the Atom 1.0 namespace is actually declared as the default namespace.
Listing 1 (atom-basic.xml) is a complete Atom 1.0 document example.
Listing 1. Atom 1.0 complete example (atom-basic.xml)
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"
      xml:base="http://www.example.org">
  <id>http://www.example.org/myfeed</id>
  <title>My Simple Feed</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <link href="/blog" />
  <link rel="self" href="/myfeed" />
  <author><name>Uche Ogbuji</name></author>
  <entry>
    <id>http://www.example.org/entries/1</id>
    <title>A simple blog entry</title>
    <link href="/blog/2005/07/1" />
    <updated>2005-07-14T12:00:00Z</updated>
    <summary>This is a simple blog entry</summary>
  </entry>
  <entry>
    <id>http://www.example.org/entries/2</id>
    <title />
    <link href="/blog/2005/07/2" />
    <updated>2005-07-15T12:00:00Z</updated>
    <summary>This is simple blog entry without a title</summary>
  </entry>
</feed>
Every Atom feed document contains a top-level feed element and zero or more entry elements. The feed element and every entry element must each include these three elements:
id: a globally unique identifier
title: can be empty, although you are strongly encouraged to always provide title content
updated: a time stamp of the last time the entry or feed was updated
The id element's content is simple text. You can access the ID of the entire feed using the XPath expression /a:feed/a:id. You can get all the entry IDs using /a:feed/a:entry/a:id.
The title element's content follows Atom's careful conventions for brief, structured text. I'll cover such fields in more detail in the next section. For now, just pay attention to the string value of the element, which can be accessed for the entire feed at /a:feed/a:title and for each entry at /a:feed/a:entry/a:title.
The updated element's content is a full date/time stamp in the RFC 3339 format, which is a profile of ISO 8601. You can treat this value as simple text for some purposes. See Resources for an article where you can learn more advanced date processing techniques in XSLT. The updated date is accessed for the entire feed at /a:feed/a:updated and for each entry at /a:feed/a:entry/a:updated.
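The following sketch puts these expressions to work in a single pull-style template. It is only an illustration, not one of the tutorial's downloadable files: it prints the feed's id, title, and updated values, counts the entry IDs, and then writes one line per entry. It also treats updated as simple text and sorts entries on it, which yields correct chronological order only when every time stamp uses the same time-zone notation (such as the trailing Z in Listing 1) and the same precision.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative sketch: pull-style access to the required Atom metadata -->
<xsl:transform version="1.0"
  xmlns:a="http://www.w3.org/2005/Atom"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Feed-level id, title and updated -->
    <xsl:value-of select="/a:feed/a:id"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="/a:feed/a:title"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="/a:feed/a:updated"/>
    <xsl:text>&#10;Entry count: </xsl:text>
    <xsl:value-of select="count(/a:feed/a:entry/a:id)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- One line per entry, newest first. Sorting updated as plain text
         is only chronological if all stamps share one time-zone notation
         and precision (an assumption that holds for Listing 1). -->
    <xsl:for-each select="/a:feed/a:entry">
      <xsl:sort select="a:updated" order="descending"/>
      <xsl:value-of select="a:id"/>
      <xsl:text> | </xsl:text>
      <!-- String value of title; empty for an empty title element -->
      <xsl:value-of select="a:title"/>
      <xsl:text> | </xsl:text>
      <xsl:value-of select="a:updated"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
```

Applied to Listing 1, this should report two entries and list entry 2 (updated 2005-07-15) before entry 1, with an empty title field for entry 2.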
An Atom feed will almost always have at least one link element with a rel attribute value of self. This link provides the authoritative URI for the feed itself. Access the feed's self-link using the XPath /a:feed/a:link[@rel='self'], and its URI using /a:feed/a:link[@rel='self']/@href. In Listing 1, the feed's self-link has the value /myfeed. This is a relative URL, and it needs to be turned into an absolute URL for proper processing. This is where the xml:base attribute comes in. Processing software should resolve the relative value against the base (http://www.example.org) to get the absolute URL http://www.example.org/myfeed. You cannot do this using XPath 1.0 alone, although XPath 2.0 does include a function, resolve-uri, for this purpose. In many cases this limitation is acceptable, because you can rely on the base URI processing of the destination format. If you are generating HTML, for example, you can keep the URIs relative and just make sure the base URI is expressed in the HTML, as demonstrated in a later section.
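If you have an XSLT 2.0 processor available (Saxon is one example), you can do the resolution inside the transform itself. This minimal sketch assumes XSLT 2.0: base-uri() takes into account any xml:base attributes on the element and its ancestors, and resolve-uri() combines that base with the relative href.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: requires an XSLT 2.0 processor -->
<xsl:transform version="2.0"
  xmlns:a="http://www.w3.org/2005/Atom"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="/a:feed/a:link[@rel='self']">
      <!-- base-uri(.) honors xml:base on this element and its ancestors;
           resolve-uri() resolves the relative href against that base -->
      <xsl:value-of select="resolve-uri(@href, base-uri(.))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
```

Run against Listing 1, it should print http://www.example.org/myfeed. An XSLT 1.0 processor will not be able to evaluate this stylesheet, because neither function exists in XPath 1.0.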
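Access other links based on the relevant link attributes. Entries usually have a main link (to the Web representation of the entry), which has a rel value of alternate. Keep in mind that a link with no rel attribute is treated as if it had rel="alternate", so you can usually access an entry's main link using /a:feed/a:entry/a:link[not(@rel) or @rel='alternate']. This expression matches the entry links in Listing 1, none of which carries a rel attribute.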
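Here is a small XSLT 1.0 sketch that lists each entry's title and main link. It picks the first link whose rel attribute is missing or equal to alternate; the [1] predicate makes that choice explicit, since an entry may carry several alternate links (for example, in different languages or media types). The hrefs are printed exactly as they appear in the source, so for Listing 1 they stay relative.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative sketch: each entry's title and main (alternate) link -->
<xsl:transform version="1.0"
  xmlns:a="http://www.w3.org/2005/Atom"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="/a:feed/a:entry">
      <xsl:value-of select="a:title"/>
      <xsl:text>: </xsl:text>
      <!-- A link without rel counts as rel="alternate"; take the first -->
      <xsl:value-of select="a:link[not(@rel) or @rel='alternate'][1]/@href"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
```

Against Listing 1 this should print /blog/2005/07/1 for the first entry and /blog/2005/07/2 (after an empty title) for the second; turning them into absolute URLs is the xml:base issue discussed above.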
Note: URIs in Atom are actually specified as Internationalized Resource Identifiers (IRIs), which are defined in RFC 3987 (see Resources). However, in this tutorial I'll stick to the term URI, which is probably more familiar to most readers.
Atom enforces attribution. To be specific, at least one author element is required for the feed unless every entry has at least one author element of its own. This may seem a harsh restriction, but it's actually good practice considering how much slicing, dicing, and re-purposing of information goes on in the world of episodic Web content. Atom helps ensure that the chain of authorship is maintained. A feed or entry can also have one or more contributor elements. Authors and contributors are expressed in Atom as structured elements (RFC 4287 calls them person constructs) with a mandatory name child (plain text), an optional uri child (a URI), and an optional email child (an e-mail address).
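The attribution rule has a practical consequence for processing: when an entry has no author of its own, its authorship comes from the feed. The sketch below, an illustration rather than a complete implementation, prints each entry's id with its authors, falling back to the feed-level authors, and formats each person construct with its optional email and uri. For brevity it ignores the source element, which RFC 4287 consults before the feed when an entry lacks an author.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative sketch: entry authors with fallback to feed authors
     (the entry's source element is deliberately ignored here) -->
<xsl:transform version="1.0"
  xmlns:a="http://www.w3.org/2005/Atom"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="/a:feed/a:entry">
      <xsl:value-of select="a:id"/>
      <xsl:text> by </xsl:text>
      <xsl:choose>
        <!-- Entry-level authors take precedence -->
        <xsl:when test="a:author">
          <xsl:apply-templates select="a:author"/>
        </xsl:when>
        <!-- Otherwise the feed's authors apply to the entry -->
        <xsl:otherwise>
          <xsl:apply-templates select="../a:author"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <!-- Person construct: mandatory name, optional email and uri -->
  <xsl:template match="a:author|a:contributor">
    <xsl:if test="position() &gt; 1">, </xsl:if>
    <xsl:value-of select="a:name"/>
    <xsl:if test="a:email">
      <xsl:text> &lt;</xsl:text>
      <xsl:value-of select="a:email"/>
      <xsl:text>&gt;</xsl:text>
    </xsl:if>
    <xsl:if test="a:uri">
      <xsl:text> (</xsl:text>
      <xsl:value-of select="a:uri"/>
      <xsl:text>)</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:transform>
```

In Listing 1 neither entry has its own author, so both output lines should end with "by Uche Ogbuji".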
One hot topic you might have heard of is tagging: the practice of associating metadata, in the form of tags, with episodic Web site entries. Tags are simple words that express a category or relevant subject matter for an item. People can then use tags to form streams of related information, so that you could, for example, combine Weblog entries, scheduled events, photos, audio clips, and more from a public convention. Atom supports such tags in a somewhat enriched form (thank goodness). A feed or entry can have one or more category elements, each with a mandatory term attribute and optional scheme and label attributes.
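Listing 1 contains no categories, so to try the following sketch you would add something like <category term="xslt" scheme="http://www.example.org/tags" label="XSLT"/> to an entry; that element and its attribute values are purely hypothetical. The transform prints each entry's categories, preferring the human-readable label and falling back to the term, and appends the scheme when one is present.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative sketch: list each entry's category (tag) attributes -->
<xsl:transform version="1.0"
  xmlns:a="http://www.w3.org/2005/Atom"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="/a:feed/a:entry">
      <xsl:value-of select="a:id"/>
      <xsl:text>&#10;</xsl:text>
      <xsl:for-each select="a:category">
        <xsl:text>  tag: </xsl:text>
        <xsl:choose>
          <!-- Prefer the optional human-readable label -->
          <xsl:when test="@label"><xsl:value-of select="@label"/></xsl:when>
          <!-- term is the one attribute every category must have -->
          <xsl:otherwise><xsl:value-of select="@term"/></xsl:otherwise>
        </xsl:choose>
        <!-- scheme, if given, identifies the vocabulary the term belongs to -->
        <xsl:if test="@scheme">
          <xsl:text> [</xsl:text>
          <xsl:value-of select="@scheme"/>
          <xsl:text>]</xsl:text>
        </xsl:if>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
```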
I'll cover some of the other Atom elements in later sections. I won't cover the less commonly used elements, which you can generally process with the techniques presented in this tutorial.
You now know enough about accessing Atom feed elements to make sense of your first XSLT example. Listing 2 (tickertape.xslt.xml) is a transform that summarizes an Atom feed as a simple listing in plain text. You can think of it as a human-readable ticker tape for the feed.
Listing 2. Ticker tape, a simple XSLT file that summarizes key fields in an Atom feed (tickertape.xslt.xml)
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform version="1.0"
  xmlns:a="http://www.w3.org/2005/Atom"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:strip-space elements="*"/>
  <xsl:output method="text"/>

  <!-- Suppress any element not handled by a more specific template -->
  <xsl:template match="*"/>

  <xsl:template match="a:feed">
    <xsl:text>Atom Feed: </xsl:text><xsl:value-of select="a:id"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="a:entry">
    <xsl:text>----------------------------------------&#10;</xsl:text>
    <xsl:text>Feed entry: </xsl:text><xsl:value-of select="a:id"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Shared by feed and entry metadata; entry fields are indented -->
  <xsl:template match="a:title|a:updated">
    <xsl:if test="parent::a:entry">
      <xsl:value-of select="'  '"/>
    </xsl:if>
    <xsl:value-of select="local-name()"/>:<xsl:apply-templates/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:transform>
The first thing you might notice is that I mostly used the push style of XSLT, relying heavily on templates. As such, you won't find many of the XPath expressions I've presented so far. Those expressions are still useful for understanding the node structure that the XSLT addresses. Because Atom reuses the same metadata element names for the feed and for entries, I can combine the handling of both into the final template in the listing. The empty template matching * suppresses every element that no more specific template handles, such as link and author. Notice also my use of &#10; (the character reference for a newline) to control the formatting of the output text.
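For comparison, here is a sketch of roughly the same ticker tape written in pull style, with a single template and explicit XPath expressions like the ones presented earlier. It is an illustration only, and its spacing may differ slightly from the output of Listing 2.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Illustrative sketch: a pull-style counterpart to Listing 2 -->
<xsl:transform version="1.0"
  xmlns:a="http://www.w3.org/2005/Atom"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- One template walks the feed with explicit XPath expressions -->
  <xsl:template match="/">
    <xsl:text>Atom Feed: </xsl:text>
    <xsl:value-of select="/a:feed/a:id"/>
    <xsl:text>&#10;title:</xsl:text>
    <xsl:value-of select="/a:feed/a:title"/>
    <xsl:text>&#10;updated:</xsl:text>
    <xsl:value-of select="/a:feed/a:updated"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:for-each select="/a:feed/a:entry">
      <xsl:text>----------------------------------------&#10;</xsl:text>
      <xsl:text>Feed entry: </xsl:text>
      <xsl:value-of select="a:id"/>
      <xsl:text>&#10;  title:</xsl:text>
      <xsl:value-of select="a:title"/>
      <xsl:text>&#10;  updated:</xsl:text>
      <xsl:value-of select="a:updated"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
```

The trade-off is a matter of taste: the pull version fixes the output order in the stylesheet regardless of the element order in the feed, while the push version in Listing 2 follows the source order, and you extend it by adding templates rather than by editing one large template.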
To apply the transform and view the results in a browser, add an xml-stylesheet processing instruction (PI) to the source document. atom-basic-1.xml (see Download) is the same as Listing 1 (atom-basic.xml) except that I added such a PI. The first three lines of this version are as follows:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xml" href="tickertape.xslt.xml"?>
<feed xmlns="http://www.w3.org/2005/Atom"
I use the extension xslt.xml for Listing 2 because Mozilla-based browsers sometimes have trouble guessing an XML media type for files loaded from the local disk with the .xsl or .xslt extension, which results in an error when the browser tries to load the stylesheet. The most reliable way I have found to avoid such problems is to make sure the XSLT file name ends in .xml. Loading atom-basic-1.xml in a browser produces the display in Figure 1.
Figure 1. Browser output of Listing 2 applied to Listing 1
You can also run this transform using command-line tools. With 4Suite's 4xslt command-line tool, for example, you can either specify the plain source file (atom-basic.xml) together with the XSLT file (tickertape.xslt.xml), or specify only the source file that contains the stylesheet PI (atom-basic-1.xml). Figure 2 shows the console output from both approaches.
Figure 2. Command line session applying Listing 2 to Listing 1
When following along with the examples, you need to decide whether to use a command-line processor or your Web browser. You can also use other tools such as an XSLT IDE, if that is more convenient for you.