
Convert Atom documents to JSON

Avoid loss of important data and context when you convert from Atom to JSON

James Snell (jasnell@us.ibm.com), Software Engineer, Emerging Technologies, IBM
James M. Snell is a software engineer in IBM's WebAhead group focusing on the development and practical application of key emerging technologies for IBM's own use. James has participated in the effort to define the Atom Syndication Format and Atom Publishing Protocol standards and has built no fewer than nine different Atom Publishing Protocol server implementations.

Summary:  Converting an Atom document to JSON might, at first, appear to be a fairly straightforward task. Atom is, after all, just a bit of XML, and XML-to-JSON conversion tools are widely available. However, the Atom format is more than just a set of XML elements and attributes. A number of subtle details can make proper handling of Atom difficult. This article describes those issues and demonstrates a mechanism, implemented by the Apache Abdera project, that converts Atom documents into JSON and produces a result that is readable, usable, and complete.

Date:  08 Jan 2008
Level:  Intermediate
Also available in:   Korean  Russian


When you convert an Atom document into JSON, you take a great deal of information-rich content and serialize it into a drastically simplified form. That simpler form is easier to use in environments where parsing and properly processing the XML is prohibitively difficult. The main challenge is to ensure that important data and context are not lost in the translation. The sections that follow discuss each of the key areas of concern when you convert an Atom document. I assume that the reader has at least a basic understanding of Atom and JSON. If you're not familiar with either, please take the time to look over the resources listed at the end of this article before you continue.

Language metadata

Atom documents contain a mixture of information that is intended to be consumed by both humans and machines. The proper presentation of human-readable content such as entry titles, category labels, link titles, and rights descriptions depends on a language context that must be preserved during the transformation to JSON.

The xml:lang attribute is the mechanism for specifying the language used in an Atom document. The attribute can appear on any element; descendant elements inherit its value unless they override it with their own xml:lang. The value of the xml:lang attribute is a "Language Tag" as defined by RFC 4646, "Tags for Identifying Languages." These tags identify the language the text is written in, the script or writing system used, the regional linguistic variation, and so on. Such information affects how a user agent presents the text.

The example Atom feed shown in Listing 1, despite being somewhat nonsensical, demonstrates the use of xml:lang within an Atom document.


Listing 1. An Atom feed using xml:lang
                
<feed xmlns="http://www.w3.org/2005/Atom" 
      xml:lang="en-US"> 
  <title>This is the title</title>
  ... 
  <entry xml:lang="fr"> 
    ... 
    <category term="foo" label="bar" xml:lang="en" /> 
    <category term="goo" label="baz" />
    ... 
  </entry> 
</feed>
      

The xml:lang attribute on the feed element establishes the default language context for the entire document. Any descendant element that does not explicitly override the xml:lang (the title element of the feed, for instance) will automatically inherit that language.

Within the JSON serialization, the language context can be captured by converting the xml:lang attribute into a field. The transformation rule is simple: output a single lang field whenever the language context changes. Listing 2 shows the serialization of the feed in Listing 1. Note that the lang field appears for the feed, entry and the first category, but is omitted on the title and second category; that is because the latter elements inherit the language of their parent elements.


Listing 2. The JSON serialization of the feed in Listing 1
                
{ 
  "lang":"en-US", 
  "title":"This is the title",
  "entries":[ 
    { 
      "lang":"fr", 
      "categories":[ 
        { 
          "lang":"en", 
          "term":"foo", 
          "label":"bar" 
        }, 
        { 
          "term":"goo", 
          "label":"baz" 
        } 
      ] 
    } 
  ] 
}
      

The xml:lang attribute can appear on any element whose children or attributes might be considered to be Language Sensitive. Within the core Atom vocabulary, this set includes feed, entry, title, rights, subtitle, summary, content, category, link, author, and contributor.
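The "emit lang only when the context changes" rule amounts to a recursive walk that carries the inherited language down the tree. The following JavaScript is a minimal sketch, not Abdera's implementation: it assumes the document has already been parsed into a hypothetical { xmlLang, data, lists } structure (plain fields in data, repeatable children in lists), and the function name toJsonWithLang is likewise made up for illustration.

```javascript
// Emit a "lang" field only where the effective xml:lang changes.
// Input nodes use a hypothetical pre-parsed shape (not an Abdera type):
//   { xmlLang?: string, data: { ...plain fields }, lists?: { key: [node, ...] } }
function toJsonWithLang(node, inheritedLang) {
  // Without its own xml:lang, an element inherits its parent's language.
  const effectiveLang = node.xmlLang !== undefined ? node.xmlLang : inheritedLang;
  const out = {};
  if (effectiveLang !== inheritedLang) {
    out.lang = effectiveLang; // the language context changes here
  }
  Object.assign(out, node.data);
  for (const [key, children] of Object.entries(node.lists || {})) {
    out[key] = children.map((child) => toJsonWithLang(child, effectiveLang));
  }
  return out;
}

// The feed from Listing 1, in the hypothetical input shape.
const feed = {
  xmlLang: "en-US",
  data: { title: "This is the title" },
  lists: {
    entries: [{
      xmlLang: "fr",
      data: {},
      lists: {
        categories: [
          { xmlLang: "en", data: { term: "foo", label: "bar" } },
          { data: { term: "goo", label: "baz" } },
        ],
      },
    }],
  },
};

console.log(JSON.stringify(toJsonWithLang(feed, undefined), null, 2));
```

Applied to the feed from Listing 1, the sketch yields the structure of Listing 2: lang appears on the feed, the entry, and the first category, and nowhere else.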


IRIs and URIs

In addition to the support for multiple languages, Atom also supports the use of Internationalized Resource Identifiers, or IRIs. An IRI is essentially a URI that is allowed to contain non-ASCII characters.

When designing an Atom-to-JSON serialization, it is important to distinguish between the two ways an Atom document uses IRIs: as opaque identifiers and as references to other resources. In Listing 3, four elements contain IRIs.


Listing 3. A feed containing multiple Internationalized Resource Identifiers
                
<feed xmlns="http://www.w3.org/2005/Atom" 
        xml:base="http://examplé.org/foó"> 
  ... 
  <id>http://examplé.org/foó</id>
  <link rel="self" href="" /> 
  <link href="/blog" /> 
  <category scheme="http://examplé.org/foó/bar" term="foo" />
  ... 
</feed>
      

The id and category elements use IRIs as identifiers. Although they look like ordinary URLs, such identifiers are not intended to be dereferenceable. Atom requires that IRIs used in this way be absolute, and generally treats them as opaque values that require no additional processing on the part of Atom consumers. Within the Atom vocabulary, id and category are the only elements that use IRIs as identifiers.

The link element, however, is specifically intended to provide a reference to some other resource. The href attribute is expected to specify an IRI that can be dereferenced. The value of the attribute can be a relative reference, which Atom consumers must resolve against the in-scope base URI (typically established by xml:base) to produce the correct result. Such processing includes converting the IRI to an equivalent URL. Within the core Atom vocabulary, the icon, logo, uri, link and content elements can contain dereferenceable IRIs.

Listing 4 shows the JSON serialization of the id and category elements. Note that the IRIs are copied as-is, without any additional processing.


Listing 4. JSON serialization of the id and category elements from Listing 3
                
"id":"http://examplé.org/foó", 
"categories":[
  {
    "scheme":"http://examplé.org/foó/bar", 
    "term":"foo"
  }
]
      

Dereferenceable IRIs, on the other hand, need to be resolved into absolute IRIs and converted into URLs to be generally useful within JSON, as illustrated in Listing 5. Note that "http://xn--exampl-gva.org/fo%C3%B3" is the ASCII-equivalent URL of the IRI "http://examplé.org/foó".


Listing 5. JSON serialization of the link elements from Listing 3
                
"links":[ 
  { 
    "rel":"self", 
    "href":"http://xn--exampl-gva.org/fo%C3%B3" 
  }, 
  { 
    "href":"http://xn--exampl-gva.org/blog" 
  } 
] 
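In JavaScript, the WHATWG URL class (a global in current browsers and in Node.js 10 and later) performs both steps: it resolves a reference against a base, converts a non-ASCII host name to its Punycode form, and percent-encodes non-ASCII path characters as UTF-8. The following is a sketch only; resolveLinkHref is a hypothetical helper, and WHATWG URL processing is close to, but not guaranteed identical with, the IRI-to-URI mapping defined in RFC 3987.

```javascript
// Resolve a dereferenceable IRI (such as link/@href) against the in-scope
// xml:base and return its ASCII URL form. resolveLinkHref is a hypothetical helper.
function resolveLinkHref(href, xmlBase) {
  // new URL(reference, base) resolves the reference, punycodes the host,
  // and percent-encodes non-ASCII path characters as UTF-8.
  return new URL(href, xmlBase).href;
}

const base = "http://examplé.org/foó"; // xml:base from Listing 3
const id = "http://examplé.org/foó";   // atom:id is an identifier: copied as-is

console.log(id);                             // http://examplé.org/foó
console.log(resolveLinkHref("", base));      // http://xn--exampl-gva.org/fo%C3%B3
console.log(resolveLinkHref("/blog", base)); // http://xn--exampl-gva.org/blog
```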
      

Any dereferenceable IRI or URI appearing anywhere within the Atom document should be given similar treatment, especially those appearing within HTML and XHTML markup in text constructs and content as shown in Listing 6.


Listing 6. XHTML markup containing relative URI paths
                
<a:content type="xhtml"> 
  <div> 
    <img src="/images/foo.jpg" /> 
  </div> 
</a:content>
     

Because JSON has no notion of a base URI, an application that uses the JSON serialization is unlikely to render markup containing relative URI paths correctly. It is therefore important that such paths be resolved automatically during the conversion process, as shown in Listing 7.


Listing 7. JSON serialization of the XHTML markup from Listing 6
                
"content":{ 
  "attributes":{ 
    "type":"xhtml" 
  }, 
  "children":[ 
    { 
      "name":"img", 
      "attributes":{ 
        "src":"http://example.org/images/foo.jpg" 
      }, 
      "children":[] 
    } 
  ] 
}
     

This requires the application that produces the JSON output to parse the HTML and XHTML markup properly and to identify the attributes whose values can be URIs.
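Once the markup has been parsed into the name/attributes/children structure of Listing 7, the resolution step is a tree walk. The JavaScript sketch below assumes a single base URI for the whole fragment (here http://example.org/foo; a complete implementation would track nested xml:base values) and uses a short, illustrative list of URI-valued attribute names rather than a complete one. The names URI_ATTRIBUTES and resolveMarkupUris are hypothetical.

```javascript
// Illustrative subset of (X)HTML attributes whose values are URIs.
const URI_ATTRIBUTES = new Set(["href", "src", "cite", "action", "longdesc", "usemap"]);

// Return a copy of a serialized markup node ({ name, attributes, children })
// with URI-valued attributes made absolute against `base`.
function resolveMarkupUris(node, base) {
  if (typeof node === "string") return node; // text nodes carry no URIs
  const attributes = {};
  for (const [key, value] of Object.entries(node.attributes || {})) {
    attributes[key] = URI_ATTRIBUTES.has(key.toLowerCase())
      ? new URL(value, base).href
      : value;
  }
  const out = { ...node, attributes };
  if (Array.isArray(node.children)) {
    out.children = node.children.map((child) => resolveMarkupUris(child, base));
  }
  return out;
}

// The content element from Listing 6, before resolution.
const content = {
  attributes: { type: "xhtml" },
  children: [{ name: "img", attributes: { src: "/images/foo.jpg" }, children: [] }],
};
const resolved = resolveMarkupUris(content, "http://example.org/foo");
console.log(resolved.children[0].attributes.src); // http://example.org/images/foo.jpg
```

Returning a copy rather than modifying the tree in place keeps the original parse result available, for example to re-resolve it against a different base.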


Repeatable elements

Several elements within the Atom vocabulary are allowed to appear more than once within a feed or entry. In the JSON serialization, these are rendered as arrays, as illustrated in Listings 8 and 9.


Listing 8. Atom feed, entry and source elements can contain multiple atom:author elements
                
<author> 
  <name>James M Snell</name> 
</author> 
<author> 
  <name>Jane Doe</name> 
</author>
     


Listing 9. JSON serialization of the author elements in Listing 8
                
"authors":[
  {
    "name":"James M Snell"
  },
  {
    "name":"Jane Doe"
  }
]
     

Within the Atom vocabulary, the author, contributor, category, link, and entry elements are all repeatable. Listings 10 through 13 further illustrate the conversion of repeatable elements into JSON arrays. The examples are fairly self-explanatory, so I won't go into the details.


Listing 10. Multiple Atom category elements
                
<category term="foo" 
  scheme="http://example.org/categories/" />
<category term="bar" 
  scheme="http://example.org/categories/" 
  label="Bar" />
     


Listing 11. JSON serialization of the multiple category elements in Listing 10
                
"categories":[
  {
    "term":"foo",
    "scheme":"http://example.org/categories/"
  },
  {
    "term":"bar",
    "scheme":"http://example.org/categories/", 
    "label":"Bar"
  }
]
     


Listing 12. Multiple Atom link elements
                
<link href="/foo" 
      rel="related" 
      title="Related Entry"
      type="text/html" />
<link href="/bar" 
      title="Alternate View"
      type="text/html" />
      


Listing 13. JSON serialization of the multiple link elements in Listing 12
                
"links":[
  {
    "href":"http://example.org/foo",
    "rel":"related",
    "title":"Related Entry",
    "type":"text/html"
  },
  {
    "href":"http://example.org/bar",
    "title":"Alternate View",
    "type":"text/html"
  }
]
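Grouping repeatable elements is essentially a lookup from element name to array key. In this JavaScript sketch, the input is a hypothetical list of [localName, value] pairs whose values have already been converted, and groupChildren is an illustrative name; the contributors key is an assumption, because no listing here shows a serialized contributor.

```javascript
// Map each repeatable core Atom element to the JSON array key used in the listings.
const REPEATABLE = new Map([
  ["author", "authors"],
  ["contributor", "contributors"],
  ["category", "categories"],
  ["link", "links"],
  ["entry", "entries"],
]);

// Group already-converted children, given as [localName, jsonValue] pairs
// (a hypothetical intermediate form), into one JSON object.
function groupChildren(children) {
  const out = {};
  for (const [localName, value] of children) {
    const arrayKey = REPEATABLE.get(localName);
    if (arrayKey) {
      // Repeatable elements always become arrays, even with a single item.
      (out[arrayKey] = out[arrayKey] || []).push(value);
    } else {
      out[localName] = value; // single-valued elements become plain fields
    }
  }
  return out;
}

// The two authors from Listing 8 plus one non-repeatable title.
console.log(JSON.stringify(groupChildren([
  ["title", "Example Feed"],
  ["author", { name: "James M Snell" }],
  ["author", { name: "Jane Doe" }],
])));
```

Always producing an array, even for a single author, lets consumers iterate without first checking whether a field holds one object or several; Listing 31 follows the same convention for its single author.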
     


Dates

Dates and times within Atom are represented using the RFC 3339 profile of the ISO 8601 format. Listing 14 shows an example of an Atom updated element.


Listing 14. RFC 3339 date-time in the Atom updated element
                
<updated>2007-10-14T12:12:12Z</updated>
     

Because JSON has no native date type, I have a limited range of options when I serialize a date. I can:

  • Convert the date to a numeric value indicating the number of seconds that have passed since midnight UTC on January 1, 1970 (that is, a standard UNIX® timestamp)
  • Convert the date to the JavaScript toString serialization of a date (for example, Sun Oct 21 2007 12:34:28 GMT-0700 (PDT))
  • Copy the date exactly as it is presented within the Atom document

The first two options have the benefit of working within JavaScript without much work on the part of the developer: to get a JavaScript Date object, I can call new Date(feed.updated) (for the first option, after multiplying the value by 1000, because the Date constructor expects milliseconds). The downside is that the first option loses potentially important information about milliseconds and the original timezone offset, and the toString serialization used by the second option can vary across implementations and locales. While the RFC 3339 serialization requires additional parsing work, it is the only option that avoids data loss and ambiguity.
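The three options can be compared with the standard Date methods of ECMAScript 5, which defined a simplified ISO 8601 string format for the Date constructor and added toISOString. The sketch below is illustrative only. Note that toISOString always normalizes to UTC, so keeping a non-zero offset from the original document requires copying the string verbatim (the third option).

```javascript
// Compare the three serialization options for the date in Listing 14.
const atomUpdated = "2007-10-14T12:12:12Z";
const date = new Date(atomUpdated); // ES5+ engines parse this ISO 8601 form

// Option 1: UNIX timestamp in seconds. Sub-second precision and the original
// offset are lost, and new Date() expects milliseconds, so multiply by 1000.
const unixSeconds = Math.floor(date.getTime() / 1000);
const fromSeconds = new Date(unixSeconds * 1000);

// Option 2: the toString() format depends on the engine and the local time zone.
const jsString = date.toString();

// Option 3: an RFC 3339 string, in the same form as Listing 15.
const rfc3339 = date.toISOString();

console.log(unixSeconds); // 1192363932
console.log(jsString);    // varies by engine and time zone
console.log(rfc3339);     // 2007-10-14T12:12:12.000Z
```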


Listing 15. JSON serialization of the updated element in Listing 14
                
"updated":"2007-10-14T12:12:12.000Z"
     

The JavaScript code in Listing 16 parses RFC 3339 date-times and produces a proper JavaScript date object.


Listing 16. JavaScript code for parsing RFC 3339 date-times as used in Atom
                
// A plain object replaces Prototype's Class.create(), so no library is required.
var AtomDate = {};
// RFC 3339 date-time: full date, optional time and fraction, then Z or an offset.
AtomDate.pattern = /^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?(?:[Tt](\d{2}):(\d{2}):(\d{2})(?:\.(\d*))?)?([Zz])?(?:([+-])(\d{2}):(\d{2}))?$/;
AtomDate.localoffset = (new Date()).getTimezoneOffset();
// Left-pads a number with `ch` (not used by parse below).
AtomDate.padding = function(val, ch, count) {
  var value = "";
  while (count > 0) {
    if (val < Math.pow(10, count)) value = ch + value;
    count--;
  }
  return value + val;
};
AtomDate.parse = function(val) {
  if (!val) throw new Error("Invalid Date");
  if (val instanceof Date) return val;
  var m = AtomDate.pattern.exec(String(val));
  if (!m) throw new Error("Invalid Date: " + val);
  var year = Number(m[1]);
  var month = m[2] ? Number(m[2]) - 1 : 0;  // Date.UTC months are zero-based
  var day = m[3] ? Number(m[3]) : 1;        // day 0 would roll back a month
  var hour = m[4] ? Number(m[4]) : 0;
  var minute = m[5] ? Number(m[5]) : 0;
  var second = m[6] ? Number(m[6]) : 0;
  // The fraction is a decimal part of a second: ".5" is 500 ms, not 5 ms.
  var millis = m[7] ? Math.round(Number("0." + m[7]) * 1000) : 0;
  var dir = m[9];
  var offhour = m[10] ? Number(m[10]) : 0;
  var offmin = m[11] ? Number(m[11]) : 0;

  // Shift local time to UTC; this must also run for offsets such as +05:00.
  if (dir) {
    var offset = (offhour * 60) + offmin;
    if (dir == "+") {
      minute -= offset;
    } else {
      minute += offset;
    }
  }
  return new Date(Date.UTC(year, month, day, hour, minute, second, millis));
};

// Dates arrive as strings in the JSON, so extend String rather than Object.
String.prototype.value2date = function() {
  return AtomDate.parse(this.toString());
};
     

The value2date function returns any string whose value is an RFC 3339 date as a JavaScript Date object.


Listing 17. Using the date parsing code from Listing 16
                     
// m is the parsed JSON serialization of a feed or entry
document.write(m.updated.value2date());
     


Text constructs

Atom's support for a broad range of text and content options is, by far, the most complicated and difficult thing to get right in the JSON serialization. Text constructs such as the title, subtitle, summary and rights elements can contain plain text, escaped HTML or XHTML markup; they are language sensitive so the xml:lang context must be considered; and the HTML and XHTML markup can contain relative URIs that need to be resolved. The Atom content element makes things even more difficult by throwing in support for Base64 encoded content, arbitrary XML markup, and external content referenced using the src attribute.

The goal for a JSON serialization of Atom content is to find a generalized representation that captures all of these options as consistently as possible. Listing 18 shows an example feed with three text constructs.


Listing 18. An Atom feed element with three different types of text constructs
                
<feed xmlns="http://www.w3.org/2005/Atom"> 
  ...
  <title>Example Feed</title>
  <subtitle type="html">&lt;p&gt;This is an example feed&lt;/p&gt;</subtitle>
  <rights xml:lang="fr">...</rights>
  ...
</feed>
     

The title element is considered to be plain text. It inherits the language context of the feed element and is otherwise nondescript. The subtitle element contains escaped HTML markup. The rights element is plain text, but overrides the language context. Listing 19 shows the JSON serialization of these three elements.


Listing 19. JSON serialization of the text constructs in Listing 18
                
{
  "title":"Example Feed", 
  "subtitle":{ 
    "attributes":{ 
      "type":"html" 
    }, 
    "children":[
      { 
        "name":"p", 
        "attributes":{ }, 
        "children":["This is an example feed" ] 
      } 
    ]
  },
  "rights":{
    "attributes":{
      "lang":"fr"
    },
    "children":[
      "..."
    ]
  } 
}
     

Note that in the simplest case, the title is serialized as a simple string. The rights element, however, changes the language context, so even though it contains only plain text, it is serialized as an object with two fields, attributes and children.
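The choice between the two forms can be expressed as a small rule: use a bare string only for plain text (type "text") that inherits its language, and use the attributes/children object otherwise. The JavaScript sketch below assumes the element has already been parsed into a hypothetical { type, lang, children } record, where lang is the element's own xml:lang if present; serializeTextConstruct is an illustrative name. Following Listing 19, a changed language is recorded as a lang entry inside attributes.

```javascript
// Serialize one text construct. `type` is the Atom type attribute (default
// "text"), `lang` the element's own xml:lang if present, and `children` the
// parsed content (strings and { name, attributes, children } nodes).
function serializeTextConstruct({ type = "text", lang, children }, inheritedLang) {
  const langChanged = lang !== undefined && lang !== inheritedLang;
  if (type === "text" && !langChanged) {
    return children.join(""); // compact form: plain text in the inherited language
  }
  const attributes = {};
  if (type !== "text") attributes.type = type;
  if (langChanged) attributes.lang = lang;
  return { attributes, children };
}

// Title and rights from Listing 18; that feed declares no xml:lang of its own.
console.log(JSON.stringify(serializeTextConstruct({ children: ["Example Feed"] }, undefined)));
console.log(JSON.stringify(serializeTextConstruct({ lang: "fr", children: ["..."] }, undefined)));
```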

For the subtitle element, the HTML markup is parsed and rendered as a hierarchical structure. The same structure is used for XHTML markup (Listings 20 and 21).


Listing 20. Text construct using XHTML markup
                
<subtitle type="xhtml">
  <div xmlns="..."><p>This is an example feed</p></div>
</subtitle>
     


Listing 21. JSON serialization of the XHTML markup in Listing 20
                
"subtitle":{ 
  "attributes":{ "type":"xhtml" }, 
  "children":[
    { 
      "name":"p", 
      "attributes":{ }, 
      "children":["This is an example feed" ] 
    }
  ] 
}
     

This structure ensures that plain text and markup content are serialized in an unambiguous and consistent way. The downside is that displaying the content in a Web browser is not as straightforward as calling document.write(...) or setting the innerHTML property of a div. What you need is a way to convert the parsed structure back into a form that is easy to display.

Listing 22 is a modified version of code originally provided by Sam Ruby that takes the JSON serialization of an Atom text construct or content element and converts it back into a string.


Listing 22. JavaScript that reconstructs JSON serialized text constructs into displayable string values
                
// Rebuild a markup string from an array of serialized nodes
// (strings and { name, attributes, children } objects).
Array.prototype.hash2value = function () { 
  var result = ''; 
  for (var i = 0; i < this.length; i++) if (this[i]) result += this[i].tag2value(); 
  return result; 
} 

// Render one node: an element becomes a tag; anything else, its string value.
// String.fromCharCode(60) is the "<" character.
Object.prototype.tag2value = function () { 
  if (this.name) { 
    var result = String.fromCharCode(60) + this.name; 
    for (var key in this.attributes) { 
      // Skip the functions that these listings add to Object.prototype.
      if (typeof(this.attributes[key]) === 'function') continue; 
      result += ' ' + key + '="' + this.attributes[key].toString() + '"'; 
    } 
    result += '>' + (this.children || []).hash2value(); 
    result += String.fromCharCode(60) + '/' + this.name + '>'; 
    return result; 
  } else return this.toString(); 
}

// Reconstruct a text construct, whether it is a plain string or an object.
Object.prototype.value = function() { 
  if (this.children) return this.children.hash2value(); 
  else return this.toString(); 
} 

Array.prototype.value = function() { 
  var result = ''; 
  for (var i = 0; i < this.length; i++) 
    if (this[i]) result += this[i].value(); 
  return result; 
}
     

The value functions added to the JavaScript Array and Object prototypes make it possible to reconstruct a text construct with a single call, regardless of whether it was serialized as a simple string or as an object structure.
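Extending Object.prototype, as Listing 22 does, affects every for...in loop on the page, which is why tag2value has to skip functions. If that is a concern, the same reconstruction can be written as standalone functions. The following JavaScript is one such alternative, not Sam Ruby's or Abdera's code, and its function names are hypothetical. Unlike Listing 22, it escapes text and attribute values so that characters such as < and &, decoded during parsing, are not reinterpreted as markup, and it writes void elements such as img without a closing tag (VOID_ELEMENTS is a partial list).

```javascript
// Partial, illustrative list of HTML elements that take no closing tag.
const VOID_ELEMENTS = new Set(["br", "hr", "img", "input", "link", "meta"]);

// Escape characters that would otherwise be read as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one serialized node: a string is text, an object is an element.
function nodeToHtml(node) {
  if (typeof node === "string") return escapeHtml(node);
  // Object.entries visits only own properties, so prototype additions are ignored.
  const attrs = Object.entries(node.attributes || {})
    .map(([key, value]) => ` ${key}="${escapeHtml(value)}"`)
    .join("");
  const open = `<${node.name}${attrs}>`;
  if (VOID_ELEMENTS.has(node.name)) return open;
  return open + (node.children || []).map(nodeToHtml).join("") + `</${node.name}>`;
}

// Accept either form of a serialized text construct: a plain string, or an
// { attributes, children } object whose own attributes (type, lang) are not rendered.
function textConstructToHtml(construct) {
  if (typeof construct === "string") return escapeHtml(construct);
  return (construct.children || []).map(nodeToHtml).join("");
}

console.log(textConstructToHtml("Example Feed")); // Example Feed
console.log(textConstructToHtml({
  attributes: { type: "html" },
  children: [{ name: "p", attributes: {}, children: ["This is an example feed"] }],
})); // <p>This is an example feed</p>
```

With these functions, a call such as document.write(textConstructToHtml(m.title)) plays the role of m.title.value() without touching any built-in prototype.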


Listing 23. Using the JavaScript code from Listing 22 to display text from the JSON serialization
                
// m is the parsed JSON serialization of a feed or entry
document.write(m.title.value()); 
document.write(m.subtitle.value()); 
document.write(m.rights.value());
     

Using the model presented here, working with extended content types within the atom:content element is no different from working with plain text, HTML or XHTML. Listing 24 shows four extended content examples that illustrate the use of alternative media types, arbitrary well-formed XML, Base64-encoded binary data, and referenced content using the src attribute.


Listing 24. Examples of extended content types
                     
<content type="text/plain">This is plain text</content>

<content type="application/xml"><a xmlns="foo">b</a></content>

<content type="image/jpeg">{base64 encoded data}</content>

<content type="image/jpeg" src="image.jpg" />
     

Listing 25 shows the JSON serializations for each of these content forms.


Listing 25. JSON serialization of the extended content types from Listing 24
                
"content":{ 
  "attributes":{ "type":"text/plain" }, 
  "children":["This is plain text" ] 
} 

"content":{
  "attributes":{ "type":"application/xml" },
  "children":[
    {
      "name":"a",
      "attributes": {
        "xmlns":"foo"
      },
      "children":["b"]
    }
  ]
}

"content":{ 
  "attributes":{ "type":"image/jpeg" }, 
  "children":["{base64 encoded data}" ] 
} 

"content":{ 
  "attributes":{ 
    "type":"image/jpeg", 
    "src":"http://example.org/image.jpg" 
  }, 
  "children":["" ] 
} 
     


Extensions

The final area of concern when you serialize Atom to JSON is how to deal with extensions. You have three possible choices:

  • Ignore all extensions and do not include them in the JSON serializations
  • Serialize known-extensions and ignore everything else
  • Serialize all extensions

The first choice is obviously the easiest choice but limits the overall usefulness of the JSON representation (which is not necessarily a bad thing). The second choice allows the JSON serialization of known extensions to be optimized and simplified but still limits the usefulness of the serialization. The third option significantly increases the overall complexity of the serialization but ensures that all of the information from the original Atom document is carried through to the JSON representation.

First, look at how to optimize the output for a known extension. The Atom Threading Extension (RFC 4685) provides a means to indicate that one entry is a response to another. The specification clearly defines the attributes and meaning of the in-reply-to element and allows multiple in-reply-to elements to appear within an entry. Knowing this, you can create an optimized JSON representation of the in-reply-to element, as illustrated in Listings 26 and 27.


Listing 26. Using the Atom Threading Extension in an Atom Entry
                
<entry>
  ...
  <thr:in-reply-to ref="tag:example.org,2007:/foo/entries/2" />
  <thr:in-reply-to ref="tag:example.org,2007:/foo/entries/3" />
  ...
</entry>
     


Listing 27. JSON serialization of the Atom Threading Extensions from Listing 26
                
 "inreplyto":[
   { 
     "ref":"tag:example.org,2007:/foo/entries/2" 
   },
   {
     "ref":"tag:example.org,2007:/foo/entries/3"
   }
 ]
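Because the extension is known, the converter can recognize it by namespace and emit the compact form directly. The JavaScript sketch below assumes the entry's extension elements have already been parsed into a hypothetical { ns, localName, attributes } shape; serializeInReplyTo is an illustrative name, not an Abdera API.

```javascript
// Namespace of the Atom Threading Extension (RFC 4685).
const THR_NS = "http://purl.org/syndication/thread/1.0";

// Build the optimized "inreplyto" array from an entry's already-parsed
// extension elements, each in a hypothetical { ns, localName, attributes } shape.
function serializeInReplyTo(elements) {
  const replies = elements
    .filter((el) => el.ns === THR_NS && el.localName === "in-reply-to")
    .map((el) => ({ ...el.attributes })); // copied as-is; href/source are not resolved here
  return replies.length > 0 ? { inreplyto: replies } : {};
}

// The two thr:in-reply-to elements from Listing 26.
const parsed = [
  { ns: THR_NS, localName: "in-reply-to", attributes: { ref: "tag:example.org,2007:/foo/entries/2" } },
  { ns: THR_NS, localName: "in-reply-to", attributes: { ref: "tag:example.org,2007:/foo/entries/3" } },
];
console.log(JSON.stringify(serializeInReplyTo(parsed), null, 2));
```

Matching on the namespace URI rather than the thr: prefix matters because prefixes are chosen freely by each document. Following the IRI rules discussed earlier, a fuller version would resolve the dereferenceable href and source attributes, while ref, an identifier, stays untouched.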
     

For unknown extensions, however, you'll use a more verbose and generalized syntax to capture all of the necessary details. Listing 28 shows an example of an entry with a complex, unknown extension element.


Listing 28. Complex, unknown extension in an Atom entry
                
<entry>
  ...
  <foo:a xmlns:foo="..."><foo:b><foo:c d="e">f</foo:c></foo:b></foo:a>
  ...
</entry>
     

The serialization of the extension (Listing 29) uses the same structure as the serialization of text that consists of HTML or XHTML markup.


Listing 29. JSON serialization of the extension in Listing 28
                
"extensions":[
  {
    "name":"foo:a", 
    "attributes":{ 
      "xmlns:foo":"http://example.org/unknown-markup" 
    }, 
    "children":[
      { 
        "name":"foo:b", 
        "attributes":{ }, 
        "children":[
          { 
            "name":"foo:c", 
            "attributes":{ "d":"e" }, 
            "children":["f" ]
          }
        ]
      }
    ]
  }
]
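A generic converter for unknown extensions needs only each element's qualified name, attributes, and children. The JavaScript sketch below assumes a hypothetical pre-parsed node shape { prefix, ns, localName, attributes, children } and adds an xmlns declaration wherever a prefix is bound to a namespace that is not already in scope, which reproduces the xmlns:foo attribute on the outermost element in Listing 29. The function name is illustrative, and the namespace URI is the one bound to foo in Listing 30.

```javascript
// Convert an unknown extension element into the { name, attributes, children }
// form of Listing 29. `inScope` maps prefixes ("" for the default namespace)
// to the namespace URIs already declared by ancestors.
function serializeUnknownExtension(el, inScope = {}) {
  const name = el.prefix ? `${el.prefix}:${el.localName}` : el.localName;
  const attributes = { ...el.attributes };
  const prefixKey = el.prefix || "";
  const declared = Object.prototype.hasOwnProperty.call(inScope, prefixKey) &&
    inScope[prefixKey] === el.ns;
  let scope = inScope;
  if (el.ns && !declared) {
    // Declare the binding so the JSON still records the element's namespace.
    attributes[el.prefix ? `xmlns:${el.prefix}` : "xmlns"] = el.ns;
    scope = { ...inScope, [prefixKey]: el.ns };
  }
  const children = (el.children || []).map((child) =>
    typeof child === "string" ? child : serializeUnknownExtension(child, scope));
  return { name, attributes, children };
}

// The extension from Listing 28, in the hypothetical input shape.
const FOO_NS = "http://example.org/unknown-markup";
const c = { prefix: "foo", ns: FOO_NS, localName: "c", attributes: { d: "e" }, children: ["f"] };
const b = { prefix: "foo", ns: FOO_NS, localName: "b", attributes: {}, children: [c] };
const a = { prefix: "foo", ns: FOO_NS, localName: "a", attributes: {}, children: [b] };
console.log(JSON.stringify({ extensions: [serializeUnknownExtension(a)] }, null, 2));
```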
     


Putting it all together

With this foundation, it is possible to take any Atom document and convert it into a usable JSON representation. Listings 30 and 31 provide a complete illustration of the conversion process. The original Atom document contains relative IRIs, language context, extensions, and a variety of text and content types. Running this document through any of the many XML-to-JSON converters available on the Internet will produce a range of alternative serializations that invariably suffer from data loss, serious usability problems, or both.


Listing 30. Complete Atom feed document
                
<?xml version="1.0" encoding="utf-8" ?> 
<a:feed xmlns:a="http://www.w3.org/2005/Atom" 
        xmlns:thr="http://purl.org/syndication/thread/1.0" 
        xmlns="http://www.w3.org/1999/xhtml" 
        xmlns:foo="http://example.org/unknown-markup" 
        xml:lang="en-US" 
        xml:base="http://example.org/foo" 
        dir="ltr"> 
  
  <a:id>tag:example.org,2007:/foo</a:id> 
  <a:title>Example Feed</a:title> 
  <a:subtitle type="html"><![CDATA[<p>This is an example feed</p>]]></a:subtitle> 
  <a:rights type="xhtml">
    <div>
      <p>Copyright © James M Snell</p>
    </div>
  </a:rights> 
  <a:author xmlns="http://www.w3.org/2005/Atom"> 
    <name>James M Snell</name> 
    <email>jasnell@example.org</email> 
    <uri>/~jasnell</uri> 
  </a:author> 
  <a:updated>2007-10-14T12:12:12Z</a:updated> 
  <a:link rel="self" href="" /> 
  <a:link href="/blog" /> 
  <a:link rel="alternate" type="application/json" href="/blog;json" /> 
  
  <a:entry xml:base="entries/1"> 
    <a:id>tag:example.org,2007:/foo/entries/1</a:id> 
    <a:title type="text">Entry Number One</a:title> 
    <a:summary type="xhtml"> 
      <div> 
        <p>This is the first entry. You can read it <a href="">here</a></p> 
      </div> 
    </a:summary> 
    <a:rights type="html">
      <![CDATA[<p>Copyright &copy; James M Snell</p>]]>
    </a:rights> 
    <a:updated>2007-10-14T12:12:12Z</a:updated> 
    <a:link href="" /> 
    <a:link rel="alternate" type="application/json" href="1;json" /> 
    <a:link rel="replies" type="application/atom+xml" 
      href="1;replies" thr:count="10" /> 
    <a:content type="xhtml"> 
      <div> 
        <p>This is the content of the first entry. It contains a picture.</p> 
        <img src="/images/foo.jpg" /> 
      </div> 
    </a:content> 
    <thr:in-reply-to ref="tag:example.org,2007:/foo/entries/2" /> 
    <a:category scheme="http://example.org/categories/" term="foo" 
      label="test" xml:lang="en-US" /> 
    <a:category scheme="http://example.org/categories/" term="bar" 
      label="essai" xml:lang="fr" /> 
    <foo:a><foo:b><foo:c d="e">f</foo:c></foo:b></foo:a> 
  </a:entry> 
 
  <a:entry xml:base="entries/2" xml:lang="fr"> 
    <a:id>tag:example.org,2007:/foo/entries/2</a:id> 
    <a:title type="text">La première entrée</a:title> 
    <a:summary type="xhtml"> 
      <div> 
        <p>Il s'agit de la première entrée. Vous pouvez la lire 
        <a href="">ici</a></p> 
      </div> 
    </a:summary>
    <a:rights type="html">
      <![CDATA[<p>Copyright &copy; James M Snell</p>]]>
    </a:rights> 
    <a:updated>2007-10-14T12:12:11Z</a:updated> 
    <a:link href="" /> 
    <a:link rel="alternate" type="application/json" href="2;json" /> 
    <a:link rel="replies" type="application/atom+xml" 
      href="2;replies" thr:count="10" /> 
    <a:content type="xhtml"> 
      <div> 
        <p>Ceci est le contenu de la première entrée. Il contient une image.</p> 
        <img src="/images/foo.jpg" /> 
      </div> 
    </a:content> 
    <thr:in-reply-to ref="tag:example.org,2007:/foo/entries/1" /> 
    <a:category scheme="http://example.org/categories/" term="foo" 
      label="test" xml:lang="en-US" /> 
    <a:category scheme="http://example.org/categories/" term="bar" 
      label="essai" xml:lang="fr" /> 
    <foo:a><foo:b><foo:c d="e">f</foo:c></foo:b></foo:a> 
  </a:entry> 
</a:feed>
     

The Atom-to-JSON serialization technique described here produces a result that is readable and usable and that avoids the loss of important contextual data.


Listing 31. JSON serialization of the complete Atom feed document from Listing 30.
                
{ 
 "lang":"en-US", 
 "dir":"ltr", 
 "id":"tag:example.org,2007:/foo", 
 "title":"Example Feed", 
 "subtitle":{ 
  "attributes":{ 
   "type":"html" 
  }, 
  "children":[{ 
    "name":"p", 
    "attributes":{ }, 
    "children":["This is an example feed" ] 
   } ] }, 
 "rights":{ 
  "attributes":{ "type":"xhtml" }, 
  "children":[{ 
    "name":"p", 
    "attributes":{ }, 
    "children":["Copyright \u00a9 James M Snell" ] 
   } ]}, 
 "updated":"2007-10-14T12:12:12.000Z", 
 "authors":[{ 
   "name":"James M Snell", 
   "email":"jasnell@example.org", 
   "uri":"http://example.org/~jasnell" 
  } ], 
 "links":[
   { 
     "href":"http://example.org/foo", 
     "rel":"self" 
   },
   { 
     "href":"http://example.org/blog" 
   },
   { 
     "href":"http://example.org/blog;json", 
     "rel":"alternate", 
     "type":"application/json" 
   } ], 
 "entries":[...],
 "attributes":{ 
  "xml:lang":"en-US", 
  "xml:base":"http://example.org/foo" 
 }
}
     


Using the Abdera JSON Writer

The technique described here has been implemented as part of the Apache Abdera project. The code in Listing 32 demonstrates the use of the Abdera JSON Writer. If you want to experiment with Atom-to-JSON conversion, visit the Abdera wiki for information on how to download the latest development image.


Listing 32. Using the Apache Abdera JSON Writer
                
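import java.util.Date;

import org.apache.abdera.Abdera;
import org.apache.abdera.model.Entry;

public class JsonWriterExample {
  public static void main(String[] args) throws Exception {
    Abdera abdera = new Abdera();
    Entry entry = abdera.newEntry();

    entry.setId("http://example.org");
    entry.setTitle("Testing the JSON Writer");
    entry.setUpdated(new Date());
    entry.addLink("http://www.example.org");
    entry.addAuthor("James Snell");
    entry.setSummary("This is a test of the JSON Writer");

    // "json" selects Abdera's named JSON writer, which must be on the classpath
    entry.writeTo("json", System.out);
  }
}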
     


Wrapping up

Converting data from one format to another has always been difficult. With a data format as rich and capable as Atom, serializing to a format as simple and basic as JSON can be problematic at best. While there have been several very good attempts at getting it right, until an effort is made to produce a standardized transformation, application developers need to be prepared to deal with multiple, potentially incompatible serializations that vary broadly in quality. The technique described here is just one of several possible approaches.


Resources

Learn

  • RFC 4287: Read more about the Atom Syndication Format, an XML-based document format that describes feeds that syndicate Web content such as weblogs and news headlines to Web sites and directly to user agents.

  • RFC 4627: Learn more about the JSON Format in a detailed IETF memo.

  • Apache Abdera wiki: Learn about the Abdera project and developing Atom-enabled Java applications.

  • Google's GData APIs: See an alternative approach to converting Atom-to-JSON.

  • IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.

  • XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.

  • developerWorks technical events and webcasts: Stay current with technology in these sessions.

  • The technology bookstore: Browse for books on these and other technical topics.

Get products and technologies

  • IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
