Skip to main content

skip to main content

developerWorks  >  Web development | XML | Open source  >

Implement news syndication using RSS and Atom

A handy reference for using RSS 2.0 and Atom 1.0 in content syndication

developerWorks
Document options

Document options requiring JavaScript are not displayed


Rate this page

Help us improve this content


Level: Intermediate

Ying Ying Lin (queenielyy@yahoo.com.cn), Software Engineer, Dong Hua University, Shanghai, China

26 Sep 2006

The advent of RSS and Atom technologies brings a bright new era of news syndication. It takes time, however, for Web site administrators to publish the news manually every day and to manage e-mail subscribers. This article shows how to implement a general news publication architecture using RSS and Atom syndication formats to ease the process and minimize human error.

If you have at least a basic understanding of content syndication and the RSS 2.0 format specification, as well as the Atom 1.0 format specification, then you're ready to implement a news syndication system using one or the other or both, and this article is intended to be a handy reference for you.

RSS and Atom syndication

RSS and Atom are similar XML-based document formats that describe lists of related information known as feeds. These feeds are composed of a number of items, each with an extensible set of attached metadata; for example, each item has a title. The primary purpose of these feeds is the syndication of Web content such as Weblogs or news headlines to Web sites and directly to user agents.

Two examples of RSS 2.0 and Atom 1.0


Listing 1. Sample RSS 2.0 feed
        
        <?xml version="1.0"?>
        <rss version="2.0">
          
        <channel>
           <title>Feed Title</title>
           <link>http://yourwebsite.com/</link>
           <description>Feed Description</description>
           <language>en-us</language>
           <pubDate>Mon, 03 Jan 2005 12:00:00 GMT</pubDate>
            
        <item>
           <title>Article Title</title>
           <link>http://yourwebsite.com/articlelink.html</link>
           <description>Your content included here.</description>
        </item>
        <item>
           <title>Sports</title>
           <link>http://yourwebsite.com/sportslink.html</link>
           <description>Your content included here.</description>
        </item>
            
        </channel>
        </rss>
        


Listing 2. Sample Atom 1.0 feed
        
        <?xml version="1.0" encoding="utf-8"?>
        <feed xmlns="http://www.w3.org/2005/Atom">

          <title> Feed Title </title> 
          <link href=" http://yourwebsite.com/"/>
          <updated>2003-12-13T18:30:02Z</updated>
          <author> 
            <name>Your Name</name>
          </author> 
          <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

          <entry>
            <title>Article Title</title>
            <link href=" http://yourwebsite.com/articlelink.html "/>
            <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
            <updated>2003-12-13T18:30:02Z</updated>
            <summary>Some text.</summary>
          </entry>
          <entry>
            <title>Sports</title>
            <link href=" http://yourwebsite.com/sportslink.html "/>
            <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344e45ab90</id>
            <updated>2003-12-14T13:30:55Z</updated>
            <summary>Some text.</summary>
          </entry>

        </feed>

        

Similarities in RSS and Atom feeds

As you've seen in the previous two examples (Listing 1 and Listing 2), RSS and Atom are similar XML-based formats. Their basic constructions are the same with only a small difference between the expressions of the nodes.

Each feed file actually represents a channel. It has a channel title, link, description, author, and so on. The channel information provides the basic information about the feed. Following the channel information are several items. Each item represents a real piece of news or articles you can read from the feed reader. Usually an item contains a title, link, date of update, and summary information.

Differences of RSS and Atom feeds

With RSS 2.0 and Atom 1.0, Compared as a reference, review the differences between RSS and Atom.


Table 1. RSS 2.0 and Atom 1.0 Compared
DifferenceRSS 2.0Atom 1.0
DeploymentRSS 2.0 is widely deployed.Atom 1.0 is not widely deployed.
SpecificationHarvard has copyrighted and frozen the RSS 2.0 specification.The Atompub Working Group (within the IETF) agreed on the Atom 1.0 specification, and that it could be revised in the future.
Required contentRSS 2.0 has a feed-level title, link, and description required. It doesn't require that any individual item fields be present in a feed.Atom 1.0 has a title (which can be empty), a unique identifier, and a last-updated time stamp required for both feeds and entries.
PayloadRSS 2.0 can have plain text or escaped HTML, but you can't tell which of the two is provided.Atom 1.0 has a payload container.
Full or partial contentRSS 2.0 has a <description> element that can contain either the full text or a synopsis of an entry. It doesn't have a built-in way to signal whether the contents are complete. Atom 1.0 offers separate <summary> and <content> elements. The summary is good to use for accessibility reasons if it's nontextual or nonlocal content.
Auto-discoveryRSS 2.0 auto-discovery is implemented in different ways.Atom 1.0 standardizes auto-discovery.
Extraction and aggregationRSS 2.0 has only one recognized form: an <rss> document.Atom 1.0 allows stand-alone Atom Entry documents, which can be transferred using any network protocol; for example, XMPP. Atom also has support for aggregated feeds, where entries point to the feed they came from if they will be included in other feeds.

A general architecture for RSS and Atom publication system

With the development and sophistication of the RSS and Atom format specifications, more and more Web applications are being implemented. The most popular and typical implementation using RSS or Atom is a news publication system.

Here is a general architecture for implementing a publication system using both RSS and ATOM feeds. This architecture consists of three parts:

  • A feed generator subsystem
  • A feed operation subsystem
  • An auto-publication subsystem

Now view the details of the three subsystems individually.

A feed generator

A feed generator takes charge of generating XML-based feed files. The class diagram of the feed generator is as follows:


Figure 1. Feed generator class diagram
Feed generator class diagram

The core of the feed generator is class FeedFileManager. It generates the XML file related to a particular feed. The feed has two specific types: AtomFeed and RssFeed. Both of the two feed classes contain a feed channel class and a feed item class. The realizations of the feed channel class and feed item class will vary due to their different constructions.

A feed operation subsystem

A feed operation subsystem provides several feed operation functions to operate these two types of feed files. For example, insertItem(), deleteItem(), updateItem() functions. The class diagram of the feed operation subsystem is as follows:


Figure 2. Feed operation subsystem class diagram
Feed operation subsystem class diagram

First, you need an abstract factory pattern PublisherFactory to generate either one of two feed publishers. Then the publisher pulls out the data from the database through a DataCollection class.

An auto-publication subsystem

An auto-publication subsystem is a timer for updating the feed files in a fixed time. This is the real part of utilizing the previous two subsystems and makes the publication system work.


Figure 3. Auto-publication subsystem class diagram
Auto-publication subsystem class diagram


Back to top


Step-by-step implementation of how to generate RSS and ATOM feeds automatically

Now, look at the detailed implementation procedure, step by step. First, Figure 4 provides a general implementation procedure diagram. It shows several main building blocks and the relationship between them.


Figure 4. Overview of implementation procedure
Overview of the implementation procedure

The whole application is a big timer, so you can set a fixed launch time inside it. When time is up, the timer starts to do the scheduled work -- generate feed files. First, the application generates an XML file, then forms RSS and Atom feeds, and finally writes these feeds to the created XML file. The source of the feeds can be a database, a file or various other resources.

Timer

A timer is the core of the publication system. It triggers the system to generate feed files according to the user's appointment or in a fixed time per day. For a Java application, you can usually use the Java standard scheduling classes: Timer and TimerTask to implement a repeated execution schedule task.

It's more common to compose a scheduled recurring task in this publication system. You can refer to Scheduling recurring tasks in Java applications for detailed information on how to compose a scheduled recurring task.


Listing 3. Sample Timer start method
        
           public void start() {
               SchedulerTask st = new SchedulerTask(){
               public void run() {
	       try {
	           updateNewsFeeds(); //concrete method of update feed files.
                   updateBooksFeeds();
                    ... 
		   } catch (Exception e) {
		       e.printStackTrace();
			}
		  }
		};

        

Generate XML-based feed files

Both RSS and ATOM feeds are XML based, so generating these XML-based files is important during the implementation procedures. The FeedFileManager class in the feed generator subsystem is responsible for this task.


Figure 5. FeedFileManager class
FeedFileManager class

Generally this class should contain a createFeedFile() method. This method needs to do three things:

  • Create an XML file. You can create a static method createXMLFile() to achieve this goal.
  • Interact with "Feed Operation Subsystem" to create the related channel and item information.
  • Write all of that information back to the XML file.

Listing 4. Sample createFeedFile method
        
	public String createFeedFile(String channelid, String name, String type)
			throws Exception {
                ...
		createXMLFile(file, type);

		IPublishable publisher = PublisherFactory.createPublisher(type, file);
		Feed feed = getFeed(type);

		Channel channel = feed.getChannel();
		ArrayList itemlist = feed.getItemList();
		
		publisher.insertChannel(channel);
		publisher.insertItemList(itemlist);
		publisher.writeback(file);

		return file;
	}
        

Form RSS/Atom feeds

In the last procedure, the publish factory creates a publisher and inserts related channel information and itemlist to the certain feed.


Listing 5. Sample code to insert channel and itemlist
        
		publisher.insertChannel(channel);
		publisher.insertItemList(itemlist);
        

To form the exact channel and itemlist, pull out the data from the database and input pulled-out data to the feed pattern. The following demonstrates how to pull out the data and then input the information to the related databean:


Listing 6. Sample code for feeds
        
  	...
        NewsBeanManager newsmanager = new NewsBeanManager();
        newslist = newsmanager.getAllNews(); //interact with database
        ...
	for (Iterator it = newslist.iterator(); it.hasNext(); it.next()) {

		title = "News " + i + ": " + ((NewsBean) newslist.get(i)).getTitle();
		link = ((NewsBean) newslist.get(i)).getLink();
		author = ((NewsBean) newslist.get(i)).getAuthor();
		timestamp = ((NewsBean) newslist.get(i)).getPublishTime();
		id = String.valueOf(((NewsBean) newslist.get(i)).getNewsID());
		description = ((NewsBean) newslist.get(i)).getSummary();
		content = ((NewsBean) newslist.get(i)).getContent();
		rssitem = new RssItem(title, link, author, timestamp, id,
					description);
		rssitemlist.add(i, rssitem);
		atomitem = new AtomItem(title, link, author, timestamp, id,
					description, content);
		atomitemlist.add(i, atomitem); 
		i++;
		}
        



Back to top


Implementation tips

After you implement a news publication system successfully, you might benefit from the following tips for deployment.

Implementing RSS or Atom feeds for most popular feed readers

To meet the needs of most popular feed readers, pay attention to the following when you generate those feed files.

  • Add the RSS and Atom version information to the head of the XML file, as in Listing 7:

    Listing 7. Sample code to add RSS and Atom version information
            
       <?xml version="1.0" encoding="utf-8"?>   
         <rss version="2.0">
         ...
       <?xml version="1.0" encoding="utf-8"?>   
        <atom version="1.0">
            

  • Provide link information for the RSS and Atom channel. In RSS 2.0, the link information appears as a node, but in the Atom 1.0 link, the information appears as an attribute. It's a good idea to give the link information for the channel part, although the link isn't a mandatory item for Atom (see Listing 8).

Listing 8. Sample code to provide link information for the channels
        
   <?xml version="1.0" encoding="utf-8" ??> 
   <rss?>
   <channel?>
      <title?>News Syndication</title?> 
      <link?>http://www.newssyndication.com</link?> 
      <description?>this is a news feed</description?> 
      ...

   <?xml version="1.0" encoding="utf-8" ??> 
   <feed xmlns="http://www.w3.org/2005/Atom"?>
      <title?>News Syndication</title> 
      <link href="http://www.newssyndication.com" /?> 
      <updated?>2006-07-06T10:26:30Z</updated?> 
      <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id?> 
        

Support globalization when generating XML files

Add encoding information to the header of the XML file.


Listing 9. Sample code to ad encoding information in XML file header
        
   <?xml version="1.0" encoding="utf-8"?>
        

When you implement the function to write the XML file, use OutputStreamWriter instead of FileWriter to set encoding as utf-8. The default encoding of FileWriter is "GBK."


Listing 10. Sample code to set encoding as utf-8 with OutputStreamWriter
        
    OutputStreamWriter writer = new OutputStreamWriter(outputstream,"utf-8");
        



Back to top


In conclusion

In this article, you first reviewed the similarities and differences between RSS and Atom feeds, and then saw the architecture for implementing a publication system using both types of feeds. This architecture consists of three subsystems, which work in correlation with each other. After the step-by-step demonstration, you saw several tips for the implementation procedures. Enjoy your streamlined approach to news feeds.



Resources

Learn

Get products and technologies
  • IBM trial software: Build your next development project with software available for download directly from developerWorks.


Discuss


About the author

Photo of Ying Ying Lin

Ying Ying Lin is a graduate student from Dong Hua University and currently an intern at IBM China's Globalization Lab in Shanghai. She is interested in Java and workflow application technologies. You can contact her at queenielyy@yahoo.com.cn.




Rate this page


Please take a moment to complete this form to help us better serve you.



 


 


Not
useful
Extremely
useful
 


Share this....

digg Digg this story del.icio.us del.icio.us Slashdot Slashdot it!



Back to top