Creating an Atom feed in PHP

Easy syndication with PHP and MySQL

Atom is an XML specification that describes the information contained in a Web site. Using Atom, Web developers produce feeds that enable other Web developers (or consumers who use feed readers) to quickly locate and view information of interest on a remote site. Think of it as a Web site's index, available to anyone who wants it. Using PHP, a language supported by most hosting providers, a Web developer can easily produce an Atom feed and make it available to feed readers and other Web developers. The result is that the site's content reaches a much wider audience.


Brian M. Carey, Information Systems Consultant, Triangle Information Solutions

Brian Carey is an information systems consultant who specializes in the architecture, design, and implementation of Java enterprise applications. You can follow Brian on Twitter at http://twitter.com/brianmcarey, and his tweets are publicly available.



28 July 2009

Also available in Chinese, Japanese, Vietnamese, and Portuguese

What is Atom?

Frequently used acronyms

  • HTML: Hypertext Markup Language
  • RSS: Really Simple Syndication
  • SQL: Structured Query Language
  • URL: Uniform Resource Locator
  • XHTML: Extensible Hypertext Markup Language
  • XML: Extensible Markup Language

Atom, as it is used here, refers to an XML language that enables Web publishers to syndicate the content of their Web sites to various consumers. Using Atom, publishers are able to create a Web feed in a standardized format. This feed enables users to read the contents of the Web site with software known as a feed reader. It also enables other Web developers to publish the contents of the feed on their own Web sites.

Atom is by no means the only syndication standard in use today. RSS is another standardized format (also using XML) and predates Atom. In fact, Atom was created in response to certain limitations in RSS.

As a result, the Atom specification offers several advantages over RSS. Atom provides a means to declare the format of the content being provided (for example, HTML or XHTML), whereas RSS does not. Atom, unlike RSS, supports internationalization with the xml:lang attribute. Atom also uses a more precise and standardized date format, relying on Request for Comments (RFC) 3339 rather than the RFC 822 format used by RSS.
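The following sketch shows the date-format difference in practice. It formats the same moment with three of PHP's built-in format constants:

- DATE_ATOM, the RFC 3339 profile that Atom uses.
- DATE_RSS, the RFC 822 style used by RSS 2.0, with a four-digit year.
- DATE_RFC822, which uses a two-digit year.

The sample time comes from one of the fishing reports used later in this article. The choice of UTC is an assumption, made so the output is predictable.

```php
<?php
// Format one fixed moment in the styles used by Atom and RSS.
// Assumption: UTC is used so that the offset in the output is predictable.
$posted = new DateTime('2009-05-03 04:59:00', new DateTimeZone('UTC'));

$atomDate   = $posted->format(DATE_ATOM);    // RFC 3339, as used by Atom
$rssDate    = $posted->format(DATE_RSS);     // RFC 822 style, four-digit year
$rfc822Date = $posted->format(DATE_RFC822);  // RFC 822, two-digit year

echo $atomDate, "\n";   // 2009-05-03T04:59:00+00:00
echo $rssDate, "\n";    // Sun, 03 May 2009 04:59:00 +0000
echo $rfc822Date, "\n"; // Sun, 03 May 09 04:59:00 +0000
```

The RFC 3339 form sorts correctly as plain text and carries the offset in hours and minutes. That is part of why Atom adopted it.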

Why PHP with Atom?

PHP stands for PHP: Hypertext Preprocessor. It is a recursive acronym, meaning that when expanded it still contains the original acronym. Historically, PHP originally stood for Personal Home Page.

PHP is a scripting language that produces dynamic, server-side content. It works harmoniously with HTML, and PHP code is frequently embedded within standard HTML Web pages to facilitate dynamic content.

PHP also works extremely well with MySQL, a database management system. Over the years, these two technologies have evolved together and worked side by side on countless Web development projects. This is due in no small part to one overarching reason: They are both free.

In answer to the question at the top of this section, PHP gives developers the flexibility of producing dynamic content in an easy-to-read and easy-to-develop manner. The dynamic content is retrieved from a MySQL database. The output page (the feed) is coded using PHP so that it renders an XML output that conforms to the Atom specification.

Note that the explanations in this article were written assuming that you are familiar with the basics of MySQL and PHP. If you are not, see the links to introductory tutorials in the Resources section of this article.

Defining the business use case: Fishing reports

Your boss is in your office. He really likes the way that the company's Web site (fishinhole.com) is operating. The site currently markets and sells fishing tackle of all types to enthusiastic sport fishermen. The site also provides a forum for fishing reports, wherein said enthusiastic sport fishermen share their fish tales.

Your boss takes a seat (without asking) in a chair in your office and complains that the Web site isn't getting enough broad exposure. He wants to use the fishing reports section of the Web page to lure (pun intended) more enthusiastic sport fishermen to the Web site. He tells you that he wants you to make that section of the Web site a "one-stop shop" for sport fishing reports worldwide. This is essential to your ongoing success with fishinhole.com (or so he says). Your boss slurps his coffee, smiles, and walks out of your office with nothing else to say.

You lean back in your chair and get to thinking: What could give the fishing reports forum broader exposure? A moment later it comes to you: syndication! Instead of simply making the reports section available to users and shoppers at fishinhole.com, you can syndicate the forum so that people can read synopses of the fishing reports with their feed readers. Other Web developers might also include the syndication feed in their own Web pages. In either case, people would click on report titles of interest and be linked back to fishinhole.com, where you can expose them to a barrage of fishing tackle direct marketing. It's a great idea.

The database design

Long before your boss walked into your office, the database for the fishing reports forum was already designed. Recall that the fishing reports section of the Web page already exists. It simply hasn't been syndicated yet.

So what changes to the database do you need to make to syndicate its contents? None! That's one of the great things about syndication. In most cases, you can syndicate content without changing your underlying schema or data model. What you syndicate is usually a set of articles, and articles almost always carry the information that the Atom specification requires, such as a title, an author, and a date.

Listing 1 shows the database model currently used by the fishing reports section of fishinhole.com. It also contains some INSERTs so that you have test data.

Listing 1. REPORTS table structure with INSERTs
CREATE TABLE IF NOT EXISTS `reports` (
  `ID` bigint(20) NOT NULL auto_increment,
  `AUTHOR` varchar(32) NOT NULL,
  `TITLE` varchar(64) NOT NULL,
  `SUBTITLE` varchar(128) NOT NULL,
  `CONTENT` varchar(2000) NOT NULL,
  `POSTED` datetime NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;


INSERT INTO `reports` (`id`, `author`, `title`, `subtitle`, `content`, `posted`) VALUES
(1, 'BigRed', 'Spanish Bite Looking Good!', 'Near the Cape!', 
'Trolled for 3 hours and limited out on Spanish Macks!  Watch out for the shallows 
near green can #4.', '2009-05-03 04:54:33'),
(2, 'JonBoy', 'Big Rock Report', 'Spring has sprung', 
'Caught several blackfins and mahi just outside of the Big Rock on Saturday.  
We were using flourescent squid teasers with ballyhoo for hookups.  
One Mahi weighed over 50#!', '2009-05-03 04:56:06'),
(3, 'Erasmus', 'Drum in the backwaters', 'The bite was hot!', 
'Loaded up against the marsh grass, boys.  Go get em.  I was using gulp 
with 1/4 ounce jigheads.', '2009-05-03 04:57:19'),
(4, 'ReelHooked', 'Speckled Trout In Old River', 'Limited out by noon', 
'They were schooling heavy in Old River.  They would eat anything we would 
throw at them.  Most were undersized, but we managed to keep some 
and had our fill by midday.', '2009-05-03 04:59:00');

If you are going to actually test the code presented in this article, you can do so by creating a MySQL database called fishinhole and executing the code from Listing 1 in that database.

The first column (ID) is the primary key of the table. Note that it uses the auto_increment specification. Every time a new row is inserted into the table without an explicit ID, MySQL populates the ID column with the next value of the table's counter, typically one greater than the largest ID generated so far. This is similar to using a sequence in Oracle.
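From the application's point of view, auto_increment means the INSERT simply omits the ID column. The database supplies the value, and PDO::lastInsertId() reports it. This is a minimal sketch that assumes a PDO connection to the fishinhole database; the article itself uses the older mysql_* functions. The helper name insertReport() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: inserts a fishing report without specifying ID,
 * lets the auto_increment column assign one, and returns the new ID.
 */
function insertReport(PDO $pdo, string $author, string $title,
                      string $subtitle, string $content, string $posted): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO reports (author, title, subtitle, content, posted)
         VALUES (?, ?, ?, ?, ?)'
    );
    $stmt->execute([$author, $title, $subtitle, $content, $posted]);

    // The database generated the ID; ask the connection which value it used.
    return (int) $pdo->lastInsertId();
}
```

Suppose the four sample rows from Listing 1 are in place (the table was created with AUTO_INCREMENT=5). In that case, the first call to this helper should return 5.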

The AUTHOR column simply specifies the user name of the person who posted the fishing report. This is the user's screen name, as opposed to the user's real first and last name (unless the user's real name is, in fact, the screen name).

The TITLE column is simply the title of the article. Likewise, the SUBTITLE is the subtitle of the article and is used in the Atom feed for the article synopsis.

The CONTENT column is the actual fishing report itself. Because the Atom feed produced here only includes a synopsis of the overall article (thus encouraging users to click the link and access the Web site), the content itself is not displayed in the Atom feed.

Finally, the POSTED column is a DATETIME column that stores the date and time that the article was posted on the Web site.

To keep things simple, I provide only four articles. In a real-world situation, there would be thousands of these articles from hundreds of different authors.

The work commences

Now that you have the database design in place, it's time to code the PHP page so it produces an Atom feed. This article will walk you through the basics of creating a simple Atom feed, which you can test using PHP.

Please note that if you want to test the code, you need access to a PHP processor. Most hosting solutions provide such access. You may also have access to one locally. Consult with the necessary system administrators or technical support staff to find out how you can execute PHP documents in your Web environment.

Accessing the database in PHP

Create a new PHP file called syndication.php. That's where you'll put your code.

As mentioned previously, PHP and MySQL have a rich history of working extremely well together. Some might go so far as to say they are married. But such judgments are not in the scope of this article.

Listing 2 provides a basic code snippet that enables you to access the MySQL database created with the code from Listing 1.

Listing 2. Accessing the MySQL database in PHP
// Connect to the MySQL server (host, user name, password).
$link = mysql_connect('localhost', 'admin', 'password')
    or die('Could not connect: ' . mysql_error());

// Choose the database that holds the REPORTS table.
mysql_select_db('fishinhole') or die('Could not select database');

// The 25 most recent reports, newest first.
$query = 'SELECT id,title,subtitle,author,posted 
     FROM reports order by posted desc limit 25';
$result = mysql_query($query) 
     or die('Query failed: ' . mysql_error());

That code actually covers quite a bit, so it's important to go over it step by step.

First comes the mysql_connect() function. You need to change the parameters to match your own environment. The first parameter is the database host. In some cases, it will be just like Listing 2 (that is, localhost). In other cases, it will be a remote host (for example, IP address 10.92.2.1). It might also be an actual host name, assuming that you have Domain Name System (DNS) resolution (for example, mysql.myhost.com).

The second parameter is the name of the MySQL user who will access the database. You can use admin, as shown, only if that is a valid account within your own MySQL environment. Refer to the MySQL documentation to learn how to create accounts for a MySQL database. Keep in mind that the account used here must have read (SELECT) privileges on the REPORTS table in the fishinhole database.

The third parameter is the user's password. This needs to match the password used for the user identified in the second parameter.

The or die clause is included to provide the developer with diagnostic information in the event of a failure. If a connection cannot be made to the database management system, you receive a message indicating that the connection failed and the reason it failed when the PHP script is executed.

Next comes the mysql_select_db() function. This is where you actually select which database you plan to use. Your own copy of MySQL can (and likely does) contain many databases, so it's important to specify which one you want to use.

Recall that I recommended you create a MySQL database called fishinhole if you want to test the functionality of the code provided in this article. The mysql_select_db() line specifies that you will use that database.

Next comes the actual query. In this case, you just define the query in a string. Here you grab the ID, TITLE, SUBTITLE, AUTHOR, and POSTED columns from the REPORTS table. The order by posted desc clause forces the query to return rows in descending order by the date in the POSTED column (which is the date that the article was posted on the Web site). So, you retrieve the most recent articles first. This is a standard practice for feeds.

The limit 25 clause at the end is important. This is where you specify that you want a maximum of 25 articles returned for this feed. Recall I mentioned earlier that forums such as this one can have thousands of articles. It is simply not practical to return thousands of articles in a feed. The bandwidth used is significant, and most consumers end up waiting a while for the feed to load.

This query is a string. It is assigned to a variable intuitively named $query.

In the mysql_query() function, you actually execute the query defined in the previous line. The results of that query are stored in the $result variable. Once again, the or die clause is in place for diagnostic purposes.
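The mysql_* functions used in Listing 2 were deprecated in PHP 5.5 and removed in PHP 7.0. As a result, Listing 2 runs only on older PHP versions. On a current PHP release, a rough equivalent using PDO might look like the following sketch. The function name fetchLatestReports() is hypothetical, and the connection parameters in the comment are the same placeholders used in Listing 2. Instead of or die clauses, errors surface as exceptions under PDO::ERRMODE_EXCEPTION, which has been the default since PHP 8.0.

```php
<?php
/**
 * Hypothetical helper: returns the most recent reports, newest first,
 * as associative arrays keyed by column name (id, title, subtitle, ...).
 */
function fetchLatestReports(PDO $pdo, int $limit = 25): array
{
    $stmt = $pdo->prepare(
        'SELECT id, title, subtitle, author, posted
           FROM reports ORDER BY posted DESC LIMIT :limit'
    );
    // Bind as an integer so that the LIMIT value is not quoted as a string.
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Example connection (placeholder credentials, as in Listing 2):
// $pdo = new PDO('mysql:host=localhost;dbname=fishinhole;charset=latin1',
//                'admin', 'password',
//                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// $reports = fetchLatestReports($pdo);
```

Because each row comes back as an associative array, the loop shown later can read the same keys ($row['title'], $row['posted'], and so on) that it reads from mysql_fetch_array().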

The loop and the Atom specification

Now that you have the data from the database, it's time to start displaying it in a format that conforms to the Atom specification. Because Atom is an XML language, the output of the PHP file is in XML format, as opposed to HTML format. If you intend to use a Web browser to display the output, just keep in mind that it will display differently depending upon your browser and version. To view the XML output, it's usually best to right-click on the output in a browser and select View Source. Then you will see the raw XML output.
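Because the output is XML, it usually helps to say so to the browser or feed reader. The following sketch is not part of the listings in this article. It sends the registered Atom media type (application/atom+xml) and then prints the XML declaration that appears at the top of the sample output later in this article. The function names are hypothetical.

The declaration is echoed from PHP rather than typed literally into the file. When PHP's short_open_tag setting is enabled, a literal <?xml would be mistaken for the start of PHP code.

```php
<?php
/** Hypothetical helper: builds the XML declaration for the feed. */
function atomXmlDeclaration(string $encoding = 'iso-8859-1'): string
{
    // Built as a PHP string so that short_open_tag cannot misread "<?xml".
    return "<?xml version='1.0' encoding='" . $encoding . "' ?>\n";
}

/** Hypothetical helper: sends the Atom media type if headers can still be sent. */
function sendAtomContentType(string $encoding = 'iso-8859-1'): void
{
    if (!headers_sent()) {
        header('Content-Type: application/atom+xml; charset=' . $encoding);
    }
}

// At the very top of syndication.php, before any other output:
sendAtomContentType();
echo atomXmlDeclaration();
```

The iso-8859-1 default is an assumption chosen to match the latin1 character set of the REPORTS table.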

Before displaying information about each article, it's important to include the preamble to the Atom feed. This is the section that identifies the output as an Atom feed and provides pertinent information about the feed, as shown in Listing 3.

Listing 3. The Atom preamble
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom"> 
     <title>Fishing Reports</title> 
     <subtitle>The latest reports from fishinhole.com</subtitle>
     <link href="http://www.fishinhole.com/reports/syndication.php" rel="self"/> 
     <updated><?php echo date3339(); ?></updated>
     <author> 
          <name>NameOfYourBoss</name>
          <email>nameofyourboss@fishinhole.com</email>
     </author>
     <id>tag:fishinhole.com,2008:http://www.fishinhole.com/reports/syndication.php</id>

You might immediately notice that the code in Listing 3 doesn't look like PHP. That's because most of it isn't. It's standardized output that requires little in the way of dynamic content.

The <feed> element identifies this XML document as an Atom feed. The namespace used to define the elements is provided as an attribute of the <feed> element. You also use the aforementioned xml:lang attribute to specify that this is a document written in English.

The <title> element specifies a title for the overall feed. Likewise, the <subtitle> element specifies a subtitle for the overall feed.

The <link> element specifies the URL of this syndication.php document. The address in the example works only in the fictitious world described in this article. In your own feed, use the URL at which the feed is actually served, so that the rel="self" link points back to the feed itself.

The <updated> element produces a timestamp (compliant with the RFC 3339 standard) that tells the consumer of this feed when it was last updated. The feed retrieves the latest data from the database every time it is requested, so it is always up to date, and you use the current timestamp. You may also notice a small snippet of PHP code in this element. It is a call to date3339(), a custom-built PHP function that produces a timestamp in RFC 3339 format.
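The date3339() function itself is not shown in the listings. If you are writing the feed from scratch, a minimal stand-in can lean on PHP's built-in DATE_RFC3339 format constant. It uses the same calling convention as the listings: no argument for the current time, or a Unix timestamp. Treat this as a sketch, not the author's implementation. The offset in the result comes from PHP's default time zone setting.

```php
<?php
/**
 * Sketch of a date3339() stand-in: formats a Unix timestamp (or the
 * current time when none is given) as an RFC 3339 string, for example
 * 2009-05-03T04:59:00-05:00. The offset reflects PHP's default time zone.
 */
function date3339(?int $timestamp = null): string
{
    return date(DATE_RFC3339, $timestamp ?? time());
}
```

The preamble calls it as date3339() to stamp the feed with the current time. The loop calls it as date3339(strtotime($articleDate)) to convert each POSTED value.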

The <author> element defines the author of the overall feed. You'll be using your boss's name as the author because it was his idea.

Finally, the <id> element uniquely identifies the feed in an Internationalized Resource Identifier (IRI) format.
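The identifiers in this feed follow the tag URI scheme (RFC 4151). A tag URI consists of these parts, in order:

- The literal tag:
- An authority name (here, fishinhole.com)
- A comma
- A date (here, the year 2008)
- A colon
- Any string that is unique within that authority (here, a URL)

A small helper keeps the feed-level and entry-level IDs consistent. It is hypothetical and not part of the article's download:

```php
<?php
/**
 * Hypothetical helper: builds a tag URI (RFC 4151) of the form
 * tag:<authority>,<date>:<specific>, as used in the <id> elements.
 */
function tagUri(string $authority, string $date, string $specific): string
{
    return 'tag:' . $authority . ',' . $date . ':' . $specific;
}

$feedId  = tagUri('fishinhole.com', '2008',
                  'http://www.fishinhole.com/reports/syndication.php');
$entryId = tagUri('fishinhole.com', '2008',
                  'http://www.fishinhole.com/reports/report.php?id=4');
```

An Atom ID should not change once it is published. For that reason, the authority and date stay fixed even if the site later moves.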

Listing 4 is the main loop that produces each entry in the Atom feed. The vast majority of the work for producing the feed is done here.

Listing 4. The loop
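<?php
     $i = 0;
     while($row = mysql_fetch_array($result))
       {
          // Close the previous iteration's entry, if there was one.
          if ($i > 0) {
               echo "</entry>";
           }

           $articleDate = $row['posted'];
           $articleDateRfc3339 = date3339(strtotime($articleDate));
           echo "<entry>";
           echo "<title>";
           // Escape user-supplied text so that characters such as & and < stay valid XML.
           echo htmlspecialchars($row['title'], ENT_QUOTES, 'ISO-8859-1');
           echo "</title>";
           echo "<link type='text/html' href='http://www.fishinhole.com/reports/report.php?id=".$row['id']."'/>";
           echo "<id>";
           echo "tag:fishinhole.com,2008:http://www.fishinhole.com/reports/report.php?id=".$row['id'];
           echo "</id>";
           echo "<updated>";
           echo $articleDateRfc3339;
           echo "</updated>";
           echo "<author>";
           echo "<name>";
           echo htmlspecialchars($row['author'], ENT_QUOTES, 'ISO-8859-1');
           echo "</name>";
           echo "</author>"; 
           echo "<summary>";
           echo htmlspecialchars($row['subtitle'], ENT_QUOTES, 'ISO-8859-1');
           echo "</summary>";

           $i++;
     }

     // The loop closes each entry only when the next one starts,
     // so close the last entry (if any) and then the feed itself.
     if ($i > 0) {
          echo "</entry>";
     }
     echo "</feed>";
?>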

Once again, Listing 4 covers quite a bit of ground. First comes the while loop. In plain English, this part of the code says, "As long as the query result contains rows that haven't been included in the output yet, keep going." The current row in each iteration is stored in a PHP variable intuitively called $row.

Then the counter ($i) is checked. If the counter is more than 0, then that means this is at least the second iteration. In that case, it is necessary to close the previous iteration's <entry> element.

The next two lines retrieve the article date (from the POSTED column) and convert it to RFC 3339 format using the aforementioned function.

Next, the <entry> element is started. Following that is the <title> element, which is populated from the TITLE column in the current row.

The <link> element is unusual in that it doesn't contain any child text. Instead, the actual link is referenced as an attribute. This is part of the Atom standard. The link simply points the user to the URL where the user can read the entire article. Recall that this feed provides only a synopsis to the user.

The <id> element is similar to the one that was described previously. It uniquely identifies this entry in IRI format. And, as before, it is constructed from the relevant URL.

The <updated> element contains the DATETIME value (in RFC 3339 format) from the POSTED column. Recall that the $articleDateRfc3339 variable for this entry was populated earlier in this iteration.

Next comes the <author> element. This element, unlike the others (but like the <author> element in the preamble), has child elements. For this article, only one of those children is used: the author's name. The author's name is populated from the AUTHOR column of the current row.

The <summary> element contains the information gleaned from the SUBTITLE column of the current row.

Finally, the loop counter ($i) is incremented, and the loop continues.

That, in a nutshell, is the entire body of code associated with producing an Atom document from the REPORTS table. As you can see, it's not as complicated as it might seem at first.

Also, keep in mind that many elements in the Atom specification are not covered here. You can add those just as easily by following the same patterns described in this section. For more information, see Resources.
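In Listing 4, the $i counter exists only so that each <entry> element can be closed when the next one starts. An alternative is to build each entry completely, with opening and closing tags together, inside the loop body. This removes the counter and makes it impossible to forget the final </entry>.

The function name renderEntry() is hypothetical. It expects a row with the keys returned by the query in Listing 2, and it formats the date with PHP's DATE_RFC3339 constant directly. All text is escaped for XML.

```php
<?php
/**
 * Hypothetical helper: renders one complete Atom <entry> for a report row
 * (keys: id, title, subtitle, author, posted), escaping text for XML.
 */
function renderEntry(array $row): string
{
    $esc = fn(string $s): string =>
        htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'ISO-8859-1');
    $url = 'http://www.fishinhole.com/reports/report.php?id=' . (int) $row['id'];
    // Offset comes from PHP's default time zone.
    $updated = date(DATE_RFC3339, strtotime($row['posted']));

    return '<entry>'
        . '<title>' . $esc($row['title']) . '</title>'
        . "<link type='text/html' href='" . $esc($url) . "'/>"
        . '<id>tag:fishinhole.com,2008:' . $esc($url) . '</id>'
        . '<updated>' . $updated . '</updated>'
        . '<author><name>' . $esc($row['author']) . '</name></author>'
        . '<summary>' . $esc($row['subtitle']) . '</summary>'
        . '</entry>';
}

// With this helper, the loop needs no counter:
// while ($row = mysql_fetch_array($result)) {
//     echo renderEntry($row);
// }
```

Returning a string, rather than echoing piece by piece, also makes each entry easy to check in isolation.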

Test it!

Now comes the fun part: testing!

Rather than retype (or copy and paste) everything you see in the code listings above, you can simply use the PHP file that is included in the Download section. Copy that file to a local directory and make the database changes that I described earlier (user name, password, and host). Then copy it to a directory served by a PHP-enabled Web server that has access to the database.

When you have the PHP file in the correct place, launch your browser and access your file as follows: http://yourhost/context/syndication.php.

As with any customized solution, you need to replace the placeholder values (yourhost and context) to match your specific environment.

As I stated previously, your results will vary depending upon which browser and version you use. Some of the more modern browsers detect that this is an Atom feed and display the results accordingly. Others display it in raw XML format. Still others might produce nothing because the document is not a standard HTML document.

If the browser does not display the raw XML, you can do so simply by right-clicking on the document and selecting View Source. After you do that, you should see something similar to Listing 5.

Listing 5. The output (abbreviated)
<?xml version='1.0' encoding='iso-8859-1' ?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <title>Fishing Reports</title>
  <subtitle>The latest reports from fishinhole.com</subtitle>
  <link href="http://www.fishinhole.com/reports/syndication.php" rel="self"/>
  <updated>2009-05-03T16:19:54-05:00</updated>
  <author>
   <name>NameOfYourBoss</name>
   <email>nameofyourboss@fishinhole.com</email>
  </author>
  <id>tag:fishinhole.com,2008:http://www.fishinhole.com/reports/syndication.php</id>
  <entry>
   <title>Speckled Trout In Old River</title>
   <link type='text/html' href='http://www.fishinhole.com/reports/report.php?id=4'/>
   <id>tag:fishinhole.com,2008:http://www.fishinhole.com/reports/report.php?id=4</id>
   <updated>2009-05-03T04:59:00-05:00</updated>
   <author>
    <name>ReelHooked</name>
   </author>
   <summary>Limited out by noon</summary>
  </entry>
...
</feed>

Another way to test it is to verify that the feed is valid. You can do that using one of the many Atom feed validators available online. A good one to use is http://www.feedvalidator.org. That Web site validates feeds in Atom, RSS, and Keyhole Markup Language (KML) formats.
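A validator checks the feed against the Atom specification. Before that, a quick local check that the output is at least well-formed XML can catch problems such as an unclosed <entry> element or an unescaped ampersand. The sketch below uses PHP's DOM extension and libxml error collection. The function name is hypothetical, and a well-formed result does not mean the feed is valid Atom.

```php
<?php
/**
 * Hypothetical helper: returns a list of XML well-formedness errors
 * (an empty array means that the document parsed cleanly). It does not
 * check the document against the Atom specification.
 */
function xmlWellFormednessErrors(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // collect errors, don't print them
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Example: check your own feed's output (the URL is a placeholder).
// print_r(xmlWellFormednessErrors(
//     file_get_contents('http://yourhost/context/syndication.php')));
```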

Business Results

Because you implemented and deployed your Atom feed, thousands of new enthusiastic sport fishermen from around the world now have exposure to the fishing reports on your Web site. You are getting hundreds of incoming links from sport fishing sites that embed your Atom feed. Some enthusiastic sport fishermen are even using feed readers to view the reports on a daily basis.

Your boss pops back into your office after looking at the latest traffic reports. He is pleased with the additional visits and reports that unique visitors have increased by 10%. He gives you a thumbs up, slurps his coffee, and walks away.

Conclusion

The Atom specification is an ideal means of syndicating your Web content. Using PHP with MySQL, you can easily produce a Web feed that complies with the Atom standard and is always up to date because it reads directly from the database. The feed can then be read by a feed reader or embedded in other Web sites. The end result is broader exposure for your Web content, and that means more visitors and, most likely, an increase to your bottom line.


Download

Description | Name | Size
The PHP file used in this article | syndication.php | 3KB

Resources

Learn

Get products and technologies

Discuss


