
Tip: Parsing RDDL documents with PHP

Parse and extract RDDL resource information with PHP

Vikram Vaswani, Founder, Melonfire
Vikram Vaswani is the founder and CEO of Melonfire, a consulting services firm with special expertise in open-source tools and technologies. He is also the author of the books PHP Programming Solutions and How to do Everything with PHP and MySQL.

Summary:  The Resource Directory Description Language (RDDL) lets document authors provide more information about resources used within an XHTML document. Parse these RDDL descriptors with an API in the XML_RDDL package from PEAR, and extract resource information for use in any PHP application.


Date:  10 Dec 2007
Level:  Intermediate

Introduction

If you've worked with XML before, you already know that a namespace lets you qualify XML element names by associating them with a particular URI, thus avoiding conflicts between elements which have the same name. Often, however, a single URI is not enough; what's really needed is a way to describe a namespace using multiple resources, including DTDs, XML Schemas, XSL stylesheets and software documents.

That's where RDDL, the Resource Directory Description Language, comes in. According to the language's official Web site, RDDL "provides a package of information about some target...the targets [are] XML Namespaces" (see Resources). In other words, RDDL gives document authors a way to tell users more about a particular resource. To help PHP developers work with this information, the PHP Extension and Application Repository (PEAR) offers the XML_RDDL package. It provides an API that extracts various pieces of information about a resource from an RDDL file, so that you can use them in a PHP application.


Installation

The XML_RDDL package is maintained by Stephan Schmidt, and released to the PHP community under the PHP License. The easiest way to install it is with the automated PEAR installer, which should have been included by default with your PHP build. To install the package, issue the following command at your shell prompt:

shell> pear install XML_RDDL

The PEAR installer will now connect to the PEAR package server, download the package, and install it to the appropriate location on your system. This tip uses XML_RDDL V 0.9.

To install the package by hand, visit its home page, download the source code archive, and manually uncompress the files to the desired location. Note that this manual installation process presupposes some knowledge of PEAR's package organization structure.

XML_RDDL also requires one other PEAR package, the XML_Parser package. You can use the PEAR automated installer to install this package as described previously; alternatively, you can find links to the package from the Resources in this tip.
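Before running the listings that follow, it can help to confirm that both packages are visible to PHP. The following is a minimal sketch, not part of either package: findPearFiles() is a hypothetical helper name, and the sketch assumes the class files live at XML/Parser.php and XML/RDDL.php relative to a directory on your include_path.

```php
<?php
// Sketch: report whether the PEAR class files this tip relies on can be
// found on the current include_path. Nothing is loaded or executed.

/**
 * Hypothetical helper (not part of PEAR): maps each relative file name
 * to its resolved absolute path, or false when it cannot be found.
 */
function findPearFiles(array $files)
{
    $found = array();
    foreach ($files as $file) {
        // stream_resolve_include_path() returns false if the file is absent
        $found[$file] = stream_resolve_include_path($file);
    }
    return $found;
}

// ASSUMPTION: these are the relative paths used by the two packages
$status = findPearFiles(array('XML/Parser.php', 'XML/RDDL.php'));
foreach ($status as $file => $path) {
    echo $file . ': ' . ($path === false ? 'not found' : $path) . "\n";
}
```

If either file is reported as not found, re-run the PEAR installer or check the include_path setting in your php.ini file.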


Understanding RDDL descriptors

To begin, it's necessary to understand the basics of RDDL. Consider Listing 1, which illustrates how you can use RDDL:


Listing 1. Example XHTML document using RDDL
                        
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink" 
xmlns:rddl="http://www.rddl.org/" xml:lang="en">
  <head>
    <title>An Example RDDL Document</title>
  </head>
  <body>
    <h2>An Example RDDL Document</h2>
    <p>Here are some resources:</p>
    <ul>
      <li>
        <rddl:resource xlink:type="simple" 
        xlink:href="http://app.example.domain/example.xsd" 
        xlink:role="http://www.w3.org/2000/10/XMLSchema" 
        xlink:title="Example XML Schema" 
        xlink:arcrole="http://www.rddl.org/purposes#schema-validation">        
          <a href="http://app.example.domain/example.xsd">An Example XML Schema</a>      
        </rddl:resource>
      </li>
      <li>
        <rddl:resource id="dtd" xlink:type="simple" 
        xlink:href="http://app.example.domain/example.dtd" 
        xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd" 
        xlink:title="Example DTD" 
        xlink:arcrole="http://www.rddl.org/purposes#validation">        
          <a href="http://app.example.domain/example.dtd">An Example DTD</a>      
        </rddl:resource>
      </li>
      <li>
        <rddl:resource xlink:type="simple" 
        xlink:href="http://app.example.domain/api.html" 
        xlink:role="http://www.w3.org/1999/xhtml" 
        xlink:title="Example API Reference" 
        xlink:arcrole="http://www.rddl.org/purposes#reference">        
          <a href="http://app.example.domain/api.html">An Example API Reference</a>      
        </rddl:resource>
      </li>
      <li>
        <rddl:resource xlink:type="simple" 
        xlink:href="http://app.example.com/video.mpg" 
        xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/video/mpeg" 
        xlink:title="Explanatory Video" 
        xlink:arcrole="http://www.rddl.org/purposes#software-package">        
          <a href="http://app.example.com/video.mpg">Explanatory Video</a>      
        </rddl:resource>
      </li>
      <li>
        <rddl:resource xlink:type="simple" 
        xlink:href="http://app.example.com/video2.mpg" 
        xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/video/mpeg" 
        xlink:title="Explanatory Video 2" 
        xlink:arcrole="http://www.rddl.org/purposes#software-package">        
          <a href="http://app.example.com/video2.mpg">Explanatory Video 2</a>      
        </rddl:resource>
      </li>
    </ul>
  </body>
</html>
        

As Listing 1 illustrates, an RDDL document is a regular XHTML document, with one important addition: the <resource> element (written <rddl:resource> in the listing, because it belongs to the RDDL namespace), which describes a resource referenced in the document. This <resource> element is a modified XLink, with attributes that describe the title, target, role and purpose of the resource. The document above lists various resources: an XML Schema, a DTD, an XHTML document and two MPEG media files.

Of the attributes that a <resource> can have, the title and href attributes are self-explanatory: They provide a string description and a URL for the link target respectively. The role and arcrole attributes of the <resource> element are a little more interesting. The role attribute describes the nature of the resource and must be a URI pointing either to the resource's namespace or referencing the resource's MIME type; you can find a list of common natures at http://www.rddl.org/natures/. The arcrole attribute specifies the purpose of the resource, drawn from a list at http://www.rddl.org/purposes/.
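To make the attribute roles concrete, here is a minimal sketch that reads them with PHP's built-in DOM extension rather than XML_RDDL. It uses a cut-down copy of the DTD entry from Listing 1; the array keys nature and purpose are names chosen for this illustration only.

```php
<?php
// A cut-down RDDL 1.0 fragment based on the DTD entry in Listing 1.
$xhtml = <<<RDDL_DOC
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:rddl="http://www.rddl.org/" xml:lang="en">
  <body>
    <rddl:resource id="dtd" xlink:type="simple"
      xlink:href="http://app.example.domain/example.dtd"
      xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"
      xlink:title="Example DTD"
      xlink:arcrole="http://www.rddl.org/purposes#validation">
      <a href="http://app.example.domain/example.dtd">An Example DTD</a>
    </rddl:resource>
  </body>
</html>
RDDL_DOC;

// Namespace URIs as declared in Listing 1
$rddlNs  = 'http://www.rddl.org/';
$xlinkNs = 'http://www.w3.org/1999/xlink';

$doc = new DOMDocument();
$doc->loadXML($xhtml);

// Collect the four descriptive attributes of every <rddl:resource>
$descriptors = array();
foreach ($doc->getElementsByTagNameNS($rddlNs, 'resource') as $el) {
    $descriptors[] = array(
        'title'   => $el->getAttributeNS($xlinkNs, 'title'),   // description
        'href'    => $el->getAttributeNS($xlinkNs, 'href'),    // link target
        'nature'  => $el->getAttributeNS($xlinkNs, 'role'),    // what it is
        'purpose' => $el->getAttributeNS($xlinkNs, 'arcrole'), // what it is for
    );
}

print_r($descriptors);
```

The XML_RDDL package described in the rest of this tip saves you from writing this kind of namespace-aware traversal yourself.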

Note that the above statements are true as of RDDL 1.0. However, in January 2004, an updated draft of the RDDL specification, RDDL 2.0, was released, which eliminated the <resource> element and its attributes altogether. This version of the specification recommended embedding RDDL information in the standard XHTML <a> element using the new attributes nature and purpose; these became equivalent to the original role and arcrole attributes in the <resource> element. However, the XML_RDDL package does not support RDDL 2.0 and so the examples in this tip are with reference to RDDL 1.0 only.
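If you do meet a document written in the newer style, PHP's built-in DOM extension can still read it. The following is only a rough sketch under stated assumptions: it assumes the nature and purpose attributes are bound to the same RDDL namespace URI used in Listing 1 and appear directly on XHTML <a> elements. Check the RDDL 2.0 draft before relying on either assumption. The sample markup is invented for illustration.

```php
<?php
// ASSUMPTION: RDDL 2.0-style markup puts nature and purpose attributes,
// bound to the RDDL namespace, directly on the XHTML <a> element.
$xhtml = <<<RDDL2_DOC
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/" xml:lang="en">
  <body>
    <a href="http://app.example.domain/example.dtd"
       rddl:nature="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"
       rddl:purpose="http://www.rddl.org/purposes#validation">An Example DTD</a>
    <a href="http://app.example.domain/about.html">An ordinary link</a>
  </body>
</html>
RDDL2_DOC;

$xhtmlNs = 'http://www.w3.org/1999/xhtml';
$rddlNs  = 'http://www.rddl.org/'; // assumed to be unchanged from RDDL 1.0

$doc = new DOMDocument();
$doc->loadXML($xhtml);

$links = array();
foreach ($doc->getElementsByTagNameNS($xhtmlNs, 'a') as $a) {
    // skip ordinary hyperlinks that carry no RDDL information
    if (!$a->hasAttributeNS($rddlNs, 'nature')) {
        continue;
    }
    $links[] = array(
        'href'    => $a->getAttribute('href'),
        'title'   => trim($a->textContent),
        'nature'  => $a->getAttributeNS($rddlNs, 'nature'),
        'purpose' => $a->getAttributeNS($rddlNs, 'purpose'),
    );
}

print_r($links);
```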


Accessing RDDL information with PHP

Once you have an XHTML document with RDDL resources defined within it, it's quite easy to use XML_RDDL to access different bits of information from it. Consider Listing 2, which illustrates the process of retrieving a list of all RDDL resources from an XHTML document using PHP:


Listing 2. Parsing RDDL data with PHP
                        
<?php
// include class file
include 'XML/RDDL.php';

// create RDDL parser
// parse RDDL file
$rddl = new XML_RDDL();
$rddl->parseRDDL('example.html');

// print array of resources 
print_r($rddl->getAllResources());
?>
        

Listing 2 uses PEAR's XML_RDDL package to read the XHTML file from Listing 1 (saved here as example.html) and extract all the resources from it. To begin, it includes the XML_RDDL class file and creates an instance of the XML_RDDL class. The parseRDDL() method then parses the source file, which can be either a local file or a remote URL. Once the document is parsed, the getAllResources() method returns every <resource> element in the document as an array of associative arrays, one per resource.

Listing 3 illustrates a snippet of the output from Listing 2:


Listing 3. The output of Listing 2
                        
Array
(
    [0] => Array
        (
            [lang] => en
            [type] => simple
            [href] => http://app.example.domain/example.xsd
            [role] => http://www.w3.org/2000/10/XMLSchema
            [title] => Example XML Schema
            [arcrole] => http://www.rddl.org/purposes#schema-validation
        )

    [1] => Array
        (
            [lang] => en
            [type] => simple
            [id] => dtd
            [href] => http://app.example.domain/example.dtd
            [role] => http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd
            [title] => Example DTD
            [arcrole] => http://www.rddl.org/purposes#validation
        ) 
        ...
)
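
Because the result is a plain PHP array, you can reshape it with ordinary array code. As a minimal sketch, the following groups resource URLs by purpose name. It works on a hard-coded sample that copies the two entries shown in Listing 3, so it runs without XML_RDDL installed; purposeName() is a hypothetical helper, not a package method.

```php
<?php
// Sample data shaped like the Listing 3 output (two entries only).
$resources = array(
    array(
        'lang'    => 'en',
        'type'    => 'simple',
        'href'    => 'http://app.example.domain/example.xsd',
        'role'    => 'http://www.w3.org/2000/10/XMLSchema',
        'title'   => 'Example XML Schema',
        'arcrole' => 'http://www.rddl.org/purposes#schema-validation',
    ),
    array(
        'lang'    => 'en',
        'type'    => 'simple',
        'id'      => 'dtd',
        'href'    => 'http://app.example.domain/example.dtd',
        'role'    => 'http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd',
        'title'   => 'Example DTD',
        'arcrole' => 'http://www.rddl.org/purposes#validation',
    ),
);

/**
 * Hypothetical helper: returns the fragment after "#" in the arcrole
 * URI, or an empty string when the resource has no usable arcrole.
 */
function purposeName(array $resource)
{
    if (!isset($resource['arcrole'])) {
        return '';
    }
    $fragment = parse_url($resource['arcrole'], PHP_URL_FRAGMENT);
    return is_string($fragment) ? $fragment : '';
}

// Build a map of purpose name => list of resource URLs
$byPurpose = array();
foreach ($resources as $r) {
    $byPurpose[purposeName($r)][] = $r['href'];
}

print_r($byPurpose);
```

Using parse_url() instead of splitting on "#" by hand means a missing fragment yields an empty string rather than an undefined array index.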
        

PHP's foreach loop makes it easy to reformat this array for HTML display. Listing 4 illustrates the process, and Figure 1 shows the resulting output:


Listing 4. Formatting RDDL data as a table
                        
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
   "DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Resources</title>
    <style type="text/css">
    table {
      width:100%;
      border-collapse:collapse;
    }
    td {
      border: solid 1px black; 
      padding: 5px; 
    }
    </style>
  </head>
  <body>
    <h2>Resources</h2>
<?php
// include class file
include 'XML/RDDL.php';

// create RDDL parser
// parse RDDL file
$rddl = new XML_RDDL();
$rddl->parseRDDL('example.html');

// get all resources as array
// format as table
$resources = $rddl->getAllResources();
if (is_array($resources) && count($resources) > 0) {
?>
    <table> 
      <tr>
        <td>Resource</td>   
        <td>Description</td>
        <td>Purpose</td>  
        <td>Role</td>
      </tr>
<?php      
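  foreach ($resources as $r) {
    // the purpose name is the fragment after "#" in the arcrole URI
    $purpose = explode('#', isset($r['arcrole']) ? $r['arcrole'] : '');
    $purposeName = isset($purpose[1]) ? $purpose[1] : '';
    // escape values before writing them into the page
    $href = htmlspecialchars($r['href'], ENT_QUOTES);
    $role = htmlspecialchars($r['role'], ENT_QUOTES);
?>
      <tr>
        <td><a href="<?php echo $href; ?>"><?php echo $href; ?></a></td>   
        <td><?php echo htmlspecialchars($r['title'], ENT_QUOTES); ?></td>
        <td><?php echo htmlspecialchars($purposeName, ENT_QUOTES); ?></td>  
        <td><a href="<?php echo $role; ?>"><?php echo $role; ?></a></td>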
      </tr>
<?php    
  }
?> 
    </table>
<?php 
}
?>
  </body>
</html>
        


Figure 1. The Web page created from the RDDL data

Filtering resources by nature or purpose

The getAllResources() method you saw in the previous section returns all resources found in the source file. Often, you need something more selective: for example, all the resources whose purpose is validation, or all the resources of a specific nature. The XML_RDDL package includes methods to serve these needs as well. Listing 5 illustrates some of these methods:


Listing 5. Retrieving resource subsets
                        
<html xmlns="http://www.w3.org/1999/xhtml" 
  xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rddl="http://www.rddl.org/" 
  xml:lang="en">
  <head>
    <title>An Example RDDL Document</title>
  </head>
  <body>
    <pre>
<?php
// include class file
include 'XML/RDDL.php';

// create RDDL parser
// parse RDDL file
$rddl = new XML_RDDL();
$rddl->parseRDDL('example.html');

// get resources by nature
// get all DTDs
echo "Resources by nature:\n";
foreach ($rddl->getResourcesByNature(
  'http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd') as $r) {  
  echo $r['href'] . " \n";
}
echo "\n";

// get resources by purpose
// get all software packages
echo "Resources by purpose:\n";
foreach ($rddl->getResourcesByPurpose(
  'http://www.rddl.org/purposes#software-package') as $r) {  
  echo $r['href'] . " \n";
}
echo "\n";

// get a specific resource using its id
$dtd = $rddl->getResourceById('dtd');
if (is_array($dtd)) {
  echo "Resource with id 'dtd':\n";
  echo $dtd['href'];
}
?>  
    </pre>
  </body>
</html>
            

Listing 5 illustrates three important methods: getResourcesByNature(), which accepts a particular nature URI and returns all resources with that nature; getResourcesByPurpose(), which returns all resources matching a particular purpose; and getResourceById(), which accepts an ID and returns the resource matching that identifier. These methods are useful when you need to retrieve resources matching specific criteria.
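
To see what these lookups amount to, here is a minimal sketch of the same filtering done by hand on an array shaped like the getAllResources() result. It is not the package's implementation: filterByKey() and findById() are hypothetical helpers, and the sample data is a trimmed copy of three entries from Listing 1.

```php
<?php
// Trimmed sample in the shape returned by getAllResources().
$resources = array(
    array(
        'id'      => 'dtd',
        'href'    => 'http://app.example.domain/example.dtd',
        'role'    => 'http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd',
        'arcrole' => 'http://www.rddl.org/purposes#validation',
    ),
    array(
        'href'    => 'http://app.example.com/video.mpg',
        'role'    => 'http://www.isi.edu/in-notes/iana/assignments/media-types/video/mpeg',
        'arcrole' => 'http://www.rddl.org/purposes#software-package',
    ),
    array(
        'href'    => 'http://app.example.com/video2.mpg',
        'role'    => 'http://www.isi.edu/in-notes/iana/assignments/media-types/video/mpeg',
        'arcrole' => 'http://www.rddl.org/purposes#software-package',
    ),
);

/**
 * Hypothetical helper: keeps the resources whose $key equals $value.
 * Use 'role' to filter by nature and 'arcrole' to filter by purpose.
 */
function filterByKey(array $resources, $key, $value)
{
    $matches = array();
    foreach ($resources as $r) {
        if (isset($r[$key]) && $r[$key] === $value) {
            $matches[] = $r;
        }
    }
    return $matches;
}

/**
 * Hypothetical helper: returns the first resource with the given id,
 * or null (a choice made for this sketch) when there is none.
 */
function findById(array $resources, $id)
{
    $matches = filterByKey($resources, 'id', $id);
    return count($matches) > 0 ? $matches[0] : null;
}

$packages = filterByKey($resources, 'arcrole',
    'http://www.rddl.org/purposes#software-package');
foreach ($packages as $r) {
    echo $r['href'] . "\n";
}

$dtd = findById($resources, 'dtd');
if (is_array($dtd)) {
    echo $dtd['href'] . "\n";
}
```

Note that the comparison is an exact string match, which is why a nature or purpose URI must be passed as one unbroken string.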

Figure 2 shows the output of Listing 5:


Figure 2. Resource subsets returned by Listing 5

As these examples illustrate, the XML_RDDL package provides a useful PHP-based tool to quickly access specific fragments of information about resources in an XHTML+RDDL document. Try it out the next time you have such a document to process, and see what you think!


Resources

Learn

Get products and technologies

  • The XML_RDDL package: Download an easy-to-use interface to extract RDDL resources from XML documents.

  • The XML_Parser package: Download an XML parser based on PHP's built-in xml extension. This XML parsing class is based on PHP's bundled expat and supports two basic modes of operation: func and event.

  • IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
