XML for PHP developers, Part 3: Advanced techniques to read, manipulate, and write XML

Add XSLT to DOM and SimpleXML APIs

This final article in a three-part series discusses more techniques for reading, manipulating, and writing XML in PHP5. In it, you will focus on the now-familiar DOM and SimpleXML APIs in more sophisticated surroundings and, for the first time in this series, on the XSL extension.

Cliff Morgan (cliffmorgan@webproducer.us), Writer, Freelance

Cliff Morgan is an independent consultant who designs and implements Web applications and Web sites.



13 March 2007


Introduction

PHP5 gives the developer much more muscle for working with XML. New and modified extensions such as DOM, SimpleXML, and XSL make working with XML less code-intensive. In PHP5, the DOM is compliant with the W3C standard. Most importantly, these extensions interoperate closely, so features such as swapping a document between formats and W3C XPath support are available across the board. In this article you will look at input and output options, using the Yahoo Web Services REST interface as a more sophisticated showcase for the now-familiar DOM and SimpleXML extensions, and finish with the XSL extension.


Previously in this series

The first article of this series provided essential information on XML. It focused on quick-start Application Programming Interfaces (APIs) and demonstrated how SimpleXML, combined with the Document Object Model (DOM) as necessary, is the ideal choice if you work with straightforward, predictable, and relatively basic XML documents. Part 2 looked at the breadth of parsing APIs available for XML in PHP5, including SimpleXML, the DOM, Simple API for XML (SAX), and XMLReader, and considered which parsing techniques were most appropriate for different sizes and complexities of XML documents.

XML in PHP5

Extensible Markup Language (XML), described as both a markup language and a text-based data storage format, offers a text-based way to describe information and give it a tree structure. Here you'll look at XML in the context of Web services, which are probably one of the most important factors driving the recent growth of XML outside the enterprise world.

PHP5 includes new and completely rewritten extensions for manipulating XML, all built on the same libxml2 library. This common base lets the extensions interoperate, which extends the functionality of each. The tree-based extensions include SimpleXML, the DOM, and the XSLT processor. If you are familiar with the DOM from other languages, you will find it easier than before to write similar code in PHP. The stream-based parsers include the Simple API for XML (SAX) and XMLReader. SAX works the same way it did in PHP4.
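To make the stream-based model concrete, here is a minimal sketch using XMLReader on a small inline document (the sample XML is made up for illustration). The reader moves a cursor forward one node at a time instead of building the whole tree in memory, which is what the tree-based extensions do.

```php
<?php
// Made-up sample document standing in for a large file.
$xml = <<<XML
<ResultSet totalResultsReturned="3">
  <Result><Title>First</Title></Result>
  <Result><Title>Second</Title></Result>
  <Result><Title>Third</Title></Result>
</ResultSet>
XML;

$reader = new XMLReader();
// XML() reads from a string; open() would read from a file or URL instead.
$reader->XML($xml);

$titles = array();
// read() advances the cursor one node at a time.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Title') {
        // readString() returns the text content of the current element.
        $titles[] = $reader->readString();
    }
}
$reader->close();

echo implode(', ', $titles), "\n"; // First, Second, Third
```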


Manipulating XML using the DOM

You can use the DOM to manipulate an XML file. The DOM is efficient only when the XML file is relatively small, because the whole document is loaded into memory. Its advantages are the solid, familiar W3C DOM standard, its methods, and the flexibility it brings to coding. Its disadvantages are that it takes more code to use and that it performs poorly with large documents.


The DOM in action

With the DOM, you can build, modify, query, validate, and transform XML documents. You can use all DOM methods and properties, and most DOM Level 2 methods are implemented, with their properties properly supported. Thanks to its flexibility, the DOM can handle documents as complex as they come. Remember, however, that this flexibility comes at a price if you load a large XML document into memory all at once.
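The following minimal sketch walks through the build, modify, and query steps using standard DOM calls (createElement(), setAttribute(), getElementsByTagName(), and DOMXPath). The recipes document is a made-up example, not part of the original article.

```php
<?php
// Build: create a small document from scratch.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;

$root = $dom->createElement('recipes');
$dom->appendChild($root);

foreach (array('Pancakes' => 20, 'Omelette' => 10) as $name => $minutes) {
    $recipe = $dom->createElement('recipe');
    $recipe->setAttribute('minutes', (string) $minutes);
    // createTextNode() escapes special characters for you.
    $recipe->appendChild($dom->createTextNode($name));
    $root->appendChild($recipe);
}

// Modify: change an attribute on the first recipe element.
$first = $dom->getElementsByTagName('recipe')->item(0);
$first->setAttribute('minutes', '25');

// Query: DOMXPath evaluates XPath 1.0 expressions against the document.
$xpath = new DOMXPath($dom);
$quick = $xpath->query('/recipes/recipe[@minutes < 15]');
foreach ($quick as $node) {
    echo $node->textContent, "\n"; // Omelette
}

echo $dom->saveXML();
```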

The examples in this article use Yahoo's search API, PHP5, and REpresentational State Transfer (REST) to illustrate the use of the DOM in an interesting application environment. Yahoo chose REST because of a common belief among developers that REST offers 80% of SOAP's benefits at 20% of the cost. I chose this application to showcase PHP and XML because, as mentioned earlier, the popularity of Web services is a major force behind the recent growth of XML outside the enterprise world.

Typically, a REST request is formed by starting with a service entry URL and appending search parameters as a query string. Listing 1 sends such a request and then parses the results using the DOM extension.
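As an illustration of that request format, this sketch assembles the URL with PHP's http_build_query() rather than encoding the query string by hand. It reuses the endpoint and appid from Listing 1 but does not send the request. Listing 1's hand-encoded query also contains escaped backslashes (%5C); this sketch omits them and simply quotes the phrase.

```php
<?php
// Service entry URL taken from Listing 1.
$endpoint = 'http://api.search.yahoo.com/WebSearchService/V1/webSearch';

// Search parameters; http_build_query() percent-encodes each value.
$params = array(
    'query' => '"XML Query"',
    'appid' => 'YahooDemo',
);

// PHP_QUERY_RFC3986 encodes spaces as %20 (the default encodes them as +).
$query = $endpoint . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $query, "\n";
// http://api.search.yahoo.com/WebSearchService/V1/webSearch?query=%22XML%20Query%22&appid=YahooDemo
```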

Listing 1. The Yahoo Demo code sample using the DOM
<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

//Fetch the XML returned by the query and load it into a DOMDocument.
//(loadXML() is an instance method, so call it on the object.)
$xml = file_get_contents($query);
$dom = new DOMDocument();
$dom->loadXML($xml);

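function xml_to_result($dom) {

  //This function takes the XML document and maps it to a
  //PHP array so that you can manipulate it later.
  $res = array();

  //First, retrieve the root element for the document.
  //(documentElement skips any comments or processing
  //instructions that might precede the root.)
  $root = $dom->documentElement;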

  //Next, loop through each of its attributes
  foreach($root->attributes as $attr) {
    $res[$attr->name] = $attr->value;
  }

  //Now, loop through each of the children of the root element
  //and treat each appropriately.

  //Start with the first child node.  (The counter, i, is for
  //tracking results.)
  $node = $root->firstChild;
  $i = 0;

  //Now keep looping through as long as there is a node to work
  //with.  (At the bottom of the loop, the code moves to the next
  //sibling, so when it runs out of siblings, the routine stops.)
  while($node) {

    //For each node, check to see whether it's a Result element or
    //one of the informational elements at the start of the document.
    switch($node->nodeName) {

      //Result elements need more analysis.
      case 'Result':
        //Add each child node of the Result to the result object,
        //again starting with the first child.
        $subnode = $node->firstChild;
        while($subnode) {

          //Some of these nodes are just whitespace, which does
          //not have children.
          if ($subnode->hasChildNodes()){

            //If it does have children, get a NodeList of them, and
            //loop through it.
            $subnodes = $subnode->childNodes;
            foreach($subnodes as $n) {

              //Again check for children, adding them directly or
              //indirectly as appropriate.  Whitespace-only text nodes
              //(indentation between nested elements) are skipped.
              if($n->hasChildNodes()) {
                foreach($n->childNodes as $cn){
                  $res[$i][$subnode->nodeName][$n->nodeName]=
                                              trim($cn->nodeValue);
                }
              } elseif (trim($n->nodeValue) !== '') {
                $res[$i][$subnode->nodeName]=trim($n->nodeValue);
              }
            }
          }
            }
          }
          //Move on to the next subnode.
          $subnode = $subnode->nextSibling;
        }
        $i++;
        break;
      //Other elements are just added to the result object.
      default:
        $res[$node->nodeName] = trim($node->nodeValue);
        break;
    }

    //Move on to the next Result or informational element
    $node = $node->nextSibling;
  }
  return $res;
}

//Convert the DOM document into a PHP array you can work with.
$res = xml_to_result($dom);

//Use one of those "informational" elements to display the total
//number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
            " total results. The first 10 are as follows:</p>";

//Now loop through each of the actual results, escaping the text
//before writing it into the HTML page.
for($i=0; $i<$res['totalResultsReturned']; $i++) {

    echo "<a href='".htmlspecialchars($res[$i]['ClickUrl'], ENT_QUOTES)."'><b>".
                            htmlspecialchars($res[$i]['Title'])."</b></a>:  ";
    echo htmlspecialchars($res[$i]['Summary']);

    echo "<br /><br />";
}

?>
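If you only need a few known fields, getElementsByTagName() can replace the hand-written tree walk in Listing 1, because it searches the whole subtree and never returns whitespace text nodes. This sketch runs against a small, made-up response that only mimics the element names Listing 1 reads (ResultSet, Result, Title, Summary, ClickUrl); a real response may contain more fields, and the example.com URLs are placeholders.

```php
<?php
// Made-up response trimmed to the elements used in Listing 1.
$xml = <<<XML
<ResultSet totalResultsAvailable="2" totalResultsReturned="2">
  <Result>
    <Title>XQuery 1.0</Title>
    <Summary>An XML query language.</Summary>
    <ClickUrl>http://example.com/xquery</ClickUrl>
  </Result>
  <Result>
    <Title>XPath 2.0</Title>
    <Summary>Expressions over XML trees.</Summary>
    <ClickUrl>http://example.com/xpath</ClickUrl>
  </Result>
</ResultSet>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$results = array();
// getElementsByTagName() searches every descendant, so there is no
// manual sibling walking and no whitespace check.
foreach ($dom->getElementsByTagName('Result') as $result) {
    $row = array();
    foreach (array('Title', 'Summary', 'ClickUrl') as $field) {
        $row[$field] = trim($result->getElementsByTagName($field)->item(0)->textContent);
    }
    $results[] = $row;
}

echo $dom->documentElement->getAttribute('totalResultsAvailable'), " results\n";
foreach ($results as $row) {
    echo $row['Title'], ': ', $row['Summary'], "\n";
}
```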

Manipulating XML using SimpleXML

The SimpleXML extension is a tool of choice for manipulating an XML document, provided that the document isn't too complicated or too deep and contains no mixed content. As its name implies, SimpleXML is easier to code than the DOM, and it is far more intuitive if you work with a known document structure. Because both extensions share the libxml2 architecture, you can also swap a document from DOM to SimpleXML and back at will, which greatly increases the flexibility of both.
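A minimal sketch of that round trip uses dom_import_simplexml() and simplexml_import_dom(). Both functions wrap the same underlying libxml2 tree, so a change made through one API is visible through the other without re-parsing. The document is a made-up fragment.

```php
<?php
$xml = '<ResultSet><Result><Title>XQuery 1.0</Title></Result></ResultSet>';

// Start in SimpleXML for easy reading.
$sxe = simplexml_load_string($xml);

// dom_import_simplexml() returns a DOMElement for the same underlying node.
$element = dom_import_simplexml($sxe);
$dom = $element->ownerDocument;

// Use a DOM feature: insert a new element before the first Result.
$note = $dom->createElement('Note', 'Imported from SimpleXML');
$element->insertBefore($note, $element->firstChild);

// simplexml_import_dom() goes back the other way; nothing is re-parsed,
// so the change made through the DOM is visible here.
$back = simplexml_import_dom($dom);
echo $back->Note, "\n";          // Imported from SimpleXML
echo $back->Result->Title, "\n"; // XQuery 1.0
```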

SimpleXML in action

Code that manipulates documents with SimpleXML is simple and quick to write. Listing 2 parses the results of the same query using the SimpleXML extension. As you might expect, it is more compact than the DOM code in Listing 1.

Listing 2. The Yahoo SimpleXML example
<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

$xml = simplexml_load_file($query);

// Load up the root element attributes, casting each one to a string
$res = array();
foreach($xml->attributes() as $name=>$attr) {
   $res[$name]=(string)$attr;
}

//Use one of those "informational" elements to display the total
//number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
            " total results. The first 10 are as follows:</p>";

//Unlike with DOM, where we loaded the entire document into the
//result object, with SimpleXML, we get back an object in the
//first place, so we can just use the number of results returned
//to loop through the Result members.

for($i=0; $i<$res['totalResultsReturned']; $i++) {

    //The object represents each piece of data as a member variable
    //rather than an array element, so the syntax is a little bit
    //different from the DOM version.

    $thisResult = $xml->Result[$i];

    echo "<a href='".htmlspecialchars((string)$thisResult->ClickUrl, ENT_QUOTES)."'><b>".
                       htmlspecialchars((string)$thisResult->Title)."</b></a>:  ";
    echo htmlspecialchars((string)$thisResult->Summary);

    echo "<br /><br />";
}

?>
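Because SimpleXML exposes repeated elements as an iterable, you can also loop over $xml->Result directly instead of counting up to totalResultsReturned. This is a sketch against a made-up response (the example.com URLs are placeholders); it casts values to strings and escapes them with htmlspecialchars() before building HTML.

```php
<?php
// Made-up response trimmed to the elements Listing 2 reads.
$xml = simplexml_load_string(<<<XML
<ResultSet totalResultsAvailable="2" totalResultsReturned="2">
  <Result><Title>XQuery 1.0</Title><ClickUrl>http://example.com/xquery</ClickUrl></Result>
  <Result><Title>XPath &amp; XSLT</Title><ClickUrl>http://example.com/xpath</ClickUrl></Result>
</ResultSet>
XML
);

$links = array();
// Iterating $xml->Result visits every Result child, so no counter is needed.
foreach ($xml->Result as $result) {
    // Cast to string to get plain text, then escape it for HTML output.
    $url   = htmlspecialchars((string) $result->ClickUrl, ENT_QUOTES);
    $title = htmlspecialchars((string) $result->Title, ENT_QUOTES);
    $links[] = "<a href='$url'><b>$title</b></a>";
}

// Attributes are read with array syntax.
echo "Total: ", $xml['totalResultsAvailable'], "\n";
echo implode("<br />\n", $links), "\n";
```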

Listing 3 adds a cache layer to the SimpleXML example from Listing 2. The cache stores the results of any particular query for two hours.

Listing 3. The Yahoo SimpleXML example with a cache layer
<?php

//This query does a search for any Web pages relevant to "XML Query"
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

//The cached material should only last for 2 hours (7200 seconds),
//so you need the current time.
$currentTime = time();

//This example keeps the cache in the system temp directory; you can
//store yours in a more convenient location.
$cacheDir = sys_get_temp_dir();
$cache = $cacheDir.DIRECTORY_SEPARATOR.'yws_'.md5($query);

//First check for an existing version of the file, and then check
//to see whether or not it's expired.
if(file_exists($cache) &&
          filemtime($cache) > ($currentTime-7200)) {

   //If there's a valid cache file, load its data.
   $data = file_get_contents($cache);
} else {

   //If there's no valid cache file, grab a live version of the
   //data and save it to a temporary file in the same directory.
   //Once the file is complete, move it into place as the cache file.
   //(This keeps other requests from reading a half-written file.)
   $data = file_get_contents($query);
   $tempName = tempnam($cacheDir,'YWS');
   file_put_contents($tempName, $data);
   rename($tempName, $cache);
}

//Wherever the data came from, load it into a SimpleXML object.
$xml = simplexml_load_string($data);

//From here, the rest of the file is the same.

// Load up the root element attributes, casting each one to a string
$res = array();
foreach($xml->attributes() as $name=>$attr) {
   $res[$name]=(string)$attr;
}

...
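The caching logic in Listing 3 can be wrapped in a reusable function. In this sketch, cached_get_contents() is a hypothetical helper name, not a PHP built-in. It uses the system temp directory, skips caching when the fetch fails, and is exercised here with a local file standing in for the remote service.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the body of $url,
// reusing a cached copy that is younger than $ttl seconds.
function cached_get_contents($url, $ttl = 7200, $dir = null)
{
    $dir = ($dir === null) ? sys_get_temp_dir() : $dir;
    $cache = $dir . DIRECTORY_SEPARATOR . 'yws_' . md5($url);

    if (is_file($cache) && filemtime($cache) > time() - $ttl) {
        return file_get_contents($cache);
    }

    $data = file_get_contents($url);
    if ($data === false) {
        return false; // Do not cache a failed request.
    }

    // Write to a temp file in the same directory, then rename it into
    // place so other requests never read a half-written cache file.
    $tempName = tempnam($dir, 'YWS');
    file_put_contents($tempName, $data);
    rename($tempName, $cache);

    return $data;
}

// Usage with a local file standing in for the remote service.
$source = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($source, '<ResultSet totalResultsReturned="0"/>');

$xml = simplexml_load_string(cached_get_contents($source));
echo $xml['totalResultsReturned'], "\n"; // 0
```

Because the cache key is an MD5 hash of the URL, each distinct query gets its own cache file, just as in Listing 3.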

Manipulating XML using XSL

Extensible Stylesheet Language (XSL) is a functional XML language created for the task of manipulating XML documents. Using XSL, you can transform an XML document into a redefined XML document, an XHTML document, an HTML document, or a text document, based on a stylesheet that, much like CSS, works by applying rules. PHP5's implementation of this W3C standard interoperates with the DOM and XPath.

XSL Transformations (XSLT) stylesheets are themselves XML documents. XSLT transforms an XML source tree into an XML or XML-like result tree by applying the series of rules specified in the stylesheet to the XML data. XSLT can add elements or attributes to the output or remove them from it. It lets the developer sort or rearrange elements and decide which elements to hide or display. Different stylesheets let your XML be displayed appropriately for different media, such as screen versus print. XSLT uses XPath to navigate through the original XML document.

The XSLT transformation model usually involves a source XML file, an XSLT file containing one or more processing templates, and an XSLT processor. In PHP5, the XSL extension provides that processor. It is built on libxslt (which in turn uses libxml2), and libxslt is the only XSLT processor PHP5 supports. The stylesheet and the source document are passed to the processor as DOM documents (the extension also accepts SimpleXML objects).
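The following sketch shows a few of those capabilities in one stylesheet: an XPath predicate hides some elements, xsl:sort rearranges the rest, and xsl:output method="text" produces plain text instead of XML. Both documents are made-up examples loaded inline, and the code assumes the XSL extension (ext/xsl) is enabled.

```php
<?php
// Made-up source document.
$xml = new DOMDocument();
$xml->loadXML(<<<'XML'
<recipes>
  <recipe minutes="20">Pancakes</recipe>
  <recipe minutes="45">Lasagne</recipe>
  <recipe minutes="10">Omelette</recipe>
</recipes>
XML
);

// Made-up stylesheet:
//  - the predicate [@minutes &lt; 30] hides recipes taking 30 minutes or more
//  - xsl:sort rearranges the remaining recipes numerically by minutes
//  - method="text" emits plain text rather than XML
$xsl = new DOMDocument();
$xsl->loadXML(<<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/recipes">
    <xsl:for-each select="recipe[@minutes &lt; 30]">
      <xsl:sort select="@minutes" data-type="number"/>
      <xsl:value-of select="concat(., ' (', @minutes, ' min)&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL
);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

$text = $proc->transformToXml($xml);
echo $text; // Omelette (10 min), then Pancakes (20 min), one per line
```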

XSL in action

An interesting application of XSL is to create XML files on the fly containing whatever data has just been selected from a database. Using this technique, you can build complete Web applications in which the PHP scripts assemble XML documents from database query results and then use XSL transformations to generate the actual HTML pages.

This method completely splits the presentation layer from the business layer so that you can maintain either of these layers independently of the other.
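A minimal sketch of that split, assuming the array $rows stands in for a database query result (no database is involved, and the recipe data is hypothetical): the DOM builds the data document, and a separate stylesheet owns all of the HTML markup.

```php
<?php
// Hypothetical rows standing in for a database query result.
$rows = array(
    array('id' => 1, 'name' => 'Pancakes'),
    array('id' => 2, 'name' => 'Fish & Chips'),
);

// Business layer: turn the rows into an XML document with the DOM.
$data = new DOMDocument();
$list = $data->appendChild($data->createElement('recipes'));
foreach ($rows as $row) {
    $item = $list->appendChild($data->createElement('recipe'));
    $item->setAttribute('id', (string) $row['id']);
    // createTextNode() takes care of escaping the "&".
    $item->appendChild($data->createTextNode($row['name']));
}

// Presentation layer: the stylesheet decides how the data looks.
$xsl = new DOMDocument();
$xsl->loadXML(<<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/recipes">
    <ul>
      <xsl:for-each select="recipe">
        <li id="recipe-{@id}"><xsl:value-of select="."/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSL
);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$html = $proc->transformToXml($data);
echo $html;
```

Changing the page layout now means editing only the stylesheet; the code that builds the data document stays untouched.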

Listing 4 illustrates the relationship between the XML input file, the XSL stylesheet, the XSLT processor, and multiple possible outputs.

Listing 4. XML transformation
<?php

// Create new XSLTProcessor
$xslt = new XSLTProcessor();

//In this listing both the source document and the stylesheet
//are DOMDocuments (SimpleXMLElement objects are also accepted),
//and the result can be a DOMDocument, a file, or even a string.

// Load the XSLT stylesheet
$xsl = new DOMDocument();
$xsl->load('recipe.xsl');

// Load the stylesheet into the processor
$xslt->importStylesheet($xsl);

// Load XML input file
$xml = new DOMDocument();
$xml->load('recipe.xml');

//Now choose an output method and transform to it:

// Transform to a string
$results = $xslt->transformToXml($xml);
echo "String version:";
echo htmlentities($results);

// Transform to DOM object
$results = $xslt->transformToDoc($xml);
echo "The root of the DOM Document is ";
echo $results->documentElement->nodeName;

// Transform to a file; returns the number of bytes written, or false
$results = $xslt->transformToUri($xml, 'results.txt');

?>
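Listing 4 depends on recipe.xml and recipe.xsl, which the article does not show. To try the three output methods without those files, this sketch loads made-up stand-ins inline and writes the file output to a temporary path. It assumes ext/xsl is enabled.

```php
<?php
// Made-up stand-ins for recipe.xml and recipe.xsl from Listing 4.
$xml = new DOMDocument();
$xml->loadXML('<recipe><title>Pancakes</title><step>Mix</step><step>Fry</step></recipe>');

$xsl = new DOMDocument();
$xsl->loadXML(<<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/recipe">
    <card title="{title}">
      <xsl:copy-of select="step"/>
    </card>
  </xsl:template>
</xsl:stylesheet>
XSL
);

$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);

// 1. String: returns the serialized result, or false on error.
$string = $xslt->transformToXml($xml);

// 2. DOM: returns a new DOMDocument you can keep manipulating.
$doc = $xslt->transformToDoc($xml);
echo $doc->documentElement->nodeName, "\n"; // card

// 3. File: writes the result and returns the number of bytes written, or false.
$file = tempnam(sys_get_temp_dir(), 'xslt');
$bytes = $xslt->transformToUri($xml, $file);

echo htmlspecialchars($string), "\n";
echo $bytes, " bytes written to ", $file, "\n";
```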

Summary

The earlier parts of this series focused on the use of the Document Object Model and SimpleXML to perform both simple and complex parsing tasks. Part 2 also looked at XMLReader, which provides a faster, easier way to perform tasks you would previously have done with SAX.

In this article, you saw how to access remote files such as REST-based Web services, how to cache their responses, and how to use XSLT to output XML data to a string, a DOM Document object, or a file.
