XML for PHP developers, Part 3
Advanced techniques to read, manipulate, and write XML
Add XSLT to DOM and SimpleXML APIs
PHP5 offers the developer a lot more muscle to work with XML. New and modified extensions such as the DOM, SimpleXML, and XSL make working with XML less code intensive. In PHP5, the DOM is compliant with the W3C standard. Most importantly, these extensions interoperate, which provides additional functionality across the board, such as swapping formats to extend usability and support for W3C's XPath. Here you will look at input and output options, and you will rely on the Yahoo Web Services REST interface to provide a more sophisticated showcase for the now-familiar DOM and SimpleXML extensions, concluding with the XSL extension.
Previously in this series
The first article of this series provided essential information on XML. It focused on quick-start Application Programming Interfaces (APIs) and demonstrated how SimpleXML, combined with the Document Object Model (DOM) where necessary, is the ideal choice if you work with straightforward, predictable, and relatively basic XML documents. Part 2 looked at the breadth of parsing APIs available for XML in PHP5, including SimpleXML, the DOM, Simple API for XML (SAX), and XMLReader, and considered which parsing techniques were most appropriate for different sizes and complexities of XML documents.
XML in PHP5
Extensible Markup Language (XML), described as both a markup language and a text-based data storage format, offers a text-based means to apply and describe a tree-based structure to information. Here you'll look at XML in the context of Web services, probably one of the most important factors driving the recent growth of XML outside the enterprise world.
In PHP5, there are totally new and entirely rewritten extensions for manipulating XML, all based on the same libxml2 code. This common base provides interoperability between these extensions that extends the functionality of each. The tree-based parsers include SimpleXML, the DOM, and the XSLT processor. If you are familiar with the DOM from other languages, you will have an easier time coding with similar functionality in PHP than before. The stream-based parsers include the Simple API for XML (SAX) and XMLReader. SAX functions the same way it did in PHP4.
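To make the tree-based versus stream-based distinction concrete, here is a minimal sketch of the stream approach using XMLReader. The sample document and its element names are made up for illustration; the point is only that the reader visits one node at a time instead of building the whole tree in memory.

```php
<?php
// Hypothetical sample document; element names are illustrative only.
$xml = '<ResultSet>'
     . '<Result><Title>One</Title></Result>'
     . '<Result><Title>Two</Title></Result>'
     . '</ResultSet>';

$reader = new XMLReader();
$reader->XML($xml);

$titles = array();

// read() advances the cursor one node at a time.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Title') {
        // readString() returns the text content of the current element.
        $titles[] = $reader->readString();
    }
}
$reader->close();

print_r($titles);
```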
Manipulating XML using the DOM
You can use the DOM to manipulate an XML file. Using the DOM is efficient only when the XML file is relatively small. The advantages of this method are the solid standard of the familiar W3C DOM, its methods, and the flexibility it brings to coding. The disadvantages of the DOM are the difficulty in coding and performance issues with large documents.
The DOM in action
With the DOM, you can build, modify, query, validate, and transform XML documents. All DOM methods and properties can be used, and most DOM level 2 methods are implemented with properties properly supported. Documents parsed with the DOM can be as complex as they come, thanks to its tremendous flexibility. Remember, however, that flexibility comes at a price if you load a large XML document into memory all at once.
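As a small illustration of building, modifying, and querying with the DOM, the following sketch creates a document from scratch, changes one node, and then reads values back with XPath. The element and attribute names are assumptions chosen for the example, not a required schema.

```php
<?php
// Build: create a small document (names are illustrative only).
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;

$root = $dom->createElement('ResultSet');
$root->setAttribute('totalResultsReturned', '2');
$dom->appendChild($root);

foreach (array('First title', 'Second title') as $title) {
    $result = $dom->createElement('Result');
    $result->appendChild($dom->createElement('Title', $title));
    $root->appendChild($result);
}

// Modify: replace the text of the first Title element.
$firstTitle = $dom->getElementsByTagName('Title')->item(0);
$firstTitle->nodeValue = 'Updated title';

// Query: collect every Title with an XPath expression.
$xpath = new DOMXPath($dom);
$titles = array();
foreach ($xpath->query('/ResultSet/Result/Title') as $node) {
    $titles[] = $node->nodeValue;
}

// Write the document out as a string.
echo $dom->saveXML();
```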
The examples in this article use Yahoo's search API, PHP5, and REpresentational State Transfer (REST) to illustrate the use of the DOM in an interesting application environment. Yahoo chose REST because of a common belief among developers that REST offers 80% of SOAP's benefits at 20% of the cost. I chose this application to showcase PHP/XML because the popularity of Web services is probably one of the most important factors driving the recent growth of XML outside the enterprise world.
Typically, REST forms a request by beginning with a service entry URL and then appending search parameters in the form of a query string. Then Listing 1 parses the results of the query using the DOM extension.
Listing 1. The Yahoo Demo code sample using the DOM
<?php
// This query does a search for any Web pages relevant to "XML Query".
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

// Create the DOM Document object from the XML returned by the query.
$xml = file_get_contents($query);
$dom = new DOMDocument();
$dom->loadXML($xml);

function xml_to_result($dom) {
    // This function takes the XML document and maps it to a
    // PHP array so that you can manipulate it later.
    $res = array();

    // First, retrieve the root element for the document.
    $root = $dom->documentElement;

    // Next, loop through each of its attributes.
    foreach ($root->attributes as $attr) {
        $res[$attr->name] = $attr->value;
    }

    // Now, loop through each of the children of the root element
    // and treat each appropriately.
    // Start with the first child node. (The counter, $i, is for
    // tracking results.)
    $node = $root->firstChild;
    $i = 0;

    // Now keep looping through as long as there is a node to work
    // with. (At the bottom of the loop, the code moves to the next
    // sibling, so when it runs out of siblings, the routine stops.)
    while ($node) {
        // For each node, check to see whether it's a Result element or
        // one of the informational elements at the start of the document.
        switch ($node->nodeName) {

            // Result elements need more analysis.
            case 'Result':
                // Add each child node of the Result to the result array,
                // again starting with the first child.
                $subnode = $node->firstChild;
                while ($subnode) {
                    // Some of these nodes are just whitespace, which does
                    // not have children.
                    if ($subnode->hasChildNodes()) {
                        // If it does have children, get a NodeList of them,
                        // and loop through it.
                        $subnodes = $subnode->childNodes;
                        foreach ($subnodes as $n) {
                            // Again check for children, adding them directly
                            // or indirectly as appropriate.
                            if ($n->hasChildNodes()) {
                                foreach ($n->childNodes as $cn) {
                                    $res[$i][$subnode->nodeName][$n->nodeName] =
                                        trim($cn->nodeValue);
                                }
                            } else {
                                $res[$i][$subnode->nodeName] = trim($n->nodeValue);
                            }
                        }
                    }
                    // Move on to the next subnode.
                    $subnode = $subnode->nextSibling;
                }
                $i++;
                break;

            // Other elements are just added to the result array.
            default:
                $res[$node->nodeName] = trim($node->nodeValue);
                break;
        }

        // Move on to the next Result or informational element.
        $node = $node->nextSibling;
    }
    return $res;
}

// First, convert the DOM document to an array you can manipulate.
$res = xml_to_result($dom);

// Use one of those "informational" elements to display the total
// number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
     " total results. The first 10 are as follows:</p>";

// Now loop through each of the actual results.
for ($i = 0; $i < $res['totalResultsReturned']; $i++) {
    echo "<a href='".$res[$i]['ClickUrl']."'><b>".
         $res[$i]['Title']."</b></a>: ";
    echo $res[$i]['Summary'];
    echo "<br /><br />";
}
?>
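The request URL in Listing 1 is written out by hand. As a sketch of the general REST pattern described earlier (a service entry URL followed by search parameters in a query string), the URL could also be assembled with PHP's http_build_query(), which handles the percent-encoding. The endpoint and the query and appid parameters come from the listing; the variable names are illustrative.

```php
<?php
// Service entry URL, as used in the article's listings.
$endpoint = 'http://api.search.yahoo.com/WebSearchService/V1/webSearch';

// Search parameters to append as a query string.
$params = array(
    'query' => '"XML Query"',
    'appid' => 'YahooDemo',
);

// PHP_QUERY_RFC3986 encodes spaces as %20 rather than "+".
$url = $endpoint . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $url, "\n";
```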
Manipulating XML using SimpleXML
The SimpleXML extension is a tool of choice for manipulating an XML document, provided that the XML document isn't too complicated or too deep, and contains no mixed content. SimpleXML is easier to code than the DOM, as its name implies. It is far more intuitive if you work with a known document structure. The interoperative nature of the libxml2 architecture greatly increases the flexibility of both the DOM and SimpleXML, because import functions let you swap formats from DOM to SimpleXML and back at will.
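A minimal sketch of that format swapping, using PHP's simplexml_import_dom() and dom_import_simplexml() functions on a made-up sample document (the element names are assumptions for illustration):

```php
<?php
// Hypothetical sample document.
$xmlString = '<ResultSet totalResultsReturned="1">'
           . '<Result><Title>XML Query</Title></Result>'
           . '</ResultSet>';

// Start in the DOM...
$dom = new DOMDocument();
$dom->loadXML($xmlString);

// ...and view the same document through SimpleXML.
$sxe = simplexml_import_dom($dom);
$title = (string) $sxe->Result[0]->Title;

// Change a value using the simpler SimpleXML syntax.
$sxe->Result[0]->Title = 'XQuery';

// Go back the other way: get a DOMElement from a SimpleXMLElement
// and read the changed value with DOM methods.
$element = dom_import_simplexml($sxe->Result[0]);
$titleFromDom = $element->getElementsByTagName('Title')->item(0)->nodeValue;

echo $title, " -> ", $titleFromDom, "\n";
```

Switching representations this way lets you use SimpleXML for straightforward reads and writes, and drop down to the DOM only for the operations SimpleXML does not expose.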
SimpleXML in action
Documents manipulated with SimpleXML are simple and quick to code. Listing 2 parses the results of the query using the SimpleXML extension. As you might expect, the SimpleXML code is more compact than the DOM code shown in Listing 1.
Listing 2. The Yahoo SimpleXML example
<?php
// This query does a search for any Web pages relevant to "XML Query".
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

$xml = simplexml_load_file($query);

// Load up the root element attributes, cast to plain strings.
$res = array();
foreach ($xml->attributes() as $name => $attr) {
    $res[$name] = (string) $attr;
}

// Use one of those "informational" elements to display the total
// number of results for the query.
echo "<p>The query returns ".$res["totalResultsAvailable"].
     " total results. The first 10 are as follows:</p>";

// Unlike with DOM, where we loaded the entire document into the
// result array, with SimpleXML, we get back an object in the
// first place, so we can just use the number of results returned
// to loop through the Result members.
for ($i = 0; $i < $res['totalResultsReturned']; $i++) {
    // The object represents each piece of data as a member variable
    // rather than an array element, so the syntax is a little bit
    // different from the DOM version.
    $thisResult = $xml->Result[$i];

    echo "<a href='".$thisResult->ClickUrl."'><b>".
         $thisResult->Title."</b></a>: ";
    echo $thisResult->Summary;
    echo "<br /><br />";
}
?>
Listing 3 adds a cache layer to the SimpleXML example from Listing 2. The cache stores the results of any particular query for two hours.
Listing 3. The Yahoo SimpleXML example with a cache layer
<?php
// This query does a search for any Web pages relevant to "XML Query".
$query = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
         "query=%5C%22XML%20Query%5C%22&appid=YahooDemo";

// The cached material should only last for 2 hours, so you need the
// current time.
$currentTime = time();

// This is where I put my tempfile; you can store yours in a more
// convenient location.
$cache = 'c:\temp\yws_'.md5($query);

// First check for an existing version of the file, and then check
// to see whether or not it's expired.
if (file_exists($cache) && filemtime($cache) > ($currentTime - 7200)) {
    // If there's a valid cache file, load its data.
    $data = file_get_contents($cache);
} else {
    // If there's no valid cache file, grab a live version of the
    // data and save it to a temporary file. Once the file is complete,
    // rename it to the permanent file. (This prevents concurrency issues.)
    $data = file_get_contents($query);
    $tempName = tempnam('c:\temp', 'YWS');
    file_put_contents($tempName, $data);
    rename($tempName, $cache);
}

// Wherever the data came from, load it into a SimpleXML object.
$xml = simplexml_load_string($data);

// From here, the rest of the file is the same.

// Load up the root element attributes
foreach ($xml->attributes() as $name => $attr) {
    $res[$name] = $attr;
}
...
Manipulating XML using XSL
EXtensible Stylesheet Language (XSL) is a functional XML language that was created for the task of manipulating XML documents. Using XSL, you can transform an XML document into a restructured XML document, an XHTML document, an HTML document, or a text document based on a stylesheet definition, much as CSS works by applying rules. PHP5's implementation of the W3C standard supports interoperability with the DOM and XPath.
EXtensible Stylesheet Language Transformations (XSLT) stylesheets are themselves XML documents, and PHP5's XSL extension is built on libxslt, which in turn is based on libxml2. XSLT transforms an XML source tree into an XML or XML-type result tree by applying the series of rules specified in the stylesheet to the XML data. XSLT can add elements or attributes to the output, or remove them from it. It allows the developer to sort or rearrange elements and to decide which elements to hide or display. Different stylesheets allow your XML to be displayed appropriately for different media, such as screen display versus print display. XSLT uses XPath to navigate through the original XML document.
The XSLT transformation model usually involves a source XML file, an XSLT file containing one or more processing templates, and an XSLT processor. XSLT documents have to be loaded using the DOM. PHP5 supports only the libxslt processor.
XSL in action
An interesting application of XSL is to create XML files on the fly that contain whatever data has just been selected from the database. Using this technique, it is possible to create complete Web applications in which the PHP scripts build XML from database query results and then use XSL transformations to generate the actual HTML documents.
This method completely splits the presentation layer from the business layer so that you can maintain either of these layers independently of the other.
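As a rough sketch of the first half of that technique, the following code turns a set of rows into an XML document with the DOM. The $rows array is a hypothetical stand-in for the result of a database query, and the element names are assumptions; a real application would fetch the rows from its own database layer.

```php
<?php
// Hypothetical rows, standing in for the result of a database query.
$rows = array(
    array('id' => 1, 'name' => 'Pancakes',     'minutes' => 20),
    array('id' => 2, 'name' => 'Fish & chips', 'minutes' => 45),
);

$dom = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('recipes'));

foreach ($rows as $row) {
    $recipe = $root->appendChild($dom->createElement('recipe'));
    $recipe->setAttribute('id', (string) $row['id']);

    foreach (array('name', 'minutes') as $field) {
        $child = $recipe->appendChild($dom->createElement($field));
        // Text nodes are escaped on output, so "&" becomes "&amp;".
        $child->appendChild($dom->createTextNode((string) $row[$field]));
    }
}

// The resulting document can be handed straight to an XSLT processor.
$xmlString = $dom->saveXML();
echo $xmlString;
```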
Listing 4 illustrates the relationship between the XML input file, the XSL stylesheet, the XSLT processor, and multiple possible outputs.
Listing 4. XML transformation
<?php
// Create new XSLTProcessor
$xslt = new XSLTProcessor();

// Both the source document and the stylesheet must be
// DOMDocuments, but the result can be a DOMDocument,
// a file, or even a String.

// Load the XSLT stylesheet
$xsl = new DOMDocument();
$xsl->load('recipe.xsl');

// Load the stylesheet into the processor
$xslt->importStylesheet($xsl);

// Load XML input file
$xml = new DOMDocument();
$xml->load('recipe.xml');

// Now choose an output method and transform to it:

// Transform to a string
$results = $xslt->transformToXML($xml);
echo "String version:";
echo htmlentities($results);

// Transform to DOM object
$results = $xslt->transformToDoc($xml);
echo "The root of the DOM Document is ";
echo $results->documentElement->nodeName;

// Transform to a file
$results = $xslt->transformToURI($xml, 'results.txt');
?>
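Listing 4 depends on recipe.xml and recipe.xsl, which are not reproduced here. For experimenting without those files, the following self-contained sketch uses a made-up source document and a made-up stylesheet, both held in strings; the element names and the sorting rule are assumptions for illustration. It shows XSLT selecting nodes with XPath, sorting them, and producing plain text. It requires PHP's XSL extension to be enabled.

```php
<?php
// Hypothetical source document.
$xmlSource = <<<XML
<recipes>
  <recipe><name>Pancakes</name></recipe>
  <recipe><name>Omelette</name></recipe>
</recipes>
XML;

// Hypothetical stylesheet: list recipe names alphabetically, one per line.
$xslSource = <<<XSL
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/recipes">
    <xsl:for-each select="recipe">
      <xsl:sort select="name"/>
      <xsl:value-of select="name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Both the source and the stylesheet are loaded as DOMDocuments.
$xml = new DOMDocument();
$xml->loadXML($xmlSource);

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);

// Transform to a string.
$output = $processor->transformToXML($xml);
echo $output;
```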
Summary
The earlier parts of this series focused on the use of the Document Object Model and on SimpleXML to perform both simple and complex parsing tasks. Part 2 also looked at the use of XMLReader, which provides a faster, easier way to perform tasks that one would previously do using SAX.
Now, in this article, you saw how to access remote files such as REST-based Web services, and how to use XSLT to easily output XML data to a string, DOM Document object, or file.
Related topics
- XML for PHP developers, Part 1: The 15-minute PHP-with-XML starter (Cliff Morgan, developerWorks, February 2007): In the first article of this three-part series, discover PHP5's XML implementation and how easy it is to work with XML in a PHP environment.
- XML for PHP developers, Part 2: Advanced XML parsing techniques (Cliff Morgan, developerWorks, March 2007): In Part 2 of this three-part series, explore XML parsing techniques in PHP5, and learn how to decide which parsing method is best for your app.
- Tip: Use Language specific tools for XML processing (Uche Ogbuji, developerWorks, January 2004): Try these alternatives to SAX and DOM when you parse XML.
- Intuition and Binary XML (Leigh Dodds, XML.com, April 2001): Read about the debate concerning binary encoded alternatives to XML.
- What kind of language is XSLT (Michael Kay, developerWorks, April 2005): Put XSLT in context as you learn where the language comes from, what it's good at, and why you should use it.
- Tip: Implement XMLReader: An interface for XML converters (Benoît Marchal, developerWorks, November 2003): Explore APIs for XML pipelines.
- Reading and writing the XML DOM in PHP (Jack Herrington, developerWorks, December 2005): Explore three methods to read XML: the DOM library, the SAX parser, and regular expressions. Also, look at how to write XML using DOM and PHP text templating.
- SimpleXML Processing with PHP (Elliotte Rusty Harold, developerWorks, October 2006): Try the SimpleXML extension and enable your PHP pages to query, search, modify, and republish XML.
- Introducing Simple XML in PHP5 (Alejandro Gervasio, Dev Shed, June 2006): In the first of a three-part article series on SimpleXML, save work with the basics of the simplexml extension in PHP 5, a library that primarily focuses on parsing simple XML files.
- PHP Cookbook, Second Edition (Adam Trachtenberg and David Sklar, O'Reilly Media, August 2006): Learn to build dynamic Web applications that work on any Web browser.
- XML.com: Visit O'Reilly's XML site for comprehensive coverage of the XML world.
- W3C XML Information: Read the XML specification from the source.
- PHP development home site: Learn more about this widely-used general-purpose scripting language that is especially suited for Web development.
- Visit PEAR: PHP Extension and Application Repository: Get more information on PEAR, a framework and distribution system for reusable PHP components.
- PECL: PHP Extension Community Library: Visit the sister site to PEAR and repository for PHP Extensions.
- Planet PHP: Visit the PHP developer community news source.
- libxml2: Get the XML C parser and toolkit of Gnome.
- IBM certification: Find out how you can become an IBM-Certified Developer.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- IBM trial software: Build your next development project with trial software available for download directly from developerWorks.