With the XSL module in PHP 5, you can apply XSLT stylesheets to XML documents to transform the XML data into some other type of text document. This document can be another XML structure, HTML, or any other structure, including plain text or even Java and other programming languages. Regardless of the source text or target document structure, you can create all of the programming specific to the problem in XSLT. The PHP code used in this article is only necessary to create an XSLT processor object and apply the transformation.
Since the mechanics of applying XSLT stylesheets to XML in PHP are generally the same, the process can be refactored out of the business-specific code into something more reusable. This article sketches out a lightweight reusable PHP framework, called Butterfly, that processes a chain of XSLT documents (to download Butterfly, see Resources). The chain starts with a source XML document (though not necessarily a file), and applies a series of XSLT stylesheets to it until it produces the final document. This functionality is a small subset of the functionality provided by the Apache Cocoon project, a Java-based framework that processes pipelines of XSLT stylesheets to produce a final document. Note: The Butterfly framework described here refers to the project hosted at http://jakemiles.com/butterfly, and is unrelated to any Java-based Web application framework of the same name.
When the final document is a Web page, one concern when processing XSLT stylesheets is performance. For small data documents and simple stylesheets this might not prove to be an issue. For large data sets with thousands of elements, however, applying a series of stylesheets on each page load not only slows down the page but also consumes a lot of memory and processing power on the server.
The solution to the performance problem is simple—cache the result of the XSLT transformation as a static HTML page that the Web server can serve instantly, and only perform the full chain of XSLT transformations when the source document or one of the stylesheets has changed. This caching mechanism is not unique to the specific XML or XSLT involved, and therefore, the framework can handle it generically.
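The "has anything changed" test described above could be sketched as a comparison of file modification times. This is only a minimal sketch under stated assumptions, not Butterfly's own code: the function name isCacheFresh() is hypothetical, and it assumes the source document and stylesheets are all files on disk.

```php
<?php
// Hypothetical helper (not part of Butterfly): returns true when the cached
// file exists and is at least as new as every file it was built from
// (the source XML document and each stylesheet in the chain).
function isCacheFresh($cachePath, array $dependencyPaths)
{
    if (!file_exists($cachePath)) {
        return false;                       // nothing has been cached yet
    }
    $cacheTime = filemtime($cachePath);
    foreach ($dependencyPaths as $path) {
        // A missing or newer dependency means the cache cannot be trusted.
        if (!file_exists($path) || filemtime($path) > $cacheTime) {
            return false;
        }
    }
    return true;
}
```

A caller would run the full transformation chain only when this function returns false, and serve the cached file otherwise.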
Another potential problem when processing XSLT stylesheets is the source of the XML, which can originate from a file or a SQL database. The framework must be flexible enough to handle multiple data sources.
When you design an object-oriented framework, you have essentially two abstraction tools available: methods and classes. Anything that must be extensible, meaning you want to allow for new kinds of behavior in the future, must be abstracted into a method call or a class. This allows you to swap out at runtime a different class exposing the same methods, taking advantage of polymorphism. If the framework gets cluttered with classes, you can add a factory class that assembles them into common combinations and provides a simpler API for common cases.
A practical approach to a framework is to write the core functionality and then apply more structure to it to make it extensible and simplify the interface. To make the code clearer as it's introduced in this article, first look at two core interfaces used throughout the code, ButterflyDocument and ButterflyXmlDocument in Listing 1.
Listing 1. The core ButterflyDocument and ButterflyXmlDocument interfaces
interface ButterflyDocument {
    function getContents();
    function writeContents();
}
interface ButterflyXmlDocument extends ButterflyDocument {
    function getDom();
}
ButterflyDocument represents a text document whose contents can be read or written. In Listing 1, getContents() returns its contents as a string, and writeContents() writes the document's contents to standard output. ButterflyXmlDocument represents an XML document, and extends ButterflyDocument with a getDom() method, which returns a DOMDocument object representing its XML contents.
With those two core interfaces out of the way, look at the core functionality of the framework, the ButterflyTransformer class. This class has the fundamental job of applying an XSLT stylesheet to an XML document (see Listing 2).
Listing 2. ButterflyTransformer
class ButterflyTransformer {
    private $processor;

    // Build an XSLT processor from a stylesheet wrapped in a ButterflyXmlDocument
    public function __construct ($xslSource) {
        $this->processor = new XSLTProcessor();
        $this->processor->importStylesheet ($xslSource->getDom());
    }

    // Apply the stylesheet and write the result to the given file path
    public function transformToFile ($xmlSource, $filepath) {
        $file = fopen ($filepath, "w");
        fwrite ($file, $this->processor->transformToXml ($xmlSource->getDom()));
        fclose($file);
    }

    // Apply the stylesheet and return the result as a string
    public function transformToString ($xmlSource) {
        return $this->processor->transformToXml ($xmlSource->getDom());
    }
}
In Listing 2, ButterflyTransformer takes a ButterflyXmlDocument object, representing the XSLT stylesheet, and uses PHP 5's XSL module to apply it to an XML data document. The constructor creates an XSLTProcessor object and imports the given XSLT stylesheet into it. transformToFile() applies the stylesheet to an XML document, represented by another ButterflyXmlDocument object, and writes the transformed content to the specified file path. transformToString() performs the same transformation but returns the result as a string instead.
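To make the underlying XSL-module calls concrete, here is a minimal sketch that uses XSLTProcessor directly, without any Butterfly classes. It assumes PHP's dom and xsl extensions are enabled; the inline stylesheet and the greeting document are invented purely for illustration.

```php
<?php
// Requires the dom and xsl extensions. The stylesheet and XML below are
// made-up examples, not files from the Butterfly project.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/greeting">Hello, <xsl:value-of select="name"/>!</xsl:template>
</xsl:stylesheet>
XSL;

// XSLTProcessor works with DOMDocument objects for both inputs.
$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$xml = new DOMDocument();
$xml->loadXML('<greeting><name>World</name></greeting>');

$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);           // same call as in the constructor
$result = $processor->transformToXml($xml);   // same call as in transformToString()

echo $result;
```

Despite its name, transformToXml() returns whatever text the stylesheet produces, which is why the same method can serve XML, HTML, and plain-text output.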
The next level out from ButterflyTransformer is the Butterfly class. This class represents one link in a chain (or, to borrow from Apache Cocoon parlance, a pipeline) of stylesheet transformations, and also handles the core caching logic (see Listing 3).
Listing 3. The Butterfly class
class Butterfly implements ButterflyXmlDocument {
    private $transformer;
    private $xmlSource;
    private $cache;

    public function __construct ($transformer, $xmlSource, $cache = null) {
        $this->transformer = $transformer;
        $this->xmlSource = $xmlSource;
        $this->cache = $cache;
    }

    public function getDom() {
        // Parse the transformed contents, so this also works when the result
        // comes from a cache file, which is a plain ButterflyDocument
        $dom = new DOMDocument();
        if (! $dom->loadXML ($this->getContents())) {
            throw new Exception ("Couldn't load DOM object from transformed document");
        }
        return $dom;
    }

    public function getContents() {
        return $this->getTransformedDoc()->getContents();
    }

    public function writeContents() {
        $this->getTransformedDoc()->writeContents();
    }

    // Use the cache when one was supplied, otherwise transform in memory
    protected function getTransformedDoc() {
        return ($this->cache != null
            ? $this->getTransformedCache ()
            : $this->getTransformedString ());
    }

    // Run the transformation only if the cache file does not exist yet
    protected function getTransformedCache () {
        if (! $this->cache->isPresent()) {
            $this->transformer->transformToFile ($this->xmlSource,
                $this->cache->getFilepath());
        }
        return $this->cache;
    }

    protected function getTransformedString () {
        return new ButterflyXmlString
            ($this->transformer->transformToString ($this->xmlSource));
    }
}
The constructor in Butterfly takes a ButterflyTransformer object to handle the core XSLT transformation logic, a ButterflyXmlDocument to represent the source XML, and an optional ButterflyCache object to store and produce the cached version of the transformed document. Note that Butterfly itself implements ButterflyXmlDocument, meaning that it exposes getContents(), writeContents(), and getDom() methods. Thus, you can implement a chain of XSLT transformations as a chain of Butterfly objects, each acting as the source object for the next object.
To use Butterfly, the caller calls the constructor with these objects, then calls either getContents() or writeContents(), to obtain the transformed result string or write the transformed result directly to standard output.
getTransformedDoc() and getTransformedCache() handle the caching logic. Recall that the constructor takes the cache object as an optional argument. It leaves the cache optional so that the caller can just create a Butterfly object to apply an XSLT transformation, without worrying about caching or framework configuration. So getTransformedDoc() sees if this Butterfly was provided with a cache object, and, if so, calls getTransformedCache() to handle caching, or otherwise calls getTransformedString(). Note that either method will return a ButterflyDocument object.
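The branch that getTransformedDoc() takes can be illustrated in isolation. The following is a simplified sketch rather than framework code: the function name transformedContents() is hypothetical, and a PHP callable stands in for the XSLT transformation so that the example runs without any stylesheets.

```php
<?php
// Hypothetical stand-alone version of the decision made in getTransformedDoc():
// with no cache path, transform on every call; with a cache path, transform
// only when the cache file is missing and serve the stored result afterwards.
function transformedContents($transform, $cachePath = null)
{
    if ($cachePath === null) {
        return call_user_func($transform);        // no cache configured
    }
    if (!file_exists($cachePath)) {
        // First request: run the (expensive) transformation and store it.
        file_put_contents($cachePath, call_user_func($transform));
    }
    return file_get_contents($cachePath);         // later requests hit the file
}
```

Calling the function twice with the same cache path runs the transformation only once, which is the behavior the optional cache argument gives a Butterfly object.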
The ButterflyTransformer and Butterfly classes contain the core functionality of the framework.
The next step is to implement a ButterflyXmlDocument that isn't itself a Butterfly, so that the chain of delegation ends somewhere when you call the writeContents() or getContents() method of a Butterfly. The most common case is that of applying an XSLT stylesheet to an XML file. To represent this type of document you create the ButterflyFile class (see Listing 4).
Listing 4. Creating the ButterflyFile class
class ButterflyFile implements ButterflyDocument {
    private $filepath;

    public function __construct($filepath) {
        $this->filepath = $filepath;
    }

    public function getFilepath() {
        return $this->filepath;
    }

    public function isPresent() {
        return file_exists ($this->filepath);
    }

    public function getContents() {
        return file_get_contents($this->filepath);
    }

    // Stream the file straight to standard output
    public function writeContents() {
        $file = fopen($this->filepath, "r");
        fpassthru($file);
        fclose($file);
    }

    public function delete() {
        unlink ($this->filepath);
    }
}
This is the first step towards a concrete implementation of ButterflyXmlDocument. This class implements only ButterflyDocument, ButterflyXmlDocument's parent interface, because the framework needs to represent both XML source documents and plain text documents, given that the result of a transformation might not be XML. ButterflyFile therefore provides a standard wrapper class around a file. You construct it with the full path of the file, and it provides the isPresent(), getContents(), and writeContents() methods called by the Butterfly class. isPresent() returns true if the file specified by the path is actually present on the disk. getContents() returns the contents of the file. writeContents() writes the contents of the file to standard output, which it does by calling the fpassthru() function in PHP.
The call to fpassthru() is an important element of the system. When the final result of an XSLT pipeline is computed, the framework stores a cached version of the result for fast delivery. The call to fpassthru() here is the method providing that fast delivery. The Web server writes the contents of the cached file directly to standard output, which appears in the user's browser.
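As a small illustration of that delivery path, the sketch below writes a stand-in cache file and streams it to output with fpassthru(). The helper name streamFile() and the file contents are assumptions made for the example; only fopen(), fpassthru(), and fclose() come from the listing above.

```php
<?php
// Hypothetical helper mirroring ButterflyFile::writeContents(): copies the
// file to output without building the whole page as a PHP string first.
function streamFile($path)
{
    $file = fopen($path, 'r');
    $bytes = fpassthru($file);   // outputs from the current position to end of file
    fclose($file);
    return $bytes;               // number of bytes passed through
}

// A stand-in for a cached transformation result.
$path = tempnam(sys_get_temp_dir(), 'bfly');
file_put_contents($path, "<html><body>cached page</body></html>\n");
```

PHP's readfile() function combines the open, pass-through, and close steps into one call and could be used in the same place.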
Since this caching behavior is a significant concept in the system, it makes sense to create a ButterflyCache class to represent a cached transformation (see Listing 5).
Listing 5. Creating a ButterflyCache class
class ButterflyCache extends ButterflyFile {
    public function __construct($filepath) {
        parent::__construct ($filepath);
    }
}
ButterflyCache does nothing except extend ButterflyFile, so a natural question is why it exists. It exists solely to represent a special function of a file in the Butterfly system. Doing so provides a layer of abstraction around the cached file. If Butterfly cache files ever need additional behavior, or if a different sort of cache arises later (for example, one that caches the result in a database), the rest of the framework classes need not know about it. Only the code creating the cache object needs to know about the new kind of cache. This sort of abstraction is more useful in Java or other statically typed languages, because the calling code really does have a dependency on the cache object's type at compile time. In a language like PHP, including this sort of class is more a matter of making the intention clearer for any developers that extend the framework later. Code that creates a ButterflyCache instead of just a ButterflyFile clearly creates a file serving to cache the contents of a transformation.
Concrete implementations of ButterflyXmlDocument
In contrast, the ButterflyXmlFile class adds both a concept and a bit of functionality to the concept and functionality of a plain file (see Listing 6).
Listing 6. The ButterflyXmlFile class
class ButterflyXmlFile extends ButterflyFile implements ButterflyXmlDocument {
    // Static constructor, so the class can be instantiated by name
    public static function create ($filepath) {
        return new ButterflyXmlFile ($filepath);
    }

    public function __construct($filepath) {
        parent::__construct ($filepath);
    }

    public function getDom() {
        // load() is an instance method and returns false on failure
        $dom = new DOMDocument();
        if (! $dom->load($this->getFilepath())) {
            throw new Exception ("Couldn't load DOM object from filepath " .
                $this->getFilepath());
        }
        return $dom;
    }
}
Like ButterflyCache, ButterflyXmlFile extends ButterflyFile, because they both represent files on disk, but ButterflyXmlFile also implements the ButterflyXmlDocument interface, therefore implementing a getDom() method that returns its contents as a DOMDocument. This is critical, because PHP's XSLTProcessor object, used in the ButterflyTransformer class, only deals with XML as DOMDocument objects. It also implicitly implements ButterflyXmlDocument's getContents() and writeContents() methods because they are defined in the base ButterflyFile class.
The other type of XML source in the system is a plain XML string. Since this particular kind of string has special meaning and behavior in the system, it's encapsulated in the ButterflyXmlString class (see Listing 7).
Listing 7. ButterflyXmlString class
class ButterflyXmlString implements ButterflyXmlDocument {
    protected $xml;

    public function __construct($xmlString) {
        $this->xml = $xmlString;
    }

    public function getDom() {
        // loadXML() parses a string; load() expects a file path
        $dom = new DOMDocument();
        if (! $dom->loadXML($this->xml)) {
            throw new Exception ("Couldn't load DOM object from XML string");
        }
        return $dom;
    }

    public function getContents() {
        return $this->xml;
    }

    public function writeContents() {
        echo ($this->xml);
    }
}
In Listing 7, ButterflyXmlString wraps an XML string and provides the ButterflyXmlDocument interface around it. getContents() returns the string and writeContents() echoes the string to standard output (that is, the browser). getDom() returns a DOMDocument, representing the XML contents of the string, that XSLTProcessor can then use as either the XSLT stylesheet or as the source XML document to be transformed.
Using class abstraction to allow for future extensions
The use of the class as the primary means of abstraction is key to object-oriented framework design. The Butterfly class could just contain a lot of if-statements determining whether the XML source is a string or a file and act accordingly, but this code would be entirely rigid and not allow for extension. The ButterflyXmlDocument interface abstraction allows for future extensions without changes to the existing code. For example, if you want to write a ButterflyXmlSqlSource that takes a SQL statement and turns the results into XML, you can do so and just add that class to the system, passing it to Butterfly as the XML source object rather than a ButterflyXmlFile or ButterflyXmlString.
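As a rough illustration of such an extension, the sketch below implements the ButterflyXmlDocument interface for tabular data. It is an assumption-laden example, not part of Butterfly: the class name ButterflyXmlRowsSource and the rows/row element names are hypothetical, and it takes an array of already-fetched rows instead of running a real SQL query so that it stays self-contained. It also assumes every column name is a valid XML element name.

```php
<?php
// The two interfaces from Listing 1, repeated so the sketch runs on its own.
interface ButterflyDocument {
    function getContents();
    function writeContents();
}
interface ButterflyXmlDocument extends ButterflyDocument {
    function getDom();
}

// Hypothetical source: turns an array of associative rows (such as the
// result of a database query) into a <rows><row>...</row></rows> document.
class ButterflyXmlRowsSource implements ButterflyXmlDocument {
    private $rows;

    public function __construct(array $rows) {
        $this->rows = $rows;
    }

    public function getDom() {
        $dom = new DOMDocument('1.0', 'UTF-8');
        $root = $dom->appendChild($dom->createElement('rows'));
        foreach ($this->rows as $row) {
            $rowElement = $root->appendChild($dom->createElement('row'));
            foreach ($row as $column => $value) {
                $cell = $rowElement->appendChild($dom->createElement($column));
                // Text nodes are escaped on output, so & and < in data are safe.
                $cell->appendChild($dom->createTextNode((string) $value));
            }
        }
        return $dom;
    }

    public function getContents() {
        return $this->getDom()->saveXML();
    }

    public function writeContents() {
        echo $this->getContents();
    }
}
```

Because the class exposes the same three methods as ButterflyXmlFile and ButterflyXmlString, it could be passed to a Butterfly object as the XML source without touching the existing classes.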
Using a factory class to simplify object construction
The Butterfly system now applies chains of XSLT stylesheet transformations and provides a caching mechanism for each point in the transformation chain, since each Butterfly object represents one XSLT transformation. The problem is that constructing Butterfly objects is both unclear and cumbersome for the end developer: the developer must know about the different classes involved and how to instantiate each one, and must write specific code to assemble each chain of XSLT transformations.
The object-oriented solution to such a problem is the factory class. The purpose of this class is to create other objects, usually based on input criteria. Butterfly's factory class, ButterflyFactory, assembles Butterfly chains from a configuration file that looks like Listing 8.
Listing 8. Sample Butterfly configuration file
<butterfly-config>
    <chain>
        <name>resume</name>
        <source type="ButterflyXmlFile">
            <arg>resume.xml</arg>
        </source>
        <xslt file="resume-restructured.xsl"/>
        <xslt file="resume-restructured-to-view.xsl"/>
        <xslt file="resume-view-to-html.xsl"/>
    </chain>
</butterfly-config>
The configuration file defines Butterfly transformation chains. Each chain begins with a source and contains any number of XSLT stylesheets to apply in sequence. This example creates a single chain to render a resume XML document, resume.xml, into an HTML page through a sequence of three stylesheet transformations. The configuration is as minimal as possible because the point of a framework is to simplify tasks, so complicating the configuration would defeat that purpose.
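To show how a file in this format can be read, the sketch below parses the sample configuration with SimpleXML and collects each chain's source and stylesheets into a plain array. It is a minimal illustration rather than the factory's own code; the $summary structure and its key names are invented for the example.

```php
<?php
// The sample configuration from Listing 8, inlined as a string.
$configXml = <<<'XML'
<butterfly-config>
  <chain>
    <name>resume</name>
    <source type="ButterflyXmlFile">
      <arg>resume.xml</arg>
    </source>
    <xslt file="resume-restructured.xsl"/>
    <xslt file="resume-restructured-to-view.xsl"/>
    <xslt file="resume-view-to-html.xsl"/>
  </chain>
</butterfly-config>
XML;

$config = simplexml_load_string($configXml);

$summary = array();
// Iterating $config->chain visits every <chain> element, whether there is
// one or many; the same holds for <arg> and <xslt> below.
foreach ($config->chain as $chain) {
    $args = array();
    foreach ($chain->source->arg as $arg) {
        $args[] = (string) $arg;
    }
    $stylesheets = array();
    foreach ($chain->xslt as $xslt) {
        $stylesheets[] = (string) $xslt['file'];   // attribute access by name
    }
    $summary[(string) $chain->name] = array(
        'sourceType'  => (string) $chain->source['type'],
        'sourceArgs'  => $args,
        'stylesheets' => $stylesheets,
    );
}

print_r($summary);
```

The explicit (string) casts matter because SimpleXML returns element objects, not strings, for both child elements and attributes.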
The ButterflyFactory class reads this configuration file format and acts as a factory for Butterfly objects (see Listing 9).
Listing 9. ButterflyFactory
class ButterflyFactory {
    private $cacheDir;
    private $chains;

    public function __construct ($configFile, $cacheDir) {
        $this->cacheDir = $cacheDir;
        $config = simplexml_load_file($configFile);
        // $config->chain iterates over every <chain> element, one or many
        $this->chains = $this->mapChainsByName ($config->chain);
    }

    protected function mapChainsByName ($chains) {
        $byName = array();
        foreach ($chains as $chain) {
            $name = (string) $chain->name;
            if (isset($byName[$name])) {
                throw new Exception ("Two Butterfly chains defined with the same name: $name");
            }
            $byName[$name] = $chain;
        }
        return $byName;
    }

    protected function getChainByName ($name) {
        if (! isset($this->chains[$name])) {
            throw new Exception ("ButterflyFactory: no chain with specified name: $name");
        }
        return $this->chains[$name];
    }

    // ButterflyCacheFactory is not shown in this article
    protected function createCacheFactory ($cacheDir) {
        return new ButterflyCacheFactory ($cacheDir);
    }

    protected function createButterflyObject ($transformer, $xmlDoc, $cache) {
        return new Butterfly ($transformer, $xmlDoc, $cache);
    }

    public function createButterfly ($xsltFilepath, $xmlDoc, $cache) {
        return $this->createButterflyObject
            ($this->createTransformerFromFilepath ($xsltFilepath), $xmlDoc, $cache);
    }

    protected function createTransformerFromFilepath ($xsltFilepath) {
        return new ButterflyTransformer (new ButterflyXmlFile($xsltFilepath));
    }

    public function getButterfly ($chainName) {
        $chain = $this->getChainByName ($chainName);
        $source = $this->createChainSource ($chain);
        $invalidateCache = false;
        for ($i = 0; $i < count($chain->xslt); $i++) {
            $xslt = $chain->xslt[$i];
            $cache = $this->createButterflyCache
                ($this->createCacheFilename ((string) $chain->name, $i));
            // A missing cache earlier in the chain makes this one stale
            if ($invalidateCache && $cache->isPresent()) {
                $cache->delete();
            }
            // Each Butterfly becomes the source of the next one
            $source = $this->createButterfly ((string) $xslt['file'], $source, $cache);
            $invalidateCache = $invalidateCache || ! $cache->isPresent();
        }
        return $source;
    }

    protected function createChainSource ($chain) {
        if (! isset($chain->source)) {
            throw new Exception ("Butterfly chain has no source element: " . $chain->name);
        }
        // Iterating visits every <arg> element, one or many
        $argsAsStrings = array();
        foreach ($chain->source->arg as $arg) {
            $argsAsStrings[] = (string) $arg;
        }
        return $this->createSourceFromType ((string) $chain->source['type'],
            $argsAsStrings);
    }

    protected function createCacheFilename ($chainName, $xsltNumber) {
        return $this->cacheDir . '/' . $chainName . '_' . $xsltNumber . '.cache';
    }

    // Calls the static create() method of the class named in the type attribute
    protected function createSourceFromType ($type, $args) {
        return call_user_func_array (array($type, 'create'), $args);
    }

    protected function createButterflyCache ($cacheFilePath) {
        return new ButterflyCache ($cacheFilePath);
    }
}
A ButterflyFactory is created from the path of the configuration file and the path of the cache directory (the directory to hold all of the cached XSLT transformations). It uses the SimpleXML module to read and parse the XML configuration file into a PHP object, and then creates an associative array mapping each chain's name to its definition for easy lookup.
The heart of ButterflyFactory is the getButterfly() method, which takes the name of a defined chain and returns a ButterflyXmlDocument, which, due to the nature of the chain definitions, will be a Butterfly object that returns the result of the transformation. getButterfly() first looks up the chain by the supplied name (for example, for the resume chain in the sample configuration, "resume"), and confirms that it contains a <source> element defining the source XML document object, which can be either a ButterflyXmlFile or some other ButterflyXmlDocument implementation. The type attribute of the <source> element specifies the class to instantiate as the source document object. In the case of the "resume" chain, its source is a ButterflyXmlFile.
Once getButterfly() has the source object, it then loops through the <xslt> elements defined in the <chain>, and creates a Butterfly object from each one. Because a Butterfly takes its source object as one of its arguments, the first Butterfly object is given the source object created from the chain's <source> element, but then the Butterfly object itself is assigned to the $source variable, making it the source object of the next Butterfly created in the loop. This repeats until the last one is returned as the Butterfly to be called for the result. In this way, the chain of Butterflies is created, so that calling the returned Butterfly object causes it to call its source object (possibly a Butterfly), which calls its source object, until the real source object is hit, which returns its contents to its calling Butterfly. The calling Butterfly then applies its XSLT stylesheet and returns the result to its caller, which does the same, until the top of the chain returns or writes to standard output the transformed result.
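The delegation pattern described above can be demonstrated without XSLT at all. In the simplified sketch below, the classes SketchSource and SketchLink are hypothetical stand-ins for a source document and a Butterfly, and ordinary PHP string functions take the place of stylesheets; the point is only the way each link wraps the previous one.

```php
<?php
// Stand-in for the real source document: returns a fixed string.
class SketchSource {
    private $text;
    public function __construct($text) {
        $this->text = $text;
    }
    public function getContents() {
        return $this->text;
    }
}

// Stand-in for Butterfly: asks its source for contents, then applies one
// step. A PHP callable plays the role of the XSLT stylesheet here.
class SketchLink {
    private $step;
    private $source;
    public function __construct($step, $source) {
        $this->step = $step;
        $this->source = $source;
    }
    public function getContents() {
        return call_user_func($this->step, $this->source->getContents());
    }
}

// Assemble the chain the way the loop in getButterfly() does: each new
// link wraps the previous one and then becomes the source of the next.
$source = new SketchSource('resume');
foreach (array('strtoupper', 'strrev', 'trim') as $step) {
    $source = new SketchLink($step, $source);
}

// Nothing has run yet; calling the outermost link pulls the data through
// every step in order.
echo $source->getContents();
```

Note that the work is lazy: building the chain only wires objects together, and the steps execute when the outermost object is finally asked for its contents.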
This loop also keeps an $invalidateCache flag up to date, so that if a Butterfly's cache is deleted, it deletes the caches of the rest of the Butterfly objects above it in the chain. This way, if a cache file is removed at any step of the transformation chain, it has the effect of invalidating all dependent Butterfly cache files, forcing their recreation.
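The cascading invalidation can be isolated into a few lines. The following is a sketch of the same flag logic operating on a plain list of cache file paths; the function name invalidateDownstream() is hypothetical and not part of Butterfly.

```php
<?php
// Given the cache files of one chain in transformation order, delete every
// cache that sits after the first missing one. Returns the deleted paths.
function invalidateDownstream(array $cachePaths)
{
    $deleted = array();
    $invalidate = false;
    foreach ($cachePaths as $path) {
        $present = file_exists($path);
        if ($invalidate && $present) {
            unlink($path);            // stale: an earlier step has no cache
            $deleted[] = $path;
            $present = false;
        }
        // Once one step is missing, everything after it is stale as well.
        $invalidate = $invalidate || !$present;
    }
    return $deleted;
}
```

With this behavior, removing the cache file of any single step is enough to force that step and all later steps to be recomputed on the next request.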
At this point the framework does a tidy job of handling the mechanics of XSLT stylesheet transformation, including the creation of XSLT chains and the use of cache files to avoid the performance hit of XSLT transformations when used to render HTML pages. The next steps in the framework design are to provide a more elegant cache management interface (currently the user must delete the right cache file by hand) and to provide other XML sources, such as one that derives its contents from a SQL query. A class such as ButterflyXmlSqlSource can become a lightweight framework itself, because of the configuration that might be required to specify the OR mapping required to turn the relational SQL results into a hierarchical XML document. Other experiments can include the use of a SQL database for the cache files too, though serving files from disk with fpassthru() is likely to remain the faster option.
The Butterfly framework is a lightweight approach to simplifying the use of XSLT in PHP 5, allowing for the use of stylesheet chains and caching for performance. As frameworks go it's pretty simple and straightforward, but such a framework removes the mechanics of XSLT stylesheet application from the PHP code, allowing the developer to focus on the heart of the work, which is the XSLT itself. Explore Butterfly further (see Resources).
Resources
Learn
- XSLT 2.0 recommendation, maintained by the World Wide Web Consortium: Read more about the syntax and semantics of XSLT 2.0, a language for transforming XML documents into other XML documents.
- What kind of language is XSLT? (Michael Kay, developerWorks, February 2001, updated April 2005): What kind of a language is XSLT, what is it for, and why was it designed the way it is? Find answers to those questions.
- An XSLT style sheet and an XML dictionary approach to internationalization (Laura Menke, developerWorks, April 2001): Get an example of how XSLT can be useful in real-world problems.
- How an XSLT processor works (Benoit Marchal, developerWorks, March 2004): Learn more of the theory behind XSLT and the recursive nature of its coding for faster programming in XSLT.
- Create multiple files in XSLT 2.0 (Jack Herrington, developerWorks, March 2005): Explore examples of xsl:result-document in action in this developerWorks tip.
- XSLT: Working with XML and HTML (Khun Yee Fung, Addison-Wesley, December 2000): Explore a comprehensive reference and tutorial to XSLT.
- XSLT Functions: Check out the extensive reference from W3Schools.
- PHP.NET: Dig into the central resource for PHP developers.
- A PHP 5 migration guide (Jack Herrington, developerWorks, September 2006): Learn to migrate code developed in PHP V4 to V5.
- Learning PHP (Tyler Anderson and Nicholas Chase, developerWorks, June 2005): Read a series of tutorials on learning to program with PHP.
- IBM developerWorks' PHP project resources: Learn more about PHP and what you can do with it.
- Recommended PHP reading list (Daniel Krook and Carlos Hoyos, developerWorks, March 2006): Check out a reading list compiled for programmers and administrators by IBM Web application developers.
- PHP content: Browse all the articles, tutorials, and demos in the Open source library on developerWorks.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- The technology bookstore: Browse for books on these and other technical topics.
- developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
Get products and technologies
- Butterfly - a lightweight XSLT framework written for PHP 5: Explore this small PHP 5 framework that simplifies the use of XSLT in PHP to develop Web pages and quickly deliver results of stylesheet translations.
- IBM trial software for product evaluation: Build your next project with trial software available for download directly from developerWorks, including application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- Participate in the discussion forum.
- XML zone discussion forums: Participate in any of several XML-related discussions.
- developerWorks blogs: Check out these blogs and get involved in the developerWorks community.

Jake Miles is Senior Technical Liaison at Twistage, Inc, a young company providing a full-stack video Web solution to businesses. He has experience with many languages and software technologies, has worked as a professional developer for 10 years, and has been an avid student and tinkerer since he was 10. He also teaches on a volunteer basis, and believes that anyone can learn anything if taught clearly enough.