Create a framework to support XSLT transformation pipelines

Using the Butterfly Framework to process XSLT documents

Jake Miles (jacob.miles@gmail.com), Freelance writer and developer, Twistage Inc.
Jake Miles is Senior Technical Liaison at Twistage, Inc, a young company providing a full-stack video Web solution to businesses. He has experience with many languages and software technologies, has worked as a professional developer for 10 years, and has been an avid student and tinkerer since he was 10. He also teaches on a volunteer basis, and believes that anyone can learn anything if taught clearly enough.

Summary:  Explore the creation of a framework, called Butterfly, that runs in PHP 5 and facilitates the application of chains of XSLT stylesheets to XML source documents, with transparent caching of the transformed results. The framework is inspired by the Java™-based Apache Cocoon project, so named because it houses and manages the transformation of data from one form to another (turning caterpillars into butterflies); this much lighter-weight framework is named Butterfly. With the Butterfly framework, you can set up an XML configuration file to define chains of stylesheet transformations, and then instantiate Butterfly objects that each produce the result of an XSLT transformation chain. This article also looks at the nature of framework design in general as it sketches out this framework in particular.

Date:  18 Nov 2008
Level:  Intermediate

Introduction

With the XSL module in PHP 5, you can apply XSLT stylesheets to XML documents to transform the XML data into some other type of text document. This document can be another XML structure, HTML, or any other structure, including plain text or even Java and other programming languages. Regardless of the source text or target document structure, you can create all of the programming specific to the problem in XSLT. The PHP code used in this article is only necessary to create an XSLT processor object and apply the transformation.

Frequently used acronyms

  • API: application programming interface
  • HTML: Hypertext Markup Language
  • PHP: PHP Hypertext Preprocessor
  • XML: Extensible Markup Language
  • XSL: Extensible Stylesheet Language
  • XSLT: Extensible Stylesheet Language Transformations

Since the mechanics of applying XSLT stylesheets to XML in PHP are generally the same, the process can be refactored out of the business-specific code into something more reusable. This article sketches out a lightweight, reusable PHP framework, called Butterfly, that processes a chain of XSLT documents (to download Butterfly, see Resources). Note: The Butterfly framework described here refers to the project hosted at http://jakemiles.com/butterfly, and is unrelated to any Java-based Web application framework of the same name. The chain starts with a source XML document (though not necessarily a file), and applies a series of XSLT stylesheets to it until it produces the final document. This functionality is a small subset of the functionality provided by the Apache Cocoon project, a Java-based framework that processes pipelines of XSLT stylesheets to produce a final document.

When the final document is a Web page, one concern when processing XSLT stylesheets is performance. For small data documents and simple stylesheets this might not prove to be an issue. For large data sets with thousands of elements, however, applying a series of stylesheets on each page load can not only slow down the page but also consume a lot of memory and processing power on the server.

The solution to the performance problem is simple: cache the result of the XSLT transformation as a static HTML page that the Web server can serve instantly, and only perform the full chain of XSLT transformations when the source document or one of the stylesheets has changed. This caching mechanism is not unique to the specific XML or XSLT involved, and therefore, the framework can handle it generically.

Another potential problem when processing XSLT stylesheets is the source of the XML, which can originate from a file or a SQL database. The framework must be flexible enough to handle multiple data sources.


The beginnings of a framework

When you design an object-oriented framework, you have essentially two abstraction tools available: methods and classes. Anything that must be extensible, meaning you want to allow for new kinds of behavior in the future, must be abstracted into a method call or a class. This allows you to swap out at runtime a different class exposing the same methods, taking advantage of polymorphism. If the framework gets cluttered with classes, you can add a factory class that assembles them into common combinations and provides a simpler API for common cases.

A practical approach to a framework is to write the core functionality and then apply more structure to it to make it extensible and simplify the interface. To make the code clearer as it's introduced in this article, first look at two core interfaces used throughout the code, ButterflyDocument and ButterflyXmlDocument in Listing 1.


Listing 1. The core ButterflyDocument and ButterflyXmlDocument interfaces

interface ButterflyDocument {
  function getContents();
  function writeContents();
}

interface ButterflyXmlDocument extends ButterflyDocument {
  function getDom();
}

ButterflyDocument represents a text document whose contents can be read or written. In Listing 1, getContents() returns its contents as a string, and writeContents() writes the document's contents to standard output. ButterflyXmlDocument represents an XML document, and extends ButterflyDocument with a getDom() method, which returns a DOMDocument object representing its XML contents.

With those two core interfaces out of the way, look at the core functionality of the framework, the ButterflyTransformer class. This class has the fundamental job of applying an XSLT stylesheet to an XML document (see Listing 2).


Listing 2. ButterflyTransformer

class ButterflyTransformer {

  private $processor;

  // $xslSource is a ButterflyXmlDocument holding the XSLT stylesheet.
  public function __construct ($xslSource) {
    $this->processor = new XSLTProcessor();
    $this->processor->importStylesheet ($xslSource->getDom());
  }

  // Applies the stylesheet to $xmlSource and writes the result to $filepath.
  public function transformToFile ($xmlSource, $filepath) {
    $file = fopen ($filepath, "w");
    fwrite ($file, $this->processor->transformToXml ($xmlSource->getDom()));
    fclose($file);
  }

  // Applies the stylesheet to $xmlSource and returns the result as a string.
  public function transformToString ($xmlSource) {
    return $this->processor->transformToXml ($xmlSource->getDom());
  }
}

In Listing 2, ButterflyTransformer takes a ButterflyXmlDocument object, representing the XSLT stylesheet, and uses PHP 5's XSL module to apply it to an XML data document. The constructor creates an XSLTProcessor object and imports the given XSLT stylesheet into it. transformToFile() applies the stylesheet to an XML document, represented by another ButterflyXmlDocument object, and writes the transformed content to the specified file path. transformToString() applies the same transformation but returns the result as a string instead of writing it to a file.
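
To make the underlying mechanics concrete, the following is a minimal sketch of the XSLTProcessor calls that ButterflyTransformer wraps, used directly without any framework classes. The XML and stylesheet strings are invented sample data, and the sketch assumes PHP's DOM and XSL extensions are enabled.

```php
<?php
// Illustrative sample data; neither document comes from the article.
$xml = '<greeting><to>World</to></greeting>';
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/greeting">
    <xsl:text>Hello, </xsl:text>
    <xsl:value-of select="to"/>
    <xsl:text>!</xsl:text>
  </xsl:template>
</xsl:stylesheet>
XSL;

// XSLTProcessor works on DOMDocument objects, so parse both strings first.
$xmlDom = new DOMDocument();
$xmlDom->loadXML($xml);
$xslDom = new DOMDocument();
$xslDom->loadXML($xsl);

// Import the stylesheet once, then apply it to the source document.
$processor = new XSLTProcessor();
$processor->importStylesheet($xslDom);
$result = $processor->transformToXml($xmlDom);

echo $result, "\n";
```

Despite its name, transformToXml() returns whatever text the stylesheet's output method produces, which is why the same call can yield HTML or plain text.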


Adding chains and caching

The next level out from ButterflyTransformer is the Butterfly class. This class represents one link in a chain (or, to borrow from Apache Cocoon parlance, a pipeline) of stylesheet transformations, and also handles the core caching logic (see Listing 3).


Listing 3. The Butterfly class

class Butterfly implements ButterflyXmlDocument {

  private $transformer;
  private $xmlSource;
  private $cache;

  public function __construct ($transformer, $xmlSource, $cache=null) {
    $this->transformer = $transformer;
    $this->xmlSource = $xmlSource;
    $this->cache = $cache;
  }

  // The transformed result may be a plain ButterflyDocument (for example,
  // a cache file) with no getDom() of its own, so parse its contents here.
  public function getDom() {
    $dom = new DOMDocument();
    if (! $dom->loadXML ($this->getContents())) {
      throw new Exception ("Couldn't load DOM object from transformed result");
    }
    return $dom;
  }

  public function getContents() {
    return $this->getTransformedDoc()->getContents();
  }

  public function writeContents() {
    $this->getTransformedDoc()->writeContents();
  }

  protected function getTransformedDoc() {
    return ($this->cache !== null
        ? $this->getTransformedCache ()
        : $this->getTransformedString ());
  }

  protected function getTransformedCache () {
    // Run the transformation only when no cached result exists yet.
    if (! $this->cache->isPresent()) {
      $this->transformer->transformToFile ($this->xmlSource,
                       $this->cache->getFilepath());
    }
    return $this->cache;
  }

  protected function getTransformedString () {
    return new ButterflyXmlString
      ($this->transformer->transformToString ($this->xmlSource));
  }
}

The constructor in Butterfly takes a ButterflyTransformer object to handle the core XSLT transformation logic, a ButterflyXmlDocument to represent the source XML, and an optional ButterflyCache object to store and produce the cached version of the transformed document. Note that Butterfly itself implements ButterflyXmlDocument, meaning that it exposes getContents(), writeContents(), and getDom() methods. Thus, you can implement a chain of XSLT transformations as a chain of Butterfly objects, each acting as the source object for the next object.

To use Butterfly, the caller calls the constructor with these objects, then calls either getContents() or writeContents(), to obtain the transformed result string or write the transformed result directly to standard output.

getTransformedDoc() and getTransformedCache() handle the caching logic. Recall that the constructor takes the cache object as an optional argument. It leaves the cache optional so that the caller can just create a Butterfly object to apply an XSLT transformation, without worrying about caching or framework configuration. So getTransformedDoc() sees if this Butterfly was provided with a cache object, and, if so, calls getTransformedCache() to handle caching, or otherwise calls getTransformedString(). Note that either method will return a ButterflyDocument object.
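
The caching decision described here can be sketched independently of XSLT. In the following minimal example, cachedOrCompute() is a hypothetical helper (it is not part of Butterfly) and the closure stands in for an expensive transformation; the point is only that the work runs once and later calls are served from the file.

```php
<?php
// Hypothetical helper, not part of Butterfly: returns the cached text if the
// cache file exists; otherwise computes it, stores it, and returns it.
function cachedOrCompute($cachePath, $compute) {
    if (!file_exists($cachePath)) {
        file_put_contents($cachePath, call_user_func($compute));
    }
    return file_get_contents($cachePath);
}

$cachePath = tempnam(sys_get_temp_dir(), 'bfly');
unlink($cachePath); // tempnam() creates the file; start with no cache

// Stand-in for an expensive XSLT transformation; counts how often it runs.
$calls = 0;
$expensive = function () use (&$calls) {
    $calls++;
    return '<html>rendered</html>';
};

$first = cachedOrCompute($cachePath, $expensive);   // computes and stores
$second = cachedOrCompute($cachePath, $expensive);  // served from the file

echo "transformation ran $calls time(s)\n";
unlink($cachePath);
```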


The source document

The ButterflyTransformer and Butterfly classes contain the core functionality of the framework. The next step is to implement a ButterflyXmlDocument that isn't itself a Butterfly, so that the chain of delegation ends somewhere when you call the writeContents() or getContents() method of a Butterfly. The most common case is that of applying an XSLT stylesheet to an XML file. To represent this type of document you create the ButterflyFile class (see Listing 4).


Listing 4. Creating the ButterflyFile class

class ButterflyFile implements ButterflyDocument {

  private $filepath;

  public function __construct($filepath) {
    $this->filepath = $filepath;
  }

  public function getFilepath() {
    return $this->filepath;
  }

  public function isPresent() {
    return file_exists ($this->filepath);
  }

  public function getContents() {
    return file_get_contents($this->filepath);
  }

  public function writeContents() {
    $file = fopen($this->filepath, "r");
    fpassthru($file);
    fclose($file);
  }

  public function delete() {
    unlink ($this->filepath);
  }
}

This is the first step towards a concrete implementation of ButterflyXmlDocument. This class implements only ButterflyDocument, ButterflyXmlDocument's parent interface, because the framework needs to represent both XML source documents and plain text documents: the result of a transformation might not be XML. ButterflyFile therefore provides a standard wrapper class around a file. You construct it with the full path of the file, and it provides isPresent(), getContents(), and writeContents() methods called by the Butterfly class. isPresent() returns true if the file specified by the path is actually present on the disk. getContents() returns the contents of the file. writeContents() writes the contents of the file to standard output, which it does by calling the fpassthru() function in PHP.

The call to fpassthru() is an important element of the system. When the final result of an XSLT pipeline is computed, the framework stores a cached version of the result for fast delivery, and the call to fpassthru() is what provides that fast delivery: PHP streams the contents of the cached file directly to standard output, which is sent to the user's browser.
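
As a small illustration of that delivery step, the sketch below streams a temporary file with fpassthru(). The file contents are invented, and output buffering is used only so the streamed text can be inspected; in a real page the bytes would go straight to the browser.

```php
<?php
// Create a stand-in for a cached transformation result.
$path = tempnam(sys_get_temp_dir(), 'bfly');
file_put_contents($path, "<html><body>cached page</body></html>");

ob_start();                  // capture output so it can be inspected below
$file = fopen($path, "r");
$bytes = fpassthru($file);   // streams the rest of the file to output
fclose($file);
$sent = ob_get_clean();

unlink($path);
echo $bytes, " bytes sent\n";
```

PHP's readfile() function performs the same open, stream, and close sequence in a single call, which may be a convenient alternative when no file handle is needed.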


Representing a cache file

Since this caching behavior is a significant concept in the system, it makes sense to create a ButterflyCache class to represent a cached transformation (see Listing 5).


Listing 5. Creating a ButterflyCache class

class ButterflyCache extends ButterflyFile {

  public function __construct($filepath) {
    parent::__construct ($filepath);
  }  

}

ButterflyCache does nothing except extend ButterflyFile, so a natural question is why it exists. It exists solely to represent a special function of a file in the Butterfly system. Doing so provides a layer of abstraction around the cached file. If Butterfly cached files ever need additional behavior, or if a different sort of cache arises later (for example, one that caches the result in a database), the rest of the framework classes need not know about it. Only the code creating the cache object needs to know about the new kind of cache. This sort of abstraction is more useful in Java or other statically typed languages, because the calling code really does have a dependency on the cache object's type at compile time. In a language like PHP, including this sort of class is more a matter of making the intention clearer for any developers that extend the framework later. Code that creates a ButterflyCache instead of just a ButterflyFile clearly creates a file serving to cache the contents of a transformation.


Concrete implementations of ButterflyXmlDocument

In contrast, the ButterflyXmlFile class adds both a new concept and a bit of functionality to those of a plain file (see Listing 6).


Listing 6. The ButterflyXmlFile class

class ButterflyXmlFile extends ButterflyFile implements ButterflyXmlDocument {

  // Static factory used when a source is created from its class name.
  public static function create ($filepath) {
    return new ButterflyXmlFile ($filepath);
  }

  public function __construct($filepath) {
    parent::__construct ($filepath);
  }

  public function getDom() {
    // load() is an instance method, so create the DOMDocument first.
    $dom = new DOMDocument();
    if (! $dom->load($this->getFilepath())) {
      throw new Exception ("Couldn't load DOM object from filepath " .
        $this->getFilepath());
    }
    return $dom;
  }
}

Like ButterflyCache, ButterflyXmlFile extends ButterflyFile, because they both represent files on disk, but ButterflyXmlFile also implements the ButterflyXmlDocument interface, therefore implementing a getDom() method that returns its contents as a DOMDocument. This is critical, because PHP's XSLTProcessor object, used in the ButterflyTransformer class, only deals with XML as DOMDocument objects. It also implicitly implements ButterflyXmlDocument's getContents() and writeContents() methods because they are defined in the base ButterflyFile class.

The other type of XML source in the system is a plain XML string. Since this particular kind of string has special meaning and behavior in the system, it's encapsulated in the ButterflyXmlString class (see Listing 7).


Listing 7. ButterflyXmlString class

class ButterflyXmlString implements ButterflyXmlDocument {

  protected $xml;

  public function __construct($xmlString) {
    $this->xml = $xmlString;
  }

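  public function getDom() {
    // loadXML() parses a string; load() expects a file path or URL.
    $dom = new DOMDocument();
    if (! $dom->loadXML($this->xml)) {
      throw new Exception ("Couldn't load DOM object from XML string");
    }
    return $dom;
  }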

  public function getContents() {
    return $this->xml;
  }

  public function writeContents() {
    echo ($this->xml);
  }
}

In Listing 7, ButterflyXmlString wraps an XML string and provides the ButterflyXmlDocument interface around it. getContents() returns the string and writeContents() echoes the string to standard output (that is, the browser). getDom() returns a DOMDocument, representing the XML contents of the string, that XSLTProcessor can then use as either the XSLT stylesheet or as the source XML document to be transformed.
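
The difference between a string source and a file source comes down to which DOMDocument method parses the XML. The following minimal sketch, using invented sample markup, shows loadXML() parsing a string and load() parsing the same markup from a file.

```php
<?php
$xml = '<resume><name>Jane Example</name></resume>'; // illustrative data

// Parse XML held in a string.
$fromString = new DOMDocument();
$fromString->loadXML($xml);

// Parse the same markup from a file on disk.
$path = tempnam(sys_get_temp_dir(), 'bfly');
file_put_contents($path, $xml);
$fromFile = new DOMDocument();
$fromFile->load($path);
unlink($path);

echo $fromString->documentElement->nodeName, "\n";
echo $fromFile->getElementsByTagName('name')->item(0)->nodeValue, "\n";
```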


Using class abstraction to allow for future extensions

The use of the class as the primary means of abstraction is key to object-oriented framework design. The Butterfly class could just contain a lot of if-statements determining whether the XML source is a string or a file and act accordingly, but this code would be entirely rigid and not allow for extension. The ButterflyXmlDocument interface abstraction allows for future extensions without changes to the existing code. For example, if you want to write a ButterflyXmlSqlSource that takes a SQL statement and turns the results into XML, you can do so and just add that class to the system, passing it to Butterfly as the XML source object rather than a ButterflyXmlFile or ButterflyXmlString.
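
As a rough sketch of such an extension, the class below implements ButterflyXmlDocument for rows of data like those a SQL query might return. ButterflyXmlRowsSource is a hypothetical name, the rows are passed in as a plain array rather than fetched from a database, and the sketch assumes every column name is a valid XML element name. The two interfaces are repeated from Listing 1 so the example runs on its own.

```php
<?php
interface ButterflyDocument {
  function getContents();
  function writeContents();
}

interface ButterflyXmlDocument extends ButterflyDocument {
  function getDom();
}

// Hypothetical source: builds XML from rows such as a SQL query might return.
class ButterflyXmlRowsSource implements ButterflyXmlDocument {

  private $rows;
  private $rootName;
  private $rowName;

  public function __construct(array $rows, $rootName = 'rows', $rowName = 'row') {
    $this->rows = $rows;
    $this->rootName = $rootName;
    $this->rowName = $rowName;
  }

  // One element per row, one child element per column.
  public function getDom() {
    $dom = new DOMDocument('1.0', 'UTF-8');
    $root = $dom->appendChild($dom->createElement($this->rootName));
    foreach ($this->rows as $row) {
      $rowElement = $root->appendChild($dom->createElement($this->rowName));
      foreach ($row as $column => $value) {
        $cell = $rowElement->appendChild($dom->createElement($column));
        // Text nodes are escaped automatically when the DOM is serialized.
        $cell->appendChild($dom->createTextNode((string) $value));
      }
    }
    return $dom;
  }

  public function getContents() {
    return $this->getDom()->saveXML();
  }

  public function writeContents() {
    echo $this->getContents();
  }
}

$source = new ButterflyXmlRowsSource(array(
  array('title' => 'Developer', 'years' => 3),
  array('title' => 'Tech & QA lead', 'years' => 2),
), 'jobs', 'job');

echo $source->getContents();
```

Because the class exposes getDom(), getContents(), and writeContents(), it could be handed to a Butterfly object as the XML source without any change to the existing classes.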


Using a factory class to simplify object construction

The Butterfly system can now apply chains of XSLT stylesheet transformations and provide a caching mechanism at each point in the transformation chain, because each Butterfly object represents one XSLT transformation. Two problems remain. First, it's unclear to the end developer how to construct Butterfly objects, because the developer would have to know about the different classes involved and how to instantiate each one. Second, doing so is cumbersome: the developer would have to write specific code for each chain of XSLT transformations, or write the code that assembles the chain of Butterflies.

The object-oriented solution to such a problem is the factory class. The purpose of this class is to create other objects, usually based on input criteria. Butterfly's factory class, ButterflyFactory, assembles Butterfly chains from a configuration file that looks like Listing 8.


Listing 8. Sample Butterfly configuration file

<butterfly-config>
  <chain>
    <name>resume</name>
    <source type="ButterflyXmlFile">
      <arg>resume.xml</arg>
    </source>
    <xslt file="resume-restructured.xsl"/>
    <xslt file="resume-restructured-to-view.xsl"/>
    <xslt file="resume-view-to-html.xsl"/>
  </chain>
</butterfly-config>

The configuration file defines Butterfly transformation chains. Each chain begins with a source and contains any number of XSLT stylesheets to apply in sequence. This example creates a single chain to render a resume XML document, resume.xml, into an HTML page through a sequence of three stylesheet transformations. The configuration is as minimal as possible because the point of a framework is to simplify tasks, so complicating the configuration would defeat that purpose.

The ButterflyFactory class reads this configuration file format and acts as a factory for Butterfly objects (see Listing 9).


Listing 9. ButterflyFactory

class ButterflyFactory {

  private $cacheDir;
  private $chains;

  public function __construct ($configFile, $cacheDir) {
    $this->cacheDir = $cacheDir;
    $config = simplexml_load_file($configFile);
    // $config->chain iterates over every <chain> element, whether one or many.
    $this->chains = $this->mapChainsByName ($config->chain);
  }

  protected function mapChainsByName ($chains) {
    $byName = array();
    foreach ($chains as $chain) {
      $name = (string) $chain->name;
      if (isset($byName[$name])) {
        throw new Exception
          ("Two Butterfly chains defined with the same name: $name");
      }
      $byName[$name] = $chain;
    }
    return $byName;
  }

  protected function getChainByName ($name) {
    if (! isset($this->chains[$name])) {
      throw new Exception
        ("ButterflyFactory: no chain with specified name: $name");
    }
    return $this->chains[$name];
  }


  protected function createCacheFactory ($cacheDir) {
    return new ButterflyCacheFactory ($cacheDir);
  }

  protected function createButterflyObject ($transformer, $xmlDoc, $cache) {
    return new Butterfly ($transformer, $xmlDoc, $cache);
  }

  public function createButterfly ($xsltFilepath, $xmlDoc, $cache) {
    return $this->createButterflyObject
      ($this->createTransformerFromFilepath ($xsltFilepath), $xmlDoc, $cache);
  }
 
  protected function createTransformerFromFilepath ($xsltFilepath) {
    return new ButterflyTransformer (new ButterflyXmlFile($xsltFilepath));
  }

  public function getButterfly ($chainName) {

    $chain = $this->getChainByName ($chainName);  

    $source = $this->createChainSource ($chain);

    $invalidateCache = false;
    for ($i = 0; $i < count($chain->xslt); $i++) {
      $xslt = $chain->xslt[$i];

      $cache = $this->createButterflyCache
        ($this->createCacheFilename ((string) $chain->name, $i));

      if ($invalidateCache && $cache->isPresent()) {
        $cache->delete();
      }

      $source = $this->createButterfly
        ((string) $xslt['file'], $source, $cache);

      $invalidateCache = $invalidateCache || ! $cache->isPresent();
    }

    return $source;
  }

  protected function createChainSource ($chain) {

    if (! isset($chain->source)) {
      throw new Exception
        ("Butterfly chain has no source element: " . $chain->name);
    }

    // Iterating $chain->source->arg visits every <arg> element in order.
    $argsAsStrings = array();
    foreach ($chain->source->arg as $arg) {
      $argsAsStrings[] = (string) $arg;
    }

    return $this->createSourceFromType ((string) $chain->source['type'],
                    $argsAsStrings);
  }

  protected function createCacheFilename ($chainName, $xsltNumber) {
    return $this->cacheDir . '/' . $chainName . '_' . $xsltNumber . '.cache';
  }

  protected function createSourceFromType ($type, $args) {
    return call_user_func_array (array($type, 'create'), $args);
  }

  protected function createButterflyCache ($cacheFilePath) {
    return new ButterflyCache ($cacheFilePath);
  }    
}

A ButterflyFactory is created from the path of the configuration file and the path of the cache directory (the directory to hold all of the cached XSLT transformations). It uses the SimpleXML module to read and parse the XML configuration file into a PHP object, and then creates an associative array that maps each chain's name to its definition for easy lookup.
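
The lookup-table step can be tried in isolation. The sketch below parses a configuration in the format of Listing 8 with SimpleXML and indexes the chains by name. The configuration is supplied as an inline string for convenience, and the second chain ("cover-letter") and its file names are invented for illustration.

```php
<?php
// Inline stand-in for a configuration file; "cover-letter" is invented.
$configXml = <<<'XML'
<butterfly-config>
  <chain>
    <name>resume</name>
    <source type="ButterflyXmlFile"><arg>resume.xml</arg></source>
    <xslt file="resume-restructured.xsl"/>
    <xslt file="resume-view-to-html.xsl"/>
  </chain>
  <chain>
    <name>cover-letter</name>
    <source type="ButterflyXmlFile"><arg>letter.xml</arg></source>
    <xslt file="letter-to-html.xsl"/>
  </chain>
</butterfly-config>
XML;

$config = simplexml_load_string($configXml);

// Iterating $config->chain visits each <chain> element in document order.
$chainsByName = array();
foreach ($config->chain as $chain) {
  $chainsByName[(string) $chain->name] = $chain;
}

foreach ($chainsByName as $name => $chain) {
  echo $name, ': ', count($chain->xslt), ' stylesheet(s), source type ',
       (string) $chain->source['type'], "\n";
}
```

Casting to string matters here: element and attribute values come back as SimpleXMLElement objects, which cannot be used directly as array keys.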

The heart of ButterflyFactory is the getButterfly() method, which takes the name of a defined chain and returns a ButterflyXmlDocument. Due to the nature of the chain definitions, this will be a Butterfly object that returns the result of the transformation. getButterfly() first looks up the chain by the supplied name (for example, "resume" for the resume chain in the sample configuration), and confirms that it contains a <source> element defining the source XML document object, which can be either a ButterflyXmlFile or some other ButterflyXmlDocument implementation. The type attribute of the <source> element specifies the class to instantiate as the source document object. In the case of the "resume" chain, its source is a ButterflyXmlFile.

Once getButterfly() has the source object, it then loops through the <xslt> elements defined in the <chain>, and creates a Butterfly object from each one. Because a Butterfly takes its source object as one of its arguments, the first Butterfly object is given the source object created from the chain's <source> element, but then the Butterfly object itself is assigned to the $source variable, making it the source object of the next Butterfly created in the loop. This repeats until the last one is returned as the Butterfly to be called for the result. In this way, the chain of Butterflies is created, so that calling the returned Butterfly object causes it to call its source object (possibly a Butterfly), which calls its source object, until the real source object is hit, which returns its contents to its calling Butterfly. The calling Butterfly then applies its XSLT stylesheet and returns the result to its caller, which does the same, until the top of the chain returns or writes to standard output the transformed result.
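
The delegation pattern itself can be sketched without XSLT. In the following example, TextSource and Stage are stand-in classes invented for illustration; each Stage applies an ordinary PHP string function where a Butterfly would apply a stylesheet, and the loop reassigns $source in the same way as the loop in getButterfly().

```php
<?php
// Stand-ins for illustration only: a Stage applies a PHP callable where a
// Butterfly would apply an XSLT stylesheet.
class TextSource {
  private $text;
  public function __construct($text) { $this->text = $text; }
  public function getContents() { return $this->text; }
}

class Stage {
  private $source;
  private $transform;

  public function __construct($source, $transform) {
    $this->source = $source;
    $this->transform = $transform;
  }

  // Pulls the contents from the source (which may itself be a Stage),
  // then applies this stage's transformation.
  public function getContents() {
    return call_user_func($this->transform, $this->source->getContents());
  }
}

$source = new TextSource('  caterpillar ');
foreach (array('trim', 'strtoupper', 'strrev') as $transform) {
  $source = new Stage($source, $transform);  // previous stage becomes the source
}

// Calling the last stage pulls the text through every earlier stage.
echo $source->getContents(), "\n";
```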

This loop also keeps an $invalidateCache flag up to date, so that if a Butterfly's cache is deleted, it deletes the caches of the rest of the Butterfly objects above it in the chain. This way, if a cache file is removed at any step of the transformation chain, it has the effect of invalidating all dependent Butterfly cache files, forcing their recreation.
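
The flag logic can be expressed as a small pure function. In this sketch, cachesToInvalidate() is a hypothetical helper that takes, for each stage in order, whether its cache file currently exists, and returns the indexes of the caches the loop would delete.

```php
<?php
// Hypothetical helper mirroring the flag logic: given which stage caches
// currently exist, return the indexes of the caches that must be deleted.
function cachesToInvalidate(array $present) {
  $invalidate = false;
  $toDelete = array();
  foreach ($present as $index => $isPresent) {
    if ($invalidate && $isPresent) {
      $toDelete[] = $index;
      $isPresent = false;   // once deleted, it counts as missing
    }
    // A missing cache at this stage invalidates every later stage.
    $invalidate = $invalidate || !$isPresent;
  }
  return $toDelete;
}

// The cache at index 1 is missing, so the caches at 2 and 3 must be rebuilt.
print_r(cachesToInvalidate(array(true, false, true, true)));
```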


Next steps

At this point the framework does a tidy job of handling the mechanics of XSLT stylesheet transformation, including the creation of XSLT chains and the use of cache files to eliminate the performance hit of XSLT transformations when used to render HTML pages. The next steps in the framework design are to provide a more elegant cache management interface (currently the user must delete the right cache file to force a refresh) and to provide other XML sources, such as one that derives its contents from a SQL query. A class such as ButterflyXmlSqlSource could become a lightweight framework itself, because of the configuration that might be required to specify the OR mapping that turns the relational SQL results into a hierarchical XML document. Other experiments could include the use of a SQL database for the cache files too, though reading from disk with fpassthru() is likely to remain the faster option.
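
One possible direction for automatic cache management, offered here only as a sketch and not as part of Butterfly, is to compare file modification times: a cache file counts as fresh only if it is at least as new as the source document and every stylesheet it was derived from. isCacheFresh() and the temporary files below are assumptions made for the example.

```php
<?php
// Hypothetical helper, not part of Butterfly: a cache file is fresh when it
// exists and is at least as new as every file it was derived from.
function isCacheFresh($cachePath, array $dependencyPaths) {
  if (!file_exists($cachePath)) {
    return false;
  }
  $cacheTime = filemtime($cachePath);
  foreach ($dependencyPaths as $path) {
    if (!file_exists($path) || filemtime($path) > $cacheTime) {
      return false;
    }
  }
  return true;
}

$dir = sys_get_temp_dir();
$source = tempnam($dir, 'src');
$stylesheet = tempnam($dir, 'xsl');
$cache = tempnam($dir, 'cch');

// Set explicit modification times so the comparison is deterministic.
touch($source, 1000);
touch($stylesheet, 1000);
touch($cache, 2000);
clearstatcache();
$freshBefore = isCacheFresh($cache, array($source, $stylesheet));

touch($stylesheet, 3000);   // the stylesheet was edited after caching
clearstatcache();
$freshAfter = isCacheFresh($cache, array($source, $stylesheet));

var_dump($freshBefore, $freshAfter);
array_map('unlink', array($source, $stylesheet, $cache));
```

A check like this could replace manual deletion: a stale cache would be regenerated on the next request, at the cost of a few filemtime() calls per page load.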


Summary

The Butterfly framework is a lightweight approach to simplifying the use of XSLT in PHP 5, allowing for the use of stylesheet chains and caching for performance. As frameworks go it's pretty simple and straightforward, but such a framework removes the mechanics of XSLT stylesheet application from the PHP code, allowing the developer to focus on the heart of the work, which is the XSLT itself. Explore Butterfly further (see Resources).


Resources

  • Butterfly framework project page (download): http://jakemiles.com/butterfly
