Streamline working with XML in PHP using Service Data Objects

Explore SDO by building a simple blog and RSS feed

Level: Intermediate

Matthew Peters (matthew_peters@uk.ibm.com), Software Engineer, IBM U.K. Laboratories
Caroline Maynard (caroline.maynard@uk.ibm.com), Software Engineer, IBM U.K. Laboratories
Graham Charters (charters@uk.ibm.com), Senior Software Engineer, IBM U.K. Laboratories

12 May 2006

Most PHP programmers will know that much of the functionality they use resides in PHP extensions, which usually either come packaged with their PHP distribution or can be downloaded from the PECL site. One such extension supports Service Data Objects (SDO) for PHP, which in February moved from a beta-level 0.9.0 release to a stable 1.0. Written by some of the original developers of the SDO extension, this article is aimed at the PHP programmer who wants to understand what SDO for PHP is, how it can be used, and how it can streamline working with XML.

By looking at SDOs and their associated interface, you should get a clear idea of the API the SDO extension provides. We then show a working example of using SDOs in a two-part application: a small PHP application that implements a simple Web log (blog) and a part that displays that blog as an RSS feed. Both parts use SDOs as a way of working with XML. We hope you will agree that SDO is an attractive option for working with XML data in PHP.

An introduction to SDO

Let's begin by pulling apart the term Service Data Object (SDO) to see how the name came about.

SDOs are PHP V5 objects. Unlike ordinary PHP V5 objects, SDOs are intended only to carry data and not to have application methods or functionality defined on them. Hence, they are Data Objects. They were devised as a way of making data available to an application program while making the format independent of its original source, so the data would be structured and manipulated in the same way regardless of whether it came from a relational database or XML. In some loose way, this made them Service Data Objects. Today we think of them as useful in service-oriented applications: When data with a complex structure needs to be exchanged between two components in a service-oriented application, SDOs are a good way to do it.

SDOs are a generalization of the Data Transfer Object pattern. If you search for "Data Transfer Object" online, you will quickly come across the definition in Martin Fowler's Patterns of Enterprise Application Architecture: "An object that carries data between processes in order to reduce the number of method calls." A Data Transfer Object is a way to package a collection of values in a single object so they can be passed around economically.

SDOs extend the Data Transfer Object pattern in several ways, making something more powerful than the simple pattern. They still have this in common with Data Transfer Objects, though: They are only carriers of data and do not have application methods or business logic defined for them.

There are three important ways in which SDOs add to the basic Data Transfer Object pattern:

  • They come not just as individual data objects but also as collections of objects that can refer to one another. This makes them useful for representing the kind of structured data that when stored would commonly be held in an XML file.
  • A collection of SDOs that has been altered also maintains a record of its original values, enabling certain sorts of optimistic locking algorithms.
  • SDOs are objects that only have meaning when they are in memory or when serialized and in transmission, but data that remains only in memory or in transmission and never makes it out to a back-end store is not always all that useful. The PHP implementation of SDO includes two so-called Data Access Services (DASes), which have the job of getting data from some back-end store and turning it into a graph of SDOs or putting data from a graph of SDOs back to the store again. One DAS is for working with data in relational databases, and one is for working with data in XML files.

Incidentally, the implementation of SDO for PHP is one of a family in that there are also implementations for C++ and for Java™ technology.

Anatomy of an SDO

An SDO is a PHP V5 object. Like any PHP V5 object, an SDO has a set of properties, but unlike a normal PHP V5 object, an SDO is just a container for data values and cannot have application methods defined on it. You do not code a class definition for SDOs, nor do you call a constructor to create them.

var_dump()

Leaving aside for a moment the question of how they are created, if we were to use the var_dump() function on an SDO, we would see that it is an instance of SDO_DataObjectImpl, and we would see the property names and their values. If we have an SDO called $author and use var_dump() on it, we might see:


Listing 1. var_dump() of an SDO

object(SDO_DataObjectImpl)#2 (3) {
  ["name"]=>
  string(19) "William Shakespeare"
  ["dob"]=>
  string(28) "April 1564, most likely 23rd"
  ["pob"]=>
  string(33) "Stratford-upon-Avon, Warwickshire"
}

This data object has three properties that are all PHP strings: name, dob (date of birth), and pob (place of birth). A property of an SDO can hold any of the simple PHP scalar types (string, integer, float, or Boolean), NULL, or a reference to another SDO. The type of any given property is fixed, and if a value of a different type is assigned, it will be converted. For example, assigning an integer value to a string property will cause the integer to be converted to a string.

All SDOs have a PHP type of SDO_DataObjectImpl, but SDOs also have an SDO type name. The var_dump() function shows the contents of the data object, but does not show its SDO type name. To see this, we use the getTypeName() method on the object, which in this case prints like this: Author.

We'll see later where this type name was defined to SDO, as well as how the set of properties was specified.

Print

The PHP print instruction on this object, incidentally, produces the same information as var_dump() in a slightly more compact form, all on one line (which we have split into five for readability).


Listing 2. print() of an SDO

object(SDO_DataObject)#2 (3) {
  name=>"William Shakespeare"; 
  dob=>"April 1564, most likely 23rd"; 
  pob=>"Stratford-upon-Avon, Warwickshire"
}

Setting and getting properties

SDOs support object syntax and associative array syntax, so the properties could have been set with object syntax:

$author->name = 'William Shakespeare';
$author->dob  = 'April 1564, most likely 23rd';
$author->pob  = 'Stratford-upon-Avon, Warwickshire';

or with associative array syntax:

$author['name'] = 'William Shakespeare';
$author['dob']  = 'April 1564, most likely 23rd';
$author['pob']  = 'Stratford-upon-Avon, Warwickshire';

Of course, we can get values from the data object as well as set them, using the same two forms:

$name_of_the_author = $author->name;

or

$name_of_the_author = $author['name'];

Many-valued and single-valued properties

A property can be many-valued or single-valued. If a property is many-valued, var_dump() will show it pointing to an SDO_DataObjectList object, which is a list of the individual values. Here is an example where a works property has been added to the Author type and has been defined to comprise a list of strings:


Listing 3. var_dump() of an SDO with a many-valued property

object(SDO_DataObjectImpl)#2 (4) {
  ["name"]=>
  string(19) "William Shakespeare"
  ["dob"]=>
  string(28) "April 1564, most likely 23rd"
  ["pob"]=>
  string(33) "Stratford-upon-Avon, Warwickshire"
  ["works"]=>
  object(SDO_DataObjectList)#4 (2) {
    [0]=>
    string(17) "The Winter's Tale"
    [1]=>
    string(9) "King Lear"
  }
}

In this case, the more succinct print instruction merely indicates that there is a many-valued property called works, but does not expand the property.


Listing 4. Print of an SDO with a many-valued property

object(SDO_DataObject)#2 (4) {
  name=>"William Shakespeare"; 
  dob=>"April 1564, most likely 23rd"; 
  pob=>"Stratford-upon-Avon, Warwickshire"; 
  works[2]
}

To see the contents of the works property, you would need to print it separately.

Because SDO_DataObjectList implements the PHP ArrayAccess interface, many-valued properties behave much like PHP arrays. For example, we might have added these two works to the list:

$author->works[] = "The Winter's Tale";
$author->works[] = "King Lear";

And if we wanted to iterate through the values and print them, we might do something like this:

foreach ($author->works as $work) {
  print $work . "\n";
}

A note of caution: An SDO_DataObjectList does not behave exactly like a PHP array. For example, calling unset on an item in a list causes all subsequent items to be renumbered so that they close the gap; no holes are allowed in the set of indices.
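The difference can be seen by contrasting a plain PHP array with a re-indexed one. The following is a minimal sketch in plain PHP that does not use SDO at all: the variable names are our own, and array_values() is used only to emulate the gap-free numbering the note describes for SDO_DataObjectList.

```php
<?php
// A plain PHP array keeps its remaining keys after unset(), leaving a hole at 0.
$plain = array("The Winter's Tale", 'King Lear');
unset($plain[0]);
print implode(',', array_keys($plain)) . "\n";          // prints 1

// Emulation only: re-indexing with array_values() closes the gap, which is
// the behaviour the text describes for an SDO_DataObjectList.
$like_sdo_list = array("The Winter's Tale", 'King Lear');
unset($like_sdo_list[0]);
$like_sdo_list = array_values($like_sdo_list);
print implode(',', array_keys($like_sdo_list)) . "\n";  // prints 0
print $like_sdo_list[0] . "\n";                         // prints King Lear
```

Code that removes items from a many-valued property while looping over it by index therefore needs to allow for the remaining items changing position.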

Reflection API

We have seen that var_dump() and print show the current set of properties of any given SDO. To get the most complete picture of an SDO -- to see the names and types of properties that have not been set, for instance -- there is a reflection API described in the SDO section of the PHP manual (see Resources).

Connecting SDOs: More complex structures

SDOs are for representing structured data, and a single SDO on its own can only represent so much. SDOs become much more capable when they are connected to form a graph of objects.

SDOs are connected by references from one to another. The reference from one SDO to another is also a property of that SDO. In our previous examples, the properties were all primitives or a list of primitives, but a property may also be a reference to another SDO or a list of references.

Here's a simple example: Once again, we have used var_dump() to dump an SDO called $author, but this time, the name property was defined to be not just a simple string but instead a reference to another SDO. This second SDO represents the author's name with two properties of its own: first and last. When we dump the author object, we see both objects because var_dump() follows the reference from the author object to the name object:


Listing 5. var_dump() of a pair of connected SDOs

object(SDO_DataObjectImpl)#2 (3) {
  ["name"]=>
  object(SDO_DataObjectImpl)#3 (2) {
    ["first"]=>
    string(7) "William"
    ["last"]=>
    string(11) "Shakespeare"
  }
  ["dob"]=>
  string(28) "April 1564, most likely 23rd"
  ["pob"]=>
  string(33) "Stratford-upon-Avon, Warwickshire"
}

Leaving aside the matter of how the two SDOs are created, the second SDO would have been assigned to the name property of the author SDO in the same way as the strings we saw before. The whole sequence might have looked like this:


Listing 6. Assignment of a reference using object syntax

$name->first  = 'William';
$name->last   = 'Shakespeare';

$author->name = $name;	// assign an SDO to the name property of $author
$author->dob  = 'April 1564, most likely 23rd';
$author->pob  = 'Stratford-upon-Avon, Warwickshire';

or like this:


Listing 7. Assignment of a reference using associative array syntax

$name['first']  = 'William';
$name['last']   = 'Shakespeare';

$author['name'] = $name;	// assign an SDO to the name property of $author
$author['dob']  = 'April 1564, most likely 23rd';
$author['pob']  = 'Stratford-upon-Avon, Warwickshire';

where $author and $name are both SDOs.

It is helpful to distinguish clearly the different uses of the word name here and be clear on what is an SDO type name and what is a property name. In a manner that has not been described yet, $author and $name have been created as objects of a certain SDO type. If we call getTypeName() on them, we will see the type names. Let us suppose they are Author and Name. Now, the $author object has a property called name. This has been defined so you can only assign to it objects of SDO type Name. If you attempt to assign an SDO of a different SDO type (another author, perhaps) or a primitive, SDO would throw an exception. There is no type conversion for SDO reference properties. There is inheritance, though: SDO understands an inheritance hierarchy among types, and given a property expecting a given type, you can assign a subtype.

In sum: There is an object called $name, it has an SDO type of Name, and it is assigned to a property of $author called name.

So far so good, but there are two important aspects to SDO reference properties we must describe.

Many- and single-valued reference properties

We saw the first aspect already when we looked at properties containing primitives: Reference properties can be many-valued or single-valued. A many-valued reference property will point to an SDO_DataObjectList object that will be a list of data objects. These data objects will all be of the same type. To illustrate this, suppose that we make the list of the author's works into a many-valued reference to SDOs of type Work, where SDOs of type Work have a title and a rough date of composition. Suppose also that the author keeps the single-valued reference to an SDO of type Name. In this case, the structure of the graph of SDOs might look like this:


Figure 1. SDOs illustrating many- and single-valued reference properties

Containment and noncontainment reference properties

The second important aspect to reference properties is that they can be containment or noncontainment references.

The notion of containment vs. noncontainment is specific to reference properties and needs some explanation. As mentioned, one goal of SDO is that it is possible to represent the sort of structured data usually stored as XML. The basic structure of a well-formed XML document is a hierarchy: There is a single element at the top of the hierarchy, usually called the root element or document element, which typically contains other elements that probably contain other elements in turn. Since any given element is contained within the start and end tag of its containing element, the structure has to form a tree. Let's illustrate this with the author and name example. This is one way the author and name objects on which we used var_dump above might have been represented in XML.


Listing 8. XML instance document illustrating containment

  
<author>
  <name>
    <first>William</first>
    <last>Shakespeare</last>
  </name>
  <dob>April 1564, most likely 23rd</dob>
  <pob>Stratford-upon-Avon, Warwickshire</pob>
</author>


The simple <first> and <last> elements are contained within the <name> element, which is in turn contained within the <author> element. The simple elements <first> and <last> are modeled as primitive (string) properties of the name SDO. The fact that the <name> element is contained within the <author> element is modeled as a containment reference property from the name property of the author SDO to the name SDO.

So, if these inclusion-style relationships are what SDO calls containment references, what is a noncontainment reference? Although not all applications use them, XML also allows a way to express links between elements that are independent of the containment hierarchy, using XML IDs and IDREFs. It is these extra relationships that SDO models as noncontainment references.

To illustrate this, we need a new example and there is one that is used in a number of places in other documents on SDO. The example is that of a company that contains departments, which in turn contain employees. Here is the example expressed as an XML document:


Listing 9. XML instance document illustrating an ID/IDREF relationship (noncontainment)

    
<?xml version="1.0" encoding="UTF-8"?>
<company xmlns="companyNS" 
         xsi:type="CompanyType" 
         xmlns:tns="companyNS" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         name="MegaCorp" 
         employeeOfTheMonth="E0003">
  <departments name="Advanced Technologies" location="NY" number="123">
    <employees name="John Jones" SN="E0001"/>
    <employees name="Jane Doe" SN="E0003"/>
    <employees name="Al Smith" SN="E0004" manager="true"/>
  </departments>
</company>


You will see how the XML models the simple containment hierarchy where a single company element contains a departments element, which in turn contains three employees elements. The departments and employees elements are perhaps unfortunately named; singular rather than plural might have been better. When this document is loaded into memory as a graph of SDOs, the graph will contain five data objects and two data object lists. There will be one company data object, one department data object, and three employee data objects. There will be two list objects -- one for the collection of departments (even though there is only one department, the property is many-valued), and one for the collection of employees. The company object will have a departments property that will point to a list of department objects (containing just one in this case). The department will have an employees property that points to the list of employee objects.

Each data object also has one or more primitive properties; company has a name property derived from the name attribute and has the value 'MegaCorp', the department and employee data objects also have name properties, and so on.

Notice the employeeOfTheMonth attribute on the company element. You will see it contains the serial number of one of the employees. Although we have not shown you the XML schema for this XML document that connects the serial number attribute to the employeeOfTheMonth attribute, this is a use of the XML ID and IDREF. The SN (serial number) is defined as an ID field, and the employeeOfTheMonth as an IDREF. This represents a relationship between these two elements that is independent of the main hierarchy.

We model these ID and IDREF attributes in SDO with noncontainment references. Like ID and IDREF attributes in XML, noncontainment relationships are like an extra layer over the containment hierarchy. It is a rule of SDOs that just as an IDREF cannot refer to an XML element that does not exist somewhere in the document, any SDO in a graph reachable by a noncontainment reference must also be reachable by a containment reference from the root. You cannot point to an SDO only with a noncontainment reference. This aspect of graphs of SDOs, called closure, is checked whenever SDOs are written out to storage by any of the DASes.

Here is a diagram to illustrate the principles. Note how the noncontainment reference is independent of the containment hierarchy. Note that the employee reachable by following the noncontainment relationship is also reachable by following containment references from the root element.


Figure 2. SDOs illustrating containment and noncontainment reference properties

Like containment references, noncontainment references can be many- or single-valued.

They are used about as much as XML's ID and IDREFs, which is to say that some applications will use them a lot, and some never.
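The closure rule described above can be sketched in plain PHP. This is an illustration only, not how the SDO extension or its DASes implement the check: the array layout, the node names, and the functions contained_nodes() and is_closed() are all hypothetical. The data mirrors the company example, with one company, one department, three employees, and a noncontainment reference from the company to employee E0003.

```php
<?php
// Hypothetical stand-in for a graph of SDOs: each node names the nodes it
// contains and the nodes it points to with noncontainment references.
$graph = array(
    'company' => array('contains' => array('dept'), 'refers' => array('E0003')),
    'dept'    => array('contains' => array('E0001', 'E0003', 'E0004'), 'refers' => array()),
    'E0001'   => array('contains' => array(), 'refers' => array()),
    'E0003'   => array('contains' => array(), 'refers' => array()),
    'E0004'   => array('contains' => array(), 'refers' => array()),
);

// Collect every node reachable from $root by following containment only.
function contained_nodes($graph, $root)
{
    $seen  = array();
    $stack = array($root);
    while (count($stack) > 0) {
        $id = array_pop($stack);
        if (isset($seen[$id]) || !isset($graph[$id])) {
            continue;
        }
        $seen[$id] = true;
        foreach ($graph[$id]['contains'] as $child) {
            $stack[] = $child;
        }
    }
    return $seen;
}

// The graph is closed if every noncontainment target is also contained.
function is_closed($graph, $root)
{
    $contained = contained_nodes($graph, $root);
    foreach (array_keys($contained) as $id) {
        foreach ($graph[$id]['refers'] as $target) {
            if (!isset($contained[$target])) {
                return false;
            }
        }
    }
    return true;
}

var_dump(is_closed($graph, 'company'));   // bool(true)

// Point the reference at an employee who is not in any department.
$broken = $graph;
$broken['company']['refers'] = array('E9999');
var_dump(is_closed($broken, 'company'));  // bool(false)
```

The second call fails for the same reason an IDREF with no matching ID would be invalid in the XML document: the target is not part of the containment tree.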





Creating data objects

Now that we have seen how to get and set the properties on a data object and how to connect them, we need to explain how to create data objects.

SDOs are created by calling a data factory object. Before the data factory will create anything, it needs to have the model defined to it -- that is, the set of type names and the properties each type can have. It is this model that constrains the properties types can take, so that, for example, you cannot normally add a property to a data object if that property is not in the model, and you cannot assign a data object of one type to a property that has been specified to take a data object of a different type.

In the PHP implementation of SDO, there are three ways you can get a data factory initialized to start creating data objects. The first two involve the use of the supplied DASes, which presuppose you are probably going to be reading from and writing to a relational database or an XML file. In these circumstances, the DASes create and initialize a data factory, but they keep it hidden and provide an interface themselves for creating data objects. The third way to obtain a data factory is to create it yourself and use the addType() and addPropertyToType() methods to define the model, just as the DASes would. Although this is an unconventional thing to do, we show briefly what this looks like, to give you a clear picture of what goes on under the covers in the supplied DASes.

Here is how to define the two types we used in the examples above:

$data_factory = SDO_DAS_DataFactory::getDataFactory();
$data_factory->addType('NAMESPACE', 'Author');
$data_factory->addType('NAMESPACE', 'Name');

You may remember that we illustrated the getTypeName() call on a $author object and saw the name Author coming out. It is a call to addType() like this that will have set this name in the SDO model.

All types exist in a namespace. Here, the namespace is set to NAMESPACE, but it could just as easily have been left blank or null if no namespace were wanted.

Here is how to add the simple string property first to the Name type:


Listing 10. Adding a primitive property to an SDO type

$data_factory->addPropertyToType (
  'NAMESPACE' , 'Name',              	// adding to NAMESPACE:Name ...
  'first',                           	// ... a property called first ...
  SDO_TYPE_NAMESPACE_URI, 'String',  	// ... to take string values ...
  array('many' => false));           	// ... which is single-valued.
  

The first two arguments to this rather ungainly call specify the namespace and name of the type we're adding a property to -- in this case, NAMESPACE:Name. The third argument is the name of the property we're adding -- in this case, first. The fourth and fifth arguments are the namespace and name of the type the new property can take -- in this case, commonj.sdo:String (the value of the constant SDO_TYPE_NAMESPACE_URI is commonj.sdo; this is the namespace in which primitive types live in the SDO model). The final argument is an associative array of options for the property, such as whether it is single- or many-valued and, for object references (which this is not), whether it is a containment or noncontainment reference.

After the calls so far, you would be able to call this data factory to create a data object of type Name, and it would have one property called first that expects to be assigned string values. We will illustrate this shortly.

Here is how the containment reference from Author to Name would be specified:


Listing 11. Adding a containment reference property to an SDO type

$data_factory->addPropertyToType (
  'NAMESPACE' , 'Author',           	// adding to NAMESPACE:Author ...
  'name',                           	// ... a property called name ...
  'NAMESPACE', 'Name',              	// to take objects of type NAMESPACE:Name ...
  array('many' => false, 'containment' => true));	// ... single-valued and containment.
  

Here, we are adding a property called name to the Author type and constraining it to refer to objects of the Name type. The name property is a single-valued containment reference.

We must stress again that it would be unconventional to define the types and properties to the data factory like this yourself. This is usually done on your behalf by the DASes, which we will meet in the next section.

The other properties we have seen would be added the same way. If we were adding the works property we used in an earlier example, this would be many-valued.

Once the model has been specified to the data factory, we are in a position to create the data objects. The first object we create can only be created by calling the data factory or calling the appropriate method on whichever DAS we are using. Here, we call the data factory to create the top-level object, and we pass the namespace and the type name:

$author = $data_factory->create('NAMESPACE', 'Author');

Once we have one object, there are two ways to create any child data objects. The straightforward way is to call the data factory again to create an object of the appropriate type, then assign it to the property in its parent object.


Listing 12. Creating and assigning second data object with call to data factory

$name = $data_factory->create('NAMESPACE', 'Name');
$author->name = $name;

There is also a shortcut that is more common. Call createDataObject() on an existing object, passing the name of the property to contain the new object.

$name = $author->createDataObject('name');

This call does a look-up in the model to see what sort of data object the property name can point to, creates a data object of that type, assigns that data object to the given property, then returns the created data object. We are effectively using one data object as a data factory for the objects it refers to. Once you have one data object to start with, this is the common way to create others.
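The fragments in this section can be assembled into one script. What follows is a sketch under stated assumptions rather than a definitive implementation: it only does real work when the SDO extension is installed, the helper build_author() and its class_exists() guard are our own additions, and the last, dob, and pob properties are assumed to be defined in the same way as first.

```php
<?php
// build_author() is a hypothetical helper, not part of SDO.
function build_author()
{
    // Assumption: without the SDO extension there is nothing to demonstrate.
    if (!class_exists('SDO_DAS_DataFactory')) {
        return null;
    }

    $ns           = 'NAMESPACE';
    $single       = array('many' => false);
    $data_factory = SDO_DAS_DataFactory::getDataFactory();

    // Define the two types used in the examples.
    $data_factory->addType($ns, 'Author');
    $data_factory->addType($ns, 'Name');

    // Name gets two single-valued string properties.
    foreach (array('first', 'last') as $property) {
        $data_factory->addPropertyToType($ns, 'Name', $property,
            SDO_TYPE_NAMESPACE_URI, 'String', $single);
    }

    // Author gets a single-valued containment reference to Name ...
    $data_factory->addPropertyToType($ns, 'Author', 'name', $ns, 'Name',
        array('many' => false, 'containment' => true));

    // ... and two single-valued string properties.
    foreach (array('dob', 'pob') as $property) {
        $data_factory->addPropertyToType($ns, 'Author', $property,
            SDO_TYPE_NAMESPACE_URI, 'String', $single);
    }

    // Create the top-level object, then use it as a factory for its child.
    $author = $data_factory->create($ns, 'Author');
    $name   = $author->createDataObject('name');

    $name->first = 'William';
    $name->last  = 'Shakespeare';
    $author->dob = 'April 1564, most likely 23rd';
    $author->pob = 'Stratford-upon-Avon, Warwickshire';

    return $author;
}

$author = build_author();
if ($author === null) {
    print "The SDO extension is not available.\n";
} else {
    print $author->getTypeName() . "\n";
    print $author->name->first . ' ' . $author->name->last . "\n";
}
```

Note that createDataObject() both creates the Name object and assigns it to the name property, so no separate assignment to $author->name is needed.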

Data Access Services

Now let's see how to do what we have just done in the more conventional way using one of the DASes. First, a little background.

The job of an SDO DAS is to load data from a store and turn it into a data graph for the application program to work with, then to write a data graph back out again. Two DASes are provided for the PHP implementation of SDO: one for working with data in XML and one for working with a relational database. They are each described in detail in their respective chapters of the PHP documentation, so we will just give a short introduction here. You will also see several examples of using the XML DAS in our example in the second half of this article.

Both DASes need to start by initializing an SDO model, which they do by calling the addType() and addPropertyToType() methods you saw. The model needs to match the types, properties, and relationships that exist in the source XML or relational database, and this information needs to be specified to the DAS. The way this is specified is quite different between the two DASes, though.

The XML DAS always initializes its model by reading and parsing an XML schema definition file (an XSD file) that corresponds to the XML instance document it is going to load. The rules for mapping from an XML schema to an SDO model are given in the SDO V2.0 specification document, but are essentially that any element that is a complex type will become an SDO type in the model. The containment relationships between elements will be reflected by containment references between SDOs. Simple types like strings end up as primitive properties on the SDOs. Attributes also become properties.

The schema definition file is usually passed to the XML DAS when initialized, using the static create() method on the class, like this:

$xmldas = SDO_DAS_XML::create('author.xsd');

Once the model has been defined to the XML DAS, there are loadFile() and saveFile() methods for loading and saving the XML data from or to a file, and loadString() and saveString() for loading and saving it from and to a string. There is also a createDocument() method that creates a document from scratch without needing to start from a loaded file.

The Relational DAS is initialized quite differently. It uses data supplied by the application program and passed to the DAS in associative arrays when created. One parameter is a collection of associative arrays that describe the database: table names, column names, primary keys, and foreign keys. This is information that in principle could be obtained automatically from the database. Although this is not done now, perhaps a future version of the Relational DAS will do so. The other important information the application must supply defines how the data in the database should be mapped to a data graph, which type should be regarded as the top of the graph, which foreign keys should be interpreted as containment properties, and so on.

Once the Relational DAS has absorbed the model, it can be used to load data from the database into memory as SDOs. You give the Relational DAS a SQL query to execute, and it will issue the query and break the result set into a graph of SDOs. If you then update the objects, perhaps adding to or deleting from the graph, and call applyChanges() on the Relational DAS, it will generate the SQL statements necessary to apply the changes back to the database. Much more has been written about the Relational DAS in the online documentation and other developerWorks articles (see Resources).

My half-baked RSS feed

To illustrate the use of SDO and the XML DAS, we will develop a sample application that uses SDO to do all the XML handling. The application is in two parts: a simple blogging application that saves the contents of the blog as an XML file and another part that reads the XML file and republishes it as an RSS feed. You should find that SDO makes working with XML from PHP about as easy as it can be. See Download for the code.

Blogging application

In structure, our blog and our blogging application are simple. The first script puts up an HTML form on which the user will enter a news item with a title and description. Here is the screen with our first entry:


Figure 3. Adding an item to the blog

When we submit this entry, the second script, which is the destination for the form on the first, writes the data to the blog and puts up a screen to verify that the item has been added.


Figure 4. Confirmation

We will look at the scripts, but first, take a look at our blog with just this one entry in it:


Listing 13. Blog with one item

    
<?xml version="1.0" encoding="UTF-8"?>
<blog xsi:type="blog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <item>
    <title>Hello World</title>
    <description>The traditional opening in all fields of computing</description>
    <date>Thu, 6 Apr 2006 15:25:58 BST</date>
    <guid>b787c22d15b34b0eb29305e9ea17d8e9</guid>
    <from_ip>127.0.0.1</from_ip>
  </item>
</blog>


There is one top-level element, <blog>, that contains a number of <item>s. Each item has a title, the main text (the description), a date, and a unique key field called guid (for globally unique ID), which is assigned when the item is created. Although we do not need this for the blog itself, it will be needed later as part of the RSS feed. We assign the guid by creating it as a hash of the date and time the item was entered. To get some idea of who has written to the blog, we also capture the IP address of the originating site, in from_ip.
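The date and guid fields can be tried out on their own in plain PHP, without SDO. This is a minimal sketch: the function make_item_stamp() is hypothetical, and the time zone is pinned to UTC and the timestamp to zero purely so that the output is predictable.

```php
<?php
// Assumption: pin the time zone so the T format character prints a fixed value.
date_default_timezone_set('UTC');

// make_item_stamp() is a hypothetical helper: it builds the date string for
// an item and derives the guid from it with md5().
function make_item_stamp($timestamp)
{
    $date = date('D, j M Y G:i:s T', $timestamp);
    return array('date' => $date, 'guid' => md5($date));
}

$stamp = make_item_stamp(0);
print $stamp['date'] . "\n";          // prints Thu, 1 Jan 1970 0:00:00 UTC
print strlen($stamp['guid']) . "\n";  // prints 32
```

Because the date string has a resolution of one second, two items entered within the same second would receive the same guid. For a single-author sample blog that is unlikely to matter, but a busier application would probably want to mix something else into the hash.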

Since the blog is cumulative and we want to read in a blog like the one above, add an item, and write it out, we use the SDO XML DAS. You will recall that the XML DAS gets its information about the model the SDOs must follow by reading an XML schema file. Accordingly, we must provide one.


Listing 14. XML schema for the blog

        
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="blog">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="item" maxOccurs="unbounded">
           <xs:complexType>
             <xs:sequence>
               <xs:element name="title" type="xs:string"/>
               <xs:element name="description" type="xs:string"/>
               <xs:element name="date" type="xs:string"/>
               <xs:element name="guid" type="xs:string"/>
               <xs:element name="from_ip" type="xs:string"/>
             </xs:sequence>
           </xs:complexType>
         </xs:element>
       </xs:sequence>
     </xs:complexType>
   </xs:element>
 </xs:schema>


You should be able to see that this schema file does indeed correspond to the instance document above. Our blog has a document element called <blog>, containing an unbounded sequence of items with title, description, and so on.
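As a rough check of the mapping rule described earlier (elements with a complex type become SDO types, and simple string elements become primitive properties), the schema can be inspected with PHP's DOM extension. This is only an illustration of the rule, not the algorithm the XML DAS uses: the XPath expressions are our own simplification, and the schema is trimmed to two of the five string elements to keep the example short.

```php
<?php
// Trimmed copy of the blog schema (only title and description are kept).
$xsd = <<<XSD
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="blog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="description" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

$doc = new DOMDocument();
$doc->loadXML($xsd);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('xs', 'http://www.w3.org/2001/XMLSchema');

// Elements with an inline complexType: candidates for SDO types.
$types = array();
foreach ($xpath->query('//xs:element[xs:complexType]') as $element) {
    $types[] = $element->getAttribute('name');
}

// Elements declared as xs:string: candidates for primitive properties.
$primitives = array();
foreach ($xpath->query('//xs:element[@type="xs:string"]') as $element) {
    $primitives[] = $element->getAttribute('name');
}

print implode(', ', $types) . "\n";       // prints blog, item
print implode(', ', $primitives) . "\n";  // prints title, description
```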

Application code

Here is the HTML page that puts up a form to accept the title and description (there is no PHP or SDO needed for this one):


Listing 15. HTML page to capture item for blog

    
<html>
<head>
<title>Add an item to my half-baked blog</title>
</head>

<body>
<p>
<strong>Add an item to my half-baked blog</strong>
<br/>
<br/>

<form method="post" action="additem.php">
  Title:
  <br/>
  <input type="text" size="50" name="title"/>
  <br/>
  Description:
  <br/>
  <textarea rows="5" cols="50" name="description"></textarea>
  <br/>
  <input value="Submit" type="submit"/>
</form>
</p>
</body>
</html>


This links to the second script, additem.php:


Listing 16. PHP script to add item to blog

        
<html>

<head>
<title>An item has been added to my half-baked blog</title>
</head>

<body>

<p><strong>An item has been added to my half-baked blog</strong><br />

<?php
/* initialize the XML DAS and read in the blog */
$xmldas                = SDO_DAS_XML::create('./blog.xsd');

/* read in and parse the XML instance document */
$xmldoc                = $xmldas->loadFile('./blog.xml');
$blog                  = $xmldoc->getRootDataObject();

/* create a new item and copy info from the html form */
$new_item              = $blog->createDataObject('item');
$new_item->title       = $_POST['title'];
$new_item->description = $_POST['description'];
$new_item->date        = date("D\, j M Y G:i:s T");
$new_item->guid        = md5($new_item->date);
$new_item->from_ip     = $_SERVER['REMOTE_ADDR'];
 
/* write the blog back to the file from whence it came */
$xmldas->saveFile($xmldoc,'./blog.xml',2);

echo "Title: "         . $new_item->title;
echo "<br/>";
echo "Description: "   . $new_item->description;
echo "<br/>";
echo "Date: "          . $new_item->date;
?>

</p>
</body>
</html>


This second script shows our first use of SDO and the XML DAS, so it is worth looking at, though there should be few surprises.

The first thing we do is use the SDO_DAS_XML::create() static method to initialize an XML DAS with the schema file containing the description of our blog. The XML DAS will parse the schema file, decide what the SDO model of types and properties looks like, and make calls to addType() and addPropertyToType() on a data factory to initialize it with the SDO model.

The subsequent call to loadFile() then reads in and parses the XML instance document, returning an object to represent the document. Suppose we are adding a second item, and the item we added above with the title Hello World is already saved. Under the covers, loadFile() has made repeated calls to createDataObject() to construct the SDOs. Since we are assuming we are loading a blog with one item in it, the XML DAS will have created one SDO of type blog and one of type item. The blog data object will contain a many-valued containment reference property called item, which points to a list containing the one item. When we call getRootDataObject() on the document object, we get the SDO representing the document element, <blog>. If we were to use var_dump() on it, we would see:


Listing 17. The blog as displayed by var_dump()

object(SDO_DataObjectImpl)#9 (1) {
  ["item"]=>
  object(SDO_DataObjectList)#10 (1) {
    [0]=>
    object(SDO_DataObjectImpl)#11 (5) {
      ["title"]=>
      string(11) "Hello World"
      ["description"]=>
      string(50) "The traditional opening in all fields of computing"
      ["date"]=>
      string(28) "Thu, 6 Apr 2006 15:25:58 BST"
      ["guid"]=>
      string(32) "b787c22d15b34b0eb29305e9ea17d8e9"
      ["from_ip"]=>
      string(9) "127.0.0.1"
    }
  }
}

You should find that the correspondence between the instance document, the schema, and the SDO data graph is clear.

Now in the lines that follow, we create a new item and copy in the item title and description entered on the HTML form. At this time, we also get the current date and time, generate the guid, and save the IP address from which the item was entered. Note that the application creates the new item by calling createDataObject() on the blog SDO, passing the property name item. As explained, this means that under the covers the model is inspected and the property item is found to take SDOs of type item (the first is a property name, the second a type name). A data object of type item is then created and assigned to the many-valued containment reference property item in the SDO $blog.

Finally, we call saveFile() to write the blog back to the XML file from whence it came.
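One point the script above glosses over is that it echoes the submitted title and description straight back to the browser. In anything beyond a local experiment, you would probably want to escape them first. The following is a minimal sketch, assuming the page is served as UTF-8; the helper name h() is hypothetical, and a literal string stands in for $_POST['title'].

```php
<?php
/* Hypothetical helper: escape a value for safe inclusion in HTML. */
function h($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

$title = '<b>Hello</b> & "World"';   // stand-in for $_POST['title']

/* The markup characters in $title are shown as text, not interpreted. */
echo "Title: " . h($title) . "<br/>\n";
```

The same helper could wrap $new_item->title and $new_item->description in the echo statements of Listing 16.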

Suppose we add a second item with the title "What next?" The blog might now look like this:


Listing 18. Blog with the newly added item

    
<?xml version="1.0" encoding="UTF-8"?>
<blog xsi:type="blog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <item>
    <title>Hello World</title>
    <description>The traditional opening in all fields of computing</description>
    <date>Thu, 6 Apr 2006 15:25:58 BST</date>
    <guid>b787c22d15b34b0eb29305e9ea17d8e9</guid>
    <from_ip>127.0.0.1</from_ip>
  </item>
  <item>
    <title>What next?</title>
    <description>At this point we need a witty remark</description>
    <date>Thu, 6 Apr 2006 15:46:27 BST</date>
    <guid>582befcb0ab4ac24c5b7966529097b5a</guid>
    <from_ip>127.0.0.1</from_ip>
  </item>
</blog>


Introduction to RSS

RSS

No article touching on RSS would be complete without saying something on versions of RSS. There are two main strands of RSS, and they differ substantially from one another. The version numbers may surprise, too: 0.92 and 2.0 belong to one camp, while 1.0 belongs to the other. Many feeds we find on the Web are 2.0, and this is what we will use. If you are interested in knowing more about the history of RSS, we recommend the O'Reilly book Developing Feeds with RSS and Atom, by Ben Hammersley, which starts with a detailed history (see Resources).

Here is the briefest of introductions to RSS. Although terse, it might help to understand the structure of the XML document we are trying to create, especially if you have not looked at the contents of an RSS feed before.

An RSS feed is just an XML document intended to provide a summary of some or all of a Web site's contents. An RSS feed typically contains a list of recent articles available on the site, and for each of them a title, a brief summary, and a link to the main article. For our example, we will stick with this news-oriented paradigm and produce an RSS feed from our blog. But once you see the code that generates the feed, you will quickly appreciate that it would be equally simple to produce a summary like this from almost anything that has a simple structure of a list of items.

It is unusual to view the XML file as it stands. Instead, people use a so-called feed reader, a simple program that reads and formats the feed. There are dozens of feed readers. We tested our application with a small but representative sample of Windows® feed readers. Some have quirks, and they do not always agree exactly on how to interpret a feed, so we stuck to a subset of RSS and used it in a way that all feed readers handled properly. The feed readers we used are all free to download. They were Awasu Personal Edition, SharpReader, Mozilla's Thunderbird, and the Live Bookmarks feature in Mozilla Firefox. Of these, we found Awasu the most useful for developing this application, partly because it will go and reread a feed on demand and partly because it will also display the feed exactly as it originally received it.

Schema file for RSS

To work with an XML document and the XML DAS, you need to start with an XML schema file, which the DAS uses to build the SDO model. There is no officially sanctioned schema file for RSS V2.0, although the specification and sample feeds can be found on the Harvard Law site (see Resources). When you search the Web for RSS20.xsd, you find a schema file for RSS V2.0. However, the one we found was quite long, contained many elements we did not want to use, and did not generate the feed exactly as we wanted. As a result, we wrote a much-simplified schema to define just the parts of RSS we wanted:


Listing 19. A simplified schema for subset of RSS V2.0

        
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="rss">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="channel" type="channel" />
      </xs:sequence>
      <!-- attribute declarations must follow the content model -->
      <xs:attribute name="version" default="2.0" />
    </xs:complexType>
  </xs:element>

  <xs:complexType name="channel">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="link" type="xs:string"/>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="copyright" type="xs:string"/>
      <xs:element name="language" type="xs:string"/>
      <xs:element name="webMaster" type="xs:string"/>
      <xs:element name="lastBuildDate" type="xs:string"/>
      <xs:element name="pubDate" type="xs:string"/>
      <xs:element name="item" type="item" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="item">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="link" type="xs:string"/>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="guid" type="guid"/>
      <xs:element name="pubDate" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="guid">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="isPermaLink" use="optional" type="xs:boolean"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

</xs:schema>


If you are not used to looking at XML schema this may be daunting, but it says:

  • A feed consists of a single rss element.
  • The rss element contains a single channel element.
  • The channel element contains zero or more item elements.
  • The rss element has a single attribute: version.
  • Besides the list of items, the channel element has a number of other elements, such as copyright information, that each occur only once.
  • Each item has a title element, a description, a link that will contain a URL to the referenced article, a publication date, and the guid.

The meaning of the guid is described in the RSS specification: it is a string that should be unique to that article. The string can be an opaque identifier, or it can be a URL pointing to the article, in which case the isPermaLink attribute should be set to true, and the feed reader interprets it as a link. We have chosen to generate an opaque identifier and use the <link> element to point back to the article.

Producing the feed

We will describe the application that produces the RSS feed from the blog later. First, let's look at what the feed would look like if the blog we are producing the feed for has just the two items we added above. With luck, you will be able to see how this structure corresponds to the XML schema we wrote for our RSS feed.


Listing 20. The blog when output as an RSS feed

        
<?xml version="1.0" encoding="UTF-8"?>
<rss xsi:type="rss" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0">
  <channel>
    <title>My half-baked feed - RSS/PHP edition</title>
    <link>http://localhost/rss/index.php</link>
    <description>Comment from Yours Truly</description>
    <copyright>All mine!</copyright>
    <language>en-gb</language>
    <webMaster>mfp</webMaster>
    <lastBuildDate>Thu, 6 Apr 2006 15:47:20 BST</lastBuildDate>
    <pubDate>Thu, 6 Apr 2006 15:47:20 BST</pubDate>
    <item>
      <title>Hello World</title>
      <link>http://localhost/rss/showitem.php?id=b787c22d15b34b0eb29305e9ea17d8e9</link>
      <description>The traditional opening in all fields of computing</description>
      <guid isPermaLink="false">b787c22d15b34b0eb29305e9ea17d8e9</guid>
      <pubDate>Thu, 6 Apr 2006 15:25:58 BST</pubDate>
    </item>
    <item>
      <title>What next?</title>
      <link>http://localhost/rss/showitem.php?id=582befcb0ab4ac24c5b7966529097b5a</link>
      <description>At this point we need a witty remark</description>
      <guid isPermaLink="false">582befcb0ab4ac24c5b7966529097b5a</guid>
      <pubDate>Thu, 6 Apr 2006 15:46:27 BST</pubDate>
    </item>
  </channel>
</rss>

This is what this feed will look like when displayed by the Awasu Personal Edition feed reader:


Figure 5. The feed displayed in Awasu

If you click on the titles Hello World or What next? -- which have the small document icons next to them -- Awasu will follow the URL in the <link> element of the item. We will show this below.

Generating the feed

Now let's see the PHP script that generates this feed:


Listing 21. The PHP script that generates feed from the blog

        
<?php
/* Write out the header to indicate XML follows */
header('Content-type: application/xml');

/* Construct an XML DAS using the schema for RSS */
$rss_xmldas             = SDO_DAS_XML::create('./rss.xsd');

/* Load an XML file that contains a few settings */
$rss_document           = $rss_xmldas->loadFile('./base.xml');
$rss_data_object        = $rss_document->getRootDataObject();

/* Set build and publish date on the channel */
$channel                = $rss_data_object->channel;
$channel->lastBuildDate = date("D\, j M Y G:i:s T");
$channel->pubDate       = date("D\, j M Y G:i:s T");

/* Open and load the blog, using a second XML DAS */             
$blog_xmldas            = SDO_DAS_XML::create('./blog.xsd');
$blog_document          = $blog_xmldas->loadFile('./blog.xml');
$blog_data_object       = $blog_document->getRootDataObject();

/* iterate through the items in the blog and for each one
 * create a corresponding item in the rss feed 
 */
foreach ($blog_data_object->item as $item) { 
    $new_channel_item              = $channel->createDataObject('item');

    $new_channel_item->title       = $item->title;
    $new_channel_item->description = $item->description;
    $new_channel_item->pubDate     = $item->date;
    $new_channel_item->link        =
        "http://localhost/rss/showitem.php?id=" . $item->guid;

    $guid                          = $new_channel_item->createDataObject('guid');
    $guid->value                   = md5($new_channel_item->pubDate);
    $guid->isPermaLink             = false;
}

print $rss_xmldas->saveString($rss_document,2);
?> 


Our aim here is to construct a data object for the RSS feed, then read in our blog and create a corresponding item data object in the RSS feed for each blog item. We could decide to limit ourselves to only a few recent articles and cut off the list at a certain date, but here, we just copy all the items.
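The cut-off mentioned above could be sketched roughly as follows. Plain arrays stand in for the item data objects so that the example runs without the SDO extension. The function name recent_items() is hypothetical, and the dates use a numeric zone offset (+0100) rather than the BST abbreviation stored in the blog, to keep parsing by strtotime() unambiguous.

```php
<?php
/* Sketch: keep only the items dated at or after a cut-off timestamp. */
function recent_items(array $items, $cutoff)
{
    $recent = array();
    foreach ($items as $item) {
        $when = strtotime($item['date']);
        /* skip items whose date cannot be parsed */
        if ($when !== false && $when >= $cutoff) {
            $recent[] = $item;
        }
    }
    return $recent;
}

$items = array(
    array('title' => 'Hello World', 'date' => 'Thu, 6 Apr 2006 15:25:58 +0100'),
    array('title' => 'What next?',  'date' => 'Thu, 6 Apr 2006 15:46:27 +0100'),
);
$cutoff = strtotime('Thu, 6 Apr 2006 15:30:00 +0100');

foreach (recent_items($items, $cutoff) as $item) {
    echo $item['title'] . "\n";
}
```

In the real script, the same test would sit inside the foreach loop over $blog_data_object->item, skipping the createDataObject() call for items older than the cut-off.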

First, we write out the appropriate HTTP header to indicate that an XML document follows. Then we construct an XML DAS using our RSS schema and load an XML file to start us off. This XML file contains a few settings that probably do not change from one generation of the feed to another -- copyright statement, title of the feed, and so on. This is just a tidy place to keep these settings. We could just as easily have hardcoded them as assignments to the properties of the channel data object in the script.

After initializing the properties of the RSS feed in this way, we acquire the RSS data object and set a few more properties -- those relating to the time of generation. You can find out what any of these mean in the RSS V2.0 specification.
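The RSS V2.0 specification asks for dates in the RFC 822 style. The script builds that layout by hand, but PHP also provides a predefined DATE_RSS format constant. The sketch below shows it as an alternative; it is not what the downloadable code uses, and rss_date() is a hypothetical name. Note that DATE_RSS emits a two-digit day and hour and a numeric zone offset, so its output differs slightly from the dates shown in the listings.

```php
<?php
/* Sketch: format a UNIX timestamp with PHP's predefined DATE_RSS
 * layout ("D, d M Y H:i:s O"), expressed in GMT. */
function rss_date($timestamp)
{
    return gmdate(DATE_RSS, $timestamp);
}

echo rss_date(time()) . "\n";
```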

The next three statements open and load the blog, and acquire the blog data object from the document. We use a second XML DAS to do this. This is perfectly all right because we now have two DASes, each loaded with a different model. As long as we do not expect one DAS to understand the data objects created by the other, no problems should occur. Incidentally, it might have been possible to load both schema files into the one DAS -- a DAS can manage many schema files -- but that would have meant putting the type names they have in common -- title, description, item, guid -- in separate namespaces. We chose to keep them in separate DASes since we did not want to illustrate the use of namespaces.

Now we have both data graphs loaded in memory. Finishing off the RSS feed is a matter of iterating through the items within the blog, and for each of them, creating a corresponding item within the feed. Note that we cannot copy the items themselves, for not only is the structure of an item in the blog slightly different from that in the feed but we would be copying an item created by one DAS into a graph created by another, which is not allowed.

Once we have all the information in the feed as we want it, the application writes out the entire XML document as a string with saveString(). To format it neatly should it be read by a person, we pass 2 as the second argument to saveString(), which indents the text by two columns at each level.

Here is the base XML file to load:


Listing 22. Base file that contains a few hardcoded values for the feed

    
<?xml version="1.0" encoding="iso-8859-1"?>
<rss 
  xsi:type="rss" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  version="2.0">
  <channel>
    <title>My half-baked feed - RSS/PHP edition</title>
    <link>http://localhost/rss/index.php</link>
    <description>Comment from Yours Truly</description>
    <copyright>All mine!</copyright>
    <language>en-gb</language>
    <webMaster>mfp</webMaster>
  </channel>
</rss> 


A script to show an individual item

The last piece of the puzzle concerns the <link> element in the feed. The <link> element for each item points to a URL http://localhost/rss/showitem.php?... The <link> element is interpreted by any of the feed readers as a link to go to the real article. In our case, we just go to a simple page that displays what we have on the item. Notice that guid is on the end of the <link> value, so showitem.php knows which item to display. The script opens the blog, extracts the item with the corresponding guid, and formats it. The code follows:


Listing 23. PHP script to show a single given item


<html>

<head>
<title>Show Item</title>
</head>

<body>
<p><strong>My half-baked feed</strong><br />

<?php
/* open and load the blog */
$xmldas    = SDO_DAS_XML::create('./blog.xsd');
$xmldoc    = $xmldas->loadFile('./blog.xml');
$blog      = $xmldoc->getRootDataObject();

/* get the id of the desired item from the URL */
$id        = $_GET['id'];   // id was inserted in the URL by the feed

/* use XPath to find the right item within the blog */
$item      = $blog["item[guid=$id]"]; 

echo "<br/>";
echo "The following item was added on " . $item->date;
echo " from ip address: " . $item->from_ip;
echo "<br/>";
echo "<br/>";
echo "Title: " . $item->title;
echo "<br/>";
echo "Description: " . $item->description;
echo "<br/>";
   
?>
</p>
</body>
</html>


This script creates a DAS with an XSD, loads a document, and gets the root data object, which is a pattern that should by now be familiar. Then there is just one more aspect of SDO not yet described: the use of an XPath-like expression to find the item we want within the data graph. The whole blog will have been loaded into memory as a data graph, and the expression ["item[guid=$id]"] (remember the $id will have been substituted with the actual value of $id by PHP) will be interpreted as an XPath search string. $item will be assigned the item with the correct ID. SDO implements a capable subset of XPath.
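Because the id comes straight from the URL and is substituted into the search expression, it may be worth checking that it really looks like one of our guids before using it. The following is a minimal sketch, assuming guids are always the 32 lowercase hexadecimal characters that md5() produces; is_valid_guid() is a hypothetical helper, not part of the downloadable code.

```php
<?php
/* Hypothetical check: accept only strings that look like an md5 hash. */
function is_valid_guid($id)
{
    /* \z anchors at the very end, so a trailing newline is rejected */
    return is_string($id) && preg_match('/^[0-9a-f]{32}\z/', $id) === 1;
}

$id = isset($_GET['id']) ? $_GET['id'] : '';

if (is_valid_guid($id)) {
    echo "Looking up item " . $id . "\n";
} else {
    echo "No such item\n";
}
```

In showitem.php, the lookup of $blog["item[guid=$id]"] would then only be attempted in the first branch.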

Here is what it looks like if we follow the link from the first item, again shown using Awasu:


Figure 6. The first item displayed in Awasu

Getting it running

Perhaps you have been content just to read along, but if you want to download the code, get it running on your machine, and perhaps tinker with it, the following tips may help.

First, note that in the PHP script that generates the feed, the link is specified with a full URL. The script expects all the files to be installed in an rss subdirectory of your Web server's document root -- for example, htdocs/rss with Apache.
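If you install the files somewhere else, one way to avoid editing the URL in several places is to build the link from a single base URL. This is only a sketch; item_link() and its parameters are hypothetical names, not part of the downloadable code.

```php
<?php
/* Hypothetical helper: build the link for an item from a base URL
 * (with or without a trailing slash) and the item's guid. */
function item_link($base_url, $guid)
{
    return rtrim($base_url, '/') . '/showitem.php?id=' . rawurlencode($guid);
}

echo item_link('http://localhost/rss/', 'b787c22d15b34b0eb29305e9ea17d8e9') . "\n";
```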

The various feed readers mentioned do not all implement the same logic with regard to interpreting the guid and link, especially if the link is not specified or if isPermaLink is set to true or is not specified. Under some circumstances, we saw SharpReader take the guid and put it on the end of the address of the site that delivered the feed, in order to construct a link. We also saw Awasu simply put a http:// on the front of the guid. The scheme we ended up with, described above, of a link and a guid with isPermaLink false, works fine with all our feed readers.

Some of the feed readers have what you might call a mailbox mentality, so that if they have once seen an item with a given guid, that item will keep appearing in the view they show of the feed, even if the latest version no longer has that item. Similar confusion can occur if two items end up with the same guid, or the guid of an item changes. It is best not to do that, but when experimenting, these things can happen, and they can be difficult to clear up.
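When experimenting, a quick check for repeated guids before publishing a feed can save some of this confusion. The sketch below works on a plain array of guid strings; duplicate_guids() is a hypothetical helper.

```php
<?php
/* Hypothetical helper: return every guid that occurs more than once. */
function duplicate_guids(array $guids)
{
    $dupes = array();
    foreach (array_count_values($guids) as $guid => $count) {
        if ($count > 1) {
            $dupes[] = (string) $guid;
        }
    }
    return $dupes;
}

$guids = array(
    'b787c22d15b34b0eb29305e9ea17d8e9',
    '582befcb0ab4ac24c5b7966529097b5a',
    'b787c22d15b34b0eb29305e9ea17d8e9',
);

foreach (duplicate_guids($guids) as $guid) {
    echo "Duplicate guid: " . $guid . "\n";
}
```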

Usually, deleting the feed from the feed reader is enough, but we found that for Thunderbird, the only way to completely clear out the history of a feed was to delete the directories corresponding to the RSS News & Blogs account under Thunderbird's mail directory.
Note: Take utmost care not to delete anything else you really wanted to keep, such as your mail directory.

You can at least confirm that your RSS feed is being served by checking the Web server's log. If you're working on a UNIX® system, or on a Windows system with MKS toolkit or cygwin installed, you can also follow the log with tail -f.

Conclusion

Our aim was to introduce you to Service Data Objects, to show you the API for working with them, and to illustrate that they make a convenient and natural way to work with XML data from PHP, expressing the structured data from an XML document in a natural form. Although we are not deceived into thinking that we have written the most advanced blogging application possible, or the most advanced RSS feed generator, perhaps you found the application interesting.

You will have seen that to use XML with SDO, you do need an XML schema file from which the XML DAS can initialize the model of types and properties, and perhaps you found that off-putting. If you do have a schema, the SDO API polices all assignments and alterations to the data graph, ensuring that any changes you make to the objects and the data graph will form a document consistent with the schema -- a sort of schema validation on the fly. Perhaps you will find this useful.

We asserted that one of the original objectives with SDO was to provide a way of working with structured data that is independent of the source of the data. We have not illustrated the use of the Relational DAS, but we will leave you with the thought that had we wished to do so, we could have written the application above to work with data in a relational database and not in XML. And had we done so, we would have needed to initialize a Relational DAS, rather than an XML DAS, and we would have done that in a different way, but all the rest of the SDO manipulation would have been identical.

We wish you success in using SDO to work with structured data in PHP.






Download

Description                      Name                     Size  Download method
PHP scripts, XML, and XSD files  os-php-sdo-download.zip  4KB   HTTP


Resources

Learn

Get products and technologies

Discuss


About the authors

Matthew Peters

Matthew Peters works at IBM's development lab in Hursley, England. He has worked in various roles on IBM's CICS® and MQSeries® products, and also spent a number of years working with partners in scientific and technical computing and large-scale parallel processing. In recent years, he worked on the garbage collector in the IBM JVM. He has a degree in mathematics from Queens' College in Cambridge and a master's in software engineering from Oxford University.


Caroline Maynard

Caroline Maynard works at IBM's development lab in Hursley, England, where she has worked in diverse areas, including networking, graphics, and voice. Most recently, she led the development of the IBM Java ORB, which underpins the WebSphere Application Server EJB container. She is interested in the integration of IBM offerings with open source Linux, Apache, MySQL, PHP/Perl/Python (LAMP) technologies. She holds a degree in mathematics from the University of Sussex.


Graham Charters

Graham Charters works at IBM's development lab in Hursley, England. Past roles have included IBM WebSphere® Application Server development, and architecture responsibilities in WebSphere Business Integration, and Adapters. His current interests are in the relationships between open source technologies, such as those of Linux®, Apache, MySQL, PHP/Perl/Python (LAMP) and the WebSphere platform. He holds degrees in computer science, numerical analysis, and machine vision, all from the University of Manchester.



