DB2 and Ruby on Rails, Part 2: DB2 and pureXML with Ruby on Rails

Native XML support of pureXML with Ruby on Rails -- a powerful combination for Web application development

XML is one of the most common data interchange formats on the Web today. DB2® support for pureXML™, combined with the parsing and generating XML APIs (REXML) available to Ruby on Rails, provides a powerful combination for Web application development. The native XML support in DB2 data servers brings flexibility for storing semi-structured, hierarchical XML documents alongside relational data, through its hybrid database engine compiler and parsers, for both SQL and XQuery. The second article in the Ruby on Rails series demonstrates how to utilize pureXML in our Team Room example from the first article.

John Chun, DB2 Advanced Support Specialist, IBM, Software Group

John Chun is a specialist of the DB2 Advanced Support team working in the area of application development and tooling. He has worked in the IBM DBT Toronto lab for 7 years resolving DB2 application issues with various languages including Java, C, C++, Perl, REXX, C# and others. John has worked on a number of projects involving the DB2 CLI and OLEDB driver, as well as the .NET data provider. John is a DB2 Certified Solutions Expert and Certified Websphere Administrator.



Christine Law, DB2 Advanced Support Specialist, IBM, Software Group

Christine Law is senior DB2 specialist and an IBM Certified Expert at the IBM Toronto Lab, where she is responsible for resolving DB2 applications problems and defects. She has extensive application development experience on Linux, UNIX and Windows platforms with different programming languages and scripting languages, specializing in JDBC, SQLJ, stored procedures and embedded SQL. Her recent interests includes Open Source technologies such as AJAX and Ruby.



Salvador Ledezma, Staff Software Engineer, IBM

Salvador has been working at IBM since 2002 at the Silicon Valley Labs, in San Jose, CA, initially developing J2EE applications for the DB2 z/OS database product. He currently spends his time working on runtime and tooling technologies for IBM's data server products, including DB2 pureXML.



Alex Pitigoi, Advisory Software Engineer, IBM

Alex Pitigoi is an advisory software engineer at the IBM Toronto Lab. He has worked on various software development projects in the Information Management since 1998, focusing on Web technologies and database administration. Most recently, he drove the development of the SQLModel project, now incorporated into the Eclipse Data Tools Project, as well as the overall architecture for the database administrative Web Tools across multiple IBM data servers. Alex also worked on the DB2 Satellite Administration Center, the IBM Express Runtime, and lead the development of the first set of Web Tools delivered for DB2. His current focus is IBM's data servers enablement for new open source technologies (Ruby, Python, PHP).



07 June 2007

Also available in

Introduction

In Part 1 of the Ruby on Rails series, you used Ruby on Rails and DB2 to build a Team Room that enables registered members to share various text documents, image files, and XML documents. To manage the ever-growing collection of shared documents, you learned how to group your documents by categories. Then, you learned how to add a subscription feature so we can notify our users by email when new documents are added to the document category to which they are subscribed. At the end of the first article, members were able to upload different types of files into our Team Room, where the files are stored in the backend DB2 data server. Now, let's enhance our Team Room further to provide more advanced user features and better access to resources.

Updating Part 1 Team Room application for further development

Step 1. Adding user management features

Let's start with a necessary addition to the user model and its underlaying table persistence, that enables proper authentication: a unique userid string and a hashed password using an SHA algorithm with a pseudo-random seed (salt). You can improve this user model further by adding other user attributes (such as active, rank, and so on), while a few controller and view changes were added to allow registration of new users and secured login. Recall that our Rails project is located in D:\rails\teamroom, so all the paths referenced are relative paths within the D:\rails\teamroom directory.

a) Issue ruby script/generate migration add_user_credentials_columns to start the migration process, to add necessary columns to the USERS table.

b) Edit the db/migrate/008_add_user_credentials_columns.rb file to add the necessary columns (see Listing 1):

Listing 1. Edit 008_add_user_credentials_columns.rb
class AddUserCredentialsColumns < ActiveRecord::Migration
  def self.up
    add_column    :users, :userid,      :string, :limit => 8
    add_column    :users, :hash_passwd, :string
    add_column    :users, :salt,        :string
  end

  def self.down
    remove_column :users, :userid
    remove_column :users, :hash_passwd
    remove_column :users, :salt
  end
end

c) Run rake db:migrate to add these new columns to the USERS table.

Step 2. Making subjects available to multiple user subscriptions

In the Team Room described in the first article, each subject can only belong to one subscription. Remember that each subscription is a collection of subjects to which a user has subscribed. This makes our Team Room rather unreal, because once a user selected a specific subject, the selected subject was no longer available to any other users in our Team Room.

Due to the popularity of our Team Room, many of our members would like to subscribe to multiple subjects. So, it only seems right to honor our members' request and allow multiple subscriptions to the same subject. In the updated Team Room, a user can subscribe to many subjects through a single subscription. To implement this change, follow these steps:

  • Previously there was a one-to-many relationship between SUBSCRIPTIONS and SUBJECTS. But each subject can only belonged to one subscription. To lift this restriction, we need to allow a many-to-many relationship between SUBSCRIPTIONS and SUBJECTS.

    Note 1:

    See the "Normalization" section of the IBM DB2 Database for Linux, UNIX, and Windows Information Center.

  • To ensure the database is normalized (see Note 1), a new table SUBJECTS_SUBSCRIPTIONS will be created to join the SUBJECTS and SUBSCRIPTIONS tables. In choosing the name of the intermediate join table, you may notice we have used the Rails convention: Active Record assumes that this join table's name is the concatenation of the two target table names in alphabetical order, and will contain foreign key pairs linking the two target tables.
  • Existing associations will be modified, and new associations will be added to the SUBJECTS_SUBSCRIPTIONS table, subject and subscription models to reflect the changes.

The SUBJECTS_SUBSCRIPTIONS table contains the following columns:

Table 1. SUBJECTS_SUBSCRIPTIONS table columns and description
Column NameData TypeDescription
SUBSCRIPTION_IDIntegerForeign ID to SUBSCRIPTIONS table
SUBJECT_IDIntegerForeign ID to SUBJECTS table

To do by way of migration, follow these steps:

a) Issue ruby script/generate migration create_subjects_subscriptions_table.

b) Edit the db/migrate/009_create_subjects_subscriptions_table.rb file:

Listing 2. Edit 009_create_subjects_subscriptions_table.rb
class CreateSubjectsSubscriptions < ActiveRecord::Migration
  def self.up
    create_table :subjects_subscriptions,  :id   => false do |t|
      t.column :subscription_id, :integer, :null => false
      t.column :subject_id,      :integer, :null => false
    end  
    remove_column :subjects, :subscription_id
    add_index     :subjects_subscriptions, :subject_id
  end

  def self.down
    drop_table    :subjects_subscriptions
    add_column    :subjects, :subscription_id, :integer
  end
end

c) Run rake db:migrate to create the SUBJECTS_SUBSCRIPTIONS table.

e) Replace the existing association belongs_to :subscription in /app/models/subject.rb file with a new association has_and_belongs_to_many :subscriptions.

f) Replace the existing association has_many: subject in /app/models/subscription.rb file with has_and_belongs_to_many :subjects.

Note 2:

When you migrate back, you must ensure that associations with the database object in each model are changed and reverted accordingly.

Step 3. XML data: Customer information

Our marketing department has collected anonymous customer information in XML format in order to analyze customer shopping habits. The following is an example of data that could be collected by retailers for market research.

Listing 3. Sample XML document
<marketinfo xmlns="http://www.ibm.com/developerworks">
  <sales>
    <customer>
      <address>
        <city>Nashville</city>
        <state>TN</state>
        <zip>46808</zip>
      </address>
      <categories>
        <category type='Toys'>
          <item>
            <SKU>2434901</SKU>
          </item>
          <item>
            <SKU>9043272</SKU>
          </item>
        </category>
        <category type='Video Games'>
          <item>
            <SKU>1915216</SKU>
          </item>          
        </category>
      </categories>
      <last_purchase>2007-05-12</last_purchase>
    </customer>
  </sales>
</marketinfo>

Each customer purchase includes an address within the USA or Canada, a goods category detail including the item SKU number, as well as the last purchase date. The product categories include:

  • Apparel
  • Automotive
  • Baby
  • Books
  • Computers
  • Cosmetics
  • Electronics
  • Garden & Patio
  • Home
  • Jewelry
  • Movies
  • Music
  • Pets
  • Pharmacy
  • Sports
  • Toys
  • Video Games

The upcoming section demonstrates how to perform query operations on this XML data using our Team Room application.

You can associate a marketing report in text document format with the above XML customer data. The marketing report can include market analysis or details on how the collection took place.

As you may recall from Part 1, we created a column of type XML to store collected marketing data. To efficiently manage XML data, like traditional SQL data types, DB2 uses the XML data model internally, as the logical data model and as the fundamental unit of physical storage. Furthermore, this data model is exposed to database users when XML data type is specified. While this provides great power and flexibility for managing XML data, especially for XML-centric developers, currently there are some limitations on the types of traditional database administration activities that can be performed on an XML column. For example, a table with an XML column cannot physically be reorganized by DB2, due to XML's hierarchical storage structure. This effectively means that an ALTER operation cannot be used to drop a column on a table that contains an XML column. While this limitation may be removed in a future release, keep this limitation in mind as this is a fundamental operation for Ruby on Rails migrations.

To continue to take advantage of the flexibility of Ruby on Rails migrations, without giving up the power that comes with XML, let's create a separate table to store XML data. Let's call this table XML_CONTENTS. It will store XML documents, and the DOCUMENTS table will continue to store all other associated information. Future addition or removal of columns on the DOCUMENTS table can take place, neither being effected by XML data or effecting XML data in the XML_CONTENTS table.

For further details on current limitations using the XML data type, please refer to the "Restrictions on native XML data store" section within the IBM DB2 Database for Linux, UNIX, and Windows Information Center.

In order to perform this task, let's generate and run the following migration:

a) Run ruby script/generate migration create_xml_contents, which will create the db/migrate/010_create_xml_contents.rb file.

b) Edit the db/migrate/010_create_xml_contents.rb.rb file as follows:

Listing 4. Edit 010_create_xml_contents.rb
class CreateXmlContents < ActiveRecord::Migration
  def self.up
    drop_table :documents
    create_table :documents do |t|
      t.column :name,         :string,   :null => false
      t.column :size,         :integer,  :null => false
      t.column :data,         :binary,   :limit => 2.megabytes
      t.column :content_type, :string,   :null => false
      t.column :created_at,   :timestamp
      t.column :updated_at,   :timestamp
      t.column :platform,     :string,   :limit =>10
      t.column :subject_id,   :integer
      t.column :user_id,      :integer
    end
    create_table :xml_contents do |t|
      t.column :name,        :string
      t.column :data,        :xml,       :null => false
      t.column :document_id, :integer
    end
  end

  def self.down
    drop_table :documents
    drop_table :xml_contents
    create_table :documents do |t|
      t.column :name,           :string,   :null => false
      t.column :size,           :integer,  :null => false
      t.column :data,           :binary,   :limit => 2.megabytes
      t.column :content_type,   :string,   :null => false
      t.column :created_at,     :timestamp
      t.column :updated_at,     :timestamp
      t.column :platform,       :string,   :limit =>10
      t.column :subject_id,     :integer
      t.column :user_id,        :integer
      t.column :xmldata,        :xml,      :null => false
    end
  end
end

c) Run rake db:migrate to drop the existing DOCUMENTS table, create the new XML-data-only table called XML_CONTENTS, and the new DOCUMENTS table (without the XML column).

d) Re-establish a relationship between the new DOCUMENTS table and the XML_CONTENTS table.

First, add the belongs_to :document association to the /app/models/xml_content.rb file generated at Step d.

Second, add the has_one :xml_content association to the /app/models/document.rb file.

e) The upload functionality is similar to what was implemented before in the previous document model (document.rb) which now becomes a parent model, and adds handling for the creation of the child model (xml_content).

The NAME column is populated with the original filename.

Listing 5. Assigning file attributes to DOCUMENTS record
self.name = File.basename(doc_field.original_filename).gsub(/[^\w._-]/, '')
self.content_type  = doc_field.content_type.chomp
self.size = doc_field.size
self.created_at = Time.now

The XML_CONTENTS.DATA column is populated with the specified file.

Listing 6. Assigning XML file content to XML_CONTENTS.DATA
unless self.content_type.include?('text/xml')
  self.data = doc_field.read
else
  content = XmlContent.new
  content.name = self.name
  content.data = doc_field.read
  self.xml_content = content
end

The final /app/models/document.rb should appear as shown in Listing 7.

Listing 7. document.rb
class Document < ActiveRecord::Base
  belongs_to  :user
  belongs_to  :subject
  has_one     :xml_content

  #           values  displayed  |  stored
  PLATFORM_TYPES = [ ['Neutral',    'Any'],
                     ['Windows',    'WinXP'],
                     ['Mac OS X',   'MacOS'],
                     ['Linux',      'Linux']]


  def uploaded_doc=(doc_field)
    self.name = File.basename(doc_field.original_filename).gsub(/[^\w._-]/, '')
    self.content_type  = doc_field.content_type.chomp
    self.size = doc_field.size
    self.created_at = Time.now
    unless self.content_type.include?('text/xml')
      self.data = doc_field.read
    else
      content = XmlContent.new
      content.name = self.name
      content.data = doc_field.read
      self.xml_content = content
    end
  end
end

Now that models for the XML_CONTENTS and DOCUMENTS tables are defined, you must modify the views to handle upload and display functionality. The upload functionality and the view listing the documents was updated to explicitly select the model attributes (columns) to be displayed.

The /app/views/documents/list.rhtml implements this explicit column selection as below:

Listing 8. /app/views/documents/list.rhtml
<table cellpadding="0" cellspacing="0">
  <tr>
   <th>ID</th>
   <th>Document name</th>
   <th>Subject</th>
   <th>Shared by</th>
   <th>Size</th>
   <th>Update at</th>
   <th>Platform</th>
   <th></th>
   <th></th>
   <th></th>
  </tr>
  <% @documents.each_with_index do |document,i| %>
    <% row_class = i%2 ==0 ? "even" : "odd" %>
    <tr class="<%=row_class%>">
    <td><%= document.id %></td>
    <td><%= document.name %></td>
  <% if document.subject %>
    <td><%= link_to "#{document.subject.name}", 
                     :controller => 'subjects', :action => 'list'  %></td>
  <% else %>
    <td></td>
  <% end %>
  <% if document.user %>
    <td><%= link_to "#{document.user.userid}", 
                     :controller => 'users', :action => 'list'  %></td>
  <% else %>
    <td></td>
  <% end %>
    <td><%= number_to_human_size( document.size ) %></td>
    <td><%= document.updated_at.strftime("%d/%m/%Y %I:%M%p") %></td>
    <td><%= document.platform %></td>
    <td><%= link_to 'Show', :action => 'show', :id => document %></td>
    <td><%= link_to 'Edit', :action => 'edit', :id => document %></td>
    <td><%= link_to 'Remove', { :action => 'destroy', :id => document }, 
                     :confirm => 'Are you sure?', :method => :post %></td>
    </tr>
  <% end %>
</table>

Then, the following update was made to the controller for handling documents show functionality for XML data. These entries are added to /app/controllers/documents_controller.rb.

Listing 9. documents_controller.rb
def show
  @document = Document.find(params[:id])
  doc_type = @document.content_type
  unless doc_type.include?('text/xml')
    doc_content = @document.data
  else
    doc_content = @document.xml_content.data
  end
  send_data(doc_content,
            :filename => @document.name,
            :type => doc_type,
            :disposition => "inline")
end

The following update was also made in support to XML document upload, and you may notice the implementation of document to subject and user relationships handling.

Listing 10. documents_controller.rb
def upload
  if params[:document][:uploaded_doc].to_s.empty?
    flash[:notice] = "Please provide a file for upload"
    redirect_to(:action => "new" )
  else
    @document = Document.new(params[:document])
    @subject = params[:subject_name] && params[:subject_name].empty? ? 
               Subject.new :
               Subject.find_by_name(params[:subject_name])

    Document.transaction do
      User.find(session[:user_id]).documents << @document
      @subject.documents << @document
      @subject.size = @subject.documents.size
      if @subject.new_record?
        @subject.name = params[:subject][:name]
        @subject.tag = params[:subject][:tag]
        @subject.description = params[:subject][:description]
        @subject.save
      end

      if @document.save
        flash[:notice] = "Document #{@document.name} successfully created."
        if @document.subject.subscriptions
          SubscriptionMailer.deliver_notify(@document)
        end
        redirect_to :action => 'list'
      else
        render :action => 'new'
      end
    end
  end

Figure 1 below depicts the associations between different models in our Team Room after performing the migrations steps above.

Figure 1. New associations between different models after the changes
Database schema diagram

Uploading XML marketing data to Team Room

Note 3:

Only well-formed XML documents can be imported using IMPORT command.

Now that all necessary schema changes were implemented through migrations, we can upload the XML data into the XML_CONTENTS table. The easiest way to insert multiple XML documents into a table is to use the IMPORT utility in DB2 9 data servers. The IMPORT utility allows to import well-formed XML documents with or without XML schema validation. For our current usage, we need to ensure proper updates to the XML_CONTENTS table, the DOCUMENTS table and the associated entries in the SUBJECTS table. Therefore the scaffold generated should be used to upload data one file at a time in order to maintain all the foreign keys associations.

Note 4:

You can drop the XML schema using following command:
drop xsrobject teamroom.marketinfo

In Figure 2 you can see the new Document view /app/views/documents/new.rhtml shows a document creation associated with the XML content, and also the creation of a new subject which is associated with the new document.

Figure 2. Uploading XML documents into Team Room repository
Uploading new XML document

Now is the right time to upload the remaining XML documents containing simulated marketing data for a few regions in Canada and US to be used in exercising some XQuery and XPath searches. The XML files are located under /test/fixtures along with marketinfo.xsd XML schema to be used for validation purpose. See Downloads to get the updated Team Room application at the end of this article.


Basic CRUD operations with XML data types

Let's perform some basic XML Create, Retrieve, Update, and Delete (CRUD) operations to get a feel for how to manage XML data.

To demonstrate this, pretend that the Marketing department is interested in a survey of the cities where customers live. Perhaps there will be a directed marketing campaign in those cities in order to increase customer awareness and customer return trips to the store. For this purpose we can extract data from our Marketing Info XML documents stored in the XML_CONTENTS table. Furthermore, we have decided to store the survey produced to keep track (on an ongoing basis) of the growing repository of cities served. If we decide to create this database of cities as XML, notice that we do not have to worry about creating another table to keep track of the data. We can compose it as an XML document and simply insert it back as XML in the same column. At some point in the future, the DBA may want to create a new table for separation of application data, but this is strictly a logical and semantic requirement and is not required by the database.

In DB2, XML data can be queried in a number of ways: using SQL, XQuery, or a combination of the two. This example uses XQuery. Using XQuery facilitates returning query results to be used as values for constructing XML documents. Listing 11 shows the XQuery to be executed:

Listing 11. The city survey XQuery
XQUERY
<cities>
   declare default element namespace "http://www.ibm.com/developerworks";
   { for $c in fn:distinct-values(
         db2-fn:xmlcolumn(
        'XML_CONTENTS.DATA')/marketinfo/sales/customer/address/city)
         order by $c
      return <city>{$c}</city>
   }
</cities>

Let's start inside the statement with the function db2-fn:xmlcolumn() and work our way outward in order to understand how this specific XQuery works. The db2-fn:xmlcolumn() is a function that retrieves a sequence from an XML column in the DB2 database currently connected. Here, we are retrieving data from the XML_CONTENTS table, specifically the DATA column. But we don't want all the data, we only want the subset identified by the XPath expression: /marketinfo/sales/customer/address/city.

In other words, we want to look at all the XML documents in every row in the table and select all the city elements that appear in the XPath. This poses a potential problem because more than one customer may live in the same city. To account for that, we use the XQuery function fn:distinct-values(). As the name implies, it will only return a sequence of the distinct city elements, city names will not be repeated. This sequence is assigned to the variable $c.

Before the final step, the cities in $c are ordered. The result is then returned. One of the features that makes XQuery so powerful is that the format of the data returned can be highly customized. Since we currently have a sequence of city names, we wrap each city in a <cities> element. This is a sequence of elements, yet not a valid XML document because it doesn't have a root element. To ensure that a valid XML document is returned, the entire result is placed within a <cities> element so that the data now looks like Listing 12 in ascending order.

Listing 12. Typical XML data returned
<cities>
  <city>Atlanta</city>
  <city>Augusta</city>
  <city>Austin</city>
  <city>Baton Rouge</city>
  <city> ... </city>
</cities>

To learn more about XQuery in DB2 9, please refer to the DB2 XML Guide (a link to which is found in the Resources section at the end of this article).

Now that we have our listing of cities as an XML document, let's insert this document back in to the database. A developer would insert the XML document in just the same way that any type of data is inserted into the database.

Listing 13. Inserting XML documents
class DocumentsController < ApplicationController
  [...]
  def upload
  [...]
      @document = Document.new(params[:document])
      @subject = params[:subject_name] && params[:subject_name].empty? ? 
                 Subject.new :
                 Subject.find_by_name(params[:subject_name])

      Document.transaction do
        User.find(session[:user_id]).documents << @document
        @subject.documents << @document
        @subject.size = @subject.documents.size
        if @subject.new_record?
          @subject.name        = params[:subject][:name]
          @subject.tag         = params[:subject][:tag]
          @subject.description = params[:subject][:description]
          @subject.save
        end
  
        if @document.save
          flash[:notice] = "Document #{@document.name} successfully created."
  [...]
end

As the number of customers at any given moment is not static, we may want to update the list of customer cities periodically, possibly weekly or monthly, so that the list stays up-to-date. In order to do that, you would execute the same XQuery statement. The document that is currently in the database would then be replaced by the latest version of the XML file based on its id. Notice how the existing document is retrieved through a hidden :id parameter passed in the edit view form (/app/views/documents/_form.rhtml) to be used in the DocumentsController:

Listing 14. Updating XML documents
class DocumentsController < ApplicationController
  [...]
  def update
    @document = Document.find(params[:document][:id])
    if @document.update_attributes(params[:document])
      flash[:notice] = 'Document was successfully updated.'
      redirect_to :action => 'show', :id => @document
    else
      render :action => 'edit'
    end
  end
  [...]
end

Internally, DB2 deallocates the existing XML data pages, and the new value is inserted again. So effectively, the entire XML document is replaced with the new, updated document.

Finally, there may come a time when this data is no longer useful or it may be handled by some other mechanism. For example, the DBA may insist on a separation of concerns, such as using a different XML column in a different table. To remove the document, issue a DELETE statement with an appropriate WHERE predicate. In Ruby, it may be done in this fashion:

Listing 15. Deleting XML documents
class DocumentsController < ApplicationController
  [...]
  def destroy
    Document.find(params[:id]).destroy
    redirect_to :action => 'list'
  end
end

Querying XML data

Beyond basic CRUD operations, the power of the XQuery language allows you to be very precise in how you query and how the data is returned. For example, while having the list of cities where customers reside is useful, the marketing department might be more interested in which product categories are active. This can further narrow the focus of a marketing initiative that may not have the budget to blanket all products in a marketing campaign. One metric that might indicate active products is whether recent purchases were made in a particular product category or more specifically, whether these categories were active in a particular time period.

We will start our data mining efforts by posing the question: Which product categories where purchased between April 15, 2007 and April 30, 2007?

Listing 16 is the query used to answer the question the Marketing team posed:

Listing 16. XQuery to obtain product category purchases between April 15 and April 30
XQUERY
declare default element namespace "http://www.ibm.com/developerworks";
<categories>
  { let $categories := fn:distinct-values( 
      for $c in db2-fn:xmlcolumn( "XML_CONTENTS.DATA")/marketinfo/sales/customer
          where xs:date($c/last_purchase) >  xs:date("2007-04-15")
            and xs:date($c/last_purchase) <= xs:date("2007-04-30")
         return $c/categories/category/@type)
      for $c in $categories 
         return <category>{$c}</category>
  }
</categories>

Like previous examples, we want to return a valid XML document, so we will wrap the result of the XQuery with the <categories> root XML element. The enclosed XQuery is separated into two pieces. First, a "let" statement serves as a binding of an expression to the variable $categories. This variable is then used as the context for a for statement. Let's examine each of these individually.

The "let" statement is itself composed of a FLWOR (for, let, where, order by, and return) XQuery expression. As you may have noticed, a FLWOR expression is often compared with a SELECT-FROM-WHERE block in SQL. Here we are iterating over every customer: for $c in db2-fn:xmlcolumn("XML_CONTENTS.DATA") /marketinfo/sales/customer, where the purchase date is greater than April 15, 2007 and less than April 30, 2007. If the purchase meets this criteria, marketing wants to know about it: where xs:date($c/last_purchase) > xs:date("2007-04-15") and xs:date($c/last_purchase) <= xs:date("2007-04-30").

Of those purchases, we return the category type stored as an XML attribute: $c/categories/category/@type. As in the last example, we use the fn:distinct-values() function to avoid duplicate purchases in the same category. This list of categories is what is bound to the variable $categories.

Currently $categories stores a sequence of distinct categories, but since we are returning XML we will wrap each value in the sequence with the <category> XML element. The final output is similar to the output shown in Listing 17, and we know that it has been filtered to include purchases from April 15 to April 30.

Listing 17. Sample data of product categories purchased between April 15 to April 30
<categories>
  <category>Home</category>
  <category>Electronics</category>
  <category>Apparel</category>
  <category>Gifts & Flowers</category>
  <category>Baby</category>
</categories>

This XQuery can be parameterized in order to make the query execute over any range of dates that Marketing desires. Further, to encapsulate this query, you can push the functionality down into a stored procedure, abstracting away the XQuery details for the Rails developer.

In Figure 3 below the Documents list view /app/views/documents/list.rhtml layout includes also the pre-built report forms allowing to exercise XQuery and XPath searches.

Figure 3. Team Room Documents repository and pre-built marketing reports
Team Room Documents repository

Querying XML data on steroids

Now that we have gotten our feet wet, Marketing has become a bit more demanding in their thirst for data. They come back and wonder how many products of a particular category type have been sold in a region containing a particular zip code. The zip code may be an area where there have been a large number of purchases or perhaps a large number of complaints. This can help marketing and product managers determine which products are popular in which sales regions and correlate them with the zip code in question. Perhaps marketing can discern which products to discontinue or on which zip codes to focus marketing efforts based on popularity of a product in a region.

We will use the SQL/XML function XMLQUERY() to determine this. Querying with SQL/XML provides the joint power of both. For example, SQL/XML allows you to do the following operations:

  1. Use predicates on both relational and XML data
  2. Access and extract fragments of XML data
  3. Use aggregation and grouping of XML data on the SQL level
  4. Join relational and XML data
  5. Pass parameters to XQuery expressions

We will use (1), (2), and (5) in our XQuery. For simplicity, let's not parameterize the example and assume we are interested in the "Jewelry" category type in zip code "79081". But to the joy of the Marketing team, the application code provided (see the Downloads section at the end of this article) does contain a version of the query that allows parameters for any category type or zip code.

Listing 18. SQL/XML query to obtain total category type sold for a particular zip code
select name, xmlquery( 
               'declare default element namespace "http://www.ibm.com/developerworks";
                 let $total := sum (
                                    for $i in $t//category 
                                       let $sum := count($i/item)
                                       where $i/@type = "Jewelry"
                                    return $sum
                                    )
                return <total>{$total}</total>'
                   passing data as "t"
                      ) as data 
      from teamroom.documents
      where xmlexists( 
                'declare default element namespace "http://www.ibm.com/developerworks";
                 $t/marketinfo/sales/customer/address[zip = "79081"]'
                    passing data as "t"
                      )

The SELECT statement allows us to return one row per region identified by the name column. Additionally, for each row, we will return a valid XML document containing a <total> root element only. Because we are calling the XQuery inside the SQL language, we must use the SQL/XML function XMLQUERY() and let the XQuery engine know the context upon which to operate. In our previous examples, this was performed using the db2-fn:xmlcolumn() function. Here, we use the clause passing data as t. We identify data as the XML column to operate on and assign it to the variable t. Wherever the variable $t appears in the XQuery, DB2 knows to substitute the XML document in the data column of the current row of the SELECT statement.

Another distinction that made at the SQL level is that we don't want the XMLQUERY() to operate on all XML documents in the column. We would like to limit the query to rows that contain a customer whose zip code is "79081". Once we find those rows, we'll execute the XQuery only on them. Specifically, all customer purchases in this row are queried, whether the zip code is "79081" or not.

The filtering is done with the SQL WHERE clause using the SQL/XML XMLEXISTS() function predicate. The XMLEXISTS predicate determines whether an XQuery expression returns a sequence of one or more elements. In our example, the XQuery expression that we pass to XMLEXISTS() is a simple XPath expression where $t is the XML column data: $t/marketinfo/sales/customer/address[zip = "79081"].

The XPath expression can be interpreted as: return a sequence of address elements whose zip is "79081". If the specified XPath returns an empty sequence, XMLEXISTS returns false. Otherwise, it returns true. Now that SQL has helped obtain a subset of XML documents, the XQuery uses a let binding to a variable $total. $total is assigned the result of a FLWOR expression. For any <category> element in the XML document (for $i in $t//category) whose type is "Jewelry" (where $i/@type = "Jewelry") count the number of sold and assign it to $sum (let $sum := count($i/item)) and return that sum (return $sum).

This returns a sequence of sums for each customer. The result is passed to the sum() function to get the total over all customers for that particular category. We now the $total. Since we want to return XML, we return the whole wrapped in a <total> element (return <total>{$total}</total>'). In the end, the result is similar to: return <total>198</total>'.

For further details on and examples of using of XQuery in DB2 9, please refer to the DB2 XML Guide (see the Resources section for a link to this document.)

Shredding XML data

Consider a scenario where a newly built application is expected to integrate with legacy applications and repositories that are not enabled for native XML. Information stored in an XML document may need to be placed in a relational table object.

Let's consider a scenario in which the results of the market analysis are expected to be contributed into a relational store for analysis by a legacy application. This will require a shredding of the XML data so it can be inserted into relational tables.

As shown, pureXML provides a multitude of facilities to work directly with XML data. When required, pureXML also provides the capability to perform shredding, also known as decomposition. DB2 9 data server provides the following built-in tools to help facilitate XML shredding.

1. XDBDECOMPXML stored procedure

The annotated XML schema decomposition feature in DB2 9 can be used to decompose XML documents into relational tables. As the name implies, it uses annotations in the XML schema as the mapping language to map information in an XML document to relational tables. Since it requires an XML schema, the XML schema document must be stored in the DB2 XSR and designated for decomposition. Decomposing or "shredding" an XML document into the mapped relational columns is done through a DB2 stored procedure (xdbDecompXML) call or using a Command Line Processor (CLP) command.

For details regarding XML shredding support through the Visual Studio Add-in, please refer to "Introduction to annotated XML schema decomposition using the DB2 Visual Studio 2005 Add-in".

Another way to annotate XML schema is to use the DB2 Developer Workbench (DWB). The DWB, a free download, is a comprehensive environment for creating, editing, debugging, deploying, and testing DB2 database applications, including developing stored procedures and user-defined functions. Please see the Downloads section at the end of this article for download information.

One of the components of the DWB is the annotated XML schema decomposition Mapping Editor. Using a simple and intuitive graphical interface, it allows you to map relationships between an XML schema and a relational schema. Graphically mapping XML elements or attributes to relational columns in DB2 automatically annotates the XML schema document. Once the XML schema is saved and then registered in the XSR, you are ready to decompose XML documents into DB2.

A full discussion of the XML schema annotations and the xdbDecompXML set of stored procedures is beyond the scope of this article. To learn more about annotated XML schema decomposition, including such advanced features as conditional decomposition based on content or specifying content transformation to be applied before insertion, please refer to the DB2 9 XML Guide publication (see the Resources section of this article). For those familiar with XML Extender and its method for decomposition, please also refer to the developerWorks article entitled "From DAD to Annotated XML Schema Decomposition" by Mayank Pradhan for further information.

2. XMLTABLE SQL table function

XMLTABLE is a SQL table function that returns a table from the evaluation of XQuery expressions on an XML document. The table that is returned can contain columns of any SQL data type, including XML. Using XMLTABLE, values retrieved from within XML documents, when combined with an INSERT statement, can then be inserted into a relational table. This allows for the same functionality as annotated XML schema decomposition. This is often referred to as an "Insert-from-XMLTABLE" statement.

This provides a simple form of XML shredding and is ideal for storing fragments of XML documents in various columns of relational tables.

The following statement executes an XQuery expression and return values as a table from the DATA column. Each row shows the city, state, zip code and the last purchase date associated with the particular customer.

Listing 19. The XMLTABLE function
SELECT X.CITY, X.STATE, X.ZIP, X.LAST_PURCHASE FROM 
TEAMROOM.XML_CONTENTS, 
    XMLTABLE (XMLNAMESPACES (DEFAULT 'http://www.ibm.com/developerworks'), 
       'db2-fn:xmlcolumn("XML_CONTENTS.DATA")//customer' 
       COLUMNS 
       "CITY"          CHAR(16) PATH './address/city', 
       "STATE"         CHAR(16) PATH './address/state', 
       "ZIP"           CHAR(6)  PATH './address/zip', 
       "LAST_PURCHASE" DATE     PATH './last_purchase') as X

The output of the above XQuery looks like this example:

Listing 20. Sample XMLTABLE output
Baton Rouge      LA               77888  03/10/2007
Baton Rouge      LA               14257  01/07/2007
Richmond         VA               78045  01/26/2007
Oklahoma City    OK               71107  04/13/2007
Tallahassee      FL               41720  04/25/2007
Richmond         VA               39591  03/25/2007
Richmond         VA               36522  03/23/2007
Richmond         VA               32230  02/12/2007
Charleston       WV               33015  02/12/2007
Columbia         SC               72647  01/11/2007
Raleigh          NC               11238  04/02/2007
Nashville        TN               21245  01/06/2007
Fankfort         KY               53793  04/18/2007
Austin           TX               35462  03/13/2007
Columbia         SC               68359  01/01/2007
Jackson          MS               25770  01/20/2007
Little Rock      AR               46342  03/10/2007
Tallahassee      FL               54306  01/20/2007
Charleston       WV               44339  02/20/2007
Frankfort        KY               92403  02/27/2007

<etc ........>

Assuming you have defined a CUSTOMER_INFOS table with proper SQL types defined, you can then insert this customer data extracted from the XML marketing information (in tabular format ) into a relational table by wrapping an INSERT statement around the SELECT statement like this:

Listing 21. Inserting decomposed XML data into relational table
INSERT INTO TEAMROOM.CUSTOMER_INFOS
SELECT X.CITY, X.STATE, X.ZIP, X.LAST_PURCHASE FROM 
TEAMROOM.XML_CONTENTS, 
    XMLTABLE (XMLNAMESPACES (DEFAULT 'http://www.ibm.com/developerworks'), 
       'db2-fn:xmlcolumn("XML_CONTENTS.DATA")//customer' 
       COLUMNS 
       "CITY" VARCHAR(16)    PATH './address/city', 
       "STATE" CHAR(16)      PATH './address/state', 
       "ZIP" CHAR(6)         PATH './address/zip', 
       "LAST_PURCHASE" DATE  PATH './last_purchase') as X

In addition, Ruby is shipped with an XML processor called Ruby Electric XML (REXML). It can be used to tree parse and stream parse XML data. Please see the Resources section of this document for further details.


Living the Web 2.0 dream

Part 1 of this series introduced the ability for a user to subscribe to updates of particular interest. When an update is made, the user receives and email notification. How quaint and so 20th century! In this new world of Web 2.0, someone might be interested in receiving these notifications as an RSS or Atom feed in their favorite reader.

Using DB2 9 pureXML, it is possible to generate such a feed that someone can easily poll using their favorite feed reader. We do this using the SQL/XML XML publishing functions. XML publishing functions are used to construct XML nodes and documents. Both relational and XML data can be used.

Note: Publishing functions are sometimes also referred to as constructor functions.

The XML publishing functions available in DB2 9 are:

  • XMLNAMESPACES
  • XMLELEMENT
  • XMLATTRIBUTE
  • XMLFOREST
  • XMLDOCUMENT
  • XMLCONCAT
  • XMLCOMMENT
  • XMLPI
  • XMLTEXT

Let's provide an Atom feed that notifies users when subscriptions of interest are updated. While we chose an Atom feed, DB2 is equally able to generate an RSS feed. It is just a matter of constructing the feed using the appropriate XML schema of the feed.

Listing 22. The Atom feed stored procedure
CREATE PROCEDURE GET_ATOM_FEED (  )
  DYNAMIC RESULT SETS 1
------------------------------------------------------------------------
-- SQL Stored Procedure 
------------------------------------------------------------------------
P1: BEGIN
  -- Declare cursor
  DECLARE cursor1 CURSOR WITH RETURN FOR
    SELECT XMLSERIALIZE(
      XMLDOCUMENT(
        XMLELEMENT (NAME "feed",
          XMLNAMESPACES(DEFAULT 'http://www.w3.org/2005/Atom'),
          XMLCONCAT (
            XMLELEMENT (NAME "id", 'http://localhost:3000/documents'),
            XMLELEMENT (NAME "title", 'Teamroom Documents'),
            XMLELEMENT (NAME "updated", CURRENT TIMESTAMP),
            XMLELEMENT (NAME "link", 
                        XMLATTRIBUTES('http://localhost:3000/documents/atom_feed' 
                        as "href", 'self' as "rel")),
            XMLELEMENT (NAME "author", 
              XMLCONCAT(
                XMLELEMENT (NAME "name", 'TeamRoom'),
                XMLELEMENT (NAME "email", 'teamroom@developerWorks.ibm.com')
              )
            ),
            XMLAGG (
              XMLELEMENT (NAME "entry",
                XMLCONCAT (
                    XMLELEMENT (NAME "title", name),
                    XMLELEMENT (NAME "id", 'http://localhost:3000/documents/show/' 
                               || CHAR(id)),
                    XMLELEMENT (NAME "updated", updated_at),
                    XMLELEMENT (NAME "link", 'http://localhost:3000/documents/show/' 
                               || CHAR(id)),
                    XMLELEMENT (NAME "category", category),
                    XMLELEMENT (NAME "summary", content_type),
                    XMLELEMENT (NAME "content", XMLATTRIBUTES('text' as "type"), content)
                )
              )
            )
          )
        )
      ) 
      AS CLOB INCLUDING XMLDECLARATION
    )
      FROM (SELECT d.id as id, d.name as name, d.content_type as content_type, 
            d.updated_at as updated_at, s.name as category, s.description as content
          FROM DOCUMENTS d, SUBJECTS s 
          WHERE d.subject_id = s.id
          ORDER BY d.updated_at DESC 
          FETCH FIRST 10 ROWS ONLY) 
      AS doc_list;
      
  -- Cursor left open for client application
  OPEN cursor1;
END P1

We also chose to provide the functionality as a stored procedure so that the Rails developer does not have to worry about the details of generating the Atom feed. The developer will simply call the stored procedure.

Listing 23. Calling the Atom feed stored procedure
class Document < ActiveRecord::Base
  [...]
  def atom_feed
    feed = Document.find_by_sql("call teamroom.get_atom_feed()")
    content = feed[0].attributes["1"]
    send_data(content,
              :filename => 'TeamRoomFeed.atom',
              :type => 'text/xml',
              :disposition => "inline")
  end
  [...]
end

A full discussion of the DB2 XML publishing functions and the Atom Syndication Format Protocol is beyond the scope of this article. Please see the Resources section for further information and some useful links. Additionally, the application available for download contains the full implementation of the Atom feed, including the stored procedure and the application code to generate the feed.

Conclusion

DB2's pureXML ability to store XML data in its native hierarchical format, allows applications to easily talk the common language of XML while providing the performance, scalability, reliability and availability advantages that come with the DB2 relational database management system. When combined with the simplicity and flexibility of the Ruby on Rails framework, a developer can build, deploy, and maintain world class Web 2.0 applications quickly and easily. Development options are flexible to allow shifting information from the relational to the hierarchical format as business needs require. While shifting from one data format to another, performance remains consistently fast because you're employing the same optimization technology, and all other reliability and availability features apply: backup, recovery and scalability provided by DB2 technology.

DB2 pureXML not only supports the first generation of native XML support, but it also stores post-parsed data on disk with node-level granularity which allows high query performance through indexing (the actual document node position), and the usage of XQuery and XPath primitives built into the DB2 query engine. This makes for a very good fit in the Ruby on Rails runtime which provides one of the friendliest set of XML libraries (for instance, REXML, ROXML), and allows for rapid integration of XML-based data stores into a Web application.


Download

DescriptionNameSize
Team room sample codeTeamroom2.zip10KB

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, Open source
ArticleID=228898
ArticleTitle=DB2 and Ruby on Rails, Part 2: DB2 and pureXML with Ruby on Rails
publish-date=06072007