 | Level: Intermediate John Chun, DB2 Advanced Support Specialist, IBM Christine Law, DB2 Advanced Support Specialist, IBM Salvador Ledezma, Staff Software Engineer,
IBM
Alex Pitigoi, Advisory Software Engineer,
IBM
07 Jun 2007
XML is one of the most common data interchange formats on the Web today.
DB2® support for pureXML™, combined with the parsing and generating XML APIs (REXML)
available to Ruby on Rails, provides a powerful combination for Web application development.
The native XML support in DB2 data servers brings flexibility for storing semi-structured, hierarchical XML documents alongside
relational data, through its hybrid database engine compiler and parsers, for both SQL and XQuery.
The second article in the Ruby on Rails series demonstrates how to utilize pureXML in our Team Room example from the first article.
Introduction
In Part 1 of the Ruby on Rails series, you used Ruby on Rails and DB2 to build a Team Room that enables registered
members to share various text documents, image files, and XML documents.
To manage the ever-growing collection of shared documents, you learned how to group your documents by categories.
Then, you learned how to add a subscription feature so we can notify our users by email when new documents
are added to the document category to which they are subscribed. At the end of the first article,
members were able to upload different types of files into our Team Room, where the files are
stored in the backend DB2 data server. Now, let's enhance our Team Room further to
provide more advanced user features and better access to resources.
Updating Part 1 Team Room application for further development
Step 1. Adding user management features
Let's start with a necessary addition to the user model and its underlaying table
persistence, that enables proper authentication: a unique userid string and a
hashed password using an SHA algorithm with a pseudo-random seed (salt).
You can improve this user model further by adding
other user attributes (such as active, rank, and so on), while a few controller and view changes were
added to allow registration of new users and secured login. Recall that our Rails project is located in
D:\rails\teamroom, so all the paths referenced are relative paths within the D:\rails\teamroom
directory.
a) Issue ruby script/generate migration add_user_credentials_columns
to start the migration process, to add necessary columns to the USERS table.
b) Edit the db/migrate/008_add_user_credentials_columns.rb file to add the necessary columns (see Listing 1):
Listing 1. Edit 008_add_user_credentials_columns.rb
class AddUserCredentialsColumns < ActiveRecord::Migration
def self.up
add_column :users, :userid, :string, :limit => 8
add_column :users, :hash_passwd, :string
add_column :users, :salt, :string
end
def self.down
remove_column :users, :userid
remove_column :users, :hash_passwd
remove_column :users, :salt
end
end
|
c) Run rake db:migrate to add these new columns to the USERS table.
Step 2. Making subjects available to multiple user subscriptions
In the Team Room described in the first article, each subject can only belong to one subscription.
Remember that each subscription is a collection of
subjects to which a user has subscribed. This makes our Team Room rather unreal, because once a user selected a
specific subject, the selected subject was no longer available to any other users in our Team Room.
Due to the popularity of our Team Room, many of our members would like to subscribe to
multiple subjects. So, it only seems right to honor our members' request and allow
multiple subscriptions to the same subject. In the updated Team Room, a user can subscribe to many subjects through
a single subscription. To implement this change, follow these steps:
- Previously there was a one-to-many relationship between SUBSCRIPTIONS and SUBJECTS.
But each subject can only belonged to one subscription. To lift this restriction, we need to allow a
many-to-many relationship between SUBSCRIPTIONS and SUBJECTS.
- To ensure the database is normalized (see Note 1), a new table SUBJECTS_SUBSCRIPTIONS
will be created to join the SUBJECTS and SUBSCRIPTIONS tables. In choosing the name of the intermediate join table,
you may notice we have used the Rails convention: Active Record assumes that this join table's name
is the concatenation of the two target table names in alphabetical order, and will contain foreign key pairs
linking the two target tables
- Existing associations will be modified, and new associations will be added to the SUBJECTS_SUBSCRIPTIONS table, subject and
subscription models to reflect the changes.
 | |
Note 1: See the "Normalization" section of the IBM DB2 Database for Linux, UNIX, and Windows Information Center.
|
|
The SUBJECTS_SUBSCRIPTIONS table contains the following columns:
Table 1. SUBJECTS_SUBSCRIPTIONS table columns and description
| Column Name | Data Type | Description |
|---|
| SUBSCRIPTION_ID | Integer | Foreign ID to SUBSCRIPTIONS table |
|---|
| SUBJECT_ID | Integer | Foreign ID to SUBJECTS table |
|---|
To do by way of migration, follow these steps:
a) Issue ruby script/generate migration create_subjects_subscriptions_table.
b) Edit the db/migrate/009_create_subjects_subscriptions_table.rb file:
Listing 2. Edit 009_create_subjects_subscriptions_table.rb
class CreateSubjectsSubscriptions < ActiveRecord::Migration
def self.up
create_table :subjects_subscriptions, :id => false do |t|
t.column :subscription_id, :integer, :null => false
t.column :subject_id, :integer, :null => false
end
remove_column :subjects, :subscription_id
add_index :subjects_subscriptions, :subject_id
end
def self.down
drop_table :subjects_subscriptions
add_column :subjects, :subscription_id, :integer
end
end
|
c) Run rake db:migrate to create the SUBJECTS_SUBSCRIPTIONS table.
e) Replace the existing association belongs_to :subscription
in /app/models/subject.rb file with a new association has_and_belongs_to_many :subscriptions.
f) Replace the existing association has_many: subject in
/app/models/subscription.rb file with has_and_belongs_to_many :subjects.
 | |
Note 2: When you migrate back, you must ensure that
associations with the database object in each model are changed and reverted accordingly.
|
|
Step 3. XML data: Customer information
Our marketing department has collected anonymous customer information in XML format in order to
analyze customer shopping habits. The following is an example of data that could be collected
by retailers for market research.
Listing 3. Sample XML document
<marketinfo xmlns="http://www.ibm.com/developerworks">
<sales>
<customer>
<address>
<city>Nashville</city>
<state>TN</state>
<zip>46808</zip>
</address>
<categories>
<category type='Toys'>
<item>
<SKU>2434901</SKU>
</item>
<item>
<SKU>9043272</SKU>
</item>
</category>
<category type='Video Games'>
<item>
<SKU>1915216</SKU>
</item>
</category>
</categories>
<last_purchase>2007-05-12</last_purchase>
</customer>
</sales>
</marketinfo>
|
Each customer purchase includes an address within the USA or Canada, a goods category detail including the item SKU number, as well as
the last purchase date. The product categories include:
- Apparel
- Automotive
- Baby
- Books
- Computers
- Cosmetics
- Electronics
- Garden & Patio
- Home
- Jewelry
- Movies
- Music
- Pets
- Pharmacy
- Sports
- Toys
- Video Games
The upcoming section demonstrates how to perform query operations on this XML data using our Team Room application.
You can associate a marketing report in text document format with the above XML customer data.
The marketing report can include market analysis or details on how the collection took place.
As you may recall from Part 1, we created a column of type XML to store collected marketing data.
To efficiently manage XML data, like traditional SQL data types, DB2 uses the XML data model internally, as the logical data model and as the fundamental unit of physical storage. Furthermore, this data model is exposed to database users when XML data type is specified.
While this provides great power and flexibility for managing XML data, especially for XML-centric developers,
currently there are some limitations on the types of traditional database administration activities that can be performed on an XML column.
For example, a table with an XML column cannot physically be reorganized by DB2, due to XML's hierarchical storage structure.
This effectively means that an ALTER operation cannot be used to drop a column on a table that contains an XML column.
While this limitation may be removed in a future release, keep this limitation in mind as this is a fundamental operation for Ruby on Rails migrations.
To continue to take advantage of the flexibility of Ruby on Rails migrations, without giving up the power that comes with XML,
let's create a separate table to store XML data.
Let's call this table XML_CONTENTS. It will store XML documents, and the DOCUMENTS table will continue to store all other associated information.
Future addition or removal of columns on the DOCUMENTS table can take place, neither being effected by XML data or effecting XML data in the XML_CONTENTS table.
For further details on current limitations using the XML data type, please refer to the "Restrictions on native XML data store" section within the IBM DB2 Database for Linux, UNIX, and Windows Information Center.
In order to perform this task, let's generate and run the following migration:
a) Run ruby script/generate migration create_xml_contents, which will create the db/migrate/010_create_xml_contents.rb file.
b) Edit the db/migrate/010_create_xml_contents.rb.rb file as follows:
Listing 4. Edit 010_create_xml_contents.rb
class CreateXmlContents < ActiveRecord::Migration
def self.up
drop_table :documents
create_table :documents do |t|
t.column :name, :string, :null => false
t.column :size, :integer, :null => false
t.column :data, :binary, :limit => 2.megabytes
t.column :content_type, :string, :null => false
t.column :created_at, :timestamp
t.column :updated_at, :timestamp
t.column :platform, :string, :limit =>10
t.column :subject_id, :integer
t.column :user_id, :integer
end
create_table :xml_contents do |t|
t.column :name, :string
t.column :data, :xml, :null => false
t.column :document_id, :integer
end
end
def self.down
drop_table :documents
drop_table :xml_contents
create_table :documents do |t|
t.column :name, :string, :null => false
t.column :size, :integer, :null => false
t.column :data, :binary, :limit => 2.megabytes
t.column :content_type, :string, :null => false
t.column :created_at, :timestamp
t.column :updated_at, :timestamp
t.column :platform, :string, :limit =>10
t.column :subject_id, :integer
t.column :user_id, :integer
t.column :xmldata, :xml, :null => false
end
end
end
|
c) Run rake db:migrate to drop the existing DOCUMENTS table, create the new XML-data-only table called XML_CONTENTS, and the new DOCUMENTS table (without the XML column).
d) Re-establish a relationship between the new DOCUMENTS table and the XML_CONTENTS table.
First, add the belongs_to :document association to the /app/models/xml_content.rb file generated at Step d.
Second, add the has_one :xml_content association to the /app/models/document.rb file.
e) The upload functionality is similar to what was implemented before in the previous document model
(document.rb) which now becomes a parent model, and adds handling for the creation of the child model (xml_content).
The NAME column is populated with the original filename.
Listing 5. Assigning file attributes to DOCUMENTS record
self.name = File.basename(doc_field.original_filename).gsub(/[^\w._-]/, '')
self.content_type = doc_field.content_type.chomp
self.size = doc_field.size
self.created_at = Time.now
|
The XML_CONTENTS.DATA column is populated with the specified file.
Listing 6. Assigning XML file content to XML_CONTENTS.DATA
unless self.content_type.include?('text/xml')
self.data = doc_field.read
else
content = XmlContent.new
content.name = self.name
content.data = doc_field.read
self.xml_content = content
end
|
The final /app/models/document.rb should appear as shown in Listing 7.
Listing 7. document.rb
class Document < ActiveRecord::Base
belongs_to :user
belongs_to :subject
has_one :xml_content
# values displayed | stored
PLATFORM_TYPES = [ ['Neutral', 'Any'],
['Windows', 'WinXP'],
['Mac OS X', 'MacOS'],
['Linux', 'Linux']]
def uploaded_doc=(doc_field)
self.name = File.basename(doc_field.original_filename).gsub(/[^\w._-]/, '')
self.content_type = doc_field.content_type.chomp
self.size = doc_field.size
self.created_at = Time.now
unless self.content_type.include?('text/xml')
self.data = doc_field.read
else
content = XmlContent.new
content.name = self.name
content.data = doc_field.read
self.xml_content = content
end
end
end
|
Now that models for the XML_CONTENTS and DOCUMENTS tables are defined, you must modify the views
to handle upload and display functionality.
The upload functionality and the view listing the documents was updated to explicitly select the
model attributes (columns) to be displayed.
The /app/views/documents/list.rhtml implements this explicit column selection as below:
Listing 8. /app/views/documents/list.rhtml
<table cellpadding="0" cellspacing="0">
<tr>
<th>ID</th>
<th>Document name</th>
<th>Subject</th>
<th>Shared by</th>
<th>Size</th>
<th>Update at</th>
<th>Platform</th>
<th></th>
<th></th>
<th></th>
</tr>
<% @documents.each_with_index do |document,i| %>
<% row_class = i%2 ==0 ? "even" : "odd" %>
<tr class="<%=row_class%>">
<td><%= document.id %></td>
<td><%= document.name %></td>
<% if document.subject %>
<td><%= link_to "#{document.subject.name}",
:controller => 'subjects', :action => 'list' %></td>
<% else %>
<td></td>
<% end %>
<% if document.user %>
<td><%= link_to "#{document.user.userid}",
:controller => 'users', :action => 'list' %></td>
<% else %>
<td></td>
<% end %>
<td><%= number_to_human_size( document.size ) %></td>
<td><%= document.updated_at.strftime("%d/%m/%Y %I:%M%p") %></td>
<td><%= document.platform %></td>
<td><%= link_to 'Show', :action => 'show', :id => document %></td>
<td><%= link_to 'Edit', :action => 'edit', :id => document %></td>
<td><%= link_to 'Remove', { :action => 'destroy', :id => document },
:confirm => 'Are you sure?', :method => :post %></td>
</tr>
<% end %>
</table>
|
Then, the following update was made to the controller for handling documents show functionality for XML data.
These entries are added to /app/controllers/documents_controller.rb.
Listing 9. documents_controller.rb
def show
@document = Document.find(params[:id])
doc_type = @document.content_type
unless doc_type.include?('text/xml')
doc_content = @document.data
else
doc_content = @document.xml_content.data
end
send_data(doc_content,
:filename => @document.name,
:type => doc_type,
:disposition => "inline")
end
|
The following update was also made in support to XML document upload, and you may notice
the implementation of document to subject and user relationships handling.
Listing 10. documents_controller.rb
def upload
if params[:document][:uploaded_doc].to_s.empty?
flash[:notice] = "Please provide a file for upload"
redirect_to(:action => "new" )
else
@document = Document.new(params[:document])
@subject = params[:subject_name] && params[:subject_name].empty? ?
Subject.new :
Subject.find_by_name(params[:subject_name])
Document.transaction do
User.find(session[:user_id]).documents << @document
@subject.documents << @document
@subject.size = @subject.documents.size
if @subject.new_record?
@subject.name = params[:subject][:name]
@subject.tag = params[:subject][:tag]
@subject.description = params[:subject][:description]
@subject.save
end
if @document.save
flash[:notice] = "Document #{@document.name} successfully created."
if @document.subject.subscriptions
SubscriptionMailer.deliver_notify(@document)
end
redirect_to :action => 'list'
else
render :action => 'new'
end
end
end
|
Figure 1 below depicts the associations between different models in our Team Room after performing the migrations steps above.
Figure 1. New associations between different models after the changes
Uploading XML marketing data to Team Room
 | |
Note 3: Only well-formed XML documents can be imported using IMPORT command.
Note 4: You can drop the XML schema using following command:
drop xsrobject teamroom.marketinfo
|
|
Now that all necessary schema changes were implemented through migrations, we can upload the XML data
into the XML_CONTENTS table. The easiest way to insert multiple XML documents into a table is to use the IMPORT utility in DB2 9 data servers.
The IMPORT utility allows to import well-formed XML documents with or without XML schema validation. For our current usage, we need to
ensure proper updates to the XML_CONTENTS table, the DOCUMENTS table and the associated entries in the SUBJECTS table. Therefore the
scaffold generated should be used to upload data one file at a time in order to maintain all the foreign keys associations.
In Figure 2 you can see the new Document view /app/views/documents/new.rhtml
shows a document creation associated with the XML content, and also the creation of a new subject
which is associated with the new document.
Figure 2. Uploading XML documents into Team Room repository
Now is the right time to upload the remaining XML documents containing simulated marketing data
for a few regions in Canada and US to be used in exercising some XQuery and XPath searches.
The XML files are located under /test/fixtures along with marketinfo.xsd XML schema to be used
for validation purpose. See Downloads to get the updated Team Room application at the end of this article.
Basic CRUD operations with XML data types
Let's perform some basic XML Create, Retrieve, Update, and Delete (CRUD) operations
to get a feel for how to manage XML data.
To demonstrate this, pretend that the Marketing department is interested in a survey
of the cities where customers live. Perhaps there will be a directed marketing campaign in
those cities in order to increase customer awareness and customer return trips to the store.
For this purpose we can extract data from our Marketing Info XML documents stored in the XML_CONTENTS table.
Furthermore, we have decided to store the survey produced to keep track (on an ongoing
basis) of the growing repository of cities served. If we decide to create this database of cities
as XML, notice that we do not have to worry about creating another table to keep track of the data. We can compose it as an XML document and simply insert it back as XML in the same column. At some point in the future, the DBA may want to create a new table for separation of application data, but this is strictly a logical and semantic requirement and is not required by the database.
In DB2, XML data can be queried in a number of ways: using SQL, XQuery, or a combination
of the two. This example uses XQuery. Using XQuery facilitates returning query
results to be used as values for constructing XML documents. Listing 11 shows the XQuery to be
executed:
Listing 11. The city survey XQuery
XQUERY
<cities>
declare default element namespace "http://www.ibm.com/developerworks";
{ for $c in fn:distinct-values(
db2-fn:xmlcolumn(
'XML_CONTENTS.DATA')/marketinfo/sales/customer/address/city)
order by $c
return <city>{$c}</city>
}
</cities>
|
Let's start inside the statement with the function db2-fn:xmlcolumn()
and work our way outward in order to understand how this specific XQuery works.
The db2-fn:xmlcolumn() is a function that retrieves a sequence
from an XML column in the DB2 database currently connected. Here, we are retrieving data
from the XML_CONTENTS table, specifically the DATA column. But we don't want all the data,
we only want the subset identified by the XPath expression:
/marketinfo/sales/customer/address/city.
In other words, we want to look at all the XML documents in every row in the table and select
all the city elements that appear in the XPath. This poses a potential problem because more than
one customer may live in the same city. To account for that, we use the XQuery function
fn:distinct-values(). As the name implies, it will only return a
sequence of the distinct city elements, city names will not be repeated.
This sequence is assigned to the variable $c.
Before the final step, the cities in $c are ordered. The result
is then returned. One of the features that makes XQuery so powerful is that the format of
the data returned can be highly customized. Since we currently have a sequence of city names,
we wrap each city in a <cities> element. This is a sequence
of elements, yet not a valid XML document because it doesn't have a root element. To ensure
that a valid XML document is returned, the entire result is placed within a <cities> element so that the data now looks like
Listing 12 in ascending order.
Listing 12. Typical XML data returned
<cities>
<city>Atlanta</city>
<city>Augusta</city>
<city>Austin</city>
<city>Baton Rouge</city>
<city> ... </city>
</cities>
|
To learn more about XQuery in DB2 9, please refer to the DB2 XML Guide (a link to which is found in the Resources section at the end of this article).
Now that we have our listing of cities as an XML document, let's insert this document back
in to the database. A developer would insert the XML document in just the same way that any
type of data is inserted into the database.
Listing 13. Inserting XML documents
class DocumentsController < ApplicationController
[...]
def upload
[...]
@document = Document.new(params[:document])
@subject = params[:subject_name] && params[:subject_name].empty? ?
Subject.new :
Subject.find_by_name(params[:subject_name])
Document.transaction do
User.find(session[:user_id]).documents << @document
@subject.documents << @document
@subject.size = @subject.documents.size
if @subject.new_record?
@subject.name = params[:subject][:name]
@subject.tag = params[:subject][:tag]
@subject.description = params[:subject][:description]
@subject.save
end
if @document.save
flash[:notice] = "Document #{@document.name} successfully created."
[...]
end
|
As the number of customers at any given moment is not static, we may want to update the list
of customer cities periodically, possibly weekly or monthly, so that the list stays up-to-date.
In order to do that, you would execute the same XQuery statement. The document that is currently in the
database would then be replaced by the latest version of the XML file based on its id.
Notice how the existing document is retrieved through a hidden :id parameter passed in the edit view
form (/app/views/documents/_form.rhtml) to be used in the DocumentsController:
Listing 14. Updating XML documents
class DocumentsController < ApplicationController
[...]
def update
@document = Document.find(params[:document][:id])
if @document.update_attributes(params[:document])
flash[:notice] = 'Document was successfully updated.'
redirect_to :action => 'show', :id => @document
else
render :action => 'edit'
end
end
[...]
end
|
Internally, DB2 deallocates the existing XML data pages, and the new value is inserted again. So effectively, the entire XML document is replaced with the new, updated document.
Finally, there may come a time when this data is no longer useful or it may be handled by
some other mechanism. For example, the DBA may insist on a separation of concerns, such as
using a different XML column in a different table. To remove the document, issue a DELETE statement
with an appropriate WHERE predicate. In Ruby, it may be done in this fashion:
Listing 15. Deleting XML documents
class DocumentsController < ApplicationController
[...]
def destroy
Document.find(params[:id]).destroy
redirect_to :action => 'list'
end
end
|
Querying XML data
Beyond basic CRUD operations, the power of the XQuery language allows you to be very precise
in how you query and how the data is returned. For example, while having the list of cities where
customers reside is useful, the marketing department might be more interested in which product
categories are active. This can further narrow the focus of a marketing initiative that may not
have the budget to blanket all products in a marketing campaign. One metric that might indicate
active products is whether recent purchases were made in a particular product category or more
specifically, whether these categories were active in a particular time period.
We will start our data mining efforts by posing the question: Which product categories where
purchased between April 15, 2007 and April 30, 2007?
Listing 16 is the query used to answer the question the Marketing team posed:
Listing 16.
XQuery to obtain product category purchases between April 15 and April 30
XQUERY
declare default element namespace "http://www.ibm.com/developerworks";
<categories>
{ let $categories := fn:distinct-values(
for $c in db2-fn:xmlcolumn( "XML_CONTENTS.DATA")/marketinfo/sales/customer
where xs:date($c/last_purchase) > xs:date("2007-04-15")
and xs:date($c/last_purchase) <= xs:date("2007-04-30")
return $c/categories/category/@type)
for $c in $categories
return <category>{$c}</category>
}
</categories>
|
Like previous examples, we want to return a valid XML document, so we will wrap
the result of the XQuery with the <categories> root XML element.
The enclosed XQuery is separated into two pieces. First, a "let" statement serves as a binding
of an expression to the variable $categories. This variable is then used
as the context for a for statement. Let's examine each of these
individually.
The "let" statement is itself composed of a FLWOR (for, let, where, order by, and return)
XQuery expression. As you may have noticed, a FLWOR expression is often compared with a
SELECT-FROM-WHERE block in SQL. Here we are iterating over every customer:
for $c in db2-fn:xmlcolumn("XML_CONTENTS.DATA")
/marketinfo/sales/customer, where the purchase date is greater than April 15, 2007 and less than April 30, 2007. If the purchase
meets this criteria, marketing wants to know about it:
where xs:date($c/last_purchase) > xs:date("2007-04-15")
and xs:date($c/last_purchase) <= xs:date("2007-04-30").
Of those purchases, we return the category type stored as an XML attribute:
$c/categories/category/@type. As in the last example, we use the
fn:distinct-values() function to avoid duplicate purchases in the
same category. This list of categories is what is bound to the variable
$categories.
Currently $categories stores a sequence of distinct categories,
but since we are returning XML we will wrap each value in the sequence with the
<category> XML element. The final output is similar to the
output shown in Listing 17, and we know that it has been filtered to include purchases from
April 15 to April 30.
Listing 17.
Sample data of product categories purchased between April 15 to April 30
<categories>
<category>Home</category>
<category>Electronics</category>
<category>Apparel</category>
<category>Gifts & Flowers</category>
<category>Baby</category>
</categories>
|
This XQuery can be parameterized in order to make the query execute over any range of dates
that Marketing desires. Further, to encapsulate this query, you can push the functionality down
into a stored procedure, abstracting away the XQuery details for the Rails developer.
In Figure 3 below the Documents list view /app/views/documents/list.rhtml
layout includes also the pre-built report forms allowing to exercise XQuery and XPath searches.
Figure 3. Team Room Documents repository and pre-built marketing reports
Querying XML data on steroids
Now that we have gotten our feet wet, Marketing has become a bit more demanding in their thirst
for data. They come back and wonder how many products of a particular category type have been sold
in a region containing a particular zip code. The zip code may be an area where there have been a
large number of purchases or perhaps a large number of complaints. This can help marketing and
product managers determine which products are popular in which sales regions and correlate them with
the zip code in question. Perhaps marketing can discern which products to discontinue
or on which zip codes to focus marketing efforts based on popularity of a product in a region.
We will use the SQL/XML function XMLQUERY() to determine this. Querying with SQL/XML
provides the joint power of both. For example, SQL/XML allows you to do the
following operations:
- Use predicates on both relational and XML data
- Access and extract fragments of XML data
- Use aggregation and grouping of XML data on the SQL level
- Join relational and XML data
- Pass parameters to XQuery expressions
We will use (1), (2), and (5) in our XQuery. For simplicity, let's not parameterize the
example and assume we are interested in the "Jewelry" category type in zip code "79081".
But to the joy of the Marketing team, the application code provided (see the Downloads section at the end of this article) does contain a version of
the query that allows parameters for any category type or zip code.
Listing 18. SQL/XML query to obtain total category type sold for a particular zip code
select name, xmlquery(
'declare default element namespace "http://www.ibm.com/developerworks";
let $total := sum (
for $i in $t//category
let $sum := count($i/item)
where $i/@type = "Jewelry"
return $sum
)
return <total>{$total}</total>'
passing data as "t"
) as data
from teamroom.documents
where xmlexists(
'declare default element namespace "http://www.ibm.com/developerworks";
$t/marketinfo/sales/customer/address[zip = "79081"]'
passing data as "t"
)
|
The SELECT statement allows us to return one row per region identified by the name column.
Additionally, for each row, we will return a valid XML document containing a <total>
root element only. Because we are calling the XQuery inside the SQL language, we must use the SQL/XML
function XMLQUERY() and let the XQuery engine know the context upon
which to operate. In our previous examples, this was performed using the
db2-fn:xmlcolumn() function. Here, we use the clause passing data
as t. We identify data as the XML column to operate on and
assign it to the variable t. Wherever the variable $t appears in
the XQuery, DB2 knows to substitute the XML document in the data
column of the current row of the SELECT statement.
Another distinction that made at the SQL level is that we don't want the
XMLQUERY() to operate on all XML documents in the column.
We would like to limit the query to rows that contain a customer whose zip code is "79081".
Once we find those rows, we'll execute the XQuery only on them. Specifically, all customer
purchases in this row are queried, whether the zip code is "79081" or not.
The filtering is done with the SQL WHERE clause using the SQL/XML
XMLEXISTS() function predicate. The XMLEXISTS
predicate determines whether an XQuery expression returns a sequence of one or more elements.
In our example, the XQuery expression that we pass to XMLEXISTS()
is a simple XPath expression where $t is the XML column data:
$t/marketinfo/sales/customer/address[zip = "79081"].
The XPath expression can be interpreted as: return a sequence of address elements whose zip
is "79081". If the specified XPath returns an empty sequence, XMLEXISTS
returns false. Otherwise, it returns true. Now that SQL has helped obtain a subset of XML
documents, the XQuery uses a let binding to a variable
$total. $total is assigned the result of
a FLWOR expression. For any <category> element in the
XML document (for $i in $t//category) whose type is "Jewelry"
(where $i/@type = "Jewelry") count the number of sold and assign
it to $sum (let $sum := count($i/item)) and
return that sum (return $sum).
This returns a sequence of sums for each customer. The result is passed to the
sum() function to get the total over all customers for that
particular category. We now the $total. Since we want to
return XML, we return the whole wrapped in a <total> element
(return <total>{$total}</total>'). In the end, the result
is similar to: return <total>198</total>'.
For further details on and examples of using of XQuery in DB2 9, please refer to the DB2 XML Guide (see the Resources section for a link to this document.)
Shredding XML data
Consider a scenario where a newly built application is expected to integrate with legacy applications and repositories that are not enabled for native XML.
Information stored in an XML document may need to be placed in a relational table object.
Let's consider a scenario in which the results of the market analysis are expected to be contributed into a relational store for analysis by a legacy application.
This will require a shredding of the XML data so it can be inserted into relational tables.
As shown, pureXML provides a multitude of facilities to work directly with XML data.
When required, pureXML also provides the capability to perform shredding, also known as decomposition.
DB2 9 data server provides the following built-in tools to help facilitate XML shredding.
1. XDBDECOMPXML stored procedure
The annotated XML schema decomposition feature in DB2 9 can be used to decompose XML documents into relational tables.
As the name implies, it uses annotations in the XML schema as the mapping language to map information in an XML document to relational tables.
Since it requires an XML schema, the XML schema document must be stored in the DB2 XSR and designated for decomposition.
Decomposing or "shredding" an XML document into the mapped relational columns is done through a DB2 stored procedure (xdbDecompXML) call or using a
Command Line Processor (CLP) command.
For details regarding XML shredding support through the Visual Studio Add-in, please refer to "Introduction to annotated XML schema decomposition using the DB2 Visual Studio 2005 Add-in".
Another way to annotate XML schema is to use the DB2 Developer Workbench (DWB).
The DWB, a free download, is a comprehensive environment for creating, editing, debugging, deploying,
and testing DB2 database applications, including developing stored procedures and user-defined functions.
Please see the Downloads section at the end of this article for download information.
One of the components of the DWB is the annotated XML schema decomposition Mapping Editor.
Using a simple and intuitive graphical interface, it allows you to map relationships between an XML schema and a relational schema.
Graphically mapping XML elements or attributes to relational columns in DB2 automatically annotates the XML schema document.
Once the XML schema is saved and then registered in the XSR, you are ready to decompose XML documents into DB2.
A full discussion of the XML schema annotations and the xdbDecompXML set of stored procedures is beyond the scope of this article.
To learn more about annotated XML schema decomposition, including such advanced features as conditional decomposition based on content
or specifying content transformation to be applied before insertion, please refer to the DB2 9 XML Guide publication (see the Resources section of this article).
For those familiar with XML Extender and its method for decomposition, please also refer to the developerWorks article entitled
"From DAD to Annotated XML Schema Decomposition" by Mayank Pradhan for further information.
2. XMLTABLE SQL table function
XMLTABLE is a SQL table function that returns a table from the evaluation of XQuery expressions on an XML document.
The table that is returned can contain columns of any SQL data type, including XML.
Using XMLTABLE, values retrieved from within XML documents, when combined with an INSERT statement,
can then be inserted into a relational table. This allows for the same functionality as annotated XML schema decomposition.
This is often referred to as an "Insert-from-XMLTABLE" statement.
This provides a simple form of XML shredding and is ideal for storing fragments of XML documents in various columns of relational tables.
The following statement executes an XQuery expression and return values as a table from the
DATA column. Each row shows the city, state, zip code and the
last purchase date associated with the particular customer.
Listing 19. The XMLTABLE function
SELECT X.CITY, X.STATE, X.ZIP, X.LAST_PURCHASE FROM
TEAMROOM.XML_CONTENTS,
XMLTABLE (XMLNAMESPACES (DEFAULT 'http://www.ibm.com/developerworks'),
'db2-fn:xmlcolumn("XML_CONTENTS.DATA")//customer'
COLUMNS
"CITY" CHAR(16) PATH './address/city',
"STATE" CHAR(16) PATH './address/state',
"ZIP" CHAR(6) PATH './address/zip',
"LAST_PURCHASE" DATE PATH './last_purchase') as X
|
The output of the above XQuery looks like this example:
Listing 20. Sample XMLTABLE output
Baton Rouge LA 77888 03/10/2007
Baton Rouge LA 14257 01/07/2007
Richmond VA 78045 01/26/2007
Oklahoma City OK 71107 04/13/2007
Tallahassee FL 41720 04/25/2007
Richmond VA 39591 03/25/2007
Richmond VA 36522 03/23/2007
Richmond VA 32230 02/12/2007
Charleston WV 33015 02/12/2007
Columbia SC 72647 01/11/2007
Raleigh NC 11238 04/02/2007
Nashville TN 21245 01/06/2007
Fankfort KY 53793 04/18/2007
Austin TX 35462 03/13/2007
Columbia SC 68359 01/01/2007
Jackson MS 25770 01/20/2007
Little Rock AR 46342 03/10/2007
Tallahassee FL 54306 01/20/2007
Charleston WV 44339 02/20/2007
Frankfort KY 92403 02/27/2007
<etc ........>
|
Assuming you have defined a CUSTOMER_INFOS table with proper SQL types defined,
you can then insert this customer data extracted from the XML
marketing information (in tabular format ) into a relational table by wrapping an INSERT statement around the SELECT statement like this:
Listing 21. Inserting decomposed XML data into relational table
INSERT INTO TEAMROOM.CUSTOMER_INFOS
SELECT X.CITY, X.STATE, X.ZIP, X.LAST_PURCHASE FROM
TEAMROOM.XML_CONTENTS,
XMLTABLE (XMLNAMESPACES (DEFAULT 'http://www.ibm.com/developerworks'),
'db2-fn:xmlcolumn("XML_CONTENTS.DATA")//customer'
COLUMNS
"CITY" VARCHAR(16) PATH './address/city',
"STATE" CHAR(16) PATH './address/state',
"ZIP" CHAR(6) PATH './address/zip',
"LAST_PURCHASE" DATE PATH './last_purchase') as X
|
In addition, Ruby is shipped with an XML processor called Ruby Electric XML (REXML).
It can be used to tree parse and stream parse XML data. Please see the Resources section of this document for further details.
Living the Web 2.0 dream
Part 1 of this series introduced the ability for a user to subscribe to updates of
particular interest. When an update is made, the user receives and email notification.
How quaint and so 20th century! In this new world of Web 2.0, someone might be interested
in receiving these notifications as an RSS or Atom feed in their favorite reader.
Using DB2 9 pureXML, it is possible to generate such a feed that someone can easily poll
using their favorite feed reader. We do this using the SQL/XML XML publishing functions.
XML publishing functions are used to construct XML nodes and documents. Both relational and XML
data can be used.
Note: Publishing functions are sometimes also referred to as constructor functions.
The XML publishing functions available in DB2 9 are:
- XMLNAMESPACES
- XMLELEMENT
- XMLATTRIBUTE
- XMLFOREST
- XMLDOCUMENT
- XMLCONCAT
- XMLCOMMENT
- XMLPI
- XMLTEXT
Let's provide an Atom feed that notifies users when subscriptions of interest are updated.
While we chose an Atom feed, DB2 is equally able to generate an RSS feed. It is just a matter
of constructing the feed using the appropriate XML schema of the feed.
Listing 22. The Atom feed stored procedure
CREATE PROCEDURE GET_ATOM_FEED ( )
DYNAMIC RESULT SETS 1
------------------------------------------------------------------------
-- SQL Stored Procedure
------------------------------------------------------------------------
P1: BEGIN
-- Declare cursor
DECLARE cursor1 CURSOR WITH RETURN FOR
SELECT XMLSERIALIZE(
XMLDOCUMENT(
XMLELEMENT (NAME "feed",
XMLNAMESPACES(DEFAULT 'http://www.w3.org/2005/Atom'),
XMLCONCAT (
XMLELEMENT (NAME "id", 'http://localhost:3000/documents'),
XMLELEMENT (NAME "title", 'Teamroom Documents'),
XMLELEMENT (NAME "updated", CURRENT TIMESTAMP),
XMLELEMENT (NAME "link",
XMLATTRIBUTES('http://localhost:3000/documents/atom_feed'
as "href", 'self' as "rel")),
XMLELEMENT (NAME "author",
XMLCONCAT(
XMLELEMENT (NAME "name", 'TeamRoom'),
XMLELEMENT (NAME "email", 'teamroom@developerWorks.ibm.com')
)
),
XMLAGG (
XMLELEMENT (NAME "entry",
XMLCONCAT (
XMLELEMENT (NAME "title", name),
XMLELEMENT (NAME "id", 'http://localhost:3000/documents/show/'
|| CHAR(id)),
XMLELEMENT (NAME "updated", updated_at),
XMLELEMENT (NAME "link", 'http://localhost:3000/documents/show/'
|| CHAR(id)),
XMLELEMENT (NAME "category", category),
XMLELEMENT (NAME "summary", content_type),
XMLELEMENT (NAME "content", XMLATTRIBUTES('text' as "type"), content)
)
)
)
)
)
)
AS CLOB INCLUDING XMLDECLARATION
)
FROM (SELECT d.id as id, d.name as name, d.content_type as content_type,
d.updated_at as updated_at, s.name as category, s.description as content
FROM DOCUMENTS d, SUBJECTS s
WHERE d.subject_id = s.id
ORDER BY d.updated_at DESC
FETCH FIRST 10 ROWS ONLY)
AS doc_list;
-- Cursor left open for client application
OPEN cursor1;
END P1 |
We also chose to provide the functionality as a stored procedure so that the Rails developer
does not have to worry about the details of generating the Atom feed. The developer will simply
call the stored procedure.
Listing 23.
Calling the Atom feed stored procedure
class Document < ActiveRecord::Base
[...]
def atom_feed
feed = Document.find_by_sql("call teamroom.get_atom_feed()")
content = feed[0].attributes["1"]
send_data(content,
:filename => 'TeamRoomFeed.atom',
:type => 'text/xml',
:disposition => "inline")
end
[...]
end
|
A full discussion of the DB2 XML publishing functions and the Atom Syndication Format Protocol
is beyond the scope of this article. Please see the Resources section for further information
and some useful links. Additionally, the application available for download contains the full
implementation of the Atom feed, including the stored procedure and the application code to
generate the feed.
Conclusion
DB2's pureXML ability to store XML data in its native hierarchical format, allows applications to easily
talk the common language of XML while providing the performance, scalability,
reliability and availability advantages that come with the DB2 relational database management system.
When combined with the simplicity and flexibility of the Ruby on Rails framework, a developer can build,
deploy, and maintain world class Web 2.0 applications quickly and easily. Development options are flexible
to allow shifting information from the relational to the hierarchical format as business needs require.
While shifting from one data format to another, performance remains consistently fast because you're employing
the same optimization technology, and all other reliability and availability features apply: backup,
recovery and scalability provided by DB2 technology.
DB2 pureXML not only supports the first generation of native XML support, but it also stores
post-parsed data on disk with node-level granularity which allows high query performance through indexing
(the actual document node position), and the usage of XQuery and XPath primitives built into the DB2 query engine.
This makes for a very good fit in the Ruby on Rails runtime which provides one of the friendliest set of XML libraries
(for instance, REXML, ROXML), and allows for rapid integration of XML-based data stores into a Web application.
Download | Description | Name | Size | Download method |
|---|
| Team room sample code | Teamroom2.zip | 10KB | HTTP |
|---|
Resources Learn
-
"DB2 and Ruby on Rails, Part 1: Getting started with DB2 and Ruby on Rails"
(developerWorks, May 2007) introduces the Starter Toolkit for DB2 on Rails and various methods for installing the IBM_DB driver, as well as Rails migration with DB2.
-
"Ruby on Rails and J2EE:Is there room for both?"
(developerWorks, July 2005) compares two Web application frameworks: J2EE vs. Ruby on Rails.
-
"Crossing borders: Exploring Active Record"
(developerWorks, March 2006) explores Active Record, the persistence engine behind Ruby on Rails.
-
"Crossing borders: What's the secret sauce in Ruby on Rails?"
(developerWorks, May 2006) explores what makes Rails productive and looks at Rails-inspired ideas
that should get more attention within the Java community.
-
"An introduction to Ruby on Rails for DB2 developers"
(developerWorks, June 2006) is a step-by-step introductory article to Ruby on Rails.
-
"Crossing borders: Rails migrations"
(developerWorks, August 2006) provides a good overview of Rails migration and shows Rails migration with MySQL.
-
"Make Ruby on Rails easy with RadRails and Eclipse" (developerWorks, September 2006) introduces an Eclipse-based development tool for Ruby on Rails.
- See the ActiveRecord association documentation.
-
"From DAD to annotated XML schema decomposition"
by Mayank PradhanPradhan (developerWorks, Apr 2006) is a guide for migrating from XML Extender decomposition to annotated XML schema decomposition.
-
Ruby on Rails and XML
by Daniel Wintschel (developerWorks, April 2007) is a guide to manipulating XML data in Ruby.
- Make the DB2 9 pureXML Guide your DB2 9 pureXML reference.
- Browse the
technology bookstore
for books on these and other technical topics.
-
Visit the developerWorks resource page for DB2 for Linux, UNIX, and Windows to read articles and tutorials and connect to other resources to expand your DB2 skills.
-
Learn about DB2 Express-C, the no-charge version of DB2 Express Edition for the community.
Get products and technologies
Discuss
About the authors  | |  | John Chun is a specialist of the DB2 Advanced Support team working in the area of application development and tooling.
He has worked in the IBM DBT Toronto lab for 7 years resolving DB2 application issues with various languages including
Java, C, C++, Perl, REXX, C# and others. John has worked on a number of projects involving the DB2 CLI and OLEDB driver,
as well as the .NET data provider. John is a DB2 Certified Solutions Expert and Certified Websphere Administrator. |
 | |  | Christine Law is senior DB2 specialist and an IBM Certified Expert at
the IBM Toronto Lab, where she is responsible for resolving DB2 applications
problems and defects. She has extensive application development experience on
Linux, UNIX and Windows platforms with different programming languages and
scripting languages, specializing in JDBC, SQLJ, stored procedures and
embedded SQL. Her recent interests includes Open Source technologies such
as AJAX and Ruby. |
 | |  | Salvador has been working at IBM since 2002 at the Silicon Valley Labs, in San Jose, CA, initially developing J2EE applications for the DB2 z/OS database product. He currently spends his time working on runtime and tooling technologies for IBM's data server products, including DB2 pureXML. |
 | |  | Alex Pitigoi is an advisory software engineer at the IBM Toronto Lab. He has worked on various software development projects in the Information Management since 1998, focusing on Web technologies and database administration. Most recently, he drove the development of the SQLModel project, now incorporated into the Eclipse Data Tools Project, as well as the overall architecture for the database administrative Web Tools across multiple IBM data servers. Alex also worked on the DB2 Satellite Administration Center, the IBM Express Runtime, and lead the development of the first set of Web Tools delivered for DB2. His current focus is IBM's data servers enablement for new open source technologies (Ruby, Python, PHP). |
Rate this page
|  |