WebSphere DataPower and DB2 pureXML, Part 1: XML schema and content validation using WebSphere DataPower and DB2 pureXML

Understand how IBM® DB2® pureXML™ and the IBM WebSphere® DataPower® SOA Appliance can complement each other to realize powerful applications, and provide flexible and speedy access to validated XML documents. The WebSphere DataPower Appliance performs XML validation, and the DB2 pureXML database manages XML storage, indexing, and querying.

Susan Malaika (malaika@us.ibm.com), Senior Technical Staff Member, IBM

Susan Malaika is a senior technical staff member in IBM's Information Management Group (part of IBM Software Group). Her specialties include XML, the Web, and databases. She has developed standards that support data for grid environments at the Global Grid Forum. She has also co-authored a book on the Web and published articles on transaction processing and XML. She is a member of the IBM Academy of Technology.



Christian Pichler (cpichle@us.ibm.com), Data Server Solutions (Co-op), IBM

Christian Pichler is a co-op from the Technical University of Vienna in Austria, where he is working on his thesis for a double Master's degree in Computer Engineering and Computer Science with a focus on health care. For IBM, Christian is working on technologies for storing XML in DB2, and accessing it through Web services, feeds, and XForms. He is specializing in XML standards for health care.



29 May 2008

Introduction

Industry formats are an important part of standardized information exchange between information systems in industries such as health care, insurance, and finance. These formats are based on XML. An XML schema defines the structure to which all instance documents must conform. In addition to XML Schema, Schematron, a language implemented through XSL stylesheet transformations, can be used to specify rules that make assertions about the content of XML documents. Even though DB2 pureXML is capable of XML schema registration, XML document validation, and XSL stylesheet transformations, WebSphere DataPower SOA Appliances can complement DB2 pureXML solutions. For example, XML validation and transformation work can be offloaded from the DB2 processor to the appliance, which also provides routing and security features.

This article demonstrates native storage of XML documents in a DB2 pureXML database after the documents have been successfully validated through a DataPower SOA Appliance, as shown in Figure 1. The validation performed through the DataPower box includes the validation of XML documents against their XML schema and content validation using Schematron. (For more details on Schematron, please refer to the Resources section of this article.)

Figure 1. Simplified scenario

A major benefit of this solution is that the WebSphere DataPower Appliance performs all validation steps, appropriate error handling, and the insertion of the XML document into the DB2 pureXML database, off-loading the validation steps from the database processor. Please note that the insertion is performed only if the document has successfully passed all validation steps.

This is the first of two articles in this series on WebSphere DataPower and DB2 pureXML. The second article will describe how DB2 pureXML can be used as an audit log that is easy to access and query for XML messages that are being routed, transformed, or validated through WebSphere DataPower.


Setting up the scenario

The following sections provide a detailed overview on how the scenario is set up, including a sample XML schema, sample XML documents, a Schematron example, a DB2 pureXML database, a Data Web Service, and the configuration of a WebSphere DataPower SOA Appliance.

Step 1: XML schema, XML documents, and Schematron

Any XML-based industry format can be used in this scenario, for example the formats used in the free and publicly available DB2 pureXML online demonstration "Industry Formats and Services with pureXML" (see Resources). This article uses a simple XML schema, shown in Listing 1, and corresponding XML documents, shown in Listings 2, 3, and 5.

Listing 1. Sample XML schema (simple.xsd)
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="identification" type="xs:integer"/>
        <xs:element name="name">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="first" type="xs:string"/>
            <xs:element name="last" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Based on the XML schema defined above, the following two XML sample documents are provided:

  • The first sample document, as shown in Listing 2, is valid.
  • The second sample document, as shown in Listing 3, contains well-formed XML, but is an invalid document according to the corresponding XML schema since the required element <identification /> is missing.
Listing 2. Sample XML document 1 (simple_1.xml)
<?xml version="1.0" encoding="utf-8"?>
<person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
        xsi:noNamespaceSchemaLocation="simple.xsd">
  <identification>1</identification>
  <name>
    <first>christian</first>
    <last>pichler</last>
  </name>		
</person>
Listing 3. Sample XML document 2 (simple_2.xml)
<?xml version="1.0" encoding="utf-8"?>
<person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xsi:noNamespaceSchemaLocation="simple.xsd">
  <name>
    <first>christian</first>
    <last>pichler</last>
  </name>		
</person>
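
The difference between the two sample documents can also be illustrated outside the appliance. The following Python sketch is a hand-written structural check that mirrors the content model of Listing 1; it is not an XML Schema validator and not what the DataPower Appliance runs. The function names `tags` and `check_person` and the problem messages are hypothetical, and the sample strings are abbreviated copies of Listings 2 and 3.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated copies of Listings 2 and 3 (XML declaration and xsi attributes omitted).
SIMPLE_1 = ("<person><identification>1</identification>"
            "<name><first>christian</first><last>pichler</last></name></person>")
SIMPLE_2 = ("<person>"
            "<name><first>christian</first><last>pichler</last></name></person>")


def tags(element):
    """Return the names of an element's child elements, in document order."""
    return [child.tag for child in element]


def check_person(xml_text):
    """Return a list of structural problems; an empty list means no problem was found.

    Hypothetical helper that hard-codes the sequence rules of simple.xsd.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "person":
        return ["root element must be person"]
    # xs:sequence: identification must come first, then name, and nothing else.
    if tags(root) != ["identification", "name"]:
        return ["person must contain identification followed by name"]
    problems = []
    identification = (root.find("identification").text or "").strip()
    # Approximation of the xs:integer lexical form: optional sign, then digits.
    if not re.fullmatch(r"[+-]?[0-9]+", identification):
        problems.append("identification must be an integer")
    if tags(root.find("name")) != ["first", "last"]:
        problems.append("name must contain first followed by last")
    return problems
```

Calling `check_person(SIMPLE_1)` returns an empty list, while `check_person(SIMPLE_2)` reports the missing `identification` element.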

To go a step further and validate the content of XML documents, a language called Schematron is used. Schematron is a declarative validation language that enables the checking and cross-checking of XML content through the specification of rules in XPath, and of custom error messages should the rules fail. This article does not cover Schematron details, but it is important to know that Schematron is driven by XML stylesheet transformations. First of all, Schematron rules need to be defined in an XML format, as shown in Listing 4. This "rules document" is transformed into an XSL stylesheet using the Schematron XSL stylesheet. The resulting, new XSL stylesheet is then applied to each XML document and will, if the content is not as expected, produce the custom error messages.

Listing 4. Schematron implementation (simple.sch)
<?xml version="1.0" encoding="utf-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>Simple Schematron Validation Example</title>
  <pattern name="Personal Information">
    <rule context="/person/name/first">
      <report test="text() = 'christian'">
        First name must not be 'christian'!
      </report>
    </rule>
  </pattern>
</schema>

The above example looks up the first name in an XML document, checks whether the first name equals "christian" or not, and prints a validation failure message if it does. In case of validation failure, the message would be First name must not be 'christian'!.
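
To make the behavior of the rule concrete, the following Python sketch reproduces the effect of the single `report` element in Listing 4 using the standard library. It is only an illustration of the rule's semantics under the assumption that documents have no namespace, as in the samples; it is not a Schematron processor, and the function name `report_first_name` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Message text taken from Listing 4.
MESSAGE = "First name must not be 'christian'!"


def report_first_name(xml_text):
    """Return one message per /person/name/first element whose text is 'christian'.

    A Schematron report fires when its test is true, so a non-empty result
    means that content validation failed.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "person":
        # The rule context /person/name/first cannot match anything.
        return []
    return [MESSAGE
            for first in root.findall("name/first")
            if (first.text or "") == "christian"]
```

For a document whose first name is `christian`, the function returns a list containing the message; for any other first name, it returns an empty list.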

Finally, a sample XML document is created that is valid with respect to validation against the XML schema and that satisfies the Schematron rule defined above:

Listing 5. Sample XML document 3 (simple_3.xml)
<?xml version="1.0" encoding="utf-8"?>
<person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="simple.xsd">
  <identification>3</identification>
  <name>
    <first>keith</first>
    <last>wells</last>
  </name>
</person>

Step 2: DB2 pureXML database and Data Web Services

This section describes the setup of the DB2 pureXML database that serves as the data store for XML documents after they have been validated. As shown in Listings 6 and 7, the setup consists of a database containing only one table and one stored procedure:

Listing 6. Setup of DB2 pureXML database (setup_environment.db2)
DROP DATABASE HOSPITAL@

CREATE DATABASE HOSPITAL USING CODESET UTF-8 TERRITORY US@

CONNECT TO HOSPITAL@

CREATE SCHEMA DB2ADMIN@

CREATE TABLE DB2ADMIN.PATIENT
	(ID INT PRIMARY KEY NOT NULL GENERATED ALWAYS AS IDENTITY, 
	 COMMENT VARCHAR(500), 
	 RECORD XML)@
Listing 7. Setup of the stored procedure to insert record into the Patient table
CREATE PROCEDURE insertPatient (IN xmlRecord XML)
	SPECIFIC insertPatient
	DYNAMIC RESULT SETS 1
P1: BEGIN
    
    INSERT INTO DB2ADMIN.PATIENT (COMMENT, RECORD) VALUES ('', xmlRecord);
END P1@

The stored procedure insertPatient is then exposed through a Data Web Service, as shown in Figure 2, which means that the stored procedure can be called by SOAP or REST requests.

Figure 2. DB2 pureXML database configuration overview

This article does not cover any more details on how to create Data Web Services. If you need more information, please read the article "Generate Web Services for DB2 9 pureXML" (developerWorks, June 2007).

Step 3: WebSphere DataPower SOA Appliance

A WebSphere DataPower SOA Appliance is a versatile device that can be used to, among many other functions, process XML documents in various ways. The features of the device that are discussed in this article include the validation of XML documents against an XML schema and XSL transformations.

Before going into details on the configuration itself, this article provides some theoretical background. The WebSphere DataPower SOA Appliance can serve in many different ways, including XML Firewall, Web Services Proxy, XSL Accelerator, and many others. The scenario in this article uses the XML Firewall. Every XML Firewall contains at least one processing policy, and all of those processing policies contain at least one processing rule. Within every processing rule, simple processing actions can be specified, which are, for example, XML schema validation, XPath-based routing, encryption, XML stylesheet transformations, and many others.

The first step is to configure XML schema validation. In other words, XML documents being sent to this policy on the DataPower Appliance are validated against a particular XML schema. Configuring XML schema validation is achieved by adding an XML schema validation processing action to the processing rule, as shown in Figure 3, Number 3:

Figure 3. XML Firewall configuration of the DataPower SOA Appliance

If the validation action fails, the DataPower Appliance responds to the client that sent the XML document with a failure message and HTTP status code 500. The standard error message for this case does not contain any specific information on why the validation action failed. To provide more information, this example includes an on-error action in the rule, as shown in Figure 3, Number 2. If a fatal error occurs during any action in the rule, the on-error action causes the policy to call another rule, named Rule #2 (shown in Figure 4, Number 1):

Figure 4. XML Firewall configuration of the DataPower SOA Appliance

If Rule #2 is called, it will execute the XSL stylesheet, as shown in Listing 8:

Listing 8. XSL stylesheet returning the error message to the client
<?xml version='1.0' encoding='UTF-8' ?>
<xsl:stylesheet version="1.0" 
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
	xmlns:dp="http://www.datapower.com/extensions" 
	extension-element-prefixes="dp" 
	exclude-result-prefixes="dp">
 <xsl:output method="text"/>
 <xsl:template match="/">
   <xsl:value-of select="dp:variable('var://service/error-message')"/>
 </xsl:template>
</xsl:stylesheet>

The XSL stylesheet shown in Listing 8 obtains the specific error message that explains why the validation action failed and returns it to the client that issued the request.

Now there are two items left that the DataPower Appliance needs to perform. The first one is applying the Schematron XSL stylesheet to the incoming request XML document. If the Schematron stylesheet action produced an error message, the error message needs to be sent back to the client that initially sent the request XML document. If there was no error, the DataPower Appliance should forward the XML document to the DB2 pureXML Data Web Service, which will then insert the valid XML document into the database. This is achieved through another XSL stylesheet transformation action, as shown in Figure 3, Number 4, which executes the XSL stylesheet, shown in Listing 9:

Listing 9. XSL stylesheet executing Schematron XSL stylesheet and performing content-based routing, based on the Schematron processing result (content_based_routing.xsl)
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
	xmlns:dp="http://www.datapower.com/extensions" 
	extension-element-prefixes="dp" 
	exclude-result-prefixes="dp">	

<xsl:output method="xml" />	

  <xsl:template match="/">	
    <xsl:variable name="schematronResult">
      <error>
        <xsl:value-of select="dp:transform('local:///simple.xsl', .)" />	
      </error>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="$schematronResult/error/text()">
        <dp:send-error override="true">
          <xsl:copy-of select="$schematronResult" /> 
        </dp:send-error>
      </xsl:when>
      <xsl:otherwise>
      <dp:url-open 
          target="http://db2:8080/healthcarepatient/rest/patient/insertPatient"  
          response="xml" data-type="xml" content-type="text/xml">
        <q0:insertPatient xmlns:q0="urn:example">
          <q0:_xFFFF_xmlRecord>
            <xsl:copy-of select="." />
          </q0:_xFFFF_xmlRecord>
        </q0:insertPatient>
      </dp:url-open>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

It is also important to know whether the insert into the DB2 pureXML database through the Data Web Service was successful or not. Therefore, the DataPower XML Firewall will forward the response message from the DB2 Data Web Service to the client that initially sent the request XML document to the DataPower Appliance, indicating whether the insertion operation was successful or not.
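
The routing decision made by Listing 9 can be summarized in a few lines of ordinary code. The following Python sketch is an assumption-laden illustration, not the appliance's implementation: it takes the Schematron output as a plain string, returns the `<error>` element when that string is non-empty, and otherwise wraps the document in the `insertPatient` request element from Listing 9. The function names `build_insert_request` and `route` are hypothetical, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

NS = "urn:example"  # namespace of the request element in Listing 9
ET.register_namespace("q0", NS)  # keep the q0 prefix when serializing


def build_insert_request(document_text):
    """Wrap a validated document in the insertPatient request element."""
    envelope = ET.Element("{%s}insertPatient" % NS)
    record = ET.SubElement(envelope, "{%s}_xFFFF_xmlRecord" % NS)
    record.append(ET.fromstring(document_text))
    return ET.tostring(envelope, encoding="unicode")


def route(document_text, schematron_message):
    """Return ("reject", error_xml) or ("forward", request_xml).

    Mirrors the xsl:choose in Listing 9: any Schematron output is an error.
    """
    if schematron_message != "":
        error = ET.Element("error")
        error.text = schematron_message
        return ("reject", ET.tostring(error, encoding="unicode"))
    return ("forward", build_insert_request(document_text))
```

In the "forward" case, the returned string is the body that would be posted to the Data Web Service URL given in Listing 9; in the "reject" case, it is the message returned to the client.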


Demonstration

cURL is a command-line tool for transferring data to or from a URL; it supports various protocols, including HTTP. The tool is available for download.

After successfully setting up this example, it's now time to see the DataPower SOA Appliance, DB2 pureXML, and Data Web Services together in action. The XML documents that have been previously defined in this article are used in this demonstration.
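
The examples in this section use cURL, but any HTTP client can post a document to the appliance. As a minimal sketch, the following Python code builds the same kind of POST request with the standard library. The URL is the one used in the listings of this section; the function names `build_request` and `submit` are hypothetical, and the sketch assumes that the appliance answers failed validations with HTTP status code 500, as described earlier.

```python
import urllib.error
import urllib.request

# Host and port as used in the cURL listings of this article.
APPLIANCE_URL = "http://datapowerbox:2055/"


def build_request(document_bytes, url=APPLIANCE_URL):
    """Create a POST request whose body is the XML document, unmodified."""
    return urllib.request.Request(url, data=document_bytes)


def submit(document_bytes, url=APPLIANCE_URL):
    """Send the document and return (HTTP status code, response body text)."""
    try:
        with urllib.request.urlopen(build_request(document_bytes, url)) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # urlopen raises for error status codes; the body still carries the message.
        return error.code, error.read().decode("utf-8")
```

For example, `submit(open("simple_1.xml", "rb").read())` would post the first sample document, provided that an appliance configured as described is reachable under that host name.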

Example 1

Listing 10. Submitting the first XML document
cpichle@DAIRYFARM /tmp
$ curl --data-binary @simple_1.xml http://datapowerbox:2055/
<?xml version="1.0" encoding="UTF-8"?>
<error>First name must not be 'christian'!
</error>
cpichle@DAIRYFARM /tmp
$

Listing 10 shows that for the XML document simple_1.xml, the XML schema validation action must have been successful. However, the presence of the <error> tag indicates that the Schematron validation failed because the first name supplied was 'christian'.

Example 2

Listing 11. Submitting the second XML document
cpichle@DAIRYFARM /tmp
$ curl --data-binary @simple_2.xml http://datapowerbox:2055/
http://datapowerbox:2055/: cvc-particle 3.1: in element person with anonymous 
type, found <name> (in default namespace), but next item should be 
identification
cpichle@DAIRYFARM /tmp
$

Listing 11 shows how the DataPower Appliance responds to receiving an XML document that does not conform to the schema. The error message indicates that the XML document was not valid when compared against the XML schema, and details on why the schema validation failed are included in the error message returned by the appliance.

Example 3

Listing 12. Submitting the third XML document
cpichle@DAIRYFARM /tmp
$ curl --data-binary @simple_3.xml http://datapowerbox:2055/
<?xml version="1.0" encoding="UTF-8"?>
<ns1:insertPatientResponse xmlns:ns1="urn:example" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
cpichle@DAIRYFARM /tmp
$

After submitting the third example XML document (simple_3.xml), another response message is returned. The message returned has been generated by the Data Web Service, indicating that the insert of the XML document into the DB2 pureXML database was successful. This implies that the supplied document passed both the XML schema and Schematron validation steps.
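
A client can tell the three outcomes apart by inspecting the response body. The following Python sketch classifies the responses shown in Listings 10, 11, and 12. The function name `classify_response` and the returned labels are hypothetical, and treating every non-XML body as the text produced by the on-error rule is an assumption that holds only for this scenario.

```python
import xml.etree.ElementTree as ET


def classify_response(body):
    """Classify a response body (bytes) returned by the appliance."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Listing 11: the on-error rule returns plain text, not XML.
        return "error-rule-text"
    # Drop a leading "{namespace}" from the tag, if present.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "error":
        return "schematron-failure"  # Listing 10
    if local_name == "insertPatientResponse":
        return "inserted"  # Listing 12
    return "unknown"
```

A caller might log the body for the first two labels and treat only `"inserted"` as confirmation that the document reached the database.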


Summary

This short and simple article has shown how DB2 pureXML and the WebSphere DataPower SOA Appliance can complement each other to realize powerful applications, where the WebSphere DataPower Appliance performs XML validation, and the DB2 pureXML database manages the XML storage, indexing, and querying. Both XML structure validation (through XML schema) and content validation (through Schematron) have been described. The combination of the two products, WebSphere DataPower and DB2 pureXML, provides flexible and speedy access to validated XML documents.


Acknowledgement

Thank you to Bob Callaway and others who have contributed to this work by providing their knowledge and guiding advice.


Download

Description | Name | Size
Downloads for this article | download.zip | 3KB

Resources

Get products and technologies

  • Industry Formats and Services with pureXML: Download a great variety of examples, for free! Each example illustrates how to work with XML-based Industry Formats and pureXML. The examples show how to register an XML schema, how to perform validation of XML instance documents, how to query XML data using XQuery or SQL/XML, and much more.
  • IBM Data Studio: Download the development environment used to develop Data Web Services, for free.
  • DB2 Express-C: Download the free version of DB2, which includes the same core functionality as the other DB2 data servers, including pureXML technology. DB2 Express-C is free to develop, deploy, and distribute.
  • Build your next development project with IBM trial software, available for download directly from developerWorks.
