Skip to main content
 
developerworks > Community >  Dashboard > ETTK Community > MiddleTier
developerWorks
Log In   View a printable version of the current page.
Overview New to Forums Wikis
MiddleTier
Added by db2xml42, last edited by db2xml42 on Jan 12, 2011  (view change)
Labels: 
(None)

Comparing XML Processing Performance in Middleware and Database: A Case Study

This page is the Web companion to the paper "Comparing XML Processing Performance in Middleware and Database".

Marcus Paradies, Susan Malaika, Matthias Nicola, Kevin Xie

The paper is available from the ACM

Your feedback is most welcome

Abstract

XML processing is at the core of many middleware systems. In recent years XML databases have become widely available. This article identifies three common XML processing use cases, and compares their performance when XML manipulation is performed in an XML database and when XML manipulation is part of middleware application code. The article concludes with guidelines for XML processing placement and identifies areas for further study.

The Three Use Cases - Supporting Materials

The use cases and measurements investigated in the companion paper reuse parts of the TPoX (Transaction Processing over XML) benchmark. TPoX is an application-level XML benchmark based on a financial application scenario. TPoX defines several XML document collections, XML queries and updates, and provides an XML data generator as well as a workload driver. The workload driver is a Java application that simulates and measures one or multiple concurrent users performing XML processing.

The use cases operate on 50,000 "custacc" documents from TPoX, each document describing a customer with his personal information and all of his one or more investments accounts. Each use case is implemented in two ways, (1) using SQL/XML to push the XML manipulation to a DB2 9.7 database instance, and (2) in Java with a DOM or SAX parser in the middle tier. Both options read or write XML to/from the database, only the location and implementation of the XML manipulation differs.

The three use cases investigated in this paper are summarized in the following table.

Use Cases Names Descriptions
Use Case 1.1 Extract5 Extract a small number of values from an XML document, often used when presenting information
Use Case 1.2 Extract50 As for Extract5 above, except a medium number of values (50%)
Use Case 1.3 Extract100 As for Extract5 above, except a large number of values (nearly 100%)
Use Case 2.1 Split500 Split a large XML document containing 500 concatenated XML fragments (commonly used for FTP or download) into individual XML documents
Use Case 2.2 Split2500 As for Split500 above, except containing 2500 XML fragments
Use Case 2.3 Split5000 As for Split500 above, except containing 50000 XML fragments
Use Case 3 Modify Insert, change, and delete XML elements in an XML document

The following sub-section describes the use cases in more detail. Each use case description will consist of two parts:

  • The use case as implemented in middle-tier, e.g., with DOM in a stand-alone XML parser
  • The use case as implemented in database, e.g., with SQL/XML inside a database

Use Case 1 - Extract

The first use case extracts information from XML documents and populates them into Java objects. The use case contains three extractions: Extract5, Extract50, and Extract100 as described in the table above.

Figure 1: Overview for Extracting XML elements with SAX in the Middletier

Figure 2: Overview for Extracting XML elements SQL/XML functions in the Database

For the middle-tier part, the code reads a BLOB from the database and parses the BLOB with a SAX parser in the application. During the parsing process the node values are extracted and stored in Java objects. For the database part, the XMLTABLE function from the SQL/XML standard is used to shred XML data to temporary relational tables. The result set is fetched and populated into Java objects. The SQL statement for Extract5 is illustrated below.

Note: The SAX pseudo code is very high level whereas the SQL portions include the complete code

Pseudo Code SAX
1: read BLOB from database;
2: while (!endOfDocument)
3:   element = readElement();
4:   populateIntoJava(element);
5:   endOfDocument = isEndofDocument();
Use Case 1.1 - Extracting 5% of all XML values from a document
SELECT T.*
FROM custacc_xml , XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),'$DOC/Customer'
COLUMNS
    firstname  VARCHAR(20) PATH 'Name/FirstName',
    lastname   VARCHAR(20) PATH 'Name/LastName',
    dayOfBirth DATE        PATH 'DateOfBirth') AS T
WHERE CUSTID = ?
Use Case 1.2 - Extracting 50% of all XML values from a document
SELECT T.*
FROM custacc_xml , XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),'$DOC/Customer'
COLUMNS
    id                  INTEGER      PATH '@id',
    mnemonic   		VARCHAR(45)  PATH 'Mnemonic',
    shortname 		VARCHAR(100) PATH 'string-join(ShortNames/ShortName,",")',
    middlename		VARCHAR(100) PATH 'string-join(Name/MiddleName,",")',
    title	   	VARCHAR(10)  PATH 'Name/Title',
    firstname  		VARCHAR(20)  PATH 'Name/FirstName',
    lastname   		VARCHAR(20)  PATH 'Name/LastName',
    suffix		VARCHAR(20)  PATH 'Name/Suffix',
    dayOfBirth 		DATE         PATH 'DateOfBirth',
    gender 		VARCHAR(10)  PATH 'Gender',
    nationality 	VARCHAR(30)  PATH 'Nationality',
    CountryOfResidence  VARCHAR(45)  PATH 'CountryOfResidence',
    language 		VARCHAR(100) PATH 'string-join(Languages/Language,",")',
    CustomerSince	DATE	     PATH 'BankingInfo/CustomerSince',
    PremiumCustomer     VARCHAR(5)   PATH 'BankingInfo/PremiumCustomer',
    CustomerStatus      VARCHAR(10)  PATH 'BankingInfo/CustomerStatus',
    LastContactDate     DATE	     PATH 'BankingInfo/LastContactDate',
    ReviewFrequency     VARCHAR(20)  PATH 'BankingInfo/ReviewFrequency',
    OnlineLogin		VARCHAR(20)  PATH 'BankingInfo/Online/Login',
    OnlinePin		VARCHAR(20)  PATH 'BankingInfo/Online/Pin',
    OnlinePwd		VARCHAR(20)  PATH 'BankingInfo/Online/Trading-password',
    TaxID		VARCHAR(20)  PATH 'BankingInfo/Tax/TaxID',
    SSN			VARCHAR(20)  PATH 'BankingInfo/Tax/SSN',
    TaxRate		DECIMAL(2,2) PATH 'BankingInfo/Tax/TaxRate',
    Currency		VARCHAR(20)  PATH 'BankingInfo/Currency'
) AS T
WHERE CUSTID = ?
Use Case 1.3 - Extracting 100% of all XML values from a document
SELECT T.*
FROM custacc_xml , XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),'$DOC/Customer'
COLUMNS
    id                  INTEGER      PATH '@id',
    mnemonic   		VARCHAR(45)  PATH 'Mnemonic',
    shortname 		VARCHAR(100) PATH 'string-join(ShortNames/ShortName,",")',
    middlename		VARCHAR(100) PATH 'string-join(Name/MiddleName,",")',
    title	   	VARCHAR(10)  PATH 'Name/Title',
    firstname  		VARCHAR(20)  PATH 'Name/FirstName',
    lastname   		VARCHAR(20)  PATH 'Name/LastName',
    suffix		VARCHAR(20)  PATH 'Name/Suffix',
    dayOfBirth 		DATE         PATH 'DateOfBirth',
    gender 		VARCHAR(10)  PATH 'Gender',
    nationality 	VARCHAR(30)  PATH 'Nationality',
    CountryOfResidence  VARCHAR(45)  PATH 'CountryOfResidence',
    language 		VARCHAR(100) PATH 'string-join(Languages/Language,",")',
    CustomerSince	DATE	     PATH 'BankingInfo/CustomerSince',
    PremiumCustomer     VARCHAR(5)   PATH 'BankingInfo/PremiumCustomer',
    CustomerStatus      VARCHAR(10)  PATH 'BankingInfo/CustomerStatus',
    LastContactDate     DATE	     PATH 'BankingInfo/LastContactDate',
    ReviewFrequency     VARCHAR(20)  PATH 'BankingInfo/ReviewFrequency',
    OnlineLogin		VARCHAR(20)  PATH 'BankingInfo/Online/Login',
    OnlinePin		VARCHAR(20)  PATH 'BankingInfo/Online/Pin',
    OnlinePwd		VARCHAR(20)  PATH 'BankingInfo/Online/Trading-password',
    TaxID		VARCHAR(20)  PATH 'BankingInfo/Tax/TaxID',
    SSN			VARCHAR(20)  PATH 'BankingInfo/Tax/SSN',
    TaxRate		DECIMAL(2,2) PATH 'BankingInfo/Tax/TaxRate',
    Currency		VARCHAR(20)  PATH 'BankingInfo/Currency'
) AS T
WHERE CUSTID = ?

SELECT T.* 
FROM custacc_xml , XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),
'$DOC/Customer/Addresses/Address'
COLUMNS 
    id 	   	INTEGER      PATH '../../@id',
    street	VARCHAR(200) PATH 'string-join(gStreet/Street," ")',
    pobox	VARCHAR(20)  PATH 'POBox', 
    city	VARCHAR(20)  PATH 'City',
    postalcode  VARCHAR(20)  PATH 'PostalCode',
    state	VARCHAR(20)  PATH 'State',
    country	VARCHAR(45)  PATH 'Country',
    citycountry VARCHAR(45)  PATH 'CityCountry',
    phone	VARCHAR(45)  PATH 'string-join(Phones/Phone,",")',
    emails	VARCHAR(150) PATH 'string-join(../EmailAddresses/Email,",")'
) AS T
WHERE CUSTID = ?

select T.* 
FROM custacc_xml , XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),
'$DOC/Customer/Accounts/Account/Holdings/Position'
COLUMNS 
id	         INTEGER      PATH '../../../../@id',
AccountID	 VARCHAR(20)  PATH '../../@id',
Category	 VARCHAR(20)  PATH '../../Category',
AccountTitle   	 VARCHAR(45)  PATH '../../AccountTitle',
ShortTitle	 VARCHAR(20)  PATH '../../ShortTitle',
OpeningDate	 VARCHAR(25)  PATH '../../OpeningDate',
AccountOfficer 	 VARCHAR(45)  PATH '../../AccountOfficer',
LastUpdate	 VARCHAR(25)  PATH '../../LastUpdate',
OnlineActualBal	 INTEGER      PATH '../../Balance/OnlineActualBal',
OnlineClearedBal INTEGER      PATH '../../Balance/OnlineClearedBal',
WorkingBalance	 INTEGER      PATH '../../Balance/WorkingBalance',
Passbook	 VARCHAR(10)  PATH '../../Passbook',
ChargeCcy	 VARCHAR(10)  PATH '../../ChargeCcy',
InterestCcy	 VARCHAR(10)  PATH '../../InterestCcy',
AllowNetting	 VARCHAR(10)  PATH '../../AllowNetting',
InPutter	 VARCHAR(500) PATH 'string-join(../../gInputter/Inputter,",")',
Position	 VARCHAR(500) PATH 'string-join((Symbol,Name,Type,Quantity),", ")'
) AS T
WHERE CUSTID = ?

Use Case 2 - Split

The second use case splits a large XML document into smaller documents (individual customer documents). The use case contains three splits: Split500, Split2500, and Split5000 as described in the table above. Splitting large XML batch files is a common scenario in business applications.

Figure 3: Overview for XML batch file splitting

Figure 4: Overview for XML batch file splitting with DOM in the Middletier

Figure 5: Overview for XML batch file splitting with SQL/XML functions in the database

Note: The DOM pseudo code is very high level whereas the SQL portions include the complete code

Use Case 2 - Splitting an XML batch file into smaller fragments in the Middletier
1:  file = FileInput('batch500.xml')
2:  dom = parse(file)
3:  vectorOfDocs = splitDocument(dom)
4:  FOR i in vectorOfDocs:
5:  	domString = serializeDOM(vectorOfDocs[i])
6:	executeUpdate(domString)
Use Case 2 - Splitting an XML batch file into smaller fragments in the Database
INSERT into custacc_xml(DOC)
   SELECT X.FRAG from XMLTABLE(
      XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),
        '$i/Customers/Customer' PASSING cast(? as xml) as "i"
            COLUMNS FRAG XML PATH 'document{.}'
   ) as X

To split a XML batch file in the middle-tier, the document is parsed (e.g. using DOM) and then split into smaller pieces. In use case 2, the XML document is read from the file system and parsed using DOM into a DOM instance. The DOM tree is split into smaller chunks of XML data and new DOM instances are created. Each DOM tree contains w one Customer element. After that, the DOM is serialized into a Java string and stored as XML document into the database. For the database part, the XML batch file is sent as a binary stream to the database. The binary stream is parsed and then split into smaller documents using XMLTABLE function. The split XML documents are inserted into an XML column.

Use Case 3 - Modify

  • Updating XML - modifying , deleting and inserting nodes
    • Middle tier: DOM
    • Database: XQuery Update

Figure 6: Overview for Modifying XML files with Xquery Updates in the Database

Figure 7: Overview for Modifying XML files with DOM in the Middletier

Note: The DOM pseudo code is very high level whereas the SQL portions include the complete code

Pseudo Code DOM Tree Modifying
1: read BLOB from database
2: dom = parse(BLOB)
3: dom = modify(dom) // modifications can be: insert, update and delete elements
4: domString = serializeDOM(dom)
5: executeUpdate(domString)
Use Case 3 - Inserting Email Element
UPDATE custacc_xml
SET doc = XMLQUERY('declare default element namespace "http://tpox-benchmark.com/custacc";
    copy $cust := $DOC
    modify do insert
        <Email primary="No">
{fn:concat(fn:string-join(($cust/Customer/Name/FirstName,$cust/Customer/Name/LastName),"."),"@domain.com")}
</Email>
      as last into $cust/Customer/Addresses/EmailAddresses
    return $cust
')
WHERE CUSTID = ? %
Use Case 3 - Updating Email Element
UPDATE custacc_xml
SET doc = XMLQUERY('declare default element namespace "http://tpox-benchmark.com/custacc";
    copy $cust := $DOC
    modify do replace
      value of $cust/Customer/Addresses/EmailAddresses/Email[last()]
      with fn:concat($cust/Customer/Name/FirstName,".",$cust/Customer/Name/LastName),"@newdomain.com")
    return $cust
')
WHERE CUSTID = ?
Use Case 3 - Deleting Email Element
UPDATE custacc_xml
SET doc = XMLQUERY('declare default element namespace "http://tpox-benchmark.com/custacc";
    copy $cust := $DOC
    modify for $i in $cust/Customer/Addresses/EmailAddresses/Email
    where (fn:ends-with($i,"newdomain.com"))
    return do delete
      $i
    return $cust
')
WHERE CUSTID = ? %