Example using XQuery transform in a partitioned database environment
The XQuery transform expression is supported in a partitioned database environment and can be used in scenarios that use extract, transform, and load (ETL) operations.
For example, when XML data is extracted from the result of a join between two source tables, ORDERS and PRODUCT, transformed into a wanted format, and inserted as XML data into a target table SALES, an XQuery transform expression can be used. However, a transform expression performs best if the input to the expression references only a single table rather than a join of several source tables.
When using an XQuery transform expression with multiple tables, writing a single statement in which the transform expression is applied directly to the join result generally yields acceptable performance if the join result is small relative to the size of the source tables. However, if the number of rows produced by the join is comparable to or larger than the number of rows in the source tables, you should consider splitting the join and the transform expression into two statements.
- Create a new intermediate table in the same partition group having the same distribution key as the target table. The intermediate table can be a declared global temporary table.
- Extract the data from the multiple source tables and insert it into the intermediate table.
- Transform the XML data from the intermediate table and insert the transformed data into the target table.
Both steps 2 and 3 can take advantage of the parallel processing available in a partitioned database environment to be scalable. A nested transform expression should be avoided in Step 3.
Tables and data used in the example
The example uses the source tables ORDERS and PRODUCTS, the target table SALES, and a declared global temporary table TEMPSALES. The example retrieves data from the ORDER table, joins the data with pricing information from the PRODUCTS table, formats the resulting data, and inserts the formatted data into the SALES table.
CREATE TABLE ORDERS(OID BIGINT, ORDERDETAIL XML)DISTRIBUTE BY HASH (OID);
The
following INSERT statement inserts a sample order into the ORDERS
table:INSERT into ORDERS
values (5003, '<order>
<cid>1001</cid>
<product>
<pid>2344</pid><qty>10</qty>
<delivery>Overnight</delivery>
</product>
<product>
<pid>537</pid><qty>3</qty>
<delivery>Ground</delivery>
</product>
</order>');
CREATE TABLE PRODUCTS(PID BIGINT, PRICE FLOAT, PRODDETAIL XML);
The
following INSERT statements inserts product data into the PRODUCTS
table:INSERT into PRODUCTS
values(2344, 4.99, '<product>
<name>10 D-Cell batteries</name>
<desc>D Cell battery, 10-pack</desc>
</product>')
INSERT into PRODUCTS
values(537, 8.99, '<product>
<name>Ice Scraper, small</name>
<desc>Basic ice scraper, 4 inches wide</desc>
</product>');
CREATE TABLE SALES(OID BIGINT, CID BIGINT, PID BIGINT, ITEMTOTAL FLOAT,
SALESDETAIL XML) DISTRIBUTE BY HASH (OID);
A table join and XQuery transform expression in a single statement
INSERT into SALES
select T.OID, T.CID, T.PID, T.ITEMTOTAL,
XMLQUERY('
copy $new := $temp
modify (do delete ($new/info/cid,
$new/info/product/pid,
$new/info/product/qty),
do insert <orderdate>{fn:current-date()}</orderdate>
as first into $new/info)
return $new' passing T.SALESDETAIL as "temp")
from(
SELECT O.OID, OX.CID, OX.PID, P.PRICE * OX.QTY, OX.SALESDETAIL
FROM PRODUCTS P,
ORDERS O,
XMLTABLE('for $i in $details/order/product
return document{<info> {$details/order/cid} {$i} </info>}'
passing O.ORDERDETAIL as "details"
columns
CID bigint path './info/cid',
PID bigint path './info/product/pid',
QTY int path './info/product/qty',
SALESDETAIL xml path '.') as OX
WHERE P.PID = OX.PID) as T(OID, CID, PID, ITEMTOTAL, SALESDETAIL);
The next section demonstrates how the table join and the XQuery transform performed in the previous statement can be performed in two separate statements.
A table join and XQuery transform expression in separate statements
If the result of the join in the second half of the statement in A table join and XQuery transform expression in a single statement is significant compared to the size of the ORDER and PRODUCT tables, splitting the statement into two statements improves performance. When splitting the single statement into two, the first statement inserts the results of the join between the ORDER and PRODUCT tables into a temporary table. The second statement applies the transform to the XML documents in the temporary table and inserts the updated documents into the SALES table.
DECLARE GLOBAL TEMPORARY TABLE SESSION.TEMPSALES LIKE SALES
DISTRIBUTE BY HASH (OID);
INSERT INTO SESSION.TEMPSALES
SELECT O.OID, OX.CID, OX.PID, P.PRICE * OX.QTY, OX.SALESDETAIL
FROM PRODUCTS P,
ORDERS O,
XMLTABLE('for $i in $details/order/product
return document{<info> {$details/order/cid} {$i} </info>}'
passing O.ORDERDETAIL as "details"
columns
CID bigint path './info/cid',
PID bigint path './info/product/pid',
QTY int path './info/product/qty',
SALESDETAIL xml path '.') as OX
WHERE P.PID = OX.PID;
The previous SELECT statement used by the INSERT statement does not contain an XQuery transform expression.
The temporary table contains the relational information needed for the SALES table. In the XML documents in the temporary table, the product information needs to be removed and an order date needs to be added.
INSERT into SALES
select T.OID, T.CID, T.PID, T.ITEMTOTAL,
XMLQUERY('
copy $new := $temp
modify (do delete ($new/info/cid,
$new/info/product/pid,
$new/info/product/qty),
do insert <orderdate>{fn:current-date()}</orderdate>
as first into $new/info)
return $new' passing T.SALESDETAIL as "temp")
from SESSION.TEMPSALES T;