One of the key differences of XML from the Large Object (LOB) types in DB2 is that XML data can be indexed. The XML indexes supported in DB2 9 are also called XML value indexes, and queries on XML can use these indexes for performance.
An XML index example
An XML index is used to provide a mapping from a node value to its location. An XPath expression is required to specify the nodes within a document to be indexed. Two data types are supported for XML indexes in DB2 9 for z/OS: DECFLOAT for numeric values and VARCHAR(n) for string values. For example, the following CREATE INDEX DDL creates an XML index on TransRefGUID of the REQUESTXML column of ACORD data, as VARCHAR(24):
CREATE INDEX ACORD.ACORDINDEX1 ON ACORD.REQUEST(REQUESTXML)GENERATE KEYS USING XMLPATTERN'declare default element namespace "http://ACORD.org/Standards/Life/2"; /TXLife/TXLifeRequest/TransRefGUID' as SQL VARCHAR(24)
DB2 will take the value of /TXLife/TXLifeRequest/TransRefGUID as the key, and map to its logical location (DOCID, NODEID), and physical location (RID), using the existing B+-tree index infrastructure.This index can be used for queries that search on TransRefGUID. For example, the following XMLEXISTS predicate can potentially use this index:
XMLExists('declare default element namespace "http://ACORD.org/Standards/Life/2"; /TXLife/TXLifeRequest[TransRefGUID="2004-1217-141016-000012"]'PASSING REQUEST.REQUESTXML)
From queries, it is easy to figure out what kind of XML indexes can speed up the queries - by concatenating path steps from within a predicate to the steps outside the predicate, and using data types consistent between the queries and indexes.
Before explaining the index access procedure, I'd like to review the basic XML storage scheme so you can understand it better.
Basic XML storage scheme in DB2 9 for z/OS
The following picture depicts the high-level storage scheme for XML data.
At high-level, the XML data is stored in a separate table space, just like LOB data. The real XML data is stored in the XMLDATA column of the internal XML table. It contains the hierarchical data in records that can fit in 16KB pages. In order to support free movement of data records, logical links using NODEIDs are used. That's why we need a NODEID index to link records for a document. Similarly, since utilities, such as REORG, can be applied to the base table space and the XML table space independently, XML indexes do not contain base table RIDs, but XML table RIDs. In order to get to base table rows from XML indexes, we need the DOCID index on the base table (see below). That's why DB2 always creates a DOCID index on the base table and a NODEID index on the XML table as part of the storage scheme, although they are for totally different purposes. By XML indexes, we refer to XML value indexes created by users.
Basic XML index access plans
If you use EXPLAIN for a query and select some key columns from the PLAN_TABLE, you will see some new access type in the ACCESSTYPE column for SQL/XML queries. They are the following:
- DX: DOCID list access. DB2 searches an XML index and retrieves all the qualified DOCIDs. Then the DOCIDs are sorted and duplicates are removed. The DOCID list is further converted to a RID list of the base table through the DOCID index. Then the base table rows are fetched.
- DI: DOCID ANDing. DB2 intersects two DOCID lists from two XML indexes for conjunction of two XPath predicates.
- DU: DOCID ORing. DB2 unions two DOCID lists from two XML indexes for disjunction of two XPath predicates.
- M: Multi-index access.
If you see "R" (R-Scan) in the ACCESSTYPE for a table with an XMLEXISTS predicate, then DOCSCAN is applied for the XML column. No new type was introduced for the scan.
Here is an example for an index ANDing plan:
+---------------------------------------------------------------------+ | PLANNO | ACCESSTYPE | MATCHCOLS | ACCESSNAME | MIXOPSEQ | +---------------------------------------------------------------------+1_| 1 | M | 0 | | 0 |2_| 1 | DX | 1 | ACORDINDEX2 | 1 |3_| 1 | DX | 1 | ACORDINDEX1 | 2 |4_| 1 | DI | 0 | | 3 | +---------------------------------------------------------------------+
The following diagram illustrate the process of using the index ANDing plan. Step 4 is where a DOCID index is always used.
These are the basic XML index access plans. For example, NODEID and RID from XML indexes are not used for queries today. We are enhancing plans for better query performance, so expect more methods in the future.
Somethings specific to XML indexes
The same principles for relational indexes apply to XML indexes, such as create indexes only needed by queries and use REBUILD INDEX. The following are some unique features of XML indexes:
- Depending on the XPath in the XMLPATTERN of the CREATE INDEX statement and XML documents, DB2 may generate zero, one or more key entries for a document in an XML index.
- For a numeric index (DECFLOAT type), if a node identified by the XMLPATTERN contains a value that cannot be cast to a number, it is ignored. But insert will be successful. For example, no index entry is generated for XL if the index is DECFLOAT type.
- However, for a string index (VARCHAR(n) type), if a node identified by the XMLPATTERN contains a value that is longer than n, then the document will be rejected, or CREATE INDEX will fail. The maximum n is 1000.
- You can create as many indexes as you like on a single XML column. You could even create multiple indexes with different types on the same XMLPATTERN. This provides much flexibility for XML.
- Use fully specified XPath whenever possible to reduce the cost of maintaining the indexes.
- Index leaf nodes or nodes close to leaves. For non-leaf nodes, the key values are concatenated values from descendants, and their usage is a bit weird. For example, index on name for <name><firstname>John<firstname><lastname>Doe<lastname><name> will result in "JohnDoe" as the key, and you have to use [name="JohnDoe"] in queries to qualify for the index-based search. In addition, DB2 9 does not support a node across record boundaries.
To produce consistent query results with or without XML indexes, DB2 tries to tolerate cast errors during XMLEXISTS predicate numeric comparison. For example, if a node size contains "XL", comparison [size > 10] will tolerate the "XL" value, which is equivalent to evaluate to false.
DECFLOAT is used instead of DOUBLE for the numeric index type due to its precision. Date and time are not yet supported, but you can use string indexes if you use ISO format (which is required in XML) without timezone or always use the same timezone for the data, and use string comparison in the queries to search the documents.
If you never search inside XML documents, but get XML data in and out as a whole, you probably don't need to use the XML type, since VARCHAR or VARBINARY or LOB types can serve the purpose.
To summarize, XML indexes use XPath to identify nodes to be indexed, and can be used for queries with XMLEXISTS and XMLTABLE predicates.
-Guogen (Gene) Zhang (GGZ)[Read More
We all know the benefits of Web services and Service-Oriented Architecture (SOA). One of the benefits is to revitalize and enable existing applications or functionality as components to participate in new applications, extending the life and value of "legacy" applications. To make DB2 applications a natural part of this architecture, DB2 provides two SOAPHTTP functions as UDFs - SOAPHTTPNV and SOAPHTTPNC (replacing SOAPHTTPV and SOAPHTTPC) to invoke web services. With SQL XML functions that handle XML data, an SQL/XML statement can consume XML responses from SOAP Web services directly.
An SQL/XML Example
Here is an example to get stock quote from a free public web service and to return a row:
SELECT *FROM XMLTABLE('/StockQuotes/Stock' PASSING XMLPARSE(DOCUMENT XMLCAST( XMLQUERY('declare namespace soap="http://schemas.xmlsoap.org/soap/envelope/"; declare default element namespace "http://www.webserviceX.NET/";/soap:Envelope/soap:Body/GetQuoteResponse/GetQuoteResult'PASSING XMLPARSE(DOCUMENTDB2XML.SOAPHTTPNC( 'http://www.webservicex.net/stockquote.asmx', 'http://www.webserviceX.NET/GetQuote', '<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"> <soap:Body> <GetQuote xmlns="http://www.webserviceX.NET/"> <symbol>IBM</symbol> </GetQuote> </soap:Body></soap:Envelope>') ) ) as VARCHAR(2000)) ) COLUMNS "Symbol" VARCHAR(4), "Last" DECIMAL(6,2), "Date" VARCHAR(10), "Time" VARCHAR(8), "Change" VARCHAR(8), "Open" VARCHAR(8), "High" VARCHAR(8), "Low" VARCHAR(8), "Volume" VARCHAR(12) ) XT#
It returns a row like the following:
Symbol Last Date Time Change Open High Low VolumeIBM 84.70 1/9/2009 4:00pm -2.48 87.05 87.50 84.25 7440555 1 record(s) selected
You can combine the result with any complex joins to other data you can imagine without application programming in lower-level languages to consume the web services. It not only provides convenience with improved productivity, but also better performance in consuming web services, especially when you need to combine web service result with other DB2 data.
Setup of the SOAPHTTP functions
The two SOAP web service consumer UDFs SOAPHTTPNV and SOAPHTTPNC were provided through APAR PK48773, which had an APAR PK73324 PTF UK42369 for DB2 9. Also there is an APAR PK70932 PTF UK40600 for SQL XMLTABLE function. Apply the PTFs and then use the SQL DDLs like the following to create the UDFs with your WLM environment (WLMENV8 used below) and timeout limit (10 seconds below). For the complete four CREATE statements, see this file (CreateAlterSOAPUDFs.txt).
CREATE FUNCTION DB2XML.SOAPHTTPNV ( ENDPOINT_URL VARCHAR(256), SOAP_ACTION VARCHAR(256), SOAP_INPUT VARCHAR(32672)) RETURNS VARCHAR(32672)LANGUAGE CSPECIFIC SOAPHTTPNVIVOEXTERNAL NAME DSNWSCVVPARAMETER STYLE DB2SQLPARAMETER CCSID UNICODEPARAMETER VARCHAR STRUCTURESCRATCHPADFINAL CALLFENCEDNOT DETERMINISTICCALLED ON NULL INPUTNO SQLEXTERNAL ACTION RUN OPTIONS 'POSIX(ON),XPLINK(ON),ENVAR("DB2SOAP_TIMEOUT=10")'WLM ENVIRONMENT WLMENV8DBINFO;
After creation, if you need to modify the timeout limit, you can execute an ALTER statement like the following to increase the timeout to 20 seconds, for example:
ALTER SPECIFIC FUNCTION DB2XML.SOAPHTTPNVIVO RUN OPTIONS 'POSIX(ON),XPLINK(ON),ENVAR("DB2SOAP_TIMEOUT=20")';
See the same file for the other ALTER DDL examples.
Revisiting the Example
If you look at the above example carefully, you may wonder why it has two levels of XMLPARSE and processing using XMLQUERY and XMLTABLE. Let's examine the response example:
<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><GetQuoteResponse xmlns="http://www.webserviceX.NET/"><GetQuoteResult><StockQuotes><Stock><Symbol>IBM</Symbol><Last>84.70</Last><Date>1/9/2009</Date><Time>4:00pm</Time><Change>-2.48</Change><Open>87.05</Open><High>87.50</High><Low>84.25</Low><Volume>7440555</Volume><MktCap>113.8B</MktCap><PreviousClose>87.18</PreviousClose><PercentageChange>-2.84%</PercentageChange><AnnRange>69.50 - 130.93</AnnRange><Earns>8.483</Earns><P-E>10.28</P-E><Name>INTL BUSINESS MAC</Name></Stock></StockQuotes></GetQuoteResult></GetQuoteResponse></soap:Body></soap:Envelope>
If you look closely, the soap:Body contains some XML data as text. The GetQuoteResult element contains many escaped entity references. In order to make sense of it, we need to cast it to a string, instead of serializing it, which is achieved by XMLCAST( . as VARCHAR(2000)). After casting, the content looks like the following, the plain XML data:
<StockQuotes> <Stock> <Symbol>IBM</Symbol> <Last>84.70</Last> <Date>1/9/2009</Date><Time>4:00pm</Time> <Change>-2.48</Change> <Open>87.05</Open> <High>87.50</High> <Low>84.25</Low> <Volume>7440555</Volume> <MktCap>113.8B</MktCap> <PreviousClose>87.18</PreviousClose> <PercentageChange>-2.84%</PercentageChange> <AnnRange>69.50 - 130.93</AnnRange> <Earns>8.483</Earns> <P-E>10.28</P-E> <Name>INTL BUSINESS MAC</Name> </Stock></StockQuotes>
Once you reveal the true value, you can perform XMLPARSE and XMLTABLE again, as shown in the above query example, to extract the row out of this data. This example also illustrates the difference between XMLSERIALIZE and XMLCAST.
The above example shows how easy you can invoke and consume web services in the same SQL statement. To see how to make DB2 SQL statements, stored procedures and applications web service providers, look into IBM Data Studio tooling for Data Web Services. You will be surprised how easy they are. Hope you will like the features.
Today is Chinese New Year's day for the year of ox. I wish you a Happy Niu Year! 恭喜发财!
-Guogen (Gene) Zhang (GGZ)[Read More
To make it easier for DB2 for z/OS friends to find pureXML information specific to DB2 for z/OS, we've just created a DB2 for z/OS pureXML wiki page under DB2 XML wiki page to list z/OS-specific information. It may contain cross-references to other areas, and has initial content right now. We expect the content to accumulate over the time.
In the new wiki page, I've just uploaded DB2 9 for z/OS pureXML podcast series, recorded by Guogen Zhang (that's me :-) ) in 2008. This is a 10-part introductory podcast series on pureXML business value and technical knowledge. The first part covers overview and business value, and the follow-on 8 parts cover different aspects of pureXML, from query, schema, to utilities and performance. The last part talks about best practices. Each part is about 10-15 minutes. The podcast is in mp3 audio format together with transcripts in PDF.
Give us feedback on whether this introductory podcast series is useful to you. If so, we will prepare follow-on podcast for more advanced topics of your interests. Thank you.
Check back the DB2 for z/OS pureXML wiki page often. You never know what will pop up there.
-Guogen (Gene) Zhang[Read More
The XMLTABLE() function is probably the most powerful function in manipulating XML data using SQL/XML. Some of its uses include, but not limited to, the following:
- to achieve XQuery capability using SQL/XML with XPath. See SQL/XML queries for XQuery usecases.
- to decompose a large document into smaller documents for storage.
- to generate relational view over stored XML data, to bridge the gap between XML storage and relational application software, and also provide reporting over XML data.
Since there are other articles giving detailed account of XMLTABLE function, for example, Matthias's two-part articles of "XMLTABLE By Examples" Part 1 and Part 2, I will just list the highlight on using the XMLTABLE function here:
- XMLTABLE and XMLCAST were delivered post V9 GA. Apply PK51571, PK51572, and PK51573 before trying the functions.
- In general, XMLTABLE() returns multiple rows of multiple columns. The column types can be non-XML SQL types and XML type.
- The column value of non-XML type has to contain at most one item only. Otherwise, an error sqlcode : -16003 sqlstate: 10507 will be returned. In this case, you need to lower the level of nodes the row expression identifies, and use "." (self) in the column path expression if necessary.
- Use XMLNAMESPACES to declare namespaces that are needed across the row path expression and column path expressions, like this: XMLTABLE(XMLNAMESPACES(), ...). The namespaces declared in the row path expression do not cover column path expressions (according to the SQL standard).
- Avoid using the parent axis ("..") due to its overhead. If you need to lower the level, try to cascade XMLTABLE functions. The first XMLTABLE() produces rows at a high level so that the parent axis can be avoided, and the next XMLTABLE function takes as the input the XML column output of the previous XMLTABLE, and generates other columns.
- If you use XMLTABLE to decompose a document into smaller documents, you need to use XMLDOCUMENT() constructor to wrap the pieces into a document to insert into an XML column.
- You can use XMLTABLE() to extract one row (a tuple) of multiple columns, in particular, multiple heterogeneous pieces from different portions of a document, to be used in constructing a new document.
- You can use XMLTABLE() to iterate through a sequence of homogeneous items that XMLQUERY was not able to separate them, each to be used in constructing a new document. XMLAGG() can be used to aggregate multiple parts back into one value.
- You can use XMLTABLE() to define a relational view so SQL queries can treat XML data as relational data, to bridge the gap between XML and relational. This can also achieve more than XQuery 1.0 can do for reporting and light analytics. For heavy analytics, you may need to materialize XMLTABLE result into tables. MQT support for XMLTABLE() is not available at this point.
- You can use XMLTABLE() and XMLPARSE() to consume Web services directly in SQL by using SOAPHTTPNV or SOAPHTTPNC UDFs in DB2 (more to come).
If you have any questions, don't hesitate to contact me or post questions at DB2 pureXML forum.
-Guogen (Gene) Zhang (GGZ)
In this first posting of the new year, I'm going to share something non-technical. We are looking forward to the uptake of the pureXML projects in many of our clients and ISVs. As for any new technology, pureXML adoption takes leadership and courage. The reward will be ample, especially in this unprecedented, tough global economy. With pureXML, you could support new products and services in reduced time to market, achieve more with the same resources, and improve the bottom-line of the businesses.
Having read the book "Primal Leadership - learning to lead with emotional intelligence" by Daniel Goleman, et al. before, I recently came across a series of three Harvard Business Review (HBR) articles that are pretty much short versions of the book. "Leadership that gets results" focuses on six leadership styles.
The resonant styles are:
- the visionary/authoritative styles,
- the coaching style,
- the affiliative style, and
- the democratic style.
And the dissonant styles are:
- the pacesetting style and
- the command/coercive style.
"The best leaders don’t know just one style of leadership — they’re skilled at several, and have the flexibility to switch between styles as the circumstances dictate."
"What makes a leader" focuses on the emotional intelligence that is necessary to be effective leaders. The four emotional intelligence domains/skills are:
- social awareness, and
- relationship management(empathy and motivation).
"Primal leadership - the hidden driver of great performance" focuses on a process of five steps to achieve greater emotional intelligence:
- What do you want to be?
- Who are you now?
- How do you get from here to there?
- How do you make change stick?
- Who can help you (in the above process)?
If you are not familiar with this research result, I highly recommend you search on the web to find some summary in more detail, or get the book or articles.
Another book that I read over ten years ago is also one of the best-sellers many know well: "The seven habits of highly effective people" by Stephen Covey. The seven habits are:
- be proactive;
- begin with the end in mind;
- put first things first;
- think win/win;
- seek first to understand, then to be understood;
- sharpen the saw.
"The key to success in business is action. But in most companies, people are rewarded for talking - and the longer, louder, and more confusingly, the better." (from HBR article "The smart-talk trap") Talking is probably better than deceiving and bad politics. And trustworthiness and integrity are essential in a leader's character. (I don't know why I'm writing these sentences.)
I wish you fulfill your new year resolution! And make pureXML one of your technology priorities this year.
- Guogen (Gene) Zhang (GGZ)[Read More
On this last day of year 2008, I'd like to share with you on how easy it is to generate XML from relational data using SQL in DB2 V8 and DB2 9 for z/OS.
To start with, I'd like to show a simple example of SQL functions that generates the following XML:
<name> <firstname>John</firstname> <lastname>Doe</lastname></name>
The SQL functions to construct this XML is the following:
XMLELEMENT(NAME "name", XMLELEMENT(NAME "firstname", 'John'), XMLELEMENT(NAME "lastname", 'Doe') )
As you can see, the SQL functions nest exactly corresponding to the XML data nesting structure. These functions are standard SQL XML constructors or sometimes called XML publishing functions. Notice that the "NAME" keyword is required by the standard to indicate that an SQL ID ("name") is used as a name for the element, as opposed to using the value of a column ("name") in other SQL functions. Double quotes are necessary to keep the name case-sensitive.
In the next example, I'd like to show how you can group a set of elements into one XML value and nest under an element using XMLAGG. This query assumes you have an employee table:
+---------+--------------+------------+--------+ | dept | firstname | lastname | ... | +---------+--------------+------------+--------+ Shipping White Smith ... Shipping Paul Taylor ... Shipping John Doe ... ...
And you want to group employees under a department element like the following:
<Department name="Shipping"> <Emp> <firstname>John</firstname> <lastname>Doe</lastname> </Emp> <Emp> <firstname>White</firstname> <lastname>Smith</lastname> </Emp> <Emp> <firstname>Paul</firstname> <lastname>Taylor</lastname> </Emp></Department>...
The query to generate this XML data is as follows:
SELECT XML2CLOB( XMLELEMENT(NAME "Department", XMLATTRIBUTES (e.dept AS "name" ), XMLAGG(XMLELEMENT(NAME "Emp", XMLELEMENT(NAME "firstname", e.firstname), XMLELEMENT(NAME "lastname", e.lastname) ) ORDER BY e.lastname ) ) ) AS "dept_list"FROM employees e GROUP BY dept;
This is what people typically call "de-normalization" (of relational data). Notice that XMLAGG takes ORDER BY clause to specify the order of elements within a group. In this example, it's to order the elements based on the lastname. This query should work on both V8 and V9. In DB2 9, additional constructors are provided to make them complete. V8 queries continue to work in DB2 9.
There are many advantages of using SQL to generate XML from relational data. Here are a few of them:
- It is easy to extend existing COBOL or PL/I applications to generate XML by using SQL. It works for non-Unicode data.
- It is much easier to generate XML than using DOM in programming languages like Java and C/C++, enabling higher productivity.
- It makes applications light-weight, with high-performance. It delivers the XML value in one chunk, instead of piece-by-piece.
What is a QName?
Before listing some additional resources, I'd like to trickle in some terminology related to XML for beginners (please skip if you already know). An element or attribute has a name, officially called a QName, for Qualified Name, such as "hr:name", where "hr" is a prefix and "name" is a local name. Both prefix and local name are NCNames, for No-Colon Names (names without a colon in them). The prefix is for the namespace, and needs to be declared with a namespace declaration like the following:
<hr:name xmlns:hr="http://www.example.com/humanresource"> ... </hr:name>
Where "xmlns" signifies a namespace declaration, and "hr" is the namespace prefix being declared. The value "http://www.example.com/humanresource" is a namespace name or namespace URI. The reasons that XML namespace is designed this way are the following:
- To avoid conflict (the purpose of a namespace), it has to be some unique identifiers across the Web. That will be the URI (Uniform Resource Identifier).
- A URI cannot be used directly in an element name or attribute name because it does not follow the syntax rules for element/attribute names in general.
- A prefix is used instead to make it follow the name syntax rules, and it is shorter usually.
An element or attribute is said to be in the scope of a namespace if it is between the start tag (inclusive) containing the namespace declaration and the corresponding end tag. And an element has a set of in-scope namespaces. A namespace declaration is said to be superfluous if the namespace being declared is already in the scope.
An NCName becomes a QName if the prefix is empty, that is, either there is no namespace (No Namespace is used), or a default namespace is used (a namespace is bound to the empty prefix). A colon is not used in these cases. The following example illustrates a default namespace declaration:
<name xmlns="http://www.example.com/humanresource"> ... </name>
For more SQL/XML constructor details and examples on generating XML from relational data, please look at this "old" presentation of mine from IM Tech Conference '05: Generating XML from Relational Data Using SQL with Ease. Some of the examples are included in the XMLQuickDemo 7XMLGEN1.CLP and 8XMLGEN2.CLP. I've also made XML4TABLE package public. This package contains two SQL PL stored procedures for DB2 9 for z/OS: XML4TABLE and XML4ENTITY1, which can generate XML for an entire table or a given entity from multiple RI-related tables. Read the readme.txt file in the package for details. Let me know if you have any feedback.
Also, check IBM Data Studio and Rational Data Architect for tool support in generating SQL/XML queries to generate XML from relational data. And don't forget to check SQL Reference and XML Guide, and other online resources.
I wish you a very happy and prosperous new year 2009!
-Guogen (Gene) Zhang[Read More
If you are a System Programmer or System DBA, you would find nothing significant that is really new in pureXML for DB2 9 for z/OS, thanks to our architectural design principle: leveraging mature optimized infra-structure in DB2 for XML data management. We use familiar table spaces (pages and records), indexes, buffer pools, and locking scheme for XML data, and provide DBAs the same utilities and tools to administrate the familiar objects, even though XML data model is hierarchical and even though we provide application developers "revolutionary" new weapons (SQL/XML with XPath) to process XML data in DB2.
If this is the first time you looked at XML in DB2 9 for z/OS, you can take a quick look at XMLQuickDemo, where the second CLP 2ListObjects.CLP finds the DB objects involved in storing an XML column data, for background.
You probably would be a bit surprised (and feel relieved?!) when you realized (or you are told now) that there are no new utilities specific for XML, and little setup to start with XML in DB2 9 for z/OS. I've looked again and again to see what's new specific to XML. I've found the following list.
- Implicitly created DB objects for XML – DBA cannot create explicitly.
- XML keyword in some utilities, such as LOAD, UNLOAD, LISTDEF etc.
- A new XML lock type, ID '35'x in traces, and it appears in IFCID 20, 21, 107, 150, 172, and 196.
- XMLDATA in XML tables contains StringIDs in a catalog table SYSIBM.SYSXMLSTRINGS (dictionary) specific to each catalog. Therefore, UNLOAD FROMCOPY is restricted. And copies from DSN1COPY cannot be moved to another DB2 system.
- XML indexes: XPath and keys - for Application DBAs.
- XML Schema registration - for Application DBAs.
In the following, I will briefly describe what is new in pureXML for system programmers and DBAs, and what remains the same.
Setup and Configuration
As I indicated in a previous posting, getting started with pureXML requires almost nothing to set up once you get to DB2 9 for z/OS NFM. You probably only need to take care of some authorization issues, business as usual.
When the application folks get serious, you may need to take care of the zparms that are related to XML: XMLVALA, XMLVALS, LOBVALA, LOBVALS - these four are related to virtual storage limit. LOB zparms are involved because LOB manager is used for XML data bind-in and bind-out. Also the default buffer pool for XML data (BP16k0) needs to be changed.
XML Schema Repository (XSR) setup is only needed if your applications use schema validation or annotated-schema-based decomposition. Only then will you need to take care of Java stored proc and WLM setup. You would deal with SDSNLOAD and SDSNLOD2 also.
Utilities and Tools
No new utilities for XML, and no new tools for XML performance monitoring. You use all the existing and familiar utilities and tools to cover XML objects. You include XML objects using LISTDEF for backup and recovery operations. Some minor restrictions may apply to XML objects in some utilities. XML performance problems can be analyzed through accounting traces and performance traces. Business as usual.
XML indexes and XPath, and Access Types
Indexes are critical for query performance. No difference for XML queries, they require XML indexes. This is something new to application DBAs and architects. XPath is used to index XML data that is searched frequently. Even the new XPath is involved, the index infra-structure remains the same. One XML document may generate zero, one, or more index entries for an XML index.
Also some minor new access types are introduced when using XML indexes. They are DX/DI/DU for DOCID list access (single index), DOCID list ANDing, and ORing. You still get "R" for DOCSCAN evaluation of XMLEXISTS predicate.
XML Schema Registration
In case your applications would like to use XML schemas, you need to set up XSR, and register XML schemas. Something new to application DBAs. There are tools to help, such as IBM Data Studio, in registering XML schemas. If you just want to use a simple tool, I recommend CLP (Command Line Processor).
I'd like also to emphasize that knowledge in XML, XPath, and SQL/XML will make you much more valuable at this SOA age, and in this tough economy. Take advantages of XML and pureXML in DB2 as many other people do! Let me know if you have any questions and concerns.
No matter how familiar one is with SQL, DB2, and relational data model, the first time getting exposed to XML and XPath, she would likely feel overwhelmed. Getting into the XML world is not a small step. But here I'd like to demystify XPath for SQL people who is just getting started with XML and XPath. I will give you a very gentle introduction to XPath and how you can use it to query XML data in SQL/XML.
It is actually very easy to start: we need to understand thatXML data model is a natural but significant extension to the relational model, it features two very powerful structural capabilities - nesting and repeating. First, it allows nesting, to any level (DB2 allows 128 levels of nesting maximum). Nesting is very common in data structures. For example, you want to separate a name into first name and last name, then you have two levels. You can view a table to have three levels: table level, row level, and column level. When you reference a column with tablename.columnname, you actually use tablename to reference each row in the table, then use the column name to get to a column in the row. In most programming languages, you use dot (.) to separate the field names at each level, such as name.firstname. You can have an XML document as simple as name with two sub-elements - firstname and lastname in an XML column like this:
<name> <firstname>Guogen</firstname> <lastname>Zhang</lastname></name>
Now how can you get the firstname and lastname in the document? XPath is the only solution in SQL: You can use /name/firstname to get the firstname, and /name/lastname to get the lastname. So what's the difference of XPath /name/firstname from name.firstname? You start with a slash (/), and replace the dot with slash. Each slash gets you to the next level. You can use that to get any field or structure in XML data.
The second feature is repeating, like an array, but very flexible. Take a simple example, you can list multiple phone numbers in one XML column, and also associate an attribute for each phone number element to tell it's a wired phone or cell phone, or work phone or home phone:
<phones name="Guogen Zhang"> <!-- fake numbers --> <phonenumber type="home">408-555-1234</phonenumber> <phonenumber type="cell">408-555-2345</phonenumber> <phonenumber type="work">408-463-2012</phonenumber></phones>
To get a specific phone number, you can use array index notation:/phones/phonenumber.  is a short-hand for [fn:position()=2]. You can generalize this predicate to other search condition: /phones/phonenumber[@type ="cell"]. This will get the second phone number also. Here @type is to refer to attribute "type". Attributes are those things in the start tag of an element (between < and >). And you use quotes around "cell" to mean a string comparison. In contrast, for a numeric comparison you would use numeric values without quotes. For example, size = 5.
Two more commonly seen "operators" are "//" and ".." (abbreviated axes). "//" is to look at any level under a certain level (roughly descendant-or-self). For example, //phonenubmer is looking for phonenumber in the entire document. ".." is going back up one level (parent axis). Now you should know why it's called XPath, as it's like file paths in a hierarchical file system.
Now let's look at where XPath is used in SQL/XML. XPath is used in the XMLEXISTS predicate, and the XMLQUERY and XMLTABLE functions. For example, the following query retrieves the quantities of "Baby Monitor" items from purchase order documents:
SELECT XMLQUERY('declare namespace ipo="http://www.example.com/IPO";/ipo:purchaseOrder/items/item[productName = "Baby Monitor"]/quantity' PASSING XMLPO)FROM PURCHASEORDERSWHERE XMLEXISTS('declare namespace ipo="http://www.example.com/IPO";/ipo:purchaseOrder/items/item[productName = "Baby Monitor"]' PASSING XMLPO)#
In the above, we assume we have a table PURCHASEORDERS containing XMLPO column of type XML. In XMLQUERY and XMLEXISTS, PASSING is the keyword to pass the XML column from the SQL world into the XML (XPath/XQuery) world. The XMLEXISTS in the above example is to search for purchaseOder with an item whose productName is "Baby Monitor", while XMLQUERY is a scalar function to extract quantity for the baby monitor items. XMLEXISTS is the only main predicate on XML data in the SQL world (the other predicate on XML is IS [NOT] NULL). You cannot use other SQL comparisons on an XML value, but you get many comparisons within the XML (XPath) world. Also in the above query example, I illustrated the use of XML namespaces. To see purchase order document examples and more query examples, see the XMLQuickDemo I referenced in the previous two blog entries.
Isn't XPath easy to start with? For more details and tutorials on XPath, you can Google "XPath tutorial" to get very good web sites on the top of the list, such as w3schools.com and zvon.com. Try some examples, you will soon become comfortable with XML and XPath! (well - you can pick up the terminology gradually.)
Yes, you almost need nothing to set up before you try out pureXML in DB2 9 for z/OS if you are in DB2 9 NFM and have SPUFI for DB2 9. Here is how: download XMLQuickDemo I just posted in my previous blog entry. And log on to TSO and get to SPUFI, and copy the SQL statements from the download, and run from there.
Within the downloaded zip file, statements from the files with a name starting with "1" to "8" can be tried out in sequence from SPUFI without WLM/Java/XSR setup. "ADropTable.CLP" is used to clean up after yourself.
Chances are that you will run successfully. Possible problems you may encounter:
- Authorization problem: you need the privilege to create table and use bufferpool BP16K0 for XML column data.
- SPUFI abend, most likely you need the correct SPUFI version for V9.
- SQL error complaining z/OS XML system services, most likely you are using z/OS R1.7 that requires APAR OA16303 for XML system services.
- DB2 complains syntax on XPath, most likely you need to set Editor case-sensitive, also pay attention to statement termination character as semi-colon may be used within XPath so # is used.
- Also set output record length long enough to see the output. Otherwise there will be truncation of XML result. Check continue after warning to get all rows after truncation.
When you try your own XML documents and queries, especially with large XML documents or results, you may encounter sqlcode: -904 sqlstate: 57011sqlerr Message: UNSUCCESSFUL EXECUTION CAUSED BY AN UNAVAILABLE RESOURCE. REASON 00C900D1, TYPE OF RESOURCE 00000907, AND RESOURCE NAME... which means you need to increase LOBVALA zparm setting, default was 1024KB. DB2 uses LOB functionality for XML data bind-in and bind-out, but no LOB used for XML database storage.
DB2 9 XML column type and built-in functions on XML type do not require XML schemas. Therefore, if you don't use XML schemas, you don't need XSR, which requires Java stored procedure setup and WLM setup. Only when you need XML schemas for schema validation or annotated-schema decomposition, you need the additional setup.
Once you go through these, you can try to embed SQL/XML queries in your COBOL or PL/I applications. You can just use LOB or VARCHAR host variables to hold XML data.
Happy playing with pureXML!
Last week, December 8 and 9, 2008, I visited Kansas City and Topeka with the help of Jeff Mucher, a DB2 advisor from IBM at Dallas. On the 12/8 afternoon, I presented Introduction to pureXML in DB2 9 for z/OS to the Heart of America DB2 Regional User Group at a hotel in Overland Park, with an audience of over 30 people. In the middle, I also did a live demo using CLP (Command Line Processor), connecting to the DB2 9 on a DemoNet native machine at Austin. I have put the CLP scripts and their output capture called XMLQuickDemo online so you can download to try out yourself. The presentation was well received with quite some interesting questions and discussions. At least one friend from DST told me right after the presentation that the DB2 for z/OS XML features are very impressive. I was happy to hear that!
Here are a few words I've heard from customers describing XML in DB2 9 for z/OS or the developers (including me :-) ): quite impressed/very impressive, brilliant, genius, clever. Impressive, huh?!
On the 12/9, Jeff and I visited a DB2 client at Topeka, KS. We met with about a dozen developers, DBAs and System Programmers to introduce DB2 XML features and discuss their application scenarios. We saw DB2 XML well fit in three scenarios for their applications:
- XML for object persistence.
- XML for flexible change representations.
- XML for event log for auditing/regulatory compliance.
The common theme in these usage scenarios is flexibility, flexibility, and flexibility. We saw more and more of these XML application patterns.
Beth wanted me to confirm if her description of XML is correct: it's more like a delivery truck, it could be UPS, FedEX, or any other truck, we don't care. We handle it based on its content. YES!
This trip had special meaning for me. I was attending KU for 2 years over 14 years ago, and this was the first time I got back to Kansas after so many years. And I was able to meet with one of my classmates not seen for over 14 years, and also one of our former interns, now a CS faculty member at KU. Also I seldom experienced snowy weather these years, and it was snowing on 12/9! What a colorful trip!
I'd like to describe one performance result at our lab of a workload that simulates an auditing application. It uses one XML column to store all kind of events in small XML documents. There are 210 XML indexes created on the XML column. For each XML document, there are about 10 indexes that will have keys generated, and the rest does not have hit. This large number of indexes enable efficient diverse queries on the event log.
The result is that the overhead of 210 XML indexes caused only 40% degradation compared with only 10 XML indexes that generates XML keys always. This is a pretty good result! We have an APAR PK75613 (overriding PK66218) to improve XML index keygen performance. Applying this will help reduce XML value index overhead in general.
By the way, if you'd like to know what APARs to apply for XML features in DB2 9 for z/OS, look into the info APAR II14426.