Last week one of our business partners, Thunderhead Ltd
that they were developing a new component (called Context Engine) for their Thunderhead NOW platform that is based on DB2 pureXML. They say that DB2 pureXML is "an ideal complement" for their product. But DB2 is not just a complement. Thunderhead is actually doing a deep embed of DB2 pureXML, which means that instead of two separate products, DB2 pureXML will actually be embedded into Thunderhead's new Context Engine component. This should make it easier for customers to purchase, since they don't have to buy two separate products, but more importantly, it means that customers will get tighter integration of the technologies.
This is not the first time that we have worked with Thunderhead around DB2 pureXML. Earlier this year, we worked with them to develop a solution for derivative trade confirmations. And before that, we worked with them on a solution for insurance correspondence automation.
So if you don't want to wait until the Context Engine comes out later this year to see Thunderhead with DB2 pureXML, check out these solutions. In fact, this week at Sibos (the world's premier financial services event, according to their website), the solution for the derivatives trade confirmations is being demoed at the IBM booth. So if you just happen to be there, stop by and take a look! (Oh, did I mention that Sibos was in Vienna this year?)
-- Kate Riley Tennant[Read More
On this past Monday and Tuesday, I attended the first DB2 9 for z/OS pureXML Proof of Technology (PoT) event in St. Louis, organized by Paul Bartak, one of our Senior Executive IT Specialists in the field, at the IBM Customer Briefing Center facility near St. Louis International Airport.
I stayed there for two days. The first day is for a group of IBMers (with one customer who couldn't come the second day). We listened to Paul's presentation and exercised through the labs, learning pureXML and testing the PoT. I had fun setting up the network for ThinkPad's with VMWare, and testing connections to the DEMOnet DB2 9 for z/OS on a native z machine at Austin in the morning. I was amazed that a bunch of software guys could handle the hardware and network wiring without too much trouble (well, with some remote assistance). We even identified a malfunctioning 5-port Ethernet switch.
On the second day, there were 12 participants, from nearby three DB2 for z/OS clients, including a university from Illinois. They are DBAs, application architects and developers, and data architects. Paul gave a presentation, about 75 minutes, covering the basics of XML and pureXML features, to prepare for the labs. Then everyone went through the labs on one's own pace. I was impressed how fast the participants did. It took me about 3.5 hours to finish the lab on my first-day trial. But I saw many of these customers finished the lab in a shorter time than I did! One DBA told me that she really liked it. She had the concept and now with the lab, it's really flowing. Oh, she thanked me for delivering XML capabilities in DB2 for z/OS.
I had chance to chat with some of the participants on their intended use of pureXML. They were thinking to use XML to keep logs for auditing, store diverse system configuration information, or store purchase orders, or even for displaying data structures during debugging, etc. And I also learned some features that would be nice to have in the future.
This pureXML PoT uses a real DB2 9 for z/OS system, and is unique in the following aspects:
- It covers all the pureXML main features, including DBA tasks, SQL/XML queries with XPath, schema validation and decomposition, with real examples.
- It uses sample tables to generate XML data so there is no need to prepare XML data from other places.
- It uses the familiar tools, SPUFI, SDSF, CLP and also IBM Data Studio for schemas, and OSC for query plans and tuning.
- It includes all the DDL, Schemas, and Queries so you can work further on your own after the PoT.
We are lucky to have someone like Paul, who knows what customers need, understands the technology, and put lot of efforts to deliver a PoT that people can really learn to understand the technology. I'd like to thank Mark Wilson for his contribution to this PoT also.
This pureXML PoT will be offered in different areas. If you are interested in (having your folks) attending such pureXML PoT, please contact your local DB2 advisor, or contact me or Paul. It will also be offered as a Hands-on Lab during IOD 2008 Conference at Las Vegas,
Session: HOL-2584A DB2 9 for z/OS pureXML for DBAs
Time: Wed, 29/Oct, 10:00 AM - 01:00 PM
Location: Mandalay Bay South Convention Center - Lagoon D
Take note if you are attending IOD and interested in playing pureXML on DB2 9 for z/OS there.
--Guogen (Gene) Zhang (GGZ)[Read More
Since DB2 9 for z/OS beta time in early 2006, I have been frequently asked by customers the question of the value of pureXML in DB2. Although more and more folks now realize the value of XML support in DB2, I'd like to touch upon this topic here.
XML is ubiquitous and is being used to represent all kinds of data in industries, government, and academics etc. If you are an application developer in the enterprise environment dealing with vastly increasing XML data asset and ever-changing business demand and are dissatisfied with the limitations of traditional relational databases, DB2 9 for z/OS provides XML management capability as well as relational data management. Unlike XML-enabled databases and other database offerings that rely on Large Object storage or transform XML into relational data, the pureXML technology is specifically designed and optimized for XML hierarchically structured data, and provides proven enterprise-class reliability, availability, scalability, performance and security for XML data that you have come to expect from DB2 for z/OS.
Since the XML data model is hierarchical, the same as IMS. Quite often, people ask what the differences are between XML and IMS. I think there are two: XML data is very flexible while IMS data structures are rigid; and XML has high-level query languages while IMS has procedural DL/I as its native language (although our IMS friends provided SQL and XQuery interface). SQL/XML with XPath provided in DB2 9 for z/OS makes application processing XML much easier.
We've seen the following XML usage scenarios in DB2, with more and more interesting application scenarios popping up. So this list won't be complete. Let us know if you have interesting scenarios to share.
- The first and most direct case is to process XML data, including industry standard XML format (such as FIXML, FpML, ACORD, UNIFI, MISMO etc.), forms and reports (such as XBRL), you can store it, and retrieve it, just like relational data. If you use COBOL or PL/I, to make up the weak XML processing capability in these languages, you can have applications to invoke SQL to process XML data. If you need to connect to the existing back-end systems, you can use the XMLTABLE function to convert XML data into relational views, while you can develop new applications on the XML data.
- Second, pureXML can help develop applications that handle versatile schemas that change frequently, and also help develop end-user customizable applications, which is particularly important for ISVs and IT service departments in large enterprises, where you frequently need to adjust the applications for end-users for diverse information, such as product specification and customer information.
- The third application scenario is to process sparse attribute values, such as medical records, or forms, where there are many fields overall, but only a few of them are applicable for each case. If you use relational approach, you would have to use a table very wide, with many many columns, but most of the column data are null. And you can use one XML column to handle that because XML can hold as many items as you want in a single document.
- The next usage is for object persistence. If you want to persist objects in an application, if you use a relational database, you would usually normalize objects into multiple tables, it's like you disassemble a car when you park in your garage overnight, certainly not convenient. Now with XML, it's much more flexible, you can use a single column to contain at least hiearchical data. More importantly, you can create indexes on these persistent objects. Unlike LOBs, you have no way to do that. So that makes XML for persistent data much more efficient when you need to search. Do you ever have an experience that you wish you could store an array into a single column yet you still can search it? Now you can achieve that using XML. In an extreme case, you could design all the tables with a primary key and a single XML column.
- Yet another application scenario is to migrate from legacy data models, such as network or hieararchical data model. If you migrate a hiearchical data model to a relational model, you need to introduce artificial keys, but with XML you don't need that, it's straight-forward, and you will have benefit of a high-level declarative query language.
- Next application scenario is to generate Web pages because you can use XHTML for web pages that can be generated directly from an SQL/XML statement.
- Last but not least, you can develop web services using DB2 applications to provide or consume web services directly because the web services use XML data. XML support in DB2 enables end-to-end XML solutions in an SOA environment.
In summary, DB2 pureXML makes XML data consumable, and provides the following business value:
- First, it will accelerate application development, reduce system complexity, and improve developer productivity. This will lead to improved time-to-market and reduce IT backlog.
- Second, it increases business agility, it will be able to help develop end-user customizable applications, easily accommodate the changes to data and schemas, and update applications rapidly and reduce the maintenance cost.
- Third, it can improve the business insight, help you develop applications that access information in otherwise unexploited documents, including for business intelligence and business monitoring.
- Last, you no longer need to store XML in a separate system, you can consolidate system resources onto System z, to reduce floor spaces, lower the energy consumption and people cost, and also use specialty engines for XML processing with low CPU cost, and increase security of critical XML data and simplify regulatory compliance.
If you have projects that you feel will benefit from pureXML, I urge youto consider starting with a pureXML PoT I mentioned last week.
The views expressed here are mine and do not necessarily reflect IBM's official position. Contact me at gzhang at us.ibm.com if you'd like a private conversation on the topic.
I was frequently asked by customers as well as IBMers, "what is missing without XQuery?" I'd say that using SQL/XML with XPath support in DB2 9 for z/OS, you miss almost nothing significant in terms of functionality (and performance), and you even have capabilities beyond XQuery 1.0 in SQL/XML! You can build sophisticated applications processing XML data using SQL/XML with XPath today.
XPath is used to navigate within an XML document, but it cannot construct a new document. XQuery includes XPath as its sub-language for the navigational capability. XQuery includes FLWOR expressions and document constructors as two other major features for more complex joins and document generation, among other expressions, such as if-then-else.
SQL/XML queries can embed XPath expressions. An XPath expression embedded inside the XMLQUERY() scalar function can extract pieces of documents. SQL XMLTABLE() function is one of the most powerful functions in processing XML, and with XPath, it provides the "FLW" (for-let-where) capability of a FLWOR expression with "tuples" as the result. SQL XMLAGG() function together with SQL XML constructor functions provides the "OR" (order by-return) part of a FLWOR expression. You can pretty much produce the same result as XQuery using SQL/XML. To prove that, I have converted almost all the queries in W3C XQuery use cases to SQL/XML with XPath (except for 2 due to missing fn:namespace-uri() from DB2 9 for z/OS). The queries are downloadable from here, and executable on DB2 9 for z/OS.
What about performance? You may ask. From DB2 for Linux, UNIX, and Windows (LUW) experiences, SQL/XML queries usually perform equally well or better. For example, when you need to have some grouping beyond the natural hierarchy in XML data, in XQuery, you need to use fn:distinct-values() first, and then for each value find interested aggregate data, which is expensive. In SQL, you can use explicit GROUP BY clause. This is where SQL/XML goes beyond XQuery. Furthermore, you can use XMLTABLE() function to create relational views on XML data, and apply all available SQL functionality over that data for reporting and analytics, including BI functions. This is another area where SQL/XML goes beyond XQuery 1.0. See there articles about XMLTABLE() part 1 and part 2 for more details of its usage.
Looking beyond the languages themselves, we see SQL has well established standardized application environment support. XQuery could be used in any environment with XML input and produce XML as output. However, the XQuery environment takes time to standardize, build and mature. If you want XQuery to access relational data or hybrid relational-XML data, there is no native interface of its own. Use XQuery as the primary (top-level) language embedded/invoked in other host languages? Not many vendors support that from the popular database servers, and no standardized interfaces except for Java, which did not gain much acceptance. XQuery supported by DB2 for LUW as a top-level language does not return rows and does not support host variables or parameter markers yet like in SQL. In contrast, SQL/XML makes extending existing applications with XML capability much easier.
All these reasons led to SQL/XML being the dominant language for XML database application development today. That being said, XQuery adopts a new language style to weave construction with other expressions, which is very nice for document generation. You will be able to use the XQuery features in our next major release.
DB2 for z/OS clients are very CPU-conscious. So are we the developers in DB2, and performance has been one of the most critical factors during our design and development for XML support. We probably have one of the most efficient XML parsers implemented in software in the world, co-designed and custom-made by our z/OS friends; and we have a high-performance XML schema validating parser also. As a result, DB2 for z/OS has very efficient insert, load, and update. (Deletion and retrieval of XML documents perform very well too but do not need XML parsing).
To reduce the CPU cost, DB2 workloads from DRDA can benefit from zIIP specialty engines, and XML parsing can now be redirected to zIIP or zAAP. I hope to provide some details here.
As part of DB2 workload from DRDA, XML processing is eligible for zIIP redirection, except for XML validation (that is a UDF not eligible for zIIP for the time being). In addition, recent enhancements made the z/OS XML parsing system service 100% eligible for redirection to zIIP or zAAP, depending on the mode under which it is running:
- For the z/OS XML parsing system service invoked from the TCB mode, it's 100% eligible for zAAP redirection.
- And for the z/OS XML parsing system service invoked from the enclave SRB mode, it's 100% eligible for zIIP redirection.
Before redirection of the z/OS XML parsing system service, DB2 9 XML workload (including XML parsing system service) from DRDA via TCPIP in the following scenarios run under the SRB mode is zIIP eligible:
- WebSphere for z/OS (JCC T4)
- WebSphere distributed (JCC T4)
- Distributed transactions using DB2 Connect
- Native SQL Stored Procedure
In addition, Index Build phase of LOAD is specially executed under SRB mode to make it eligible for zIIP also (applicable to XML indexes, but no XML parsing involved).
Now with the redirection of z/OS XML parsing system service, the XML parsing invoked for the above scenarios may qualify for additional redirection at 100% rate. For example, a workload that used to have 49.6% redirected to zIIP can now have 56.60% redirected to zIIP.
For DB2 9 XML workload in the following scenarios not from DRDA run under the TCB mode, z/OS XML parsing system service is 100% eligible for zAAP redirection:
- WebSphere® for z/OS (JCC T2)
- Stored Procedure/UDF/Trigger
- Native SQL PL stored procedure
- Call Attach (CAF)
- RRS Attach
- LOAD utility
In addition, XML parsing system service invoked by a stored procedure/UDF/trigger from DRDA via TCPIP run under the TCB mode is now also 100% eligible for zAAP. For example, we've observed up to 36% or even 48% of XML LOAD CPU redirected to zAAP. This is the first time that part of a DB2 workload can be redirected to a zAAP (through the system services).
DB2 9 APAR PK50575 adds support for accounting zAAP redirection in addition to that already provided for zIIP, and records CPU time on zAAP and zIIP engines when they exist.
In order to estimate redirection, RMF workload activity report will show specialty engine eligible CPU through APPL% AAPCP and APPL% IIPCP, when setting PROJECTCPU = YES in IEAOPTxx member in SYS1.PARMLIB.
Relevant z/OS levels and APARs to enable XML parsing system services 100% eligible for specialty engines are the following:
- zAAP - z/OS V1.9, APAR OA20308 for z/OS V1.8, V1.7.
- zIIP - z/OS V1.10, APAR OA23828 for z/OS V1.8, V1.9.
For more details about zIIP redirection and some XML performance numbers, see this whitepaper: DB2 9 and z/OS XML System Services Synergy Update.
At the moment, DB2 9 for z/OS XML schema validation does not invoke z/OS system services available in z/OS V1.10. As a future direction, when DB2 XML validation is enhanced to invoke the system services, it will be 100% eligible for zIIP and zAAP redirection also.
In summary, on z/OS, XML parsing is extremely efficient, and can be redirected 100% to zIIP or zAAP. DB2 XML processing takes advantage of the redirection to lower CPU cost.
Last Friday (October 17) was a long day for me as I returned to San Jose, California from Warsaw, Poland, after attending IDUG Europe 2008 Conference, held in Hilton Warsaw from October 13 to 16 (October 17 is for one-day seminars).
Here are some important impressions from my personal experiences at the conference.
- Motivating opening keynote presentation by Marc Woods, World Champion Paralympic Athlete. Inspiring, yet humorous stories on winning gold medals, especially the one for team melody swimming, emphasizing the importance of team work and leadership.
- Ever-increasing interests in pureXML.There were quite a few sessions about pureXML.Matthias presented two sessions about SQL/XML and best practices, and I presented one for DB2 for z/OS DBAs. I have some slight update on my presentation, available here:DB2 9 for z/OS pureXML - Survival Guide for DBAs.These three sessions were completely full, some had to be turned away. Next time, the planners should probably arrange larger rooms for XML sessions to accommodate more audience.
- We had one Special Interest Group (SIG) session on XML. Participants were Phil Grainger of CA, Jim Dee of BMC Software, Matthias Nicola of DB2 for LUW, and myself. Moderated by Hans Miseur. There were very interesting discussions.
- There were threads for application development that may be for z/OS or LUW or both. Not many attendees realized there was another session on the Java driver for XML, by Bilung (Brandon) Lee, in the application development thread. I hinted Brandon to include something like z/OS or XML in his title to attract more people.
- There was a presentation on MQTs on z/OS also. The room was full also.
- There were many other excellent sessions to attend, including Curt's closing keynote.
In one of Matthias' session, one of z/OS customers asked about pureXML samples. I have put IOD 2007 DB2 9 for z/OS XML Hands-on Lab material for download. This package contains the lab material used in IOD 2007 XML lab, including DDL, Java samples, query samples, examples for schema registration, validation, and decomposition. This material is for educational use only without any warrant, express or implied, as usual.
Some fun fringes follow.
The DB2 25th Anniversary Party on Wednesday October 15 night was held in the middle of nowhere. The bus drivers got lost, on very narrow country road, and the bus I was on took about 1.5 hours to get to the party place. It was amazing that the driver was able to turn the bus around in the dark on an extremely narrow road in the field. The Polish food was good. There was traffic congestion near 11pm on Warsaw streets on the way back to the hotel.
Jim Dee of BMC software was the moderator for my session. He made fun of me in his introduction - saying that I was a genius, and knew everything about XML in the world. That definitely helped my session evaluation! :-)
We were recommended by a US attendee (forgot his name) a restaurant near Hilton Hotel called The Inn under the Red Hog, which used to cater forcommunist dignitaries like Lenin, Castro, and Chairman Mao etc. Four of us (Chinese attendees) tried it out for Thursday dinner. We shared three main dishes: roasted pork ribs, a pork shank, and Mao's chicken (just like orange chicken). They were Polish dishes, heavy flavor (need beer to dilute), big portion, enough for us. Compared with Sekret in the old town, recommended by Hilton Hotel, the Red Hog seems to be better and less expensive. Reservation is recommended as we were rushed out by next reservation.
Looking ahead, next week, IOD Global Conference will be on at Las Vegas. We have many pureXML presentations and Hands-on Labs. If you'd like to try the hands-on labs, I urge you to come even if the enrollment was full. We will admit stand-byers after 5-10 minutes in case of no-shows.
Last week, just before the Thanksgiving holiday in the US, Cindy Saracco from our team, along with Tad Worley, one of our Infosphere Warehouse experts, published a new paper on developerWorks titled: "Create business reports for XML data with Cognos 8 BI and DB2 pureXML: Two techniques to help you get started." We have been getting more and more questions about using Cognos with DB2 pureXML and thought it would be good to get some information out there on the subject. Because XML messages frequently contain important business data, companies are increasingly interested in querying and reporting on this data. The paper takes you through two different methods, step by step, showing examples along the way. Cindy is one of the best technical writers that I know of (inside and outside IBM), so if this topic is of interest, be sure to check out the article.
And in January, we will have a related article coming out titled "Reporting on pureXML data with QMF/DataQuant" so look for that as well.
Kate Riley Tennant
Last week, December 8 and 9, 2008, I visited Kansas City and Topeka with the help of Jeff Mucher, a DB2 advisor from IBM at Dallas. On the 12/8 afternoon, I presented Introduction to pureXML in DB2 9 for z/OS to the Heart of America DB2 Regional User Group at a hotel in Overland Park, with an audience of over 30 people. In the middle, I also did a live demo using CLP (Command Line Processor), connecting to the DB2 9 on a DemoNet native machine at Austin. I have put the CLP scripts and their output capture called XMLQuickDemo online so you can download to try out yourself. The presentation was well received with quite some interesting questions and discussions. At least one friend from DST told me right after the presentation that the DB2 for z/OS XML features are very impressive. I was happy to hear that!
Here are a few words I've heard from customers describing XML in DB2 9 for z/OS or the developers (including me :-) ): quite impressed/very impressive, brilliant, genius, clever. Impressive, huh?!
On the 12/9, Jeff and I visited a DB2 client at Topeka, KS. We met with about a dozen developers, DBAs and System Programmers to introduce DB2 XML features and discuss their application scenarios. We saw DB2 XML well fit in three scenarios for their applications:
- XML for object persistence.
- XML for flexible change representations.
- XML for event log for auditing/regulatory compliance.
The common theme in these usage scenarios is flexibility, flexibility, and flexibility. We saw more and more of these XML application patterns.
Beth wanted me to confirm if her description of XML is correct: it's more like a delivery truck, it could be UPS, FedEX, or any other truck, we don't care. We handle it based on its content. YES!
This trip had special meaning for me. I was attending KU for 2 years over 14 years ago, and this was the first time I got back to Kansas after so many years. And I was able to meet with one of my classmates not seen for over 14 years, and also one of our former interns, now a CS faculty member at KU. Also I seldom experienced snowy weather these years, and it was snowing on 12/9! What a colorful trip!
I'd like to describe one performance result at our lab of a workload that simulates an auditing application. It uses one XML column to store all kind of events in small XML documents. There are 210 XML indexes created on the XML column. For each XML document, there are about 10 indexes that will have keys generated, and the rest does not have hit. This large number of indexes enable efficient diverse queries on the event log.
The result is that the overhead of 210 XML indexes caused only 40% degradation compared with only 10 XML indexes that generates XML keys always. This is a pretty good result! We have an APAR PK75613 (overriding PK66218) to improve XML index keygen performance. Applying this will help reduce XML value index overhead in general.
By the way, if you'd like to know what APARs to apply for XML features in DB2 9 for z/OS, look into the info APAR II14426.
Yes, you almost need nothing to set up before you try out pureXML in DB2 9 for z/OS if you are in DB2 9 NFM and have SPUFI for DB2 9. Here is how: download XMLQuickDemo I just posted in my previous blog entry. And log on to TSO and get to SPUFI, and copy the SQL statements from the download, and run from there.
Within the downloaded zip file, statements from the files with a name starting with "1" to "8" can be tried out in sequence from SPUFI without WLM/Java/XSR setup. "ADropTable.CLP" is used to clean up after yourself.
Chances are that you will run successfully. Possible problems you may encounter:
- Authorization problem: you need the privilege to create table and use bufferpool BP16K0 for XML column data.
- SPUFI abend, most likely you need the correct SPUFI version for V9.
- SQL error complaining z/OS XML system services, most likely you are using z/OS R1.7 that requires APAR OA16303 for XML system services.
- DB2 complains syntax on XPath, most likely you need to set Editor case-sensitive, also pay attention to statement termination character as semi-colon may be used within XPath so # is used.
- Also set output record length long enough to see the output. Otherwise there will be truncation of XML result. Check continue after warning to get all rows after truncation.
When you try your own XML documents and queries, especially with large XML documents or results, you may encounter sqlcode: -904 sqlstate: 57011sqlerr Message: UNSUCCESSFUL EXECUTION CAUSED BY AN UNAVAILABLE RESOURCE. REASON 00C900D1, TYPE OF RESOURCE 00000907, AND RESOURCE NAME... which means you need to increase LOBVALA zparm setting, default was 1024KB. DB2 uses LOB functionality for XML data bind-in and bind-out, but no LOB used for XML database storage.
DB2 9 XML column type and built-in functions on XML type do not require XML schemas. Therefore, if you don't use XML schemas, you don't need XSR, which requires Java stored procedure setup and WLM setup. Only when you need XML schemas for schema validation or annotated-schema decomposition, you need the additional setup.
Once you go through these, you can try to embed SQL/XML queries in your COBOL or PL/I applications. You can just use LOB or VARCHAR host variables to hold XML data.
Happy playing with pureXML!
No matter how familiar one is with SQL, DB2, and relational data model, the first time getting exposed to XML and XPath, she would likely feel overwhelmed. Getting into the XML world is not a small step. But here I'd like to demystify XPath for SQL people who is just getting started with XML and XPath. I will give you a very gentle introduction to XPath and how you can use it to query XML data in SQL/XML.
It is actually very easy to start: we need to understand thatXML data model is a natural but significant extension to the relational model, it features two very powerful structural capabilities - nesting and repeating. First, it allows nesting, to any level (DB2 allows 128 levels of nesting maximum). Nesting is very common in data structures. For example, you want to separate a name into first name and last name, then you have two levels. You can view a table to have three levels: table level, row level, and column level. When you reference a column with tablename.columnname, you actually use tablename to reference each row in the table, then use the column name to get to a column in the row. In most programming languages, you use dot (.) to separate the field names at each level, such as name.firstname. You can have an XML document as simple as name with two sub-elements - firstname and lastname in an XML column like this:
<name> <firstname>Guogen</firstname> <lastname>Zhang</lastname></name>
Now how can you get the firstname and lastname in the document? XPath is the only solution in SQL: You can use /name/firstname to get the firstname, and /name/lastname to get the lastname. So what's the difference of XPath /name/firstname from name.firstname? You start with a slash (/), and replace the dot with slash. Each slash gets you to the next level. You can use that to get any field or structure in XML data.
The second feature is repeating, like an array, but very flexible. Take a simple example, you can list multiple phone numbers in one XML column, and also associate an attribute for each phone number element to tell it's a wired phone or cell phone, or work phone or home phone:
<phones name="Guogen Zhang"> <!-- fake numbers --> <phonenumber type="home">408-555-1234</phonenumber> <phonenumber type="cell">408-555-2345</phonenumber> <phonenumber type="work">408-463-2012</phonenumber></phones>
To get a specific phone number, you can use array index notation:/phones/phonenumber.  is a short-hand for [fn:position()=2]. You can generalize this predicate to other search condition: /phones/phonenumber[@type ="cell"]. This will get the second phone number also. Here @type is to refer to attribute "type". Attributes are those things in the start tag of an element (between < and >). And you use quotes around "cell" to mean a string comparison. In contrast, for a numeric comparison you would use numeric values without quotes. For example, size = 5.
Two more commonly seen "operators" are "//" and ".." (abbreviated axes). "//" is to look at any level under a certain level (roughly descendant-or-self). For example, //phonenubmer is looking for phonenumber in the entire document. ".." is going back up one level (parent axis). Now you should know why it's called XPath, as it's like file paths in a hierarchical file system.
Now let's look at where XPath is used in SQL/XML. XPath is used in the XMLEXISTS predicate, and the XMLQUERY and XMLTABLE functions. For example, the following query retrieves the quantities of "Baby Monitor" items from purchase order documents:
SELECT XMLQUERY('declare namespace ipo="http://www.example.com/IPO";/ipo:purchaseOrder/items/item[productName = "Baby Monitor"]/quantity' PASSING XMLPO)FROM PURCHASEORDERSWHERE XMLEXISTS('declare namespace ipo="http://www.example.com/IPO";/ipo:purchaseOrder/items/item[productName = "Baby Monitor"]' PASSING XMLPO)#
In the above, we assume we have a table PURCHASEORDERS containing XMLPO column of type XML. In XMLQUERY and XMLEXISTS, PASSING is the keyword to pass the XML column from the SQL world into the XML (XPath/XQuery) world. The XMLEXISTS in the above example is to search for purchaseOder with an item whose productName is "Baby Monitor", while XMLQUERY is a scalar function to extract quantity for the baby monitor items. XMLEXISTS is the only main predicate on XML data in the SQL world (the other predicate on XML is IS [NOT] NULL). You cannot use other SQL comparisons on an XML value, but you get many comparisons within the XML (XPath) world. Also in the above query example, I illustrated the use of XML namespaces. To see purchase order document examples and more query examples, see the XMLQuickDemo I referenced in the previous two blog entries.
Isn't XPath easy to start with? For more details and tutorials on XPath, you can Google "XPath tutorial" to get very good web sites on the top of the list, such as w3schools.com and zvon.com. Try some examples, you will soon become comfortable with XML and XPath! (well - you can pick up the terminology gradually.)
If you are a System Programmer or System DBA, you would find nothing significant that is really new in pureXML for DB2 9 for z/OS, thanks to our architectural design principle: leveraging mature optimized infra-structure in DB2 for XML data management. We use familiar table spaces (pages and records), indexes, buffer pools, and locking scheme for XML data, and provide DBAs the same utilities and tools to administrate the familiar objects, even though XML data model is hierarchical and even though we provide application developers "revolutionary" new weapons (SQL/XML with XPath) to process XML data in DB2.
If this is the first time you looked at XML in DB2 9 for z/OS, you can take a quick look at XMLQuickDemo, where the second CLP 2ListObjects.CLP finds the DB objects involved in storing an XML column data, for background.
You probably would be a bit surprised (and feel relieved?!) when you realized (or you are told now) that there are no new utilities specific for XML, and little setup to start with XML in DB2 9 for z/OS. I've looked again and again to see what's new specific to XML. I've found the following list.
- Implicitly created DB objects for XML – DBA cannot create explicitly.
- XML keyword in some utilities, such as LOAD, UNLOAD, LISTDEF etc.
- A new XML lock type, ID '35'x in traces, and it appears in IFCID 20, 21, 107, 150, 172, and 196.
- XMLDATA in XML tables contains StringIDs in a catalog table SYSIBM.SYSXMLSTRINGS (dictionary) specific to each catalog. Therefore, UNLOAD FROMCOPY is restricted. And copies from DSN1COPY cannot be moved to another DB2 system.
- XML indexes: XPath and keys - for Application DBAs.
- XML Schema registration - for Application DBAs.
In the following, I will briefly describe what is new in pureXML for system programmers and DBAs, and what remains the same.
Setup and Configuration
As I indicated in a previous posting, getting started with pureXML requires almost nothing to set up once you get to DB2 9 for z/OS NFM. You probably only need to take care of some authorization issues, business as usual.
When the application folks get serious, you may need to take care of the zparms that are related to XML: XMLVALA, XMLVALS, LOBVALA, LOBVALS - these four are related to virtual storage limit. LOB zparms are involved because LOB manager is used for XML data bind-in and bind-out. Also the default buffer pool for XML data (BP16k0) needs to be changed.
XML Schema Repository (XSR) setup is only needed if your applications use schema validation or annotated-schema-based decomposition. Only then will you need to take care of Java stored proc and WLM setup. You would deal with SDSNLOAD and SDSNLOD2 also.
Utilities and Tools
No new utilities for XML, and no new tools for XML performance monitoring. You use all the existing and familiar utilities and tools to cover XML objects. You include XML objects using LISTDEF for backup and recovery operations. Some minor restrictions may apply to XML objects in some utilities. XML performance problems can be analyzed through accounting traces and performance traces. Business as usual.
XML indexes and XPath, and Access Types
Indexes are critical for query performance. No difference for XML queries, they require XML indexes. This is something new to application DBAs and architects. XPath is used to index XML data that is searched frequently. Even the new XPath is involved, the index infra-structure remains the same. One XML document may generate zero, one, or more index entries for an XML index.
Also some minor new access types are introduced when using XML indexes. They are DX/DI/DU for DOCID list access (single index), DOCID list ANDing, and ORing. You still get "R" for DOCSCAN evaluation of XMLEXISTS predicate.
XML Schema Registration
In case your applications would like to use XML schemas, you need to set up XSR, and register XML schemas. Something new to application DBAs. There are tools to help, such as IBM Data Studio, in registering XML schemas. If you just want to use a simple tool, I recommend CLP (Command Line Processor).
I'd like also to emphasize that knowledge in XML, XPath, and SQL/XML will make you much more valuable at this SOA age, and in this tough economy. Take advantages of XML and pureXML in DB2 as many other people do! Let me know if you have any questions and concerns.
In this first posting of the new year, I'm going to share something non-technical. We are looking forward to the uptake of the pureXML projects in many of our clients and ISVs. As for any new technology, pureXML adoption takes leadership and courage. The reward will be ample, especially in this unprecedented, tough global economy. With pureXML, you could support new products and services in reduced time to market, achieve more with the same resources, and improve the bottom-line of the businesses.
Having read the book "Primal Leadership - learning to lead with emotional intelligence" by Daniel Goleman, et al. before, I recently came across a series of three Harvard Business Review (HBR) articles that are pretty much short versions of the book. "Leadership that gets results" focuses on six leadership styles.
The resonant styles are:
- the visionary/authoritative styles,
- the coaching style,
- the affiliative style, and
- the democratic style.
And the dissonant styles are:
- the pacesetting style and
- the command/coercive style.
"The best leaders don’t know just one style of leadership — they’re skilled at several, and have the flexibility to switch between styles as the circumstances dictate."
"What makes a leader" focuses on the emotional intelligence that is necessary to be effective leaders. The four emotional intelligence domains/skills are:
- social awareness, and
- relationship management(empathy and motivation).
"Primal leadership - the hidden driver of great performance" focuses on a process of five steps to achieve greater emotional intelligence:
- What do you want to be?
- Who are you now?
- How do you get from here to there?
- How do you make change stick?
- Who can help you (in the above process)?
If you are not familiar with this research result, I highly recommend you search on the web to find some summary in more detail, or get the book or articles.
Another book that I read over ten years ago is also one of the best-sellers many know well: "The seven habits of highly effective people" by Stephen Covey. The seven habits are:
- be proactive;
- begin with the end in mind;
- put first things first;
- think win/win;
- seek first to understand, then to be understood;
- sharpen the saw.
"The key to success in business is action. But in most companies, people are rewarded for talking - and the longer, louder, and more confusingly, the better." (from HBR article "The smart-talk trap") Talking is probably better than deceiving and bad politics. And trustworthiness and integrity are essential in a leader's character. (I don't know why I'm writing these sentences.)
I wish you fulfill your new year resolution! And make pureXML one of your technology priorities this year.
- Guogen (Gene) Zhang (GGZ)[Read More
To make it easier for DB2 for z/OS friends to find pureXML information specific to DB2 for z/OS, we've just created a DB2 for z/OS pureXML wiki page under DB2 XML wiki page to list z/OS-specific information. It may contain cross-references to other areas, and has initial content right now. We expect the content to accumulate over the time.
In the new wiki page, I've just uploaded DB2 9 for z/OS pureXML podcast series, recorded by Guogen Zhang (that's me :-) ) in 2008. This is a 10-part introductory podcast series on pureXML business value and technical knowledge. The first part covers overview and business value, and the follow-on 8 parts cover different aspects of pureXML, from query, schema, to utilities and performance. The last part talks about best practices. Each part is about 10-15 minutes. The podcast is in mp3 audio format together with transcripts in PDF.
Give us feedback on whether this introductory podcast series is useful to you. If so, we will prepare follow-on podcast for more advanced topics of your interests. Thank you.
Check back the DB2 for z/OS pureXML wiki page often. You never know what will pop up there.
-Guogen (Gene) Zhang[Read More
One of the key differences of XML from the Large Object (LOB) types in DB2 is that XML data can be indexed. The XML indexes supported in DB2 9 are also called XML value indexes, and queries on XML can use these indexes for performance.
An XML index example
An XML index is used to provide a mapping from a node value to its location. An XPath expression is required to specify the nodes within a document to be indexed. Two data types are supported for XML indexes in DB2 9 for z/OS: DECFLOAT for numeric values and VARCHAR(n) for string values. For example, the following CREATE INDEX DDL creates an XML index on TransRefGUID of the REQUESTXML column of ACORD data, as VARCHAR(24):
CREATE INDEX ACORD.ACORDINDEX1 ON ACORD.REQUEST(REQUESTXML)GENERATE KEYS USING XMLPATTERN'declare default element namespace "http://ACORD.org/Standards/Life/2"; /TXLife/TXLifeRequest/TransRefGUID' as SQL VARCHAR(24)
DB2 will take the value of /TXLife/TXLifeRequest/TransRefGUID as the key, and map to its logical location (DOCID, NODEID), and physical location (RID), using the existing B+-tree index infrastructure.This index can be used for queries that search on TransRefGUID. For example, the following XMLEXISTS predicate can potentially use this index:
XMLExists('declare default element namespace "http://ACORD.org/Standards/Life/2"; /TXLife/TXLifeRequest[TransRefGUID="2004-1217-141016-000012"]'PASSING REQUEST.REQUESTXML)
From queries, it is easy to figure out what kind of XML indexes can speed up the queries - by concatenating path steps from within a predicate to the steps outside the predicate, and using data types consistent between the queries and indexes.
Before explaining the index access procedure, I'd like to review the basic XML storage scheme so you can understand it better.
Basic XML storage scheme in DB2 9 for z/OS
The following picture depicts the high-level storage scheme for XML data.
At high-level, the XML data is stored in a separate table space, just like LOB data. The real XML data is stored in the XMLDATA column of the internal XML table. It contains the hierarchical data in records that can fit in 16KB pages. In order to support free movement of data records, logical links using NODEIDs are used. That's why we need a NODEID index to link records for a document. Similarly, since utilities, such as REORG, can be applied to the base table space and the XML table space independently, XML indexes do not contain base table RIDs, but XML table RIDs. In order to get to base table rows from XML indexes, we need the DOCID index on the base table (see below). That's why DB2 always creates a DOCID index on the base table and a NODEID index on the XML table as part of the storage scheme, although they are for totally different purposes. By XML indexes, we refer to XML value indexes created by users.
Basic XML index access plans
If you use EXPLAIN for a query and select some key columns from the PLAN_TABLE, you will see some new access type in the ACCESSTYPE column for SQL/XML queries. They are the following:
- DX: DOCID list access. DB2 searches an XML index and retrieves all the qualified DOCIDs. Then the DOCIDs are sorted and duplicates are removed. The DOCID list is further converted to a RID list of the base table through the DOCID index. Then the base table rows are fetched.
- DI: DOCID ANDing. DB2 intersects two DOCID lists from two XML indexes for conjunction of two XPath predicates.
- DU: DOCID ORing. DB2 unions two DOCID lists from two XML indexes for disjunction of two XPath predicates.
- M: Multi-index access.
If you see "R" (R-Scan) in the ACCESSTYPE for a table with an XMLEXISTS predicate, then DOCSCAN is applied for the XML column. No new type was introduced for the scan.
Here is an example for an index ANDing plan:
+---------------------------------------------------------------------+ | PLANNO | ACCESSTYPE | MATCHCOLS | ACCESSNAME | MIXOPSEQ | +---------------------------------------------------------------------+1_| 1 | M | 0 | | 0 |2_| 1 | DX | 1 | ACORDINDEX2 | 1 |3_| 1 | DX | 1 | ACORDINDEX1 | 2 |4_| 1 | DI | 0 | | 3 | +---------------------------------------------------------------------+
The following diagram illustrate the process of using the index ANDing plan. Step 4 is where a DOCID index is always used.
These are the basic XML index access plans. For example, NODEID and RID from XML indexes are not used for queries today. We are enhancing plans for better query performance, so expect more methods in the future.
Somethings specific to XML indexes
The same principles for relational indexes apply to XML indexes, such as create indexes only needed by queries and use REBUILD INDEX. The following are some unique features of XML indexes:
- Depending on the XPath in the XMLPATTERN of the CREATE INDEX statement and XML documents, DB2 may generate zero, one or more key entries for a document in an XML index.
- For a numeric index (DECFLOAT type), if a node identified by the XMLPATTERN contains a value that cannot be cast to a number, it is ignored. But insert will be successful. For example, no index entry is generated for XL if the index is DECFLOAT type.
- However, for a string index (VARCHAR(n) type), if a node identified by the XMLPATTERN contains a value that is longer than n, then the document will be rejected, or CREATE INDEX will fail. The maximum n is 1000.
- You can create as many indexes as you like on a single XML column. You could even create multiple indexes with different types on the same XMLPATTERN. This provides much flexibility for XML.
- Use fully specified XPath whenever possible to reduce the cost of maintaining the indexes.
- Index leaf nodes or nodes close to leaves. For non-leaf nodes, the key values are concatenated values from descendants, and their usage is a bit weird. For example, index on name for <name><firstname>John<firstname><lastname>Doe<lastname><name> will result in "JohnDoe" as the key, and you have to use [name="JohnDoe"] in queries to qualify for the index-based search. In addition, DB2 9 does not support a node across record boundaries.
To produce consistent query results with or without XML indexes, DB2 tries to tolerate cast errors during XMLEXISTS predicate numeric comparison. For example, if a node size contains "XL", comparison [size > 10] will tolerate the "XL" value, which is equivalent to evaluate to false.
DECFLOAT is used instead of DOUBLE for the numeric index type due to its precision. Date and time are not yet supported, but you can use string indexes if you use ISO format (which is required in XML) without timezone or always use the same timezone for the data, and use string comparison in the queries to search the documents.
If you never search inside XML documents, but get XML data in and out as a whole, you probably don't need to use the XML type, since VARCHAR or VARBINARY or LOB types can serve the purpose.
To summarize, XML indexes use XPath to identify nodes to be indexed, and can be used for queries with XMLEXISTS and XMLTABLE predicates.
-Guogen (Gene) Zhang (GGZ)[Read More
XML Schema is a W3C recommendation to specify "schema"s for XML data, used to put constraints on otherwise extremely flexible XML data. The constraints include basic data types, structures, occurrences, uniqueness, and referential integrity, etc. DB2 9 for z/OS pureXML provides XML schema validation for XML data through a user-defined function (UDF) called DSN_XMLVALIDATE. The following provides some guidelines in using XML schema validation.
- An XML schema is not required for an XML column. An XML column can store any well-formed XML documents, and it contains untyped XML data, even if a document is validated using the DSN_XMLVALIDATE function. In other words, type annotations are not kept after schema validation.
- XML Schema validation can be performed in an application using any schema validation tools, or in DB2 by invoking DSN_XMLVALIDATE. To invoke schema validation in DB2, an XML schema has to be registered in DB2 XML Schema Repository (XSR). XSR consists of a set of DB2 tables storing XML schema info, and a set of stored procedures to manage the schemas. A special JDBC interface is also provided for schema registration.
- Setting up XSR requires Java V5, JDBC and Java stored procedure setup, which requires WLM setup. See the "XSR setup and troubleshooting" article in the DB2 for z/OS pureXML wiki page for references to the setup information and troubleshooting details.
- XSR tables and stored procedures are created during the installation or migration. Please customize XSR related settings by changing the installation and migration jobs DSNTIJSG and DSNTIJNX.
- XML Schema documents (XSDs) have to be provided physically to DB2 for schema registration. During registration, an SQL ID in the form of "SYSXSR.<name>" can be provided to reference the schema. The schema can also be referenced by the target namespace with optional schema location. Schema documents are compiled into a binary format for runtime use. The original schema documents and the binary representations generated at XSR_COMPLETE are stored in XSR table SYSIBM.XSROBJECTS and other XSR tables. Note that string IDs are used for XML target namespaces and schema locations in the tables.
- To validate a document against a schema, invoke DSN_XMLVALIDATE in one of the two forms: two-argument form and three-argument form. The two-argument form takes an instance document and the schema name (SQL ID), while the three-argument form takes an instance document and schema target namespace, with optional schema location. As of this writing, DSN_XMLVALIDATE function has to be wrapped in the XMLPARSE function, and has a 50MB size limit on a document. For example, the following INSERT statement inserts an XML document after validation against a schema named SYSXSR.PURCHASEORDER:
INSERT INTO MYTABLE VALUES(10, XMLPARSE(DOCUMENT DSN_XMLVALIDATE(xmllob, 'SYSXSR.PURCHASEORDER') ) );Validation may cause XML data to change if there are default values and normalization specified in the schema that are applicable to the instance document.
- DB2 invokes XLXP-C high-performance validating parser. However, validation is still 2 to 3 times as expensive as simple parsing in terms of CPU time. Avoid validation in DB2 to save CPU and improve performance if performance is a concern.
- To support XML schema versioning, either use a totally different target namespace for a new schema version (usually for major changes), or use the same target namespace with a different schema location (usually for minor changes). You have to register schema versions with different SQL names in the XSR. Usually target namespace and schema location are also used in instance documents to identify the schema version. DB2 does not have the capability to use the target namespace and schema location to identify a registered schema for validation yet.
- Use Command Line Processor (CLP) or IBM Data Studio to register and manage XML schemas. Either way invokes the underlying XSR stored procedures, and is more convenient than using the native XSR stored procedures. See XML QuickDemo for a simple example of using CLP to register a schema.
Contact us if you have any questions related to XML schema validation.
-Guogen (Gene) Zhang (GGZ)[Read More]