© 2002 International Business Machines Corporation. All rights reserved.
In her previous column, Susan talked about XML standards, making the point that there is a significant increase in the XML features in database management systems such as DB2 for publishing, storing, querying and transforming XML. In this column, Susan answers questions about XML facilities in DB2, providing a whirlwind tour of what's available at the time this interview took place.
DB2DD: Susan, last time you convinced us that database people should be in tune with what's happening in the XML world. What's your advice to database architects and adminstrators who are implementing systems that incoporate XML? Where should they begin?
Susan: The first step is determine whether the application they are working with needs to:
- Publish XML from relational data.
- Manipulate and transform XML.
- Store XML in DB2.
The next step is to investigate the appropriate option in more detail. I'll try to provide pointers to information relating to each option, as shown in Figure 1.
Figure 1. XML life cycle
DB2DD: What is the logical first step?
Susan: Publishing XML from relational data is a popular first step when applications start working with XML in order to transmit the relational data in a common format to another system or application, within an organization or externally.
DB2DD: So what are the choices for publishing XML from relational data in DB2?
Susan: There are a number of options for publishing XML documents from relational data held in DB2 tables including:
- Using the SQL/XML publishing functions.
These are functions that enable you produce a tagged XML document in a CLOB directly from a regular SQL SELECT statement. The SQL/XML publishing functions are available in DB2 V8.1 for UNIX® and Windows®.
- Using the DB2 XML Extender document composition stored procedures. These stored procedures enable you to publish one or more XML documents in a CLOB or table The particular tagging scheme and the shape of the document produced is governed by the contents of an XML file called the Document Access Definition (DAD). The contents of the DAD can be over-ridden at run-time, or indeed a completely new DAD can be provided on the fly. You can use WebSphere® Studio to help you build DADs to define documents you want to publish. An article that describes the process is XML and WebSphere Studio Application Developer -- Part 5: Exploring the RDB to XML Mapping Editor by David Russell.
- Using the DB2 Web Services support through DB2 WORF:
Web Services provide a way for programs to invoke other programs typically on the Web where the input parameters and output results are transmitted as XML. You can use the DADX (Document Access Definition Extension) definitions in DB2 WORF (Web Services Object Runtime Framework for DB2) to annotate the results from SQL SELECT statements and to deliver the generated document as a Web service. With DB2 WORF, you can even publish output parameters and result sets returned from stored procedures as an XML document.
There is information on how to build DADX files using WebSphere Studio in this IBM Systems Journal article entitled Developing XML Web services with WebSphere Studio Application Developer, by Christina Lau and Arthur Ryman.
- Tagging XML yourself or using a tool to help you.
For example WebSphere Studio provides assistance for publishing XML documents from relational data, e.g., as described in the article XML and WebSphere Studio Application Developer -- Part 3: SQL and XML Together by Joan Haggarty.
DB2DD:That's a lot of options! Which of these publishing options do you recommend?
Susan: If you are working with DB2 V8.1 on UNIX or Windows, do try the SQL/XML functions for publishing. We think you'll like them. You may also like to experiment by creating SQL/XML views so you can isolate the application from the particular tagging scheme used. You can also publish results of the SQL/XML publishing functions as Web services, in conjunction with the DB2 Web services support through DADX definitions. The SQL/XML publishing functions tutorial mentioned above includes DADX examples.
DB2DD: Are there any special considerations when publishing documents from relational data?
Susan: Here are a few points to think about:
- Do you want to produce one document or many documents? For example, the DB2 XML Extender document composition stored procedures and DAD permit both possibilities. Make sure you don't start producing massive documents that your tools can't process, such as documents that are gigabytes long.
- Do you want to produce a complete document with XML and encoding declarations? If so, do take care to put the appropriate encoding declaration in, especially if you want a general purpose parser to process the documents you have published. See Appendix B in DB2 XML Extender Administration and Programming for the encoding declarations supported with the XML Extender.
Do you want the documents you publish to be held in memory? In tables? In files? As Web services results? In WebSphere MQ queues? You can choose any of these options through SQL facilities provided by DB2 as shown in Figure 2.
- In memory: The default behaviour is to place the generated document in a CLOB in memory when you use SQL/XML or the DB2 XML Extender stored procedure dxxGenXMLCLOB and dxxRetrieveXMLCLOB.
- In files: You can use the DB2 XML Extender "content" function to place the generated documents (through SQL/XML or the XML Extender) in the local file system.
- In tables: You can combine a SQL INSERT statement with a SQL/XML SELECT to place the generated document in a table. The DB2 XML Extender has stored procedures that place the generated XML documents in tables, including dxxGenXML and dxxRetrieveXML.
- In WebSphere MQ: You can use the SQL WebSphere MQ functions to place documents produced through SQL/XML in WebSphere MQ queues. You can find out about these MQ functions in Volume 1 of the DB2 SQL Reference. The MQ functions have names that start with 'MQ', such as MQPUBLISH and MQSEND. See also Dan Wolfson's article and follow up on the topic.
In addition, the DB2 XML Extender has stored procedures that place generated XML documents in WebSphere MQ: dxxmqGen, dxxmqRetrieve, dxxmqGenClob, and dxxmqRetrieveClob. They are documented in the XML Extender manual and in this article by Morgan Tong and Dan Wolfson.
- As Web services results: As I mentioned earlier, you can generate XML documents with default annotations using the DADX, or you can combine with SQL/XML to produce output with specific XML tags.
Figure 2. Mechanisms for publishing XML
DB2DD: So which of these possibilities do you recommend?
Susan: The output method, memory, Web services or WebSphere MQ, is usually determined by the application requirements. So if you are working with DB2 V8.1 on UNIX or Windows, I would recommend using the SQL/XML functions in the following situations:
- For memory output.
- For file output, in conjunction with DB2 XML Extender content function.
- For relational table output, with SQL INSERT.
- For WebSphere MQ output, in conjunction with the MQ functions in DB2 such as MQSEND.
- For Web services output, in conjunction with DB2 Web services (the DADX).
DB2DD: OK, Lets move on to transforming. Let's say someone has sent me some XML documents - what are my options for transforming or manipulating these documents?
Susan: The DB2 XML Extender provides many functions for manipulating XML through SQL. Provided you enable your database or system to use the DB2 XML Extender, you will have access to these functions. If you are working with XML held in regular CLOBs or VARCHARs, where the functions operate on XMLCLOB type or XMLVARCHAR type, you can cast the arguments to regular CLOB or VARCHAR. Below is a summary list of the SQL functions in categories. You can find out more from the XML Extender documentation.
- Validating: To validate against an XML Schema (SVALIDATE) or DTD (DVALIDATE).
- Transforming: To transform XML through stylesheets, use XSLTransformToClob and XSLTransformToFile. You can transform XML to HTML using these SQL functions.
- Copying: To copy between the file system and SQL data types, use content, XMLVARCHARFromFile, XMLCLOBFromFile, XMLFILEFromVarchar, and XMLFILEFromCLOB.
- Extracting XML fragments: To extract from XML documents using scalar (extractCLOBs) and table (extractCLOB) functions.
- Extracting SQL values: To extract values from XML documents using functions such as extractInteger, extractChar, and so on.
- Updating: To modify element content or attribute values using the update function.
You can use these functions to process XML you've created, received, stored - and I've left out the SQL functions that operate on WebSphere MQ and XML.
DB2DD: That's a lot of functions even with the omissions! Let's move on to storing. What are my options for storing XML in DB2?
Susan: The first decision you need to make is whether you want to store your XML intact with the tags, or to strip off the tags leaving just the content and attribute values. The latter method is known as "shredding," or decomposition. There are various criteria that you can use to determine the most appropriate storage method:
- Reasons for storing intact XML:
- The XML documents already exist and you want to store the intact XML documents for audit purposes.
- You know that XML documents are frequently read but rarely updated, and the users always want to see the original document.
- You know which elements or attributes will be frequently searched and so you want to create XML Extender side tables.
- You want to keep XML documents external to DB2 on a file system through the DB2 XML Extender XMLFILE mechanism.
Reasons for storing shredded XML:
- You want to drive existing relational business systems using the document content.
- You need to update the XML document content often and the update performance is critical.
- You need to process document content with analytical software.
You can combine intact XML storage with shredding; for example, you could store XML insurance claim forms intact as well as drive your existing claims processing system with the XML claim form content. You can do a partial shred if you prefer, if you don't need all the content or if content is repeated.
DB2DD: When I shred a document, does that mean I can no longer reconstitute it?
Susan: Usually when you shred, you lose the white space and line endings: the material between is what database people consider to be the important content. Losing the white space affects the layout of the XML, which means that you may not be able to reconstitute the document precisely the way it looked when you received it. For data-driven XML rather than document-driven XML, that's usually fine.
DB2DD: What storage mechanisms can I use for storing?
Susan: In summary:
- If you want to store intact XML you use regular SQL INSERT statements into SQL CLOBs or into XML Extender XML types: XMLVARCHAR, XMLCLOB, XMLFILE.
- If you want to shred you can use the XML Extender shred stored procedures: dxxShredXML or dxxInsertXML along with a DAD to describe how the shredding should be done. The article by David Russell mentioned earlier shows you how to build a DAD through WebSphere Studio. "XML programming with SQL/XML and XQuery" also includes a high level description for building DADs.
DB2DD: What if the XML has come in through various sources? Can I store that?
Susan: Yes, as with publishing, you can store XML that has come in through a variety of ways, as shown in Figure 3:
- From memory: The default behavior is to store an XML document held in memory into a SQL CLOB or into a DB2 XML Extender XML data type through an SQL INSERT. If you are shredding, then you can use the XML Extender stored procedures dxxShredXML or dxxInsertXML.
- From files: You can use the DB2 XML Extender SQL functions to copy XML held in files into memory and then proceed with storing.
- From tables: You can combine an SQL SELECT statement with an SQL INSERT or with an XML Extender stored procedure to store XML documents that are already held in tables.
- From WebSphere MQ: You can use the SQL WebSphere MQ functions, such as MQREAD, MQREADCLOB, MQRECEIVE, MQRECEIVECLOB, to read documents from WebSphere MQ queues and store the documents intact through SQL INSERT statements. You can find out about these MQ functions in Volume 1 of the DB2 SQL Reference. There are DB2 XML Extender stored procedures that shred XML documents held in WebSphere MQ queues, e.g., dxxmqShredCLOB, dxxmqShredAllCLOB, dxxmqInsertCLOB. These stored procedures are described in the XML Extender documentation.
- From Web services input: For intact XML, you can include an SQL INSERT statement in a DADX and pass the XML document you want to store as an input parameter. Similarly, for shredding XML, you can invoke the XML Extender decomposition stored procedures, passing the document to be shredded as an input parameter through special DADX tag "storeXML."
Figure 3. Mechanisms for storing XML
DB2DD: There are a lot of possibilities. Can you clarify which functionality is available where?
Susan: Table 1 is a summary table of the availability that may help. Of course things will constantly be changing.
Table 1. Feature and platform summary
|Feature||DB2 on UNIX and Windows||DB2 on iSeriesTM||DB2 on zSeriesTM|
|XML Extender MQ functions and stored procedures||V7.2||Planned||Planned|
|MQ functions||V7||Planned||V7 via PTF UQ71197|
|Web Services through DB2 WORF||
V7.2 Web download|
|See the WebSphere product page||Planned|
DB2DD:Are there any special platform-specific issues when publishing, transforming, or storing XML?
Susan: Here are some points to be aware of when working with EBCDIC on the iSeries or zSeries platforms:
- Although most of the common EBCDIC characters have the same code points in the EBCDIC code pages, there are some variant characters in areas that affect XML, e.g., the encoding for exclamation mark "!" has different code points in different EBCDIC code pages. XML is sensitive to the exclamation mark character. For more information on variant characters, see the section entitled Variant Character Detail in the C/C++ Programming Guide. In general you cannot mix and match different EBCDIC code pages when using XML in EBCDIC without performing data conversions, such as between ibm-1047 and ibm-037. In a typical environment, DB2 performs data conversions to and from the appropriate DB2 code page for XML that is passed as a parameter, such as a stored procedure parameter.
- You can consume and generate XML in EBCDIC but be aware of XML encoding declarations and their impact. See Appendix C in the zSeries XML Extender documentation and in Appendix B in the iSeries XML Extender documentation.
Here are some points that apply to zSeries only:
- You must install the appropriate version of the XML Toolkit on your system. Please see info APARs ii13151 and ii13454 for more details and also the XML toolkit home page: http://www.ibm.com/servers/eserver/zseries/software/xml/ for more information
- The line endings [NEL] generated on zSeries in USS (UNIX Systems Services) are not permitted in XML 1.0. You need to use XML 1.1 to work with USS generated or stored documents, that contain line endings (new line characters). The XML Extender and the XML Toolkit do handle USS line endings but other general purpose XML software may not, until XML 1.1. See:
- The Getting Started chapter in the DB2 XML Extender samples and documentation for zSeries assumes USS however DB2 XML applications are not themselves required to use USS. USS is needed primarily for XML Extender internal use, in particular, the XML Extender uses the USS file system, e.g., for overflow occasionally when processing documents. So the HFS file system in USS has to be available.
DB2DD: Are there any hazards when working with XML?
Susan: "Hazard" may be too strong a word, but there are a number of areas that require special attention, including:
- Ensure that you have a good understanding of the size of each XML document your systems will process or generate, to ensure you don't hit any limits or produce documents that are difficult to handle. For example if you are using DOM to process a document, you may find you will run out of memory if the document is large. You may find that you don't have an editor that can view or modify a document that you produce.
- Try out your system with data that is similar to the production data. A rather obvious remark, but we've heard from people who have been surprised at the cost of XML validation or just regular XML parsing.
- Consider how long it will take initially load XML data into your system, whether it is shredded or stored intact. Try loading a subset of data that looks like the production data in terms of XML nesting levels, document size. etc, so you can realistically estimate load time.
- Think carefully about going over 15 nesting levels and or about having lots of duplicate element or attribute names, like a name value pair scheme where all your attributes are called either 'name' or 'value'. It just makes your XML applications harder to debug.
- Think carefully about DTD or schema validation if you don't really need it. Validation can soak up a lot of processor time.
- Try to use Unicode for your XML documents to reduce the number of data conversions that take place (XML parsers typically operate in Unicode).
DB2DD: Do you have any final remarks?
Susan: DB2 is much involved in XML. Significant directions include:
- We participate in and plan to implement he SQL/XML standard as the components of that standard stabilize, as we have already done with the SQL/XML publishing functions.
- We participate in and plan to implement the XQuery standard at the time the standard stabilizes.
- We are placing more XML functionality, such as components of the XML Extender, into the DB2 engine.
- We plan to support a variety of XML sources through various mechanisms including data federation.
All statements regarding IBM's future direction or intent are subject to change without notice, and represent goals and objectives only.
- SQL/XML publishing functions documentation in the SQL Reference
- DB2 Web services:
- DB2 XML Extender:
- IBM Systems Journal articles:
- IBM Systems Journal article:
- IBM Redbooks:
- zSeries resources:
- DB2 XML Extender Documentation for zOS
- XML Toolkit for z/OS
- DB2 for OS/390 and z/OS Powering the World's e-business Solutions, SG24-6257 (Chapter on XML Extender)
Dig deeper into Information management on developerWorks
Experiment with new directions in software development.
Read and subscribe for the best and latest technical info to help you deal with your development challenges.
Software development in the cloud. Register today and get free private projects through 2014.
Evaluate IBM software and solutions, and transform challenges into opportunities.