Shredding XML in DB2 with pureXML and XMLTable

Making XML relational

The IBM® DB2® XML processing capability makes it simpler to turn XML data into relational data. The built-in XMLTABLE function provides an easy-to-use and powerful mechanism to turn hierarchical XML data into parent-child relational data. Whether your tools do not support XML, or relational processing is required, XMLTABLE provides a means to bridge the gap.

Robert G. Wison (rwilson0@us.ibm.com), Certified IT Specialist, SWG Center of Excellence, IBM

Rob Wilson photoRob has been a subject-matter expert for DB2 Linux, UNIX, and Windows for more than 10 years. Recently he has been on projects involving pureXML in interesting ways. Rob has also been spending a lot of time with pureScale.



15 November 2012

Also available in Russian

Introduction

DB2 pureXML allows well-formed XML documents to be stored within a relational database in a native format. By using SQL and XQUERY languages, these documents can be manipulated within the database. Without this functionality an application would have to receive the full document, make changes, and return the full document to the database. For large documents or high volumes this can quickly saturate network connections.

The pureXML feature includes support for the SQL functions XMLQUERY, XMLTABLE, and XMLEXISTS. These are used in the SELECT, FROM, and WHERE clauses respectively. The XMLTABLE function allows multiple XPath or XQuery expressions to specify elements of a document that are then accessible as though they were in a relational table. One of the XQuery expressions in the XMLTABLE function acts as the row-generating expression, which means that the XMLTABLE function returns one row for each item returned by this XQuery expression. Using multiple XMLTABLE function calls it is possible to mimic foreign-key relationships that are available to SQL-based tools. For more detailed information on the XMLTABLE function you can read the developerWorks article XMLTABLE by example.

Sometimes transforming XML into relational data is called “shredding”. Using DB2 pureXML it is possible to shred the document without retrieving it from the database and to store the results in relational tables.

Messages

ASDI messages

To explore the decomposition or "shredding" of XML documents, the Aircraft Situation Display to Industry (ASDI) will be used for its schema. ASDI is a well-defined standard that uses XML to disseminate information about air traffic data to members of the aviation industry through data feeds. This information arrives in a stream comprised of multiple messages as shown in Figure 1. There are 11 common messages and these are made up of 81 core elements.

arrivalInformation Messages

One type of message in an ASDI feed is an arrival message. This has a type of arrivalInformation (Figure 2). This complex type is defined as two elements of type qualifiedAircraftIdType (Figure 3) and timeOfArrivalType. QualifiedAircraftIdType is made up of computerIdType (Figure 4), departure location information, and arrival location information.

Figure 1. Block diagram of typical message stream
Block diagram showing multiple arrivalInformation messages in an asdiMessage packet
Figure 2. ArrivalInformationType schema
Diagram of the ArrivalInformationType showing it is made up of qualifiedAircraftIdType and timeOfArrivalType
Figure 3. qualifiedAircraftIdType schema
Diagram of the qualifiedAircraftId Type showing it is made up of flightClass, aircraftIdType, flightComputerIdType, departurePointType, and arrivalPointType
Figure 4. computerIdType schema
Diagram of the computerId Type showing it is made up of facilityIdentifiedType, comptuerSystemIdType, and idNumberType

Decomposing documents

Scenario: An analyst wants to run a few queries to summarize arrival information. Only a subset of the available data is needed. That data will be aggregated for reports. This is a one-off process that will only be run occasionally and with a low priority.

Define important data elements

The application does not need to know about all elements in the document schema. The first step should be to compile a list of the important elements and the XPATH expression to locate them within the XML document. As a next step consider additional elements that may be needed for aggregation or filtering.

Once important elements have been defined, their relational data types can be determined from the XML schema, sample documents, and provided transformations. Closely related elements can then be grouped into tables and related tables similar to the methods used for traditional relational database modeling. It is helpful to have a listing such as in Table 1 that tracks the relational table, column, and data type with the XPATH expression for the source element. If a transformation is required to change how data is returned it can be noted in the listing as well.

Within the ASDI schema computerId has three elements that are strings and can be represented within a relational database as a single table.

Table 1. Table showing computerId mapping from relational column to XPATH expression
TableColumnData TypeXPATH
computeridfacilityIdVARCHAR(4)/adiMessage/arrivalInformation/nxcm:qualifiedAircraftId/nxce:computerId/nxce:facilityIdentifier
computeridcomputerSystemIdVARCHAR(5)/asdiMessage/arrivalInformation/nxcm:qualifiedAircraftId/nxce:computerId/nxce:computerSystemId
computerididNumberVARCHAR(3)/asdiMessage/arrivalInformation/nxcm:qualifiedAircraftId/nxce:computerId/nxce:idNumber
Listing 1. ComputerId Table
CREATE TABLE computerid (
	facilityID       VARCHAR(4),
	computersystemID VARCHAR(5),
	idnumber	 VARCHAR(3)
)
Listing 2. XMLTable Query to populate ComputerId table
INSERT INTO computerid (facilityID, computersystemID, idnumber)
SELECT X.facilityID, X.computersystemID, x.idnumber
FROM XMLTABLE (
    XMLNAMESPACES( 'http://tfm.faa.gov/tfms/NasXCoreElements'	AS "nxce",
                   'http://tfm.faa.gov/tfms/MessageMetaData'	AS "mmd",
                   'http://tfm.faa.gov/tfms/NasXCommonMessages'	AS "nxcm",
                   'http://www.w3.org/2001/XMLSchema-instance'	AS "xsi" ),
  '$DOC/*:asdiOutput/*:asdiMessage/*:arrivalInformation/
  	nxcm:qualifiedAircraftId/nxce:computerId'
	PASSING CAST(? as XML) as "DOC"
  COLUMNS
	facilityID	 VARCHAR(4)	PATH 'nxce:facilityIdentifier',
	computerSystemId VARCHAR(5)	PATH 'nxce:computerSystemId',
	idNumber	 VARCHAR(3)	PATH 'nxce:idNumber'
) as X

Note: All other code listings will omit the XMLNAMESPACES declaration for brevity.

Linking elements in hierarchies

Scenario: The analyst now wants to perform more complex analysis across the computerId data and the qualifiedAircraftId data. Because it is possible for multiple computerId entries to exist for a qualifiedAircraftId entry, the entities require their own tables.

When the list of interesting elements is examined, a hierarchy should be apparent. Elements within logical groups or the same XML data type should now be grouped together.

For example, all elements within computerId can be grouped together under qualifiedAircraftId which is grouped under arrivalInformation. DeparturePoint is under qualifiedAircraftId but is unrelated to computerId and so should be grouped separately from the computerId elements. ArrivalPoint should similarly be grouped together but separately from departurePoint and computerId. Since arrivalPoint and departurePoint share all of the same elements they could be stored in a table together and use an additional column to declare which they are.

When these elements are separated into different tables they need a surrogate key to link the parent and children. The function GENERATE_UNIQUE() creates a universally unique identifier (UUID) that can be used as a surrogate key. Each level in the hierarchy that is stored will need to reference the level above it to maintain referential integrity within the document.

The data model illustrated as Figure 5 can be created with the SQL in Listing 3.

Figure 5. QualifiedAircraft and ComputerId Linked by UUIDs schema
Diagram of the schema of the qualifiedAircraft and ComputerId tables showing a foreign-key relationship on the generated UUID
CREATE TABLE computerid (
	parentUUID	 CHAR(13) FOR BIT DATA,
	facilityID	 VARCHAR(4),
	computersystemID VARCHAR(5),
	idnumber	 VARCHAR(3)
);

CREATE TABLE qualifiedAircraft (
	QA_UUID 	CHAR(13) FOR BIT DATA NOT NULL PRIMARY KEY,
	aircraftID	VARCHAR(6),
	flightClass	VARCHAR(16)
);
alter table computerid add  foreign key (parentUUID) 
	references qualifiedAircraft(QA_UUID);

The code in Listing 4 is used to populate the two tables and maintain the foreign key relationship based on the generated UUID.

Listing 4. Complex XMLTable Queries to populate Linked computerId and qualifiedAircraftId
-- The statement terminator character is @
begin atomic
declare QA_UUID CHAR(16) FOR BIT DATA;

SET (QA_UUID) = (
SELECT QA_UUID FROM NEW TABLE (
  INSERT INTO qualifiedaircraft (QA_UUID, aircraftID, flightClass)
  SELECT GENERATE_UNIQUE(), X.aircraftID, X.flightClass
  FROM XMLTABLE (
    '$DOC/*:asdiOutput/*:asdiMessage/*:arrivalInformation/nxcm:qualifiedAircraftId'
	PASSING CAST(? as XML) as "DOC"
    COLUMNS
	  aircraftID	VARCHAR(6)	PATH 'nxce:aircraftId',
	  flightClass	VARCHAR(16)	PATH '@flightClass'
  ) as X
)
);

INSERT INTO computerid (parentUUID, facilityID, computersystemID, idnumber)
SELECT QA_UUID, X.facilityID, X.computersystemID, x.idnumber
FROM XMLTABLE (
  '$DOC/*:asdiOutput/*:asdiMessage/*:arrivalInformation/
  	nxcm:qualifiedAircraftId/nxce:computerId'
	PASSING XMLDOCUMENT(?) as "DOC"
  COLUMNS
	facilityID	 VARCHAR(4)	PATH 'nxce:facilityIdentifier',
	computerSystemId VARCHAR(5)	PATH 'nxce:computerSystemId',
	idNumber	 VARCHAR(3)	PATH 'nxce:idNumber'
) as X
;

end@

Listing 4 will properly maintain the foreign key relationship as long as there is a single qualifiedAircraftId entity in the document. If there were multiple computerid entries, each would point back to that qualifiedAircraftId entry. If there were multiple qualifiedAircraftId entities then a loop would have to be defined to perform the computerId inserts. This is demonstrated in Listing 13.

Planning for future decomposition

Scenario: The application owners have appreciated the reports the analyst was producing. They are now requesting more of the XML data be relational. They have not yet settled on which pieces are important but know a subset. They're eager to get started and find out what's missing from their users for the next phase.

Each element of interest within the document can be kept in its XML form for future decomposition (Figure 6). This can be done by defining an additional XML column in the relational table (Listing 5). Storing the elements in this manner can be an efficient way to populate any columns that may be added to the schema in the future.

Figure 6. QualifiedAircraft and ComputerId With Document fragments
Diagram of the schema of the qualifiedAircraft and ComputerId tables showing additiona of XML columns
Listing 5. ComputerId and qualifiedAircraft Tables With UUID and XML columns
CREATE TABLE computerid (
	parentUUID	 CHAR(16) FOR BIT DATA,
	facilityID	 VARCHAR(4),
	computersystemID VARCHAR(5),
	idnumber	 VARCHAR(3),
	doc		 XML
);

CREATE TABLE qualifiedAircraft (
	QA_UUID 	CHAR(16) FOR BIT DATA,
	aircraftID	VARCHAR(6),
	flightClass	VARCHAR(16),
	doc		XML
);

The previous query can be modified to populate the XML column in the two tables by adding a new expression to the COLUMNS clause of the XMLTABLE function. The result is Listing 6.

Listing 6. Complex XMLTable Queries to populate computerId and qualifiedAircraftId tables with XML data
-- The statement terminator character is @
begin atomic
declare V_UUID CHAR(16) FOR BIT DATA;

SET (V_UUID) = (
SELECT QA_UUID FROM NEW TABLE (
  INSERT INTO qualifiedaircraft (QA_UUID, aircraftID, flightClass, doc)
  SELECT GENERATE_UNIQUE(), X.aircraftID, X.flightClass, XMLDOCUMENT(X.doc)
  FROM XMLTABLE (
    '$DOC/*:asdiOutput/*:asdiMessage/*:arrivalInformation/nxcm:qualifiedAircraftId'
	PASSING CAST(? as XML) as "DOC"
    COLUMNS
	  aircraftID	VARCHAR(6)	PATH 'nxce:aircraftId',
	  flightClass	VARCHAR(16)	PATH '@flightClass',
	  DOC		XML		PATH '.'
  ) as X
 )
);

INSERT INTO computerid (parentUUID, facilityID, computersystemID, idnumber, doc)
SELECT QA.QA_UUID, X.facilityID, X.computersystemID, x.idnumber, XMLDOCUMENT(x.doc)
FROM qualifiedaircraft qa, XMLTABLE (
  '$DOC/nxcm:qualifiedAircraftId/nxce:computerId'
	PASSING QA.DOC as "DOC"
  COLUMNS
	facilityID	 VARCHAR(4)	PATH 'nxce:facilityIdentifier',
	computerSystemId VARCHAR(5)	PATH 'nxce:computerSystemId',
	idNumber	 VARCHAR(3)	PATH 'nxce:idNumber',
	DOC		 XML		PATH '.'
 ) as X
WHERE QA.QA_UUID = V_UUID
;

end@

This example still uses the UUID generated for qualifiedAircraft and would not work for multiple qualifiedAircraft entities in the same document. An application could populate a temp table or an array with all the generated UUIDs and populate the computerid table in the same manner, substituting the appropriate logic for the V_UUID variable.

Other supporting tables and columns can be added to the schema to allow an application to queue and distribute its decomposition activities. For example, a timestamp column could be added to track the last time the DOC column was decomposed.

Performance considerations

The performance of XMLTABLE depends on what its inputs are. This can mean the size and complexity of the document to parse, the number of columns to return, and the XPATH expressions used to define those columns within the document. Some suggestions to maximize performance:

Use XMLQUERY when referencing higher elements in the hierarchy

When an element in the COLUMNS clause of XMLTABLE is higher in the hierarchy than the document's root reference element (the PATH expression starts with “..”) it can be more efficient to use XMLQUERY instead. This is especially noticeable when the XMLTABLE function results in multiple rows.

Listing 7. A query for the case where the computerID table should also have the qualifiedAircraft XML document.
INSERT INTO computerid (parentUUID, facilityID, computersystemID, idnumber, 
	computerdoc, qualifiedaircraftdoc)
SELECT QA.QA_UUID, X.facilityID, X.computersystemID, x.idnumber, 
	XMLDOCUMENT(x.computerdoc), XMLDOCUMENT(x.qualifiedAircraftDoc)
FROM qualifiedaircraft qa, XMLTABLE (
  '$DOC/nxcm:qualifiedAircraftId/nxce:computerId'
	PASSING QA.DOC as "DOC"
  COLUMNS
	facilityID	 VARCHAR(4)	PATH 'nxce:facilityIdentifier',
	computerSystemId VARCHAR(5)	PATH 'nxce:computerSystemId',
	idNumber	 VARCHAR(3)	PATH 'nxce:idNumber',
	computerDOC	 XML		PATH '.',
	qualifiedAircraftDoc	XML	PATH '..'
) as X
WHERE QA.QA_UUID =  V_UUID

The qualifiedAircraftDoc element refers to a value higher in the hierarchy. This query can perform better if this element was moved to the SELECT clause.

Listing 8. For better performance when multiple computerId entries are generated use the statement.
INSERT INTO computerid (parentUUID, facilityID, computersystemID, idnumber, 
	computerdoc, qualifiedaircraftdoc)
SELECT QA.QA_UUID, X.facilityID, X.computersystemID, x.idnumber, 
	XMLDOCUMENT(x.computerdoc) as cdoc , 
XMLQUERY('declare namespace nxcm="http://tfm.faa.gov/tfms/NasXCommonMessages";
	$MDOC/nxcm:qualifiedAircraftId' PASSING QA.DOC as "MDOC") as qdoc
FROM qualifiedaircraft qa, XMLTABLE (
  '$DOC/nxcm:qualifiedAircraftId/nxce:computerId'
	PASSING QA.DOC as "DOC"
  COLUMNS
	facilityID	 VARCHAR(4)	PATH 'nxce:facilityIdentifier',
	computerSystemId VARCHAR(5)	PATH 'nxce:computerSystemId',
	idNumber	 VARCHAR(3)	PATH 'nxce:idNumber',
	computerDOC	 XML		PATH '.'
) as X
WHERE QA.QA_UUID =  V_UUID

When generating multiple XML values

If the COLUMNS clause has multiple elements defined as type XML there can be noticeable performance impact. It may be better to combine these elements, break them out more fully, or process them in a different manner that is out of the main stream of processing such as queuing them for later processing.

Additional decomposition strategies

Using views

Scenario: The application is continuing to gain popularity and more data is being incorporated into it. Unfortunately disk is not being added to the system, so storing the new data in addition to the XML documents is no longer possible. You decide to implement views over the XML documents.

In Listing 9 computerId has been implemented as a view based on the qualifiedAircraft table from Listing 5.

Listing 9. ComputerId as a view
create view v_computerid as (
SELECT QA.QA_UUID as parentuuid, X.facilityID, X.computersystemID, x.idnumber, 
	XMLDOCUMENT(x.doc) as doc
FROM qualifiedaircraft qa, XMLTABLE (
  '$DOC/nxcm:qualifiedAircraftId/nxce:computerId'
	PASSING QA.DOC as "DOC"
  COLUMNS
	facilityID	 VARCHAR(4)	PATH 'nxce:facilityIdentifier',
	computerSystemId VARCHAR(5)	PATH 'nxce:computerSystemId',
	idNumber	 VARCHAR(3)	PATH 'nxce:idNumber',
	DOC		 XML		PATH '.'
) as X
)

Views can be nested. By using the method of carrying the XML fragments a computerId view can be built on a qualifiedAircraft view.

Figure 7. QualifiedAircraft and ComputerId are views built on ArrivalInformation
Schema diagram of ArrivalInformation, QalifiedAircraft, and ComputerId showing nesting of views
Listing 10. ComputerId and qualifiedAircraft Id as views on arrivalInformation table
CREATE TABLE arrivalInformation (
	AI_UUID		CHAR(16) FOR BIT DATA,
	timeOfArrival	TIMESTAMP,
	timeEstimated	CHAR(1),
	DOC		XML
	);

INSERT INTO arrivalInformation (AI_UUID, timeOfArrival, timeEstimated, DOC)
SELECT generate_unique(), UDFS.ZULU_TO_TIMESTAMP(X.timeofarrival), 
      CASE UPPER(X.timeEstimated) WHEN 'TRUE' then 'T' else 'F' END, XMLDOCUMENT(X.DOC)
  FROM XMLTABLE (
    '$DOC/*:asdiOutput/*:asdiMessage/*:arrivalInformation'
	PASSING CAST(? as XML) as "DOC"
    COLUMNS
	  timeofarrival	VARCHAR(20)	PATH 'nxcm:timeOfArrival',
	  timeEstimated	VARCHAR(6)	PATH 'nxcm:timeOfArrival/@estimated',
	  DOC		XML		PATH '.'
  ) as X
;

create or replace view v_qualifiedaircraft as (
SELECT AI_UUID as parentUUID, GENERATE_UNIQUE() as QA_UUID, X.aircraftID, 
	X.flightClass, X.DOC
FROM arrivalInformation AI, XMLTABLE (
    '$DOC/*:arrivalInformation/nxcm:qualifiedAircraftId'
	PASSING AI.DOC as "DOC"
    COLUMNS
	  aircraftID	VARCHAR(6)	PATH 'nxce:aircraftId',
	  flightClass	VARCHAR(16)	PATH '@flightClass',
	  DOC		XML		PATH '.'
  ) as X
);

create or replace view v_computerid as (
 SELECT VQA.QA_UUID as parentuuid, X.facilityID, X.computersystemID, x.idnumber, 
	XMLDOCUMENT(x.doc) as computer
 FROM v_qualifiedaircraft vqa, XMLTABLE (
  '$DOC/ nxcm:qualifiedAircraftId/nxce:computerId'
	PASSING VQA.DOC as "DOC"
  COLUMNS
	facilityID	 VARCHAR(4)	PATH 'nxce:facilityIdentifier',
	computerSystemId VARCHAR(5)	PATH 'nxce:computerSystemId',
	idNumber	 VARCHAR(3)	PATH 'nxce:idNumber',
	DOC		 XML		PATH '.'
 ) as X
)
;

The computerId view is built on the qualifiedAircraft view that is built on the DOC column of the arrivalInformation table.

Listing 11. Joining the elements back together on the generated UUIDs.
select AI.timeOfArrival, AI.timeEstimated, VQA.aircraftID, VCI.facilityId
from arrivalInformation AI, v_qualifiedaircraft VQA, v_computerid VCI
WHERE VQA.parentuuid = AI.AI_UUID
  AND VCI.parentuuid = VQA.QA_UUID
Listing 12. The ZULU_TO_TIMESTAMP function used above to convert the timestamp from the ASDI messages to a DB2 timestamp.
CREATE OR REPLACE FUNCTION UDFS.ZULU_TO_TIMESTAMP ( inStr VARCHAR(22) )
  RETURNS TIMESTAMP LANGUAGE SQL SPECIFIC ZULU_TO_TIMESTAMP
  DETERMINISTIC CALLED ON NULL INPUT
  
   return values (timestamp(substr(inStr,1,10) || '-' || 
   	replace(substr(inStr,12,instr(inStr,'Z')-12),':','.')))

Using stored procedures

Stored procedures are a method to put business logic close to the data in the database. They can perform complex operations that are difficult or impossible in SQL. In this case the stored procedure can take a single XML document as an argument and insert its parts into multiple tables.

Listing 13. Table schema and stored procedure to populate multiple tables from one XML document.
CREATE TABLE arrivalInformation (
	AI_UUID		CHAR(13) FOR BIT DATA,
	timeOfArrival	TIMESTAMP,
	timeEstimated	CHAR(1),
	DOC		XML
	)@

CREATE TABLE qualifiedAircraft (
	parentuuid 	CHAR(13) FOR BIT DATA,
	QA_UUID 	CHAR(13) FOR BIT DATA,
	aircraftID	VARCHAR(6),
	flightClass	VARCHAR(16),
	DOC		XML
)
@
CREATE TABLE computerid (
	parentUUID	 CHAR(13) FOR BIT DATA,
	facilityID	 VARCHAR(4),
	computersystemID VARCHAR(5),
	idnumber	 VARCHAR(3),
	DOC		 XML
)
@


CREATE OR REPLACE PROCEDURE shred_flightPlanInfo (IN cFlightPlanInfo XML)
	SPECIFIC SHRED_FLIGHTPLANINFO
	LANGUAGE SQL
main: BEGIN

	DECLARE v_AI_uuid, v_QA_uuid	CHAR(13) for bit data;

	-- Loop over the ArrivalInformation

    FOR v_AI AS select UDFS.ZULU_TO_TIMESTAMP(X.timeofarrival) timeofarrival, 
	CASE UPPER(X.timeEstimated) WHEN 'TRUE' then 'T' else 'F' END timeestimated, 
	XMLDOCUMENT(X.DOC) DOC
	  FROM XMLTABLE (
	    '$DOC/*:asdiOutput/*:asdiMessage/*:arrivalInformation'
		PASSING cFlightPlanInfo as "DOC"
	    COLUMNS
		timeofarrival	VARCHAR(20)	PATH 'nxcm:timeOfArrival',
		timeEstimated	VARCHAR(6)	PATH 'nxcm:timeOfArrival/@estimated',
		DOC		XML		PATH '.'
		  ) as X
	DO
		SET v_AI_uuid = GENERATE_UNIQUE();
		INSERT INTO arrivalInformation (AI_UUID, timeOfArrival, 
			timeEstimated, doc) 
		  values (v_AI_uuid,
		  	  v_AI.timeOfArrival,
		  	  v_AI.timeEstimated,
		  	  v_AI.DOC );
		
		FOR v_QA AS SELECT GENERATE_UNIQUE() as QA_UUID, X.aircraftID, 
				X.flightClass, XMLDOCUMENT(X.DOC) DOC
			FROM XMLTABLE (
		'$DOC/*:arrivalInformation/nxcm:qualifiedAircraftId'
		PASSING v_AI.DOC as "DOC"
		COLUMNS
		  aircraftID	VARCHAR(6)	PATH 'nxce:aircraftId',
		  flightClass	VARCHAR(16)	PATH '@flightClass',
		  DOC		XML		PATH '.'
			  ) as X
		DO
			SET v_QA_uuid = GENERATE_UNIQUE();
			INSERT INTO qualifiedAircraft (parentUUID, QA_UUID, 
					aircraftID, flightClass, doc) 
				values (v_AI_uuid,
					v_QA_uuid,
					v_QA.aircraftID,
					v_QA.flightClass,
					v_QA.DOC );
			
		  INSERT INTO computerid (parentUUID, facilityID, computersystemid, 
		    idnumber, doc) 
		  SELECT v_QA_uuid, X.facilityID, X.computersystemID, x.idnumber, 
			XMLDOCUMENT(x.doc) as computer
		  FROM v_qualifiedaircraft vqa, XMLTABLE (
		    '$DOC/nxcm:qualifiedAircraftId/nxce:computerId'
			PASSING v_QA.DOC as "DOC"
			COLUMNS
			  facilityID	   VARCHAR(4)	PATH 'nxce:facilityIdentifier',
			  computerSystemId VARCHAR(5)	PATH 'nxce:computerSystemId',
			  idNumber	   VARCHAR(3)	PATH 'nxce:idNumber',
			  DOC		   XML		PATH '.'
			) as X;
		END FOR;
	END FOR;
	COMMIT;
END main 
@

Using triggers

If the ASDI schema was fully created in a relational database it would be apparent that multiple messages write to the same objects. Since an object decomposes the same regardless of its source the process could be handled as a set of triggers.

It is possible to define triggers on the relational tables to cause a cascading effect when a new row is inserted. There is currently a limitation on using the XML data type in transition variables within a trigger but that can be worked around by joining with the row that was just inserted using the generated ID in an AFTER trigger.

Listing 14. Trigger DDL using the schema from the stored procedure.
CREATE OR REPLACE TRIGGER trig_ins_qa AFTER INSERT ON qualifiedAircraft
REFERENCING NEW as NEWROW FOR EACH ROW
  INSERT INTO computerid (parentUUID, facilityID, computersystemID, idnumber, DOC)
    SELECT NEWROW.QA_UUID, X.facilityID, X.computersystemID, x.idnumber, 
    	XMLDOCUMENT(X.DOC)
    FROM qualifiedAircraft QA, XMLTABLE (
         '$DOC/nxcm:qualifiedAircraftId/nxce:computerId'
	PASSING QA.DOC as "DOC"
	COLUMNS
	 facilityID	  VARCHAR(4)	PATH 'nxce:facilityIdentifier',
	 computerSystemId VARCHAR(5)	PATH 'nxce:computerSystemId',
	 idNumber	  VARCHAR(3)	PATH 'nxce:idNumber',
	 DOC		  XML		PATH '.'
    ) as X
    WHERE QA.QA_UUID = NEWROW.QA_UUID
;

CREATE OR REPLACE TRIGGER trig_ins_ai AFTER INSERT ON arrivalInformation
REFERENCING NEW as NEWROW FOR EACH ROW
  INSERT INTO qualifiedAircraft (parentUUID, QA_UUID, aircraftId, flightClass, DOC)
   SELECT NEWROW.AI_UUID, GENERATE_UNIQUE(), X.aircraftId, X.flightClass, 
  	XMLDOCUMENT(X.DOC)
   FROM arrivalinformation AI, XMLTABLE (
	  '$DOC/*:arrivalInformation/nxcm:qualifiedAircraftId'
	PASSING AI.DOC as "DOC"
	COLUMNS
	 aircraftID	VARCHAR(6)	PATH 'nxce:aircraftId',
	 flightClass	VARCHAR(16)	PATH '@flightClass',
	 DOC		XML		PATH '.'
    ) as X
    WHERE AI.AI_UUID = NEWROW.AI_UUID
;

insert into arrivalInformation (AI_UUID, timeOfArrival, timeEstimated, DOC)
SELECT generate_unique(), UDFS.ZULU_TO_TIMESTAMP(X.timeOfArrival), 
    CASE UPPER(X.timeEstimated) WHEN 'TRUE' then 'T' else 'F' END, 
    XMLDOCUMENT(X.DOC)
FROM XMLTABLE (
	 '$DOC/*:asdiOutput/*:asdiMessage/*:arrivalInformation'
	PASSING CAST(? as XML) as "DOC"
	COLUMNS
	  timeofarrival	VARCHAR(20)	PATH 'nxcm:timeOfArrival',
	  timeEstimated	VARCHAR(6)	PATH 'nxcm:timeOfArrival/@estimated',
	  DOC		XML		PATH '.'
) as X

Summary

This article has provided some methods to turn XML data into relational data. A complex but well-defined document schema was used with the goal of illustrating the handling of multiple hierarchies and repeating elements. The article highlighted some items to watch out for such as repeating elements, relational hierarchies, and future considerations.

Appendix A – XML document used

Listing 15. An example arrivalInformation document fragment used for testing.
<?xml version="1.0" encoding="UTF-8"?><asdiOutput 
	xmlns="http://tfm.faa.gov/tfms/TFMS_XIS" 
	xmlns:nxce="http://tfm.faa.gov/tfms/NasXCoreElements"
	xmlns:mmd="http://tfm.faa.gov/tfms/MessageMetaData"
	xmlns:nxcm="http://tfm.faa.gov/tfms/NasXCommonMessages"
	xmlns:idr="http://tfm.faa.gov/tfms/TFMS_IDRS" 
	xmlns:xis="http://tfm.faa.gov/tfms/TFMS_XIS" 
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://tfm.faa.gov/tfms/TFMS_XIS 
	http://localhost:58489/tfms/schema/TFMS_XIS.xsd" 
	timestamp="2009-09-21T12:57:19Z">
  <asdiMessage sourceFacility="KZSE" sourceTimeStamp="2009-09-21T12:57:07Z" trigger="AZ">
    <arrivalInformation>
      <nxcm:qualifiedAircraftId>
        <nxce:aircraftId>QXE483</nxce:aircraftId>
        <nxce:computerId>
          <nxce:facilityIdentifier>KZSE</nxce:facilityIdentifier>
        </nxce:computerId>
        <nxce:departurePoint>
          <nxce:fix>
            <nxce:namedFix>KBLI</nxce:namedFix>
          </nxce:fix>
        </nxce:departurePoint>
        <nxce:arrivalPoint>
          <nxce:fix>
            <nxce:namedFix>KSEA</nxce:namedFix>
          </nxce:fix>
        </nxce:arrivalPoint>
      </nxcm:qualifiedAircraftId>
      <nxcm:timeOfArrival estimated="true">2009-09-21T12:54:00Z</nxcm:timeOfArrival>
    </arrivalInformation>
  </asdiMessage>
</asdiOutput>

Appendix B – Modified XML Document

Listing 16. A test document with multiple computerId entries to test looping logic.
<?xml version="1.0" encoding="UTF-8"?><asdiOutput 
	xmlns="http://tfm.faa.gov/tfms/TFMS_XIS" 
	xmlns:nxce="http://tfm.faa.gov/tfms/NasXCoreElements" 
	xmlns:mmd="http://tfm.faa.gov/tfms/MessageMetaData" 
	xmlns:nxcm="http://tfm.faa.gov/tfms/NasXCommonMessages" 
	xmlns:idr="http://tfm.faa.gov/tfms/TFMS_IDRS" 
	xmlns:xis="http://tfm.faa.gov/tfms/TFMS_XIS" 
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
	xsi:schemaLocation="http://tfm.faa.gov/tfms/TFMS_XIS 
	http://localhost:58489/tfms/schema/TFMS_XIS.xsd" 
	timestamp="2009-09-21T12:57:19Z">
  <asdiMessage sourceFacility="KZSE" 
  	sourceTimeStamp="2009-09-21T12:57:07Z" trigger="AZ">
    <arrivalInformation>
      <nxcm:qualifiedAircraftId>
        <nxce:aircraftId>QXE483</nxce:aircraftId>
        <nxce:computerId>
          <nxce:facilityIdentifier>KZSE</nxce:facilityIdentifier>
        </nxce:computerId>
        <nxce:computerId>
          <nxce:facilityIdentifier>RGW1</nxce:facilityIdentifier>
        </nxce:computerId>
        <nxce:computerId>
          <nxce:facilityIdentifier>JAW1</nxce:facilityIdentifier>
        </nxce:computerId>
        <nxce:departurePoint>
          <nxce:fix>
            <nxce:namedFix>KBLI</nxce:namedFix>
          </nxce:fix>
        </nxce:departurePoint>
        <nxce:arrivalPoint>
          <nxce:fix>
            <nxce:namedFix>KSEA</nxce:namedFix>
          </nxce:fix>
        </nxce:arrivalPoint>
      </nxcm:qualifiedAircraftId>
      <nxcm:timeOfArrival 
      	estimated="true">2009-09-21T12:54:00Z</nxcm:timeOfArrival>
    </arrivalInformation>
  </asdiMessage>
</asdiOutput>

Resources

Learn

Get products and technologies

  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, XML
ArticleID=845655
ArticleTitle=Shredding XML in DB2 with pureXML and XMLTable
publish-date=11152012