XML parsing and DTDs

If the input data contains an internal document type declaration (DTD), the XML parsing process also checks the syntax of those DTDs.

In addition, the parsing process:
  • Applies default values that are defined by the internal DTDs
  • Expands entity references
Example: implicit XML parsing File c8.xml contains the following document:
<customerinfo xml:space="preserve" xmlns="http://posample.org" Cid='1008'>
  <name>Kathy Smith</name>
  <addr country='Canada'>
  <street>14 Rosewood</street>
  <city>Toronto</city>
  <prov-state>Ontario</prov-state>
  <pcode-zip>M6W 1E6</pcode-zip>
  </addr>
  <phone type='work'>416-555-3333</phone>
</customerinfo>
In a JDBC application, read the XML document from the file, and insert the data into XML column Info of table MYCUSTOMER, which is a copy of the sample Customer table. Let the Db2 database server perform an implicit XML parse operation.
PreparedStatement insertStmt = null;
String sqls = null;
int cid = 1008;
sqls = "INSERT INTO MYCUSTOMER (Cid, Info) VALUES (?, ?)";
insertStmt = conn.prepareStatement(sqls);
insertStmt.setInt(1, cid);
File file = new File("c8.xml");
insertStmt.setBinaryStream(2, new FileInputStream(file), (int)file.length());
insertStmt.executeUpdate();
No whitespace handling is specified, so the default behavior of stripping whitespace is assumed. However, the document contains the xml:space="preserve" attribute, so whitespace is preserved. This means that, after end-of-line processing, the line feeds and spaces between the elements in the document remain.
If you retrieve the stored data, content looks like this:
<customerinfo xml:space="preserve" xmlns="http://posample.org" Cid='1008'>
  <name>Kathy Smith</name>
  <addr country='Canada'>
  <street>14 Rosewood</street>
  <city>Toronto</city>
  <prov-state>Ontario</prov-state>
  <pcode-zip>M6W 1E6</pcode-zip>
  </addr>
  <phone type='work'>416-555-3333</phone>
</customerinfo>
Example: explicit XML parsing Assume that the following document is in BLOB host variable blob_hostvar.
<customerinfo xml:space="default" xmlns="http://posample.org" Cid='1009'>
  <name>Kathy Smith</name>
  <addr country='Canada'>
  <street>15 Rosewood</street>
  <city>Toronto</city>
  <prov-state>Ontario</prov-state>
  <pcode-zip>M6W 1E6</pcode-zip>
  </addr>
  <phone type='work'>416-555-4444</phone>
</customerinfo>
In a static embedded C application, insert the document from the host variable into XML column Info of table MYCUSTOMER. The host variable is not an XML type, so you can execute XMLPARSE explicitly. Specify STRIP WHITESPACE to remove any boundary whitespace.
EXEC SQL BEGIN DECLARE SECTION;
   SQL TYPE BLOB (10K) blob_hostvar;
EXEC SQL END DECLARE SECTION;
…
EXEC SQL INSERT INTO MYCUSTOMER (Cid, Info) 
  VALUES (1009,
  XMLPARSE(DOCUMENT :blob_hostvar STRIP WHITESPACE));
The document contains the xml:space="default" attribute, so the XMLPARSE specification of STRIP WHITESPACE controls whitespace handling. This means that the carriage returns, line feeds, and spaces between the elements in the document are removed.
If you retrieve the stored data, you see a single line with the following content:
<customerinfo xml:space="default" xmlns="http://posample.org" Cid='1009'>
<name>Kathy Smith</name><addr country='Canada'><street>15 Rosewood</street>
<city>Toronto</city><prov-state>Ontario</prov-state><pcode-zip>M6W 1E6</pcode-zip>
</addr><phone type='work'>416-555-4444</phone></customerinfo>
Example: parsing of a document with an internal DTD In a C language application, host variable clob_hostvar contains the following document, which contains an internal DTD:
<!DOCTYPE prod [<!ELEMENT description (name,details,price,weight)> 
   <!ELEMENT name (#PCDATA)> 
   <!ELEMENT details (#PCDATA)> 
   <!ELEMENT price (#PCDATA)> 
   <!ELEMENT weight (#PCDATA)> 
   <!ENTITY desc  "Anvil">
]>
<product xmlns="http://posample.org" pid=''110-100-01'' >
 <description>
 <name>&desc;</name>
 <details>Very heavy</details>
 <price>         9.99            </price>
 <weight>1 kg</weight>
 </description>
</product>
Insert the data into table MyProduct, which is a copy of the sample Product table:
EXEC SQL BEGIN DECLARE SECTION;
   SQL TYPE CLOB (10K) clob_hostvar;
EXEC SQL END DECLARE SECTION;
…
EXEC SQL insert into 
  MyProduct ( pid, name, Price, PromoPrice, PromoStart, PromoEnd, description ) 
  values ( '110-100-01','Anvil', 9.99, 7.99, '11/02/2004','12/02/2004', 
  XMLPARSE ( DOCUMENT :clob_hostvar STRIP WHITESPACE ));
XMLPARSE specifies stripping of whitespace, so boundary whitespace within the document is removed. In addition, when the database server executes XMLPARSE, it replaces the entity reference &desc; with its value.
If you retrieve the stored data, you see a single line with the following content:
<product xmlns="http://posample.org" pid="110-100-01"><description><name>Anvil
</name><details>Very heavy</details><price>         9.99            </price>
<weight>1 kg</weight></description></product>