Differences in an XML document after storage and retrieval
When you store an XML document in a table and then retrieve that document, the retrieved document might not be the same as the original document.
This behavior is defined by the XML and SQL/XML standards.
- If you execute DSN_XMLVALIDATE, the database server strips ignorable whitespace from the input document.
- If you do not request XML schema validation, the database server:
  - Strips boundary whitespace, if you do not request preservation
  - Replaces all carriage return and line feed pairs (U+000D and U+000A), or carriage returns (U+000D), within the document with line feeds (U+000A)
  - Performs attribute-value normalization, as specified in the XML 1.0 specification
    This process causes line feed (U+000A) characters in attributes to be replaced with space characters (U+0020).
- If the data has an XML declaration before it is sent to the database server, the XML declaration is not preserved.
  With implicit serialization, for Db2 ODBC and embedded SQL applications, the Db2 database server adds an XML declaration, with the appropriate encoding specification, to the data. For Java™ and .NET applications, the Db2 database server does not add an XML declaration, but if you retrieve the data into a DB2Xml object and use certain methods to retrieve the data from that object, the IBM® Data Server Driver for JDBC and SQLJ adds an XML declaration.
  If you execute the XMLSERIALIZE function, the Db2 database server adds an XML declaration with an encoding specification for UTF-8 encoding, if you specify the INCLUDING XMLDECLARATION option.
- Within the content of a document or in attribute values, certain characters are replaced with entity references for their predefined XML entities. Those characters and their predefined entities are:

  | Character | Unicode value | Entity reference |
  |---|---|---|
  | AMPERSAND | U+0026 | &amp; |
  | LESS-THAN SIGN | U+003C | &lt; |
  | GREATER-THAN SIGN | U+003E | &gt; |

- Within attribute values or text values, certain characters are replaced with character references for their numeric representations. Those characters and their character references are:

  | Character | Unicode value | Character reference |
  |---|---|---|
  | CHARACTER TABULATION | U+0009 | &#x9; |
  | LINE FEED | U+000A | &#xA; |
  | CARRIAGE RETURN | U+000D | &#xD; |
  | NEXT LINE | U+0085 | &#x85; |
  | LINE SEPARATOR | U+2028 | &#x2028; |

- Within attribute values, the QUOTATION MARK (U+0022) character is replaced with its predefined XML entity reference, &quot;.
- If the input document has a DTD declaration, the declaration is not preserved, and no markup based on the DTD is generated.
- If the input document contains CDATA sections, those sections are not preserved in the output.
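
Several of these differences follow from general XML 1.0 parsing and serialization rules, so they can be observed outside the database server as well. The following is a minimal sketch that uses Python's standard `xml.etree.ElementTree` module as a stand-in for a store-and-retrieve round trip. It is not Db2 code, and the `ORIGINAL` document and the `round_trip` helper are hypothetical names chosen for illustration. It does not reproduce Db2-specific behavior such as boundary whitespace stripping or the exact set of numeric character references.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document: an XML declaration, a line feed and quotation
# marks inside an attribute value, a CR/LF pair in text, and a CDATA section.
ORIGINAL = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<order note='first\n\"second\"'>"
    b"line one\r\nline two"
    b"<![CDATA[ a < b & c]]>"
    b"</order>"
)


def round_trip(xml_bytes):
    """Parse a document, then serialize it again (illustrative helper)."""
    root = ET.fromstring(xml_bytes)
    # encoding="unicode" returns a str and writes no XML declaration.
    return root, ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    root, retrieved = round_trip(ORIGINAL)

    # Attribute-value normalization: the line feed became a space.
    print(repr(root.get("note")))

    # Line-end normalization: CR/LF became LF. The CDATA section is
    # merged into ordinary text and is no longer marked as CDATA.
    print(repr(root.text))

    # Serialization: no XML declaration, no CDATA section, and the
    # special characters are written as predefined entity references.
    print(retrieved)
```

Comparing `ORIGINAL` with the printed result shows that the two documents carry the same information but are not byte-for-byte identical, which is the same kind of difference to expect when comparing a stored document with the retrieved one.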