Differences in an XML document after storage and retrieval

When you store an XML document in a table and then retrieve that document, the retrieved document might not be the same as the original document.

This behavior is defined by the XML and SQL/XML standards.

Some of the changes to the document occur when the document is stored. Those changes are:

If you execute DSN_XMLVALIDATE, the database server strips ignorable whitespace from the input document.
If you do not request XML schema validation, the database server:
- Strips boundary whitespace, if you do not request preservation
- Replaces all carriage return and line feed pairs (U+000D and U+000A), or carriage returns (U+000D), within the document with line feeds (U+000A)
- Performs attribute-value normalization, as specified in the XML 1.0 specification
  This process causes line feed (U+000A) characters in attributes to be replaced with space characters (U+0020).
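
The two normalization steps for documents stored without validation can be approximated in a few lines of Python. This is a minimal sketch, not Db2 code: the function names are hypothetical, and it ignores entity and character references inside attribute values, which the XML 1.0 rules treat separately.

```python
def normalize_line_endings(text: str) -> str:
    """Replace CR LF pairs and lone CRs with a single LF (U+000A)."""
    # Handle the CR LF pair first so that it becomes one LF, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_attribute_value(value: str) -> str:
    """Approximate XML 1.0 attribute-value normalization for literal whitespace."""
    # Line endings are normalized before attribute values are processed.
    value = normalize_line_endings(value)
    # XML 1.0 replaces each literal tab, LF, or CR in an attribute value
    # with a space character (U+0020).
    return value.translate({0x9: " ", 0xA: " ", 0xD: " "})


if __name__ == "__main__":
    print(repr(normalize_line_endings("line1\r\nline2\rline3")))
    print(repr(normalize_attribute_value("first\nsecond")))
```

Applying the line-ending step before the attribute step matters: a CR LF pair inside an attribute value ends up as one space rather than two.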

Additional changes occur when you retrieve the data from an XML column. Those changes are:

If the data has an XML declaration before it is sent to the database server, the XML declaration is not preserved.
With implicit serialization, for Db2 ODBC and embedded SQL applications, the Db2 database server adds an XML declaration, with the appropriate encoding specification, to the data. For Java™ and .NET applications, the Db2 database server does not add an XML declaration, but if you retrieve the data into a DB2Xml object and use certain methods to retrieve the data from that object, the IBM® Data Server Driver for JDBC and SQLJ adds an XML declaration.

If you execute the XMLSERIALIZE function, the Db2 database server adds an XML declaration with an encoding specification for UTF-8 encoding, if you specify the INCLUDING XMLDECLARATION option.
Within the content of a document or in attribute values, certain characters are replaced with entity references for their predefined XML entities. Those characters and their predefined entities are:

Character	Unicode value	Entity reference
AMPERSAND	U+0026	&amp;
LESS-THAN SIGN	U+003C	&lt;
GREATER-THAN SIGN	U+003E	&gt;

Within attribute values or text values, certain characters are replaced with numeric character references. Those characters and their character references are:

Character	Unicode value	Character reference
CHARACTER TABULATION	U+0009	&#x9;
LINE FEED	U+000A	&#xA;
CARRIAGE RETURN	U+000D	&#xD;
NEXT LINE	U+0085	&#x85;
LINE SEPARATOR	U+2028	&#x2028;

Within attribute values, the QUOTATION MARK (U+0022) character is replaced with its predefined XML entity reference &quot;.
If the input document has a DTD declaration, the declaration is not preserved, and no markup based on the DTD is generated.
If the input document contains CDATA sections, those sections are not preserved in the output.
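
The character substitutions listed above for serialized output can be illustrated with a small Python helper. This is a sketch under stated assumptions rather than a description of how Db2 implements serialization: the name `escape_value` and its `in_attribute` parameter are hypothetical, the hexadecimal form of the character references is assumed, and the mapping is a literal reading of the lists in this topic.

```python
# Predefined XML entities used in document content and attribute values.
PREDEFINED_ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

# Numeric character references (hexadecimal form assumed for illustration).
CHARACTER_REFERENCES = {
    "\t": "&#x9;",         # CHARACTER TABULATION
    "\n": "&#xA;",         # LINE FEED
    "\r": "&#xD;",         # CARRIAGE RETURN
    "\u0085": "&#x85;",    # NEXT LINE
    "\u2028": "&#x2028;",  # LINE SEPARATOR
}


def escape_value(value: str, in_attribute: bool = False) -> str:
    """Apply the substitutions to a text value or an attribute value."""
    table = {**PREDEFINED_ENTITIES, **CHARACTER_REFERENCES}
    if in_attribute:
        # The quotation mark is replaced only inside attribute values.
        table['"'] = "&quot;"
    # str.translate works in a single pass, so the ampersands that the
    # replacements introduce are not escaped a second time.
    return value.translate(str.maketrans(table))


if __name__ == "__main__":
    print(escape_value('a < b & "c"'))
    print(escape_value('say "hi"', in_attribute=True))
```

A single-pass translation is used instead of chained `replace` calls because chained calls would have to handle the ampersand first to avoid escaping the output of earlier replacements.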