Differences in an XML document after storage and retrieval

After you store an XML document in a Db2® database and then retrieve it, the retrieved document might not be identical to the original. This behavior is defined by the XML and SQL/XML standards and matches that of the Xerces open source XML parser.

Some of the changes to the document occur when the document is stored. Those changes are:

- If you execute XMLVALIDATE, the database server:
  - Adds default values from the XML schema that is specified in the XMLVALIDATE invocation to the input document
  - Strips ignorable whitespace from the input document
- If you do not request XML validation, the database server:
  - Strips boundary whitespace, if you do not request preservation
  - Performs end-of-line normalization, as specified in the XML 1.0 specification
  - Performs attribute-value normalization, as specified in the XML 1.0 specification
    This process causes line feed (U+000A) characters in attributes to be replaced with space characters (U+0020).
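
The two normalization steps can be observed with any conforming XML 1.0 parser. The following minimal sketch uses Python's standard-library `xml.etree.ElementTree` parser as a stand-in, not Db2 itself; the element name, attribute name, and sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a CR+LF line ending in element content and a
# literal line feed inside an attribute value.
source = '<note title="first\nsecond">line one\r\nline two</note>'

root = ET.fromstring(source)

# End-of-line normalization: CR+LF in content becomes a single line feed.
content = root.text

# Attribute-value normalization: the literal line feed becomes a space.
title = root.get("title")

print(repr(content))
print(repr(title))
```

A document that depends on literal line breaks inside attribute values therefore cannot be recovered byte for byte after parsing.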

Additional changes occur when you retrieve the data from an XML column. Those changes are:

If the data has an XML declaration before it is sent to the database server, the XML declaration is not preserved.
With implicit serialization, for CLI and embedded SQL applications, the Db2 database server adds an XML declaration, with the appropriate encoding specification, to the data. For Java™ and .NET applications, the Db2 database server does not add an XML declaration, but if you retrieve the data into a DB2Xml object and use certain methods to retrieve the data from that object, the IBM Data Server Driver for JDBC and SQLJ adds an XML declaration.

If you execute the XMLSERIALIZE function, the Db2 database server adds an XML declaration with an encoding specification for UTF-8 encoding, if you specify the INCLUDING XMLDECLARATION option.
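
The reason the original declaration cannot survive is that it is consumed during parsing and is not part of the stored data; any declaration in the output is generated anew at serialization time. The sketch below illustrates that general principle with Python's standard-library parser (Python 3.8 or later), not with Db2 or its drivers; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input whose declaration names a non-UTF-8 encoding.
stored = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>Caf\xe9</name>'

# The parser uses the declaration to decode the bytes, then discards it.
root = ET.fromstring(stored)

# Serializing to a text string produces no declaration at all.
without_decl = ET.tostring(root, encoding="unicode")

# Requesting a declaration produces a new one for the output encoding.
with_decl = ET.tostring(root, encoding="utf-8", xml_declaration=True)

print(without_decl)
print(with_decl)
```

The regenerated declaration describes the encoding of the output, which is why it can differ from the declaration in the document that was originally sent.
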
Within the content of a document or in attribute values, certain characters are replaced with their predefined XML entities. Those characters and their predefined entities are:

Character	Unicode value	Entity representation
AMPERSAND	U+0026	&amp;
LESS-THAN SIGN	U+003C	&lt;
GREATER-THAN SIGN	U+003E	&gt;
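
As a rough illustration of this substitution, the following sketch applies the same three replacements with `escape` from Python's standard-library `xml.sax.saxutils` module. It shows the mapping only and is not how Db2 implements serialization; the sample text is hypothetical.

```python
from xml.sax.saxutils import escape

# Hypothetical element content containing all three characters.
text = "if a < b && b > c"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(text)

print(escaped)
```

An application that compares retrieved text with the original at the byte level sees the entity references, whereas an XML parser that reads the retrieved document sees the original characters.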

Within attribute values or text values, certain characters are replaced with their numeric representations. Those characters and their numeric representations are:

Character	Unicode value	Numeric representation
CHARACTER TABULATION	U+0009	&#x9;
LINE FEED	U+000A	&#xA;
CARRIAGE RETURN	U+000D	&#xD;
NEXT LINE	U+0085	&#x85;
LINE SEPARATOR	U+2028	&#x2028;
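
Writing these characters as numeric character references protects them from normalization when the output is parsed again. The sketch below illustrates this with Python's standard-library parser rather than Db2. The helper `escape_attr`, the `NUMERIC_REFS` mapping, the hexadecimal reference form, and the sample data are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed mapping from each character to a hexadecimal character reference.
NUMERIC_REFS = {
    "\t": "&#x9;",
    "\n": "&#xA;",
    "\r": "&#xD;",
    "\u0085": "&#x85;",
    "\u2028": "&#x2028;",
}


def escape_attr(value):
    """Hypothetical helper: escape markup characters, then whitespace."""
    return escape(value, {'"': "&quot;", **NUMERIC_REFS})


original = "col1\tcol2\nrow2"

# Attribute written with numeric references versus literal characters.
escaped_doc = '<cell note="{}"/>'.format(escape_attr(original))
literal_doc = '<cell note="{}"/>'.format(original)

from_escaped = ET.fromstring(escaped_doc).get("note")
from_literal = ET.fromstring(literal_doc).get("note")

print(repr(from_escaped))  # tab and line feed survive
print(repr(from_literal))  # tab and line feed became spaces
```

The referenced characters reach the application unchanged, while the literal ones are subject to attribute-value normalization.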

Within attribute values, the QUOTATION MARK (U+0022) character is replaced with its predefined XML entity &quot;.
If the input document has a DTD declaration, the declaration is not preserved, and no markup based on the DTD is generated.
If the input document contains CDATA sections, those sections are not preserved in the output.
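
The loss of the DTD declaration and of CDATA section boundaries is typical of a parse-then-serialize round trip. The following sketch reproduces the effect with Python's standard-library parser as a stand-in for Db2; the `memo` document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with an internal DTD subset and a CDATA section.
source = (
    "<!DOCTYPE memo [<!ELEMENT memo (#PCDATA)>]>"
    "<memo><![CDATA[1 < 2 & 3 > 2]]></memo>"
)

root = ET.fromstring(source)

# The parsed text holds the characters only, not the CDATA markers.
text = root.text

# Serialization emits neither the DOCTYPE nor a CDATA section; the
# markup characters are written as predefined entities instead.
output = ET.tostring(root, encoding="unicode")

print(output)
```

The character data is equivalent to a parser that reads the output, but the markup used to express it differs from the input.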