Differences in an XML document after storage and retrieval
After you store an XML document in a Db2® database, the retrieved document might not be identical to the original document. This behavior is defined by the XML and SQL/XML standards and matches that of the Xerces open source XML parser.
- If you execute XMLVALIDATE, the database server:
  - Adds default values from the XML schema that is specified in the XMLVALIDATE invocation to the input document
  - Strips ignorable whitespace from the input document
- If you do not request XML validation, the database server:
  - Strips boundary whitespace, if you do not request preservation
  - Performs end-of-line normalization, as specified in the XML 1.0 specification
  - Performs attribute-value normalization, as specified in the XML 1.0 specification. This process causes line feed (U+000A) characters in attributes to be replaced with space characters (U+0020).
- If the data has an XML declaration before it is sent to the database server, the XML declaration is not preserved.
  With implicit serialization, for CLI and embedded SQL applications, the Db2 database server adds an XML declaration, with the appropriate encoding specification, to the data. For Java™ and .NET applications, the Db2 database server does not add an XML declaration, but if you retrieve the data into a DB2Xml object and use certain methods to retrieve the data from that object, the IBM® Data Server Driver for JDBC and SQLJ adds an XML declaration.
  If you execute the XMLSERIALIZE function and specify the INCLUDING XMLDECLARATION option, the Db2 database server adds an XML declaration with an encoding specification for UTF-8 encoding.
- Within the content of a document or in attribute values, certain characters are replaced with their predefined XML entities. Those characters and their predefined entities are:
  | Character | Unicode value | Entity representation |
  | --- | --- | --- |
  | AMPERSAND | U+0026 | &amp; |
  | LESS-THAN SIGN | U+003C | &lt; |
  | GREATER-THAN SIGN | U+003E | &gt; |
- Within attribute values or text values, certain characters are replaced with their numeric representations. Those characters and their numeric representations are:
  | Character | Unicode value | Entity representation |
  | --- | --- | --- |
  | CHARACTER TABULATION | U+0009 | &#x9; |
  | LINE FEED | U+000A | &#xA; |
  | CARRIAGE RETURN | U+000D | &#xD; |
  | NEXT LINE | U+0085 | &#x85; |
  | LINE SEPARATOR | U+2028 | &#x2028; |
- Within attribute values, the QUOTATION MARK (U+0022) character is replaced with its predefined XML entity &quot;.
- If the input document has a DTD declaration, the declaration is not preserved, and no markup based on the DTD is generated.
- If the input document contains CDATA sections, those sections are not preserved in the output.
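Several of the differences listed above are general XML parsing and serialization behavior, so they can be observed without a database. The following is a minimal sketch that uses Python's standard `xml.etree.ElementTree` module as a stand-in; it does not connect to Db2, and the sample document and the variable names (`original`, `retrieved`) are made up for illustration. It shows attribute-value normalization, end-of-line normalization, the loss of the XML declaration, DTD, and CDATA section, and the replacement of special characters with predefined entities on output.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: an XML declaration, a DTD, an attribute that contains
# quotation marks and a line feed, a CDATA section, and a CR/LF line ending.
original = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>\n'
    b"<note title='say \"hi\"\nnow'><![CDATA[a < b & c]]>\r\nend</note>"
)

# Parsing applies end-of-line and attribute-value normalization.
root = ET.fromstring(original)

title = root.get("title")  # line feed in the attribute becomes a space
text = root.text           # CDATA content is plain text; CR/LF becomes LF

# Serializing the parsed tree stands in for "retrieval" here.
retrieved = ET.tostring(root, encoding="unicode")

print(repr(title))
print(repr(text))
print(retrieved)
```

In the serialized result, the XML declaration, the DOCTYPE declaration, and the CDATA markers are gone, and `&`, `<`, and `"` appear as `&amp;`, `&lt;`, and `&quot;`. Finer details, such as which characters are written as numeric references, depend on the serializer, so the Db2 output can differ from what this sketch prints.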