XML parsing and whitespace handling

During explicit XML parsing, you can control whether boundary whitespace is preserved or stripped when the data is stored in the database.

According to the XML standard, whitespace consists of space characters (U+0020), carriage returns (U+000D), line feeds (U+000A), and tabs (U+0009) that are in the document to improve readability. When any of these characters appears as part of a text string, it is not considered to be whitespace.

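Boundary whitespace consists of whitespace characters that appear between elements. For example, in the following document, the space between <a> and <b> and the space between </b> and </a> are boundary whitespace. The spaces around the words "and between" belong to the text content of <b>, so they are not boundary whitespace.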
<a> <b> and between </b> </a>
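
The distinction can be checked outside the database. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module, which keeps all whitespace when parsing. The helper names is_whitespace_only and boundary_whitespace are hypothetical, and the sketch assumes that only whitespace-only runs inside an element that has child elements count as boundary whitespace. It illustrates the definition and is not how the Db2 database server is implemented.

```python
import xml.etree.ElementTree as ET

# The four whitespace characters named by the XML standard:
# U+0020 (space), U+0009 (tab), U+000D (carriage return), U+000A (line feed)
XML_WHITESPACE = " \t\r\n"


def is_whitespace_only(text):
    """True if text is non-empty and made up solely of XML whitespace."""
    return bool(text) and text.strip(XML_WHITESPACE) == ""


def boundary_whitespace(elem):
    """Collect whitespace-only runs that sit between element tags (hypothetical helper)."""
    found = []
    # Text directly after the start tag counts only if child elements follow it.
    if len(elem) > 0 and is_whitespace_only(elem.text):
        found.append((f"after <{elem.tag}>", elem.text))
    for child in elem:
        found.extend(boundary_whitespace(child))
        # The tail is the text between this child's end tag and the next tag.
        if is_whitespace_only(child.tail):
            found.append((f"after </{child.tag}>", child.tail))
    return found


root = ET.fromstring("<a> <b> and between </b> </a>")
print(boundary_whitespace(root))  # [('after <a>', ' '), ('after </b>', ' ')]
print(repr(root[0].text))         # ' and between '
```

The text of <b> is reported unchanged, spaces included, because it contains non-whitespace characters.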

When you invoke XMLPARSE explicitly, you use the STRIP WHITESPACE or PRESERVE WHITESPACE option to control whether boundary whitespace is preserved. By default, boundary whitespace is stripped.

The XML standard specifies an xml:space attribute that controls the stripping or preservation of whitespace within XML data. Possible values are preserve and default; the Db2 database server ignores any other value. The preserve value causes boundary whitespace within an element to be preserved, regardless of application settings such as the XMLPARSE whitespace option. The default value causes application settings to be used for boundary whitespace handling.

xml:space attributes override any whitespace settings for implicit or explicit XML parsing, except for end-of-line processing. During end-of-line processing, a carriage return character that is immediately followed by a line feed character is replaced, together with that line feed, by a single line feed character. A carriage return character that appears by itself is also replaced with a line feed character. These replacements occur regardless of the xml:space attribute.
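
The end-of-line replacements can be sketched in a few lines of Python. The function name normalize_eol is hypothetical. The second half of the sketch relies on Python's own XML parser, which applies the same end-of-line rule from the XML standard, to show that xml:space="preserve" does not prevent the replacement. It demonstrates the rule itself, not the Db2 parser.

```python
import xml.etree.ElementTree as ET


def normalize_eol(text):
    """Apply the end-of-line replacements (hypothetical helper)."""
    # Replace CR LF pairs first so that each pair yields a single line feed,
    # then replace any carriage return that is left on its own.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = "line1\r\nline2\rline3"
normalized = normalize_eol(raw)

# Python's parser performs the same replacement even under xml:space="preserve".
parsed = ET.fromstring('<a xml:space="preserve">' + raw + "</a>").text

print(repr(normalized))      # 'line1\nline2\nline3'
print(parsed == normalized)  # True
```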

For example, in the following document, the spaces immediately before and after <b> are always preserved, regardless of any XML parsing options, because the spaces are within a node with the attribute xml:space="preserve":
<a xml:space="preserve"> <b> <c>c</c>b </b></a>

However, in the following document, the spaces immediately before and after <b> can be controlled by the XML parsing options, because the spaces are within a node with the attribute xml:space="default":
<a xml:space="default"> <b> <c>c</c>b </b></a>
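
The interaction between the application setting and xml:space can be modeled with a short Python sketch. The function strip_boundary_whitespace and its app_preserve parameter are hypothetical: app_preserve stands in for the choice between PRESERVE WHITESPACE and STRIP WHITESPACE. The sketch assumes that an xml:space value applies to the element that carries it and to its descendants until another preserve or default value appears. It is a model of the rules described here, not the Db2 implementation.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the predefined XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XML_WHITESPACE = " \t\r\n"


def _blank(text):
    """True if text is non-empty and consists only of XML whitespace."""
    return bool(text) and text.strip(XML_WHITESPACE) == ""


def strip_boundary_whitespace(elem, app_preserve=False, in_scope=None):
    """Strip boundary whitespace in place unless it must be preserved."""
    value = elem.get(XML_SPACE)
    if value in ("preserve", "default"):
        in_scope = value  # any other value is ignored
    # xml:space="preserve" wins; otherwise the application setting decides.
    preserve = in_scope == "preserve" or app_preserve

    if not preserve and len(elem) > 0 and _blank(elem.text):
        elem.text = None
    for child in elem:
        strip_boundary_whitespace(child, app_preserve, in_scope)
        # The tail lies inside elem, so elem's setting governs it.
        if not preserve and _blank(child.tail):
            child.tail = None


preserved = ET.fromstring('<a xml:space="preserve"> <b> <c>c</c>b </b></a>')
strip_boundary_whitespace(preserved, app_preserve=False)

defaulted = ET.fromstring('<a xml:space="default"> <b> <c>c</c>b </b></a>')
strip_boundary_whitespace(defaulted, app_preserve=False)

print(repr(preserved.text), repr(preserved[0].text))  # ' ' ' '
print(repr(defaulted.text), repr(defaulted[0].text))  # None None
print(repr(defaulted[0][0].tail))                     # 'b '
```

In both documents the text "b " after </c> is kept, because it contains a non-whitespace character and so is not boundary whitespace.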