Internally encoded XML data
XML data in a binary application data type has internal encoding. With internal encoding, the content of the data determines the encoding. The Db2® database system derives the internal encoding from the document content according to the XML standard.
Internal encoding is derived from three components:
- Unicode Byte Order Mark (BOM): A byte sequence that consists of a Unicode character code at the beginning of XML data. The BOM indicates the byte order of the following text. The Db2 database manager recognizes a BOM only for XML data. For XML data that is stored in a non-XML column, the database manager treats a BOM value like any other character or binary value.
- XML declaration: A processing instruction at the beginning of an XML document. The declaration provides specific details about the remainder of the XML.
- Encoding declaration: An optional part of the XML declaration that specifies the encoding for the characters in the document.
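
To illustrate how these three components appear in the raw bytes, the following Python sketch assembles a small document. The document content and the variable names are hypothetical and are not part of Db2. Only the UTF-8 BOM value X'EFBBBF' is taken from the Unicode standard.

```python
import codecs

# Hypothetical sample document; the element name and text are illustrative only.
declaration = '<?xml version="1.0" encoding="UTF-8"?>'  # XML declaration with an encoding declaration
body = "<doc>hello</doc>"

# The BOM (3 bytes for UTF-8) comes first, followed by the encoded declaration and body.
data = codecs.BOM_UTF8 + (declaration + body).encode("utf-8")

print(data[:3].hex().upper())                          # EFBBBF
print(data[3:3 + len(declaration)].decode("utf-8"))    # the XML declaration
```

Because the declaration contains only ASCII characters, its length in bytes equals its length in characters when it is encoded as UTF-8, which is why the slice above can use `len(declaration)`.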
The Db2 database manager uses the following procedure to determine the encoding:
- If the data contains a Unicode BOM, the BOM determines the encoding. The following table lists the BOM types and the resultant data encoding:

  Table 1. Byte order marks and resultant document encoding

  | BOM type             | BOM value   | Encoding |
  |----------------------|-------------|----------|
  | UTF-8                | X'EFBBBF'   | UTF-8    |
  | UTF-16 Big Endian    | X'FEFF'     | UTF-16   |
  | UTF-16 Little Endian | X'FFFE'     | UTF-16   |
  | UTF-32 Big Endian    | X'0000FEFF' | UTF-32   |
  | UTF-32 Little Endian | X'FFFE0000' | UTF-32   |

- If the data contains an XML declaration, the encoding depends on whether there is an encoding declaration:
  - If there is an encoding declaration, the encoding is the value of the encoding attribute. For example, the encoding is EUC-JP for XML data with the following XML declaration:
    <?xml version="1.0" encoding="EUC-JP"?>
  - If there is an encoding declaration and a BOM, the encoding declaration must match the encoding from the BOM. Otherwise, an error occurs.
  - If there is no encoding declaration and no BOM, the database manager determines the encoding from the encoding of the XML declaration:
    - If the XML declaration is in single-byte ASCII characters, the encoding of the document is UTF-8.
    - If the XML declaration is in double-byte ASCII characters, the encoding of the document is UTF-16.
- If there is no XML declaration and no BOM, the encoding of the document is UTF-8.
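
The procedure above can be sketched in Python as follows. This is an illustration only, not the Db2 implementation. The function name `internal_encoding`, the use of `ValueError` for the mismatch error, the case-insensitive comparison between the declared encoding and the BOM encoding, and the 256-byte window for locating the declaration are all assumptions made for this example.

```python
import codecs
import re

# (BOM bytes, encoding name from Table 1, Python codec used to read the declaration).
# UTF-32 entries are checked first because X'FFFE0000' begins with X'FFFE'.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32", "utf-32-be"),
    (codecs.BOM_UTF32_LE, "UTF-32", "utf-32-le"),
    (codecs.BOM_UTF8, "UTF-8", "utf-8"),
    (codecs.BOM_UTF16_BE, "UTF-16", "utf-16-be"),
    (codecs.BOM_UTF16_LE, "UTF-16", "utf-16-le"),
]

# Byte patterns of "<?x" at the start of an XML declaration when no BOM is present.
DECLARATION_SIGNATURES = [
    (b"<?xml", "UTF-8", "latin-1"),               # one byte per ASCII character
    (b"\x00<\x00?\x00x", "UTF-16", "utf-16-be"),  # two bytes per ASCII character
    (b"<\x00?\x00x\x00", "UTF-16", "utf-16-le"),
]

# Captures the value of the encoding attribute inside the XML declaration.
ENCODING_ATTR = re.compile(
    r"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def internal_encoding(data: bytes) -> str:
    """Return the internal encoding of XML data, following the steps in the text."""
    bom_encoding, default, codec, rest = None, None, None, data

    # Step 1: look for a BOM.
    for bom, name, bom_codec in BOMS:
        if data.startswith(bom):
            bom_encoding, codec, rest = name, bom_codec, data[len(bom):]
            break

    # Without a BOM, the byte width of the XML declaration supplies the default.
    if bom_encoding is None:
        for signature, name, sig_codec in DECLARATION_SIGNATURES:
            if data.startswith(signature):
                default, codec = name, sig_codec
                break
        else:
            return "UTF-8"  # no BOM and no XML declaration

    # Step 2: look for an encoding declaration near the start of the data.
    head = rest[:256].decode(codec, errors="ignore")
    match = ENCODING_ATTR.match(head)
    declared = match.group(1) if match else None

    if bom_encoding is not None:
        # A declared encoding must agree with the BOM (assumed: exact name, any case).
        if declared is not None and declared.upper() != bom_encoding:
            raise ValueError(
                f"encoding declaration {declared!r} does not match BOM ({bom_encoding})"
            )
        return bom_encoding

    return declared if declared is not None else default
```

For example, `internal_encoding(b'<?xml version="1.0" encoding="EUC-JP"?><a/>')` returns `"EUC-JP"`, and `internal_encoding(b"<a/>")` returns `"UTF-8"`.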