Encoding scenarios for retrieval of XML data with explicit XMLSERIALIZE
The target encoding scheme and application code page can affect data conversion, truncation, and internal encoding during XML data retrieval with an explicit XMLSERIALIZE invocation.
The following examples demonstrate these interactions.
Data loss does not occur during conversion of the XML data to the type that is specified in the XMLSERIALIZE function because the input and output data is Unicode data. Data loss can occur during conversion of the result of the XMLSERIALIZE operation to the application data type. This data loss results in an SQL warning.
Truncation can occur at either of the following points:
- During conversion to the type that is specified in the XMLSERIALIZE function
  This truncation can occur because the size that you specify in the XMLSERIALIZE function for the output data type is too small. This truncation results in an SQL error.
- During conversion of the result of the XMLSERIALIZE operation to the application data type
  This truncation can occur because the size that you specify for the host variable is too small. This truncation results in an SQL warning.
The following examples discuss only truncation that occurs because the size of a document increases during conversion to the output encoding.
Only scenario 1 and scenario 2 apply to Java™ applications, because the application code page for Java applications is always Unicode.
Scenario 1
Encoding source | Value |
---|---|
Target data encoding | UTF-8 Unicode |
Target application data type | Binary |
Application code page | Not applicable |
SELECT XMLSERIALIZE(XMLCOL AS BLOB(1M) INCLUDING XMLDECLARATION) FROM T1
Character conversion: None.
Data loss: None.
Truncation due to expansion: None.
Internal encoding in the textual XML data: <?xml version="1.0" encoding="UTF-8" ?>
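As a rough illustration of why this scenario needs no conversion, the following Python sketch treats a byte string as if it were the BLOB value that the query returns. The sample document and the variable names are invented for illustration, only the standard library is used, and no Db2 connection is involved. Because the bytes are UTF-8 and the declaration names UTF-8, a parser that relies on the declaration decodes the data correctly.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the BLOB bytes returned by the query:
# an XML declaration followed by a UTF-8 encoded document.
blob_bytes = '<?xml version="1.0" encoding="UTF-8" ?><name>Zoë 日本</name>'.encode("utf-8")

# The parser takes the encoding from the declaration, which matches the bytes.
root = ET.fromstring(blob_bytes)
print(root.tag, root.text)
```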
Scenario 2
Encoding source | Value |
---|---|
Target data encoding | UTF-16 Unicode |
Target application data type | Graphic |
Application code page | CCSID 1208 |
SELECT XMLSERIALIZE(XMLCOL AS DBCLOB(1M) EXCLUDING XMLDECLARATION) FROM T1
Character conversion: Data is converted from UTF-8 to UTF-16.
Data loss: None.
Truncation due to expansion: Truncation can occur during conversion from UTF-8 to UTF-16, due to expansion.
Internal encoding in the textual XML data: None, because EXCLUDING XMLDECLARATION is specified. If INCLUDING XMLDECLARATION is specified, the internal encoding indicates UTF-8 instead of UTF-16. This can result in XML data that cannot be parsed by application processes that rely on the encoding name.
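The expansion in this scenario can be estimated outside the database. The sketch below is an illustration only: `encoded_sizes` is a hypothetical helper, not a Db2 API, and the sample strings are invented. It compares the number of bytes that the same text occupies in UTF-8 and in UTF-16.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

# Invented sample strings covering different character ranges.
samples = {
    "ASCII markup": "<a>abc</a>",
    "Accented Latin": "éèü",
    "CJK": "日本語",
}
for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

ASCII characters take one byte in UTF-8 and two bytes in UTF-16, so a document that consists mostly of ASCII markup can roughly double in size, while CJK text shrinks from three bytes to two per character.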
Scenario 3
Encoding source | Value |
---|---|
Target data encoding | ISO-8859-1 data |
Target application data type | Character |
Application code page | 819 |
SELECT XMLSERIALIZE(XMLCOL AS CLOB(1M) EXCLUDING XMLDECLARATION) FROM T1
Character conversion: Data is converted from UTF-8 to CCSID 819.
Data loss: Possible data loss. Some UTF-8 characters cannot be represented in CCSID 819. If a character cannot be represented in CCSID 819, the Db2 database manager inserts a substitution character in the output and issues a warning.
Truncation due to expansion: None.
Internal encoding in the textual XML data: None, because EXCLUDING XMLDECLARATION is specified. If INCLUDING XMLDECLARATION is specified, the database manager adds internal encoding for UTF-8 instead of ISO-8859-1. This can result in XML data that cannot be parsed by application processes that rely on the encoding name.
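The data loss described for this scenario can be approximated in Python. This is a sketch under stated assumptions: `to_latin1` is a hypothetical helper, Python's `iso-8859-1` codec stands in for CCSID 819, and Python substitutes `?` for characters that it cannot encode, which is not necessarily the substitution character that the database manager uses.

```python
def to_latin1(text: str) -> tuple[bytes, list[str]]:
    """Encode text as ISO-8859-1; return the bytes and the characters that were lost."""
    # ISO-8859-1 covers exactly the code points U+0000 through U+00FF.
    lost = [ch for ch in text if ord(ch) > 0xFF]
    # errors="replace" writes "?" for each unrepresentable character.
    # This is Python's behavior; the Db2 substitution character may differ.
    return text.encode("iso-8859-1", errors="replace"), lost

# Invented sample: "ë" exists in ISO-8859-1, the euro sign does not.
data, lost = to_latin1("<p>Zo\u00eb pays 10 \u20ac</p>")
print(data)
print(lost)
```

Checking a representative document this way before choosing a non-Unicode application code page shows which characters would be replaced.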
Scenario 4
Encoding source | Value |
---|---|
Target data encoding | Windows-31J data (superset of Shift_JIS) |
Target application data type | Graphic |
Application code page | 943 |
SELECT XMLSERIALIZE(XMLCOL AS CLOB(1M) EXCLUDING XMLDECLARATION) FROM T1
Character conversion: Data is converted from UTF-8 to CCSID 943.
Data loss: Possible data loss. Some UTF-8 characters cannot be represented in CCSID 943. If a character cannot be represented in CCSID 943, the database manager inserts a substitution character in the output and issues a warning.
Truncation due to expansion: Truncation can occur during conversion from UTF-8 to CCSID 943 due to expansion.
Internal encoding in the textual XML data: None, because EXCLUDING XMLDECLARATION is specified. If INCLUDING XMLDECLARATION is specified, the internal encoding indicates UTF-8 instead of Windows-31J. This can result in XML data that cannot be parsed by application processes that rely on the encoding name.
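The parsing problem caused by a mismatched internal encoding can be reproduced with a standard XML parser. The sketch below is illustrative only: the document is invented, and Python's `cp932` codec is used as a stand-in for CCSID 943, on the assumption that the two mappings agree for the sample characters. It builds the kind of data that INCLUDING XMLDECLARATION would produce after conversion, in which the declaration names UTF-8 but the bytes are in a Shift_JIS-family encoding.

```python
import xml.etree.ElementTree as ET

# Declaration says UTF-8, but the bytes are encoded with cp932
# (assumed here to approximate CCSID 943).
mislabeled = '<?xml version="1.0" encoding="UTF-8" ?><t>日本語</t>'.encode("cp932")

try:
    ET.fromstring(mislabeled)
    parsed = True
except ET.ParseError as exc:
    # The parser trusts the declaration and rejects the non-UTF-8 bytes.
    parsed = False
    print("parse failed:", exc)

# An application that knows its own code page can decode the bytes itself
# and drop the misleading declaration before parsing.
body = mislabeled.decode("cp932").split("?>", 1)[1]
root = ET.fromstring(body)
print(root.text)
```

Specifying EXCLUDING XMLDECLARATION, as the scenario does, avoids the mismatch in the first place because no encoding name is written into the data.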