Encoding scenarios for retrieval of XML data with explicit XMLSERIALIZE
The following examples demonstrate how the target encoding and application code page affect data conversion, truncation, and internal encoding during XML data retrieval with an explicit XMLSERIALIZE invocation.
Only scenario 1 and scenario 2 apply to Java™ and .NET applications, because the application code page for Java and .NET applications is always Unicode.
Scenario 1
Encoding source | Value |
---|---|
Target data encoding | UTF-8 Unicode |
Target application data type | Binary |
Application code page | Not applicable |
SELECT XMLSERIALIZE(XMLCOL AS BLOB(1M) INCLUDING XMLDECLARATION) FROM T1
Character conversion: None.
Data loss: None.
Truncation: None.
Internal encoding in the serialized data: <?xml version="1.0" encoding="UTF-8" ?>
Scenario 2
Encoding source | Value |
---|---|
Target data encoding | UTF-16 Unicode |
Target application data type | Graphic |
Application code page | Any SBCS code page or CCSID 1208 |
SELECT XMLSERIALIZE(XMLCOL AS CLOB(1M) EXCLUDING XMLDECLARATION) FROM T1
Character conversion: Data is converted from UTF-8 to UTF-16.
Data loss: None.
Truncation: Truncation can occur during conversion from UTF-8 to UTF-16, due to expansion.
Internal encoding in the serialized data: None, because EXCLUDING XMLDECLARATION is specified. If INCLUDING XMLDECLARATION is specified, the internal encoding indicates UTF-8 instead of UTF-16. This can result in XML data that cannot be parsed by application processes that rely on the encoding name.
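The expansion behind this truncation risk can be illustrated outside the database. The following Python sketch is not Db2 code: it only compares the byte length of one hypothetical XML fragment in UTF-8 and in UTF-16, using Python's built-in codecs. How the database manager accounts for lengths during its own conversion may differ.

```python
# Illustration only: compare the size of one XML fragment in UTF-8 and UTF-16.
# The sample fragment is hypothetical; Db2's own conversion is not invoked here.
sample = "<a>héllo</a>"

utf8_bytes = sample.encode("utf-8")       # encoding the data is converted from
utf16_bytes = sample.encode("utf-16-le")  # UTF-16 without a byte order mark

print(len(utf8_bytes), len(utf16_bytes))

# Each ASCII character takes 1 byte in UTF-8 but 2 bytes in UTF-16,
# so mostly-ASCII markup can roughly double in size.
expansion = len(utf16_bytes) / len(utf8_bytes)
```

A target buffer sized for the UTF-8 length can therefore be too small once the data has been converted.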
Scenario 3
Encoding source | Value |
---|---|
Target data encoding | ISO-8859-1 data |
Target application data type | Character |
Application code page | 819 |
SELECT XMLSERIALIZE(XMLCOL AS CLOB(1M) EXCLUDING XMLDECLARATION) FROM T1
Character conversion: Data is converted from UTF-8 to CCSID 819.
Data loss: Possible data loss. Some UTF-8 characters cannot be represented in CCSID 819. If a character cannot be represented in CCSID 819, the Db2® database manager inserts a substitution character in the output and issues a warning.
Truncation: None.
Internal encoding in the serialized data: None, because EXCLUDING XMLDECLARATION is specified. If INCLUDING XMLDECLARATION is specified, the database manager adds internal encoding for UTF-8 instead of ISO-8859-1. This can result in XML data that cannot be parsed by application processes that rely on the encoding name.
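The parsing problem caused by a mismatched encoding name can be reproduced with any parser that trusts the XML declaration. The following Python sketch assumes a hypothetical payload and uses the standard-library `xml.etree.ElementTree` parser as a stand-in for an application process that relies on the encoding name. It does not call Db2.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: ISO-8859-1 bytes, as an application with code page 819
# would receive them. Python's "latin-1" codec corresponds to ISO-8859-1.
body = "<a>é</a>".encode("latin-1")

# Declaration that names UTF-8 although the bytes are ISO-8859-1.
mismatched = b'<?xml version="1.0" encoding="UTF-8"?>' + body
# Declaration that names the encoding the bytes are actually in.
matching = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body


def parses(data: bytes) -> bool:
    """Return True if the standard-library parser accepts the bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(mismatched), parses(matching))
```

The mismatched document is rejected because the byte for é is not a valid UTF-8 sequence, whereas the same bytes parse once the declaration names ISO-8859-1.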
Scenario 4
Encoding source | Value |
---|---|
Target data encoding | Windows-31J data (superset of Shift_JIS) |
Target application data type | Graphic |
Application code page | 943 |
SELECT XMLSERIALIZE(XMLCOL AS CLOB(1M) EXCLUDING XMLDECLARATION) FROM T1
Character conversion: Data is converted from UTF-8 to CCSID 943.
Data loss: Possible data loss. Some UTF-8 characters cannot be represented in CCSID 943. If a character cannot be represented in CCSID 943, the database manager inserts a substitution character in the output and issues a warning.
Truncation: Truncation can occur during conversion from UTF-8 to CCSID 943 due to expansion.
Internal encoding in the serialized data: None, because EXCLUDING XMLDECLARATION is specified. If INCLUDING XMLDECLARATION is specified, the internal encoding indicates UTF-8 instead of Windows-31J. This can result in XML data that cannot be parsed by application processes that rely on the encoding name.
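An application can estimate in advance which characters are at risk of substitution. The following Python sketch uses a hypothetical helper, `unrepresentable`, and a hypothetical sample string. It relies on Python's `cp932` codec (Microsoft code page 932) as an approximation of CCSID 943, so the result can differ from Db2's conversion tables for some characters.

```python
def unrepresentable(text: str, codec: str) -> list[str]:
    """Return the distinct characters of text that codec cannot encode, in order."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Hypothetical sample mixing Japanese, Korean, and ASCII characters.
sample = "<名前>한글 and 日本語</名前>"

# Assumption: cp932 stands in for CCSID 943 for the purpose of this check.
lost = unrepresentable(sample, "cp932")
print(lost)
```

In this sample, only the Hangul characters are reported, because the kanji and ASCII characters all have mappings in the target code page.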