Character sets and code pages

Even within the same encoding scheme, multiple CCSIDs exist, and the same code point can represent different characters in different CCSIDs. Furthermore, a byte in a character string does not necessarily represent a character from a single-byte character set (SBCS); it might be only one part of a multibyte character.
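The following minimal Python sketch illustrates both points. It assumes Python's built-in `cp037` codec (IBM EBCDIC code page 037) and `latin-1` codec (ISO 8859-1) as stand-ins for two different CCSIDs. These codecs are chosen only for illustration and are not tied to any particular Db2 configuration.

```python
# Same byte, different characters: decode X'C1' under two code pages.
raw = bytes([0xC1])
as_ebcdic = raw.decode("cp037")    # EBCDIC code page 037: X'C1' is "A"
as_latin1 = raw.decode("latin-1")  # ISO 8859-1: X'C1' is U+00C1 (A with acute)
print(f"X'C1' -> EBCDIC 037: {as_ebcdic!r}, ISO 8859-1: {as_latin1!r}")

# A byte is not always a whole character: U+00E9 (e with acute) is two bytes in UTF-8.
encoded = "\u00e9".encode("utf-8")
print(f"U+00E9 in UTF-8: X'{encoded.hex().upper()}'")

# The first byte alone is only a lead byte, so decoding it by itself fails.
try:
    encoded[:1].decode("utf-8")
    lead_byte_is_character = True
except UnicodeDecodeError:
    lead_byte_is_character = False
print("first byte decodes on its own:", lead_byte_is_character)
```

The same single byte yields two different characters, and a single byte of a UTF-8 string can be an incomplete fragment rather than a character in its own right.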

The following figure shows how a typical character set might map to different code points in two different code pages.

Figure 1. Code page mappings for character set ss1 in ASCII and EBCDIC
Begin figure description. A table shows an ASCII code page and another table shows an EBCDIC code page. Character set ss1 maps to different code points for each code page. End figure description.
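As a rough illustration of the same idea, the sketch below looks up the code points of a few characters in ASCII and in EBCDIC. The sample string `"Aa1 "` and the helper `code_points` are hypothetical and are not character set ss1 from the figure; Python's `cp037` codec is assumed as a representative EBCDIC code page.

```python
def code_points(ch: str) -> tuple[str, str]:
    """Return the (ASCII, EBCDIC 037) code points of ch as uppercase hex."""
    return ch.encode("ascii").hex().upper(), ch.encode("cp037").hex().upper()


# Each character keeps its identity but sits at a different code point.
for ch in "Aa1 ":
    ascii_hex, ebcdic_hex = code_points(ch)
    print(f"{ch!r}: ASCII X'{ascii_hex}', EBCDIC 037 X'{ebcdic_hex}'")
```

For example, the letter A is X'41' in ASCII but X'C1' in EBCDIC code page 037.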
For Unicode, there is only one CCSID for UTF-8 (1208) and only one CCSID for UTF-16 (1200). The following figure shows how the first 127 single-byte code points of UTF-8 are the same as those of ASCII CCSID 367. For example, in both UTF-8 and ASCII CCSID 367, the letter A is X'41' and the digit 1 is X'31'.
Figure 2. Code point mapping for the first 127 code points for UTF-8 single-byte characters (CCSID 1208)
Begin figure description. A table that shows the code point mapping for the first 127 code points for UTF-8. End figure description.
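The overlap between ASCII and UTF-8 can be checked with a short sketch. It assumes Python's `ascii` codec as a stand-in for ASCII CCSID 367 (US-ASCII) and checks every 7-bit code point from X'00' through X'7F'.

```python
# Every 7-bit code point decodes to the same character in US-ASCII and UTF-8.
same = all(
    bytes([b]).decode("ascii") == bytes([b]).decode("utf-8")
    for b in range(0x80)
)
print("X'00' through X'7F' identical in ASCII and UTF-8:", same)

# The examples from the text: A is X'41' and 1 is X'31' in both encodings.
for ch in "A1":
    assert ch.encode("ascii") == ch.encode("utf-8")
    print(f"{ch!r}: X'{ch.encode('utf-8').hex().upper()}'")

# Above X'7F', UTF-8 switches to multibyte sequences (here U+00E9).
print("U+00E9 in UTF-8:", "\u00e9".encode("utf-8").hex().upper())
```

Because the single-byte range is shared, pure 7-bit ASCII data is already valid UTF-8 without conversion.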
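The following figure compares how some UTF-16 and UTF-8 code points map to several sample characters. In UTF-16, the eighth-note musical symbol requires two 2-byte code units (a surrogate pair) because it is a supplementary character, that is, a character outside the Basic Multilingual Plane (above U+FFFF). In UTF-8, the same character takes 4 bytes.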
Figure 3. A comparison of how some UTF-8 and UTF-16 code points map to some sample characters
Begin figure description. A table that shows the UTF-8 and UTF-16 code points for four sample characters. End figure description.
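The sketch below shows how a supplementary character is split into a UTF-16 surrogate pair and compares the result with Python's built-in codecs. It assumes that the eighth note in the figure is U+1D160 MUSICAL SYMBOL EIGHTH NOTE (a supplementary character; the similar U+266A EIGHTH NOTE is in the Basic Multilingual Plane). The helper `utf16_surrogates` is hypothetical, and big-endian UTF-16 without a byte order mark is assumed for display.

```python
def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


samples = {
    "LATIN CAPITAL LETTER A": "A",
    "MUSICAL SYMBOL EIGHTH NOTE": "\U0001D160",
}
for name, ch in samples.items():
    utf8 = ch.encode("utf-8").hex().upper()
    utf16 = ch.encode("utf-16-be").hex().upper()  # big-endian, no BOM
    print(f"{name}: UTF-8 X'{utf8}', UTF-16 X'{utf16}'")

high, low = utf16_surrogates(0x1D160)
print(f"surrogate pair: X'{high:04X}' X'{low:04X}'")
```

Under these assumptions, the eighth note encodes as X'F09D85A0' in UTF-8 and as the surrogate pair X'D834' X'DD60' in UTF-16, while A is X'41' in UTF-8 and X'0041' in UTF-16.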