UTF-8

UTF-8 converts Unicode data through a mathematical algorithm into 8-bit code units. ASCII codes from 00 to 7F are encoded as themselves, every other character is encoded as a sequence of two to four bytes, and the encoded data contains null bytes only where a null is the intended character.
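
As a minimal sketch of that algorithm, the following Python function encodes a single Unicode code point by hand. The function name utf8_encode is hypothetical and used only for illustration; in practice you would rely on the built-in str.encode("utf-8").

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch only)."""
    # Surrogates (D800-DFFF) and values above 10FFFF are not valid characters.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0x7F:
        # ASCII range: the byte is the code point itself.
        return bytes([code_point])
    if code_point <= 0x7FF:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

Every byte of a multibyte sequence has its high bit set, which is why a null byte can appear only when the null character itself is encoded.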

For example, the string "ABC" in 16-bit Unicode (UTF-16) is "004100420043"x. However, in UTF-8 it is "414243"x.
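
The same comparison can be reproduced with a short Python sketch. It assumes the 16-bit form is big-endian UTF-16 (codec name "utf-16-be"), which matches the byte order shown in the example.

```python
text = "ABC"

# Encode the same string in both forms and show the bytes as hexadecimal.
utf16_hex = text.encode("utf-16-be").hex()
utf8_hex = text.encode("utf-8").hex()

print(utf16_hex)  # 004100420043
print(utf8_hex)   # 414243

# The 16-bit form contains null bytes for ASCII text; the UTF-8 form does not.
utf16_has_nulls = b"\x00" in text.encode("utf-16-be")
utf8_has_nulls = b"\x00" in text.encode("utf-8")
```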

Because UTF-8 allows Unicode data to flow over an 8-bit network without the network needing to know that it is Unicode, UTF-8 is used to store Unicode on several UNIX platforms and is used as the default encoding for most new internet standards.

UTF-8 is used mainly as a direct replacement for older MBCS encodings, which also use 8-bit code units, but it takes somewhat more code to process. It is a compact encoding when the great majority of your data is English, because each English letter (like every other ASCII character) uses only one byte, while other characters use two to four bytes.
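
To illustrate the size trade-off, the following Python sketch compares the encoded length of a few sample strings in UTF-8 and in big-endian UTF-16. The helper name encoded_sizes and the sample strings are illustrative assumptions, not part of any IBM i interface.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes the text occupies in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }

# Escapes are used so that the samples do not depend on the file's encoding.
samples = {
    "English": "Hello",
    "French": "caf\u00e9",                 # "cafe" with an acute accent on the e
    "Japanese": "\u65e5\u672c\u8a9e",      # three CJK ideographs
}

for language, text in samples.items():
    print(language, encoded_sizes(text))
# English {'utf-8': 5, 'utf-16': 10}
# French {'utf-8': 5, 'utf-16': 8}
# Japanese {'utf-8': 9, 'utf-16': 6}
```

English text takes half the space in UTF-8, while text made up mostly of East Asian characters takes more space in UTF-8 than in UTF-16.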

The IBM® i operating system supports UTF-8 encoding with CCSID 1208. Beginning with IBM i V5R3, CCSID 1208 is supported in the database.