Unicode
The Unicode Standard provides a single character set that covers the languages of the world, together with a small number of machine-friendly encoding forms and schemes that fit the needs of existing applications and protocols. It is designed for interoperability with ASCII and ISO-8859-1, two widely used character sets: the first 128 Unicode code points are identical to ASCII, and the first 256 are identical to ISO-8859-1. This makes it easier to adopt Unicode in existing applications and protocols.
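The compatibility with ASCII and ISO-8859-1 can be checked directly. The following minimal sketch uses Python's built-in codecs; the helper names `same_value_in_latin1` and `utf8_matches_ascii` are hypothetical and exist only for this illustration. It shows that code points U+0000 to U+00FF carry the same numbers as ISO-8859-1, and that ASCII-only text produces identical bytes in ASCII and UTF-8.

```python
def same_value_in_latin1(code_point: int) -> bool:
    """True if the code point encodes to the single ISO-8859-1 byte of the same value."""
    return chr(code_point).encode("latin-1") == bytes([code_point])


def utf8_matches_ascii(text: str) -> bool:
    """True if ASCII-only text produces identical bytes in ASCII and UTF-8.

    Raises UnicodeEncodeError if the text contains non-ASCII characters.
    """
    return text.encode("ascii") == text.encode("utf-8")


if __name__ == "__main__":
    # U+0000..U+00FF share their numeric values with ISO-8859-1.
    print(all(same_value_in_latin1(cp) for cp in range(256)))  # True
    # ASCII text is byte-for-byte the same in UTF-8.
    print(utf8_matches_ascii("Hello, Unicode!"))  # True
    # Beyond ASCII, the code point is shared but the serialized bytes differ.
    print("é".encode("latin-1"), "é".encode("utf-8"))  # b'\xe9' b'\xc3\xa9'
```

Note that sharing code point values is not the same as sharing byte sequences: "é" is U+00E9 in both character sets, but UTF-8 serializes it as two bytes.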
Every character is assigned a unique number, called a code point. To find out more about Unicode, refer to The Unicode Consortium. The Unicode Character Code Charts show which number the Unicode Standard assigns to each character. If you prefer to have a list of all available characters on your workstation, so that you do not have to be online every time you look up a specific character, you can download the Unibook Character Browser.
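As an offline alternative for quick lookups, Python's standard `unicodedata` module can map between characters, code points, and character names. The sketch below is an illustration only; the helper `describe` is hypothetical, and the names it returns depend on the Unicode version bundled with your Python installation.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return a single character's code point in U+XXXX form, followed by its name."""
    code_point = ord(ch)  # raises TypeError if ch is not exactly one character
    name = unicodedata.name(ch, "<unnamed>")  # default avoids ValueError for unnamed code points
    return f"U+{code_point:04X} {name}"


if __name__ == "__main__":
    print(describe("A"))  # U+0041 LATIN CAPITAL LETTER A
    print(describe("€"))  # U+20AC EURO SIGN
    print(chr(0x20AC))    # going the other way: code point to character
```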
Currently, the following three forms of Unicode encoding are supported:
- UTF-8: Unicode Transformation Format, 8-bit. Uses 1 to 4 bytes per character, depending on the character.
- UTF-16: Unicode Transformation Format, 16-bit. Uses either 2 or 4 bytes per character; characters above U+FFFF are represented by a pair of 16-bit code units (a surrogate pair).
- UTF-32: Unicode Transformation Format, 32-bit. Uses a fixed 4 bytes per character.
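To see how the three encoding forms differ in size, the following sketch encodes a few sample characters with Python's built-in codecs and counts the bytes. The helper names `encoded_lengths` and `utf16_surrogates` are hypothetical. Big-endian codec names are used so that Python does not prepend a byte order mark (BOM), which would otherwise inflate the UTF-16 and UTF-32 counts.

```python
def encoded_lengths(text: str) -> dict[str, int]:
    """Return the number of bytes `text` occupies in each encoding form (no BOM)."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-be")),
        "UTF-32": len(text.encode("utf-32-be")),
    }


def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


if __name__ == "__main__":
    # A (U+0041), é (U+00E9), € (U+20AC), 😀 (U+1F600)
    for ch in ["A", "é", "€", "😀"]:
        print(f"U+{ord(ch):04X}", encoded_lengths(ch))
    high, low = utf16_surrogates(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")  # U+1F600 -> D83D DE00
```

The output shows the trade-off: UTF-8 is the most compact for ASCII-heavy text, UTF-16 needs 4 bytes only for characters outside the Basic Multilingual Plane, and UTF-32 always uses 4 bytes, which makes every character the same size at the cost of space. Requires Python 3.9 or later for the built-in generic type hints.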