Representation of Wide-Character Literals
Wide-character literals can be represented in various ways. These
representations and how they are handled are described below:
- Trigraphs
- Trigraph characters used as literals
are converted to their corresponding Unicode characters. For example:
    wchar_t *wcs = L" ??(";
is equivalent to:
    wchar_t wcs[] = {0x0020, 0x005B, 0x0000};
For more information about trigraphs, see the ILE C/C++ Language Reference.
- Character Escape Codes (\a, \b, \f, \n, \r, \t, \v, \', \", \?)
- Character escape codes are converted to their corresponding
Unicode code points. For example:
    wchar_t *wcs = L" \t \n";
is equivalent to:
    wchar_t wcs[] = {0x0020, 0x0009, 0x0020, 0x000A, 0x0000};
- Numeric Escape Codes (\xnnnn, \ooo)
- Numeric escape codes
are not converted to Unicode. Instead, the numeric portion of the
literal is preserved and assumed to be a Unicode character represented
in hexadecimal or octal form. For example:
    wchar_t *wcs = L" \x4145";
is equivalent to:
    wchar_t wcs[] = {0x0020, 0x4145, 0x0000};
Specifying \xnn in a wchar_t string literal is equivalent to
specifying \x00nn.
Hexadecimal constant values larger than 0xFF are normally considered
invalid. Setting the *LOCALEUCS2 option changes this to allow 2-byte
hexadecimal initialization of wchar_t types only. For example:
    wchar_t wc = L'\x4145';  /* Valid only with *LOCALEUCS2 option, */
                             /* otherwise an out of bounds error    */
                             /* will result.                        */
    char c = '\x4145';       /* Not valid due to size restriction.  */
                             /* Error will result with or without   */
                             /* specifying *LOCALEUCS2 option.      */
Note: Numeric hexadecimal escape codes are not validated other than to
ensure type size-limits are not exceeded.
- DBCS Characters
- DBCS characters entered as hexadecimal escape sequences are not converted to Unicode. They are stored as received.
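
The character and numeric escape code equivalences described above can be checked with a short C sketch. It assumes a compiler whose wide literals hold Unicode values and whose wchar_t is at least 16 bits wide (for ILE C, that means compiling with the *LOCALEUCS2 option). The function names are hypothetical and chosen only for illustration. The trigraph case is left out because some compilers disable trigraph replacement by default.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not a library function): returns 1 if the first
   count elements of actual and expected are equal, 0 otherwise. */
static int same_units(const wchar_t *actual, const wchar_t *expected,
                      size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (actual[i] != expected[i])
            return 0;
    }
    return 1;
}

/* Character escape codes: L" \t \n" compared with the explicit array.
   Assumption: wide literals are encoded with Unicode values. */
int char_escapes_match(void)
{
    const wchar_t *wcs = L" \t \n";
    const wchar_t expected[] = {0x0020, 0x0009, 0x0020, 0x000A, 0x0000};
    return same_units(wcs, expected, sizeof expected / sizeof expected[0]);
}

/* Numeric escape codes: the value 0x4145 is preserved as written.
   Assumption: wchar_t is at least 16 bits wide and the compiler accepts
   hexadecimal values above 0xFF in wide literals. */
int numeric_escapes_match(void)
{
    const wchar_t *wcs = L" \x4145";
    const wchar_t expected[] = {0x0020, 0x4145, 0x0000};
    return same_units(wcs, expected, sizeof expected / sizeof expected[0]);
}

/* \xnn in a wide literal is equivalent to \x00nn. */
int short_hex_matches_padded(void)
{
    return L'\x41' == L'\x0041';
}

/* Runs all three checks; aborts through assert if any case differs. */
void check_wide_literals(void)
{
    assert(char_escapes_match());
    assert(numeric_escapes_match());
    assert(short_hex_matches_padded());
}
```

Each function returns 1 when the literal matches the array shown in the corresponding example, so a caller could invoke check_wide_literals() once, for instance at program start, to confirm that the compiler options in effect produce the expected values.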