User-defined words with multibyte characters
In the context of user-defined words, the term multibyte refers to three types of words:
- Words formed of DBCS characters, possibly combined with single-byte characters
- Words formed of UTF-8 characters that are composed of one or more bytes
- Words formed of EUC characters that are composed of one or more bytes
These are the rules for forming user-defined words with multibyte characters:
- Contained characters
- A user-defined word can consist of both single-byte and multibyte characters. If a character exists in both single-byte and multibyte forms, its single-byte and multibyte representations are not equivalent.
The single-byte characters in the user-defined word are limited to the following characters:
- Uppercase Latin letters A through Z
- Lowercase Latin letters a through z
- digits 0 through 9
- - (hyphen)
- _ (underscore)
The single-byte encoded hyphen cannot appear as the first or last character in such words.
The single-byte encoded underscore cannot appear as the first character in such words.
- Uppercase and lowercase letters
- In COBOL words, each lowercase single-byte encoded character "a" through "z" is considered to be equivalent to its corresponding single-byte encoded uppercase character. Multibyte-encoded uppercase and lowercase letters are not equivalent.
- Value range
- Valid value ranges for multibyte characters depend on the specific code page being used.
- Maximum length
- 30 bytes. The number of characters that you can specify in 30 bytes varies depending on the source code page and the characters used in the user-defined word.
- Continuation
- Words formed with multibyte characters cannot be continued across lines.
- Use of shift-out and shift-in characters
- Applicable only when the dummy shift-out/shift-in (SOSI) compiler option is in effect. See SOSI in the COBOL for Linux® on x86 Programming Guide for details of the SOSI compiler option.
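The contained-character, case, and length rules above can be sketched as a small checker. This is an illustrative sketch in Python, not part of the compiler. The names `check_word` and `normalize` are hypothetical, and Python's `utf-8` codec is assumed as a stand-in for the source code page. It does not check code-page value ranges, continuation, or SOSI handling.

```python
import string

# Single-byte characters permitted in a user-defined word (per the rules above).
ALLOWED_SINGLE_BYTE = set(string.ascii_letters + string.digits + "-_")
MAX_BYTES = 30  # the limit is in bytes, not characters


def check_word(word, encoding="utf-8"):
    """Return a list of rule violations; an empty list means none were found.

    Hypothetical helper: 'encoding' stands in for the source code page.
    """
    if not word:
        return ["word is empty"]
    problems = []
    for ch in word:
        # Only characters that encode to a single byte are restricted here.
        if len(ch.encode(encoding)) == 1 and ch not in ALLOWED_SINGLE_BYTE:
            problems.append(f"single-byte character {ch!r} is not allowed")
    # "-" and "_" below are the single-byte forms; multibyte look-alikes
    # are different characters and are not caught by these comparisons.
    if word[0] == "-" or word[-1] == "-":
        problems.append("single-byte hyphen cannot be first or last")
    if word[0] == "_":
        problems.append("single-byte underscore cannot be first")
    if len(word.encode(encoding)) > MAX_BYTES:
        problems.append("word exceeds 30 bytes")
    return problems


def normalize(word):
    """Fold only single-byte a-z to uppercase; multibyte letters stay as-is."""
    return "".join(ch.upper() if "a" <= ch <= "z" else ch for ch in word)
```

Under these assumptions, `check_word("顧客-ID")` reports no violations (9 bytes in UTF-8), while a word of eleven 3-byte characters exceeds the 30-byte limit. `normalize` treats `abc` and `ABC` as the same word but leaves full-width `ａｂｃ` unchanged, which mirrors the rule that multibyte uppercase and lowercase letters are not equivalent.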