User-defined words with multibyte characters

When used in the context of user-defined words, the term multibyte refers to three types of words:

  • Words formed of DBCS characters, possibly combined with single-byte characters
  • Words formed of UTF-8 characters that are composed of one or more bytes
  • Words formed of EUC characters that are composed of one or more bytes

These are the rules for forming user-defined words with multibyte characters:

Contained characters
A user-defined word can consist of both single-byte and multibyte characters. If a character exists in both single-byte and multibyte forms, its single-byte and multibyte representations are not equivalent.

The single-byte characters in the user-defined word are limited to the following characters:

  • Latin letters uppercase A through Z
  • Latin letters lowercase a through z
  • digits 0 through 9
  • - (hyphen)
  • _ (underscore)

The single-byte encoded hyphen cannot appear as the first or last character in such words.

The single-byte encoded underscore cannot appear as the first character in such words.
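
The single-byte character rules above can be sketched as a small checker. This is an illustrative sketch only, not the compiler's logic: the function name `check_single_byte_rules` is hypothetical, and it assumes UTF-8 source, where a single-byte character is exactly an ASCII character. Any non-ASCII character is treated as multibyte and is not validated here, because valid multibyte ranges depend on the code page.

```python
import string

# Single-byte characters permitted in a user-defined word, per the list above.
ALLOWED_SINGLE_BYTE = set(string.ascii_letters + string.digits + "-_")


def check_single_byte_rules(word: str) -> list[str]:
    """Return the rule violations found among the single-byte characters.

    Assumption: UTF-8 source, so "single-byte" means ASCII. Characters
    outside the ASCII range are treated as multibyte and skipped.
    """
    problems = []
    for ch in word:
        if ord(ch) < 128 and ch not in ALLOWED_SINGLE_BYTE:
            problems.append(f"single-byte character {ch!r} is not allowed")
    # Only the single-byte (ASCII) hyphen and underscore are restricted.
    if word.startswith("-") or word.endswith("-"):
        problems.append("single-byte hyphen cannot be first or last")
    if word.startswith("_"):
        problems.append("single-byte underscore cannot be first")
    return problems
```

Under these assumptions, a word such as `顧客-NAME` produces no violations, while `-顧客` and `_顧客` are each reported once. A trailing single-byte underscore, as in `顧客_`, is accepted because only the leading position is restricted.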

Uppercase and lowercase letters
In COBOL words, each lowercase single-byte encoded character "a" through "z" is considered to be equivalent to its corresponding single-byte encoded uppercase character. Multibyte-encoded uppercase and lowercase letters are not equivalent.
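
The equivalence rule can be illustrated by folding only the single-byte letters "a" through "z" and leaving every other character untouched. This is a sketch under the assumption of UTF-8 source; the helper names `fold_single_byte_case` and `same_cobol_word` are hypothetical and are not part of any COBOL tooling.

```python
def fold_single_byte_case(word: str) -> str:
    """Uppercase only the single-byte letters a-z; keep all else as is."""
    return "".join(
        ch.upper() if "a" <= ch <= "z" else ch
        for ch in word
    )


def same_cobol_word(a: str, b: str) -> bool:
    """Compare two words after folding single-byte lowercase letters."""
    return fold_single_byte_case(a) == fold_single_byte_case(b)
```

With this sketch, `顧客-name` and `顧客-NAME` compare as the same word. The fullwidth forms `ａｂｃ` and `ＡＢＣ` do not, because they are multibyte-encoded letters. Likewise the single-byte `A` and the fullwidth `Ａ` are distinct, which matches the rule that single-byte and multibyte representations of a character are not equivalent.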
Value range
Valid value ranges for multibyte characters depend on the specific code page being used.
Maximum length
30 bytes. The number of characters that you can specify in 30 bytes varies depending on the source code page and the characters used in the user-defined word.
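
Because the limit is counted in bytes, it helps to measure a candidate word in its encoded form. The following sketch assumes the word is available as a Python string and that the named codec approximates the source code page; `byte_length` and `fits_length_limit` are hypothetical helper names.

```python
MAX_WORD_BYTES = 30  # maximum length stated above


def byte_length(word: str, encoding: str = "utf-8") -> int:
    """Number of bytes the word occupies in the given encoding."""
    return len(word.encode(encoding))


def fits_length_limit(word: str, encoding: str = "utf-8") -> bool:
    """True if the encoded word is within the 30-byte limit."""
    return byte_length(word, encoding) <= MAX_WORD_BYTES
```

For example, `顧客-NAME` occupies 11 bytes in UTF-8 (three bytes for each of the two ideographs plus five single-byte characters), and the two ideographs alone occupy 6 bytes in UTF-8 but 4 bytes in EUC-JP. A word of ten such ideographs exactly reaches the limit in UTF-8, and an eleventh would exceed it.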
Continuation
Words formed with multibyte characters cannot be continued across lines.
Use of shift-out and shift-in characters
Shift-out and shift-in characters are applicable only when the dummy shift-out/shift-in (SOSI) compiler option is in effect. For details of the SOSI compiler option, see SOSI in the COBOL for Linux® on x86 Programming Guide.