Characters
The most basic and indivisible unit of the COBOL language is the character. The basic character set includes the letters of the Latin alphabet, digits, and special characters.
In the COBOL language, individual characters are joined to form character-strings and separators. Character-strings and separators, then, are used to form the words, literals, phrases, clauses, statements, and sentences that form the language.
For certain language elements, the basic character set is extended with the following character sets, depending on the code page used at compile time:
- The ASCII Double-Byte Character Set (DBCS). Each DBCS character occupies 2 adjacent bytes. Characters represented in multiple bytes in source code (including DBCS characters) are referred to in this document as multibyte characters. A character-string containing only DBCS characters is also called a DBCS character-string or double-byte character string.
- UTF-8, an encoding form of the Unicode character set. UTF-8 characters occupy 1 byte to 4 bytes per character. UTF-8 characters that occupy 2 or more bytes are referred to in this document as multibyte characters.
- Extended UNIX Code (EUC). EUC characters occupy 1 byte to 4 bytes per character (or 1 byte to 3 bytes, depending on the code page). EUC characters that occupy 2 or more bytes are referred to in this document as multibyte characters.
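
The byte counts described above can be checked outside COBOL. The following is a minimal sketch in Python, not part of any COBOL compiler. The helper name `byte_lengths` is hypothetical, and the Python codecs `shift_jis` and `euc_jp` are assumed stand-ins for an ASCII DBCS code page and an EUC code page; the code pages actually available depend on the platform and locale.

```python
# Illustration only: Python codecs stand in for compile-time code pages.
# "shift_jis" and "euc_jp" are assumed examples of DBCS and EUC code pages.

def byte_lengths(chars, codec):
    """Return the number of bytes each character occupies in the given codec."""
    return [len(ch.encode(codec)) for ch in chars]


if __name__ == "__main__":
    # UTF-8: 1 to 4 bytes per character
    print(byte_lengths(["A", "é", "日", "😀"], "utf-8"))   # [1, 2, 3, 4]
    # DBCS characters: 2 adjacent bytes each
    print(byte_lengths(["日", "本"], "shift_jis"))          # [2, 2]
    # EUC: single-byte and multibyte characters in one code page
    print(byte_lengths(["A", "日"], "euc_jp"))              # [1, 2]
```

Any character whose length is 2 or more in such a listing is a multibyte character in the sense used in this document.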
Multibyte characters can be used in forming user-defined words.
The content of alphanumeric literals, comment lines, and comment entries can include any of the characters in the computer's compile-time character set, and can include both single-byte and multibyte characters.
Runtime data can include any characters from the runtime character set of the computer. The runtime character set of the computer can include alphanumeric characters, multibyte characters, and national characters. National characters are represented in UTF-16, a 16-bit encoding form of Unicode.
When the NSYMBOL(NATIONAL) compiler option is in effect, literals identified by the opening delimiter N" or N' are national literals and can contain any single-byte or multibyte characters, or both, that are valid for the compile-time code page. Characters contained in national literals are represented as national characters at run time.
For details, see User-defined words with multibyte characters, DBCS literals, and National literals.
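
The conversion from compile-time characters to runtime national characters can be pictured as a re-encoding into UTF-16. The sketch below is written in Python purely as an illustration of that idea and does not show how any compiler implements it. The function name `to_national`, the UTF-8 compile-time code page, and the big-endian byte order are all assumptions; the actual byte order depends on the platform.

```python
# Illustration only: models national literal content being re-encoded
# from a compile-time code page into UTF-16 for run time.
# Assumptions: UTF-8 source code page, big-endian UTF-16.

def to_national(source_bytes, compile_codepage="utf-8"):
    """Decode literal content from the compile-time code page and
    return its UTF-16 (big-endian) representation."""
    text = source_bytes.decode(compile_codepage)
    return text.encode("utf-16-be")


if __name__ == "__main__":
    literal = "A日".encode("utf-8")   # 1 byte + 3 bytes in the source
    national = to_national(literal)
    print(national.hex())             # 004165e5
    print(len(national) // 2)         # 2 (16-bit code units)
```

In this sketch the single-byte character and the multibyte character each become one 16-bit code unit, regardless of how many bytes they occupied in the source code page.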
Character | Meaning | Use |
---|---|---|
Space | Space | Punctuation character |
`+` | Plus sign | Arithmetic operator; editing character |
`-` | Minus sign or hyphen | Arithmetic operator; editing character; continuation character; COBOL word element |
`*` | Asterisk | Arithmetic operator; editing character; comment character |
`/` | Forward slash or solidus | Arithmetic operator; editing character; line-control character |
`=` | Equal sign | Assignment character; relation character |
`$` | Currency sign | Editing character |
`,` | Comma | Editing character; punctuation character |
`;` | Semicolon | Punctuation character |
`.` | Decimal point or period | Editing character; punctuation character |
`"` | Quotation mark | Punctuation character |
`'` | Apostrophe | Punctuation character |
`(` | Left parenthesis | Punctuation character |
`)` | Right parenthesis | Punctuation character |
`>` | Greater than | Relation character |
`<` | Less than | Relation character |
`:` | Colon | Punctuation character |
`_` | Underscore | User-defined word element |
A - Z | Alphabet (uppercase) | Alphabetic characters |
a - z | Alphabet (lowercase) | Alphabetic characters |
0 - 9 | Numeric characters | Numeric characters |
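
As a rough aid for checking text against the table above, the following Python sketch reports characters that fall outside the basic character set. It is an illustration only: the names `BASIC_SET` and `outside_basic_set` are hypothetical, and the set is built from the 18 special characters, the letters A - Z and a - z, and the digits 0 - 9 listed in the table. It says nothing about whether the characters are used validly in a COBOL program.

```python
import string

# The 18 special characters from the table, starting with the space.
SPECIAL = " +-*/=$,;.\"'()><:_"

# Basic character set: Latin letters, digits, and the special characters.
BASIC_SET = set(string.ascii_letters + string.digits + SPECIAL)


def outside_basic_set(text):
    """Return a sorted list of the distinct characters in text that are
    not part of the basic character set."""
    return sorted({ch for ch in text if ch not in BASIC_SET})


if __name__ == "__main__":
    print(outside_basic_set("MOVE A TO B."))        # []
    print(outside_basic_set("COMPUTE X = Y % 2"))   # ['%']
```

Characters reported by such a check are not necessarily errors: as described earlier, alphanumeric literals, comment lines, and comment entries can contain any character of the compile-time character set.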