Characters

The most basic and indivisible unit of the COBOL language is the character. The basic character set includes the letters of the Latin alphabet, digits, and special characters.

In the COBOL language, individual characters are joined to form character-strings and separators. Character-strings and separators, then, are used to form the words, literals, phrases, clauses, statements, and sentences that form the language.
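The split into character-strings and separators can be seen in a very small program. The following is a minimal sketch in fixed reference format; the program name CHARSTR and the data names WS-A and WS-B are hypothetical and chosen only for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CHARSTR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Character-strings: 01, WS-A, PIC, 9(3), VALUE, 0.
      * Separators: the spaces between them and the closing period.
       01  WS-A  PIC 9(3) VALUE 0.
       01  WS-B  PIC 9(3) VALUE 0.
       PROCEDURE DIVISION.
      * The comma after WS-A is an optional separator; the period
      * followed by a space ends the sentence.
           MOVE 10 TO WS-A, WS-B.
      * "TOTAL: " is a single character-string (an alphanumeric
      * literal) delimited by quotation marks.
           DISPLAY "TOTAL: " WS-A.
           STOP RUN.
```

Each statement is built only from words, literals, and PICTURE character-strings, held apart by spaces, commas, periods, and quotation marks.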

The basic characters used in forming character-strings and separators in source code are shown in Table 1.

For certain language elements, the basic character set is extended with the following character sets, depending on the code page used at compile time:

  • The ASCII Double-Byte Character Set (DBCS). DBCS characters occupy 2 adjacent bytes to represent one character. Characters represented in multiple bytes in source code (including DBCS characters) are referred to in this document as multibyte characters. A character-string containing only DBCS characters is also called a DBCS character-string or double-byte character string.
  • UTF-8, an encoding form of the Unicode character set. UTF-8 characters occupy 1 byte to 4 bytes per character. UTF-8 characters that occupy 2 or more bytes are referred to in this document as multibyte characters.
  • Extended UNIX Code (EUC). EUC characters occupy 1 byte to 4 bytes per character (or 1 byte to 3 bytes, depending on the code page). EUC characters that occupy 2 or more bytes are referred to in this document as multibyte characters.
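To make the byte counts concrete, the sketch below stores four characters in an alphanumeric item by writing their UTF-8 bytes as a hexadecimal literal. It assumes an ASCII-based platform and a UTF-8 code page; the program name UTF8DEMO and the data names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Four characters given as their UTF-8 bytes (assumed
      * encoding). PIC X positions count bytes, not characters:
      *   X"41"        LATIN CAPITAL LETTER A     1 byte
      *   X"C3A9"      LATIN SMALL E WITH ACUTE   2 bytes
      *   X"E282AC"    EURO SIGN                  3 bytes
      *   X"F09F9880"  GRINNING FACE              4 bytes
       01  WS-TEXT   PIC X(10) VALUE X"41C3A9E282ACF09F9880".
       01  WS-BYTES  PIC 9(4).
       PROCEDURE DIVISION.
      * LENGTH OF gives the size of the item in bytes.
           MOVE LENGTH OF WS-TEXT TO WS-BYTES.
           DISPLAY "Characters: 4, bytes: " WS-BYTES.
           STOP RUN.
```

The last three characters are multibyte characters in the sense used above, which is why a four-character value needs ten alphanumeric positions.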

Multibyte characters can be used in forming user-defined words.

The content of alphanumeric literals, comment lines, and comment entries can include any of the characters in the computer's compile-time character set, and can include both single-byte and multibyte characters.

Runtime data can include any characters from the runtime character set of the computer. The runtime character set of the computer can include alphanumeric characters, multibyte characters, and national characters. National characters are represented in UTF-16, a 16-bit encoding form of Unicode.
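As an illustration of national data at run time, the following sketch moves alphanumeric data into a national data item and compares its size in bytes with its size in character positions. The program name NATLEN and all data names are hypothetical. It assumes a compiler that supports USAGE NATIONAL as described here; with each national position held as a 2-byte UTF-16 code unit, a PIC N(5) item would be expected to occupy 10 bytes.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NATLEN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ALPHA      PIC X(5) VALUE "HELLO".
       01  WS-NAT        PIC N(5) USAGE NATIONAL.
       01  WS-BYTES      PIC 9(4).
       01  WS-POSITIONS  PIC 9(4).
       PROCEDURE DIVISION.
      * Moving alphanumeric data to a national item converts it
      * to the national representation.
           MOVE WS-ALPHA TO WS-NAT.
      * LENGTH OF reports bytes; FUNCTION LENGTH reports national
      * character positions.
           MOVE LENGTH OF WS-NAT TO WS-BYTES.
           COMPUTE WS-POSITIONS = FUNCTION LENGTH(WS-NAT).
           DISPLAY "Bytes:     " WS-BYTES.
           DISPLAY "Positions: " WS-POSITIONS.
           STOP RUN.
```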

When the NSYMBOL (NATIONAL) compiler option is in effect, literals identified by the opening delimiter N" or N' are national literals and can contain any single-byte or multibyte characters, or both, that are valid for the compile-time code page. Characters contained in national literals are represented as national characters at run time.
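A national literal can be sketched as follows, assuming the NSYMBOL(NATIONAL) compiler option is in effect. The program name NATLIT and the data name WS-GREETING are hypothetical. Both opening delimiters, N" and N', are shown.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NATLIT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * The N" delimiter makes the VALUE a national literal.
       01  WS-GREETING  PIC N(5) USAGE NATIONAL VALUE N"HELLO".
       PROCEDURE DIVISION.
      * N' opens a national literal as well; both operands of the
      * comparison are national at run time.
           IF WS-GREETING = N'HELLO'
               DISPLAY "National literals match"
           END-IF.
           STOP RUN.
```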

For details, see User-defined words with multibyte characters, DBCS literals, and National literals.

Table 1. Basic COBOL character set
Character  Meaning                   Use                                 Example
(space)    Space                     Punctuation character               01 WS-A PIC X(10).
+          Plus sign                 Arithmetic operator                 COMPUTE WS-A = WS-B + WS-C.
                                     Editing character                   01 WS-A PIC +9(3).
-          Minus sign or hyphen      Arithmetic operator                 COMPUTE WS-A = WS-B - WS-C.
                                     Editing character                   01 WS-A PIC -9(3).
                                     Continuation character (column 7)    01 WS-VAR  PIC X(27) VALUE
                                                                         -      'THIS MULTI-LINE TEXT'.
                                     COBOL word element                  01 WS-A PIC 9(3).
*          Asterisk                  Arithmetic operator                 COMPUTE WS-A = WS-B * WS-C.
                                     Editing character                   01 WS-A PIC **9.
                                     Comment character (column 7)        * THIS IS A COMMENT LINE.
/          Forward slash or solidus  Arithmetic operator                 COMPUTE WS-A = WS-B / WS-C.
                                     Editing character                   01 WS-DATE PIC 99/99/99.
                                     Comment with page eject (column 7)  / THIS COMMENT STARTS A NEW LISTING PAGE.
=          Equal sign                Assignment character                COMPUTE WS-A = WS-B / WS-C.
                                     Relation character                  IF WS-A = 10
$          Currency sign             Editing character                   01 WS-AMT PIC $$99.
,          Comma                     Editing character                   01 WS-AMT PIC 99,999.
                                     Punctuation character               MOVE 10 TO WS-A, WS-B.
;          Semicolon                 Punctuation character               MOVE 10 TO WS-A; WS-B.
.          Decimal point or period   Editing character                   01 WS-AMT PIC 99.999.
                                     Punctuation character               MOVE 10 TO WS-A, WS-B.
"          Quotation mark            Punctuation character               01 WS-VAR PIC X(5) VALUE "HELLO".
'          Apostrophe                Punctuation character               01 WS-VAR PIC X(5) VALUE 'HELLO'.
(          Left parenthesis          Punctuation character               IF (WS-A = 10) AND (WS-B = 5)
)          Right parenthesis         Punctuation character               IF (WS-A = 10) AND (WS-B = 5)
>          Greater than              Relation character                  IF WS-A > 10
<          Less than                 Relation character                  IF WS-A < 10
:          Colon                     Reference-modification separator    MOVE WS-VAR(1:10) TO WS-VAR1.
_          Underscore                User-defined word element           01 WS_VAR PIC X(10).
A - Z      Alphabet (uppercase)      Alphabetic characters
a - z      Alphabet (lowercase)      Alphabetic characters
0 - 9      Numeric characters        Numeric characters
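
The editing characters in Table 1 are easiest to understand by moving values into edited items and displaying the results. The following is a minimal sketch; the program name EDITDEMO, the data names, and the sample values are all hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EDITDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-VALUE      PIC S9(3) VALUE -42.
      * One edited item per editing character from the table.
       01  WS-SIGNED     PIC +9(3).
       01  WS-CHECK      PIC **9.
       01  WS-MONEY      PIC $$99.
       01  WS-THOUSANDS  PIC 99,999.
       01  WS-DECIMAL    PIC 99.999.
       01  WS-DATE       PIC 99/99/99.
       PROCEDURE DIVISION.
      * Each MOVE applies the editing rules of the receiving item.
           MOVE WS-VALUE TO WS-SIGNED.
           MOVE 42       TO WS-CHECK.
           MOVE 7        TO WS-MONEY.
           MOVE 12345    TO WS-THOUSANDS.
           MOVE 12.5     TO WS-DECIMAL.
           MOVE 311224   TO WS-DATE.
           DISPLAY "Sign:      " WS-SIGNED.
           DISPLAY "Asterisk:  " WS-CHECK.
           DISPLAY "Currency:  " WS-MONEY.
           DISPLAY "Comma:     " WS-THOUSANDS.
           DISPLAY "Decimal:   " WS-DECIMAL.
           DISPLAY "Slash:     " WS-DATE.
           STOP RUN.
```

The same characters act as arithmetic operators or punctuation elsewhere in a program; inside a PICTURE character-string they describe only how the value is presented.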