Structure and general syntax

It is advisable to start REXX programs with a comment. A REXX program is built from a series of clauses.

REXX program structure

REXX for CICS® does not require that REXX programs start with a comment. However, for portability, it is advisable to start each REXX program with a comment that begins on the first line and includes the word REXX, as shown in the following example.

Figure 1. Example of using the REXX program identifier

/* REXX program */
   ...
   ...
   ...
EXIT

A REXX program is built from a series of clauses that are composed of:

Zero or more blanks (which are ignored)
A sequence of tokens
Zero or more blanks (which are ignored)
A semicolon (;) delimiter that can be implied by line-end, certain keywords, or the colon (:)

Conceptually, each clause is scanned from left to right before processing, and the tokens that compose the clause are identified. At this stage, instruction keywords are recognized, comments are removed, and multiple blanks (except in literal strings) are converted to single blanks. Blanks adjacent to operator characters and special characters are also removed.
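A minimal sketch of these scanning rules, assuming a conventional REXX interpreter (REXX for CICS or another classic implementation); the variable names phrase and sum are illustrative only. A run of blanks between two terms ends up as a single blank operator, and blanks next to an operator are dropped.

```rexx
/* REXX - sketch: how clause scanning treats blanks */
phrase = 'fast'      'scan'   /* several blanks between terms act as ONE blank operator */
sum    = 1    +    2          /* blanks next to the + operator are removed: same as 1+2 */
say phrase                    /* displays: fast scan */
say sum                       /* displays: 3 */
```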

Implementation maximum: The length of a clause cannot exceed 16K.

Characters

A character is a member of a defined set of elements that is used for the control or representation of data.

You can usually enter a character with a single keystroke. The coded representation of a character is its representation in digital form. A character, the letter A, for example, differs from its coded representation or encoding. Various coded character sets (such as ASCII and EBCDIC) use different encodings for the letter A (decimal values 65 and 193, respectively). This information uses characters to convey meanings and not to imply a specific character code, except where otherwise stated. The exceptions are certain built-in functions that convert between characters and their representations. The functions C2D, C2X, D2C, X2C, and XRANGE have a dependence on the character set in use.
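The character-set dependence of C2D and C2X can be seen directly. This sketch assumes the program runs on either an ASCII-based or an EBCDIC-based system; the variable names are illustrative.

```rexx
/* REXX - sketch: the encoding of the letter A depends on the character set */
code = c2d('A')               /* 65 on an ASCII system, 193 on an EBCDIC system */
hex  = c2x('A')               /* 41 on ASCII, C1 on EBCDIC */
say 'A is encoded as' code '(hex' hex')'
```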

A code page specifies the encodings for each character in a set. Be aware that:

Some code pages do not contain all characters that REXX defines as valid (for example, ¬, the logical NOT character).
Some characters that REXX defines as valid have different encodings in different code pages (for example, !, the exclamation point).

Comments

A comment is a sequence of characters (on one or more lines) delimited by /* and */. Any characters are allowed within these delimiters.

Comments can contain other comments, as long as each begins and ends with the necessary delimiters. They are called nested comments. Comments can be anywhere and can be of any length. They have no effect on the program, but they do act as separators. Two tokens with only a comment in between are not treated as a single token.

/* This is an example of a valid REXX comment */
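The following sketch, assuming a standard REXX interpreter, shows a nested comment placed in the middle of an expression. The comment is discarded during scanning, so the clause is evaluated as 2+3; the variable name total is illustrative.

```rexx
/* REXX - sketch: nested comments inside an expression */
total = 2 /* outer /* nested */ outer again */ + 3
say total                     /* displays: 5 */
```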

Take special care when commenting out lines of code that contain /* or */ as part of a literal string. Consider the following program segment:

01    parse pull input
02    if substr(input,1,5) = '/*123'
03      then call process
04    dept = substr(input,32,5)

To comment out lines 2 and 3, the following change would be incorrect. The language processor would interpret the /* that is part of the literal string /*123 as the start of a nested comment, and it would not process the rest of the program because it would be looking for a matching comment end (*/):

01    parse pull input
02 /* if substr(input,1,5) = '/*123'
03      then call process
04 */ dept = substr(input,32,5)

You can avoid this type of problem by using concatenation for literal strings that contain /* or */; line 2 would be:

if substr(input,1,5) = '/' || '*123'

You could comment out lines 2 and 3 correctly as follows:

01    parse pull input
02 /* if substr(input,1,5) = '/' || '*123'
03      then call process
04 */ dept = substr(input,32,5)
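The concatenated form produces exactly the same five-character value as the original literal, so the comparison in line 2 behaves as before. A minimal check, assuming a standard REXX interpreter (the variable name marker is hypothetical):

```rexx
/* REXX - sketch: build the value without writing a comment opener in the source */
marker = '/' || '*123'        /* same value as the literal in the original line 2 */
say length(marker)            /* displays: 5 */
```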

Tokens

A token is the unit of low-level syntax from which clauses are built. Programs written in REXX are composed of tokens, which are separated by blanks, by comments, or by the nature of the tokens themselves.

The classes of tokens are as follows:

Literal strings

A literal string is a sequence that includes any characters and that is delimited by the single quotation mark ' or the double quotation mark ". Use two consecutive double quotation marks "" to represent a " character in a string that is delimited by double quotation marks. Similarly, use two consecutive single quotation marks '' to represent a ' character in a string that is delimited by single quotation marks. A literal string is a constant and its contents are never modified when it is processed.

A literal string with no characters (that is, a string of length 0) is called a null string.

The following are examples of valid strings:

'Fred'
"Don't Panic!"
'You shouldn''t'        /* Same as "You shouldn't" */
''                      /* The null string         */

A string that is followed immediately by a ( is considered to be the name of a function.
A string that is followed immediately by the symbol X or x is considered to be a hexadecimal string.
A string that is followed immediately by the symbol B or b is considered to be a binary string.
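A short sketch of these rules, assuming a standard REXX interpreter; the variable names are illustrative. It shows doubled quotation marks, the null string, and a literal string used as a function name (a quoted name is used exactly as written, so it is given in uppercase here to match the built-in function LENGTH).

```rexx
/* REXX - sketch: quotation marks, the null string, and a quoted function name */
s1 = 'You shouldn''t'         /* two single quotation marks stand for one */
s2 = "You shouldn't"
same  = (s1 == s2)            /* 1: the two strings are identical */
empty = length('')            /* 0: the null string */
n = 'LENGTH'('abc')           /* a string followed by ( names a function: 3 */
say same empty n              /* displays: 1 0 3 */
```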

Implementation maximum: A literal string can contain up to 250 characters. The length of computed results is limited only by the amount of storage available. See the note in REXX general concepts for more information.

Hexadecimal strings

A hexadecimal string is a literal string expressed using a hexadecimal notation of its encoding. It is any sequence of zero or more hexadecimal digits (0-9, a-f, A-F), grouped in pairs. If necessary, a single leading 0 is assumed at the front of the string to make an even number of hexadecimal digits. The groups of digits are optionally separated by one or more blanks, and the whole sequence is delimited by single or double quotation marks and immediately followed by the symbol X or x. (X or x cannot be part of a longer symbol.) The blanks, which can be present only at byte boundaries (and not at the beginning or end of the string), are to aid readability; the language processor ignores them. The value of a hexadecimal string is formed by packing the hexadecimal digits given: the blanks are removed and each pair of hexadecimal digits is converted into its equivalent character. For example, in EBCDIC, 'C1'X is packed to A.

You can use hexadecimal strings to include characters in a program even if you cannot directly enter the characters themselves. The following are examples of valid hexadecimal strings:

'ABCD'x
"1d ec f8"X
"1 d8"x

Note: A hexadecimal string is not a representation of a number. Rather, it is an escape mechanism so that a user can describe a character in terms of its encoding (and, therefore, is machine-dependent). In EBCDIC, '40'X is the encoding for a blank. In every case, a string of the form '.....'x is simply an alternative to a straightforward string. In EBCDIC 'C1'x and 'A' are identical, as are '40'x and a blank, and must be treated identically.

Also, be aware that in Assembler language, hexadecimal numbers are represented with the X in front of the number. REXX only accepts hexadecimal numbers in the format described previously. This information might show hexadecimal numbers represented in both ways, but when you code a hexadecimal string in REXX, place the X after the number.

Implementation maximum: The packed length of a hexadecimal string (the string with blanks removed) cannot exceed 250 bytes.
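The packing rules can be checked with C2X, which returns the hexadecimal encoding of a string and is therefore independent of the character set for this purpose. A sketch assuming a standard REXX interpreter; the variable names are illustrative.

```rexx
/* REXX - sketch: hexadecimal strings are packed into characters */
h1 = "1d ec f8"X              /* blanks at byte boundaries are ignored */
h2 = "1 d8"x                  /* odd number of digits: a leading 0 is assumed */
say length(h1) c2x(h1)        /* displays: 3 1DECF8 */
say length(h2) c2x(h2)        /* displays: 2 01D8 */
```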

Binary strings

A binary string is a literal string, expressed using a binary representation of its encoding. It is any sequence of zero or more binary digits (0 or 1) in groups of 8 (bytes) or 4 (nibbles). The first group can have fewer than four digits; in this case, up to three 0 digits are assumed to the left of the first digit, making a total of four digits. The groups of digits are optionally separated by one or more blanks, and the whole sequence is delimited by matching single or double quotation marks and immediately followed by the symbol b or B. (b or B cannot be part of a longer symbol.) The blanks, which can be present only at byte or nibble boundaries (and not at the beginning or end of the string), are to aid readability. The language processor ignores them.

A binary string is a literal string formed by packing the binary digits given. If the number of binary digits is not a multiple of eight, leading zeros are added on the left to make a multiple of eight before packing. Binary strings allow you to specify characters explicitly, bit by bit.

The following are examples of valid binary strings:

'11110000'b        /* == 'f0'x                  */
"101 1101"b        /* == '5d'x                  */
'1'b               /* == '00000001'b and '01'x  */
'10000 10101010'b  /* == '0001 0000 1010 1010'b */
''b                /* == ''                     */
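The equivalences in these examples can be confirmed with C2X. This is a sketch assuming a standard REXX interpreter; the variable names b1 to b4 are illustrative.

```rexx
/* REXX - sketch: binary strings are packed into characters */
b1 = '11110000'b
b2 = "101 1101"b              /* first group padded on the left to 0101 */
b3 = '1'b                     /* padded to 00000001 */
b4 = '10000 10101010'b        /* padded to 0001 0000 1010 1010 */
say c2x(b1) c2x(b2) c2x(b3) c2x(b4)   /* displays: F0 5D 01 10AA */
say length(''b)                        /* displays: 0 */
```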

Implementation maximum: The packed length of a binary string (the string with blanks removed) cannot exceed 250 bytes.

Symbols

Symbols are groups of characters, selected from the following character sets:

English alphabetic characters (A-Z and a-z). Some code pages do not include the lowercase English characters a-z.
Numeric characters (0-9)
Characters . ! ? and _ (underscore). The encoding of the exclamation point character ! depends on the code page in use.
Double-Byte Character Set (DBCS) characters (X'41'-X'FE'). ETMODE must be in effect for these characters to be valid in symbols.

Any lowercase alphabetic character in a symbol is translated to uppercase (that is, lowercase a-z to uppercase A-Z) before use.

The following are examples of valid symbols:

Fred
Albert.Hall
WHERE?
<.H.E.L.L.O>               /* This is DBCS */

If a symbol does not begin with a digit or a period, you can use it as a variable and can assign it a value. If you have not assigned it a value, its value is the characters of the symbol itself, translated to uppercase (that is, lowercase a-z to uppercase A-Z). Symbols that begin with a number or a period are constant symbols and cannot be assigned a value.
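A minimal sketch of this behavior, assuming a standard REXX interpreter with the NOVALUE condition not trapped (no SIGNAL ON NOVALUE in effect); the symbol fred comes from the examples above, and the other names are illustrative.

```rexx
/* REXX - sketch: simple symbols used as variables */
before = fred                 /* never assigned: its value is its own name in uppercase */
fred = 'Astaire'              /* now fred has a value */
after = FRED                  /* FRED and fred are the same symbol */
say before after              /* displays: FRED Astaire */
```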

One other form of symbol is allowed to support the representation of numbers in exponential format. The symbol starts with a digit (0-9) or a period, and ends with the sequence E or e, followed immediately by an optional sign (- or +), followed immediately by one or more digits (which cannot be followed by any other symbol characters). The sign in this context is part of the symbol and is not an operator.

The following are examples of valid numbers in exponential notation:

17.3E-12
.03e+9
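Because the sign belongs to the symbol, these constants can be used directly in arithmetic. A sketch assuming a standard REXX interpreter with the default NUMERIC DIGITS 9; the variable names are illustrative.

```rexx
/* REXX - sketch: exponential-format symbols in arithmetic */
small = 17.3E-12 * 1E12       /* the - is part of the symbol 17.3E-12, not an operator */
big   = .03e+9 + 0            /* adding 0 gives the result in normal form */
say small big                 /* displays: 17.3 30000000 */
```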

Implementation maximum: A symbol can consist of up to 250 characters. Its value, if it is a variable, is limited only by the amount of storage available. See the note in REXX general concepts for more information.

Numbers

These are character strings that consist of one or more decimal digits, with an optional prefix of a plus or minus sign, and optionally including a single period (.) that represents a decimal point. A number can also have a power of 10 suffixed in conventional exponential notation: an E (uppercase or lowercase), followed optionally by a plus or minus sign, then followed by one or more decimal digits defining the power of 10. Whenever a character string is used as a number, rounding may occur to a precision specified by the NUMERIC DIGITS instruction (default nine digits). See Numbers and arithmetic operations for a full definition of numbers.

Numbers can have leading blanks (before and after the sign, if any) and can have trailing blanks. Blanks cannot be embedded among the digits of a number or in the exponential part. Note that a symbol or a literal string might be a number. A number cannot be the name of a variable.

The following are examples of valid numbers:

12
'-17.9'
127.0650
73e+128
' + 7.9E5 '
'0E000'
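The built-in function DATATYPE returns NUM for a string that is a valid number and CHAR otherwise, which makes these rules easy to try out. A sketch assuming a standard REXX interpreter; the variable names are illustrative.

```rexx
/* REXX - sketch: which strings are numbers */
t1 = datatype(' + 7.9E5 ')    /* NUM: blanks around the sign and the number are allowed */
t2 = datatype('12 3')         /* CHAR: blanks cannot be embedded among the digits */
v  = ' + 7.9E5 ' + 0          /* a quoted number works in arithmetic */
say t1 t2 v                   /* displays: NUM CHAR 790000 */
```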

You can specify numbers with or without quotation marks around them. Note that the sequence -17.9 (without quotation marks) in an expression is not simply a number. It is a minus operator (which may be prefix minus if no term is to the left of it) followed by a positive number. The result of the operation is a number.
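The difference matters when a term appears to the left of the minus sign. In this sketch (assuming a standard REXX interpreter; the names x, r1, and r2 are illustrative) the same characters -17.9 act first as prefix minus and then as subtraction.

```rexx
/* REXX - sketch: an unquoted -17.9 is a minus operator followed by a number */
x = 5
r1 = -17.9                    /* no term on the left: prefix minus applied to 17.9 */
r2 = x -17.9                  /* the blank before - is removed, so this is 5-17.9 */
say r1 r2                     /* displays: -17.9 -12.9 */
```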

A whole number is a number that has a zero (or no) decimal part and that the language processor would not usually express in exponential notation. That is, it has no more digits before the decimal point than the current setting of NUMERIC DIGITS (the default is 9).
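DATATYPE with the W option tests for a whole number under the current NUMERIC DIGITS setting. A sketch assuming a standard REXX interpreter running with the default setting; the variable names are illustrative.

```rexx
/* REXX - sketch: whole numbers under the default NUMERIC DIGITS */
w1 = datatype(12.0, 'W')      /* 1: the decimal part is zero */
w2 = datatype(12.5, 'W')      /* 0: the decimal part is not zero */
say digits() w1 w2            /* displays: 9 1 0 */
```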

Implementation maximum: The exponent of a number expressed in exponential notation can have up to nine digits.

Operator characters

The characters + - \ / % * | & = ¬ > < and the sequences

>=  <=  \>  \<  \=  ><  <>  ==  \==  //  &&  ||  **
¬>  ¬<  ¬=  ¬==  >>  <<  >>=  \<<  ¬<<  \>>  ¬>>  <<=  /=  /==

indicate operations (see Operators). A few of these are also used in parsing templates, and the equal sign is also used to indicate assignment. Blanks adjacent to operator characters are removed. Therefore, the following are identical in meaning:

345>=123
345 >=123
345 >= 123
345 > = 123
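Each of these comparisons evaluates to 1 (true). The sketch below assumes a standard REXX interpreter and uses the backslash form of the not operator, which is available in every character set; the variable names are illustrative.

```rexx
/* REXX - sketch: blanks adjacent to operator characters are removed */
t1 = 345>=123
t2 = 345 >= 123
t3 = 345 > = 123              /* the blank between > and = is removed as well */
t4 = 7 \= 8                   /* \= means not equal */
say t1 t2 t3 t4               /* displays: 1 1 1 1 */
```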

Some of these characters might not be available in all character sets. In this situation, you can use appropriate translations. In particular, the vertical bar (|), or OR character, is often shown as a split vertical bar (¦).

Throughout the language, the not character, ¬, is synonymous with the backslash (\). You can use the two characters interchangeably according to availability and personal preference.

Special characters

The following characters, together with the individual characters from the operators, have special significance when found outside of literal strings:

,   ;   :   )   (

These characters constitute the set of special characters. They all act as token delimiters, and blanks adjacent to any of these are removed. The exception is a blank adjacent to the outside of a parenthesis, which is deleted only if it is also adjacent to another special character (unless the character is a parenthesis and the blank is outside it, too). For example, the language processor does not remove the blank in A (Z). This is a concatenation that is not equivalent to A(Z), a function call. The language processor does remove the blanks in (A) + (Z) because this is equivalent to (A)+(Z).
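The A (Z) case can be tried with ordinary variables. This sketch assumes a standard REXX interpreter; the names a, z, n1, n2, r1, and r2 are illustrative, and the form a(z) is deliberately not used because it would call a function named A.

```rexx
/* REXX - sketch: a blank outside a parenthesis can be an operator */
a = 'abc'
z = 'def'
r1 = a (z)                    /* blank kept: blank concatenation, not a function call */
n1 = 2
n2 = 3
r2 = (n1) + (n2)              /* blanks next to + are removed: same as (n1)+(n2) */
say r1 '/' r2                 /* displays: abc def / 5 */
```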

The following example shows how a clause is composed of tokens.

'REPEAT'   A + 3;

This is composed of six tokens:

a literal string ('REPEAT')
a blank operator
a symbol (A, which can have a value)
an operator (+)
a second symbol (3, which is a number and a symbol)
the clause delimiter (;)

The blanks between the A and the + and between the + and the 3 are removed. However, one of the blanks between the 'REPEAT' and the A remains as an operator. Thus, this clause is treated as though written:

'REPEAT' A+3;
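To see the result of the clause, A needs a value; the value 4 below is an assumption chosen for illustration, and the interpreter is assumed to be a standard REXX. Addition has higher priority than the blank operator, so A+3 is evaluated first.

```rexx
/* REXX - sketch: evaluating the example clause */
A = 4                         /* assumed value, for illustration only */
result1 = 'REPEAT'   A + 3    /* treated as 'REPEAT' A+3 */
say result1                   /* displays: REPEAT 7 */
```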

Implied semicolons

The last element in a clause is the semicolon delimiter. The language processor implies the semicolon: at a line-end, after certain keywords, and after a colon if it follows a single symbol. This means that you need to include semicolons only when there is more than one clause on a line, or to end an instruction whose last character is a comma.

A line-end usually marks the end of a clause and, thus, REXX implies a semicolon at most line-ends. However, there are the following exceptions:

The line ends in the middle of a string.
The line ends in the middle of a comment. The clause continues on to the next line.
The last token was the continuation character (a comma) and the line does not end in the middle of a comment. (Note that a comment is not a token.)

REXX automatically implies semicolons after colons (when following a single symbol, a label) and after certain keywords when they are in the correct context. The keywords that have this effect are: ELSE, OTHERWISE, and THEN. These special cases reduce typographical errors significantly.
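A short sketch of where semicolons are written and where they are implied, assuming a standard REXX interpreter; the label here and the variable names are hypothetical.

```rexx
/* REXX - sketch: explicit and implied semicolons */
a = 1; b = 2                  /* two clauses on one line: an explicit ; is needed */
if a < b then smaller = 'a'; else smaller = 'b'   /* THEN and ELSE imply a ; after themselves */
say smaller                   /* displays: a */
here: count = a + b           /* the colon after the label HERE implies a ; */
say count                     /* displays: 3 */
```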

Note: The two characters forming the comment delimiters, /* and */, must not be split by a line-end (that is, / and * should not appear on different lines) because they could not then be recognized correctly; an implied semicolon would be added. The two consecutive characters forming a literal quotation mark within a string are also subject to this line-end ruling.

Continuations

One way to continue a clause onto the next line is to use the comma, which is referred to as the continuation character.

The comma is functionally replaced by a blank, and, thus, no semicolon is implied. One or more comments can follow the continuation character before the end of the line. The continuation character cannot be used in the middle of a string or it will be processed as part of the string itself. The same situation holds true for comments. Note that the comma remains in execution traces.

The following example shows how to use the continuation character to continue a clause.

say 'You can use a comma',
    'to continue this clause.'

This displays:

You can use a comma to continue this clause.
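Because the comma is replaced by a blank, the continued pieces are joined with one blank unless an explicit concatenation operator is used. A sketch assuming a standard REXX interpreter; the variable names are illustrative.

```rexx
/* REXX - sketch: the comma continuation character */
msg = 'one',                  /* a comment can follow the comma */
      'two'
joined = 'one' ||,
         'two'
say msg                       /* displays: one two (the comma became a blank) */
say joined                    /* displays: onetwo (|| joins without a blank) */
```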