Character self-defining term

A character self-defining term consists of one to four characters enclosed in apostrophes, preceded by the letter C. All letters, decimal digits, and special characters can be used in a character self-defining term. In addition, any of the remaining EBCDIC characters can be designated in a character self-defining term. Examples of character self-defining terms are:
C'/'
C' ' (space)
C'ABC'
C'13'
Because apostrophes (in the assembler language) and ampersands (in the macro language) are syntactic characters, observe the following rule when using these characters in a character self-defining term:
  • For each apostrophe or ampersand you want in a character self-defining term, write two apostrophes or two ampersands. For example, the character value A'# is written as C'A''#', and an apostrophe followed by a space and another apostrophe is written as C''' '''.
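The doubling rule can be modelled with a small scanner. This is a minimal sketch, not the assembler's own parser: the helper `parse_char_term` is hypothetical and handles only plain C'...' terms (no type extensions, no double-byte data). It collapses each doubled apostrophe or ampersand into one character and enforces the one-to-four character limit.

```python
def parse_char_term(source: str) -> str:
    """Return the character value of a C'...' self-defining term.

    Hypothetical helper: models only the apostrophe/ampersand doubling
    rule and the 1-to-4 character limit, not the full HLASM scanner.
    """
    if len(source) < 3 or not source.startswith("C'") or not source.endswith("'"):
        raise ValueError(f"not a C'...' term: {source!r}")
    body = source[2:-1]
    value = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch in "'&":
            # An apostrophe or ampersand must be written twice and
            # stands for a single character in the value.
            if i + 1 < len(body) and body[i + 1] == ch:
                value.append(ch)
                i += 2
                continue
            raise ValueError(f"unpaired {ch!r} in {source!r}")
        value.append(ch)
        i += 1
    if not 1 <= len(value) <= 4:
        raise ValueError(f"{source!r} has {len(value)} characters; 1 to 4 are allowed")
    return "".join(value)


print(parse_char_term("C'A''#'"))    # A'#
print(parse_char_term("C''' '''"))   # ' '
```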

For C-type character self-defining terms, each character in the character sequence is assembled as its 8-bit code equivalent.
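To see what "assembled as its 8-bit code equivalent" means for the example terms, the sketch below encodes each character with one byte and reads the bytes as a single binary value. It assumes the source code page is EBCDIC 037, modelled with Python's `cp037` codec; another EBCDIC code page could give different values for non-invariant characters.

```python
def char_term_value(chars: str, codec: str = "cp037") -> int:
    """Assemble each character as its 8-bit code and return the term's value.

    Assumes EBCDIC code page 037 (Python's "cp037" codec) as the source.
    """
    if not 1 <= len(chars) <= 4:
        raise ValueError("a character self-defining term has 1 to 4 characters")
    encoded = chars.encode(codec)  # one byte per character in a single-byte code page
    return int.from_bytes(encoded, "big")


for chars in ["/", " ", "ABC", "13"]:
    print(f"C'{chars}' = X'{char_term_value(chars):0{2 * len(chars)}X}'")
# C'/' = X'61'
# C' ' = X'40'
# C'ABC' = X'C1C2C3'
# C'13' = X'F1F3'
```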

Each pair of apostrophes or ampersands used to represent an apostrophe or ampersand within the character sequence is assembled as a single apostrophe or ampersand. Double-byte data can appear in a character self-defining term if the DBCS assembler option is specified. The assembled value includes the SO and SI delimiters, which occupy one byte each, while each double-byte character occupies two bytes. Hence, within the four-byte limit, a character self-defining term containing double-byte data is limited to one double-byte character delimited by SO and SI, for example C'<.A>', where < and > represent the SO and SI characters.

Since the SO and SI are stored, the null double-byte character string, C'<>', is also a valid character self-defining term.
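The byte arithmetic behind the one-character limit can be checked with a short sketch. It assumes SO is X'0E' and SI is X'0F' (the usual EBCDIC shift-out and shift-in codes) and uses X'42C1' as an assumed two-byte code for the character written as .A; the helper `dbcs_term_value` is hypothetical.

```python
from typing import List

SO = b"\x0e"  # shift-out: starts double-byte data
SI = b"\x0f"  # shift-in: returns to single-byte data
MAX_TERM_BYTES = 4


def dbcs_term_value(dbcs_chars: List[bytes]) -> bytes:
    """Assemble SO, the two-byte characters, and SI, enforcing the 4-byte limit."""
    for ch in dbcs_chars:
        if len(ch) != 2:
            raise ValueError("each double-byte character occupies exactly two bytes")
    value = SO + b"".join(dbcs_chars) + SI
    if len(value) > MAX_TERM_BYTES:
        raise ValueError(f"{len(value)} bytes exceeds the {MAX_TERM_BYTES}-byte limit")
    return value


# SO and SI use 2 of the 4 bytes, leaving room for (4 - 2) // 2 = 1 DBCS character.
print((MAX_TERM_BYTES - len(SO) - len(SI)) // 2)     # 1
print(dbcs_term_value([]).hex().upper())             # 0E0F (the null string C'<>')
print(dbcs_term_value([b"\x42\xc1"]).hex().upper())  # 0E42C10F (assumed code for .A)
```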

A character self-defining term in an assembler expression may be automatically translated to a different CCSID (coded character set identifier, which identifies the encoding scheme and code page). A type extension can specify the target scheme explicitly: CA for ASCII, CE for EBCDIC, and CU for Unicode. Translation of EBCDIC terms is optional, and is performed only if the CE option specifies a target CCSID different from the source CCSID specified by the EBCDIC option. A term with no type extension (type C) is also translated as specified by the CE option, and may be further translated using the older TRANSLATE option, which is supported primarily for compatibility.

The target CCSIDs for translation are those specified by the CA, CE and CU options at the start of the assembly; any changes specified in ACONTROL statements are ignored. This is because expression evaluation may be deferred or may be performed more than once for the same expression, and a self-defining term must have the same value whenever it is evaluated during the assembly. The source CCSID is normally specified by the EBCDIC option, but for type CU the CODEPAGE option may still be used; this is supported for compatibility only.

Character ASCII self-defining term: For Character ASCII (CA) terms, the character string is converted to the ASCII CCSID specified by the initial value of the CA option. Each pair of ampersands or apostrophes is first reduced to a single occurrence of that character. The assembler then maps each EBCDIC character to its ASCII equivalent.
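A CA term can be modelled as a decode from the source EBCDIC code page followed by an encode to ASCII. This sketch assumes a source code page of EBCDIC 037 (Python's `cp037`) and plain 7-bit ASCII as the target; the real target is whichever CCSID the CA option names. The input is assumed to have had its doubled apostrophes and ampersands already reduced. The function name `ca_term_value` is hypothetical.

```python
def ca_term_value(ebcdic_chars: bytes, source_codec: str = "cp037") -> int:
    """Map each EBCDIC character of a CA term to ASCII and return the value.

    Assumes doubled apostrophes/ampersands have already been reduced to
    single characters, and that the target is 7-bit ASCII.
    """
    text = ebcdic_chars.decode(source_codec)  # EBCDIC bytes -> characters
    ascii_bytes = text.encode("ascii")        # raises if there is no ASCII equivalent
    return int.from_bytes(ascii_bytes, "big")


source = "ABC".encode("cp037")           # as stored in the source: X'C1C2C3'
print(f"X'{ca_term_value(source):X}'")   # X'414243'
```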

Character EBCDIC self-defining term: For Character EBCDIC (CE) terms and for terms without a type extension (C), the character string is translated from the source CCSID specified by the EBCDIC option to the target CCSID specified by the initial value of the CE option, if the two differ. If the term has no type extension and the TRANSLATE option is used together with the COMPAT(TRANSDT) option, the term is then further translated using the specified table.
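The CE rules reduce to two decisions: translate only when the CE target CCSID differs from the EBCDIC source CCSID, and apply the TRANSLATE table only to a term with no type extension. In the sketch below, the CCSID-to-codec mapping is an assumption covering just three code pages, and `translate_table` stands for the TRANSLATE table; the caller is expected to pass it only when COMPAT(TRANSDT) is in effect.

```python
from typing import Optional

# Assumed mapping from a few EBCDIC CCSIDs to Python codec names.
CODECS = {37: "cp037", 500: "cp500", 1140: "cp1140"}


def ce_term_value(ebcdic_chars: bytes, source_ccsid: int, target_ccsid: int,
                  term_type: str = "CE",
                  translate_table: Optional[bytes] = None) -> int:
    """Model a CE or plain C term: translate only if the two CCSIDs differ.

    translate_table stands for the TRANSLATE table; pass it only when
    COMPAT(TRANSDT) is in effect.
    """
    if target_ccsid != source_ccsid:
        text = ebcdic_chars.decode(CODECS[source_ccsid])
        ebcdic_chars = text.encode(CODECS[target_ccsid])
    # Only a term without a type extension (plain C) is further translated.
    if term_type == "C" and translate_table is not None:
        ebcdic_chars = ebcdic_chars.translate(translate_table)
    return int.from_bytes(ebcdic_chars, "big")


bracket = "[".encode("cp037")
print(ce_term_value(bracket, 37, 37) == bracket[0])               # True: no translation
print(ce_term_value(bracket, 37, 500) == "[".encode("cp500")[0])  # True
```

The square bracket is used because its code differs between CCSIDs 037 and 500, so the translation is visible.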

Character Unicode self-defining term: For Character Unicode (CU) terms, the character string is converted to the Unicode CCSID specified by the initial value of the CU option: 1200 (UTF-16BE), 1202 (UTF-16LE) or 1208 (UTF-8). Each pair of ampersands or apostrophes is first reduced to a single occurrence of that character. The assembler then maps each EBCDIC character to its Unicode equivalent and converts the result to the format specified by the CU option. In UTF-16, each character translates to two bytes. In UTF-8, each character translates to one to three bytes: ASCII characters require one byte, accented letters typically require two bytes, and the Euro symbol requires three bytes. Because the translated value of a self-defining term is limited to four bytes, a UTF-16 term can contain only one or two characters, and a UTF-8 term can contain up to four characters, depending on the translated length.
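The per-character byte counts and the four-byte limit can be checked with Python's Unicode codecs. This sketch assumes the source is EBCDIC 1140 (Python's `cp1140`, which includes the Euro symbol) and maps the three CU CCSIDs to the codecs `utf-16-be`, `utf-16-le` and `utf-8`, none of which writes a byte-order mark. The function name `cu_term_value` is hypothetical.

```python
# Assumed mapping from the CU option's CCSIDs to Python codec names.
CU_CODECS = {1200: "utf-16-be", 1202: "utf-16-le", 1208: "utf-8"}
MAX_TERM_BYTES = 4


def cu_term_value(ebcdic_chars: bytes, cu_ccsid: int, source_codec: str = "cp1140") -> bytes:
    """Convert an EBCDIC character string to the Unicode format chosen by CU."""
    text = ebcdic_chars.decode(source_codec)  # EBCDIC -> Unicode characters
    value = text.encode(CU_CODECS[cu_ccsid])  # -> UTF-16BE, UTF-16LE or UTF-8
    if len(value) > MAX_TERM_BYTES:
        raise ValueError(f"{text!r} needs {len(value)} bytes; the limit is {MAX_TERM_BYTES}")
    return value


for ch in ["A", "é", "€"]:
    print(ch, cu_term_value(ch.encode("cp1140"), 1208).hex().upper())
# A 41
# é C3A9
# € E282AC
```

Under these assumptions, CU'AB' fits in UTF-16 (four bytes), but a three-character term would need six bytes and is rejected.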

For Unicode conversion, the recommended option is CODEPAGE(LOCAL), which indicates that no CODEPAGE table is to be loaded and that the source CCSID is specified by the EBCDIC option. In this case, a standard internal conversion table is used, which correctly converts all characters from the source EBCDIC CCSID. Otherwise, the conversion mapping is defined by the selected CODEPAGE table, identified by its source CCSID. If that CCSID matches the one specified by the EBCDIC option (or its Euro-enabled equivalent), the conversion is applied directly to the source code; otherwise, the source is first converted to the CCSID specified by the initial CE option, if different. If the CODEPAGE CCSID matches neither the EBCDIC option nor the CE option, a warning is issued for the first use of a type CU self-defining term or constant.
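The choice of conversion path can be written as a small decision function. This is a sketch of the rules in the paragraph above, not the assembler's code: `cu_conversion_plan` is hypothetical, `codepage=None` stands for CODEPAGE(LOCAL), and the Euro-equivalent table covers only two illustrative pairs (037/1140 and 500/1148).

```python
from typing import List, Optional, Tuple

# Euro-enabled counterparts of two common EBCDIC CCSIDs (illustrative subset).
EURO_EQUIVALENT = {37: 1140, 500: 1148}


def cu_conversion_plan(codepage: Optional[int], ebcdic: int,
                       ce: int) -> Tuple[List[str], bool]:
    """Return the conversion steps for a CU term and whether a warning is issued.

    codepage=None stands for CODEPAGE(LOCAL) in this model.
    """
    if codepage is None:
        return [f"internal table: CCSID {ebcdic} -> Unicode"], False
    if codepage in (ebcdic, EURO_EQUIVALENT.get(ebcdic)):
        return [f"CODEPAGE table {codepage}: source -> Unicode"], False
    steps = []
    if ce != ebcdic:
        steps.append(f"translate CCSID {ebcdic} -> CE CCSID {ce}")
    steps.append(f"CODEPAGE table {codepage}: -> Unicode")
    # Warn when the table matches neither the EBCDIC nor the CE option.
    return steps, codepage != ce


print(cu_conversion_plan(None, 37, 37))  # (['internal table: CCSID 37 -> Unicode'], False)
print(cu_conversion_plan(273, 37, 37))   # (['CODEPAGE table 273: -> Unicode'], True)
```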

The following invariant characters have the same encoding (binary value) in all EBCDIC code pages. When you enter an invariant character, you can be sure that the resulting binary value does not depend on which EBCDIC code page your input device (editor) uses, and that it displays or prints as the same character regardless of which EBCDIC code page the output device (display or printer) uses.

  • space
  • decimal digits
  • upper-case and lower-case letters A through Z
  • these special characters:
    + < = > % & * " ' ( ) , _ - . / : ; ?
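Invariance can be spot-checked by encoding each character with several EBCDIC code pages and comparing the bytes. This sketch checks only four code pages that Python ships codecs for (`cp037`, `cp500`, `cp1140`, `cp273`), so it illustrates the property rather than proving it for every EBCDIC code page. The helper `differing_characters` is hypothetical.

```python
import string
from typing import List

# The invariant set listed above: space, digits, letters, and special characters.
INVARIANT = (" " + string.digits + string.ascii_uppercase + string.ascii_lowercase
             + "+<=>%&*\"'(),_-./:;?")


def differing_characters(chars: str, codecs: List[str]) -> List[str]:
    """Return the characters whose encoding is not the same in every codec."""
    return [c for c in chars if len({c.encode(codec) for codec in codecs}) > 1]


pages = ["cp037", "cp500", "cp1140", "cp273"]
print(differing_characters(INVARIANT, pages))  # []
print(differing_characters("[]!", pages))      # ['[', ']', '!']
```

The square brackets and exclamation mark are not in the invariant set, and their codes do change between code pages 037 and 500.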