The EBCDIC character set

z/OS® data sets are encoded in the Extended Binary Coded Decimal Interchange (EBCDIC) character set. This is a character set that was developed before ASCII (American Standard Code for Information Interchange) became commonly used.

Most systems that you are familiar with use ASCII. In addition, z/OS UNIX® files are encoded in ASCII.

You need to be aware of the differences in encoding schemes when moving data from ASCII-based systems to EBCDIC-encoded systems. Generally the conversion is handled internally, for example when text is sent from a 3270 emulator running on a PC to a TSO session. However, when transferring programs, these must not normally be translated and a binary transfer must be specified. Occasionally, even when transferring text, there are problems with certain characters such as the OR sign (|) or the logical NOT sign, and the programmer must look at the actual value of the translated character.

ASCII and EBCDIC are both 8-bit character sets. The difference is the way they assign bits for specific characters. The following are a few examples:

Character	EBCDIC	ASCII
A	11000001 (x'C1')	01000001 (x'41')
B	11000010 (x'C2')	01000010 (x'42')
a	10000001 (x'81')	01100001 (x'61')
1	11110001 (x'F1')	00110001 (x'31')
space	01000000 (x'40')	00100000 (x'20')

Although the ASCII arrangement might seem more logical, the huge amount of existing data in EBCDIC and the large number of programs that are sensitive to the character set make it impractical to convert all existing data and programs to ASCII.

A character set has a collating sequence, corresponding to the binary value of the character bits. For example, A has a lower value than B in both ASCII and EBCDIC. The collating sequence is important for sorting and for almost any program that scans and manipulates character strings. The general collating sequence for common characters in the two character sets is as follows:

	EBCDIC	ASCII
Lowest value:	space	space
	punctuation	punctuation
	lower case	numbers
	upper case	upper case
Highest value:	numbers	lower case

For example, "a" is less than "A" in EBCDIC, but "a" is greater than "A" in ASCII. Numeric characters are less than any alphabetic letter in ASCII but are greater than any letter in EBCDIC. A-Z and a-z are two contiguous sequences in ASCII. In EBCDIC there are gaps between some letters. If we subtract A from Z in ASCII we have 25. If we subtract A from Z in EBCDIC we have 40 (due to the gaps in binary values between some letters).

Converting simple character strings between ASCII and EBCDIC is trivial. The situation is more difficult if the character being converted is not present in the standard character set of the target code. A good example is a logical not symbol that is used in a major mainframe programming language (PL/I); there is no corresponding character in the ASCII set. Likewise, some ASCII characters used for C programming were not present in the original EBCDIC character set, although these were later added to EBCDIC. There is still some confusion about the cent sign (¢) and the hat symbol (^), and a few more obscure symbols.

Mainframes also use several versions of double-byte character sets (DBCS), mostly for Asian languages. The same character sets are used by some PC programs.

Traditional mainframe programming does not use special characters to terminate fields. In particular, nulls and new line characters (or CL/LF character pairs) are not used. There is no concept of a binary versus a text file. Bytes can be interpreted as EBCDIC or ASCII or something else if programmed properly. If such files are sent to a mainframe printer, it will attempt to interpret them as EBCDIC characters because the printer is sensitive to the character set. The z/OS Web server routinely stores ASCII files because the data will be interpreted by a PC browser program that expects ASCII data. Providing that no one attempts to print the ASCII files on a mainframe printer (or display them on a 3270), the system does not care what character set is being used.