CODEPAGE

Use CODEPAGE to specify the coded character set identifier (CCSID) of the EBCDIC code page that is used for compile-time and runtime COBOL operations that are sensitive to character encoding.

CODEPAGE option syntax

CODEPAGE(ccsid)

Default is: CODEPAGE(1140)

Abbreviation is: CP(ccsid)

ccsid must be an integer that represents a valid CCSID for an EBCDIC code page.

The default CCSID 1140 is equivalent to CCSID 37 (COM EUROPE EBCDIC) except that it includes the euro symbol (€), which occupies X'9F' in place of the international currency sign (¤).
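The relationship between CCSID 37 and CCSID 1140 can be checked on a workstation. The sketch below assumes that Python's built-in cp037 and cp1140 codecs are faithful stand-ins for the two CCSIDs; it decodes every single-byte value with both codecs and lists the positions where they disagree.

```python
# Compare the cp037 and cp1140 codecs byte by byte.
# Assumption: Python's built-in codecs stand in for CCSIDs 37 and 1140.
def differing_code_points(codec_a: str, codec_b: str) -> list:
    """Return the byte values 0x00-0xFF that decode differently in two codecs."""
    diffs = []
    for value in range(256):
        byte = bytes([value])
        if byte.decode(codec_a) != byte.decode(codec_b):
            diffs.append(value)
    return diffs


diffs = differing_code_points("cp037", "cp1140")
for value in diffs:
    byte = bytes([value])
    print(f"X'{value:02X}': cp037={byte.decode('cp037')!r} "
          f"cp1140={byte.decode('cp1140')!r}")
```

Decoding in both directions, rather than only encoding "€", makes the check cover the whole code page instead of a single character.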

ccsid specifies these encodings:

The encoding for alphanumeric, national, UTF-8, and DBCS literals in a COBOL source program
The default encoding of the content of alphanumeric and DBCS data items at run time
The encoding of DBCS user-defined words when they are processed by an XML GENERATE statement to create XML element and attribute names
The default encoding of an XML document created by an XML GENERATE statement if the receiving data item for the document is alphanumeric
The default encoding assumed for an XML document in an alphanumeric data item when the document is processed by an XML PARSE statement
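Because the CCSID fixes both the bytes a literal compiles to and the way alphanumeric bytes are later interpreted, a mismatch between the two shows up as wrong characters. The following is a minimal sketch, assuming Python's cp500 and cp1140 codecs model CCSIDs 500 and 1140; the helper names and the CCSID-to-codec table are hypothetical illustrations, not compiler behavior.

```python
# Hypothetical helpers: Python codecs model how a CCSID determines the bytes of
# alphanumeric content and how those bytes are read back.
# The CCSID-to-codec table is an assumption covering a few SBCS code pages.
CODECS = {37: "cp037", 500: "cp500", 1140: "cp1140"}


def encode_alphanumeric(text: str, ccsid: int = 1140) -> bytes:
    """Encode text as it would be stored in an alphanumeric item under ccsid."""
    return text.encode(CODECS[ccsid])


def decode_alphanumeric(data: bytes, ccsid: int = 1140) -> str:
    """Interpret alphanumeric bytes using the assumed (default) ccsid."""
    return data.decode(CODECS[ccsid])


literal = "Hello!"
as_500 = encode_alphanumeric(literal, 500)    # '!' is X'4F' in CCSID 500
as_1140 = encode_alphanumeric(literal, 1140)  # '!' is X'5A' in CCSID 1140
print(as_500.hex(), as_1140.hex())

# Reading CCSID 500 bytes with the default 1140 assumption turns '!' into '|',
# because X'4F' is the vertical bar in CCSID 1140.
print(decode_alphanumeric(as_500))
```

The letters survive the mismatch because they occupy the same positions in both code pages; only variant characters such as "!" change.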

The CODEPAGE ccsid is used when a code-page-sensitive operation is performed at compile time or run time and no explicit CCSID is specified to override the default code page. Such operations include:

Conversion of literal values to Unicode
Conversion of alphanumeric data to and from national (Unicode) data and UTF-8 (Unicode) data as part of move operations, comparisons, or the intrinsic functions DISPLAY-OF and NATIONAL-OF
Object-oriented language elements, such as INVOKE statements, class definitions, and method definitions
XML parsing
XML generation
Processing of DBCS names as part of XML generation at run time
Processing of SQL string host variables if the SQLCCSID option is in effect
Processing of source code for EXEC SQL statements
Processing of source code for EXEC SQLIMS statements
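As an illustration of the conversion case in the list above, the sketch below models converting alphanumeric bytes to national (UTF-16 big-endian) and UTF-8 data when no explicit CCSID is given, so the CODEPAGE value applies. It assumes Python's cp1140 codec stands in for CODEPAGE(1140); to_national and to_utf8 are hypothetical names, not COBOL functions.

```python
# Hypothetical model of code-page-sensitive conversion with no explicit CCSID:
# the CODEPAGE default decides how the alphanumeric bytes are interpreted.
CODEPAGE_DEFAULT = "cp1140"  # stands in for CODEPAGE(1140)


def to_national(alnum: bytes, codec: str = CODEPAGE_DEFAULT) -> bytes:
    """Alphanumeric bytes -> UTF-16 big-endian bytes (CCSID 1200)."""
    return alnum.decode(codec).encode("utf-16-be")


def to_utf8(alnum: bytes, codec: str = CODEPAGE_DEFAULT) -> bytes:
    """Alphanumeric bytes -> UTF-8 bytes (CCSID 1208)."""
    return alnum.decode(codec).encode("utf-8")


price = "€5".encode("cp1140")   # X'9F' X'F5' in CCSID 1140
print(to_national(price).hex())  # 20ac0035
print(to_utf8(price).hex())      # e282ac35
```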

However, the encoding of the following items in a COBOL source program is not affected by the CODEPAGE compiler option:

Data items that have USAGE NATIONAL
These items are always encoded in UTF-16 big-endian format (CCSID 1200).
Data items that have USAGE UTF-8
These items are always encoded in UTF-8 format (CCSID 1208).
Characters from the basic COBOL character set (see the table of these characters in the related reference below about characters)
Although the encoding of the basic COBOL characters default currency sign ($), quotation mark ("), and lowercase Latin letters varies among EBCDIC code pages, the compiler always interprets these characters using the EBCDIC code page 1140 encoding. In particular, the default currency sign is always the character with value X'5B' (unless it is changed by the CURRENCY compiler option or the CURRENCY SIGN clause in the SPECIAL-NAMES paragraph), and the quotation mark is always the character with value X'7F'.
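The encodings that CODEPAGE does not affect can be spelled out as a small check. This is a sketch that assumes Python's cp1140 codec reflects the CCSID 1140 values the compiler uses for basic COBOL characters; national_bytes and utf8_bytes are hypothetical helpers standing in for USAGE NATIONAL and USAGE UTF-8 content.

```python
# Basic COBOL characters the compiler always reads with CCSID 1140 values.
FIXED_BASIC = {"$": 0x5B, '"': 0x7F}


def national_bytes(text: str) -> bytes:
    """USAGE NATIONAL content: UTF-16 big-endian (CCSID 1200)."""
    return text.encode("utf-16-be")


def utf8_bytes(text: str) -> bytes:
    """USAGE UTF-8 content: UTF-8 (CCSID 1208)."""
    return text.encode("utf-8")


for char, value in FIXED_BASIC.items():
    # cp1140 is a stand-in for the compiler's fixed CCSID 1140 interpretation.
    assert char.encode("cp1140") == bytes([value])
assert "abc".encode("cp1140") == b"\x81\x82\x83"  # lowercase Latin letters

print(national_bytes("Ä").hex())  # 00c4, whatever CODEPAGE is in effect
print(utf8_bytes("Ä").hex())      # c384, whatever CODEPAGE is in effect
```

Neither helper takes a code page parameter, which mirrors the point of this section: the national and UTF-8 encodings are fixed.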

Some COBOL operations can override the CODEPAGE ccsid by using an explicit encoding specification, for example:

DISPLAY-OF and NATIONAL-OF intrinsic functions that specify a code page as the second argument
XML PARSE statements that specify the WITH ENCODING phrase
XML GENERATE statements that specify the WITH ENCODING phrase
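The override rule can be sketched as a function whose optional argument replaces the CODEPAGE default. display_of below is a hypothetical Python model of the idea behind the DISPLAY-OF second argument, not the COBOL intrinsic itself; the CCSID-to-codec table is an assumption limited to code pages that Python ships codecs for.

```python
from typing import Optional

CODEPAGE_CCSID = 1140  # stands in for the CODEPAGE(1140) compiler option
CODECS = {37: "cp037", 500: "cp500", 1140: "cp1140"}  # assumed subset


def display_of(text: str, ccsid: Optional[int] = None) -> bytes:
    """Model of a DISPLAY-OF-style conversion to alphanumeric bytes.

    ccsid=None plays the role of an omitted second argument, so the
    CODEPAGE value applies; any other value overrides it.
    """
    effective = CODEPAGE_CCSID if ccsid is None else ccsid
    return text.encode(CODECS[effective])


print(display_of("[x]").hex())       # baa7bb (CODEPAGE default, CCSID 1140)
print(display_of("[x]", 500).hex())  # 4aa75a (explicit CCSID 500)
```

Square brackets are a convenient probe because they sit at different positions in CCSIDs 1140 and 500, while the letter "x" (X'A7') does not move.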

Additionally, you can use the CURRENCY compiler option or the CURRENCY SIGN clause in the SPECIAL-NAMES paragraph to override:

The default currency symbol used in the PICTURE character-strings for numeric-edited data items in your source program
The currency sign value used in the content of numeric-edited data items at run time

DBCS code pages:

If your program contains any of the following items, compile it with the CODEPAGE option set to one of the EBCDIC multibyte character set (MBCS) CCSIDs shown in the table below:

User-defined words formed with DBCS characters
DBCS (USAGE DISPLAY-1) data items
DBCS literals

All of the CCSIDs in the table below identify mixed code pages that refer to a combination of SBCS and DBCS coded character sets. These are also the CCSIDs that are supported for mixed data by Db2®.

Table 1. EBCDIC multibyte coded character set identifiers
National language	MBCS CCSID	SBCS CCSID component	DBCS CCSID component
Japanese (Katakana-Kanji)	930	290	300
Japanese (Katakana-Kanji with euro)	1390	8482	16684
Japanese (Katakana-Kanji)	5026	290	4396
Japanese (Latin-Kanji)	939	1027	300
Japanese (Latin-Kanji with euro)	1399	5123	16684
Japanese (Latin-Kanji)	5035	1027	4396
Korean	933	833	834
Korean	1364	13121	4930
Simplified Chinese	935	836	837
Simplified Chinese	1388	13124	4933
Traditional Chinese	937	28709	835
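Table 1 and the rule that precedes it can be expressed as a lookup structure plus a check. This is a sketch under the assumption that a build script knows whether a program uses DBCS user-defined words, USAGE DISPLAY-1 items, or DBCS literals; check_codepage is a hypothetical helper, not part of the compiler, and the data is transcribed from Table 1.

```python
from typing import NamedTuple


class MbcsCcsid(NamedTuple):
    """One row of Table 1: a mixed SBCS/DBCS EBCDIC code page."""
    language: str
    sbcs: int  # SBCS CCSID component
    dbcs: int  # DBCS CCSID component


# Keyed by MBCS CCSID, transcribed from Table 1.
MBCS_CCSIDS = {
    930: MbcsCcsid("Japanese (Katakana-Kanji)", 290, 300),
    1390: MbcsCcsid("Japanese (Katakana-Kanji with euro)", 8482, 16684),
    5026: MbcsCcsid("Japanese (Katakana-Kanji)", 290, 4396),
    939: MbcsCcsid("Japanese (Latin-Kanji)", 1027, 300),
    1399: MbcsCcsid("Japanese (Latin-Kanji with euro)", 5123, 16684),
    5035: MbcsCcsid("Japanese (Latin-Kanji)", 1027, 4396),
    933: MbcsCcsid("Korean", 833, 834),
    1364: MbcsCcsid("Korean", 13121, 4930),
    935: MbcsCcsid("Simplified Chinese", 836, 837),
    1388: MbcsCcsid("Simplified Chinese", 13124, 4933),
    937: MbcsCcsid("Traditional Chinese", 28709, 835),
}


def check_codepage(ccsid: int, uses_dbcs: bool) -> None:
    """Hypothetical pre-compile check: DBCS content needs an MBCS CCSID."""
    if uses_dbcs and ccsid not in MBCS_CCSIDS:
        raise ValueError(
            f"CODEPAGE({ccsid}) is not an EBCDIC MBCS CCSID from Table 1, "
            "but the program uses DBCS words, DISPLAY-1 items, or DBCS literals"
        )


check_codepage(1390, uses_dbcs=True)   # accepted: Japanese with euro
check_codepage(1140, uses_dbcs=False)  # accepted: no DBCS content
print(MBCS_CCSIDS[1390])
```

Keying the dictionary by MBCS CCSID matches how the value is supplied to the CODEPAGE option, so the check is a single membership test.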

Note: If you specify the TEST option, you must set the CODEPAGE option to the CCSID that is used for the COBOL source program. In particular, programs that use Japanese characters in DBCS literals or DBCS user-defined words must be compiled with the CODEPAGE option set to a Japanese code page CCSID.

Note for Db2 users: IBM® recommends that you set the COBOL CODEPAGE CCSID to the same value as the Db2 DSNHDECP value and/or the value of the precompiler CCSID option.

Related concepts
COBOL and Db2 CCSID determination

Related references
CURRENCY
SQLCCSID
TEST
The encoding of XML documents
Characters (Enterprise COBOL for z/OS Language Reference)