Processing data in an international environment
Enterprise COBOL supports the UTF-16 and UTF-8 Unicode encodings for character data at run time. UTF-16 is a Unicode encoding built on 16-bit code units (most characters occupy one code unit; supplementary characters occupy two) that provides a consistent and efficient way to encode plain text. Using UTF-16, you can develop software that will work with various national languages.
About this task
Use these COBOL facilities to code and compile programs that process national data:
- Data types and literals:
  - Character data types, defined with the USAGE NATIONAL clause and a PICTURE clause that defines data of category national, national-edited, or numeric-edited
  - Numeric data types, defined with the USAGE NATIONAL clause and a PICTURE clause that defines a numeric data item (a national decimal item) or an external floating-point data item (a national floating-point item)
  - National literals, specified with literal prefix N or NX
  - Figurative constant ALL national-literal
  - Figurative constants QUOTE, SPACE, HIGH-VALUE, LOW-VALUE, or ZERO, which have national character (UTF-16) values when used in national-character contexts
- The COBOL statements shown in the related reference below about COBOL statements and national data
- Intrinsic functions:
  - NATIONAL-OF to convert an alphanumeric or double-byte character set (DBCS) character string to USAGE NATIONAL (UTF-16)
  - DISPLAY-OF to convert a national character string to USAGE DISPLAY in a selected code page (EBCDIC, ASCII, EUC, or UTF-8)
  - The other intrinsic functions shown in the related reference below about intrinsic functions and national data
- The GROUP-USAGE NATIONAL clause to define groups that contain only USAGE NATIONAL data items and that behave like elementary category national items in most operations
- Compiler options:
  - CODEPAGE to specify the code page to use for alphanumeric and DBCS data in your program
  - NSYMBOL to control whether national or DBCS processing is used for the N symbol in literals and PICTURE clauses
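As a minimal sketch of the facilities listed above, the following fragment defines a national data item and a national literal, then converts explicitly in both directions. The program name and data names are hypothetical. The sketch assumes NSYMBOL(NATIONAL) is in effect so that the N literal is a national literal, and that the alphanumeric items are encoded in the code page named by the CODEPAGE compiler option.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NATDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Alphanumeric (USAGE DISPLAY) item, encoded per CODEPAGE
       01  Alpha-Greeting   PIC X(8)  VALUE "Hello".
      * Category national item: 8 UTF-16 character positions
       01  Nat-Greeting     PIC N(8)  USAGE NATIONAL.
      * National literal with the N prefix (assumes NSYMBOL(NATIONAL))
       01  Nat-Title        PIC N(5)  USAGE NATIONAL VALUE N"Title".
      * Receiving area for the round-trip conversion
       01  Alpha-Back       PIC X(8).
       PROCEDURE DIVISION.
      * Alphanumeric to national (UTF-16)
           MOVE FUNCTION NATIONAL-OF(Alpha-Greeting)
             TO Nat-Greeting
      * National back to alphanumeric in the CODEPAGE code page
           MOVE FUNCTION DISPLAY-OF(Nat-Greeting)
             TO Alpha-Back
           GOBACK.
```

When the second argument of DISPLAY-OF is omitted, the output uses the code page in effect for the CODEPAGE option. Passing a CCSID as the second argument, for example 1208 for UTF-8, selects a different output code page.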
You can also take advantage of implicit conversions of alphanumeric or DBCS data items to national representation. The compiler performs such conversions (in most cases) when you move these items to national data items, or compare these items with national data items.
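The implicit conversions described above might look like the following sketch, in which no conversion function is coded. The program name and data names are hypothetical, and the alphanumeric item is assumed to be encoded in the code page named by the CODEPAGE compiler option.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IMPLCONV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  Alpha-Code    PIC X(4)  VALUE "AB12".
       01  Nat-Code      PIC N(4)  USAGE NATIONAL.
       PROCEDURE DIVISION.
      * The alphanumeric sender is converted to UTF-16 by the MOVE
           MOVE Alpha-Code TO Nat-Code
      * The alphanumeric operand is converted to national
      * representation before the two operands are compared
           IF Alpha-Code = Nat-Code
               DISPLAY "Codes match"
           END-IF
           GOBACK.
```

Coding NATIONAL-OF explicitly remains useful when the source data is in a code page other than the one specified by CODEPAGE.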
UTF-8 is a variable-width Unicode encoding that is popular for data transmission and World Wide Web-related data formats such as HTML and JSON. Using UTF-8, you can develop software that will work with various national languages.
Use these COBOL facilities to code and compile programs that process UTF-8 data:
- Data types and literals:
  - Character data types, defined with the USAGE UTF-8 clause and a PICTURE clause that defines data of category UTF-8
  - UTF-8 literals, specified with literal prefix U or UX. U literals might contain a special Unicode escape sequence \uhhhh or \U00hhhhhh to specify an individual Unicode code point, where hhhh and hhhhhh are the Unicode code point value specified as a sequence of hexadecimal digits.
  - Figurative constant ALL utf-8-literal
  - Figurative constants QUOTE, SPACE, HIGH-VALUE, LOW-VALUE, or ZERO, which have UTF-8 character values when used in UTF-8 character contexts
- The COBOL statements shown in the related reference about COBOL statements and UTF-8 data
- Intrinsic functions:
  - NATIONAL-OF to convert a UTF-8 character string to USAGE NATIONAL (UTF-16)
  - DISPLAY-OF to convert a UTF-8 character string to USAGE DISPLAY in a selected code page (EBCDIC, ASCII, EUC) and to convert national data to USAGE UTF-8 (UTF-8)
  - The other intrinsic functions shown in the related reference about intrinsic functions and UTF-8 data
- The GROUP-USAGE UTF-8 clause to define groups that contain only USAGE UTF-8 data items and that behave like elementary category UTF-8 items in most operations
You can also take advantage of implicit conversions of alphanumeric or national data items to UTF-8 representation. The compiler performs such conversions (in most cases) when you move these items to UTF-8 data items, or compare these items with UTF-8 data items.
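A minimal sketch of UTF-8 data definitions and conversions follows. The program name and data names are hypothetical. The use of PICTURE symbol U to define a category UTF-8 item is an assumption not stated in this topic, so check the PICTURE rules in the Language Reference for your compiler level before relying on it.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  Alpha-City    PIC X(10) VALUE "Zurich".
      * Category UTF-8 item (PICTURE symbol U is an assumption)
       01  U8-City       PIC U(10) USAGE UTF-8.
      * UTF-8 literal with the U prefix and an escape for U+00FC
       01  U8-Label      PIC U(10) USAGE UTF-8 VALUE U"Z\u00FCrich".
       01  Nat-City      PIC N(10) USAGE NATIONAL.
       PROCEDURE DIVISION.
      * Implicit conversion: alphanumeric sender, UTF-8 receiver
           MOVE Alpha-City TO U8-City
      * Explicit conversion: UTF-8 to national (UTF-16)
           MOVE FUNCTION NATIONAL-OF(U8-Label) TO Nat-City
           GOBACK.
```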
Related tasks
Using national data (Unicode) in COBOL
Converting to or from national (Unicode) representation
Processing UTF-8 data using UTF-16 (national) data types
Processing Chinese GB 18030 data
Comparing national (UTF-16) data
Coding for use of DBCS support
Converting double-byte character set (DBCS) data
Related references
COBOL statements and national data
Intrinsic functions and national data
CODEPAGE
NSYMBOL
Classes and categories of data (Enterprise COBOL for z/OS Language Reference)
Data categories and PICTURE rules (Enterprise COBOL for z/OS Language Reference)
MOVE statement (Enterprise COBOL for z/OS Language Reference)
General relation conditions (Enterprise COBOL for z/OS Language Reference)