Processing data in an international environment

Enterprise COBOL supports the UTF-16 and UTF-8 Unicode encodings for character data at run time. UTF-16 is a Unicode encoding based on 16-bit code units that provides a consistent and efficient way to encode plain text. Using UTF-16, you can develop software that will work with various national languages.

Use these COBOL facilities to code and compile programs that process national data:

  • Data types and literals:
    • Character data types, defined with the USAGE NATIONAL clause and a PICTURE clause that defines data of category national, national-edited, or numeric-edited
    • Numeric data types, defined with the USAGE NATIONAL clause and a PICTURE clause that defines a numeric data item (a national decimal item) or an external floating-point data item (a national floating-point item)
    • National literals, specified with literal prefix N or NX
    • Figurative constant ALL national-literal
    • Figurative constants QUOTE, SPACE, HIGH-VALUE, LOW-VALUE, or ZERO, which have national character (UTF-16) values when used in national-character contexts
  • The COBOL statements shown in the related reference below about COBOL statements and national data
  • Intrinsic functions:
    • NATIONAL-OF to convert an alphanumeric or double-byte character set (DBCS) character string to USAGE NATIONAL (UTF-16)
    • DISPLAY-OF to convert a national character string to USAGE DISPLAY in a selected code page (EBCDIC, ASCII, EUC, or UTF-8)
    • The other intrinsic functions shown in the related reference below about intrinsic functions and national data
  • The GROUP-USAGE NATIONAL clause to define groups that contain only USAGE NATIONAL data items and that behave like elementary category national items in most operations
  • Compiler options:
    • CODEPAGE to specify the code page to use for alphanumeric and DBCS data in your program
    • NSYMBOL to control whether national or DBCS processing is used for the N symbol in literals and PICTURE clauses
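
The facilities listed above can be combined roughly as in the following sketch. The program name, data names, literal values, and the choice of CCSID 1140 (an EBCDIC code page) and CCSID 1208 (UTF-8) are illustrative assumptions, not values taken from this topic.

```cobol
       CBL NSYMBOL(NATIONAL),CODEPAGE(1140)
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NATDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Alphanumeric source data, encoded per the CODEPAGE option
       01  ALPHA-NAME      PIC X(10) VALUE 'Smith'.
      * Category national item: 10 UTF-16 character positions
       01  NAT-NAME        PIC N(10) USAGE NATIONAL.
      * National decimal item (digits stored as UTF-16)
       01  NAT-QTY         PIC 9(5)  USAGE NATIONAL VALUE 42.
      * National group: subordinate items are treated as national
       01  NAT-ADDRESS     GROUP-USAGE NATIONAL.
           05  NAT-STREET  PIC N(20).
           05  NAT-CITY    PIC N(15).
      * Receives the UTF-8 bytes produced by DISPLAY-OF
       01  UTF8-BYTES      PIC X(40).
       PROCEDURE DIVISION.
      * National literal (N prefix) and a figurative constant
           MOVE N'Toronto' TO NAT-CITY
           MOVE SPACE      TO NAT-STREET
      * Explicit conversion: EBCDIC (CCSID 1140) to UTF-16
           MOVE FUNCTION NATIONAL-OF(ALPHA-NAME, 1140)
             TO NAT-NAME
      * Explicit conversion: UTF-16 to UTF-8 (CCSID 1208)
           MOVE FUNCTION DISPLAY-OF(NAT-NAME, 1208)
             TO UTF8-BYTES
           GOBACK.
```

With NSYMBOL(NATIONAL) in effect, the N literal prefix and the PICTURE symbol N are processed as national rather than DBCS, which is why the sketch sets that option explicitly.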

You can also take advantage of implicit conversions of alphanumeric or DBCS data items to national representation. The compiler performs such conversions (in most cases) when you move these items to national data items, or compare these items with national data items.
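
As a minimal sketch of these implicit conversions, the following program moves an alphanumeric item to a national item and then compares the two. The program name, data names, and values are hypothetical, and the conversion is assumed to use the code page in effect through the CODEPAGE compiler option.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IMPLNAT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ALPHA-CITY      PIC X(8) VALUE 'Toronto'.
       01  NAT-CITY        PIC N(8) USAGE NATIONAL.
       PROCEDURE DIVISION.
      * The alphanumeric sender is converted to UTF-16 by the MOVE
           MOVE ALPHA-CITY TO NAT-CITY
      * The alphanumeric operand is converted before the compare
           IF ALPHA-CITY = NAT-CITY
              DISPLAY 'Values match after conversion'
           END-IF
           GOBACK.
```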

UTF-8 is a variable-width Unicode encoding that is popular for data transmission and World Wide Web-related data formats such as HTML and JSON. Using UTF-8, you can develop software that will work with various national languages.

Use these COBOL facilities to code and compile programs that process UTF-8 data:

  • Data types and literals:
    • Character data types, defined with the USAGE UTF-8 clause and a PICTURE clause that defines data of category UTF-8
    • UTF-8 literals, specified with literal prefix U or UX
      • U literals can contain a Unicode escape sequence, \uhhhh or \U00hhhhhh, to specify an individual Unicode code point, where hhhh and hhhhhh are the code point value written as a sequence of hexadecimal digits.
    • Figurative constant ALL utf-8-literal
    • Figurative constants QUOTE, SPACE, HIGH-VALUE, LOW-VALUE, or ZERO, which have UTF-8 character values when used in UTF-8 character contexts
  • The COBOL statements shown in the related reference about COBOL statements and UTF-8 data
  • Intrinsic functions:
    • NATIONAL-OF to convert a UTF-8 character string to USAGE NATIONAL (UTF-16)
    • DISPLAY-OF to convert a UTF-8 character string to USAGE DISPLAY in a selected code page (EBCDIC, ASCII, EUC) and to convert national data to USAGE UTF-8 (UTF-8)
    • The other intrinsic functions shown in the related reference about intrinsic functions and UTF-8 data
  • The GROUP-USAGE UTF-8 clause to define groups that contain only USAGE UTF-8 data items and that behave like elementary category UTF-8 items in most operations
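
The UTF-8 facilities listed above might be used as in the following sketch. It assumes a compiler level that supports USAGE UTF-8 and the PICTURE symbol U. The program name, data names, literal values, and CCSID 1140 are hypothetical, and the argument forms shown for NATIONAL-OF and DISPLAY-OF with UTF-8 operands are assumptions to verify against the Language Reference.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Category UTF-8 item: 10 UTF-8 character positions
       01  UTF8-NAME       PIC U(10) USAGE UTF-8.
      * UTF-8 group containing only USAGE UTF-8 items
       01  UTF8-PERSON     GROUP-USAGE UTF-8.
           05  UTF8-FIRST  PIC U(10) USAGE UTF-8.
           05  UTF8-LAST   PIC U(10) USAGE UTF-8.
       01  NAT-NAME        PIC N(10) USAGE NATIONAL.
       01  EBCDIC-NAME     PIC X(10).
       PROCEDURE DIVISION.
      * U literal with a \uhhhh escape for code point U+00E9
           MOVE U'caf\u00E9' TO UTF8-NAME
      * UX literal: the same code point given as UTF-8 bytes
           MOVE UX'C3A9'     TO UTF8-FIRST
      * Figurative constant takes a UTF-8 value in this context
           MOVE SPACE        TO UTF8-LAST
      * Explicit conversion: UTF-8 to UTF-16
           MOVE FUNCTION NATIONAL-OF(UTF8-NAME)
             TO NAT-NAME
      * Explicit conversion: UTF-8 to EBCDIC (CCSID 1140)
           MOVE FUNCTION DISPLAY-OF(UTF8-NAME, 1140)
             TO EBCDIC-NAME
           GOBACK.
```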

You can also take advantage of implicit conversions of alphanumeric or national data items to UTF-8 representation. The compiler performs such conversions (in most cases) when you move these items to UTF-8 data items, or compare these items with UTF-8 data items.
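
A minimal sketch of these implicit conversions follows, again assuming a compiler level that supports USAGE UTF-8. The program name, data names, and values are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IMPLUTF8.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ALPHA-MSG       PIC X(10) VALUE 'Status OK'.
       01  NAT-MSG         PIC N(10) USAGE NATIONAL
                           VALUE N'Status OK'.
       01  UTF8-MSG        PIC U(10) USAGE UTF-8.
       PROCEDURE DIVISION.
      * Alphanumeric sender is converted to UTF-8 by the MOVE
           MOVE ALPHA-MSG TO UTF8-MSG
      * National sender is converted to UTF-8 by the MOVE
           MOVE NAT-MSG   TO UTF8-MSG
      * Operands of different encodings are converted to compare
           IF NAT-MSG = UTF8-MSG
              DISPLAY 'Values match after conversion'
           END-IF
           GOBACK.
```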

Related references  
COBOL statements and national data  
Intrinsic functions and national data  
CODEPAGE  
NSYMBOL  
Classes and categories of data (Enterprise COBOL for z/OS Language Reference)  
Data categories and PICTURE rules (Enterprise COBOL for z/OS Language Reference)  
MOVE statement (Enterprise COBOL for z/OS Language Reference)  
General relation conditions (Enterprise COBOL for z/OS Language Reference)