Processing Unicode data in COBOL applications

COBOL supports UTF-16 data, but it has no native support for UTF-8 data.

About this task

Unlike COBOL, DB2® for z/OS® supports both UTF-8 and UTF-16 data.

Procedure

To process Unicode data in COBOL applications for DB2 for z/OS, perform the following recommended actions:

  • Use one of the national data types for Unicode data. For example, use the COBOL PIC N(n) USAGE NATIONAL data type for Unicode character data. National data items are encoded in UTF-16, which is how COBOL supports Unicode data.

    Although COBOL does not have a native UTF-8 data type, you can still use a COBOL application to retrieve UTF-8 data from DB2. DB2 converts the output to the format that the application requires. For example, if you query the DB2 catalog, DB2 converts the data for the COBOL application from UTF-8 to either UTF-16 (for PIC N USAGE NATIONAL variables) or EBCDIC (for PIC X variables).

    However, do not store unconverted UTF-8 data in a COBOL variable. For example, if you have UTF-8 data in a PIC X variable, COBOL treats the data as EBCDIC, and the data might be corrupted. Even moving this UTF-8 value from one variable to another can corrupt the data, because COBOL pads the receiving variable with X'40' (the EBCDIC blank) instead of X'20' (the UTF-8 blank).

  • Store your data in DB2 in UTF-16. This format often requires more space than UTF-8. However, you save CPU time in processing because DB2 and COBOL both use UTF-16 data, so no conversion is needed.
  • Use the DB2 coprocessor to prepare your application.
  • Specify the appropriate CCSID for your COBOL application source and data according to the instructions in Specifying a CCSID for your application.
    Recommendation: Use the ENCODING bind option to specify the CCSID of the data. This option typically yields the best performance. However, depending on the situation, one of the other options that are described in Specifying a CCSID for your application might be more appropriate.
  • Do not specify ENCODING UNICODE as a bind option if your program uses PIC X variables and specifies the COBOL compiler option NOSQLCCSID. If you do specify ENCODING UNICODE in this situation, DB2 interprets these character variables as UTF-8, but COBOL does not support UTF-8 and treats them as EBCDIC. As a result, DB2 might misinterpret the data.
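
The padding and storage points in the preceding list can be illustrated at the byte level. The following sketch is written in Python rather than COBOL, purely to show what happens to the bytes; it is not how a COBOL program is coded. The helper name pad_field, the sample values, and the 6-byte field width are assumptions made for the illustration. Only the blank values X'40' and X'20' come from the text above.

```python
# Illustration only: Python stands in for the byte-level effect of padding.
# pad_field, the sample strings, and the field width are hypothetical.

EBCDIC_BLANK = b"\x40"  # blank used to pad fields that hold EBCDIC data
UTF8_BLANK = b"\x20"    # blank that UTF-8 data expects


def pad_field(data: bytes, width: int, pad: bytes) -> bytes:
    """Pad data on the right to a fixed width, like a fixed-length field."""
    return data.ljust(width, pad)


# "Zoe" with a diaeresis on the e; the UTF-8 form is 4 bytes: 5A 6F C3 AB.
utf8_value = "Zo\u00eb".encode("utf-8")

# Moving the value into a 6-byte field that is padded with the EBCDIC blank.
padded_as_ebcdic = pad_field(utf8_value, 6, EBCDIC_BLANK)

# The same move with the blank that UTF-8 expects.
padded_as_utf8 = pad_field(utf8_value, 6, UTF8_BLANK)

# When read back as UTF-8, X'40' is the character "@", not a blank.
print(repr(padded_as_ebcdic.decode("utf-8")))
print(repr(padded_as_utf8.decode("utf-8")))

# Storage comparison for a value that contains only ASCII characters.
ascii_text = "DB2"
utf8_size = len(ascii_text.encode("utf-8"))       # one byte per character
utf16_size = len(ascii_text.encode("utf-16-be"))  # two bytes per character
print(utf8_size, utf16_size)
```

In this sketch, the field that is padded with X'40' gains two trailing "@" characters when it is interpreted as UTF-8, which is the kind of corruption that the first bullet warns about. The size comparison shows why UTF-16 often needs more space than UTF-8 for text that is mostly ASCII.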