Processing Unicode data in COBOL applications
COBOL and Db2 for z/OS® support UTF-16 data and UTF-8 data.
Procedure
To process Unicode data in COBOL applications for Db2 for z/OS, perform the following recommended actions:
- For Unicode UTF-16 data, use one of the national data types. For example, specify PIC N(10) USAGE NATIONAL. For Unicode UTF-8 data, use the BYTE-LENGTH phrase of the PICTURE clause and the UTF-8 phrase of the USAGE clause. For example, specify PIC U BYTE-LENGTH 10 USAGE UTF-8.
If you use a COBOL application to retrieve UTF-8 data from Db2 into a variable other than a UTF-8 variable, Db2 converts the output to the format that the application requires. For example, if you query the Db2 catalog, Db2 converts the data for the COBOL application from UTF-8 to either UTF-16 (for PIC N USAGE NATIONAL variables) or EBCDIC (for PIC X variables). However, do not store unconverted UTF-8 data in a COBOL variable that is not declared as UTF-8. For example, if you have UTF-8 data in a PIC X variable, COBOL treats the data as EBCDIC, and the data can be corrupted. Even moving this UTF-8 value from one variable to another can corrupt the data, because COBOL pads the receiving variable with the EBCDIC space X'40' instead of the UTF-8 space X'20'.
- Store your data in Db2 using the same encoding scheme as the input data. This saves CPU time because Db2 and COBOL then both use UTF-16 data or both use UTF-8 data, so no conversion is needed.
- Use the Db2 coprocessor to prepare your application.
- Specify the appropriate CCSID for your COBOL application source and data according to the instructions in Specifying a CCSID for your application. Recommendation: Use the ENCODING bind option to specify the CCSID of the data. This option typically yields the best performance. However, depending on the situation, you might consider the other options that are described in Specifying a CCSID for your application.
- Specify ENCODING UNICODE as a bind option in these situations:
  - Your program uses Db2 variables that are declared as Unicode UTF-16 variables, and specifies the COBOL compiler option SQLCCSID.
  - Your program uses Db2 variables that are declared as Unicode UTF-8 variables.
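
The declarations that are recommended above might look like the following minimal sketch. It is an illustration under stated assumptions, not a definitive implementation: the program name UNIDEMO, the table MYSCHEMA.CUSTOMER, and its columns ID, NAME_U16, and NAME_U8 are hypothetical. The sketch also assumes a COBOL compiler level that supports the UTF-8 data type and that the program is prepared with the Db2 coprocessor.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNIDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
           EXEC SQL INCLUDE SQLCA END-EXEC.
      * UTF-16 host variable: 10 national characters (20 bytes).
       01  NAME-U16      PIC N(10) USAGE NATIONAL.
      * UTF-8 host variable: fixed length of 10 bytes.
       01  NAME-U8       PIC U BYTE-LENGTH 10 USAGE UTF-8.
       PROCEDURE DIVISION.
      * MYSCHEMA.CUSTOMER and its columns are hypothetical.
      * Each value is fetched into a variable of the matching
      * Unicode type, so no UTF-8 data lands in a PIC X field.
           EXEC SQL
               SELECT NAME_U16, NAME_U8
                 INTO :NAME-U16, :NAME-U8
                 FROM MYSCHEMA.CUSTOMER
                WHERE ID = 1
           END-EXEC
           GOBACK.
```

Because both host variables are declared with Unicode types, a package for a program like this one would be bound with ENCODING UNICODE, as described in the preceding list.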