Application programming with Unicode data and multiple CCSIDs

If your application handles Unicode data or data that is in different encoding schemes, you should be aware of several programming techniques and recommendations in DB2®.

DB2 always returns data to your application in the CCSID that your application uses for data. This CCSID is called the application encoding scheme.
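
As a rough illustration of what an application encoding scheme means in practice, the following Python sketch re-encodes the same string from UTF-8 into an EBCDIC code page. It does not call DB2. Python's `cp500` codec (an EBCDIC Latin-1 code page) stands in for an EBCDIC CCSID, and the helper name `to_application_encoding` is hypothetical.

```python
def to_application_encoding(stored: bytes, stored_encoding: str, app_encoding: str) -> bytes:
    """Decode bytes from the stored encoding and re-encode them for the application.

    Hypothetical helper: it only mimics the kind of conversion that happens
    when stored data is returned in the application's encoding.
    """
    return stored.decode(stored_encoding).encode(app_encoding)


# Assumption: the data is stored as UTF-8 and the application expects EBCDIC.
stored_utf8 = "DB2".encode("utf-8")
as_ebcdic = to_application_encoding(stored_utf8, "utf-8", "cp500")

print(stored_utf8.hex())  # 444232
print(as_ebcdic.hex())    # c4c2f2
```

The characters are the same in both cases; only the byte values differ, which is why every module that touches the data has to agree on the encoding.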

Recommendations: Use the following general recommendations to guide you in writing and preparing your application programs:
  • If possible, use either Unicode or EBCDIC data, but not both. If you do choose to use multiple encoding schemes, consider the following possible implications for data loss and performance:
    • Managing multiple CCSIDs in your application can be difficult. To ensure that data is not lost, you have to control where the data goes, a path that potentially includes many modules.
    • Many environments, such as CICS® Transaction Gateway and WebSphere® MQ, are message-based. In these cases, the entire message must be in a single encoding scheme, so flowing some data through the application in EBCDIC and some in Unicode makes little sense. You still have to convert all of it to a single encoding, such as Unicode, right before putting the message on the wire.
    • All columns of a DB2 table must be in the same encoding scheme. You cannot make some columns Unicode and some EBCDIC. If your application processes some columns in Unicode and others in EBCDIC, character conversion occurs, which likely adds performance overhead.
  • If you are using Unicode data in COBOL or PL/I applications, use the coprocessor.
  • If your COBOL, PL/I, C/C++, or Assembler application handles Unicode data, do not place literals in the source code of the application. Because these language compilers do not support Unicode source code, they could misinterpret these literal values. Instead, place these literal values in a file or DB2 table that the program can read at startup to load the values. (Files and host variables are not precompiled and compiled as application source code.)
  • If an expanding or contracting conversion occurs on your data, the length of the data might change. Be aware of these length changes when you use the LENGTH function, CHARACTER_LENGTH function, SUBSTRING function, and SUBSTR function on the converted string. For CHARACTER_LENGTH and SUBSTRING, use the CODEUNITS16 or CODEUNITS32 option to specify the units in which DB2 calculates the length.
  • If you need to represent characters from multiple Latin-based character sets, such as Latin-1 and Latin-4, consider using Unicode for your application encoding scheme. An SBCS CCSID does not have enough code points to represent all of the characters that the combination of the two character sets requires. For example, assume that your application uses an EBCDIC CCSID, such as 277 or 1069. You might have some data that is represented in the database in Unicode but that cannot be retrieved by the application without substitution. If your application needs to handle only one language at a time, you can set up your infrastructure in one of the following ways:
    • Have one version of your application that uses CCSID 277 and another version that uses CCSID 1069. Also have two corresponding subsystems, one that uses CCSID 277 and another that uses CCSID 1069. (You cannot have multiple EBCDIC CCSIDs in one DB2 subsystem.)
    • Store the data in Unicode and have one version of your application that uses CCSID 277 and another version that uses CCSID 1069. Then bind these applications with different values for the ENCODING bind option.
    • Store the data in Unicode and have one version of your application that uses an EBCDIC CCSID and another version that uses Unicode.
    However, if you require that a single version of the application handle both Latin-1 and Latin-4 character sets, your application needs to process data in Unicode.
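
Two of the recommendations above, length changes under conversion and the code point limits of an SBCS CCSID, can be sketched in Python. This is only an approximation that does not call DB2: Python has no codecs for CCSID 277 or 1069, so `cp500` (EBCDIC Latin-1) and `iso8859_4` (Latin-4, not EBCDIC) are stand-ins, and the helper name `code_units` is hypothetical.

```python
def code_units(text: str) -> dict:
    """Return the length of text in UTF-8 bytes and in UTF-16 / UTF-32 code units."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "codeunits16": len(text.encode("utf-16-be")) // 2,
        "codeunits32": len(text.encode("utf-32-be")) // 4,
    }


# Expanding conversion: one byte in a single-byte code page, two bytes in UTF-8.
ae = "\u00e6"                                 # LATIN SMALL LETTER AE
print(len(ae.encode("cp500")))                # 1
print(len(ae.encode("utf-8")))                # 2

# The same string has a different length depending on the unit you count in.
sample = "\u00e6\u20ac\U0001d11e"             # ae, euro sign, musical G clef
print(code_units(sample))
# {'utf8_bytes': 9, 'codeunits16': 4, 'codeunits32': 3}

# A single-byte code page cannot hold both Latin-1 and Latin-4 characters.
latin1_char = "\u00f1"                        # n with tilde (Latin-1)
latin4_char = "\u0101"                        # a with macron (Latin-4)
latin1_char.encode("cp500")                   # succeeds
latin4_char.encode("iso8859_4")               # succeeds
substituted = latin4_char.encode("cp500", errors="replace").decode("cp500")
print(substituted)                            # ?
```

With `errors="replace"`, the unsupported character is silently replaced by a substitution character, which is the kind of data loss that a Unicode application encoding scheme avoids.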