Deciding whether to store data as UTF-8 or UTF-16
If you create a Unicode database in DB2® for z/OS®, you need to decide whether to use UTF-8 or UTF-16. DB2 for z/OS does not support storing data as UTF-32. UTF-8 and UTF-16 can both represent any Unicode character that you need to represent, but each format has advantages and disadvantages depending on your situation.
Procedure
To decide whether to store data as UTF-8 or UTF-16:
- Performance recommendation: Store your data in DB2 in the same format as your application.
This setup ensures optimal performance, because character conversion is avoided.
This recommendation is especially important when the application is written in a language that runs on z/OS (for example, COBOL on z/OS), because character conversion on z/OS can consume a significant amount of CPU time.
Examples:
- COBOL and PL/I on z/OS use UTF-16 for Unicode data. Neither language supports UTF-8. So if you are using COBOL or PL/I applications on z/OS that process Unicode data, the optimal situation is to store your data in DB2 in UTF-16. In this case, even though UTF-16 data can potentially take more storage than UTF-8 data, no conversion occurs. Thus you avoid a significant performance impact.
- For Java applications that use the type 4 z/OS driver, which sends the data in UTF-8, store your data in DB2 as UTF-8 data.
- Storage recommendation: After you consider performance, consider your storage requirements. Store the data in the format that requires the least space for your data.
UTF-16 does not always require more storage than UTF-8. The amount of storage that is required depends on your data. For example, basic Latin (ASCII) characters always take 1 byte in UTF-8 and 2 bytes in UTF-16, and accented Latin-1 characters take 2 bytes in either format. However, Japanese characters take 3 to 4 bytes in UTF-8 and 2 to 4 bytes in UTF-16.
Example: DB2 for z/OS uses UTF-8 for the catalog. Because the catalog contains mostly Latin-1 characters, this format uses considerably less space than UTF-16.
- Recommendation for MQ, CICS® Transaction Gateway, and IMS™ Connect messages: When messages are passed from one technology to another, everything in the message is usually converted to characters. Consider the size of these messages when you decide when and where to use each encoding form. For example, suppose that you have COBOL applications, which use UTF-16, but you are concerned about the size of the messages. You might decide to convert the messages to UTF-8 before you put them on the wire. For messages that consist mostly of single-byte characters, this conversion reduces the message size.
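The storage and message-size trade-offs described above can be checked against a sample of your own data before you choose a format. The following is a minimal sketch in Python, not DB2 code. The function names `encoded_sizes` and `utf16_to_utf8` and the sample strings are hypothetical and exist only for illustration. The sketch assumes big-endian UTF-16 without a byte order mark, and it measures encoded bytes only, not any DB2 column or page overhead.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16.

    UTF-16 is measured as big-endian without a byte order mark
    (an assumption made for this illustration).
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }


def utf16_to_utf8(message):
    """Transcode a big-endian UTF-16 message (bytes, no BOM) to UTF-8 bytes."""
    return message.decode("utf-16-be").encode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample values; replace them with representative data.
    samples = {
        "ASCII": "Hello",
        "Accented Latin-1": "\u00e9",          # e with acute accent
        "Japanese": "\u65e5\u672c\u8a9e",      # three kanji characters
        "Supplementary": "\U00020BB7",         # a character outside the BMP
    }
    for label, text in samples.items():
        sizes = encoded_sizes(text)
        print(label, "UTF-8:", sizes["utf-8"], "UTF-16:", sizes["utf-16"])

    # A mostly-ASCII message shrinks when it is converted before sending.
    utf16_message = "ORDER 42".encode("utf-16-be")
    utf8_message = utf16_to_utf8(utf16_message)
    print("message bytes:", len(utf16_message), "->", len(utf8_message))
```

Running a comparison like this over representative rows or messages gives a rough indication of which format is smaller for your data. It does not account for the CPU cost of conversion, which the performance recommendation treats as the first consideration.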