Tips for handling any extra storage that Unicode data might require

Unicode data often requires more storage than EBCDIC or ASCII data, but not always. The amount of extra storage that is required depends on the type of data and whether it is stored in UTF-8 or UTF-16 format.

Unicode data almost never requires twice as much storage as EBCDIC or ASCII data; doubling is the extreme worst-case scenario. To estimate how much space your Unicode data requires, consider the following two factors:

The type of data that you plan to store in Db2
How many character fields do you have? Any increased storage requirement affects mostly character fields. So if you convert an existing Db2 database to Unicode, look at the character fields that are defined in your existing database to get an idea of how much the database expands when you convert it to Unicode.

Is the data Latin-1, Japanese, Chinese, or something else? For example, the first 128 Unicode code points (the ASCII range, which Latin-1 shares) each take up only 1 byte in UTF-8. Those code points include the characters A-Z, a-z, and 0-9. Thus, these characters do not take up any more space in UTF-8 than they do in EBCDIC or ASCII. Also, consider that Chinese characters can take up less space in Unicode than in EBCDIC.

The UTF format
Are you using UTF-8 or UTF-16? UTF-8 characters can take 1, 2, 3, or 4 bytes. UTF-16 characters can take 2 or 4 bytes. Even though UTF-16 often takes more storage, UTF-16 is sometimes a wiser choice for performance reasons. Also, in some cases, UTF-16 takes up less space. For example, Japanese characters are 3 or 4 bytes in UTF-8, but 2 or 4 bytes in UTF-16.
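
The byte counts described above can be checked outside of Db2. The following Python sketch is only an illustration, not a Db2 sizing tool: the helper name encoded_sizes and the sample strings are hypothetical, and cp037 (EBCDIC US/Canada) is assumed as a stand-in for whatever EBCDIC CCSID your data actually uses.

```python
def encoded_sizes(text: str) -> dict:
    """Return how many bytes `text` occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-be" is used so that no 2-byte byte order mark is counted.
        "utf-16": len(text.encode("utf-16-be")),
    }


# Hypothetical sample values, chosen only to illustrate each case.
samples = {
    "ASCII-range Latin": "Db2",
    "accented Latin-1": "é",
    "Japanese": "日本語",
    "supplementary CJK": "\U00020BB7",
}

for label, text in samples.items():
    sizes = encoded_sizes(text)
    print(
        f"{label}: {len(text)} character(s), "
        f"UTF-8 {sizes['utf-8']} bytes, UTF-16 {sizes['utf-16']} bytes"
    )

# Single-byte EBCDIC for comparison (cp037 is an assumed code page).
ebcdic_bytes = len("Db2".encode("cp037"))
print(f"EBCDIC (cp037) 'Db2': {ebcdic_bytes} bytes")
```

Running a check like this against a representative sample of your own character data gives a rough idea of which format is smaller for that data. It does not account for column definitions, padding, or compression.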

If possible, use the following general recommendations to minimize the storage impact of Unicode data:

  • Use data compression.
  • Use non-padded indexes. If you are converting data that has padded indexes to Unicode, change those indexes to be non-padded. Non-padded indexes can save index storage space.
  • If a column length is more than 18 bytes, use variable-length data types.
  • Use 8-KB pages instead of the default 4-KB pages. The buffer pool in which you define the table space determines the page size, so define the table space in a buffer pool that uses 8-KB pages.