Considerations for using UTF-8 support

Consider the following points when deciding whether to use UTF-8 support for a Content Manager OnDemand instance.

About this task

UTF-8 support, also known as Unicode support, is enabled by using the Database files CCSID (DBFCCSID) parameter of the Create Instance (CRTINSTOND) command. If the DBFCCSID parameter value is specified as *UTF8 when you run the CRTINSTOND command, a parameter entry of ARS_IBMI_UTF8_TABLES=1 is added to the ARS.CFG configuration file for the instance. This parameter setting causes all character fields in all database files in the instance to be created with a CCSID setting of 1208.

Do not change the ARS_IBMI_UTF8_TABLES parameter value after the instance is created, and do not add the parameter manually to the ARS.CFG file of any new or existing instance. Note that the instance is still created with specific Language ID (LANGID) and Locale (LOCALE) parameter settings, and the instance server job runs in the specified language and locale.
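
To confirm whether an existing instance was created with UTF-8 support, you can inspect its ARS.CFG file for the entry described above. The following Python sketch is illustrative only: the function name `utf8_tables_enabled` is hypothetical, and the sketch assumes that ARS.CFG uses one KEY=VALUE entry per line and that lines starting with `#` are comments. It only reads the text; it never modifies the file.

```python
def utf8_tables_enabled(cfg_text: str) -> bool:
    """Return True if the text contains the entry ARS_IBMI_UTF8_TABLES=1.

    Assumption: one KEY=VALUE entry per line, '#' starts a comment line.
    """
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ARS_IBMI_UTF8_TABLES":
            return value.strip() == "1"
    return False  # entry absent: instance was not created with *UTF8


# Hypothetical file content; SOME_OTHER_PARAMETER is a placeholder, not a real entry.
sample_cfg = "SOME_OTHER_PARAMETER=value\nARS_IBMI_UTF8_TABLES=1\n"
print(utf8_tables_enabled(sample_cfg))
```

Use a check like this for reporting only. The entry itself must be written solely by the CRTINSTOND command.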

Note: At V7.6 and later, the default for the DBFCCSID parameter is *UTF8.

Consider the following before creating a new instance, to determine whether the instance should be created with UTF-8 support enabled.
  • UTF-8 support is valid only for new instances. Conversion of existing instances to UTF-8 is not supported.
  • Jobs loading data into a UTF-8 instance must NOT have a CCSID value of 65535. If the job that is loading data has a CCSID value of 65535, the load will fail with message OND2049 - Application group not found. To resolve the problem, you can change the job to have a different CCSID value, change the CCSID setting of the user profile, or change the Coded character set identifier (QCCSID) system value.
  • When storing indexes in a UTF-8 instance, it is important to note that some characters will use more than one byte when stored in a UTF-8 field. Latin lowercase and uppercase characters [a-z] [A-Z] and Arabic numerals [0-9] use only one byte. Accented characters might use two bytes. DBCS characters might use two, three, or four bytes.
  • When using the graphical indexer of the OnDemand Administrator client with a UTF-8 instance, the Indexer Properties dialog will be presented before your sample data is displayed. You must set the Code Page to the value that matches the data being indexed.
  • When indexes are stored, they are converted from the Code Page specified on the Indexer Properties dialog to UTF-8 (CCSID 1208). String conversion between code pages might result in an increase in the length of the string when data is loaded on the server. For example, the OnDemand Administrator client might require two bytes to display a double-byte character, yet the server might require three bytes to store the character in the database.
  • When selecting a string, the graphical indexer increases the string length to a size that is sufficient to hold the data that you have selected. If you expect that other possible values for the field might require more space than the graphical indexer calculated, you can override the length by typing a different number in the space provided.
  • When storing data for languages such as Greek, Russian, and Arabic, create application group string fields that are double the length you would use if the instance did not support UTF-8. For other languages, if your index values contain accented characters, make the fields longer to allow for the characters that need two bytes.
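
The byte counts in the list above can be checked before you size application group string fields. The following Python sketch is a rough aid, not part of Content Manager OnDemand: the helper names `utf8_length` and `suggested_field_length` are hypothetical, and Shift-JIS is used only as a stand-in for a double-byte client code page.

```python
def utf8_length(value: str) -> int:
    """Number of bytes the value occupies when encoded as UTF-8 (CCSID 1208)."""
    return len(value.encode("utf-8"))


def suggested_field_length(sample_values, headroom=0):
    """Largest UTF-8 byte length among the samples, plus optional headroom."""
    return max((utf8_length(v) for v in sample_values), default=0) + headroom


# One character per category mentioned in the list above.
samples = {
    "Latin letter A": "A",                 # 1 byte
    "Accented e": "\u00e9",                # 2 bytes
    "Greek alpha": "\u03b1",               # 2 bytes
    "Cyrillic ya": "\u044f",               # 2 bytes
    "Arabic beh": "\u0628",                # 2 bytes
    "CJK ideograph": "\u65e5",             # 3 bytes
    "CJK Extension B": "\U00020000",       # 4 bytes
}
for label, text in samples.items():
    print(f"{label}: {utf8_length(text)} byte(s)")

# Growth during conversion: two bytes per character in a double-byte
# code page (Shift-JIS as an example), three bytes per character in UTF-8.
word = "\u65e5\u672c"
print(len(word.encode("shift_jis")), "->", utf8_length(word))

# Sizing a field from representative index values.
print(suggested_field_length(["M\u00fcller", "Smith"], headroom=2))
```

Run a check like this against representative index values in your own data. The result is a lower bound for the field length, so add headroom for values that the sample does not cover.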