Planning for MBCS data

The use of multi-byte character set (MBCS) data with Oracle, DB2, or IBM® DB2® for z/OS® has specific database considerations, which are covered in the Cúram Third-Party Tools Installation Guide for Windows and Cúram Third-Party Tools Installation Guide for UNIX. Specific Cúram configuration is required when using MBCS data with DB2 or DB2 for z/OS so that the Data Manager functions compatibly. This configuration is enabled by default in the initial Cúram configuration.

Cúram support for MBCS data with DB2 and DB2 for z/OS is enabled in its initial configuration to ensure error-free operation for users with languages that require MBCS data and for users who find they require MBCS data when copying or pasting data from other applications. This support entails expanding the size of string columns in the database because DB2 column sizes are based on bytes, which is not necessarily the length that is required when MBCS data is used. This procedure is explained in more detail in the Cúram Third-Party Tools Installation Guide for Windows and Cúram Third-Party Tools Installation Guide for UNIX. However, these default expansion settings might not be appropriate in the following circumstances:
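The difference between a length in characters and a length in bytes can be illustrated with a minimal Python sketch. It assumes that the data is stored as UTF-8 (consistent with the 4-byte maximum per character); the sample strings and the utf8_length helper are hypothetical and are not part of Cúram.

```python
# Compare character counts with UTF-8 byte counts for a few sample strings.
# Assumption: the database stores string data as UTF-8.
samples = {
    "English": "Smith",
    "German": "M\u00fcller",  # "Müller"; the u-umlaut takes 2 bytes in UTF-8
    "Chinese": "王小明",       # each of these characters takes 3 bytes
    "Emoji": "😀",            # outside the Basic Multilingual Plane: 4 bytes
}


def utf8_length(text: str) -> int:
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


for label, text in samples.items():
    chars = len(text)
    size = utf8_length(text)
    print(f"{label}: {chars} characters, {size} bytes, ratio {size / chars:.2f}")
```

A column that is sized for the character count alone would be too small for every sample except the first.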

  • If your data requirements do not necessitate the maximum expansion (as explained later in this section), you can reduce the amount of expansion.
  • If you are using only single-byte data (a Western language, such as English) and no MBCS data is introduced by any other means (for example, by a browser copy or paste), you can disable multi-byte expansion support. However, disabling expansion is not recommended because MBCS data is likely to be introduced from external sources (for example, by a browser copy or paste) and to cause errors later.

Whether database expansion is applied by the Data Manager is controlled by the curam.db.multibyte.expansion property in Bootstrap.properties. The amount of expansion (a factor of 1.0 to 4.0) is set with the curam.db.multibyte.default.factor property in Bootstrap.properties. These properties are described in Cúram configuration parameters.

To avoid errors when processing MBCS data, the default expansion factor is set to the maximum. However, for many languages and data profiles it is unlikely that every character in every database column would be multi-byte or that all characters would require the maximum size of 4 bytes. The maximum expansion factor has a cost in terms of disk space, network usage, memory usage, buffer pool performance, CPU usage, and so on. Therefore, it is best to use an expansion factor that balances resource usage and performance while avoiding or minimizing the possibility of application errors caused by data overruns. There are no strict rules for achieving this balance, but considerations such as those that follow can help you choose a reasonable expansion factor, and your testing should confirm your choice.

Depending on your language, locale, and encoding, the number of required MBCS characters varies. For instance, if you are using English with only a few special characters (for example, smart quotation marks), you require little expansion. If you are using a language that shares the Latin alphabet but has some additional characters (for example, German), you need more space for MBCS data. A language that uses characters at the higher end of the Unicode range (for example, Chinese) requires more space per character, but this requirement needs to be tempered by the number of characters that are required per word; that is, the language might convey more information in each character than a typical Latin alphabetic character does. In other words, consider the average number of bytes required per character, per word, and so on. Typically this average is only a rough estimate because, as studies show, character usage can vary depending on a number of factors, for example, the data context: numeric data (phone numbers) versus textual data (names) or free-form comments. So, include an additional safety margin when you choose your expansion factor.
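One way to turn these considerations into a starting value is to measure representative sample data. The following Python sketch is only a heuristic under stated assumptions: it assumes UTF-8 storage, takes the worst bytes-per-character ratio found in the sample, adds a safety margin, and clamps the result to the 1.0 to 4.0 range. The estimate_expansion_factor function and its safety_margin parameter are hypothetical; they are not Cúram settings, and any value obtained this way still needs to be confirmed by testing.

```python
import math

MIN_FACTOR = 1.0  # lower bound of the expansion factor range
MAX_FACTOR = 4.0  # upper bound: at most 4 bytes per character


def estimate_expansion_factor(values, safety_margin=0.25):
    """Suggest an expansion factor from sample strings (heuristic only).

    safety_margin is a hypothetical allowance for data that the sample
    does not represent; 0.25 adds 25 percent to the worst observed ratio.
    """
    worst = 1.0
    for value in values:
        if not value:
            continue  # empty strings carry no information
        ratio = len(value.encode("utf-8")) / len(value)
        worst = max(worst, ratio)
    padded = worst * (1.0 + safety_margin)
    rounded = math.ceil(padded * 10) / 10  # round up to one decimal place
    return min(MAX_FACTOR, max(MIN_FACTOR, rounded))


# Hypothetical sample values for a name column.
names = ["Smith", "M\u00fcller", "O\u2019Brien"]
print(estimate_expansion_factor(names))
```

Using the worst ratio rather than the average is a deliberately conservative choice, because a data overrun is caused by an individual value and not by the average value.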

You can also control the expansion factor at a more fine-grained level in the modeling environment by specifying the Multibyte_Expansion_Factor option for a string domain, an entity string attribute, or both, which might be appropriate for your customizations. For more information about setting these options, see the Cúram Modeling Reference Guide. You might need to set expansion factors at this level because large expansion factors can cause various limits within DB2 and DB2 for z/OS, such as limits on the size of rows and indexes, to be exceeded.
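The effect of expansion on row size can be estimated with simple arithmetic before you change any settings. The following Python sketch is illustrative only. It assumes that an expanded column size is the character length multiplied by the factor and rounded up, which might not match how the Data Manager calculates sizes, and it ignores any per-column or per-row overhead. The column names, lengths, per-column overrides, and the ROW_LIMIT value are all hypothetical; take the real limits from the DB2 or DB2 for z/OS SQL reference.

```python
import math

# Placeholder only; not an actual DB2 limit.
ROW_LIMIT = 4000


def expanded_bytes(char_length, factor):
    """Assumed expansion rule: character length times factor, rounded up."""
    return math.ceil(char_length * factor)


def row_bytes(columns, default_factor):
    """Sum the expanded sizes of the string columns in one row.

    columns maps a column name to (char_length, override); an override of
    None means that the default factor applies to that column.
    """
    total = 0
    for char_length, override in columns.values():
        factor = default_factor if override is None else override
        total += expanded_bytes(char_length, factor)
    return total


# Hypothetical entity with one column pinned to a factor of 1.0.
columns = {
    "firstName": (65, None),
    "comments": (2000, None),
    "referenceCode": (20, 1.0),
}

for factor in (4.0, 2.0, 1.5):
    total = row_bytes(columns, factor)
    status = "within" if total <= ROW_LIMIT else "exceeds"
    print(f"default factor {factor}: {total} bytes ({status} the placeholder limit)")
```

In this hypothetical case, only the smallest default factor keeps the row within the placeholder limit, which is the kind of situation where a per-domain or per-attribute factor might be preferable to lowering the default for every column.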

For more information on these limits, refer to the relevant DB2 or DB2 for z/OS SQL reference.