Best practices for using a Unicode Collation Algorithm (UCA)-collated database
When updating a Db2 database that uses UCA collation, special care needs to be taken when working with character type data. Each updated version of the UCA-based International Components for Unicode (ICU), that is used by Db2, can include bug fixes, security fixes and updated CLDR data and algorithms that describe a language. For this reason, updating ICU versions can impact sorting and comparison rules on objects using character data.
Character data under UCA collation is sorted using rules that define the locale-based character sequence, with options for specifying case-sensitivity, accent marks, text normalization, variable Top attribute controls, and handling of punctuations. For information on UCA collation attributes, see Table 1 in Unicode Collation Algorithm-based collations.
Different versions of the ICU can use different versions of CLDR data and collation algorithm, which can impact sorting and comparison rules (collation) on objects using string types. As of Db2® 12.1, a particular version of Db2 only supports one version of ICU and does not maintain compatibility with previous versions.
As a best practice, avoid using string columns of char and varchar data type on database object that are affected by collation of the database. The behavior of such data types can change when the ICU support level of Db2 is refreshed. Avoiding these typed columns in object improves the compatibility, efficiency, long-term maintenance, and ease of operation when upgrading to a new mod pack or release.
- Indexes
- Indexes on character columns use the database collation as its default collation. Changes in sorting order can cause incorrect ordering and comparison that can require the rebuilding of indexes when moving to a version of Db2 that refreshes its ICU support level.
- Materialized Query Tables (MQTs)
- If MQTs are built based on collation sensitive string columns, a refresh of the Db2 ICU support level, an MQT might need to be rebuilt. This also applies when MQTs use character functions, or when they cast explicitly or implicitly.
- Data-partitioning keys
- Change in collation can affect data partitioning in DPF nodes. When moving to a Db2 release where the ICU level has changed, you might need to rebuild tables using the ADMIN_MOVE_TABLE procedure to ensure that the table is partitioned correctly and that it reflects the collation rule change.
- Range partitioning on character columns
- Change in collation can affect range partitioning on character columns. When moving to a Db2 release where the ICU level has changed, you might need to rebuild tables using the ADMIN_MOVE_TABLE procedure to ensure that the table is partitioned correctly reflecting the collation rule change.
- Multidimensional clustering (MDC) tables
- Use numeric types in MDC dimensions to avoid a rebuild of MDC when Db2 refreshes its ICU support level.