Choosing a collation for a Unicode database

The collation of a database determines how string values are compared and ordered. Db2® provides three different types of collations for a Unicode database: IDENTITY collation, language-aware collation, and locale-sensitive UCA-based collation.

The collation you choose can significantly affect query performance in the database. Collation also affects how substring matching is done, which changes the behavior of SQL predicates and functions such as LIKE, POSITION, and REPLACE, and of XQuery functions such as fn:substring-before and fn:starts-with.

If you want the best possible performance, choose IDENTITY collation. Note, however, that this collation is not culturally correct: strings are compared byte by byte, so the resulting order does not follow the conventions of any language.
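The effect of a byte-by-byte comparison can be sketched outside the database. The following Python snippet is an illustration only, not Db2 code: the helper name identity_key and the sample words are assumptions made for this example, and the key simply sorts strings by their UTF-8 bytes, which is how a binary comparison orders them.

```python
# Illustration only: mimic a byte-by-byte (binary) ordering in Python.
# identity_key is a hypothetical helper, not part of any Db2 API.

def identity_key(s: str) -> bytes:
    """Return the UTF-8 bytes of s, so sorting compares raw bytes."""
    return s.encode("utf-8")

words = ["apple", "Zebra", "éclair", "Banana"]
identity_sorted = sorted(words, key=identity_key)

# Uppercase ASCII letters sort before lowercase ones, and the accented
# letter sorts after all unaccented ASCII letters.
print(identity_sorted)  # ['Banana', 'Zebra', 'apple', 'éclair']
```

The ordering is fast to compute because no weight lookup is needed, but a reader who expects dictionary order would not expect "Zebra" to come before "apple".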

If you want the most accurate, culturally expected treatment of characters, based on the Unicode Collation Algorithm (UCA), choose a locale-sensitive UCA-based collation. Choosing a locale-sensitive UCA-based collation might cause a performance degradation.
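To give a feel for what "culturally expected" ordering means, the sketch below uses a crude stand-in for a comparison that ignores case and accents. It is not the Unicode Collation Algorithm and not Db2 behavior: the helper rough_primary_key is hypothetical, it only strips combining marks and folds case, and it knows nothing about locale-specific rules. It also hints at why such collations cost more, because every string must be transformed before it is compared.

```python
import unicodedata

# Rough stand-in only: NOT the Unicode Collation Algorithm.
# rough_primary_key is a hypothetical helper for this illustration.

def rough_primary_key(s: str) -> str:
    """Drop combining marks and fold case so that only base letters matter."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

words = ["apple", "Zebra", "éclair", "Banana"]
dictionary_like = sorted(words, key=rough_primary_key)

print(dictionary_like)  # ['apple', 'Banana', 'éclair', 'Zebra']
```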

If you want a collation identical to a non-Unicode database collation (for example, for a migration from a non-Unicode database to a Unicode database), choose a language-aware collation, which is based on code points in a weight table. However, a language-aware collation covers only the first 256 code points in the weight table; all other code points are sorted in binary order. In addition, these collations cannot handle different representations of the same character.
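One common case of "different representations of a character" is a precomposed accented letter versus a base letter followed by a combining mark. The Python sketch below is an illustration, not Db2 code, and it assumes that this is the kind of difference meant here. It shows that the two forms are unequal under a plain code point comparison and become equal only after Unicode normalization.

```python
import unicodedata

# Two representations of the same visible character "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# A plain comparison sees two different code point sequences.
binary_equal = precomposed == decomposed

# After normalizing both strings to NFC, they compare as equal.
normalized_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(binary_equal)      # False
print(normalized_equal)  # True
```

A collation that compares the stored code points without accounting for this equivalence would treat the two strings as distinct values.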