Unicode Collation Algorithm based collations
The CREATE DATABASE command and the COLLATION_KEY_BIT scalar function support the CLDR2701 collation keyword. This keyword implements the UCA (Unicode Collation Algorithm) specified in the Unicode Standard and based on the Common Locale Data Repository (CLDR).
The default Unicode Collation Algorithm is implemented by the CLDR2701 keyword without any attributes. Since the default UCA cannot simultaneously encompass the collating sequence of every language supported by Unicode, optional attributes can be specified to customize the UCA ordering. The attributes are separated by the underscore (_) character. The CLDR2701 keyword and any attributes form a UCA collation name.
The following table describes the collation attributes, their values, and typical usage examples.
Attribute name | Attribute short form | Valid values | Description |
---|---|---|---|
Locale:
|
Locale:
|
See Table 2 for a list of all the valid Locale names. |
The Locale attribute is probably the most important attribute to obtain ordering that conforms to the user expectations in different countries and regions. You need to explicitly specify the Locale attribute to properly collate text for a specific language. The Locale attribute consists of the following parts: language, region/country, script, and keyword. Not all the parts are mandatory. See Table 2 for a complete list of the valid combinations. The specification of a locale automatically presets all the other collation attributes to values that are suitable for that locale. Typically there is no need to specify additional collation attribute. Examples:
|
Strength | S | 1, 2, 3, 4, or I |
The Strength attribute determines whether accent or case is taken into account when collating or comparing text strings. In writing systems without case or accent, the Strength attribute controls similarly important features. The possible values are: primary (1), secondary (2), tertiary (3), quaternary (4), and identity
(I). To ignore:
Almost all characters can be distinguished by the first three strength levels, therefore in most locales the default Strength attribute is set at the tertiary level. However if the Alternate attribute (described in a following row) is set to shifted, then the quaternary strength level can be used to break ties among white space characters, punctuation marks, and symbols that would otherwise be ignored. The identity strength level is used to distinguish among similar characters, such as the MATHEMATICAL BOLD SMALL A character (U+1D41A) and the MATHEMATICAL ITALIC SMALL A character (U+1D44E). Setting the Strength attribute to higher level will slow down text string comparisons and increase the length of the sort keys. Examples:
|
Case Level | E |
|
Setting the Case Level attribute to on and the Strength attribute to primary level will ignore accent but not case. The Case Level attribute is set to X by default in most locales. When this attributes is set to O, it will slightly affect text string comparisons performance and lengthen the sort keys. Examples:
|
Case First | C | X, L, or U |
The Case First attribute controls whether upper case characters collate before or after lower case characters, in the absence of other differences in the two text strings. The possible values are upper case first (U), lower case first (L), and off (X). There is almost no difference between the lower case first setting and the off setting, therefore typically there is no need to use the lower case first setting. Specifying a Case First attribute of U or L can increase the length of the sort keys. Examples:
|
Alternate | A | N or S |
The Alternate attribute controls the handling of variable characters in the UCA: white space, punctuation marks, and symbols. If the Alternate attribute is set to non-ignorable (N), then differences among these variable characters are of the same importance as differences among non-variable characters such as the English alphabet. If the Alternate attribute is set to shifted (S), then these variable characters are of only minor importance. If the Alternate attribute is set to shifted and the Strength attribute is set to the quaternary level, then variable characters are considered in a comparison when all other aspects of the strings - base letters, accents, and case - are identical. The default for most locales is non-ignorable. If shifted is selected, performance will be slower if there are many strings that are identical except for punctuation marks. Sort key length will not be affected unless the strength level is also increased. Examples:
|
Variable Top | T | [4 or 8 UTF-16BE hexadecimal digits] |
The Variable Top attribute controls which characters to ignore, and is only meaningful if the Alternate attribute is set to Shifted. All characters whose primary weight is equal or lower than the specified character are considered ignorable. The character is specified as one or two UTF-16BE code units in hexadecimal notation. A Unicode supplementary character is specified using a surrogate pair. For example, if you want to ignore white space characters and not visible characters, then set the Alternate attribute to Shifted and this attribute to U+0020 (space) or U+3000 (ideographic space). Since all characters having the same primary weight are equivalent, so setting this attribute to U+0020 is equivalent to setting it to U+3000. This attribute alone has little impact on text string comparison performance, but setting it higher makes sort keys longer. Example:
|
Normalization Checking | N |
|
The Normalization Checking attribute, if set to O, will normalize the input text if necessary.
Even if this attribute is set to X, as is the default for many locales, text as represented in
common usage will collate correctly. You should, however, set this attribute to O in two cases:
There is a medium string comparison performance cost if this attribute is set to O, depending on
the frequency of sequences that require normalization. There is no significant effect on length of
the sort keys. If the text is already in normalized form NFD or NFKD, then you can set this
attribute to X to improve performance.
Examples:
|
French | F |
|
The French sorts strings by examining the accents starting from the end of the string. This attribute is automatically set to O for the French locales, and has a minor performance cost for text string comparisons, but no change in the length of the sort keys. Examples:
|
Hiragana | H |
|
The Hiragana attribute determines whether to distinguish between Japanese Hiragana and Katakana characters. To conform with the Japanese JIS X 4061 standard, you need to set this attribute to O and the Strength attribute to the quaternary level. This will, however, slow down text string comparisons and increase the length of the sort keys. Examples:
|
Valid locale names for the collations are shown in Table 2. The Default collation attributes column shows the full name of the collation for the specific locale. For example, CLDR2701_LAR is equivalent to CLDR2701_LAR_AN_CX_EX_FX_HX_NX_S3.
Locale Name | Language (Region) | Default Collation Attributes |
---|---|---|
CLDR2701_LAF | Afrikaans | CLDR2701_LAF_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LAR | Arabic | CLDR2701_LAR_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LAR_KCOMPAT | Arabic | CLDR2701_LAR_KCOMPAT_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LAS | Assamese | CLDR2701_LAS_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LAZ | Azerbaijani | CLDR2701_LAZ_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LAZ_KSEARCH | Azerbaijani | CLDR2701_LAZ_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LBE | Belarusian | CLDR2701_LBE_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LBG | Bulgarian | CLDR2701_LBG_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LBN | Bengali | CLDR2701_LBN_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LBN_KTRADITIONAL | Bengali | CLDR2701_LBN_KTRADITIONAL_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LBS | Bosnian | CLDR2701_LBS_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LBS_KSEARCH | Bosnian | CLDR2701_LBS_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LBS_ZCYRL | Bosnian | CLDR2701_LBS_ZCYRL_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LCA_KSEARCH | Catalan | CLDR2701_LCA_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LCS | Czech | CLDR2701_LCS_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LCS_KSEARCH | Czech | CLDR2701_LCS_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LCY | Welsh | CLDR2701_LCY_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LDA | Danish | CLDR2701_LDA_AN_CU_EX_FX_HX_NX_S3 |
CLDR2701_LDA_KSEARCH | Danish | CLDR2701_LDA_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LDE_KSEARCH | German | CLDR2701_LDE_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LDE_KPHONEBOOK | German | CLDR2701_LDE_KPHONEBOOK_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LDE_RAT_KPHONEBOOK | German (Austria) | CLDR2701_LDE_RAT_KPHONEBOOK_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LDSB | Lower Sorbian | CLDR2701_LDSB_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LEE | Ewe | CLDR2701_LEE_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LEL | Greek | CLDR2701_LEL_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LEN_RUS | English (United States) | CLDR2701_LEN_RUS_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LEO | Esperanto | CLDR2701_LEO_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LES | Spanish | CLDR2701_LES_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LES_KSEARCH | Spanish | CLDR2701_LES_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LES_KTRADITIONAL | Spanish | CLDR2701_LES_KTRADITIONAL_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LET | Estonian | CLDR2701_LET_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LFA | Persian | CLDR2701_LFA_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LFA_RAF | Persian (Afghanistan) | CLDR2701_LFA_RAF_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LFI | Finnish | CLDR2701_LFI_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LFI_KSEARCH | Finnish | CLDR2701_LFI_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LFI_KTRADITIONAL | Finnish | CLDR2701_LFI_KTRADITIONAL_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LFIL | Filipino | CLDR2701_LFIL_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LFO | Faroese | CLDR2701_LFO_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LFO_KSEARCH | Faroese | CLDR2701_LFO_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LFR_RCA | French (Canada) | CLDR2701_LFR_RCA_AN_CX_EX_FO_HX_NX_S3 |
CLDR2701_LGL | Galician | CLDR2701_LGL_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LGL_KSEARCH | Galician | CLDR2701_LGL_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LGU | Gujarati | CLDR2701_LGU_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LHA | Hausa | CLDR2701_LHA_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LHAW | Hawaiian | CLDR2701_LHAW_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LHE | Hebrew | CLDR2701_LHE_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LHE_KSEARCH | Hebrew | CLDR2701_LHE_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LHI | Hindi | CLDR2701_LHI_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LHR | Croatian | CLDR2701_LHR_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LHR_KSEARCH | Croatian | CLDR2701_LHR_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LHSB | Upper Sorbian | CLDR2701_LHSB_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LHU | Hungarian | CLDR2701_LHU_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LHY | Armenian | CLDR2701_LHY_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LIG | Igbo | CLDR2701_LIG_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LIS | Icelandic | CLDR2701_LIS_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LIS_KSEARCH | Icelandic | CLDR2701_LIS_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LJA | Japanese | CLDR2701_LJA_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LJA_KUNIHAN | Japanese | CLDR2701_LJA_KUNIHAN_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LKK | Kazakh | CLDR2701_LKK_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LKL | Kalaallisut | CLDR2701_LKL_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LKL_KSEARCH | Kalaallisut | CLDR2701_LKL_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LKM | Khmer | CLDR2701_LKM_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LKN | Kannada | CLDR2701_LKN_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LKN_KTRADITIONAL | Kannada | CLDR2701_LKN_KTRADITIONAL_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LKO | Korean | CLDR2701_LKO_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LKO_KSEARCH | Korean | CLDR2701_LKO_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LKO_KUNIHAN | Korean | CLDR2701_LKO_KUNIHAN_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LKO_KSEARCHJL | Korean | CLDR2701_LKO_KSEARCHJL_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LKOK | Konkani | CLDR2701_LKOK_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LKY | Kyrgyz | CLDR2701_LKY_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LLKT | Lakota | CLDR2701_LLKT_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LLN | Lingala | CLDR2701_LLN_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LLN_KPHONETIC | Lingala | CLDR2701_LLN_KPHONETIC_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LLT | Lithuanian | CLDR2701_LLT_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LLV | Latvian | CLDR2701_LLV_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LMK | Macedonian | CLDR2701_LMK_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LML | Malayalam | CLDR2701_LML_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LMR | Marathi | CLDR2701_LMR_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LMT | Maltese | CLDR2701_LMT_AN_CU_EX_FX_HX_NX_S3 |
CLDR2701_LMY | Burmese | CLDR2701_LMY_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LNB | Norwegian Bokmål | CLDR2701_LNB_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LNB_KSEARCH | Norwegian Bokmål | CLDR2701_LNB_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LNN | Norwegian Nynorsk | CLDR2701_LNN_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LNN_KSEARCH | Norwegian Nynorsk | CLDR2701_LNN_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LOM | Oromo | CLDR2701_LOM_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LOR | Oriya | CLDR2701_LOR_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LPA | Punjabi | CLDR2701_LPA_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LPL | Polish | CLDR2701_LPL_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LPS | Pashto | CLDR2701_LPS_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LRO | Romanian | CLDR2701_LRO_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LROOT | Root | CLDR2701_LROOT_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LRU | Russian | CLDR2701_LRU_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LSE | Northern Sami | CLDR2701_LSE_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LSE_KSEARCH | Northern Sami | CLDR2701_LSE_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LSI | Sinhala | CLDR2701_LSI_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LSI_KDICTIONARY | Sinhala | CLDR2701_LSI_KDICTIONARY_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LSK | Slovak | CLDR2701_LSK_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LSK_KSEARCH | Slovak | CLDR2701_LSK_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LSL | Slovenian | CLDR2701_LSL_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LSMN | Inari Sami | CLDR2701_LSMN_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LSMN_KSEARCH | Inari Sami | CLDR2701_LSMN_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LSQ | Albanian | CLDR2701_LSQ_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LSR | Serbian | CLDR2701_LSR_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LSR_ZLATN | Serbian | CLDR2701_LSR_ZLATN_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LSR_ZLATN_KSEARCH | Serbian | CLDR2701_LSR_ZLATN_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LSV | Swedish | CLDR2701_LSV_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LSV_KSEARCH | Swedish | CLDR2701_LSV_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LTA | Tamil | CLDR2701_LTA_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LTE | Telugu | CLDR2701_LTE_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LTH | Thai | CLDR2701_LTH_AS_CX_EX_FX_HX_NO_S3 |
CLDR2701_LTO | Tongan | CLDR2701_LTO_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LTR | Turkish | CLDR2701_LTR_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LTR_KSEARCH | Turkish | CLDR2701_LTR_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LUG | Uyghur | CLDR2701_LUG_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LUK | Ukrainian | CLDR2701_LUK_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LUR | Urdu | CLDR2701_LUR_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LVI | Vietnamese | CLDR2701_LVI_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LVI_KTRADITIONAL | Vietnamese | CLDR2701_LVI_KTRADITIONAL_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LYI | Yiddish | CLDR2701_LYI_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LYI_KSEARCH | Yiddish | CLDR2701_LYI_KSEARCH_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LYO | Yoruba | CLDR2701_LYO_AN_CX_EX_FX_HX_NO_S3 |
CLDR2701_LZH | Chinese | CLDR2701_LZH_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LZH_KUNIHAN | Chinese | CLDR2701_LZH_KUNIHAN_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LZH_KBIG5HAN | Chinese | CLDR2701_LZH_KBIG5HAN_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LZH_KGB2312HAN | Chinese | CLDR2701_LZH_KGB2312HAN_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LZH_KSTROKE | Chinese | CLDR2701_LZH_KSTROKE_AN_CX_EX_FX_HX_NX_S3 |
CLDR2701_LZH_KZHUYIN | Chinese | CLDR2701_LZH_KZHUYIN_AN_CX_EX_FX_HX_NX_S3 |
Details of the Unicode Collation Algorithm can be found in the Unicode Technical Standard #10, available at the Unicode Consortium website at http://www.unicode.org.
Collations introduced before version 10.1
The UCA400R1 collations introduced in Db2® Version 9.1 Fix Pack 3 are only supported in the COLLATION_KEY_BIT function. These collations are not supported when creating databases.
The UCA500R1 collations introduced in Db2 Version 9.5 Fix Pack 1 are supported when creating databases and in the COLLATION_KEY_BIT function.
For specific details about collations introduced before the current release, refer to the documentation for the previous release.
Collation keyword | Complies with Unicode Standard version | Complies with CLDR version |
---|---|---|
UCA400R1 | 4.0 | 1.2 |
UCA500R1 | 5.0 | 1.5.1 |
CLDR181 | 5.2 | 1.8.1 |
CLDR2701 | 7.0 | 2.7.0.1 |