Locale-sensitive UCA-based collation
Locale-sensitive collations are based on the full Unicode Collation Algorithm (UCA) specification and provide full cultural correctness.
Strings are ordered according to the Unicode Collation Algorithm. The collation can be tailored to account for features such as language or case and accent insensitivity. For more information about UCA, see Unicode Collation Algorithm-based collations.
This algorithm uses multiple weights per character as well as extra processing to handle special cases such as contractions and combining accents. This complexity adds significant processing time compared with simpler collations.
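To see why per-character weights and special-case handling are needed, the following Python sketch sorts a few Czech words by raw code point value. It is an illustration only and is not part of any database product; the word list and variable names are assumptions chosen for this example.

```python
import unicodedata

precomposed = "\u010das"   # "čas" with č as the single code point U+010D
combining = "c\u030cas"    # "čas" as c followed by U+030C COMBINING CARON

# The two spellings differ as code point sequences...
print(precomposed == combining)
# ...but are canonically equivalent, which normalization makes visible.
print(unicodedata.normalize("NFC", combining) == precomposed)

words = ["cena", precomposed, combining, "hlava", "holub", "chleb"]

# Plain code point order knows nothing about contractions or accents:
# "chleb" lands before "hlava", and the two spellings of "čas" are separated.
by_code_point = sorted(words)
print(by_code_point)
```

A UCA-based collation avoids both problems by comparing weights assigned to linguistic units rather than comparing code points directly.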
Substring matching also uses the collation, so substrings are matched in a linguistically meaningful manner.
- Advantages
  - Full support of the UCA, including contractions and combining accents.
  - Provides support for case and accent insensitive collations.
  - Handles all Unicode code points.
  - Allows collations to be tailored to suit different languages.
  - Same order for character and graphic types.
  - Substring matching is done using the collation.
- Disadvantages
  - Substantial performance penalty.
Locale-sensitive UCA-based collations are suitable when full linguistic ordering is needed and the additional processing time can be tolerated.
Examples
The database with the locale-sensitive collation was created using the following command:
CREATE DATABASE TESTDB COLLATE USING CLDR181_LCS
Sorting:
SELECT WORD FROM TESTDATA ORDER BY WORD
WORD
----------
cena
čas
c◌̌as
Čech
C◌̌ech
hlava
holub
chleb
Jana
jaro
Jaroslav
- The result is linguistically correct.
- Case and accent differences are treated as less significant than the base character.
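- A base character followed by a combining accent compares equal to the equivalent precomposed accented character.
- The word chleb is correctly ordered after the word holub, because Czech treats ch as a single letter that sorts after h.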
Substring matching:
SELECT WORD FROM TESTDATA WHERE WORD LIKE 'c%'
WORD
----------
cena
- Neither c◌̌as nor chleb is selected, because linguistically neither starts with the letter c: in Czech, č and ch are letters distinct from c.
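The matching behavior can be sketched in Python as a prefix test over linguistic units instead of code points. This is a toy model under stated assumptions, not the database's actual implementation: the `CONTRACTIONS` set and the functions `collation_units` and `like_prefix` are hypothetical names introduced for this illustration, and the comparison is deliberately case and accent sensitive.

```python
import unicodedata

# Hypothetical multi-character letters (contractions) for a Czech-like locale.
CONTRACTIONS = {"ch"}


def collation_units(text):
    """Split text into the letters the collation would see (toy model)."""
    # NFC turns c + combining caron into the single letter č.
    text = unicodedata.normalize("NFC", text)
    units, i = [], 0
    while i < len(text):
        size = 2 if text[i:i + 2].lower() in CONTRACTIONS else 1
        units.append(text[i:i + size])
        i += size
    return units


def like_prefix(word, prefix):
    """Toy equivalent of WORD LIKE 'prefix%' under the collation."""
    prefix_units = collation_units(prefix)
    return collation_units(word)[:len(prefix_units)] == prefix_units


words = ["cena", "\u010das", "c\u030cas", "\u010cech", "C\u030cech",
         "hlava", "holub", "chleb", "Jana", "jaro", "Jaroslav"]
matches = [word for word in words if like_prefix(word, "c")]
print(matches)
```

In this model the first unit of chleb is the single letter ch and the first unit of both spellings of čas is č, so only cena begins with the unit c.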