You must consider several collating issues
when you use InfoSphere® DataStage® with
National Language Support (NLS) mode enabled.
Collating is a complex issue for many languages. It is not sufficient
to collate strings in the numerical order of the Unicode values of
their characters. Locales that share a character set often have
different collating rules. The following are the main issues that
affect collating in Western European languages:
- Accented characters. Should accented characters
come before or after their unaccented equivalents? Or should accents
only be examined if two strings being compared would otherwise be
identical (that is, as a tie breaker)?
- Expanding characters. Some languages treat certain
single characters as two separate characters for collating purposes.
For example, German collates ß as though it were ss.
- Contracting characters. Some languages have pairs
of characters that collate as though they were a single character.
For example, traditional Spanish collation treats ch as a single
letter that comes after c.
- Case. Should case be considered? Should case be used as
a tie breaker for otherwise identical strings? If so, which comes
first, uppercase or lowercase?
- Punctuation. Should hyphens or other punctuation be considered
as tie breakers?
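
The following Python sketch illustrates why numerical Unicode order is not enough, and how accents, case, and hyphens can act as successive tie breakers. It is not how InfoSphere DataStage implements collating. The function names (`base_letters`, `sketch_sort_key`), the word list, and the choices it makes (unaccented before accented, lowercase before uppercase, hyphens ignored until the last level) are assumptions made for illustration only. It does not handle contracting characters.

```python
import unicodedata


def base_letters(text: str) -> str:
    """Return text with accents removed (hypothetical helper)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sketch_sort_key(text: str) -> tuple:
    """Build a multi-level sort key (illustrative assumption, not a real locale)."""
    no_hyphen = text.replace("-", "")
    # Level 1: base letters only; accents, case, and hyphens are ignored.
    primary = base_letters(no_hyphen).casefold()
    # Level 2: accents break ties; unaccented forms sort first.
    secondary = unicodedata.normalize("NFD", no_hyphen.casefold())
    # Level 3: case breaks ties; swapcase() puts lowercase first.
    tertiary = no_hyphen.swapcase()
    # Level 4: the original text, so hyphens are the final tie breaker.
    return (primary, secondary, tertiary, text)


words = ["Zebra", "cote", "côté", "apple", "Cote", "coté", "côte", "co-op", "coop"]

# Plain sorted() compares Unicode values, so every uppercase word comes first.
print(sorted(words))

# The multi-level key keeps related words together.
print(sorted(words, key=sketch_sort_key))
```

With the plain sort, Zebra comes before apple because uppercase letters have lower Unicode values than lowercase letters. With the multi-level key, apple comes first and the accented and capitalized variants of cote are grouped together, ordered only by the tie breakers.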