Multibyte and wide character string collation subroutines

Strings can be compared in the following ways:

  • Using the ordinal (binary) values of the characters.
  • Using the weights associated with the characters for each locale, as determined by the LC_COLLATE category.

Multicultural support uses the second method.
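
The two methods above can be contrasted in a small sorting sketch. It assumes the strings are ordinary null-terminated multibyte strings; the names sort_strings, by_ordinal, and by_collation are hypothetical and chosen only for illustration. The standard strcmp subroutine gives the ordinal order, and strcoll gives the order defined by the LC_COLLATE category of the current locale.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: ordinal (binary) order of the character values. */
static int by_ordinal(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* qsort comparator: order defined by LC_COLLATE in the current locale. */
static int by_collation(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sorts n strings in place.
 * use_locale == 0 selects the ordinal method, nonzero the collation method. */
void sort_strings(const char **items, size_t n, int use_locale)
{
    qsort(items, n, sizeof items[0], use_locale ? by_collation : by_ordinal);
}
```

In the C locale both comparators are expected to produce the same order; they can diverge once the process has selected another locale.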

Collation is a locale-specific property of characters. Each character is assigned a weight that indicates its relative order for sorting. A character may be assigned more than one weight, in which case the weights are prioritized as primary, secondary, tertiary, and so forth. The maximum number of weights that can be assigned to each character is system-defined.

A process inherits the C locale or POSIX locale at its startup time. When the process calls the setlocale(LC_ALL, "") subroutine, it obtains its locale from the LC_* and LANG environment variables. The following subroutines are affected by the LC_COLLATE category, which determines how two strings are ordered in any given locale.
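
As a minimal sketch of that startup step, the function below asks setlocale to pick up the locale from the environment. The name init_locale_from_environment and its return convention are assumptions made for this example. Note that the second argument is the empty string, with no space between the quotation marks.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: selects the locale named by the LC_* and LANG
 * environment variables. Returns 0 on success, -1 if the request could
 * not be honored (the process then keeps its previous locale). */
int init_locale_from_environment(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        return -1;
    }
    return 0;
}
```

After a successful call, setlocale(LC_COLLATE, NULL) can be used to query the name of the locale now in effect for collation without changing it.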

Note: Collation-based string comparisons are slow because of the processing involved in obtaining the collation values. Perform such comparisons only when necessary. If you only need to determine whether two wide character strings are equal, do not use the wcscoll and wcsxfrm subroutines; use the wcscmp subroutine instead.
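
The advice in the note might be applied as follows. This is only a sketch; the helper name wide_strings_equal is hypothetical.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper: tests two wide character strings for equality
 * by comparing wide character values, without consulting LC_COLLATE. */
bool wide_strings_equal(const wchar_t *a, const wchar_t *b)
{
    return wcscmp(a, b) == 0;
}
```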

The following subroutines compare multibyte character strings:

strcoll
Compares the collation weights of multibyte character strings.
strxfrm
Converts a multibyte character string to values representing character collation weights.
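
The relationship between the two subroutines can be sketched as follows: comparing two strings transformed by strxfrm with strcmp is intended to give the same ordering as calling strcoll on the original strings. The helper name make_collation_key is an assumption for this example. The sketch relies on the standard behavior that strxfrm, when given a size of 0, returns the length needed for the transformed string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated collation key for s in
 * the current locale, or NULL if memory is exhausted. The caller frees it. */
char *make_collation_key(const char *s)
{
    /* With a size of 0 nothing is written; the return value is the
     * length of the transformed string, excluding the terminating null. */
    size_t need = strxfrm(NULL, s, 0);
    char *key = malloc(need + 1);

    if (key == NULL) {
        return NULL;
    }
    strxfrm(key, s, need + 1);
    return key;
}
```

Building keys once and comparing them with strcmp can pay off when the same strings are compared many times, for example during a large sort, because the collation values are obtained only once per string.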

The following subroutines compare wide character strings:

wcscoll
Compares the collation weights of wide character strings.
wcsxfrm
Converts a wide character string to values representing character collation weights.
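
The wide character pair can be used in the same way. In the sketch below, the helper name make_wide_collation_key is hypothetical; keys produced by wcsxfrm are meant to be compared with wcscmp, which should order them the same way wcscoll orders the original strings.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns a newly allocated wide collation key for
 * ws in the current locale, or NULL if memory is exhausted.
 * The caller frees the result. */
wchar_t *make_wide_collation_key(const wchar_t *ws)
{
    /* With a size of 0 nothing is written; the return value is the
     * number of wide characters needed, excluding the terminating null. */
    size_t need = wcsxfrm(NULL, ws, 0);
    wchar_t *key = malloc((need + 1) * sizeof *key);

    if (key == NULL) {
        return NULL;
    }
    wcsxfrm(key, ws, need + 1);
    return key;
}
```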