normalize-unicode function

The fn:normalize-unicode function performs Unicode normalization on a string.

Syntax

fn:normalize-unicode(source-string[, normalization-type])

source-string

A value on which Unicode normalization is to be performed.

source-string is an xs:string value or the empty sequence.

normalization-type

An xs:string value that indicates the type of Unicode normalization that is to be performed. Possible values are:

NFC: Unicode Normalization Form C. If normalization-type is not specified, NFC normalization is performed.
NFD: Unicode Normalization Form D.
NFKC: Unicode Normalization Form KC.
NFKD: Unicode Normalization Form KD.

If normalization-type is a zero-length string, no normalization is performed and source-string is returned unchanged.
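As a rough illustration of how the forms differ, the following sketch applies each normalization-type value to the same input and lists the resulting decimal code points. The sample string (U+00E9, a precomposed "é", followed by U+FB01, the "fi" ligature) and the variable names $s, $form, and $cp are assumptions chosen for this illustration; they do not come from the function definition.

```xquery
(: Hypothetical sample input: U+00E9 (e with acute) + U+FB01 (fi ligature). :)
let $s := "&#xE9;&#xFB01;"
(: The final "" entry is the zero-length normalization type. :)
for $form in ("NFC", "NFD", "NFKC", "NFKD", "")
return
  (: Build one line per form: the form name, then the decimal code points. :)
  fn:string-join(
    ($form,
     for $cp in fn:string-to-codepoints(fn:normalize-unicode($s, $form))
     return fn:string($cp)),
    " ")
```

Under the standard Unicode decompositions, the code points should be 233 64257 for NFC, 101 769 64257 for NFD, 233 102 105 for NFKC, and 101 769 102 105 for NFKD. For the zero-length type the input comes back unchanged (233 64257). Only the KC and KD forms replace the ligature with the separate letters "f" and "i".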

Returned value

If source-string is not the empty sequence, the returned value is the xs:string value that results when the Unicode normalization specified by normalization-type is performed on source-string. If normalization-type is not specified, Unicode Normalization Form C (NFC) is performed on source-string. Unicode normalization is described in Character Model for the World Wide Web 1.0.

If source-string is the empty sequence, a string of length 0 is returned.
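The return rules above can be checked with a small sketch that relies only on string lengths. The sample input (the letter "m" followed by the combining dot below, U+0323) is an assumption chosen for illustration.

```xquery
(
  (: Un-normalized input: two code points, m + combining dot below. :)
  fn:string-length("&#x6d;&#x323;"),
  (: normalization-type omitted, so NFC is applied and the pair is composed. :)
  fn:string-length(fn:normalize-unicode("&#x6d;&#x323;")),
  (: The empty sequence as source-string yields a zero-length string. :)
  fn:string-length(fn:normalize-unicode(()))
)
```

The expected result is the sequence (2, 1, 0): the input has two characters, its NFC form has one, and the empty sequence produces a string of length 0.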

Examples

The following function call performs Unicode Normalization Form C on the decomposed form of "ṃ", that is, a Latin lowercase letter m (U+006D) followed by a combining dot below (U+0323):

fn:normalize-unicode("&#x6d;&#x323;","NFC")

The returned value is the single precomposed character represented by the numeric character reference &#x1e43;, a Latin lowercase letter m with a dot below.

The following example converts the normalized string to its decimal code point:

fn:string-to-codepoints(fn:normalize-unicode("&#x6d;&#x323;", "NFC"))

The returned value is 7747 (hexadecimal 1E43).