normalize-unicode function

The fn:normalize-unicode function performs Unicode normalization on a string.

Syntax

fn:normalize-unicode(source-string, normalization-type)
source-string
A value on which Unicode normalization is to be performed.

source-string is an xs:string value or the empty sequence.

normalization-type
An xs:string value that indicates the type of Unicode normalization that is to be performed. Possible values are:
NFC
Unicode Normalization Form C. If normalization-type is not specified, NFC normalization is performed.
NFD
Unicode Normalization Form D.
NFKC
Unicode Normalization Form KC.
NFKD
Unicode Normalization Form KD.
If a zero-length string is specified, then no normalization is performed.
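
The difference between the canonical and compatibility forms, and the effect of a zero-length normalization-type, can be sketched as follows. This is an illustrative sketch, not output from a recorded run: it assumes that character references such as &#xFB01; are accepted in string literals, as in standard XQuery, and the sample characters (the ligature U+FB01 and the sequence m followed by the combining dot below, U+0323) are chosen only for illustration.

```xquery
(: NFC keeps the ligature U+FB01 as a single character :)
fn:string-to-codepoints(fn:normalize-unicode("&#xFB01;", "NFC")),
(: NFKC replaces the ligature with its compatibility equivalent, the letters f and i :)
fn:string-to-codepoints(fn:normalize-unicode("&#xFB01;", "NFKC")),
(: a zero-length normalization-type leaves the decomposed input unchanged :)
fn:string-to-codepoints(fn:normalize-unicode("m&#x0323;", ""))
```

Under these assumptions, the first expression yields 64257, the second yields 102 and 105, and the third yields 109 and 803.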

Returned value

If source-string is not the empty sequence, the returned value is the xs:string value that results when the Unicode normalization that is specified by normalization-type is performed on source-string. If normalization-type is not specified, Unicode Normalization Form C (NFC) is performed on source-string. Unicode normalization is described in Character Model for the World Wide Web 1.0.

If source-string is the empty sequence, a string of length 0 is returned.
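
The two cases described above can be sketched as follows. The sketch assumes that the one-argument form of the function and the empty sequence literal () are accepted as described in this section; the results follow from the rules stated here rather than from a recorded run.

```xquery
(: normalization-type omitted: NFC is applied, composing m + U+0323 into U+1E43 :)
fn:string-to-codepoints(fn:normalize-unicode("m&#x0323;")),
(: empty sequence as source-string: a zero-length string is returned :)
fn:string-length(fn:normalize-unicode(()))
```

Under these assumptions, the first expression yields 7747 and the second yields 0.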

Examples

The following function performs Unicode Normalization Form C on the string "ṃ" (a Latin lowercase letter m with a dot below):
fn:normalize-unicode("ṃ","NFC")

The returned value is the UTF-8 character represented by the numeric character reference &#x1e43;, a Latin lowercase letter m with a dot below.

The following example converts the normalized string to its decimal code point:
fn:string-to-codepoints(fn:normalize-unicode("ṃ", "NFC"))

The returned value is 7747.
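
For comparison, a decomposing form can be sketched in the same way. The example below is illustrative only and assumes that the character reference &#x1E43; is accepted in a string literal; it applies NFD to the same character and then recomposes the result with NFC.

```xquery
(: NFD splits U+1E43 into m (U+006D) and the combining dot below (U+0323) :)
fn:string-to-codepoints(fn:normalize-unicode("&#x1E43;", "NFD")),
(: applying NFC to the decomposed string restores the single character U+1E43 :)
fn:string-to-codepoints(
  fn:normalize-unicode(fn:normalize-unicode("&#x1E43;", "NFD"), "NFC"))
```

Under this assumption, the first expression yields 109 and 803, and the second yields 7747.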