normalize-unicode function
The fn:normalize-unicode function performs Unicode normalization on a string.
Syntax
fn:normalize-unicode(source-string[, normalization-type])
- source-string
- A value on which Unicode normalization is to be performed.
source-string is an xs:string value or the empty sequence.
- normalization-type
- An xs:string value that indicates the type of Unicode normalization that is to be performed. Possible values are:
- NFC
- Unicode Normalization Form C. If normalization-type is not specified, NFC normalization is performed.
- NFD
- Unicode Normalization Form D.
- NFKC
- Unicode Normalization Form KC.
- NFKD
- Unicode Normalization Form KD.
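The difference between the four forms is easiest to see on a string that contains both a canonically decomposable character and a compatibility character. The following query is a minimal sketch and not part of the original reference: the input string (U+1E43 followed by the "fi" ligature U+FB01) and the variable names $form and $cp are assumptions chosen for illustration.

```xquery
(: For each normalization form, list the code points of the normalized string. :)
(: Input: U+1E43 (m with dot below) followed by U+FB01 (fi ligature).          :)
for $form in ("NFC", "NFD", "NFKC", "NFKD")
return fn:string-join(
  ($form,
   for $cp in fn:string-to-codepoints(
                fn:normalize-unicode("&#x1E43;&#xFB01;", $form))
   return fn:string($cp)),
  " ")
```

Under the standard Unicode decomposition mappings, the query should return the four strings "NFC 7747 64257", "NFD 109 803 64257", "NFKC 7747 102 105", and "NFKD 109 803 102 105". The canonical forms (NFC, NFD) leave the ligature intact, while the compatibility forms (NFKC, NFKD) replace it with the letters f and i.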
Returned value
If source-string is not the empty sequence, the returned value is the xs:string value that results when Unicode normalization that is specified by normalization-type is performed on source-string. If normalization-type is not specified, Unicode Normalization Form C (NFC) is performed on source-string. Unicode normalization is described in Character Model for the World Wide Web 1.0.
If source-string is the empty sequence, a string of length 0 is returned.
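The empty-sequence case can be checked with a short expression. This is an illustrative sketch that relies only on the behavior stated above; wrapping the call in fn:string-length is a choice made here to make the result visible.

```xquery
(: The empty sequence () is passed as source-string; the result is a      :)
(: zero-length string, so its length is 0 rather than an empty sequence.  :)
fn:string-length(fn:normalize-unicode(()))
```

The returned value is 0.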
Examples
fn:normalize-unicode("ṃ","NFC")
The returned value is the UTF-8 character represented by the numeric character reference &#x1E43;, a Latin lowercase letter m with a dot below.
fn:string-to-codepoints(fn:normalize-unicode("ṃ", "NFC"))
The returned value is 7747, the decimal code point of U+1E43.
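Normalization matters most when the input arrives in decomposed form. The following sketch, which is not from the original reference, builds the letter from a base character and a combining mark and compares the string length before and after normalization. The variable name $decomposed is hypothetical, and normalization-type is omitted so that the default, NFC, applies.

```xquery
(: U+006D (m) followed by U+0323 (combining dot below): two code points. :)
let $decomposed := "&#x6D;&#x323;"
return (
  fn:string-length($decomposed),                       (: before normalization :)
  fn:string-length(fn:normalize-unicode($decomposed))  (: after NFC            :)
)
```

The returned value is the sequence (2, 1): NFC composes the two code points into the single precomposed character U+1E43.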