tokenize function
The fn:tokenize function breaks a string into a sequence of substrings.
Syntax
fn:tokenize(source-string, pattern [, flags])
- source-string
- A string that is to be broken into a sequence of substrings.
source-string is an xs:string value or the empty sequence.
- pattern
- The delimiter between substrings in source-string.
pattern is an xs:string value that contains a regular expression. A regular expression is a set of characters, wildcards, and operators that defines a string or group of strings in a search pattern.
- flags
- An xs:string value that can contain any of the following values
that control how pattern is matched to characters in source-string:
- s
- Indicates that the dot (.) in the regular expression matches any
character, including the new-line character (X'0A').
If the s flag is not specified, the dot (.) matches any character except the new-line character (X'0A').
- m
- Indicates that the caret (^) matches the start of a line (the
position after a new-line character), and the dollar sign ($) matches
the end of a line (the position before a new-line character).
If the m flag is not specified, the caret (^) matches the start of the string, and the dollar sign ($) matches the end of the string.
- i
- Indicates that matching is case-insensitive.
If the i flag is not specified, case-sensitive matching is done.
- x
- Indicates that whitespace characters within pattern are
ignored.
If the x flag is not specified, whitespace characters are used for matching.
- Limitation of length
The length of source-string and pattern is limited to 32000 bytes.
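As a rough illustration of the flags argument, the following sketch shows the i and x flags in use. The input strings are invented for this illustration, and the results in the comments assume the standard XQuery regular-expression behavior described above. Evaluate each expression separately.

```xquery
(: i flag: the delimiter "x" also matches the uppercase "X" :)
fn:tokenize("oneXtwoxthree", "x", "i")
(: expected sequence: ("one", "two", "three") :)

(: x flag: the spaces inside the pattern are ignored, :)
(: so the pattern behaves like "\s*,\s*"              :)
fn:tokenize("a , b,c", "\s* , \s*", "x")
(: expected sequence: ("a", "b", "c") :)
```

The x flag is mainly useful for laying out a long pattern readably; it does not change how whitespace in source-string is treated.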
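Returned value
If source-string is not the empty sequence or a zero-length string, the returned value is the sequence of substrings that results from the following operations: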
- source-string is searched for characters that match pattern.
- If pattern contains two or more alternative sets of characters, the first set of characters in pattern that matches characters in source-string is considered to be the matching pattern.
- Each set of characters that does not match pattern becomes an item in the result sequence.
- If pattern matches characters at the beginning of source-string, the first item in the returned sequence is a string of length 0.
- If two successive matches for pattern are found within source-string, a string of length 0 is added to the sequence.
- If pattern matches characters at the end of source-string, the last item in the returned sequence is a string of length 0.
If pattern is not found in source-string, an error is returned.
If source-string is the empty sequence or a zero-length string, the result is the empty sequence.
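The rules above for zero-length items and alternative patterns can be sketched with a few small calls. The input strings are illustrative only, and the results in the comments follow from the rules as stated, assuming standard fn:tokenize semantics. Evaluate each expression separately.

```xquery
(: The delimiter matches at the beginning and at the end of the    :)
(: string, and twice in succession in the middle, so a zero-length :)
(: string appears in each of those positions                       :)
fn:tokenize(",a,,b,", ",")
(: expected sequence: ("", "a", "", "b", "") :)

(: Two alternatives: at each position the first alternative that :)
(: matches is used, so "ab" is preferred over "a"                :)
fn:tokenize("abracadabra", "(ab)|(a)")
(: expected sequence: ("", "r", "c", "d", "r", "") :)

(: A zero-length source-string yields the empty sequence :)
fn:tokenize("", ",")
(: expected sequence: () :)
```

If zero-length items are not wanted, one possible approach is to filter the result with a predicate such as fn:tokenize(",a,,b,", ",")[. ne ""].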
Example
fn:tokenize("Tokenize this sentence, please.", "\s+")
The returned value is the sequence ("Tokenize", "this", "sentence,", "please.").