fn:tokenize function

The fn:tokenize function breaks a string into a sequence of substrings.

Syntax

fn:tokenize(source-string, pattern, flags)
source-string
A string that is to be broken into a sequence of substrings.

source-string is a string literal, or an XQuery expression that resolves to an xs:string value or the empty sequence.

pattern
The delimiter between substrings in source-string.

pattern is a string literal that contains a regular expression. A regular expression is a set of characters, pattern-matching characters, and operators that define a string or group of strings in a search pattern.

flags
An optional string literal that can contain any of the following values, which control how pattern is matched against source-string:
s
Indicates that the dot (.) matches any character.

If the s flag is not specified, the dot (.) matches any character except the new line character (#x0A).

m
Indicates that the caret (^) matches the start of any line (the position after a new line character), and the dollar sign ($) matches the end of any line (the position before a new line character).

If the m flag is not specified, the caret (^) matches the start of the entire string, and the dollar sign ($) matches the end of the entire string.

i
Indicates that matching is case-insensitive for the letters "a" through "z" and "A" through "Z".

If the i flag is not specified, case-sensitive matching is done.

x
Indicates that whitespace characters (#x09, #x0A, #x0D, and #x20) within pattern are ignored, unless they are within a character class. Whitespace characters in a character class are never ignored.

If the x flag is not specified, whitespace characters in pattern are treated as part of the pattern and must match the corresponding characters in source-string.
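
As an illustration of how flags changes the result, the following sketch calls fn:tokenize with and without the i flag, and once with the x flag. The sample strings and variable names are hypothetical and do not come from this reference; the results shown in the comments follow from the flag descriptions above.

```xquery
(: Without flags, only the lowercase "x" is a delimiter. :)
let $sensitive   := fn:tokenize("oneXtwoxthree", "x")        (: ("oneXtwo", "three") :)
(: With the i flag, "X" and "x" are both delimiters. :)
let $insensitive := fn:tokenize("oneXtwoxthree", "x", "i")   (: ("one", "two", "three") :)
(: With the x flag, the blanks in the pattern are ignored, so it behaves like ",\s*". :)
let $spaced      := fn:tokenize("a, b, c", " , \s* ", "x")   (: ("a", "b", "c") :)
return (fn:count($sensitive), fn:count($insensitive), fn:count($spaced))   (: (2, 3, 3) :)
```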

Returned value

If source-string is neither the empty sequence nor a zero-length string, the returned value is a sequence of xs:string values that results when the following operations are performed on source-string:
  • source-string is searched for characters that match pattern.
  • If pattern contains two or more alternative sets of characters, and the alternative sets of characters match characters that start at the same position in source-string, the first set of characters in pattern that matches characters in source-string is considered to match pattern.
  • Each run of characters in source-string that lies between two matches for pattern, or between a match and the start or end of source-string, becomes an item in the result sequence.
  • If pattern matches characters at the beginning of source-string, the first item in the returned sequence is a string of length 0.
  • If two matches for pattern are adjacent in source-string, with no characters between them, a string of length 0 is added to the sequence at that position.
  • If pattern matches characters at the end of source-string, the last item in the returned sequence is a string of length 0.
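
The rules above can be sketched with a few calls. The input strings and variable names here are illustrative assumptions, not part of this reference. The first call shows where zero-length strings appear; the other two show that, when alternatives match at the same position, the alternative listed first in pattern is the one that is used.

```xquery
(: Matches at the start, at the end, and two adjacent matches each yield "". :)
let $empties := fn:tokenize(",a,,b,", ",")             (: ("", "a", "", "b", "") :)
(: "ab" is listed first, so "ab" is consumed wherever both alternatives match. :)
let $abFirst := fn:tokenize("abracadabra", "ab|a")     (: ("", "r", "c", "d", "r", "") :)
(: "a" is listed first, so only "a" is consumed and the "b" stays in the token. :)
let $aFirst  := fn:tokenize("abracadabra", "a|ab")     (: ("", "br", "c", "d", "br", "") :)
return (fn:count($empties), fn:count($abFirst), fn:count($aFirst))   (: (5, 6, 6) :)
```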

If pattern is not found in source-string, source-string is returned.

If pattern can match a string of length zero, an error is returned.

If source-string is the empty sequence, or is a zero-length string, the result is the empty sequence.
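
A minimal sketch of these special cases follows. The inputs and variable names are hypothetical. The error case is shown only as a comment, because evaluating it would stop the query.

```xquery
(: No delimiter found: the whole source string comes back as a single item. :)
let $noMatch  := fn:tokenize("abc", ",")    (: "abc" :)
(: Zero-length string and empty sequence both produce the empty sequence. :)
let $zeroLen  := fn:tokenize("", ",")       (: () :)
let $emptySeq := fn:tokenize((), ",")       (: () :)
return (fn:count($noMatch), fn:count($zeroLen), fn:count($emptySeq))   (: (1, 0, 0) :)

(: fn:tokenize("abc", "x*") is expected to raise an error,
   because the pattern x* can match a string of length zero. :)
```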

Example

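The following function call splits the string "?A?B?C?D?" at each question mark (?) character and returns the substrings between the question marks as a sequence. Because the question mark is a special character in regular expressions, it is escaped with a backslash (\) in pattern.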
fn:tokenize("?A?B?C?D?","\?")

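The returned value is the sequence ("", "A", "B", "C", "D", ""). The first and last items are strings of length 0 because pattern matches at the beginning and at the end of source-string.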
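
If the zero-length strings are not wanted, one possible approach is to filter them out with a predicate. This is a sketch of one option, not a feature of fn:tokenize itself; the variable name $tokens is an assumption made for illustration.

```xquery
(: Tokenize as in the example, then keep only the items that are not zero-length. :)
let $tokens := fn:tokenize("?A?B?C?D?", "\?")
return $tokens[. ne ""]   (: ("A", "B", "C", "D") :)
```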