Submitting Wildcard/Regex Queries in XML

Regular expression queries are parsed using the standard form and are represented as a term element that contains an operator element. The operator element signals that the term is a regular expression and identifies the beginning and ending delimiters that mark the query term for expansion. For example, m/viv.*mo/ is parsed as:

<term field="query" str="viv.*mo">
  <operator logic="regex" start-string="m/" end-string="/" />
</term>

The start-string and end-string attributes are artifacts of query processing and are not required to submit an XML query unless you want to use non-standard delimiters. The minimal XML is:

<term field="query" str="viv.*mo">
  <operator logic="regex" />
</term>
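
As an illustration, the minimal and the fully delimited forms can both be generated programmatically. The following is only a sketch using Python's standard xml.etree.ElementTree module; the helper name build_regex_term is hypothetical and is not part of the product. It assumes the element and attribute names shown in the examples above.

```python
import xml.etree.ElementTree as ET


def build_regex_term(pattern, start_string=None, end_string=None):
    """Build a <term> element carrying a regex operator.

    Hypothetical helper: the element and attribute names are copied from
    the examples in this section; nothing else about the engine is assumed.
    """
    term = ET.Element("term", {"field": "query", "str": pattern})
    attrs = {"logic": "regex"}
    # The delimiter attributes are optional; add them only when supplied.
    if start_string is not None:
        attrs["start-string"] = start_string
    if end_string is not None:
        attrs["end-string"] = end_string
    ET.SubElement(term, "operator", attrs)
    return term


# Minimal form: the operator carries only logic="regex".
minimal = build_regex_term("viv.*mo")
print(ET.tostring(minimal, encoding="unicode"))

# Form that also records the delimiters.
delimited = build_regex_term("viv.*mo", start_string="m/", end_string="/")
print(ET.tostring(delimited, encoding="unicode"))
```

Building the element through an XML library rather than string concatenation means that characters such as & or < in the pattern are escaped correctly in the str attribute.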

Wildcard queries are similarly represented as a term element that contains one or two operator elements. These identify the wildcard character (which matches zero or more characters) and the wildchar character (which matches exactly one character). If the standard wildchar operator is added to the input form, then the query vivis?mo* would be parsed as:

<term field="query" str="vivis?mo*">
  <operator logic="wildcard" char="*"/>
  <operator logic="wildchar" char="?"/>
</term>
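
A wildcard term of this shape can be assembled in the same way. This is a sketch only, again using Python's xml.etree.ElementTree; build_wildcard_term is a hypothetical helper name, and the default of * for the wildcard character is an assumption based on the text calling it the standard character.

```python
import xml.etree.ElementTree as ET


def build_wildcard_term(text, wildcard="*", wildchar=None):
    """Build a <term> element with wildcard and/or wildchar operators.

    Hypothetical helper: pass None for either character to omit
    the corresponding <operator> element.
    """
    term = ET.Element("term", {"field": "query", "str": text})
    if wildcard is not None:
        ET.SubElement(term, "operator", {"logic": "wildcard", "char": wildcard})
    if wildchar is not None:
        ET.SubElement(term, "operator", {"logic": "wildchar", "char": wildchar})
    return term


# Two operators: * as the wildcard and ? as the wildchar.
term = build_wildcard_term("vivis?mo*", wildcard="*", wildchar="?")
print(ET.tostring(term, encoding="unicode"))
```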

Unlike the start-string and end-string attributes in the regex example, the char attribute is significant: it is used at search time to interpret the term. Using the XML representation of the query is the most common escaping mechanism for wildcard and wildchar characters. For example, to find all words that contain a literal * (assuming that * is indexed as a word character), you would use the following term:

<term field="query" str="%*%">
  <operator logic="wildcard" char="%"/>
</term>

This element defines % to be the wildcard character instead of the standard * character. You could also modify the syntax or form that you are using in your source if you always want to use the % character as the wildcard character, but global changes to wildcard characters are not recommended.
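
To make the role of the char attribute concrete, the sketch below translates a term element into a Python regular expression according to the semantics described in this section: the wildcard character matches zero or more characters, the wildchar character matches exactly one, and every other character is literal. This only illustrates the interpretation; it is not how the search engine itself expands terms against the index. The function name term_to_regex is hypothetical, and each operator is assumed to define a single character.

```python
import re
import xml.etree.ElementTree as ET


def term_to_regex(term_xml):
    """Translate a wildcard <term> into a compiled Python regex.

    Illustrative only: models the semantics described in the text,
    not the engine's actual term expansion.
    """
    term = ET.fromstring(term_xml)
    wildcard = wildchar = None
    for op in term.findall("operator"):
        if op.get("logic") == "wildcard":
            wildcard = op.get("char")
        elif op.get("logic") == "wildchar":
            wildchar = op.get("char")

    parts = []
    for ch in term.get("str"):
        if ch == wildcard:
            parts.append(".*")           # zero or more characters
        elif ch == wildchar:
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


# With % declared as the wildcard, the * in the term is a literal character.
pattern = term_to_regex(
    '<term field="query" str="%*%">'
    '<operator logic="wildcard" char="%"/>'
    '</term>'
)
print(bool(pattern.fullmatch("a*b")))  # word containing a literal *
print(bool(pattern.fullmatch("ab")))   # word without *
```

Because % is declared as the wildcard, the * falls through to the literal branch, which is the escaping effect described above.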