Special characters in searches
With IBM® Text Search for Db2 for z/OS®, you can index and search special characters. A special character that you include in a search is handled like any other term in the query.
When a special character is adjacent to a term in a query, documents that contain the special character and term in the same order are returned. For example, searching for “30$” finds documents that contain “30$”, but does not find documents that contain “$30”. However, searching for “30 $” (with a separating space) finds all documents that contain “30” and “$”, including the documents that contain either “30$” or “$30”.
When a special character separates two terms, the terms are searched for as a sequence. For example, searching for “jack_jones” finds documents that contain “jack_jones” but not documents that contain “jack_and_jones”.
Terms that are adjacent to special characters are lemmatized. For example, searching for “cats&dogs” in English finds documents that contain “cat&dog”. Also, when a term is adjacent to a special character in a query, the term is not removed from the query as a stop word. For example, searching for “at&t” does not omit the term “at”. However, searching for “at & t” (separated by spaces) omits the term “at”.
You can use special characters in wildcard search expressions. For example, searching for “ja*_” finds documents that contain “jack_jones”. However, you cannot use wildcard characters to find special characters. For example, searching for “ca*s” finds documents that contain “cats”, “categories”, or “cas”, but not documents that contain “ca_s”.
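The wildcard rule above can be pictured with a small toy model. This is only an illustrative sketch, not how the text search server is implemented: the function names `wildcard_to_regex` and `toy_match` are hypothetical, and the model assumes that an asterisk stands for any run of letters and digits but never for a special character.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a regular expression (toy model)."""
    # The match must begin at the start of a term, not in the middle of a word.
    parts = ["(?<![A-Za-z0-9])"]
    for ch in pattern:
        if ch == "*":
            # Assumption: the wildcard covers letters and digits only,
            # so it can never stand in for a special character such as "_".
            parts.append("[A-Za-z0-9]*")
        else:
            # Every other character, including special characters, is literal.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def toy_match(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in the text."""
    return wildcard_to_regex(pattern).search(text) is not None

for term in ["cats", "categories", "cas", "ca_s"]:
    print(term, toy_match("ca*s", term))
print("jack_jones", toy_match("ja*_", "jack_jones"))
```

In this model, “ca*s” matches “cats”, “categories”, and “cas” but not “ca_s”, and “ja*_” matches “jack_jones”, which mirrors the examples in the preceding paragraph.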
Escaping special characters
To search for a special character that has a function in the query syntax, escape it by placing a backslash (\) before it. For example:
- To search for the string “where?”, escape the question mark as follows: “where\?”
- To search for the string “c:\temp”, escape the colon and backslash as follows: “c\:\\temp”
The following table lists special characters that have a function in the query syntax and describes how each one behaves when it is not escaped.
| Special character | Behavior when not escaped |
|---|---|
| Ampersand (&) | |
| Asterisk (*) | Used as a wildcard character. |
| At sign (@) | A syntax error is generated when an at sign is the first character of a query. In xmlxp expressions, the at sign is used to refer to an attribute. |
| Brackets [ ] | Used in xmlxp expressions to search the contents of elements and attributes. |
| Braces { } | Generates a syntax error. |
| Backslash (\) | |
| Caret (^) | Used for weighting (boosting) terms. |
| Colon (:) | Used to search in the contents of fields. |
| Equal sign (=) | Generates a syntax error. |
| Exclamation point (!) | A syntax error is returned when an exclamation point is the first character of a query. |
| Forward slash (/) | In xmlxp expressions, a forward slash is used as an element path separator. |
| Greater than symbol (>), less than symbol (<) | Used in xmlxp expressions to compare the value of an attribute. Otherwise, these characters generate syntax errors. |
| Minus sign (-) | When a minus sign is the first character of a term, only documents that do not contain the term are returned. |
| Parentheses ( ) | Used for grouping. |
| Percent sign (%) | Specifies that a search term is optional. |
| Plus sign (+) | |
| Question mark (?) | Handled as a wildcard character. |
| Semicolon (;) | |
| Single quotation mark (') | Used to enclose xmlxp expressions. |
| Tilde (~) | Handled as the proximity and fuzzy search operator. |
| Vertical bar (\|) | |
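When search strings come from user input, the escaping can be applied programmatically. The following is a minimal sketch, assuming the character set is exactly the one listed in the table above. The name `escape_query_term` is hypothetical and is not part of any IBM API. The helper escapes every occurrence of a listed character, which is more conservative than strictly necessary for characters such as the at sign or the exclamation point, which are described as causing errors only at the start of a query.

```python
# Characters listed in the table above as having a function in the query syntax.
# Assumption: the single quotation mark is the ASCII apostrophe.
SPECIAL_CHARS = set("&*@[]{}\\^:=!/><-()%+?;'~|")

def escape_query_term(text: str) -> str:
    """Put a backslash before each listed special character (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_query_term("where?"))    # where\?
print(escape_query_term("c:\\temp"))  # c\:\\temp
```

Do not pass a string through this helper if it deliberately uses query operators such as the asterisk wildcard, because the helper would turn those operators into literal characters.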
Escaping special characters that do not serve a special function in the query syntax is optional. The following table shows some examples of special characters that do not require escaping.
| Special character | Notes |
|---|---|
| Comma (,) | |
| Dollar sign ($) | |
| Percentage (%) | |
| Period (.) | In xmlxp expressions, a period is used to search the content of elements. |
| Pound sign (#) | |
| Underscore (_) | |