Wildcard and Regular Expression Support in Watson Explorer Engine Queries
Search platforms such as Watson™ Explorer Engine offer language-specific techniques that analyze and expand your queries so that they match a wider range of relevant results. These techniques include depluralizing (removing the suffixes that distinguish plural terms from singular ones), delanguaging (normalizing Japanese writing systems and removing language-specific diacritical marks), and stemming (matching search results that share the same base word as the terms in your queries).
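As a rough illustration of two of these techniques, the following Python sketch strips diacritical marks by decomposing Unicode characters and dropping the combining marks, and applies a few toy English plural rules. The function names and rules are hypothetical and far simpler than Watson Explorer Engine's own linguistic processing; the sketch does not attempt Japanese normalization or stemming.

```python
import unicodedata


def strip_diacritics(term: str) -> str:
    """Remove diacritical marks, e.g. 'café' -> 'cafe' (illustrative only)."""
    # NFKD splits an accented character into a base character plus
    # combining marks (e.g. 'é' -> 'e' + U+0301), which are then dropped.
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_depluralize(term: str) -> str:
    """Hypothetical toy depluralizer; real ones rely on dictionaries and exceptions."""
    if term.endswith("ies") and len(term) > 4:
        return term[:-3] + "y"            # queries -> query
    if term.endswith(("ches", "shes", "xes", "sses")):
        return term[:-2]                  # churches -> church, boxes -> box
    if term.endswith("s") and not term.endswith("ss"):
        return term[:-1]                  # dogs -> dog (but glass stays glass)
    return term


if __name__ == "__main__":
    for word in ["café", "naïve", "queries", "boxes", "dogs", "glass"]:
        print(word, strip_diacritics(word), naive_depluralize(word))
```

For example, `strip_diacritics("café")` returns `"cafe"` and `naive_depluralize("queries")` returns `"query"`; both transformations let a query term match variants that differ only in accents or number.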
In addition to using these types of linguistic and character analysis, Watson Explorer Engine provides further flexibility by enabling you to search for terms by specifying a pattern to match rather than a literal term. These techniques, generally referred to as term expansion, are the following:
- wildcards, which are special characters (the question mark (?) and the asterisk or star (*)) that stand in for unspecified characters in a term. The '?' matches any single character, while the '*' matches any sequence of zero or more characters.
- regular expressions, which provide a more powerful mechanism for pattern matching by enabling you to restrict pattern matches to specific character values, specific ranges and numbers of characters, specific character positions within a term, and so on. Watson Explorer Engine supports two standard forms of regular expressions:
When using regular expressions in a Watson Explorer Engine search, you must enclose each regular expression within the default regex operators 'm/' and '/', which mark its beginning and end. Watson Explorer Engine uses these operators to distinguish regular expressions from wildcard terms. For example, because '?' in a regular expression makes the preceding character optional, the regular expression query 'm/too?/' matches the words 'to' and 'too', as well as the substring 'too' in words such as 'tool', 'tooth', and 'campstool'. In contrast, the wildcard query 'too?' matches only four-character words that begin with 'too', such as 'tool' and 'toot'.
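The difference between the two kinds of term expansion can be modeled in a short Python sketch. This is not Watson Explorer Engine's implementation: the helper names are hypothetical, wildcard terms are assumed to match whole terms only, and regular expressions enclosed in 'm/' and '/' are assumed to match anywhere within a term, which is consistent with the 'm/too?/' example.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard term into an equivalent compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts))


def parse_query_term(term: str) -> tuple:
    """Hypothetical parser: 'm/.../' marks a regex; anything else is a wildcard term."""
    if len(term) > 3 and term.startswith("m/") and term.endswith("/"):
        return re.compile(term[2:-1]), "regex"
    return wildcard_to_regex(term), "wildcard"


def matching_terms(query: str, vocabulary: list) -> list:
    """Return the vocabulary terms matched by a wildcard or 'm/.../' query term."""
    pattern, kind = parse_query_term(query)
    if kind == "regex":
        # Assumption: a regular expression may match any substring of a term.
        return [t for t in vocabulary if pattern.search(t)]
    # Assumption: a wildcard pattern must match the entire term.
    return [t for t in vocabulary if pattern.fullmatch(t)]


if __name__ == "__main__":
    vocab = ["to", "too", "tool", "toot", "took", "tooth", "campstool", "tea"]
    print(matching_terms("m/too?/", vocab))  # regex: the second 'o' is optional
    print(matching_terms("too?", vocab))     # wildcard: 'too' plus exactly one character
```

With this vocabulary, the regex query 'm/too?/' matches 'to', 'too', 'tool', 'toot', 'took', 'tooth', and 'campstool', while the wildcard query 'too?' matches only 'tool', 'toot', and 'took'. Translating wildcards into anchored regular expressions is one common way to implement them, chosen here because it makes the relationship between the two syntaxes explicit.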