Watson Explorer Engine-Related Query Syntax

The table in the previous section shows the standard keywords that Watson Explorer Engine queries support. The following table lists additional keywords that are specific to the search and indexing operations of the Watson Explorer Engine search engine. This search engine manipulates ranges of text (more precisely, non-nested ranges of text). For example, A AND B corresponds to all the shortest ranges of text that contain both A and B, and A BEFORE B corresponds to all the shortest ranges that start with A and end with B. For more information on the technology underlying the Watson Explorer Engine search engine, see the following papers from the University of Waterloo: mathematical structures and data structures.
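To make the "shortest ranges" idea concrete, here is a minimal Python sketch. It assumes a naive whitespace tokenizer and treats each occurrence of a word as a one-token range identified by (start, end) token positions. The helper names (`term`, `minimize`, `and_`, `before`) are hypothetical: this models the behavior described above and is not the engine's implementation or API.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # (start, end) token positions, both inclusive


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term(tokens: List[str], word: str) -> List[Range]:
    """Every occurrence of a word is a one-token range."""
    return [(i, i) for i, tok in enumerate(tokens) if tok == word]


def minimize(ranges: Iterable[Range]) -> List[Range]:
    """Drop every range that contains another range, keeping only the
    shortest, non-nested ranges."""
    unique = set(ranges)
    kept = [
        r for r in unique
        if not any(o != r and r[0] <= o[0] and o[1] <= r[1] for o in unique)
    ]
    return sorted(kept)


def and_(a: List[Range], b: List[Range]) -> List[Range]:
    """A AND B: shortest ranges covering one range of A and one range of B."""
    return minimize((min(x[0], y[0]), max(x[1], y[1])) for x in a for y in b)


def before(a: List[Range], b: List[Range]) -> List[Range]:
    """A BEFORE B: shortest ranges that start with A and end with a later B."""
    return minimize((x[0], y[1]) for x in a for y in b if x[1] < y[0])


tokens = tokenize("enterprise search enterprise clustering for enterprise")
ent = term(tokens, "enterprise")
clu = term(tokens, "clustering")

print(and_(ent, clu))    # [(2, 3), (3, 5)]
print(before(ent, clu))  # [(2, 3)]
```

The first occurrence of enterprise yields the candidate range (0, 3), which is discarded because it contains the shorter range (2, 3). The last occurrence survives in the AND result, since AND ignores order, but not in the BEFORE result.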

Unlike the standard operators, the following operators can only be fully understood in terms of manipulating ranges of text. They are mainly intended for advanced users and for administrators who want to test the content of a collection.

Table 1. Watson Explorer Engine-Related Query Syntax
Keyword Definition Canonical XML Representation
CONTAINING

Selects the ranges specified by an expression that contain at least one of the ranges specified by a subsequent expression. For example, (document THRU clustering) CONTAINING enterprise corresponds to the minimal ranges that start with document, end with clustering, and contain enterprise; CONTENT title CONTAINING enterprise corresponds to the title contents that contain enterprise.

Because Watson Explorer Engine operates on non-nested (that is, minimal) ranges of text, (document THRU clustering) CONTAINING enterprise is not the same as document THRU enterprise THRU clustering. A sequence such as "document clustering as done in enterprise search software provides an excellent example of clustering" is matched only by the second expression, not by the first. In the first query, document THRU clustering matches the minimal range "document clustering" rather than the whole sequence, and that range does not contain enterprise. In the second query, document THRU enterprise matches the range "document clustering as done in enterprise", which THRU clustering then extends to cover the entire sequence.

<operator logic="containing">
  ...
</operator>
CONTENT

Selects the ranges of text that correspond to a specific content. For example, (document AND clustering) WITHIN CONTENT title finds occurrences of the terms document and clustering inside the content named title. Unlike the field: operator, this operator involves no "field" mapping; it matches the original content name exactly as it is indexed.

<operator logic="content">
  <term str="title" field="query"/>
</operator>
NOTCONTAINING

The opposite of CONTAINING: selects the ranges specified by an expression that do not contain any of the ranges specified by a subsequent expression.

<operator logic="not-containing">
  ...
</operator>
NOTWITHIN

The opposite of WITHIN: the specified terms or expressions must not be matched within any of the ranges specified by a subsequent expression.

<operator logic="not-within">
  ...
</operator>
WITHIN

The specified terms or expressions must be matched within the ranges specified by a subsequent expression. For example, document THRU clustering WITHIN 4 WORDS finds all sequences that start with document, end with clustering, and fit within a range of 4 words.

<operator logic="within">
  ...
</operator>
N WORDS

A unary operator that takes an integer argument and returns all sequences of words of the specified length. For example, 5 WORDS THRU clustering CONTAINING enterprise returns all sequences of 5 words that end with the word clustering and contain the word enterprise.

This operator illustrates the expressiveness of the Watson Explorer Engine syntax: it lets you add a proximity constraint to any expression, for example (document THRU (clustering OR cluster) AND vivisimo) WITHIN 15 WORDS.

<operator logic="words">
  <term str="4" field="query"/>
</operator>
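The CONTAINING, NOTCONTAINING, WITHIN, NOTWITHIN and N WORDS entries, including the THRU comparison in the CONTAINING entry, can be checked with a similar minimal Python sketch. It assumes, hypothetically, that THRU yields the shortest ranges that start with the left operand and end with a strictly later match of the right operand; whether the engine lets the two operands overlap is not stated in this section. CONTENT is not modeled, and all function names are hypothetical illustrations, not the engine's API.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # (start, end) token positions, both inclusive


def term(tokens: List[str], word: str) -> List[Range]:
    """Each occurrence of a word is a one-token range."""
    return [(i, i) for i, tok in enumerate(tokens) if tok == word]


def minimize(ranges: Iterable[Range]) -> List[Range]:
    """Keep only ranges that do not contain another range (non-nested)."""
    unique = set(ranges)
    return sorted(r for r in unique
                  if not any(o != r and r[0] <= o[0] and o[1] <= r[1]
                             for o in unique))


def thru(a: List[Range], b: List[Range]) -> List[Range]:
    """Assumed THRU: shortest ranges starting with A, ending with a later B."""
    return minimize((x[0], y[1]) for x in a for y in b if x[1] < y[0])


def encloses(outer: Range, inner: Range) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containing(a: List[Range], b: List[Range]) -> List[Range]:
    """Ranges of A that contain at least one range of B."""
    return [x for x in a if any(encloses(x, y) for y in b)]


def not_containing(a: List[Range], b: List[Range]) -> List[Range]:
    """Ranges of A that contain no range of B."""
    return [x for x in a if not any(encloses(x, y) for y in b)]


def within(a: List[Range], b: List[Range]) -> List[Range]:
    """Ranges of A that lie inside at least one range of B."""
    return [x for x in a if any(encloses(y, x) for y in b)]


def not_within(a: List[Range], b: List[Range]) -> List[Range]:
    """Ranges of A that lie inside no range of B."""
    return [x for x in a if not any(encloses(y, x) for y in b)]


def words(tokens: List[str], n: int) -> List[Range]:
    """N WORDS: every window of n consecutive tokens."""
    return [(i, i + n - 1) for i in range(len(tokens) - n + 1)]


tokens = ("document clustering as done in enterprise search software "
          "provides an excellent example of clustering").split()
doc = term(tokens, "document")
clu = term(tokens, "clustering")
ent = term(tokens, "enterprise")

print(thru(doc, clu))                       # [(0, 1)], "document clustering"
print(containing(thru(doc, clu), ent))      # []
print(not_containing(thru(doc, clu), ent))  # [(0, 1)]
print(thru(thru(doc, ent), clu))            # [(0, 13)], the whole sentence
print(within(thru(doc, clu), words(tokens, 4)))                 # [(0, 1)]
print(not_within(thru(thru(doc, ent), clu), words(tokens, 4)))  # [(0, 13)]
```

Under these assumptions, only the chained THRU query reaches enterprise, which agrees with the explanation in the CONTAINING entry. Because CONTAINING and WITHIN only filter their left operand, their results stay non-nested without another minimization step.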