Automatically Removing Stopwords from End-User Queries

Note: Pre-defined stopword lists are only available for English and Spanish.

Stopwords are function words, such as the, a, and from, that carry little content of their own and can therefore be safely discarded, allowing Watson™ Explorer Engine to focus on the content words of end-user queries.

For example, with query stopword removal enabled, the query "a bridge above the Allegheny river" is automatically translated into "bridge above Allegheny river".

To configure your Watson Explorer Engine project to remove stopwords from end-user queries automatically, go to the project's Simple tab and set Enable query stopword removal to true in the Language section. If English or Spanish is specified as the main language or as one of the secondary languages, setting language.stopword-removal-enabled to true removes common stopwords from the query string, unless double quotes are used.

To preserve a specific stopword in the query, end users can add the MUST operator (+) before each stopword that they want to keep. For example, if end users want the search engine to keep ‘the’ in the query rage against the machine, they would type rage against +the machine. This is a useful tip to add to your application's advanced search link.
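
The behavior described above can be illustrated with a minimal Python sketch. It is not the Watson Explorer Engine implementation: the function name remove_stopwords, the three-word STOPWORDS set, and the tokenizing pattern are all hypothetical and exist only for illustration. The sketch also assumes that "unless double quotes are used" means that double-quoted phrases are passed through untouched.

```python
import re

# Hypothetical, tiny stopword set for illustration only; the product's
# actual lists are in english-stopwords.xml and spanish-stopwords.xml.
STOPWORDS = {"a", "the", "from"}

# A token is either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def remove_stopwords(query: str) -> str:
    """Drop stopwords unless they are quoted or marked with the + operator."""
    kept = []
    for token in TOKEN.findall(query):
        if token.startswith('"') or token.startswith("+"):
            # Assumption: quoted phrases and +terms are kept exactly as typed.
            kept.append(token)
        elif token.lower() not in STOPWORDS:
            kept.append(token)
    return " ".join(kept)


print(remove_stopwords("a bridge above the Allegheny river"))
print(remove_stopwords("rage against +the machine"))
```

Under these assumptions, the first call drops a and the, while the second keeps +the because of the MUST operator.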

The list of query stopwords can be found in the installation directory's /data/key-match subdirectory, in the files english-stopwords.xml and spanish-stopwords.xml.

For information about stopwords used to improve clustering quality, see the Clustering Stopwords section.