Stemmed search

Stemmed searches are a good way to search for words with the same word stem and regular endings.

Restriction: Stemmed searches are supported only in English.

Searching for the stemmed form of a term means reducing the term to its word stem and then searching on the word stem (also known as the base word). For example, searching for the word grows as a stemmed search returns content with the words grow, grows, and growing, but not growth, grown, or grew.
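The grow example can be reproduced with a small sketch. This is a toy illustration only, not the algorithm used by the search engine: the names `toy_stem` and `stemmed_match` are hypothetical, and the rule set is assumed to cover just two regular endings.

```python
# Toy illustration only: strips two regular English endings.
# This is NOT the stemmer used by the product; names and rules are assumptions.
REGULAR_ENDINGS = ("ing", "s")


def toy_stem(word: str) -> str:
    """Reduce a word to a crude stem by removing one regular ending."""
    word = word.lower()
    for ending in REGULAR_ENDINGS:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word


def stemmed_match(term: str, candidates: list[str]) -> list[str]:
    """Return the candidates whose stem equals the stem of the search term."""
    stem = toy_stem(term)
    return [w for w in candidates if toy_stem(w) == stem]


words = ["grow", "grows", "growing", "growth", "grown", "grew"]
matches = stemmed_match("grows", words)
print(matches)
```

Irregular forms such as grew and derived words such as growth do not share the stem grow under these rules, so they are not matched.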

Stemmed search applies to all single terms in the search with the following exceptions:
  • Terms marked for fuzzy search
  • Terms that contain wildcard characters
  • Phrases (text surrounded by double quotation marks)
  • Same-sentence searches

For example, if you specify election OR nomination OR president~ OR hold* OR (King Lear) WITHIN SENTENCE as the search terms and then choose to perform a stemmed search, stemming applies only to the terms election and nomination.
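The exceptions can be expressed as a simple eligibility check. The following is a minimal sketch under stated assumptions: the function name `stemming_applies` is hypothetical, splitting the query on " OR " is a simplification rather than the real query parser, and only the `*` wildcard and the `~` fuzzy marker shown in the example are recognized.

```python
# Sketch only: mirrors the documented exceptions, not the real query parser.
QUERY = "election OR nomination OR president~ OR hold* OR (King Lear) WITHIN SENTENCE"


def stemming_applies(term: str) -> bool:
    """Return True if a term would be eligible for stemming."""
    term = term.strip()
    if term.endswith("~"):  # term marked for fuzzy search
        return False
    if "*" in term:  # term containing a wildcard (only * is assumed here)
        return False
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        return False  # phrase in double quotation marks
    if term.upper().endswith("WITHIN SENTENCE"):  # same-sentence search
        return False
    return True


# Assumption: the terms are separated by the OR operator only.
terms = QUERY.split(" OR ")
stemmed_terms = [t for t in terms if stemming_applies(t)]
print(stemmed_terms)
```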

Tip: To find the widest range of related words, run both a stemmed search and a fuzzy search. For example, a stemmed search for grow returns content that contains grow, grows, and growing, while a fuzzy search for grow returns content that contains grew, grown, and growth, plus some other words that are unrelated in meaning.

Support for stemmed searches is provided by DB2® Net Search Extender. For complete information about stemmed searches, see the Net Search Extender Administration and User's Guide.

On the IBM® FileNet® P8 platform, lemmatization is used instead of stemming for searching across content. Lemmatization is the algorithmic determination of a word's lemma (the base form of a word). The main difference between lemmatization and stemming is that lemmatization operates not just on a word, but also on its context. For example:
  • The word "better" has "good" as its lemma. This link is missed by stemming, because it requires a dictionary look-up.
  • The word "walk" is the base form of the word "walking", and so "walk" is matched in both stemming and lemmatization.
  • The word "versioning" can be either the base form of a noun or a form of a verb (meaning to version), depending on the context. Lemmatization can determine the correct lemma for "versioning" based on context. For example, in the sentence, "The versioning support in this product is fantastic," the lemmatization algorithm selects the noun form of "versioning" and identifies the lemma as "versioning", which is the original search token.
Restriction: Lemmatization is not used in IBM FileNet P8 environments with IBM Legacy Content Search Engine.
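
The three lemmatization examples above can be contrasted with stemming in a short sketch. Everything here is hypothetical and for illustration only: `LEMMA_TABLE`, `toy_lemma`, and `toy_stem` are invented names, the table stands in for a real dictionary, and the part of speech is passed in by the caller, whereas an actual lemmatizer would infer it from the surrounding sentence.

```python
# Illustration only: a hand-built table standing in for a real dictionary.
# Keys are (word, part of speech); the part of speech is supplied by the
# caller here, whereas a real lemmatizer would derive it from context.
LEMMA_TABLE = {
    ("better", "adjective"): "good",
    ("walking", "verb"): "walk",
    ("versioning", "noun"): "versioning",
    ("versioning", "verb"): "version",
}


def toy_lemma(word: str, part_of_speech: str) -> str:
    """Look up the lemma; fall back to the lowercased word if unknown."""
    key = (word.lower(), part_of_speech)
    return LEMMA_TABLE.get(key, word.lower())


def toy_stem(word: str) -> str:
    """Crude stemmer: strips a trailing 'ing' and ignores context."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    return word


for word, pos in [("better", "adjective"), ("walking", "verb"),
                  ("versioning", "noun"), ("versioning", "verb")]:
    print(word, pos, "stem:", toy_stem(word), "lemma:", toy_lemma(word, pos))
```

In this sketch the stemmer always reduces "versioning" to "version" and never links "better" to "good", while the lookup returns a different lemma for "versioning" depending on the part of speech it is given.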