Advanced search tips

You can change the way that content is searched by using a fuzzy search, same-sentence search, or stemmed search. Advanced searching is relevant only for text index searches. Text index searches are performed on content that has a full-text index. Ask your eDiscovery administrator whether you can use advanced search techniques on the content that you are searching.

Restriction: Advanced searching is not supported for content that is archived with IBM® FileNet® Email Manager and that is stored in an IBM FileNet P8 server.

Fuzzy search

A fuzzy search returns words that are spelled in a similar way to the search term. The words might or might not be related to each other. Fuzzy searches are especially useful when you have content that might contain misspelled words.

A fuzzy search takes the following form:

Term~n

where Term is a search word and n is a similarity value that is greater than 0.0 and less than 1.0.

Examples of fuzzy searches:
Lear~0.7
Searches for fuzzy matches to Lear with a similarity value of 0.7
Lear~0.5
Searches for fuzzy matches to Lear with a similarity value of 0.5
King AND Lear~0.5
Searches for exact matches to King and fuzzy matches to Lear
Lear~0.5 NOT lean
Searches for a fuzzy match to Lear but does not return matches for the word lean, which might be a fuzzy match to Lear

Note: Due to Verity syntax limitations, the similarity value is not supported in IBM FileNet P8 environments with Content Search Engine. If you specify a similarity value, it is ignored and has no effect on the search.
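
The Term~n form described above can be assembled programmatically. The following Python sketch is illustrative only: the function name fuzzy_term is hypothetical and is not part of any IBM API. It checks only the documented rule that n is greater than 0.0 and less than 1.0, and returns the text that you would type into the search field.

```python
def fuzzy_term(term: str, similarity: float) -> str:
    """Return a fuzzy search term in the documented form Term~n.

    Hypothetical helper: it only builds the query text, it does not search.
    """
    if not term or any(ch.isspace() for ch in term):
        raise ValueError("term must be a single non-empty word")
    # The similarity value must be strictly between 0.0 and 1.0.
    if not 0.0 < similarity < 1.0:
        raise ValueError("similarity must be greater than 0.0 and less than 1.0")
    return f"{term}~{similarity}"


print(fuzzy_term("Lear", 0.7))                       # Lear~0.7
print(f"King AND {fuzzy_term('Lear', 0.5)}")         # King AND Lear~0.5
print(f"{fuzzy_term('Lear', 0.5)} NOT lean")         # Lear~0.5 NOT lean
```

Rejecting out-of-range values before the query is submitted avoids relying on how a particular search engine handles them.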

Same-sentence search

Same-sentence searches, also known as proximity searches, are useful when you believe that two words might not always occur in the same order, but usually occur within the same sentence.

Same-sentence searching is not supported in IBM FileNet P8 environments with IBM Content Search Services. Attempts to perform same-sentence searches in this environment return no results.

For example, if you did a same-sentence search for King Lear, any content that contained either of the following sentences would be returned:
King Lear was the most tragic character in all of Shakespeare's plays.
Lear is the most tragic king of all of Shakespeare's characters.
Express a same-sentence search as:
(Term1 Term2) WITHIN SENTENCE
where Term1 and Term2 are the two words that you want to appear in the same sentence. You can specify only two terms in a same-sentence search. The two terms must be inside parentheses and must be followed by WITHIN SENTENCE. For example, if you want the terms King and Lear to occur in the same sentence, enter:
("King" "Lear") WITHIN SENTENCE
If you want the terms Cordelia and King Lear to occur in the same sentence, enter:
("Cordelia" "King Lear") WITHIN SENTENCE
You can also combine same-sentence search terms with other search terms, for example:
(("King" "Louis") WITHIN SENTENCE) NOT nomination
The same-sentence search is performed on ("King" "Louis") WITHIN SENTENCE, and the results include content that contains the words king and louis in the same sentence but does not contain the word nomination.
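
The syntax rules above (exactly two terms, each in double quotation marks, inside parentheses, followed by WITHIN SENTENCE) can be captured in a small helper. This is a minimal Python sketch under stated assumptions: the function name same_sentence is hypothetical, and it only builds the query text.

```python
def same_sentence(term1: str, term2: str) -> str:
    """Build a same-sentence search from exactly two terms.

    Hypothetical helper: each term is wrapped in double quotation marks,
    as in the documented examples, so a term can also be a phrase.
    """
    for term in (term1, term2):
        if not term.strip():
            raise ValueError("both terms must be non-empty")
        if '"' in term:
            raise ValueError("terms must not contain double quotation marks")
    return f'("{term1}" "{term2}") WITHIN SENTENCE'


print(same_sentence("King", "Lear"))
# ("King" "Lear") WITHIN SENTENCE
print(same_sentence("Cordelia", "King Lear"))
# ("Cordelia" "King Lear") WITHIN SENTENCE
print(f"({same_sentence('King', 'Louis')}) NOT nomination")
# (("King" "Louis") WITHIN SENTENCE) NOT nomination
```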

Support for same-sentence search is provided by DB2® Net Search Extender, where this feature is sometimes also called proximity search. For more information about how DB2 Net Search Extender defines the end of a sentence, see the Paragraphs section of the Tokenization topic.

Same-sentence searching is not supported for content that is archived with IBM FileNet Email Manager and that is stored in an IBM FileNet P8 server.

Stemmed search

Stemmed searches are a good way to search for words with the same word stem and regular endings.

Restriction: Stemmed searches are supported only in English.

Searching for the stemmed form of a term means reducing the term to its word stem and then searching on the word stem (also known as the base word). For example, searching for the word grows as a stemmed search returns content with the words grow, grows, and growing, but not growth, grown, or grew.
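
To illustrate the idea of matching on a word stem, the following Python sketch strips two regular endings and compares the results. It is a toy example only: the function toy_stem and its suffix list are assumptions made for illustration and do not reproduce the stemming algorithm that the search engine uses.

```python
def toy_stem(word: str) -> str:
    """Reduce a word to a crude stem by removing a regular ending.

    Toy illustration only: a real stemmer handles many more endings
    and exceptions than the two suffixes checked here.
    """
    word = word.lower()
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


words = ["grow", "grows", "growing", "growth", "grown", "grew"]
query_stem = toy_stem("grows")
matches = [w for w in words if toy_stem(w) == query_stem]
print(matches)  # ['grow', 'grows', 'growing']
```

Irregular forms such as grew and derived words such as growth do not share the stripped stem, which is why they are not returned.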

Stemmed search applies to all single terms in the search with the following exceptions:
  • Terms marked for fuzzy search
  • Terms that contain wildcard characters
  • Phrases (text surrounded by double quotation marks)
  • Same-sentence searches

For example, if you specify election OR nomination OR president~ OR hold* OR (King Lear) WITHIN SENTENCE as the search terms and then choose to perform a stemmed search, stemming is applied only to the terms election and nomination.
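
The exceptions listed above can be expressed as a simple filter. The following Python sketch assumes that the search has already been split into individual terms. The function name is_stemmable is hypothetical, and the asterisk is the only wildcard character that it checks because it is the only one shown in this topic.

```python
def is_stemmable(term: str) -> bool:
    """Return True if a stemmed search would apply to this term.

    Hypothetical helper that mirrors the documented exceptions.
    """
    term = term.strip()
    if "~" in term:                       # marked for fuzzy search
        return False
    if "*" in term:                       # contains a wildcard character
        return False
    if term.startswith('"') and term.endswith('"'):   # phrase
        return False
    if term.upper().endswith("WITHIN SENTENCE"):      # same-sentence search
        return False
    return True


terms = [
    "election",
    "nomination",
    "president~",
    "hold*",
    "(King Lear) WITHIN SENTENCE",
]
print([t for t in terms if is_stemmable(t)])  # ['election', 'nomination']
```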

Tip: To find the most related words, do both a stemmed search and a fuzzy search. For example, a stemmed search for grow returns content that contains grow, grows, and growing, while a fuzzy search for grow returns content that contains grew, grown, and growth, plus some other words that are unrelated in meaning.

Support for stemmed searches is provided by DB2 Net Search Extender. For complete information about stemmed searches, see the Net Search Extender Administration and User's Guide.

In IBM FileNet P8 environments that use IBM Content Search Services, lemmatisation is used instead of stemming for searching across content. Lemmatisation is the algorithmic determination of a word's lemma (the base form of a word). The main difference between lemmatisation and stemming is that lemmatisation operates not just on a word, but also on its context. For example:
  • The word "better" has "good" as its lemma. This link is missed by stemming, because it requires a dictionary look-up.
  • The word "walk" is the base form of the word "walking", and so "walk" is matched in both stemming and lemmatisation.
  • The word "versioning" can be either the base form of a noun or a form of a verb (meaning to version), depending on the context. Lemmatisation can determine the correct lemma for "versioning" based on context. For example, in the sentence, "The versioning support in this product is fantastic," the lemmatisation algorithm would select the noun form of "versioning" and identify the lemma as "versioning", which is the original search token.

Search across a range of values in integer fields

The following examples show how queries can be constructed:
TIEFLAG: 10000
TIEFLAG: =10000
TIEFLAG: <10000
TIEFLAG: <>10000
TIEFLAG: >=10000 AND <=20000
TIEFLAG: >=10000 AND <=20000 OR =15000
TIEFLAG: !=5000 AND (>20000 OR <10000) AND !=25000
The value that follows the field name takes the following general form:
relational_operator integer [boolean_operator relational_operator integer] [boolean_operator relational_operator integer] ...
where:

relational_operator can be >, <, >=, <=, =, != or <>

boolean_operator can either be AND or OR

The implicit order of operator precedence is AND, followed by OR. Parentheses can be used to override the implicit order.
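
The precedence rule can be checked with a small evaluator. The following Python sketch parses only the expression part that follows the field name and tests whether a given integer satisfies it. It is an illustration under stated assumptions, not the product's parser: the function name matches is hypothetical, and a bare integer such as 10000 is assumed to mean equality.

```python
import operator
import re

TOKEN = re.compile(r">=|<=|<>|!=|=|<|>|\(|\)|AND|OR|-?\d+")
OPS = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge,
    "<=": operator.le, "=": operator.eq, "!=": operator.ne, "<>": operator.ne,
}


def matches(expression: str, value: int) -> bool:
    """Return True if value satisfies the range expression (hypothetical helper)."""
    tokens = TOKEN.findall(expression)
    if "".join(tokens) != "".join(expression.split()):
        raise ValueError(f"unrecognized text in expression: {expression!r}")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or():
        # OR has the lowest precedence, so it combines whole AND groups.
        result = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        result = parse_atom()
        while peek() == "AND":
            take()
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        tok = take()
        if tok == "(":
            # Parentheses override the implicit precedence.
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        compare = operator.eq  # assumption: a bare integer means equality
        if tok in OPS:
            compare = OPS[tok]
            tok = take()
        return compare(value, int(tok))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected text after expression")
    return result


print(matches(">=10000 AND <=20000", 15000))                        # True
print(matches("!=5000 AND (>20000 OR <10000) AND !=25000", 25000))  # False
print(matches(">=10000 AND <=20000 OR =5", 5))                      # True
```

In the last call, AND binds first, so the expression is read as (>=10000 AND <=20000) OR =5, which the value 5 satisfies.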