Search parameters
This topic describes the parameters that you can use when searching, with a description of each parameter.
Parameters
- RESULT LIMIT number
- A keyword specifying the maximum number of results to be returned
by the full-text search.
The RESULT LIMIT should be used together with the SCORE function to ensure that the returned results are scored and only the best matching results are processed.
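RESULT LIMIT is applied by the search engine itself. The following Python sketch only models the idea behind pairing it with SCORE: rank the matches by score, then keep the best ones. The helper name `apply_result_limit` and the (document ID, score) pairs are hypothetical and are not part of any product API.

```python
import heapq

def apply_result_limit(scored_docs, limit):
    """Keep at most `limit` (doc_id, score) pairs, highest score first.

    Hypothetical helper that models RESULT LIMIT combined with SCORE.
    """
    return heapq.nlargest(limit, scored_docs, key=lambda pair: pair[1])

hits = [("doc1", 0.42), ("doc2", 0.91), ("doc3", 0.15), ("doc4", 0.77)]
print(apply_result_limit(hits, 2))  # [('doc2', 0.91), ('doc4', 0.77)]
```

Without scores, a limit could only keep an arbitrary subset of the matches, which is why the limit is most useful together with SCORE.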
- EXPANSION LIMIT number
- A keyword specifying the maximum number of terms that a wildcard term can be expanded to for searching, for example, how many terms the search term 'a*' can be expanded to. If your index is very large and you are using many wildcard terms, you must increase the value of this keyword if you want to obtain a larger result set. The expansion order depends on the internal organization of the text index and cannot be predetermined. If your wildcard expression is too general and can be expanded into more search terms than specified by EXPANSION LIMIT, the search returns an error indicating that the search result has been truncated because this limit was exhausted.
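As a rough illustration of the limit, the sketch below expands a trailing wildcard against a small in-memory vocabulary and raises an error when the number of candidate terms exceeds the limit. The vocabulary, the exception class, and the function name are all hypothetical; the real engine works on its own index structures and, as noted above, its expansion order cannot be predetermined.

```python
class ExpansionLimitExceeded(Exception):
    """Raised when a wildcard term expands to more terms than allowed."""

def expand_wildcard(prefix, vocabulary, expansion_limit):
    """Expand a trailing-wildcard term such as 'a*' against a vocabulary.

    Terms are returned in vocabulary order; the real engine's order depends
    on its internal index organization.
    """
    matches = [term for term in vocabulary if term.startswith(prefix)]
    if len(matches) > expansion_limit:
        raise ExpansionLimitExceeded(
            f"'{prefix}*' expands to {len(matches)} terms; "
            f"EXPANSION LIMIT is {expansion_limit}"
        )
    return matches

vocab = ["apple", "apply", "apt", "axis", "banana"]
print(expand_wildcard("ap", vocab, 5))  # ['apple', 'apply', 'apt']
try:
    expand_wildcard("a", vocab, 3)  # 4 candidate terms, limit is 3
except ExpansionLimitExceeded as err:
    print("error:", err)
```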
- STOP SEARCH AFTER number DOCUMENT | DOCUMENTS
- A keyword specifying the search threshold. The search is stopped
when the given number of documents is reached during the search, and
an intermediate result is returned. A lower value will increase the
search performance, but may lead to fewer results and omit documents
with a potentially high rank.
Note that there is no default value and the number value must be a positive integer.
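The trade-off described above (faster searches, but possibly missing highly ranked documents) can be modeled with a simple scan that stops at the threshold. This is a hypothetical sketch: the document tuples, the matching predicate, and the assumption that documents are examined in a fixed order are illustrative only.

```python
def search_with_stop(documents, matches, stop_after):
    """Scan `documents` in order and return once `stop_after` matches are found.

    Hypothetical model of STOP SEARCH AFTER: documents after the threshold
    are never examined, whatever their rank would have been.
    """
    if not isinstance(stop_after, int) or stop_after < 1:
        raise ValueError("STOP SEARCH AFTER requires a positive integer")
    result = []
    for doc in documents:
        if matches(doc):
            result.append(doc)
            if len(result) == stop_after:
                break  # return the intermediate result
    return result

docs = [
    ("d1", "pilot training", 0.2),
    ("d2", "pilot license", 0.3),
    ("d3", "pilot pilot pilot", 0.9),  # best match, but scanned last
]
found = search_with_stop(docs, lambda d: "pilot" in d[1], stop_after=2)
print([doc_id for doc_id, _, _ in found])  # ['d1', 'd2']
```

In this model, document d3 has the highest rank but is omitted because the threshold is reached first.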
- boolean-search-expression
- The search-terms and search-factors can be combined using the
boolean operators NOT, AND, OR, ACCUM, and MINUS according to the
syntax diagrams. The operators have the following precedence order
(with the strongest first): NOT > MINUS = ACCUM = AND > OR.
This can be seen in the following example:
"Pilot" MINUS "passenger" & "vehicle" | "transport" & "public"
is evaluated as:
(("Pilot" MINUS "passenger") & ("vehicle")) | ("transport" & "public")
The operator ACCUM evaluates to true if one of the boolean arguments evaluates to true (which is comparable to the OR operator). The rank value is computed by accumulating the rank values of both operands. The ACCUM operator has the same binding (precedence) as AND. The operator MINUS evaluates to true if the left operand evaluates to true. The rank value is computed by taking the rank value of the left operand and subtracting a penalty if the right operand evaluates to true.
- search-primary
- A search-primary consisting of a text-literal-list evaluates to true if any of the text-literals is found in the (specified section of the) document. A search-primary consisting of a thesaurus-invocation evaluates to true if any of the expanded text-literals is found in the (specified section of the) document.
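The operator precedence described for boolean-search-expression (NOT > MINUS = ACCUM = AND > OR) can be checked with a small precedence-climbing parser. The sketch below is a hypothetical illustration, not the product's parser: it treats `&` and `|` as AND and OR, treats the binary operators as left-associative (an assumption), handles only quoted text literals and parentheses, and prints the fully parenthesized form of the example expression.

```python
import re

# Binding strength of the binary operators (higher binds tighter).
# NOT is unary and binds tighter than all of them.
BINARY_PRECEDENCE = {"OR": 1, "|": 1, "AND": 2, "&": 2, "ACCUM": 2, "MINUS": 2}
TOKEN_RE = re.compile(r'\s*("(?:[^"]|"")*"|\(|\)|&|\||[A-Za-z]+)')

def tokenize(text):
    """Split an expression into quoted literals, operators and parentheses."""
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Build a tree of (op, left, right), ("NOT", operand) and literal nodes."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def primary():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        pos += 1
        if tok == "NOT":
            return ("NOT", primary())  # NOT applies to the next primary only
        if tok == "(":
            node = expression(1)
            if peek() != ")":
                raise SyntaxError("missing closing parenthesis")
            pos += 1
            return node
        return tok  # a quoted text literal

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while peek() in BINARY_PRECEDENCE and BINARY_PRECEDENCE[peek()] >= min_prec:
            op = tokens[pos]
            pos += 1
            # +1 makes operators of equal strength group from the left.
            right = expression(BINARY_PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = expression(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

def to_text(node):
    """Render a parse tree with explicit parentheses."""
    if isinstance(node, str):
        return node
    if node[0] == "NOT":
        return f"NOT {to_text(node[1])}"
    op, left, right = node
    return f"({to_text(left)} {op} {to_text(right)})"

query = '"Pilot" MINUS "passenger" & "vehicle" | "transport" & "public"'
print(to_text(parse(tokenize(query))))
# ((("Pilot" MINUS "passenger") & "vehicle") | ("transport" & "public"))
```

The grouping matches the documented evaluation: MINUS and & bind before |.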
- SECTION | SECTIONS section-name
- A keyword specifying one or more sections in a structured document that the search is to be restricted to. The section name must be specified in a model file specified at index creation time or be expressed in XPath notation.
Section names are case sensitive. Ensure that the case of the section name in the model file and query is identical.
The model file describes the structure of documents that contain identifiable sections, so that the content of these sections can be searched individually. Section names cannot be masked using masking characters. The positive-search-factor using the SECTION clause evaluates to true if the search primary is found in one of the specified sections.
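The following Python sketch models a SECTION-restricted search primary. It assumes, for illustration only, that a document is a dictionary mapping section names to section text and that a text literal matches by plain substring search; the real engine tokenizes text and resolves sections from the model file or XPath names. Section names are compared exactly, so a name that differs only in case does not match.

```python
def matches_in_sections(document, search_primary, section_names):
    """Return True if any text literal occurs in one of the named sections.

    `document` maps section names to text (hypothetical representation);
    `search_primary` is a text-literal-list. Section names are case-sensitive.
    """
    for name in section_names:
        text = document.get(name)
        if text is not None and any(lit in text for lit in search_primary):
            return True
    return False

doc = {"Title": "Pilot training manual", "Body": "Public transport and vehicles"}
print(matches_in_sections(doc, ["pilot", "Pilot"], ["Title"]))  # True
print(matches_in_sections(doc, ["Pilot"], ["title"]))           # False: case differs
print(matches_in_sections(doc, ["transport"], ["Title"]))       # False: wrong section
```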
Section names are not XPath expressions that are evaluated during query execution. If no model file is used, the default section names are expressed in XPath notation: the absolute path expression to the element (such as /father/child/grandchild) is used as the name for identifying the section. Full XPath expressions are not supported as section names.
- context-argument IN SAME context-unit AS context-argument AND context-argument ...
- This condition lets you search for a combination of text-literals
occurring in the same paragraph or same sentence. Context arguments
are always equivalent to text-literal-lists, and thesaurus expansion
may be used to expand a text-literal to such a list. The condition evaluates to true if there is a context-unit (paragraph or sentence) in the document that contains at least one of the text-literals of each expanded context-argument. This can be seen in the following example:
("a","b") IN SAME PARAGRAPH AS ("c","d") AND THESAURUS "t1" EXPAND SYNONYM TERM OF "e"
Assuming e1 and e2 are synonyms of e, the following paragraphs would match:
".. a c e ..", ".. a c e1 ..", ".. a c e2 ..", ".. a d e ..", ".. a d e1 ..", ".. a d e2 ..", ".. b c e ..", ".. b c e1 ..", ".. b c e2 ..", ".. b d e ..", ".. b d e1 ..", ".. b d e2 .."
- PRECISE FORM OF
- A keyword that causes the word (or each word in the phrase) following PRECISE
FORM OF to be searched for exactly as typed. This form of
search is case-sensitive; that is, the use of upper- and lowercase
letters is significant. For example, if you search for
mice, you do not find "Mice". This parameter requires that the index configuration parameter Respect case is set to yes. This configuration setting cannot be changed after the index has been built.
- STEMMED FORM OF
- A keyword that causes the word (or each word in the phrase) following STEMMED
FORM OF to be reduced to its word stem before the search
is carried out. This form of search is not case-sensitive. For example,
if you search for
mouse, you find "Mouse". The way in which words are reduced to their stem form is language-dependent. Currently, only English stemming is supported, and the word must follow regular inflection endings.
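To contrast PRECISE FORM OF and STEMMED FORM OF, the sketch below compares words exactly for the precise form, and compares lowercase stems for the stemmed form. The `toy_stem` function is a deliberately crude, hypothetical suffix stripper for a few regular English endings; it is not the engine's stemmer, and like the documented behavior it does not handle irregular forms.

```python
def precise_match(query_word, doc_word):
    """PRECISE FORM OF: the word must match exactly as typed, including case."""
    return query_word == doc_word

def toy_stem(word):
    """Strip a few regular English inflection endings (illustrative only)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stemmed_match(query_word, doc_word):
    """STEMMED FORM OF: compare word stems, ignoring case."""
    return toy_stem(query_word) == toy_stem(doc_word)

print(precise_match("mice", "Mice"))     # False: case differs
print(stemmed_match("mouse", "Mouse"))   # True: case is ignored
print(stemmed_match("jumped", "Jumps"))  # True: both reduce to "jump"
print(stemmed_match("mice", "mouse"))    # False: irregular forms are not handled
```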
- FUZZY FORM OF
- A keyword for making a "fuzzy" search, which is a search for terms
that have a similar spelling to the search term. This is particularly
useful when searching in documents that were created by an Optical
Character Recognition (OCR) program. Such documents often include
misspelled words. For example, the word
economy could be recognized by an OCR program as econony. Note that successful matches are only returned for words in a document whose first three characters match those of the search term. In the previous example, ecanomy is not a match. Fuzzy search cannot be used if a word in the search atom contains a masking character.
- match-level
- An integer between 1 and 100 specifying the degree of similarity, where 100 is more similar than 1. 100 specifies an "exact match", and 60 is already considered a very "fuzzy value". The fuzzier the match level is, the longer the elapsed search time, since more documents qualify for the search. The default match level is 70.
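FUZZY FORM OF and its match-level can be approximated as follows. This is a stand-in, not the engine's algorithm: Python's `difflib.SequenceMatcher.ratio()` (scaled to 0 to 100) replaces the engine's own similarity measure, which is not described here. The range check, the default of 70, the first-three-characters rule, and the rejection of masking characters follow the text above.

```python
from difflib import SequenceMatcher

MASKING_CHARACTERS = ("_", "%")

def fuzzy_match(query_word, doc_word, match_level=70):
    """Toy model of FUZZY FORM OF with a match-level between 1 and 100.

    difflib's ratio stands in for the engine's similarity measure.
    """
    if not 1 <= match_level <= 100:
        raise ValueError("match-level must be between 1 and 100")
    if any(ch in query_word for ch in MASKING_CHARACTERS):
        raise ValueError("fuzzy search cannot be combined with masking characters")
    if query_word[:3] != doc_word[:3]:
        return False  # only words whose first three characters match qualify
    similarity = SequenceMatcher(None, query_word, doc_word).ratio() * 100
    return similarity >= match_level

print(fuzzy_match("economy", "econony"))                   # True
print(fuzzy_match("economy", "ecanomy"))                   # False: "eca" != "eco"
print(fuzzy_match("economy", "econony", match_level=100))  # False: not exact
```

Lowering the match level lets more candidate words qualify, which matches the note that fuzzier levels take longer.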
- WEIGHT number
- Associates a text-literal with a weight value to change the default score. The allowed weight values are integers between 0 (the lowest score weighting) and 1000 (the highest); the default value is 100.
- word-or-phrase
- A word or phrase to be searched for. The characters that can be
used within a word are language-dependent. It is also language-dependent
whether words need to be separated by separator characters. For English
and most other languages, each word in a phrase must be separated
by a blank character. To search for a character string that contains double quotation marks, type each double quotation mark twice. For example, to search for the text "wildcard" character, use:
"""wildcard"" character"
Note that in the example, it is only possible to search for one set of quotation marks. You cannot search for two quotation marks in a sequence. There is also a maximum length of 128 bytes for each word or phrase.
- Masking characters
- A word can contain the following masking characters:
- _ (underscore)
- Represents any single character.
- % (percent)
- Represents any number of arbitrary characters. If a word consists of a single %, then it represents an optional word of any length. A word cannot be composed exclusively of masking characters, except when a single % is used to represent an optional word. If you use a masking character, you cannot use the THESAURUS keyword. Masking characters cannot be used inside thesaurus query parts. If they are used in combination, search results are unpredictable. Masking characters cannot follow a non-alphanumeric character. Masking characters cannot be used inside a fuzzy search as masking always expands into a single word.
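The quoting and masking rules for word-or-phrase can be checked on the client side before a search argument is built. The helpers below are hypothetical. Two points are assumptions and are marked in the code: the 128-byte limit is measured on the unquoted text in UTF-8, and a masking character directly after another masking character is treated as allowed.

```python
MAX_WORD_OR_PHRASE_BYTES = 128
MASKING_CHARACTERS = {"_", "%"}

def quote_word_or_phrase(text, encoding="utf-8"):
    """Quote a word or phrase, doubling each embedded double quotation mark."""
    if '""' in text:
        raise ValueError("two quotation marks in a sequence cannot be searched for")
    # Assumption: the 128-byte limit applies to the unquoted text.
    if len(text.encode(encoding)) > MAX_WORD_OR_PHRASE_BYTES:
        raise ValueError("a word or phrase may be at most 128 bytes long")
    return '"' + text.replace('"', '""') + '"'

def check_masked_word(word):
    """Apply the documented restrictions on masking characters to one word."""
    if word == "%":
        return  # a single % stands for an optional word of any length
    if word and all(ch in MASKING_CHARACTERS for ch in word):
        raise ValueError(f"{word!r} consists only of masking characters")
    for prev, ch in zip(word, word[1:]):
        # Assumption: a masking character after another one is allowed.
        if ch in MASKING_CHARACTERS and not prev.isalnum() and prev not in MASKING_CHARACTERS:
            raise ValueError(f"masking character {ch!r} follows {prev!r}")

print(quote_word_or_phrase('"wildcard" character'))  # """wildcard"" character"
check_masked_word("mo_se")  # accepted
check_masked_word("%")      # accepted: optional word
```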
- ESCAPE escape-character
- A character that identifies the next character as one to be searched
for and not as one to be used as a masking character. For example,
if an escape-character is $, then $%, $_, and $$ represent %, _, and
$. Any % and _ characters not preceded by $ represent masking characters.
During search, you are only allowed to use single-byte escape characters. No double-byte characters are allowed.
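To show how an escape character changes the meaning of `%` and `_`, the sketch below translates a masked word into a Python regular expression: `_` becomes any single character, `%` becomes any run of characters, and a character preceded by the escape character is matched literally. Translating to regular expressions is only an illustration of the semantics, not how the engine evaluates masks. Checking "single-byte" via the UTF-8 encoding is an assumption; the relevant code page may differ.

```python
import re

def mask_to_regex(word, escape=None):
    """Translate a masked word into an anchored regular expression.

    '_' matches one character and '%' any number of characters; if `escape`
    is given, the character after it is taken literally.
    """
    # Assumption: "single-byte" is checked against the UTF-8 encoding.
    if escape is not None and len(escape.encode("utf-8")) != 1:
        raise ValueError("escape character must be a single-byte character")
    parts, i = [], 0
    while i < len(word):
        ch = word[i]
        if escape is not None and ch == escape:
            if i + 1 >= len(word):
                raise ValueError("escape character at end of word")
            parts.append(re.escape(word[i + 1]))  # escaped: match literally
            i += 2
            continue
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return "^" + "".join(parts) + "$"

print(bool(re.match(mask_to_regex("mo_se"), "mouse")))        # True
print(bool(re.match(mask_to_regex("100$%", "$"), "100%")))    # True
print(bool(re.match(mask_to_regex("100$%", "$"), "1000")))    # False
```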
- THESAURUS thesaurus-name
- A keyword used to specify the name of the thesaurus to be used
to expand a text-literal. The thesaurus name is the file name (without
its extension) of a thesaurus that has been compiled using the thesaurus
compiler. It must be located in
<os-dependent>/sqllib/db2ext/thes. Alternatively, the full path can be specified preceding the file name.
- EXPAND relation
- Specifies which relation is used to expand the text-literal using
the thesaurus. The thesaurus has predefined relations described in
the DB2EXTTH command. These are referred to using
the following keywords:
- SYNONYM, a symmetrical relationship expressing equivalence.
- RELATED, a symmetrical relationship expressing association.
- BROADER, a directed hierarchical relationship that can be followed by specified depth levels.
- NARROWER, a directed hierarchical relationship that can be followed by specified depth levels.
- RELATION(number), which corresponds to the relation definition in DB2EXTTH.
- TERM OF text-literal
- The text-literal to which other search terms are to be added from the thesaurus.
- count LEVELS
- A keyword used to specify the number of levels (the depth) of terms in the thesaurus that are to be used to expand the search term for the given relation. If you do not specify this keyword, a count of 1 is assumed. The value of depth must be a positive integer.
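Thesaurus expansion with EXPAND, TERM OF, and LEVELS can be pictured as a breadth-first walk along one relation, stopping at the requested depth. The in-memory `THESAURUS` dictionary and the function name are hypothetical stand-ins for a thesaurus compiled with DB2EXTTH; the result is the text-literal list that the search then uses, starting with the original term.

```python
from collections import deque

# Hypothetical thesaurus: (term, relation) -> directly related terms.
THESAURUS = {
    ("vehicle", "NARROWER"): ["car", "bus"],
    ("car", "NARROWER"): ["sedan", "convertible"],
    ("bus", "NARROWER"): ["minibus"],
}

def expand_term(term, relation, levels=1):
    """Return the text-literal list for TERM OF `term` expanded `levels` deep."""
    if not isinstance(levels, int) or levels < 1:
        raise ValueError("LEVELS must be a positive integer")
    result, seen = [term], {term}
    frontier = deque([(term, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == levels:
            continue  # do not follow the relation past the requested depth
        for related in THESAURUS.get((current, relation), []):
            if related not in seen:
                seen.add(related)
                result.append(related)
                frontier.append((related, depth + 1))
    return result

print(expand_term("vehicle", "NARROWER"))
# ['vehicle', 'car', 'bus']
print(expand_term("vehicle", "NARROWER", levels=2))
# ['vehicle', 'car', 'bus', 'sedan', 'convertible', 'minibus']
```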
- ATTRIBUTE attribute-name
- Searches for documents that have attributes matching the specified
condition. The attribute-name refers to the name of an attribute expression
in the
CREATE INDEX command, or to an attribute definition in the document model file. The attribute-factor is allowed for attributes of type double only. The precision of the value is guaranteed for 15 digits; numbers with 16 or more digits are rounded. Masking characters are not allowed in attribute-name, valueFrom, or valueTo. For an explanation, see the following:
- BETWEEN valueFrom AND valueTo
- A BETWEEN attribute factor evaluates to true if the value of the attribute is greater than (not equal to) valueFrom and smaller than (not equal to) valueTo.
- >valueFrom
- A ">" attribute factor evaluates to true if the value of the attribute is greater than (not equal to) valueFrom.
- <valueTo
- A "<" attribute factor evaluates to true if the value of the attribute is lower than (not equal to) valueTo.
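All three attribute factors use strict comparisons, so the boundary values themselves never match. The sketch below models them with Python floats (IEEE 754 doubles, which reliably preserve 15 significant decimal digits); the function names are hypothetical and only mirror the rules above.

```python
def attribute_between(value, value_from, value_to):
    """BETWEEN valueFrom AND valueTo: both bounds are exclusive."""
    return float(value_from) < float(value) < float(value_to)

def attribute_greater(value, value_from):
    """> valueFrom: strictly greater."""
    return float(value) > float(value_from)

def attribute_less(value, value_to):
    """< valueTo: strictly lower."""
    return float(value) < float(value_to)

print(attribute_between(10, 10, 20))  # False: the bounds themselves do not match
print(attribute_between(15, 10, 20))  # True
print(attribute_greater(10, 10))      # False
```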
CREATE INDEX command, the attribute name must be in uppercase.
- IS ABOUT language word-or-phrase
- An option that lets you specify a free-text search argument. Using IS
ABOUT, you can search for any (but not necessarily all)
of the words that you specify in word-or-phrase in any order in a
document. The closer together the terms used in word-or-phrase are
and the more terms that are included in a document, the higher the
returned score for the document.
The parameter language is optional and must be set only for Thai (TH_TH) where it is required for tokenization purposes, and for Turkish (TR_TR), where it is required for proper case mapping.
Note that IS ABOUT is useful only if document score values are requested and the search results are ordered by score values.
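The two scoring tendencies of IS ABOUT (more query terms found, and those terms closer together) can be illustrated with a toy score. The formula below is hypothetical and is not the engine's ranking: it multiplies the share of query terms present by how tightly they are grouped, using a sliding window to find the smallest run of words that contains every found term. Tokenization is a plain lowercase whitespace split.

```python
def toy_is_about_score(query, document):
    """Score a document for an IS ABOUT query (hypothetical formula)."""
    terms = set(query.lower().split())
    words = document.lower().split()
    found = {w for w in words if w in terms}
    if not found:
        return 0.0
    # Sliding window: smallest run of words containing every found term.
    best, counts, left = len(words), {}, 0
    for right, word in enumerate(words):
        if word in found:
            counts[word] = counts.get(word, 0) + 1
        while len(counts) == len(found):
            best = min(best, right - left + 1)
            dropped = words[left]
            if dropped in counts:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    del counts[dropped]
            left += 1
    coverage = len(found) / len(terms)  # share of query terms present
    compactness = len(found) / best     # 1.0 when the found terms are adjacent
    return coverage * compactness

docs = ["public transport is cheap", "public parks and transport",
        "public parks", "nothing here"]
ranked = sorted(docs, key=lambda d: toy_is_about_score("public transport", d),
                reverse=True)
print(ranked[0])  # public transport is cheap
```

As the note above says, such a score is only useful when the results are ordered by it.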