Query Modification Settings

The table below details the query-modification settings for a project. The name, variable name, variable type, description, and default are defined for each setting.

Note: Your Watson™ Explorer Engine search application may or may not require the configuration of these parameters. If the description of a parameter listed here does not clearly identify its potential use in your search application, contact IBM support for clarification.
Table 1. Query Modification Settings
Setting Name Type Description

Enable wildcard expansion

(meta.wildcard-expand)

boolean

When true, Watson Explorer Engine will replace a wildcard term in a query with an OR'd combination of the words that match the wildcard pattern from all of the dictionaries used by the project. This option is used to support wildcard queries in search engines that do not directly support wildcards.

Default: false

Minimum length of wildcard term

(meta.wildcard-expand-min-length)

number

When wildcard query expansion is enabled, the minimum number of sequential characters required in a query term containing a wildcard character in order for the expansion to take place. This improves performance and helps to minimize the number of query terms generated.

Default: 3
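As a rough illustration of how wildcard expansion and the minimum-length check fit together, the following Python sketch expands a wildcard term into an OR'd group only when the term is long enough. It is not the Engine's implementation: the function name expand_wildcard, the in-memory word list, the use of * and ? as wildcard characters, and the reading of "sequential characters" as the longest run of literal characters are all assumptions made for this example.

```python
import fnmatch
import re


def expand_wildcard(term, dictionary_words, min_length=3):
    """Return an OR'd group of dictionary matches, or the term unchanged.

    Hypothetical helper: it only mirrors the behaviour described in the text.
    """
    if "*" not in term and "?" not in term:
        return term  # not a wildcard term, nothing to expand

    # Assumption: the minimum length applies to the longest run of
    # literal (non-wildcard) characters in the term.
    longest_literal_run = max(len(run) for run in re.split(r"[*?]", term))
    if longest_literal_run < min_length:
        return term  # too short, skip expansion

    matches = [w for w in dictionary_words if fnmatch.fnmatchcase(w, term)]
    if not matches:
        return term
    return "(" + " OR ".join(matches) + ")"


words = ["sea", "search", "searched", "searching", "seat"]
print(expand_wildcard("sear*", words))
print(expand_wildcard("se*", words))
```

With the default minimum of 3, the term se* is left untouched in this sketch, which keeps a very short prefix from generating a large number of query terms.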

Maximum expansions per word

(meta.expand-max)

number

Default: 200

Enable stem expansion

(meta.stem-expand)

boolean

When true, Watson Explorer Engine will replace a word with an OR'd combination of the words that have the same stem, using the stemmer(s) specified in the meta.stem-expand-stemmer variable.

Default: false

If using lexical analysis language streams, set to false. See Lexical Analysis Streams.

Dictionary file

(dictionary)

string

The path to the dictionary file containing #word #num pairs. Wildcard expansion uses this file directly. Stem expansion uses the name of this file as a root and appends a period and the list of stemmers; the result is the name of the stemming file to look for, which contains words from that dictionary and their associated stems.

Default: {install-dir}/data/dictionaries/default/wildcard.dict

Stemmers for stem expansion

(meta.stem-expand-stemmer)

string

Specifies the stemming algorithm(s) to use for stem expansion. Run chico -h or call vivisimo_command_usage to see a list of supported stemmers. Multiple stemmers can be combined by writing a + between them. For example, dutch+french applies the Dutch stemmer first and then, if that stemmer did not stem the word, the French stemmer.

Default: english+depluralize
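The fallback behaviour of a combined stemmer specification can be sketched in Python as follows. The two toy stemmers, the TOY_STEMMERS registry, and the build_stemmer_chain function are hypothetical stand-ins invented for this example; they are not the stemmers shipped with the product, and the sketch assumes that "did not stem the word" means the stemmer returned the word unchanged.

```python
def toy_dutch_stemmer(word):
    # Toy rule for illustration only: strip a trailing "en".
    return word[:-2] if len(word) > 4 and word.endswith("en") else word


def toy_french_stemmer(word):
    # Toy rule for illustration only: strip a trailing "ent".
    return word[:-3] if len(word) > 5 and word.endswith("ent") else word


# Hypothetical registry mapping stemmer names to functions.
TOY_STEMMERS = {"dutch": toy_dutch_stemmer, "french": toy_french_stemmer}


def build_stemmer_chain(spec, stemmers):
    """Build a stemmer from a spec such as "dutch+french".

    Each stemmer is tried in order; the first one that changes the
    word wins, and later stemmers are not applied.
    """
    names = spec.split("+")

    def chained(word):
        for name in names:
            stemmed = stemmers[name](word)
            if stemmed != word:
                return stemmed
        return word  # no stemmer changed the word

    return chained


stem = build_stemmer_chain("dutch+french", TOY_STEMMERS)
print(stem("boeken"), stem("parlent"), stem("chat"))
```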

Wildcard segmenter

(meta.wildcard-segmenter)

enum

Specifies the segmenter algorithm to use for wildcard expansion. A segmenter takes sequential utterances in non-segmented languages (for example, Japanese or Chinese) and divides each utterance into its individual components (or words).

  • unigram: segments into individual characters
  • unigram-of-bigram: segmenter that segments the first character of each word
  • bigram: segments into overlapping pairs of words
  • mixed: segments all ideographic characters into unigrams. This is a good segmenter to use with Chinese and mixed CJK data
  • japanese: segmenter for Japanese words (inflected form of surface form)
  • japanese-base: segmenter for Japanese words (base form of surface form)
  • japanese-reading: segmenter for Japanese words (base form of the reading)
  • thai: segmenter for Thai words
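To make the difference between single-unit and overlapping-pair segmentation concrete, here is a minimal Python sketch. It is an illustration only: the function names are hypothetical, and it assumes that the units being paired are individual characters, which is the usual convention for n-gram segmentation of CJK text but is not confirmed by the list above.

```python
def unigram_segments(text):
    """Split text into single characters, ignoring whitespace."""
    return [ch for ch in text if not ch.isspace()]


def bigram_segments(text):
    """Split text into overlapping pairs of adjacent characters.

    Assumption: pairs are built from characters. A text with fewer
    than two characters is returned as unigrams.
    """
    chars = unigram_segments(text)
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


print(unigram_segments("東京都"))
print(bigram_segments("東京都"))
```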

Wildcard should delanguage

(meta.wildcard-delanguage)

boolean

When true, attempts to normalize diacritics and different writing systems before expanding wildcard operators.

Default: false
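One common way to normalize diacritics and width variants is Unicode compatibility decomposition followed by removal of combining marks. The Python sketch below shows that general technique; it is an assumption that this resembles what the setting does, and the function name fold_for_wildcard is hypothetical.

```python
import unicodedata


def fold_for_wildcard(term):
    """Fold diacritics and compatibility variants in a query term.

    NFKD decomposition separates base characters from combining marks
    and maps compatibility forms (such as full-width Latin letters) to
    their plain equivalents; the combining marks are then dropped.
    """
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_for_wildcard("Zürich*"))
print(fold_for_wildcard("ｗｉｌｄ*"))
```

Folding the term before expansion lets a pattern typed with or without accents match the same dictionary entries, provided the dictionary words are folded the same way.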

Enable semantic expansion

(query-expansion.enabled)

boolean

Setting this to true enables automatic and manual expansion of the original query with synonyms and other related terms previously entered into an Ontolection, commonly done using the Results Module Terminology Manager. If semantic expansion is enabled and no value is specified for Query expansion ontolections, any source whose name begins with iopro-tm- will be queried. For more information, see the online documentation.

Default: false

Query expansion ontolections

(query-expansion.ontolections)

string

Space- or comma-separated list of the special collections that contain word equivalences and other semantic relations. If Enable semantic expansion is set to true and no value is specified here, any source whose name begins with iopro-tm- will be queried. For more information, see the online documentation.
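The fallback rule for this setting can be sketched in Python as follows. The function resolve_ontolections and the list of available source names are hypothetical; the sketch only illustrates splitting on spaces or commas and falling back to sources whose names begin with iopro-tm-.

```python
import re


def resolve_ontolections(setting_value, available_sources,
                         default_prefix="iopro-tm-"):
    """Return the ontolection names to query.

    An explicit space- or comma-separated list takes priority. When the
    setting is empty, fall back to every available source whose name
    starts with the default prefix.
    """
    names = [n for n in re.split(r"[\s,]+", setting_value.strip()) if n]
    if names:
        return names
    return [s for s in available_sources if s.startswith(default_prefix)]


sources = ["iopro-tm-main", "news", "iopro-tm-hr"]
print(resolve_ontolections("onto-a, onto-b onto-c", sources))
print(resolve_ontolections("", sources))
```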

Automatic expansions

(query-expansion.automatic)

separated-set

List of relation labels that are automatically used for expansion. Relation labels must be present as contents in the OntoLection(s) specified in query-expansion.ontolections.

Examples of relations are:

  • synonym for bidirectional synonyms (needs to be spelled like this)
  • spelling for spelling variations
  • translation for one single language translation, or
  • spanish, japanese, etc. for specific language translations
  • narrower for narrower terms (more specific than the query term)
  • broader for broader terms (more general than the query term)
  • suggested for suggested concepts derived from a data collection. For more information, see the online documentation.
  • alternative for unidirectional synonyms (rewrites); 'synonym' is reserved for bidirectional synonyms
  • acronym for acronyms or abbreviations of the query term or phrase
  • standsfor for the term or phrase that stands for the query, when it is an acronym or an abbreviation
  • related for otherwise related terms, or
  • isEditor, inCountry etc. for specific kinds of related terms

To specify a weight for all expansion terms of a specific relation type, add a number from 0 to 1 after a colon (for example, spelling:0.9|synonym:0.5). Original terms have a weight of 1; use weights smaller than 1 to give less weight to expansions.

For more information about the different relation types, see the online documentation.

Default: synonym:0.8|spelling:0.8|narrower:0.5
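The label:weight syntax can be read mechanically, as the following Python sketch shows. The function parse_relation_weights is hypothetical, and giving a label without an explicit weight the value 1.0 is an assumption made for the example; the source does not state what weight an unweighted label receives.

```python
def parse_relation_weights(value, default_weight=1.0):
    """Parse a value such as "synonym:0.8|spelling:0.8|narrower:0.5".

    Returns a dict mapping each relation label to its weight.
    Assumption: a label with no ":weight" suffix gets default_weight.
    """
    weights = {}
    for item in value.split("|"):
        item = item.strip()
        if not item:
            continue
        label, separator, weight_text = item.partition(":")
        weight = float(weight_text) if separator else default_weight
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {label!r} must be between 0 and 1")
        weights[label] = weight
    return weights


print(parse_relation_weights("synonym:0.8|spelling:0.8|narrower:0.5"))
```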

Manual expansions

(query-expansion.suggestion)

separated-set

List of relation labels that are displayed as suggestions. Relation labels must be present as contents in the OntoLection(s) specified in query-expansion.ontolections.

Examples of relations are:

  • synonym for bidirectional synonyms (needs to be spelled like this)
  • spelling for spelling variations
  • translation for one single language translation, or
  • spanish, japanese, etc. for specific language translations
  • narrower for narrower terms (more specific than the query term)
  • broader for broader terms (more general than the query term)
  • suggested for suggested concepts derived from a data collection. For more information, see the online documentation.
  • alternative for unidirectional synonyms (rewrites); 'synonym' is reserved for bidirectional synonyms
  • acronym for acronyms or abbreviations of the query term or phrase
  • standsfor for the term or phrase that stands for the query, when it is an acronym or an abbreviation
  • related for otherwise related terms, or
  • isEditor, inCountry etc. for specific kinds of related terms

To specify a weight for all expansion terms of a specific relation type, add a number from 0 to 1 after a colon (for example, translation:0.9|spanish:0.4). Original terms have a weight of 1; use weights smaller than 1 to give less weight to expansions.

For more information about the different relation types, see the online documentation.

Default: translation:0.5|broader:0.3|related:0.2

Enable stemming of the expanded query

(query-expansion.stem-expansions)

boolean

Setting this option to true will add stemming variations for all the terms present in the expanded query.

Unless otherwise specified in the Query Expansion display, stemming expansions of the original terms will appear under Stemming, and stemming expansions of semantic expansions (synonyms, narrower terms, etc.) will appear under that relation type followed by +stemming (for example, Synonym+stemming).

Default: false

Stemming expansions weight

(query-expansion.stemming-weight)

number

Original terms have a weight of 1 by default. You can change the ranking of documents containing stemming variations by modifying this number. The higher the value, the higher documents containing stemming variations will rank in the result list.

Default: 0.5

Stemming dictionary for semantic expansions

(query-expansion.stemming-dictionary)

string

Stemming dictionary to generate stem expansions for the semantically expanded query.

Default: english/wildcard.dict

Max expansions

(query-expansion.max-terms-per-type)

number

Maximum number of expansions for each term or phrase in the original query and for each relation type specified in Automatic expansions and Manual expansions. A value of -1 allows an unlimited number of expansions.

Default: 5
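The per-type limit, including the special value -1, can be pictured with a short Python sketch. The function limit_expansions and the dictionary layout (one list of candidate expansions per relation type, for a single query term) are assumptions made for illustration.

```python
def limit_expansions(expansions_by_relation, max_terms_per_type=5):
    """Keep at most max_terms_per_type expansions per relation type.

    expansions_by_relation maps a relation label to the candidate
    expansions for one query term. A limit of -1 means unlimited.
    """
    if max_terms_per_type == -1:
        return {rel: list(terms) for rel, terms in expansions_by_relation.items()}
    return {
        rel: list(terms)[:max_terms_per_type]
        for rel, terms in expansions_by_relation.items()
    }


candidates = {"synonym": ["car", "auto", "automobile"], "spelling": ["carr"]}
print(limit_expansions(candidates, max_terms_per_type=2))
print(limit_expansions(candidates, max_terms_per_type=-1))
```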

Query expansion match type

(query-expansion.query-match-type)

enum

Type of query match. The following values are available:

  • terms: matches all terms and explicit phrases against the OntoLections
  • exact: matches the whole query against the OntoLections
  • exact-terms: matches the whole query as well as the individual terms when the query has multiple terms and no double quotes are used
  • all-subphrases: matches all contiguous sub-strings against the OntoLections

The exact, exact-terms, and all-subphrases values are only supported for simple queries with no operators.

Default: exact-terms
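The four match types differ in which pieces of the query are looked up. The Python sketch below lists the lookup candidates each type would produce for a simple query with no operators or quotes. The function lookup_candidates is hypothetical, and treating "contiguous sub-strings" as contiguous runs of whole terms is an assumption.

```python
def lookup_candidates(query, match_type):
    """Return the strings that would be matched against the OntoLections.

    Only simple queries (no operators, no quoted phrases) are handled.
    """
    terms = query.split()
    whole = " ".join(terms)
    if match_type == "terms":
        return terms
    if match_type == "exact":
        return [whole]
    if match_type == "exact-terms":
        # Whole query plus individual terms, when there are several terms.
        return [whole] + terms if len(terms) > 1 else [whole]
    if match_type == "all-subphrases":
        # Assumption: sub-strings are contiguous runs of whole terms.
        return [
            " ".join(terms[i:j])
            for i in range(len(terms))
            for j in range(i + 1, len(terms) + 1)
        ]
    raise ValueError(f"unknown match type: {match_type}")


for match_type in ("terms", "exact", "exact-terms", "all-subphrases"):
    print(match_type, lookup_candidates("new york city", match_type))
```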

Data-driven conceptual search correlation metric

(query-expansion.conceptual-search-metric)

enum

Correlation metric used to compare query terms with suggestion candidates stored in a cs-ontolection. The dice value refers to the Dice coefficient, which counts the number of dimensions in which the two vectors are both non-zero and normalizes for length. The euclidean-dot-product value takes weights into account.

Default: euclidean-dot-product
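The contrast between the two metrics can be shown on small sparse vectors. This Python sketch uses the standard Dice coefficient and a plain dot product; how the product actually builds, weights, or normalizes its vectors is not described here, so the dictionary representation and both function names are assumptions.

```python
def dice_coefficient(a, b):
    """Dice coefficient over the non-zero dimensions of two sparse vectors.

    Weights are ignored: only whether a dimension is non-zero matters.
    """
    nonzero_a = {k for k, v in a.items() if v != 0}
    nonzero_b = {k for k, v in b.items() if v != 0}
    if not nonzero_a and not nonzero_b:
        return 0.0
    return 2 * len(nonzero_a & nonzero_b) / (len(nonzero_a) + len(nonzero_b))


def dot_product(a, b):
    """Dot product of two sparse vectors; weights contribute to the score."""
    return sum(weight * b[k] for k, weight in a.items() if k in b)


query_vector = {"x": 2.0, "y": 1.0}
candidate_vector = {"y": 3.0, "z": 1.0}
print(dice_coefficient(query_vector, candidate_vector))
print(dot_product(query_vector, candidate_vector))
```

In this example the two vectors share one of their two non-zero dimensions each, so the Dice score depends only on that overlap, while the dot product grows with the weights on the shared dimension.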

Data-driven conceptual search similarity threshold

(query-expansion.conceptual-search-similarity-threshold)

number

Similarity threshold used to filter suggestion candidates stored in a cs-ontolection. The higher this number, the more similar suggestion candidates need to be to the query term, and thus fewer and better suggestions are likely to be produced. As a starting point, we recommend setting it to 0.5 when using the euclidean-dot-product metric and to 0.2 when using the dice metric.

For more information, see the online documentation.

Default: 0.5