query-search

Queries one or more sources. Most of the advanced options offered here apply mainly to search collections (and can also be specified on a per-collection basis in their associated sources). If you use browse mode, search results are cached in a temporary file and can be browsed by using the query-browse function. The SOAP name of this function is: QuerySearch

Synopsis

query-results nodeset query-search(query, query-object, 
query-condition-object, query-condition-xpath, query-modification-macros, 
sources, start, syntax-operators, syntax-repository-node, 
syntax-field-mappings, sort-by, sort-xpaths, sort-score-xpath, 
sort-num-passages, rank-decay, num, num-over-request, 
num-per-source, num-max, browse, browse-num, browse-start, 
browse-clusters-num, term-expand-max-expansions, term-expand-error-when-exceeds-limit, 
spelling-enabled, spelling-configuration, dict-expand-dictionary, 
dict-expand-max-expansions, dict-expand-stem-enabled, dict-expand-stem-stemmers, 
dict-expand-wildcard-enabled, dict-expand-wildcard-min-length, dict-expand-wildcard-segmenter, 
dict-expand-wildcard-delanguage, dict-expand-regex-enabled, arena, 
output-contents-mode, output-contents, output-summary, output-score, 
output-shingles, output-duplicates, output-key, output-cache-references, 
output-cache-data, output-sort-keys, output-axl, output-bold-contents, 
output-bold-contents-except, output-bold-class-root, output-query-node, 
output-display-mode, authorization-rights, authorization-username, 
authorization-password, collapse-xpath, collapse-num, collapse-binning, 
collapse-sort-xpaths, binning-state, binning-mode, binning-configuration, 
force-binning, fetch, fetch-timeout, aggregate, 
aggregate-max-passes, cluster, cluster-near-duplicates, cluster-kbs, 
cluster-stemmers, cluster-segmenter, efficient-paging, 
efficient-paging-n-top-docs-to-cluster, debug, profile, extra-xml);
string query;
operator nodeset query-object;
operator nodeset query-condition-object;
string query-condition-xpath;
string query-modification-macros;
string sources;
int start;
string syntax-operators;
string syntax-repository-node;
field-map nodeset syntax-field-mappings;
string sort-by;
sort nodeset sort-xpaths;
string sort-score-xpath;
int sort-num-passages;
double rank-decay;
int num;
double num-over-request;
int num-per-source;
int num-max;
boolean browse;
int browse-num;
int browse-start;
int browse-clusters-num;
int term-expand-max-expansions;
boolean term-expand-error-when-exceeds-limit;
boolean spelling-enabled;
spelling-correction nodeset spelling-configuration;
string dict-expand-dictionary;
int dict-expand-max-expansions;
boolean dict-expand-stem-enabled;
string dict-expand-stem-stemmers;
boolean dict-expand-wildcard-enabled;
int dict-expand-wildcard-min-length;
string dict-expand-wildcard-segmenter;
boolean dict-expand-wildcard-delanguage;
boolean dict-expand-regex-enabled;
string arena;
enum output-contents-mode;
string output-contents;
boolean output-summary;
boolean output-score;
boolean output-shingles;
boolean output-duplicates;
boolean output-key;
boolean output-cache-references;
boolean output-cache-data;
boolean output-sort-keys;
nodeset output-axl;
string output-bold-contents;
boolean output-bold-contents-except;
string output-bold-class-root;
boolean output-query-node;
enum output-display-mode;
string authorization-rights;
string authorization-username;
string authorization-password;
string collapse-xpath;
int collapse-num;
boolean collapse-binning;
sort nodeset collapse-sort-xpaths;
string binning-state;
enum binning-mode;
binning-set nodeset binning-configuration;
boolean force-binning;
boolean fetch;
int fetch-timeout;
boolean aggregate;
int aggregate-max-passes;
boolean cluster;
double cluster-near-duplicates;
string cluster-kbs;
string cluster-stemmers;
string cluster-segmenter;
boolean efficient-paging;
int efficient-paging-n-top-docs-to-cluster;
boolean debug;
boolean profile;
nodeset extra-xml;

Parameters

  • string query - A query string, which is parsed by using the syntax-operators. This is convenient for passing user-generated queries.
  • operator nodeset query-object - A query object (similar to the object obtained by parsing the query string according to the syntax). This query object is convenient for passing programmatically generated queries. When both a query string and a query object are specified, they are combined with an "and" operator. It is perfectly fine to specify both (for example, to combine a user-generated query with a programmatic condition).
  • operator nodeset query-condition-object - A query object (similar to the object obtained by parsing the query string according to the syntax) which is interpreted as a condition on top of both query and query-object but only used to restrict results and not rank them.
  • string query-condition-xpath - An XPath used to restrict the set of results (but not rank them) combined with the other query parameters.
  • string query-modification-macros - Space separated list of macros that are used to modify the original query.
  • string sources - Space separated list of sources to be queried. If not specified, no source is queried (this can be convenient for seeing how the query is parsed and/or expanded without actually running it).
  • int start - The rank of the first document that is returned by each source/collection (starts at 0). This only makes sense when there is only one collection. If you are federating multiple sources, you should use browse-start. If you are federating only search collections, you can get the "next" set of results more efficiently by passing a start for each collection (as a field) based on what was retrieved for the first request. Default value: 0.
  • string syntax-operators - A space separated list of operator names. These operators need to be defined in the syntax repository node referenced here. You can use query-search only for parsing a query by submitting it without any source selected. Default value: AND and () CONTAINING CONTENT %field%: + NEAR - NOT NOTCONTAINING NOTWITHIN OR0 quotes regex stem THRU BEFORE FOLLOWEDBY weight wildcard wildchar WITHIN WORDS site less-than less-than-or-equal greater-than greater-than-or-equal equal range.
  • string syntax-repository-node - The name of a syntax node (containing the definition of syntax operators) that is loaded from the repository. Default value: custom.
  • field-map nodeset syntax-field-mappings - A set of field mappings.
  • string sort-by - A space separated list of sort field (that is, v.sort-by) values used for sorting (these need to be mapped to a sorting formula in each target source).
  • sort nodeset sort-xpaths - A set of sorting nodes that specify a sort xpath and sort order. See also sort-score-xpath and sort-by. If sort-by is specified then this is used in combination, sort-by specifying the primary sort keys.
  • string sort-score-xpath - The score that is output for each result (see output-score). This is the default sorting formula if no other sorting is specified. You can specify a custom XPath expression that uses the variables $score, $la-score and any content name that you have defined to be fast-indexed (in the indexing configuration of the collection and reported in the index status). The score is always numeric, so the outcome of the evaluation of the formula might be cast to a number if necessary. If both a sort-score-xpath and sorting conditions are specified, the sorting conditions take precedence. Be aware that this XPath is evaluated for each search result (that is, potentially millions of times). Some (but not all) of the XPath operators and functions have been optimized to avoid manipulating heavy XPath structures. It is recommended to restrict your formula to these operators for large collections (to avoid unacceptable response times). Check the documentation for the current list of optimized XPath operators and functions.
  • int sort-num-passages - The number of matching passages that are taken into account when calculating natural relevancy (that is, $score). Set this to 0 if you want to improve performance and are not using $score for sorting. If not specified, the value set at the collection level is used.
  • double rank-decay - When ranking, the nth best passage is scored at this value to the power of n. Set this to 1 to treat all passages as being of equal value.
  • int num - When specified, Watson™ Explorer tries to adapt the number of results that are retrieved from each source to get to that total. In aggregate mode, it might mean going back to certain sources for more results. Default value: 10.
  • double num-over-request - When two or more sources are queried simultaneously and duplicates are removed, the total number of results might end up lower than expected. This parameter multiplies the total number of results that are requested to compensate for this loss. This is also useful in aggregate mode to avoid returning to a source too many times. Default value: 1.3.
  • int num-per-source - The number of results that are requested for each source. Takes precedence over num when specified.
  • int num-max - Maximum number of top results that are ranked for each collection. This needs to be higher than the number requested and is only meaningful from a cache optimization standpoint (when trying to get more results from a source, the initial cache can only be used if the top rank requested is lower than num-max).
  • boolean browse - When enabled the results will be saved in a temporary file for future browsing. This is especially useful when clustering results. Default value: false.
  • int browse-num - In browse mode, only this number of results are returned initially. Default value: 10.
  • int browse-start - Out of the num results retrieved, only return the ones with rank higher than browse-start (starts at 0). Default value: 0.
  • int browse-clusters-num - Only this number of clusters are returned in browse mode. Default value: 10.
  • int term-expand-max-expansions - When provided, the search collection limits the number of expanded terms to this value. The expansions are sorted with the most frequently occurring terms first.
  • boolean term-expand-error-when-exceeds-limit - When enabled, if a term has more expansions than allowed, the search collection does not return results, instead returning an error.
  • boolean spelling-enabled - When enabled, a spell corrected version of the query node is returned along with the results. Default value: false.
  • spelling-correction nodeset spelling-configuration - Configuration options for the spelling corrector. Use this if you need to spell correct fields other than the default or to use a different dictionary.
  • string dict-expand-dictionary - The name of the dictionary to use to perform dictionary-based expansions.
  • int dict-expand-max-expansions - The maximum number of expansions each term might be expanded to.
  • boolean dict-expand-stem-enabled - When enabled, terms using the stem operator are replaced with all terms that have the same stem.
  • string dict-expand-stem-stemmers - When performing stem expansion, this stemmer is used to find related words.
  • boolean dict-expand-wildcard-enabled - When enabled, terms that use the wildcard or wildchar operator are replaced with all terms that match the wildcard expression.
  • int dict-expand-wildcard-min-length - When performing wildcard expansion, there must be at least this many non-wildcard characters in the term.
  • string dict-expand-wildcard-segmenter - When performing wildcard expansion, this segmenter is used to segment the term before finding matches.
  • boolean dict-expand-wildcard-delanguage - When performing wildcard expansion, this causes the term to be delanguaged before finding matches.
  • boolean dict-expand-regex-enabled - When enabled, terms using the regex operator are replaced with all terms that match the regular expression.
  • string arena - Specifies the arena to query within search-collections.
  • enum output-contents-mode - Specifies how the output-contents parameter is interpreted. defaults: the parameter is ignored and the defaults for each collection are used. list: output the contents whose names are listed (use this option with an empty value in output-contents to output no content). except: output the contents whose names are not listed (use this option with an empty value in output-contents to output all contents; in general you always want to exclude snippet to avoid outputting the full text of the documents). Default value: defaults. Possible values: defaults|list|except.
  • string output-contents - Space separated list of content names to be outputted or not outputted depending on the value of output-contents-mode.
  • boolean output-summary - If enabled, summaries are generated for each result (based on the contents that are selected in the collection configuration). Summaries usually provide a better user experience and better clustering but can have a substantial I/O cost (it is therefore advised to turn this off when retrieving a large number of results).
  • boolean output-score - Output the score that is computed by the search engine for each result, not the score set at the federation layer. You should turn this on when aggregating multiple collections. If not specified, the default for each source is used.
  • boolean output-shingles - Output shingles, which allow the aggregation of search results from different collections to remove shingle-based duplicates across collections (see the aggregate parameter). You should turn this off when querying a single collection. If not specified, the default for each source is used.
  • boolean output-duplicates - When false, duplicates are removed from each search collection based on shingles.
  • boolean output-key - Output a key (based on the normalized vse-key) that is used at the federation level to remove duplicates across collections (should always be off when querying a single collection). If not specified, the default for each source is used.
  • boolean output-cache-references - Output cache references for each result document, which can be used to reference the corresponding cache data. If not specified, the default for each source is used.
  • boolean output-cache-data - Output cache data for each result. This can generate a large amount of data and is normally used for a single document at a time.
  • boolean output-sort-keys - When sorting formulas are used, their value can be outputted so that cross-sorting across collections can be done. You should turn this off when you are only querying one collection at a time and on when you are aggregating multiple collections. If not specified, the default for each source is used.
  • nodeset output-axl - AXL code that is used to specify the output of search collections.
  • string output-bold-contents - Space separated list of content names to be bolded (or not bolded depending on the value of output-bold-contents-except) with the query words.
  • boolean output-bold-contents-except - If set to true, the list of contents is considered an exclusion list of contents not to bold. Default value: false.
  • string output-bold-class-root - If not specified, a standard <b> tag is used for bolding. If specified, a <span> tag is used with a class name using this root concatenated with a number corresponding to each keyword.
  • boolean output-query-node - Return the query node that was used to perform the search. This might be different from the query node that was provided as a parameter. Default value: true.
  • enum output-display-mode - Using 'limited' mode returns a subset of the XML data that is returned in 'normal' mode. This subset is optimized for applications using the API to return large data sets. Default value: default. Possible values: default|limited.
  • string authorization-rights - Newline separated list of security groups (the target collections are expected to have ACLs that use the same groups).
  • string authorization-username - User name to use for late-binding security.
  • string authorization-password - Password to use for late-binding security.
  • string collapse-xpath - This XPath expression is evaluated for each document that satisfies the query and documents sharing the same value are collapsed into a single search result.
  • int collapse-num - When collapsing, this is the maximum number of secondary documents to output per key. If you specify 0, this outputs only the best scoring document for each key.
  • boolean collapse-binning - By default the binning is applied to each document. If you select this option, the binning is instead performed only on the best document of each collapsed set. It is possible for this mode to cause fewer than the requested number of documents to be returned.
  • sort nodeset collapse-sort-xpaths - This XPath expression is evaluated for each document that satisfies the query and documents sharing the same key value are collapsed to a single search result.
  • string binning-state - String token to be passed when interacting with the binning.
  • enum binning-mode - off: disables the binning mode. normal: computes all bins for the current state. double: computes the same information as normal but also computes the bins for the full-state without restrictions. Default value: defaults. Possible values: defaults|off|normal|double.
  • binning-set nodeset binning-configuration - Any additional configuration that you would like to use to augment the binning configuration from the collection. This is usually passed only when more processing based on the user is required.
  • boolean force-binning - Usually, binning data is not output when the start parameter is not zero. This is to prevent incorrect results when using a collection for aggregation. If aggregation is not being used, you can force binning data to be output with this option. Default value: false.
  • boolean fetch - If true then remote requests are fetched. This should always be on unless you want to see the set of source references fully expanded without doing any request (which can be convenient when building an advanced form). Default value: true.
  • int fetch-timeout - Timeout after which the request is aborted. Default value: 60000.
  • boolean aggregate - When turned on, all sources that are queried are made to behave like a single source. This should be turned on when querying a set of search collections which are made to behave as if they were a single collection. If you turn this on, make sure to turn output-score, output-shingles, output-key and output-sort-keys on. Default value: false.
  • int aggregate-max-passes - In aggregate mode, the maximum number of times that Watson Explorer goes back to a given source to get the right number of results. When querying multiple sources as one, Watson Explorer cannot know before seeing results which of the sources have low or high scoring results, and might need to go to some of them multiple times to fetch the true top results across all sources. Default value: 3.
  • boolean cluster - Enable clustering of the search results. Only the contents with weights greater than 0 and a default action or a "cluster" action are taken into account. Default value: false.
  • double cluster-near-duplicates - Documents more similar to each other than x (think of it as percentage of different content) are considered duplicates. Only one instance of a duplicate document appears in the output (with multiple sources). Setting this option to 1 causes all documents to be duplicates of each other, setting it to 0 causes only documents that contain exactly the same indexed text to be marked as duplicates, and setting this option to -1 completely disables near duplicate detection. Default value: -1.
  • string cluster-kbs - Space separated list of knowledge bases to use for clustering and near-deduplication (predefined or user defined). If the same word appears in more than one knowledge base, the knowledge base listed last is used. For example, with "english german" both knowledge bases contain die, but the entry in the German knowledge base takes priority since it is listed last. See the main documentation for the list of currently predefined knowledge bases. Default value: core web english custom.
  • string cluster-stemmers - Space separated list of stemmers to use for clustering and near-deduplication. Multiple stemmers can be combined. For example: dutch french applies the Dutch stemmer, and then, if that stemmer did not stem the word, the French stemmer. See the main documentation for the list of currently supported stemmers. Default value: delanguage english depluralize.
  • string cluster-segmenter - Specifies the segmenter to use for clustering and near-deduplication. A segmenter takes sequential utterances in non-segmented languages (for example, Japanese or Chinese) and divides each utterance into its individual components (or words). See the main documentation for the list of currently supported segmenters. Default value: none.
  • boolean efficient-paging - Retrieves documents from sources in a more efficient manner, allowing a larger number of documents to be returned. This feature is incompatible with certain other Watson Explorer features. Default value: false.
  • int efficient-paging-n-top-docs-to-cluster - When efficient paging is enabled, not all documents are retrieved up front and thus are not available for clustering. This option controls how many of the top-ranked documents are retrieved for clustering. Default value: 200.
  • boolean debug - When true, Watson Explorer creates a debugging log, which can be browsed through the administrative tool (Management > Debugging Sessions). Never leave this option on in production as it affects the performance. Default value: false.
  • boolean profile - When true, Watson Explorer performs light debugging so that the debug log can be used for profiling. Note that debug needs to be turned on for this option to work. Default value: false.
  • nodeset extra-xml - Extra AXL code processed before the query is submitted. Can be used to add more advanced options or custom variables.
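
The aggregate description above recommends turning on four output flags together with aggregate itself. As a minimal sketch, the following Java helper collects that parameter set in one place so the flags are not forgotten individually. The class name, method name, and collection names are hypothetical and are not part of the Watson Explorer API; only the parameter names come from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AggregateParams {
    // Builds the query-search parameters that the aggregate description
    // says to enable together. Parameter names are taken from this page;
    // the helper itself is illustrative only.
    static Map<String, String> aggregateParams(String sources) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("sources", sources);          // space separated list of sources
        params.put("aggregate", "true");         // treat all sources as one
        params.put("output-score", "true");      // engine score for each result
        params.put("output-shingles", "true");   // shingle-based duplicate removal
        params.put("output-key", "true");        // key-based duplicate removal
        params.put("output-sort-keys", "true");  // cross-sorting across collections
        return params;
    }

    public static void main(String[] args) {
        // "collection-a collection-b" are hypothetical collection names.
        aggregateParams("collection-a collection-b")
            .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

The resulting map can be handed to whatever REST or SOAP client code the application already uses. When querying a single collection, the same flags are better left off, as the individual parameter descriptions note.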

Return Value

Exceptions

  • There are no exceptions specific to this function.

Authentication

Like all Watson Explorer Engine API functions except for ping, the query-search function requires authentication.

When using REST, you can simply pass v.username and v.password as CGI parameters via HTTP or HTTPS to authenticate the REST call to the query-search function.
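
As a rough illustration of the REST case, the sketch below assembles a query-search URL with v.username and v.password passed as CGI parameters. The host, the velocity path, and the v.app and v.function parameter values are assumptions about a typical installation and should be checked against your own deployment; the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class QuerySearchUrl {
    // Joins URL-encoded name=value pairs onto a base URL.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        String queryString = params.entrySet().stream()
            .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
            .collect(Collectors.joining("&"));
        return baseUrl + "?" + queryString;
    }

    // Form-style encoding: spaces become '+', reserved characters are escaped.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("v.app", "api-rest");            // assumed REST application name
        params.put("v.function", "query-search");   // assumed function selector
        params.put("v.username", "joe-user");       // credentials as CGI parameters
        params.put("v.password", "joes-password");
        params.put("query", "search engine");
        params.put("sources", "example-collection"); // hypothetical collection name

        // Hypothetical endpoint; adjust host and path for your installation.
        System.out.println(buildUrl("https://localhost/vivisimo/cgi-bin/velocity", params));
    }
}
```

Because the password travels in the request itself, HTTPS is the safer of the two transports mentioned above.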

When using the SOAP API, you can pass credentials as parameters on an endpoint, or you can leverage the authentication method that is supported by all Watson Explorer Engine functions. Each provides a setAuthentication method that can be passed an authentication object to provide the user name and password under which a function runs. An example of this in Java for a SOAP call to the query-search function is the following:


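    // Create the credentials under which the function runs.
    Authentication authentication = new Authentication();
    authentication.setUsername("joe-user");
    authentication.setPassword("joes-password");

    // Attach the credentials to the query-search request object.
    QuerySearch foo = new QuerySearch();
    foo.setAuthentication(authentication);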

A single authentication object would typically be reused throughout each individual application.

Spell Correction with HTTP Authentication

If the web server is configured to require HTTP authentication to perform searches, additional configuration is required to enable spell correction from a search API. Essentially, the web server must be configured to allow anonymous (that is, non-authenticated) access to the spelling module.

On a Linux system, Apache should be configured to allow access to the file <Installation Directory>/WEX/Engine/www/cgi-bin/spelling-corrector. This can be done by adding the following stanza to the relevant Apache *.conf file:

<Files "spelling-corrector">
Allow from all
Satisfy any
</Files>

On Windows, IIS Manager permissions should be configured to allow anonymous access to Sites <Default Web Site>/vivisimo/cgi-bin/spelling-corrector.exe.