Queries one or more sources. Most of the advanced options offered here apply to search collections (and can also be specified per collection, through their associated sources). If you use browse mode, search results are cached in a temporary file and can be browsed using the query-browse function. The SOAP name of this function is: QuerySearch
Synopsis
query-results nodeset query-search(query, query-object, query-condition-object, query-condition-xpath, query-modification-macros, sources, start, syntax-operators, syntax-repository-node, syntax-field-mappings, sort-by, sort-xpaths, sort-score-xpath, sort-num-passages, rank-decay, num, num-over-request, num-per-source, num-max, browse, browse-num, browse-start, browse-clusters-num, term-expand-max-expansions, term-expand-error-when-exceeds-limit, spelling-enabled, spelling-configuration, dict-expand-dictionary, dict-expand-max-expansions, dict-expand-stem-enabled, dict-expand-stem-stemmers, dict-expand-wildcard-enabled, dict-expand-wildcard-min-length, dict-expand-wildcard-segmenter, dict-expand-wildcard-delanguage, dict-expand-regex-enabled, arena, output-contents-mode, output-contents, output-summary, output-score, output-shingles, output-duplicates, output-key, output-cache-references, output-cache-data, output-sort-keys, output-axl, output-bold-contents, output-bold-contents-except, output-bold-class-root, output-query-node, output-display-mode, authorization-rights, authorization-username, authorization-password, collapse-xpath, collapse-num, collapse-binning, collapse-sort-xpaths, binning-state, binning-mode, binning-configuration, force-binning, fetch, fetch-timeout, aggregate, aggregate-max-passes, cluster, cluster-near-duplicates, cluster-kbs, cluster-stemmers, cluster-segmenter, efficient-paging, efficient-paging-n-top-docs-to-cluster, debug, profile, extra-xml);
string query;
operator nodeset query-object;
operator nodeset query-condition-object;
string query-condition-xpath;
string query-modification-macros;
string sources;
int start;
string syntax-operators;
string syntax-repository-node;
field-map nodeset syntax-field-mappings;
string sort-by;
sort nodeset sort-xpaths;
string sort-score-xpath;
int sort-num-passages;
double rank-decay;
int num;
double num-over-request;
int num-per-source;
int num-max;
boolean browse;
int browse-num;
int browse-start;
int browse-clusters-num;
int term-expand-max-expansions;
boolean term-expand-error-when-exceeds-limit;
boolean spelling-enabled;
spelling-correction nodeset spelling-configuration;
string dict-expand-dictionary;
int dict-expand-max-expansions;
boolean dict-expand-stem-enabled;
string dict-expand-stem-stemmers;
boolean dict-expand-wildcard-enabled;
int dict-expand-wildcard-min-length;
string dict-expand-wildcard-segmenter;
boolean dict-expand-wildcard-delanguage;
boolean dict-expand-regex-enabled;
string arena;
enum output-contents-mode;
string output-contents;
boolean output-summary;
boolean output-score;
boolean output-shingles;
boolean output-duplicates;
boolean output-key;
boolean output-cache-references;
boolean output-cache-data;
boolean output-sort-keys;
nodeset output-axl;
string output-bold-contents;
boolean output-bold-contents-except;
string output-bold-class-root;
boolean output-query-node;
enum output-display-mode;
string authorization-rights;
string authorization-username;
string authorization-password;
string collapse-xpath;
int collapse-num;
boolean collapse-binning;
sort nodeset collapse-sort-xpaths;
string binning-state;
enum binning-mode;
binning-set nodeset binning-configuration;
boolean force-binning;
boolean fetch;
int fetch-timeout;
boolean aggregate;
int aggregate-max-passes;
boolean cluster;
double cluster-near-duplicates;
string cluster-kbs;
string cluster-stemmers;
string cluster-segmenter;
boolean efficient-paging;
int efficient-paging-n-top-docs-to-cluster;
boolean debug;
boolean profile;
nodeset extra-xml;
Parameters
- string query - A query string which will be parsed using the syntax-operators. This is convenient to pass user generated queries.
- operator nodeset query-object - A query object (similar to the object obtained by parsing the query string according to the syntax). This query object is convenient for passing programmatically generated queries (as it is much easier to manipulate unambiguously). When both a query string and a query object are specified, they are combined with an "and" operator. It is perfectly fine to specify both (to combine a user generated query with a programmatic condition, for example).
- operator nodeset query-condition-object - A query object (similar to the object obtained by parsing the query string according to the syntax) which will be interpreted as a condition on top of both query and query-object but only used to restrict results and not rank them.
- string query-condition-xpath - An XPath used to restrict the set of results (but not rank them) combined with the other query parameters.
- string query-modification-macros - Space separated list of macros used to modify the original query.
- string sources - Space separated list of sources to be queried. If not specified then no source will be queried (that can be convenient when trying to see how the query is parsed and/or expanded without actually running it).
- int start - The rank of the first document returned by each source/collection (starts at 0). This only makes sense when there is only one collection. If you are federating multiple sources you should use browse-start. If you are only federating search collections it is possible to be more efficient to get the "next" set of results by passing a start for each collection (as a field) based on what has been retrieved for the first request. Default value: 0.
- string syntax-operators - A space separated list of operator names. These operators need to be defined in the syntax repository node referenced here. You can also use query-search just to parse a query by submitting it without any source selected. Default value: AND and () CONTAINING CONTENT %field%: + NEAR - NOT NOTCONTAINING NOTWITHIN OR0 quotes regex stem THRU BEFORE FOLLOWEDBY weight wildcard wildchar WITHIN WORDS site less-than less-than-or-equal greater-than greater-than-or-equal equal range.
- string syntax-repository-node - The name of a syntax node (containing the definition of syntax operators) that will be loaded from the repository. Default value: custom.
- field-map nodeset syntax-field-mappings - A set of field mappings.
- string sort-by - A space separated list of sort field (i.e., v.sort-by) values used for sorting (these will need to be mapped to a sorting formula in each target source).
- sort nodeset sort-xpaths - A set of sorting nodes specifying a sort xpath and sort order. See also sort-score-xpath and sort-by. If sort-by is specified then this is used in combination, sort-by specifying the primary sort key(s).
- string sort-score-xpath - The score that will be outputted for each result (see output-score). This is the default sorting formula if no other sorting is specified. You can specify a custom xpath expression that uses the variables $score, $la-score and any content name that you have defined to be fast-indexed (in the indexing configuration of the collection and reported in the index status). The score is always numeric so the outcome of the evaluation of the formula may be cast to a number if necessary. If both a sort-score-xpath and sorting conditions are specified, the sorting conditions take precedence. Be aware that this XPath will be evaluated for each search result (i.e., potentially millions of times). Some (but not all) of the XPaths operators and functions have been optimized to avoid manipulating heavy XPath structures. It is recommended to restrict your formula to these operators for large collections (to avoid unsuitable response time). Check the documentation for the current list of optimized XPath operators and functions.
- int sort-num-passages - The number of matching passages taken into account when calculating natural relevancy (i.e., $score). Set this to 0 if you want to improve performance and are not using $score for sorting. If not specified, the value set at the collection level will be used.
- double rank-decay - When ranking, the ith best passage is scored at this value to the power of i. Set this to 1 to treat all passages as being of equal value.
- int num - When specified, Watson Explorer will try to adapt the number of results retrieved from each source to get to that total. In aggregate mode it may mean going back to certain sources for more results. Default value: 10.
- double num-over-request - When two or more sources are queried simultaneously and duplicates are removed, the total number of results may end up lower than expected. This parameter will multiply the total number of results requested to compensate for this loss. This is also useful in aggregate mode to avoid returning to a source too many times. Default value: 1.3.
- int num-per-source - The number of results requested for each source. Takes precedence over num when specified.
- int num-max - Maximum number of top results ranked for each collection. This needs to be higher than the number requested and is only meaningful from a cache optimization standpoint (when trying to get more results from a source, the initial cache can only be used if the top rank requested is lower than num-max).
- boolean browse - When enabled, the results will be saved in a temporary file for future browsing. This is especially useful when clustering results. Default value: false.
- int browse-num - In browse mode, only this number of results will be returned initially. Default value: 10.
- int browse-start - Out of the num results retrieved, only return the ones with rank higher than browse-start (starts at 0). Default value: 0.
- int browse-clusters-num - Only this number of clusters will be returned in browse mode. Default value: 10.
- int term-expand-max-expansions - When provided, the search collection will limit the number of expanded terms to this value. The expansions will be sorted with the most frequently occurring terms first.
- boolean term-expand-error-when-exceeds-limit - When enabled, if a term has more expansions than allowed, the search collection will not return results, instead returning an error.
- boolean spelling-enabled - When enabled, a spell corrected version of the query node will be returned along with the results. Default value: false.
- spelling-correction nodeset spelling-configuration - Configuration options for the spelling corrector. Use this if you need to spell correct fields other than the default or to use a different dictionary.
- string dict-expand-dictionary - The name of the dictionary to use to perform dictionary-based expansions.
- int dict-expand-max-expansions - The maximum number of expansions each term may be expanded to.
- boolean dict-expand-stem-enabled - When enabled, terms using the stem operator will be replaced with all terms that have the same stem.
- string dict-expand-stem-stemmers - When performing stem expansion, this stemmer will be used to find related words.
- boolean dict-expand-wildcard-enabled - When enabled, terms using the wildcard or wildchar operator will be replaced with all terms that match the wildcard expression.
- int dict-expand-wildcard-min-length - When performing wildcard expansion, there must be at least this many non-wildcard characters in the term.
- string dict-expand-wildcard-segmenter - When performing wildcard expansion, this segmenter will be used to segment the term before finding matches.
- boolean dict-expand-wildcard-delanguage - When performing wildcard expansion, this will cause the term to be delanguaged before finding matches.
- boolean dict-expand-regex-enabled - When enabled, terms using the regex operator will be replaced with all terms that match the regular expression.
- string arena - Specifies the arena to query within search-collections.
- enum output-contents-mode - Specifies how the output-contents parameter is interpreted:
defaults: parameter is ignored, the defaults for each collection is used. list: output the contents whose names are listed (use this option with an empty value in output-contents not to output any content). except: output the contents whose names are not listed (use this option with an empty value in output-contents to output all contents, in general you always want to exclude snippet to avoid outputting the full text of the documents). Default value: defaults. Possible values: defaults|list|except.
- string output-contents - Space separated list of content names to be outputted or not outputted depending on the value of output-contents-mode.
- boolean output-summary - If enabled, summaries will be generated for each result (based on the contents selected in the collection configuration). Summaries usually provide a better user experience and better clustering but can have a substantial I/O cost (it is therefore advised to turn this off when retrieving a large number of results).
- boolean output-score - Output the score computed by the search engine for each result, not the score set at the federation layer. You should turn this on when aggregating multiple collections. If not specified, the default for each source will be used.
- boolean output-shingles - Output shingles, which allow the aggregation of search results from different collections to remove shingle-based duplicates across collections (see the aggregate parameter). You should turn this off when querying a single collection. If not specified, the default for each source will be used.
- boolean output-duplicates - When false, duplicates will be removed from each search collection based on shingles.
- boolean output-key - Output a key (based on the normalized vse-key) that will be used at the federation level to remove duplicates across collections (should always be off when querying a single collection). If not specified, the default for each source will be used.
- boolean output-cache-references - Output cache references for each result document so that the corresponding cache data can be referenced. If not specified, the default for each source will be used.
- boolean output-cache-data - Output cache data for each result. This can generate a large amount of data and is normally used for a single document at a time.
- boolean output-sort-keys - When sorting formulas are used, their value can be outputted so that cross-sorting across collections can be done. You should turn this off when you are only querying one collection at a time and on when you are aggregating multiple collections. If not specified, the default for each source will be used.
- nodeset output-axl - AXL code used to specify the output of search collections.
- string output-bold-contents - Space separated list of content names to be bolded (or not bolded depending on the value of output-bold-contents-except) with the query words.
- boolean output-bold-contents-except - If set to true, the list of contents is considered an exclusion list of contents not to bold. Default value: false.
- string output-bold-class-root - If not specified, a standard <b> tag is used for bolding. If specified, a <span> tag is used with a class name using this root concatenated with a number corresponding to each keyword.
- boolean output-query-node - Return the query node that was used to perform the search. This may be different from the query node that was provided as a parameter. Default value: true.
- enum output-display-mode - Using 'limited' mode will return a subset of the XML data that is returned in 'normal' mode. This subset is optimized for applications using the API to return large data sets. Default value: default. Possible values: default|limited.
- string authorization-rights - Newline separated list of security groups (the target collections are expected to have acls using the same groups).
- string authorization-username - Username to use for late-binding security.
- string authorization-password - Password to use for late-binding security.
- string collapse-xpath - This XPath expression will be evaluated for each document that satisfies the query and documents sharing the same value will be collapsed into a single search result.
- int collapse-num - When collapsing, this is the maximum number of secondary documents to output per key. If you specify 0, this will output only the best scoring document for each key.
- boolean collapse-binning - By default the binning is applied to each document. If you select this option, the binning will instead be performed only on the best document of each collapsed set. It is possible for this mode to cause fewer than the requested number of documents to be returned.
- sort nodeset collapse-sort-xpaths - This XPath expression will be evaluated for each document that satisfies the query and documents sharing the same key value will be collapsed to a single search result.
- string binning-state - String token to be passed when interacting with the binning.
- enum binning-mode - off: will disable the binning mode. normal: computes all bins for the current state. double: computes the same information as normal but also computes the bins for the full-state without restrictions. Default value: defaults. Possible values: defaults|off|normal|double.
- binning-set nodeset binning-configuration - Any additional configuration that you would like to use to augment the binning configuration from the collection. This is usually passed only when additional processing based on the user is required.
- boolean force-binning - Usually, binning data is not output when the start parameter is not zero. This is to prevent incorrect results when using a collection for aggregation. If aggregation is not being used, you can force binning data to be output with this option. Default value: false.
- boolean fetch - If true then remote requests will be fetched. This should always be on unless you want to see the set of source references fully expanded without doing any request (which can be convenient when building an advanced form). Default value: true.
- int fetch-timeout - Timeout after which the request will be aborted. Default value: 60000.
- boolean aggregate - When turned on, all sources queried are made to behave like a single source. This should be turned on when querying a set of search collections which are made to behave as if they were a single collection. If you turn this on, make sure to turn output-score, output-shingles, output-key and output-sort-keys on. Default value: false.
- int aggregate-max-passes - In aggregate mode, the maximum number of times Watson Explorer may go back to a given source to get the right number of results. When querying multiple sources as one, Watson Explorer cannot know before seeing results which of the sources has low or high score results, and may need to go to some of them multiple times to fetch the true top results across all sources. Default value: 3.
- boolean cluster - Enable clustering of the search results. Only the contents with weights greater than 0 and a default action or a "cluster" action will be taken into account. Default value: false.
- double cluster-near-duplicates - Documents more similar to each other than x (think of it as percentage of different content) will be considered duplicates. Only one instance of a duplicate document will appear in the output (with multiple sources). Setting this option to 1 will cause all documents to be duplicates of each other, setting it to 0 will cause only documents containing exactly the same indexed text to be marked as duplicates, and setting this option to -1 will completely disable near duplicate detection. Default value: -1.
- string cluster-kbs - Space separated list of knowledge bases to use for clustering and near-deduplication (predefined or user defined). If the same word appears in more than one knowledge base, the knowledge base listed last will be used. For example, with "english german" both knowledge bases contain die, but the entry in the German knowledge base will take priority since it is listed last. See the main documentation for the list of currently predefined knowledge bases. Default value: core web english custom.
- string cluster-stemmers - Space separated list of stemmers to use for clustering and near-deduplication. Multiple stemmers can be combined. For example: dutch french will apply the Dutch stemmer, and then, if that stemmer did not stem the word, the French stemmer. See the main documentation for the list of currently supported stemmers. Default value: delanguage english depluralize.
- string cluster-segmenter - Specifies the segmenter to use for clustering and near-deduplication. A segmenter takes sequential utterances in non-segmented languages (for example, Japanese or Chinese) and divides each utterance into its individual components (or words). See the main documentation for the list of currently supported segmenters. Default value: none.
- boolean efficient-paging - Retrieves documents from sources in a more efficient manner, allowing a larger number of documents to be returned. This feature is incompatible with certain other Watson Explorer features. Default value: false.
- int efficient-paging-n-top-docs-to-cluster - When efficient paging is enabled, not all documents will be retrieved up front and thus are not available for clustering. This option controls how many of the top-ranked documents will be retrieved and used for clustering. Default value: 200.
- boolean debug - When true, Watson Explorer will create a debugging log which can be browsed through the administrative tool (Management > Debugging Sessions). Never leave this option on in production as it affects the performance. Default value: false.
- boolean profile - When true, Watson Explorer will perform light debugging so that the debug log can be used for profiling. Note that debug needs to be turned on for this option to work. Default value: false.
- nodeset extra-xml - Extra AXL code processed before the query is submitted. Can be used to add more advanced options or custom variables.
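The aggregate entry above asks that output-score, output-shingles, output-key and output-sort-keys all be turned on whenever aggregate is enabled. The following is a minimal sketch of a helper that assembles such a parameter set so the four flags cannot be forgotten. The class name AggregateSearchParams, the source names, and the choice to represent boolean values as the strings "true"/"false" are assumptions made for illustration; they are not taken from the Watson Explorer API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the Watson Explorer API): collects
// query-search parameters for a query that treats several search
// collections as a single source.
final class AggregateSearchParams {

    // Returns the parameters in insertion order. Boolean values are written
    // as the strings "true"/"false", which is an assumption about how they
    // are passed to the function.
    static Map<String, String> build(String query, String sources, int num) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("query", query);
        params.put("sources", sources);            // space separated list of sources
        params.put("num", Integer.toString(num));  // total number of results wanted
        params.put("aggregate", "true");
        // The aggregate description asks for these four outputs to be on.
        params.put("output-score", "true");
        params.put("output-shingles", "true");
        params.put("output-key", "true");
        params.put("output-sort-keys", "true");
        return params;
    }

    public static void main(String[] args) {
        // "collection-a" and "collection-b" are placeholder source names.
        Map<String, String> params = build("annual report", "collection-a collection-b", 20);
        params.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The resulting map can then be handed to whichever transport you use (REST parameters or the setters of a SOAP request object); keeping the flags in one place makes it harder to enable aggregate without its companion outputs.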
Exceptions
- There are no exceptions specific to this function.
Authentication
Like all Watson Explorer Engine API functions with the exception of ping, the query-search function requires authentication.
When using REST, you can simply pass v.username and v.password as CGI parameters via HTTP or HTTPS to authenticate the REST call to the query-search function.
When using the SOAP API, you can pass credentials as parameters on an endpoint, or you can leverage the authentication method that is supported by all Watson Explorer Engine functions. Each provides a setAuthentication method that can be passed an authentication object to provide the username and password under which a function executes. An example of this in Java for a SOAP call to the query-search function is the following:
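// Credentials under which the function will execute.
Authentication authentication = new Authentication();
authentication.setUsername("joe-user");
authentication.setPassword("joes-password");
// Attach the credentials to the query-search request object.
QuerySearch querySearch = new QuerySearch();
querySearch.setAuthentication(authentication);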
A single authentication object would typically be reused throughout each individual application.
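For the REST case described above, the credentials travel as the v.username and v.password CGI parameters. The sketch below shows one way to assemble such a request URL in Java using only the standard library. The class name QuerySearchRestUrl, the base URL, and the use of a v.function parameter to select query-search are assumptions for illustration; check your installation for the actual endpoint and function selector.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the Watson Explorer API): builds a REST
// URL that carries credentials as v.username / v.password CGI parameters.
final class QuerySearchRestUrl {

    static String build(String baseUrl, String username, String password,
                        Map<String, String> functionParams) {
        Map<String, String> all = new LinkedHashMap<>();
        all.put("v.function", "query-search"); // assumed way of selecting the function
        all.put("v.username", username);
        all.put("v.password", password);
        all.putAll(functionParams);

        StringBuilder url = new StringBuilder(baseUrl);
        char separator = baseUrl.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> entry : all.entrySet()) {
            url.append(separator)
               .append(encode(entry.getKey()))
               .append('=')
               .append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // Form-encodes a name or value as UTF-8 (spaces become '+').
    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("sources", "collection-a collection-b"); // placeholder source names
        params.put("query", "annual report");
        // "https://example.com/api" is a placeholder, not a real endpoint.
        System.out.println(build("https://example.com/api", "joe-user", "joes-password", params));
    }
}
```

Because the password is part of the request URL in this scheme, HTTPS is the safer of the two transports mentioned above.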