Contents hold the data for clustering
Description
Contents are always the children of schema.x.element.document elements. They specify some text associated with each schema.x.element.document as well as how this text should be processed by the clustering engine.
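As a rough illustration of this parent/child relationship, the following sketch builds a document with two contents using Python's standard xml.etree.ElementTree module. The element names document and content, the url attribute, and all of the values are assumptions made for illustration only; name, type, action and weight are attributes described in the list below.

```python
import xml.etree.ElementTree as ET

# Assumption: the parent element is called "document", each child is called
# "content", and the url value is made up for this example.
doc = ET.Element("document", {"url": "http://example.com/report"})

# A plain-text content that is clustered and copied to the output.
title = ET.SubElement(
    doc, "content", {"name": "title", "type": "text", "action": "cluster"}
)
title.text = "Quarterly sales report"

# A second content whose weight multiplier is lowered to 0.5.
summary = ET.SubElement(
    doc,
    "content",
    {"name": "summary", "type": "text", "action": "cluster", "weight": "0.5"},
)
summary.text = "Sales rose in the third quarter."

# Serialize the tree to an XML string.
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Each content carries its own text plus the attributes that tell the engine how to treat that text, which is the division of labor the description above outlines.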
Attributes
- async (Boolean default: true) - Asynchronous processing. For parse tags, should the request be enqueued (false) or processed before its next sibling. For other elements, this attribute only makes a difference when they contain asynchronous requests which need to be processed before the element is processed. In this case, when false, the element's next sibling will only be processed after the current element; when true, Watson Explorer won't wait for the current element to be processed before processing its next sibling.
- elt-id (Integer) - Usage: Internal
- max-elt-id (Integer) - Usage: Internal
- execute-acl (Text)
- process (Text) - An XPath determining which of the attributes and/or children will be processed. Currently only "", "*", "@*" and "*|@*" are supported.
- name (Text) - Name of the content (may be unspecified for anonymous contents).
- count (Integer) - Number of times the content occurs (used for metadata).
- type (Any of: html, text Deprecated values: title, snippet, token default: html) - Specifies the type of the content and consequently the tokenization that will be applied.
- html:
HTML (tagged) text.
- text:
Plain text.
- token:
Plain text that should be treated as a single token.
For example, a source name, etc.
This generates a token which is the full text of the content
with a '#' prepended to it.
The maximum length of a content is 499 bytes. Any data
beyond 499 bytes will be truncated.
Usage: Must be specified once all the child elements have been processed
- action (Any of: none, cluster, discard, index-only Deprecated values: cluster-bold, cluster-summarize default: cluster) - Sets the processing behavior of this content.
- none:
Simply copied to the output.
- cluster:
Cluster the content and then copy it to the output.
- cluster-bold:
Use action="cluster" output-action="bold" instead.
- cluster-summarize:
Use action="cluster" output-action="summarize" instead.
- discard:
This was a generated content that should just be stripped
when processed.
- index-only:
Contents marked index-only will only be used to
determine if a document matches the query, and will
not be returned in the result. The text of the
content is not stored in the index, which causes the
index to take less space. These contents will not be
available for fast-indexing.
If specified with either a negative weight or the
attribute indexed set to false, the
data will be discarded during indexing.
- output-action (Any of: bold, bold-preserve-tags, summarize, summarize-discard, discard, subsearch, none) - Sets the output behavior of this content.
- bold:
Convert this content to HTML and apply the bolding
algorithm to bold query terms and/or tree labels to it.
If the content is HTML, all tags will be stripped from
it while bolding.
- bold-preserve-tags:
Convert this content to HTML and apply the bolding
algorithm to bold query terms and/or tree labels to it.
If the content is HTML, all tags will be preserved in
the content while bolding.
- summarize:
Summarize and bold it in the snippet content.
- summarize-discard:
Summarize and bold it in the snippet content.
- discard:
If possible, remove it from the output.
- subsearch:
Duplicated version of a content saved for allowing
multiple subsearches on the same file.
- none:
Used by the default rendering to avoid displaying this content.
Usage: This functionality is deprecated
- weight (Decimal number default: 1) - Weight multiplier for this content, implicitly set to 0 at cluster time if the action is none. In this case, the default weight of 1 is still used when searching and scoring.
- indexed (Boolean default: true) - When set to true, the data will be stored in the index and available for full-text search (unless the weight is negative). If set to false, the text will be stored in the index but will not be available for full-text searching. If set to false and the action is set to index-only then the data will be completely discarded.
Usage: Search engine module only
- matches (Text) - A space-separated list of "start-end" pairs indicating the character ranges (0-based) that should be bolded within this content. This data is generated when the search engine is given a bolding query.
Usage: Search engine module only
- max-size (Text) - Maximum number of bytes to index in this content.
- add-to (Restricted form of xs:string: Pattern [^"]+) - ID of the document to which you wish to add this content.
Usage: Search engine module only
- acl (Text) - ACL to apply to this content.
Usage: Search engine module only
- vse-had-acl (Text) - Indicates that the content was indexed with an ACL that will be available for retrieval. This attribute should not be set by the user and will never be seen by the user.
Usage: Search engine module only; Internal
- vse-add-to-normalized (May only be: vse-add-to-normalized) - In the search engine, use this attribute to mark the fact that the add-to is already absolute and should not be normalized. If this attribute is not specified, the add-to will be treated as a relative URL and normalized with respect to the URL that was crawled.
Usage: Search engine module only
- vse-add-to-check (May only be: crawlable) - This attribute only has meaning for the search engine. In the search engine, setting this attribute will cause the indexer to index this content only if the key is a URL that would be crawlable by the current configuration.
Usage: Search engine module only; Experimental feature which may be officially supported in a subsequent release
- la-score-multiplier (Decimal number) - Specify a multiplier for the generated la-score when indexing documents in the search-engine. If specified in any other context, it will have no effect.
Usage: Search engine module only
- u (Integer) - Reported by the search engine, this is the relative offset of the start of this content within the indexed document.
Usage: Search engine module only
- fast-index (Any of: number, double, float, int, date, set, checked-number, checked-double, checked-float, checked-int, checked-date, checked-set) - If specified, the search-engine will fast-index this content using the type specified (see the fast-index configuration of the search-engine). It is not valid to specify a fast-index type and an action of index-only.
Usage: Search engine module only
- indexed-fast-index (Any of: number, double, float, int, date, set, checked-number, checked-double, checked-float, checked-int, checked-date, checked-set) - In addition to adding this content to the fast-index, also generate additional traditional index tokens to allow for faster application of fast-index/binning conditions. If both the fast-index and indexed-fast-index types are specified, this one takes precedence.
Usage: Search engine module only
- fast-index-base (Integer default: 10) - Usage: Search engine module only
- fast-index-min-exponent (Integer default: 10) - Usage: Search engine module only
- matched (xs:unsignedInt) - Whether the content was matched during a sub-search.
Usage: Read-only
- vse-streams (List of: xs:unsignedInt default: 0) - Produced by the search-engine if streams other than the default were used to generate the data.
Usage: Read-only
- Any user-defined attribute
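To make the format of the matches attribute more concrete, the sketch below parses a space-separated list of "start-end" pairs and wraps the corresponding 0-based character ranges in bold tags. The helper names parse_matches and bold_ranges are hypothetical, and the code assumes that the end offset is inclusive and that the ranges do not overlap; the description above does not state either point, so treat this as one possible reading rather than the engine's actual behavior.

```python
def parse_matches(matches):
    """Parse a space-separated list of "start-end" pairs into (start, end) tuples."""
    ranges = []
    for pair in matches.split():
        start, end = pair.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def bold_ranges(text, ranges):
    """Wrap each character range of text in <b>...</b>.

    Assumptions (not confirmed by the attribute description): offsets are
    0-based character positions, the end offset is inclusive, and the
    ranges do not overlap.
    """
    pieces = []
    pos = 0
    for start, end in sorted(ranges):
        pieces.append(text[pos:start])                        # text before the match
        pieces.append("<b>" + text[start:end + 1] + "</b>")   # the bolded match
        pos = end + 1
    pieces.append(text[pos:])                                 # remaining text
    return "".join(pieces)


ranges = parse_matches("0-5 16-21")
highlighted = bold_ranges("Watson Explorer search engine", ranges)
print(highlighted)  # <b>Watson</b> Explorer <b>search</b> engine
```

If the end offset turns out to be exclusive, replacing end + 1 with end in both places is the only change needed.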