The following are some common problems and suggested solutions:
- If your cluster labels are very long and seem to consist mostly of heavily repeated text, the most likely cause is that no segmenter option is specified in the project. Make sure that the project's main and secondary language variables (on the Simple tab) are specified correctly.
- If your search-engine collection returns very few documents for most queries
(and does not find documents that you know exist), the most likely cause is that the segmenter
option is not specified in the index stream configuration of the collection. See Index Streams
for more information. Note: This does not apply to indexing in lexical analysis language streams, because those streams do not have a segmenter option. See Lexical Analysis Streams for more information.
- If your search-engine collection returns pages that appear to be "corrupt" or
otherwise contain many strange characters, the most likely cause is that the crawled site
does not specify the character set used on the page. Web browsers often guess the correct
character encoding in this situation, but the Watson™ Explorer Engine software adheres more strictly to web standards and so may decode such pages incorrectly.
To correct the problem, identify the correct character encoding and add a Custom recursive condition/Custom conditional settings to the crawler configuration of the collection (on the Configuration > Crawling tab for your search collection). Specify a condition that matches the affected pages, and set the Default encoding option in the Retrieval and encodings section to the encoding to use when a page does not specify one.
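To see why a missing segmenter can cause the first two symptoms, consider text in a language such as Japanese or Chinese, which does not separate words with spaces. The Python sketch below is a hypothetical illustration, not Watson Explorer Engine's segmenter: it assumes an index that splits text only on whitespace and compares it with one that splits unspaced text into overlapping character bigrams, a simple stand-in for real word segmentation. With whitespace splitting alone, a whole phrase becomes a single term, so a query for one word inside it does not match; oversized terms like these would also make poor, overly long cluster labels.

```python
# Hypothetical illustration only: this is not Watson Explorer Engine's segmenter.
# It contrasts whitespace-only tokenization with character-bigram tokenization,
# a simple stand-in for word segmentation of unspaced text.


def whitespace_tokens(text: str) -> set:
    """Tokenize by whitespace only, as an index with no segmenter might."""
    return set(text.split())


def bigram_tokens(text: str) -> set:
    """Split each whitespace-delimited chunk into overlapping character bigrams."""
    tokens = set()
    for chunk in text.split():
        if len(chunk) < 2:
            tokens.add(chunk)
        else:
            tokens.update(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return tokens


def matches(query: str, doc_tokens: set, tokenizer) -> bool:
    """A document matches if every query token appears among its tokens."""
    return tokenizer(query) <= doc_tokens


document = "東京都の天気予報"  # "Tokyo weather forecast", written without spaces
query = "天気"  # "weather"

print(matches(query, whitespace_tokens(document), whitespace_tokens))  # False
print(matches(query, bigram_tokens(document), bigram_tokens))  # True
```

Real segmenters use dictionaries or statistical models rather than bigrams; the point of the sketch is only that some segmentation step is needed before individual words in such text become searchable terms.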
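Before writing the crawler condition, it can help to confirm which pages actually omit a character set declaration. The following Python sketch is not part of Watson Explorer Engine; the function `declared_charset` and the sample pages are hypothetical. It assumes you have each page's raw HTTP `Content-Type` header and body bytes, and it checks three places where a page can declare its encoding: a UTF-8 byte order mark, the header's `charset` parameter, and a `<meta>` tag within the first 1024 bytes (the range the HTML standard's prescan examines). Pages for which it returns `None` are the ones a browser would have to guess at.

```python
import re
from email.message import Message
from typing import Optional

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def declared_charset(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Return the charset a page declares (lowercased), or None if it declares none."""
    # 1. A UTF-8 byte order mark at the start of the body.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # 2. The charset parameter of the HTTP Content-Type header.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # returned already lowercased
        if charset:
            return charset
    # 3. A <meta> declaration within the first 1024 bytes of the document.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None


# Hypothetical sample pages: (Content-Type header, body bytes).
pages = {
    "declares in header": ("text/html; charset=UTF-8", b"<html><body>ok</body></html>"),
    "declares in meta": ("text/html", b'<html><head><meta charset="windows-1252"></head></html>'),
    "declares nothing": ("text/html", b"<html><head><title>x</title></head></html>"),
}
for name, (header, body) in pages.items():
    print(name, "->", declared_charset(header, body))
# declares in header -> utf-8
# declares in meta -> windows-1252
# declares nothing -> None
```

The order of the checks follows the HTML standard, where a byte order mark takes precedence over the header. The regular expression is a simplification of the standard's prescan algorithm and can miss unusual markup, so treat a `None` result as a candidate for the crawler condition rather than as proof.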