The following table lists the languages for which Watson Explorer Engine provides language-specific components and shows which components are currently available for each language. Watson Explorer Engine also provides a flexible, easily extended framework for search application development, which makes it easy to add and integrate custom language-related components.
The Language Configuration in Watson Explorer Engine section describes how to set high-level language variables in a Watson Explorer Engine project. Setting these variables instantiates all of the individual language-related components that are available for the languages you are interested in.
| Language | Knowledge Base | Stemmer | Depluralize | Segmenter | Stoplist(s) | Minor Language | Localized Display |
|---|---|---|---|---|---|---|---|
| Arabic | no | yes* | no | no | yes* | no | no |
| Catalan | yes* | no | no | no | yes* | no | no |
| Legacy Chinese | yes* | N/A | N/A | yes | yes* | no | no |
| Danish | yes* | yes* | no | no | yes* | no | no |
| Dutch | yes* | yes* | no | no | yes* | no | no |
| English | yes | yes | yes | no | yes | yes | yes |
| Farsi | no | no | no | no | yes* | no | no |
| Finnish | yes* | yes* | no | no | yes* | no | no |
| French | yes | yes* | no | no | yes | yes | no |
| German | yes | yes | no | no | yes | no | no |
| Hebrew | no | yes* | no | no | yes* | no | no |
| Italian | yes* | yes* | no | no | yes* | no | no |
| Legacy Japanese | yes | N/A | N/A | yes | yes | no | yes |
| Legacy Korean | yes | yes | N/A | yes | yes* | no | no |
| Norwegian | yes* | yes* | yes* | no | yes* | no | no |
| Polish | no | no | no | no | yes* | no | no |
| Portuguese | yes* | yes* | no | no | yes* | no | no |
| Russian | yes* | yes* | no | no | yes* | no | no |
| Spanish | yes | yes | yes* | no | yes | no | yes |
| Swedish | yes* | yes* | no | no | yes* | no | no |
| Thai | yes* | N/A | N/A | yes | yes* | no | no |
| Turkish | no | yes* | no | no | no | no | no |
This table uses the following values to indicate the robustness of each feature:
The language support components associated with the columns in this table are the following:
A knowledge base defines how specific terms are handled during indexing, during key matching, or when clustering the search results returned by a query. The terms in a knowledge base can include source-specific stopwords, stop phrases, stemmer corrections, synonyms, and other rules that determine whether two words should be in the same cluster. (See Stoplist.)
For information about creating and using knowledge bases, see Knowledge Bases. For information about how knowledge bases are used in the search engine, see Knowledge Bases and Search.
Stemming is the process of identifying the root or base portion of a word in order to make it easier to identify related terms. The root or base portion of a word is known as its stem. For example, the terms "hunted", "hunting", "hunter", and "huntress" all share the stem "hunt". A stemmer is the application or algorithm used to identify and derive stems. By default, the Watson Explorer Engine search engine uses stemmers known as "depluralize stemmers" during indexing, so that a query for either "hunter" or "hunters" returns both terms but does not return "hunting".
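The difference between depluralizing and full stemming can be illustrated with a short Python sketch. This is not the Watson Explorer Engine implementation: the `depluralize` and `matches` functions and their suffix rules are hypothetical, deliberately simplified English rules chosen only to show why "hunter" and "hunters" match each other while "hunting" does not.

```python
def depluralize(term: str) -> str:
    """Reduce a simple English plural to its singular form.

    Illustrative rules only (an assumption, not the engine's rule set).
    """
    word = term.lower()
    # "queries" -> "query"
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    # "boxes" -> "box", "matches" -> "match"
    if len(word) > 3 and word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]
    # "hunters" -> "hunter"; leave words such as "glass" or "status" alone
    if len(word) > 2 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def matches(query_term: str, indexed_term: str) -> bool:
    """Two terms match when they depluralize to the same form."""
    return depluralize(query_term) == depluralize(indexed_term)


if __name__ == "__main__":
    print(matches("hunter", "hunters"))   # singular and plural share one form
    print(matches("hunters", "hunting"))  # "hunting" is left untouched, so no match
```

A full stemmer would go further and reduce "hunter", "hunters", and "hunting" to the common stem "hunt", which broadens recall at the cost of precision.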
A segmenter is a software tool that takes sequential statements in non-segmented languages such as Chinese, Japanese, Korean, and Thai, and divides those statements into individual components or words.
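As a rough illustration of what a segmenter does, the following Python sketch applies forward maximum matching, a classic dictionary-based technique, to a short Chinese string. This technique is used here only as an example; the source does not state which algorithm the Watson Explorer Engine segmenters use. The `segment` function and the tiny `lexicon` are hypothetical.

```python
def segment(text: str, lexicon: set[str]) -> list[str]:
    """Split unsegmented text into words by forward maximum matching.

    At each position, take the longest lexicon entry that matches;
    fall back to a single character when nothing matches.
    """
    max_len = max(map(len, lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Hypothetical lexicon for illustration only.
    lexicon = {"我", "爱", "北京", "大学", "北京大学"}
    print(segment("我爱北京大学", lexicon))
```

Because the longest match wins, "北京大学" is kept as one word rather than being split into "北京" and "大学". Production segmenters typically rely on much larger dictionaries and statistical models to resolve ambiguous cases.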
A stoplist is a list of stopwords: terms that provide little actual content and are therefore usually safe to remove. In Watson Explorer Engine, stopwords can be removed in two different contexts: from the end-user query and from cluster labels. A clustering stopword should not be used as the name of a cluster in certain circumstances, usually when it would appear by itself. Articles such as "a", "an", and "the" are common examples of standard stopwords, but stopwords can also be source-specific. For example, the term "ABC" is not useful for clustering search results retrieved from documents at "ABC, Inc.": because it is the name of the company, many documents contain the term "ABC", so a cluster named "ABC" would add no real value.
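The two contexts described above can be sketched in Python as follows. The stoplists, the function names, and the rule that a label is rejected only when every word in it is a stopword are assumptions made for illustration; they are not the Watson Explorer Engine API or its actual rules.

```python
# Hypothetical stoplists; real stoplists are language- and source-specific.
QUERY_STOPWORDS = {"a", "an", "the"}
CLUSTER_STOPWORDS = QUERY_STOPWORDS | {"abc"}  # "abc" is a source-specific stopword


def strip_query_stopwords(query: str) -> str:
    """Remove query stopwords from an end-user query."""
    kept = [term for term in query.split() if term.lower() not in QUERY_STOPWORDS]
    return " ".join(kept)


def is_valid_cluster_label(label: str) -> bool:
    """Reject a label that consists only of clustering stopwords.

    A stopword may still appear inside a longer label.
    """
    words = label.lower().split()
    return any(word not in CLUSTER_STOPWORDS for word in words)


if __name__ == "__main__":
    print(strip_query_stopwords("the annual report"))
    print(is_valid_cluster_label("ABC"))
    print(is_valid_cluster_label("ABC Annual Report"))
```

Keeping the two stoplists separate reflects the point that a term such as "ABC" can be a poor cluster name at "ABC, Inc." while remaining a perfectly meaningful query term.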