As discussed in the overview, the knowledge that IBM® Watson™ Explorer Engine uses for clustering is determined by the language combination used in your project and by the knowledge bases listed in your project's language.custom-kbs variable. This variable is set to an ordered list of knowledge bases, which can be predefined (see the list below) or user-defined (see the next section). When the same word or phrase appears in more than one knowledge base, the knowledge base listed last takes precedence.
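The precedence rule can be illustrated with a minimal Python sketch. It is not how Watson Explorer Engine stores or loads knowledge bases: the dictionary representation, the resolve_entries function, the my-domain-kb name, and the sample entries are all hypothetical and serve only to show the "last listed wins" behavior.

```python
def resolve_entries(knowledge_bases):
    """Merge an ordered list of (name, entries) pairs.

    Later knowledge bases overwrite earlier ones, mirroring the rule that
    the knowledge base listed last takes precedence. (Hypothetical model;
    the real knowledge-base format is not shown here.)
    """
    merged = {}
    for name, entries in knowledge_bases:
        for term, value in entries.items():
            # Record the value together with the knowledge base it came from.
            merged[term] = (value, name)
    return merged


# Ordered as they would appear in language.custom-kbs (illustrative data only).
kbs = [
    ("custom", {"java": "programming language", "python": "programming language"}),
    ("my-domain-kb", {"java": "coffee"}),  # hypothetical user-defined knowledge base
]

resolved = resolve_entries(kbs)
for term, (value, source) in sorted(resolved.items()):
    print(f"{term}: {value} (from {source})")
```

In this sketch, the entry for java comes from my-domain-kb because that knowledge base is listed last, while python, which appears only in custom, is unaffected.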

The high-level language variables (language.main, language.other, and language.custom-kbs) are used to instantiate the following low-level language options: stoplist, stem, and segmenter. To see the actual values of these low-level options, turn debugging on from the results page of your project and click the [+more] link at the end of the last section, called Modified Variables. You should see the values of all the high-level language variables as well as the low-level ones.

The low-level stem option specifies an ordered list of stemmers. Stemmers are algorithms that match words that share a root, such as cluster, clusters, and clustering.
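As a rough illustration of what a stemmer does, the following Python sketch strips a couple of common English suffixes so that related word forms map to the same root. It is a toy example under stated assumptions: toy_stem and its suffix list are hypothetical and are not the stemmers shipped with Watson Explorer Engine, which are considerably more sophisticated.

```python
from collections import defaultdict


def toy_stem(word, suffixes=("ing", "s")):
    """Strip the first matching suffix, keeping a root of at least 3 letters.

    A deliberately naive stand-in for a real stemmer (hypothetical helper).
    """
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Group word forms by their stem so that variants are treated as one term.
groups = defaultdict(list)
for w in ["cluster", "clusters", "clustering", "search"]:
    groups[toy_stem(w)].append(w)

for stem, words in groups.items():
    print(stem, "->", words)
```

Here cluster, clusters, and clustering all reduce to the root cluster, which is the kind of matching that allows a clustering engine to treat them as the same term.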

The first decision you must make is which main and secondary languages to use in your search application. All the relevant language components (stoplists, knowledge bases, and stemmers) are added automatically for this language combination. For a list of all the available languages, see Language Configuration in Watson Explorer Engine.

For up-to-date lists of the predefined stoplists, knowledge bases, and stemmers that are available in the Watson Explorer Engine release that you are using, see Specific Language Components Available in Watson Explorer Engine.

The second decision you must make is whether to use any domain stoplists or custom knowledge bases that are relevant to your application. The domain stoplists and knowledge bases available in the version of Watson Explorer Engine that you are using are:

  • Domain stoplists:
    • ads
    • business
    • chemistry
    • computers
    • core
    • drugs
    • email
    • government
    • medicine
    • news
    • patents
    • physics
    • science
    • shopping
    • support
    • web
  • Knowledge bases:
    • csol
    • custom
    • doc-search-kb
    • unbreak-span
Note: csol, a conceptual search ontolection knowledge base, and the doc-search-kb knowledge base are both used internally by Watson Explorer Engine. The custom knowledge base can be modified to include your own customized information.

To proceed to the next step of this tutorial, click Adding Knowledge.