Dictionary Lookup annotator

The Dictionary Lookup annotator searches for user-defined words in the input text and marks the words with the associated facet path. The facet path is used by the Pattern Matcher annotator.

The Dictionary Lookup annotator can be used with content analytics collections only.

In the administration console, an administrator can configure a dictionary and define the words to be analyzed. The administrator defines words to be used as the subject of analysis, defines equivalent terms for the words, and associates the words with facets. The dictionary editor allows only nouns, although an administrator can define aliases for words in the editor.

The definitions are used for both tokenization and document counting. In the Facets view of the content analytics miner user interface, occurrences of aliases are counted as occurrences of their normal forms.
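
The counting rule can be illustrated with a minimal Python sketch. It is not the product's implementation: the dictionary layout, the facet path strings, and the function names are all hypothetical, and whitespace splitting stands in for real tokenization. It shows only that an alias and its normal form contribute to the same document count.

```python
from collections import Counter

# Hypothetical dictionary: normal form -> facet path and aliases (illustrative values only).
DICTIONARY = {
    "automobile": {"facet": "product/vehicle", "aliases": ["car", "auto"]},
    "engine": {"facet": "product/part", "aliases": ["motor"]},
}

def build_surface_index(dictionary):
    """Map every surface form (normal form or alias) to (normal form, facet path)."""
    index = {}
    for normal, entry in dictionary.items():
        index[normal] = (normal, entry["facet"])
        for alias in entry["aliases"]:
            index[alias] = (normal, entry["facet"])
    return index

def count_documents(docs, dictionary):
    """Count, per normal form, the number of documents that mention it or one of its aliases."""
    index = build_surface_index(dictionary)
    counts = Counter()
    for text in docs:
        # A set ensures each document is counted once per normal form.
        hits = {index[token] for token in text.split() if token in index}
        counts.update(hits)
    return counts

docs = ["the car has a new motor", "an automobile and a car", "no match here"]
counts = count_documents(docs, DICTIONARY)
```

In this sketch, "car" and "automobile" in the second document produce a single count for the normal form "automobile".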

Dictionary matching during text analysis is case-sensitive. In addition, each dictionary has an associated language, and its definitions are applied only to documents in that language. To support several languages, administrators can provide separate definitions for each language.
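
The language restriction can be sketched as follows. This is an illustration under assumptions, not the annotator's code: the dictionary structure, the language codes, and the annotate function are hypothetical, and the sketch covers only the language check.

```python
# Hypothetical structure: each dictionary has one language code and its word-to-facet definitions.
DICTIONARIES = [
    {"language": "en", "entries": {"car": "product/vehicle"}},
    {"language": "de", "entries": {"Auto": "product/vehicle"}},
]

def annotate(text, doc_language, dictionaries):
    """Return (word, facet path) pairs, using only dictionaries in the document's language."""
    annotations = []
    for dictionary in dictionaries:
        if dictionary["language"] != doc_language:
            continue  # definitions for other languages are not activated
        for token in text.split():
            facet = dictionary["entries"].get(token)
            if facet is not None:
                annotations.append((token, facet))
    return annotations

english = annotate("my car and my Auto", "en", DICTIONARIES)
german = annotate("my car and my Auto", "de", DICTIONARIES)
```

The same text yields different annotations depending on the language assigned to the document, because only the matching dictionary is consulted.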

A content analytics collection converts the user-defined words and aliases in the dictionary for use by the Linguistic Analysis annotator, so that each user-defined word is tokenized as one word. However, because Linguistic Analysis applies syntactic and statistical rules in addition to the dictionary definitions, the Dictionary Lookup annotator might not capture every literal occurrence of a user-defined word in the input text.
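
To show what "tokenized as one word" means for a multiword entry, here is a minimal sketch that uses greedy longest match. The Linguistic Analysis annotator's actual rules are not described here, so this technique is a stand-in, and the function name and sample term are hypothetical.

```python
def tokenize_with_dictionary(text, terms):
    """Split on whitespace, but keep multiword dictionary terms together as one token."""
    words = text.split()
    term_tokens = {tuple(term.split()) for term in terms}
    max_len = max((len(term) for term in term_tokens), default=1)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so longer terms win over shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n > 1 and candidate in term_tokens:
                tokens.append(" ".join(candidate))
                i += n
                break
        else:
            # No multiword term starts here; emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens

tokens = tokenize_with_dictionary("the fuel pump failed", ["fuel pump"])
```

A tokenizer that also applies syntactic or statistical rules could split or merge the same words differently, which is why a literal occurrence in the text does not guarantee a match.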

If you use IBM® Content Analyzer and have user-defined dictionaries that you use with Dictionary Lookup, you can use the dictionaries with Watson Explorer Content Analytics if both of the following conditions are met:

The new facet path structure is the same as the existing category path structure.
The appropriate language is set for the definition files.