Annotators

When text is processed, annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations can be used to enhance content analysis.
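To make the idea of an annotation concrete, the following is a minimal conceptual sketch, not the product's implementation. It assumes that an annotation can be modeled as a typed span of character offsets over the input text. The `Annotation` class, the `annotate_capitalized_phrases` function, and the `candidate_name` type are hypothetical names chosen for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical model of one annotation: a typed span over the text."""
    begin: int          # offset of the first covered character
    end: int            # offset one past the last covered character
    type: str           # label assigned by the annotator
    covered_text: str   # the text between begin and end


def annotate_capitalized_phrases(text):
    """Toy rule: mark runs of two or more capitalized words.

    This is an illustrative stand-in for a real annotator, which would
    use far richer rules or models.
    """
    pattern = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")
    return [
        Annotation(m.start(), m.end(), "candidate_name", m.group())
        for m in pattern.finditer(text)
    ]


text = "The report says that Jane Smith visited the New York office."
annotations = annotate_capitalized_phrases(text)

for a in annotations:
    print(a.type, a.begin, a.end, repr(a.covered_text))
```

The original text is left unchanged. Each annotation only records where an extraction occurs and what kind of extraction it is, which is what allows later analysis steps to build on earlier ones.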

Annotators analyze text only if they are enabled, so you must enable each annotator that you want to use. When annotators are enabled for a collection, the fields and facets that they produce are returned to Watson™ Explorer through the text analytics API.

The following annotators are provided with Annotation Administration Console:

Language Identification Annotator: Automatically determines the language based on the content of the input text and the language options that were configured for the collection.
Linguistic Analysis Annotator: Analyzes the basic syntactic attributes of the input text to identify sentence ranges, lemmas of words, and parts of speech.
Named Entity Recognition Annotator: Uses predefined rules to extract person names, locations, and organizations from the input text and maps this information to facets.
Content Classification Annotator: Adds metadata based on fields and category scores that are defined in IBM® Content Classification.

In addition to the provided annotators, you can configure each collection to use a custom annotator. To use a custom annotator, you must add a custom text analysis engine to the system and configure one or more collections to use it. For example, you might create a custom annotator to capture specific text analysis results, and then map the results to a relational database.
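The relational database example can be sketched as follows. This is a hedged illustration only: the table name `annotation`, its columns, the `product_code` type, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for whatever database a real deployment would use. It does not show how a custom text analysis engine is registered with the system.

```python
import sqlite3

# Hypothetical results from a custom annotator:
# (document id, annotation type, begin offset, end offset, covered text)
results = [
    ("doc-1", "product_code", 10, 17, "AB-1234"),
    ("doc-1", "product_code", 42, 49, "CD-5678"),
    ("doc-2", "product_code", 5, 12, "AB-1234"),
]

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE annotation (
        doc_id       TEXT    NOT NULL,
        type         TEXT    NOT NULL,
        begin_offset INTEGER NOT NULL,
        end_offset   INTEGER NOT NULL,
        covered_text TEXT    NOT NULL
    )
    """
)

# Map each annotation result to one row.
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", results)
conn.commit()

# Once stored, the results can be queried like any other relational data.
counts = conn.execute(
    "SELECT covered_text, COUNT(*) FROM annotation "
    "GROUP BY covered_text ORDER BY covered_text"
).fetchall()
print(counts)
```

Storing one row per annotation keeps the mapping simple and lets standard SQL answer questions such as how often each extracted value occurs across documents.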

To create a custom annotator, you can use IBM Watson Explorer Content Analytics Studio instead of developing the annotator manually with the Unstructured Information Management Architecture (UIMA) software development kit (SDK).

Content Analytics Studio is a separately installable component of IBM Watson Explorer Advanced Edition.