IBM Content Analytics with Enterprise Search, Version 3.0.0

Annotators

IBM® Content Analytics with Enterprise Search provides a number of UIMA annotators for advanced text analysis.

When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis. Some annotators support user-defined dictionaries, user-defined rules, and custom configurations.
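To make the idea of an annotation concrete, the following minimal Python sketch shows a dictionary lookup that marks matching words with their character offsets and a type label. It is an illustration only and does not use the product's annotators or the UIMA API. The names `Annotation` and `dictionary_annotator`, the sample text, and the `component` label are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A typed span over the document text (hypothetical structure)."""
    begin: int          # offset of the first character
    end: int            # offset one past the last character
    type: str           # label that could later surface as a facet
    covered_text: str   # the text between begin and end


def dictionary_annotator(text, dictionary, annotation_type):
    """Mark every case-insensitive whole-word dictionary hit as an annotation."""
    annotations = []
    for term in dictionary:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), annotation_type, match.group())
            )
    # Return the annotations in document order.
    return sorted(annotations, key=lambda a: a.begin)


text = "The brake pedal failed. Dealer replaced the Brake line."
hits = dictionary_annotator(text, ["brake", "dealer"], "component")
for hit in hits:
    print(hit)
```

The original text is left unchanged. Each annotation only records where a match occurs and what kind of match it is, which is the general shape of information that an index can later expose as tokens or facets.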

When configuring the document processing pipeline for a collection, an administrator selects the annotators to be used. Key functions that the annotators support include:

Populating a relational database with specific text analysis results from the common analysis structure (CAS).
Capturing special words of interest as the subject of text analytics.
Capturing patterns of words as the subject of text analytics.
Capturing named entities, such as persons, places, and organization names.
Categorizing documents.
Performing fundamental text analytics, such as parsing content to identify sentence ranges, lemmas of words, and parts of speech.
Supporting multilingual text analytics. The results of analytics can vary based on the language of the input document.
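Two of the functions listed above, identifying sentence ranges and capturing patterns of words, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the behavior of the product's annotators: sentences are assumed to end at `.`, `!`, or `?`, the pattern is a plain regular expression, and the names `sentence_ranges`, `pattern_annotator`, and the `problem` label are hypothetical.

```python
import re


def sentence_ranges(text):
    """Return (begin, end) offsets per sentence; naive split on . ! ?"""
    # A sentence starts at a non-space, non-terminator character and runs
    # up to and including the next terminator, if one is present.
    return [(m.start(), m.end())
            for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", text)]


def pattern_annotator(text, pattern, annotation_type):
    """Return (begin, end, type, covered_text) for each regex match."""
    return [(m.start(), m.end(), annotation_type, m.group())
            for m in re.finditer(pattern, text)]


text = "The brake pedal failed. The dealer replaced the brake line."

sentences = sentence_ranges(text)
# Hypothetical rule: any single word immediately followed by "failed".
problems = pattern_annotator(text, r"\b\w+ failed\b", "problem")

for begin, end in sentences:
    print(begin, end, text[begin:end])
print(problems)
```

A production annotator would also need to handle abbreviations, language-specific tokenization, lemmas, and parts of speech. The sketch only shows that both kinds of result can be expressed as offsets into the unchanged source text.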
