Annotators extract the words and phrases in a dataset that satisfy the rules of the annotator being used.
The following built-in annotators are available to all collections:
- PII Annotator
- The PII Annotator is a Character Pattern annotator that extracts personally identifiable information (PII) from text using regular expressions.
- Machine Learning Annotator
- The Machine Learning Annotator allows users to extract entities and relationships from documents using pre-trained models.
- Part of Speech
- The Part of Speech annotator extracts words and phrases from the given text and marks these extractions as annotations. For more information, see Part of Speech annotator.
- Named Entity Recognition
- The Named Entity Recognition annotator extracts person names, locations, and organizations from the given text and marks these extractions as annotations. For more information, see Named Entity Recognition annotator.
- Sentiment Analysis
- The Sentiment Analysis annotator extracts phrases and expressions which convey sentiment, and marks these extractions as annotations. For more information, see Sentiment Analysis annotator.
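As a rough illustration of how a regular-expression-based annotator such as the PII Annotator might work, the following Python sketch scans text with a small set of patterns and records each match as an annotation. The patterns, the `PII_PATTERNS` table, and the `annotate_pii` function are hypothetical and are not the product's actual rules or API.

```python
import re

# Hypothetical patterns for illustration only; real PII rules are far more thorough.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def annotate_pii(text):
    """Return one annotation per pattern match, sorted by start offset."""
    annotations = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            annotations.append(
                {
                    "type": label,
                    "text": match.group(),
                    "begin": match.start(),
                    "end": match.end(),
                }
            )
    return sorted(annotations, key=lambda a: a["begin"])


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or call 555-123-4567."
    for annotation in annotate_pii(sample):
        print(annotation)
```

Each annotation keeps the character offsets of the match, so the extracted text can be traced back to its position in the source document.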
You can also create the following annotators:
- Dictionary
- The Dictionary annotator uses a dictionary that contains words that are meaningful to your business. To improve text analysis, you can specify equivalent terms and different grammatical forms of the words. You can also map the words in a dictionary to a facet. To use a dictionary annotator, you must create or upload a dictionary containing your terms. For more information, see Dictionary annotator.
- Character Pattern
- The Character Pattern annotator uses regular expressions to extract tokens from the text and then assign facets to the tokens.
- UIMA Pipeline
- The UIMA Pipeline annotator extracts text annotations using a UIMA pipeline configured in Content Analytics Studio. To use a UIMA pipeline annotator, you must first create the annotator in Content Analytics Studio. For more information, see UIMA Pipeline annotator.
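To make the dictionary approach more concrete, the following Python sketch maps equivalent terms and grammatical forms to a single dictionary entry and assigns that entry's facet to each match. The `DICTIONARY` structure, the facet names, and the `annotate` function are assumptions made for illustration; they do not reflect the product's dictionary format or API.

```python
import re

# Hypothetical dictionary: each entry lists its surface forms
# (equivalent terms and grammatical variants) and the facet it maps to.
DICTIONARY = {
    "laptop": {"facet": "product", "forms": ["laptop", "laptops", "notebook"]},
    "refund": {"facet": "issue", "forms": ["refund", "refunds", "refunded"]},
}


def build_lookup(dictionary):
    """Index every surface form (lowercased) by its entry and facet."""
    lookup = {}
    for term, entry in dictionary.items():
        for form in entry["forms"]:
            lookup[form.lower()] = (term, entry["facet"])
    return lookup


def annotate(text, dictionary):
    """Return an annotation for each word in the text that appears in the dictionary."""
    lookup = build_lookup(dictionary)
    annotations = []
    for match in re.finditer(r"\w+", text):
        hit = lookup.get(match.group().lower())
        if hit:
            term, facet = hit
            annotations.append(
                {
                    "text": match.group(),
                    "term": term,
                    "facet": facet,
                    "begin": match.start(),
                }
            )
    return annotations


if __name__ == "__main__":
    sample = "The notebook was refunded, but the Laptops were not."
    for annotation in annotate(sample, DICTIONARY):
        print(annotation)
```

This sketch matches single words only and ignores case; a dictionary that contains multi-word phrases would need a different matching strategy.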