Part-of-speech tag sets

Content Analytics Studio uses different tag sets for part-of-speech tagging, depending on the language of the documents that are analyzed. The part-of-speech tag appears as a property value of the uima.tt.TokenAnnotation type when a document is analyzed with a UIMA pipeline.

The following tables list the part-of-speech tags that are used for English and for other languages such as German, French, and Arabic. For information about the part-of-speech tags that are used for Hebrew, Korean, Turkish, Chinese, and Japanese documents, see the Content Analytics Studio context-sensitive help.

English tag set

The following part-of-speech tags are used for English documents.

Table 1. List of part-of-speech tags that are used for English documents
Part-of-speech tag  Description
UNKNOWN             Unknown word
DT                  Determiner
QT                  Quantifier
CD                  Cardinal number
NN                  Noun, singular
NNS                 Noun, plural
NNP                 Proper noun, singular
NNPS                Proper noun, plural
EX                  Existential "there", such as in the sentence "There was a party."
PRP                 Personal pronoun (PP)
PRP$                Possessive pronoun (PP$)
POS                 Possessive ending
RBS                 Adverb, superlative
RBR                 Adverb, comparative
RB                  Adverb
JJS                 Adjective, superlative
JJR                 Adjective, comparative
JJ                  Adjective
MD                  Modal
VB                  Verb, base form
VBP                 Verb, present tense, other than third person singular
VBZ                 Verb, present tense, third person singular
VBD                 Verb, past tense
VBN                 Verb, past participle
VBG                 Verb, gerund or present participle
WDT                 Wh-determiner, such as "which" in the sentence "Which book do you like better?"
WP                  Wh-pronoun, such as "which" and "that" when they are used as relative pronouns
WP$                 Possessive wh-pronoun, such as "whose"
WRB                 Wh-adverb, such as "when" in the sentence "I like it when you make dinner for me."
TO                  The preposition "to"
IN                  Preposition or subordinating conjunction
CC                  Coordinating conjunction
UH                  Interjection
RP                  Particle
SYM                 Symbol
$                   Currency sign
''                  Double or single quotation marks
(                   Opening parenthesis, bracket, angle bracket, or brace
)                   Closing parenthesis, bracket, angle bracket, or brace
,                   Comma
.                   End of sentence punctuation (. ! ?)
:                   Mid-sentence punctuation (: ; ... -- -)
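
In the English tag set, related tags share a prefix: all verb tags begin with VB, all noun tags with NN, all adjective tags with JJ, and all adverb tags with RB. The following Python sketch shows one way that a downstream consumer might use this to group tokens by word class. The (token, tag) pairs are hypothetical sample data, and the function names are illustrative only; how the pairs are exported from the token annotations is not shown and is not part of the product API.

```python
# Hypothetical sample input: (token, tag) pairs for "There was a party."
# Assumption: the pairs were exported from token annotations by some
# other step that is not shown here.
tagged = [
    ("There", "EX"),
    ("was", "VBD"),
    ("a", "DT"),
    ("party", "NN"),
    (".", "."),
]

# Prefixes taken from Table 1. Only the four open word classes are covered.
WORD_CLASS_PREFIXES = {
    "VB": "verb",
    "NN": "noun",
    "JJ": "adjective",
    "RB": "adverb",
}


def word_class(tag):
    """Return a coarse word class for an English tag, or None if no prefix matches."""
    for prefix, name in WORD_CLASS_PREFIXES.items():
        if tag.startswith(prefix):
            return name
    return None


def tokens_of_class(pairs, name):
    """Return the tokens whose tag belongs to the given word class."""
    return [token for token, tag in pairs if word_class(tag) == name]


if __name__ == "__main__":
    print(tokens_of_class(tagged, "verb"))
    print(tokens_of_class(tagged, "noun"))
```

Tags such as EX, RP, or the punctuation tags match none of the four prefixes, so word_class returns None for them.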

Simplified tag set

The following part-of-speech tags are used for documents in languages other than English.

Table 2. List of part-of-speech tags that are used for non-English documents
Part-of-speech tag  Description
UKW                 Unknown word
CC                  Coordinating conjunction
CD                  Cardinal number
DT                  Determiner
IN                  Preposition or subordinating conjunction
JJ                  Adjective
MD                  Modal
NN                  Noun
NNP                 Proper noun
PRP                 Pronoun
QT                  Quantifier
RB                  Adverb
SYM                 Symbol, including all types of punctuation
UH                  Interjection
VB                  Verb
WH                  Wh-word, such as the equivalent of "what"
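
An application that processes both English and non-English documents might want to compare tags on a common scale. The following Python sketch collapses the English tags onto the simplified tags. The correspondence is an assumption that is inferred from the descriptions in the two tables; it is not a mapping that Content Analytics Studio defines, and the names in the code are illustrative. English tags with no clear counterpart in the simplified tag set (EX, POS, TO, and RP) are deliberately left unmapped.

```python
# Tags listed in the simplified tag set (Table 2).
SIMPLIFIED_TAGS = {
    "UKW", "CC", "CD", "DT", "IN", "JJ", "MD", "NN",
    "NNP", "PRP", "QT", "RB", "SYM", "UH", "VB", "WH",
}

# Assumed correspondence from English-only tags to simplified tags.
# This is inferred from the tag descriptions, not defined by the product.
COLLAPSED = {
    "UNKNOWN": "UKW",
    "NNS": "NN",
    "NNPS": "NNP",
    "PRP$": "PRP",
    "RBS": "RB", "RBR": "RB",
    "JJS": "JJ", "JJR": "JJ",
    "VBP": "VB", "VBZ": "VB", "VBD": "VB", "VBN": "VB", "VBG": "VB",
    "WDT": "WH", "WP": "WH", "WP$": "WH", "WRB": "WH",
    # Table 2 states that SYM includes all types of punctuation.
    # Treating the currency sign as a symbol is an additional assumption.
    "$": "SYM", "''": "SYM", "(": "SYM", ")": "SYM",
    ",": "SYM", ".": "SYM", ":": "SYM",
}


def simplify(tag):
    """Map an English tag to a simplified tag, or return None if unmapped."""
    if tag in COLLAPSED:
        return COLLAPSED[tag]
    if tag in SIMPLIFIED_TAGS:
        return tag  # The tag appears in both tables, so it passes through.
    return None  # For example EX, POS, TO, and RP.


if __name__ == "__main__":
    for tag in ["VBZ", "NNPS", "WP$", ".", "DT", "EX"]:
        print(tag, "->", simplify(tag))
```

Returning None for unmapped tags keeps the gaps visible so that the caller can decide how to handle them.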