Pattern Matcher annotator

The Pattern Matcher annotator captures patterns that are constructed from one or more words in the input text. The text is mapped to predefined facets for the parts of speech, such as nouns and verbs, and phrase patterns, such as a noun sequence.

The Pattern Matcher annotator can be used with content analytics collections only.

In the administration console, an administrator can configure rules for the patterns to extract and analyze, and associate the rules with facets. When the annotator runs, it uses the rules to extract the defined patterns of text. Pattern matching during text analysis is case-sensitive.

If you use IBM® Content Analyzer and have user-defined pattern definitions (rule files) that you use with Pattern Matcher, you can use the pattern definitions with Watson Explorer Content Analytics if both of the following conditions are met:

This annotator captures patterns constructed from one or more words in the input text. A pattern is a sequence of words with constraints. The following constraints are available:

Table 1. Constraints in pattern matching
Constraint | Description | Example
str | Surface string (the exact characters that appear in the input text) | ate
lex | Lemma of the word | eat
pos | The part of speech that the word represents | noun
ftrs | Additional features (attributes) of the words | proper
category | The facet path assigned by the Dictionary Lookup annotator | $.myword
guard | A word that is set as a guard matches either a word that meets the other constraints (as usual) or the beginning or end of the sentence. For example, to capture a sequence of exactly two nouns, the pattern is "!noun" "noun" "noun" "!noun". Without guards, no match results when the two nouns appear at the beginning of the sentence, because the first element has no word to match. Set guard="true" for the first and last elements to guard the inner two nouns, which are the ones that you want. The default value is guard="false". |
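
The guard behavior described in Table 1 can be illustrated with a small Python sketch. This is not the annotator's implementation or its rule-file syntax; the `Token`, `Element`, and `find_matches` names are hypothetical and exist only for this illustration. The sketch assumes that a guard element may match a sentence boundary and that guard elements are left out of the captured words.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Token:
    surface: str  # corresponds to the "str" constraint (exact input characters)
    lemma: str    # corresponds to the "lex" constraint
    pos: str      # corresponds to the "pos" constraint


@dataclass(frozen=True)
class Element:
    """One element of a pattern (hypothetical model, not the product's rule format)."""
    surface: Optional[str] = None
    lemma: Optional[str] = None
    pos: Optional[str] = None
    negate: bool = False  # True models "!noun": any part of speech except noun
    guard: bool = False   # True lets the element match a sentence boundary

    def accepts(self, token: Token) -> bool:
        # Comparisons are exact, so matching is case-sensitive.
        if self.surface is not None and token.surface != self.surface:
            return False
        if self.lemma is not None and token.lemma != self.lemma:
            return False
        if self.pos is not None and (token.pos == self.pos) == self.negate:
            return False
        return True


def find_matches(tokens: Sequence[Token], pattern: Sequence[Element]) -> List[List[str]]:
    """Slide the pattern over the sentence; None marks the sentence boundaries."""
    padded: List[Optional[Token]] = [None, *tokens, None]
    matches = []
    for start in range(len(padded) - len(pattern) + 1):
        window = padded[start:start + len(pattern)]
        ok = all(
            element.guard if token is None else element.accepts(token)
            for element, token in zip(pattern, window)
        )
        if ok:
            # Assumption: only the non-guard elements are captured.
            matches.append([t.surface for e, t in zip(pattern, window) if not e.guard])
    return matches


def exactly_two_nouns(guard: bool) -> List[Element]:
    """Build the "!noun" "noun" "noun" "!noun" pattern from Table 1."""
    outer = Element(pos="noun", negate=True, guard=guard)
    inner = Element(pos="noun")
    return [outer, inner, inner, outer]


SENTENCE = [
    Token("Customer", "customer", "noun"),
    Token("service", "service", "noun"),
    Token("called", "call", "verb"),
    Token("me", "I", "pronoun"),
]

print(find_matches(SENTENCE, exactly_two_nouns(guard=False)))  # []
print(find_matches(SENTENCE, exactly_two_nouns(guard=True)))   # [['Customer', 'service']]
```

Without guards, the two nouns at the start of the sentence are missed because nothing precedes them for the first "!noun" element to match. With guards on the outer elements, the sentence boundary satisfies the first element and the two nouns are captured.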

Content analytics collections have predefined pattern definitions to provide default text analytics capability. The following facets are defined by default for part-of-speech analysis. Part-of-speech analysis is provided for all languages.

Table 2. Facets for the parts of speech
Facet path | Facet name
$._word.noun.general | General Noun
$._word.noun.unk | Unknown
$._word.verb | Verb
$._word.adj | Adjective
$._word.adv | Adverb
$._word.conj | Conjunction
$._word.intj | Interjection
$._word.num | Numeral

The following facets are defined by default for phrase analysis. Phrase analysis is not the same for all languages. For example, some facets are not used for some languages.

Table 3. Facets for phrase analysis
Facet path | Facet name
$._phrase.noun_phrase.nouns | Noun Sequence
$._phrase.noun_phrase.mod_noun | Modified Noun
$._phrase.noun_phrase.adp_noun | Preposition Noun
$._phrase.pred_phrase.adv_pred | Predicate with Adverb
$._phrase.pred_phrase.noun_pred | Noun - Predicate
$._phrase.pred_phrase.verb_noun | Verb - Noun
$._phrase.conj_phrase.resultative | Resultative Conjunction
$._phrase.conj_phrase.contradictory | Contradictory Conjunction
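
The facet paths in Tables 2 and 3 are dot-separated and form a hierarchy under `$`. The following Python sketch shows one way to work with them outside the product, for example when post-processing exported results. It is illustrative only: the `DEFAULT_FACETS` dictionary simply restates the two tables, and the `build_tree` and `facets_under` helpers are hypothetical names, not part of any Watson Explorer API.

```python
from typing import Dict, List

# Default facet paths and names, copied from Tables 2 and 3.
DEFAULT_FACETS: Dict[str, str] = {
    "$._word.noun.general": "General Noun",
    "$._word.noun.unk": "Unknown",
    "$._word.verb": "Verb",
    "$._word.adj": "Adjective",
    "$._word.adv": "Adverb",
    "$._word.conj": "Conjunction",
    "$._word.intj": "Interjection",
    "$._word.num": "Numeral",
    "$._phrase.noun_phrase.nouns": "Noun Sequence",
    "$._phrase.noun_phrase.mod_noun": "Modified Noun",
    "$._phrase.noun_phrase.adp_noun": "Preposition Noun",
    "$._phrase.pred_phrase.adv_pred": "Predicate with Adverb",
    "$._phrase.pred_phrase.noun_pred": "Noun - Predicate",
    "$._phrase.pred_phrase.verb_noun": "Verb - Noun",
    "$._phrase.conj_phrase.resultative": "Resultative Conjunction",
    "$._phrase.conj_phrase.contradictory": "Contradictory Conjunction",
}


def build_tree(paths) -> dict:
    """Turn dot-separated facet paths into nested dictionaries."""
    tree: dict = {}
    for path in paths:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


def facets_under(prefix: str) -> List[str]:
    """Return the names of all default facets below the given path prefix."""
    return [
        name
        for path, name in DEFAULT_FACETS.items()
        if path.startswith(prefix + ".")
    ]


tree = build_tree(DEFAULT_FACETS)
print(sorted(tree["$"]["_phrase"]))         # ['conj_phrase', 'noun_phrase', 'pred_phrase']
print(facets_under("$._phrase.noun_phrase"))  # ['Noun Sequence', 'Modified Noun', 'Preposition Noun']
```

Because some phrase facets are not used for some languages, a lookup like this should treat a missing path as normal rather than as an error.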