Forced Definitions

When extracting information from your documents, the extraction engine scans the text and identifies the part of speech for every word it encounters. In some cases, a word could fit several different roles depending on the context. If you want to force a word to take a particular part-of-speech role or to exclude the word completely from processing, you can do so in the Forced Definitions section of the Advanced Resources tab. See the topic About Advanced Resources for more information.

To force a part-of-speech role for a given word, you must add a line to this section using the following syntax:

	term:code

Table 1. Syntax description

Entry	Description
term	A term name.
code	A single-character code representing the part-of-speech role. You can list up to six different part-of-speech codes per uniterm. Additionally, you can stop a word from being extracted, either on its own or as part of a compound word or phrase, by using the lowercase code s, such as additional:s.
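
As an illustration of the syntax, the following Python sketch splits one forced-definition line into its term and its codes. It is not part of the product: the function name parse_forced_definition is hypothetical, and the sketch assumes that multiple codes are written one after another following the colon, which the table above does not spell out.

```python
def parse_forced_definition(line):
    """Split one 'term:code' line into (term, list_of_codes).

    Assumption: several codes are written back to back after the colon.
    """
    entry = line.strip()
    # A valid entry has exactly one colon, separating the term from its codes.
    if entry.count(":") != 1:
        raise ValueError(f"expected exactly one colon in {entry!r}")
    term, codes = (part.strip() for part in entry.split(":"))
    if not term:
        raise ValueError("the term is empty")
    # Each code is a single character; up to six different codes are allowed.
    if not 1 <= len(codes) <= 6:
        raise ValueError("expected between one and six codes")
    if len(set(codes)) != len(codes):
        raise ValueError("codes must be different from one another")
    return term, list(codes)


if __name__ == "__main__":
    print(parse_forced_definition("additional:s"))
```

The sketch does not check whether a code is one of the supported part-of-speech codes; it only checks the shape of the line.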

Formatting Rules for Forced Definitions

  • One line per word.
  • Terms cannot contain a colon.
  • Use the lowercase s as a part-of-speech code to stop a word from being extracted altogether.
  • Use up to six part-of-speech codes per line. Supported part-of-speech codes are shown in the Extraction Patterns section. See the topic Extraction patterns for more information.
  • Use the asterisk character (*) as a wildcard at the end of a string for partial matches. For example, if you enter add*:s, words such as add, additional, additionally, addendum, and additive are never extracted as a term or as part of a compound word term. However, if a word match is explicitly declared as a term in a compiled dictionary or in the forced definitions, it will still be extracted. For example, if you enter both add*:s and addendum:n, addendum will still be extracted if found in the text.
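
The interaction between a wildcard s entry and an explicit declaration can be modelled with a short Python sketch. This is only an illustration of the rule described in the last bullet, not the extraction engine's actual logic: the function is_suppressed, the forced mapping, and the dictionary_terms set are hypothetical names, and matching is assumed to be case-sensitive.

```python
def is_suppressed(word, forced, dictionary_terms=frozenset()):
    """Return True if `word` would be blocked from extraction.

    forced: maps a term (optionally ending in '*') to its code string.
    dictionary_terms: words explicitly declared in a compiled dictionary.
    """
    # An explicit dictionary declaration overrides any wildcard exclusion.
    if word in dictionary_terms:
        return False
    # An exact forced definition is decided by its own codes.
    if word in forced:
        return "s" in forced[word]
    # Otherwise, a trailing-asterisk entry matches any word with that prefix.
    for pattern, codes in forced.items():
        if pattern.endswith("*") and word.startswith(pattern[:-1]):
            if "s" in codes:
                return True
    return False


if __name__ == "__main__":
    forced = {"add*": "s", "addendum": "n"}
    for w in ["add", "additional", "addendum", "bad"]:
        print(w, is_suppressed(w, forced))
```

With the two entries add*:s and addendum:n, the sketch treats add and additional as suppressed, while addendum is still extracted because it is declared explicitly.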