Forced Definitions
When extracting information from your documents, the extraction engine scans the text and identifies the part of speech for every word it encounters. In some cases, a word could fit several different roles depending on the context. If you want to force a word to take a particular part-of-speech role or to exclude the word completely from processing, you can do so in the Forced Definition section of the Advanced Resources tab. See the topic About Advanced Resources for more information.
To force a part-of-speech role for a given word, you must add a line to this section using the following syntax:
term:code
Entry | Description |
---|---|
term | A term name. |
code | A single-character code representing the part-of-speech role. You can list up to six different part-of-speech codes per uniterm. Additionally, you can stop a word from being extracted into compound words/phrases by using the lowercase code s, such as additional:s. |
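The term:code syntax can be illustrated with a small parser. This is only a sketch and not part of the product: the function name parse_forced_definition is hypothetical, and it assumes that multiple codes are written as consecutive single characters after the colon, which the description above does not spell out.

```python
from typing import List, Tuple

MAX_CODES = 6  # the table above allows up to six different codes per entry


def parse_forced_definition(line: str) -> Tuple[str, List[str]]:
    """Split one 'term:code' line into the term and its list of codes.

    Hypothetical helper for illustration only. Assumes each code is a
    single character and that several codes are written back to back.
    """
    text = line.strip()
    # Exactly one colon separates the term from its code(s).
    if text.count(":") != 1:
        raise ValueError("expected exactly one colon in %r" % line)
    term, codes = text.split(":")
    term, codes = term.strip(), codes.strip()
    if not term or not codes:
        raise ValueError("both a term and a code are required in %r" % line)
    code_list = list(codes)
    if len(set(code_list)) != len(code_list):
        raise ValueError("codes must be different from each other in %r" % line)
    if len(code_list) > MAX_CODES:
        raise ValueError("at most %d codes are allowed in %r" % (MAX_CODES, line))
    return term, code_list


print(parse_forced_definition("additional:s"))
```

A line without a colon, or with more than six codes, raises ValueError in this sketch; how the product itself reports such entries is not stated here.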
Formatting Rules for Forced Definitions
- One line per word.
- Terms cannot contain a colon.
- Use the lowercase s as a part-of-speech code to stop a word from being extracted altogether.
- Use up to six part-of-speech codes per line. Supported part-of-speech codes are shown in the Extraction Patterns section. See the topic Extraction patterns for more information.
- Use the asterisk character (*) as a wildcard at the end of a string for partial matches. For example, if you enter add*:s, words such as add, additional, additionally, addendum, and additive are never extracted as a term or as part of a compound word term. However, if a word match is explicitly declared as a term in a compiled dictionary or in the forced definitions, it will still be extracted. For example, if you enter both add*:s and addendum:n, addendum will still be extracted if found in the text.
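The wildcard rule and its exception for explicit entries can be sketched as a lookup. This is an illustration only, not the extraction engine's implementation: the function resolve_code and the dictionary layout are hypothetical, matching is assumed to be case-sensitive, and the sketch models only the forced definitions, not compiled dictionaries.

```python
def resolve_code(word, definitions):
    """Return the forced code for a word, or None if no entry applies.

    Hypothetical helper: 'definitions' maps each term (optionally ending
    in '*') to its code string, as in the add*:s example above.
    """
    # An explicitly declared term takes priority over any wildcard entry.
    if word in definitions:
        return definitions[word]
    # Otherwise, a trailing asterisk matches any word with that prefix.
    # If several wildcard entries match, this sketch returns the first one;
    # the source does not say how such overlaps are resolved.
    for pattern, code in definitions.items():
        if pattern.endswith("*") and word.startswith(pattern[:-1]):
            return code
    return None


definitions = {"add*": "s", "addendum": "n"}
for word in ["add", "additional", "addendum", "table"]:
    print(word, resolve_code(word, definitions))
```

With these two entries, add and additional resolve to s, addendum resolves to n because of its explicit entry, and table has no forced definition.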