Supported Elements for Rules and Macros
The following arguments are accepted for the value parameters in text link analysis rules and macros:
Macros
You can use a macro directly in a text link analysis rule or within another
macro. If you are entering the macro name by hand or from within the source view (as opposed to
selecting the macro name from a context menu), make sure to prefix the name with a dollar sign
character ($), such as $mTopic. The macro name is case sensitive.
You can choose from any macro defined in the current Text Link Rules tab when selecting macros
through the context menus.
Types
You can use a type directly in a text link analysis rule or macro. If you are
entering the type name by hand or in the source view (as opposed to selecting the type from a
context menu), make sure to prefix the type name with a dollar sign character ($),
such as $Person. The type name is case sensitive. If you use the context menus, you
can choose from any type from the current set of resources being used.
If you reference an unrecognized type, you will receive a warning message, and the rule will have a warning icon in the Rules and Macro Tree until you correct it.
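The naming rules above (a leading dollar sign, case-sensitive names, and a warning for unrecognized references) can be illustrated with a minimal sketch. This is not the product's implementation: the `KNOWN_MACROS` and `KNOWN_TYPES` sets and the `classify_reference` function are hypothetical names introduced only for illustration.

```python
# Hypothetical name sets. In the product, macros come from the current
# Text Link Rules tab and types from the current set of resources.
KNOWN_MACROS = {"mTopic", "mBeHave"}
KNOWN_TYPES = {"Person", "Positive", "Budget", "Unknown"}


def classify_reference(token):
    """Return 'macro', 'type', or 'unrecognized' for a $-prefixed name."""
    if not token.startswith("$"):
        raise ValueError("macro and type references must start with '$'")
    name = token[1:]
    # Set membership is an exact comparison, so the lookup is case sensitive.
    if name in KNOWN_MACROS:
        return "macro"
    if name in KNOWN_TYPES:
        return "type"
    # An unrecognized name is what would trigger the warning described above.
    return "unrecognized"
```

Under these assumptions, `classify_reference("$Person")` returns `'type'`, while `classify_reference("$person")` returns `'unrecognized'` because the case differs.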
Literal Strings
To include information that was never extracted, you can define a literal
string for which the extraction engine will search. All extracted words or phrases have been
assigned to a type and for this reason, they cannot be used in literal strings. If you use a word
that was extracted, it will be ignored, even if its type is <Unknown>.
A literal string can be one or more words. The following rules apply when defining a list of literal strings:
- Enclose the list of strings in parentheses, such as (his). If there is a choice of literal strings, then each string must be separated by the OR operator, such as (a|an|the) or (his|hers|its).
- Use single or compound words.
- Separate each word in the list by the | character, which is like a Boolean OR.
- Enter both singular and plural forms if you want to match both. Inflection is not automatically generated.
- Use lower case only.
- To reuse literal strings, define them as a macro and then use that macro in your other macros and text link analysis rules.
- If a string contains periods (full stops) or hyphens, you must include them. For example, to match a.k.a in the text, enter the periods along with the letters a.k.a as the literal string.
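As a rough illustration of the list rules above, the following sketch splits a literal string list into its alternatives and checks the parentheses and lower-case requirements. The function name `parse_literal_list` is hypothetical and the checks are a simplification, not the extraction engine's own validation.

```python
def parse_literal_list(text):
    """Split a list such as '(a|an|the)' into ['a', 'an', 'the'].

    Raises ValueError if the list is not enclosed in parentheses,
    contains an empty alternative, or uses upper-case letters.
    """
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a literal list must be enclosed in parentheses")
    # The | character separates the alternatives, like a Boolean OR.
    alternatives = [alt.strip() for alt in text[1:-1].split("|")]
    for alt in alternatives:
        if not alt:
            raise ValueError("empty alternative in literal list")
        if alt != alt.lower():
            raise ValueError(f"literal strings must be lower case: {alt!r}")
    # Periods and hyphens are kept exactly as entered, e.g. 'a.k.a'.
    return alternatives
```

Because inflection is not generated automatically, a list that should match both singular and plural forms has to spell out both alternatives explicitly.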
Exclusion Operator
Use ! as an exclusion operator to prevent the negated expression from
occupying a particular slot. You can add an exclusion operator only by hand, either through
inline cell editing (double-click the cell in the Rule Value table or Macro Value table) or in the
source view. For example, if you add $mTopic @{0,2} !($Positive) $Budget to your
text link analysis rule, you are looking for text that contains (1) a term assigned to any of the
types in the mTopic macro, (2) a word gap of zero to two words long, (3) no
instances of a term assigned to the <Positive> type, and (4) a term assigned
to the <Budget> type. This might capture "cars have an inflated price
tag" but would ignore "store offers amazing discounts".
To use this operator, you must enter the exclamation point and parentheses manually into the element cell by double-clicking the cell.
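The example value $mTopic @{0,2} !($Positive) $Budget can be approximated with a small matcher over already-typed terms. This is a sketch under several assumptions that the text does not confirm: the contents of the mTopic macro (`MTOPIC_TYPES`) are invented, terms are modeled as (text, type) pairs with None for unextracted words, and the exclusion slot is assumed to consume exactly one term whose type is not the excluded one.

```python
# Hypothetical expansion of the mTopic macro; its real contents are not given.
MTOPIC_TYPES = {"Vehicle", "Store"}

# $mTopic @{0,2} !($Positive) $Budget, written as (kind, argument) pairs.
PATTERN = [
    ("any_of", MTOPIC_TYPES),
    ("gap", (0, 2)),
    ("not", "Positive"),
    ("type", "Budget"),
]


def match_at(pattern, terms, start):
    """Return True if the whole pattern matches terms beginning at index start."""
    if not pattern:
        return True
    (kind, arg), rest = pattern[0], pattern[1:]
    if kind == "gap":
        low, high = arg
        # A gap may consume between low and high tokens of any type.
        return any(
            start + n <= len(terms) and match_at(rest, terms, start + n)
            for n in range(low, high + 1)
        )
    if start >= len(terms):
        return False  # every other element must consume one term
    term_type = terms[start][1]
    if kind == "any_of":
        ok = term_type in arg
    elif kind == "type":
        ok = term_type == arg
    elif kind == "not":
        # Exclusion: the slot is filled, but not by a term of the excluded type.
        ok = term_type != arg
    else:
        raise ValueError(f"unknown element kind: {kind}")
    return ok and match_at(rest, terms, start + 1)


def matches(pattern, terms):
    """Return True if the pattern matches starting at any position in terms."""
    return any(match_at(pattern, terms, i) for i in range(len(terms)))


# Assumed typing of the two example sentences from the text.
cars = [("cars", "Vehicle"), ("have", None), ("an", None),
        ("inflated", "Negative"), ("price tag", "Budget")]
store = [("store", "Store"), ("offers", None),
         ("amazing", "Positive"), ("discounts", "Budget")]
```

With this typing, `matches(PATTERN, cars)` is true because "inflated" fills the excluded slot without being <Positive>, and `matches(PATTERN, store)` is false because the only term that could fill that slot before "discounts" is <Positive>.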
Word Gaps (<Any Token>)
A word gap, also referred to as <Any Token>, defines a
numeric range of tokens that may be present between two elements. Word gaps are useful for
matching similar phrases that differ only slightly because of additional
determiners, prepositional phrases, adjectives, or other such words.
| # | Element |
| 1 | Unknown |
| 2 | mBeHave |
| 3 | Positive |
Note: In the source view this value is defined as: $Unknown $mBeHave $Positive
This value will match sentences like "the hotel staff was nice", where
hotel staff belongs to type <Unknown>, was is under the macro
mBeHave and nice is <Positive>. But it will not match
"the hotel staff was very nice".
| # | Element |
| 1 | Unknown |
| 2 | mBeHave |
| 3 | <Any Token> @{0,1} |
| 4 | Positive |
Note: In the source view this value is defined as: $Unknown $mBeHave @{0,1} $Positive
If you add a word gap to your rule value, it will match both "the hotel staff was nice" and "the hotel staff was very nice".
In the source view or with inline editing, the syntax for a word gap is
@{#,#}, where @ signifies a word gap and
{#,#} defines the minimum and maximum number of words accepted between the preceding
element and the following element. For example, @{1,3} means that a match can be made
between the two defined elements if at least one word but no more than three words
appear between them. @{0,3} means that a match can be made if zero, one, two, or three
words appear between the two defined elements.
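The @{#,#} syntax can be read mechanically, as the following sketch shows. It parses a word gap element and checks whether a given number of intervening words falls inside the range. The names `parse_word_gap` and `gap_allows` are hypothetical, and the sketch only mirrors the syntax described above rather than the product's own parser.

```python
import re

# '@' marks a word gap; the braces hold the minimum and maximum word counts.
GAP_PATTERN = re.compile(r"@\{(\d+),(\d+)\}")


def parse_word_gap(element):
    """Return (minimum, maximum) for an element such as '@{0,3}'."""
    found = GAP_PATTERN.fullmatch(element)
    if found is None:
        raise ValueError(f"not a word gap: {element!r}")
    low, high = int(found.group(1)), int(found.group(2))
    if low > high:
        raise ValueError("minimum must not exceed maximum")
    return low, high


def gap_allows(element, words_between):
    """Return True if the number of intervening words fits the gap range."""
    low, high = parse_word_gap(element)
    return low <= len(words_between) <= high
```

For the hotel example, `gap_allows("@{0,1}", [])` and `gap_allows("@{0,1}", ["very"])` are both true, which is why the value with the word gap matches both sentences, whereas `gap_allows("@{1,3}", [])` is false because at least one word is required.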