Working with Text Link Rules
A text link analysis rule is a Boolean query that is used to perform a match on a sentence. Text link analysis rules contain one or more of the following arguments: types, macros, literal strings, or word gaps. You must have at least one text link analysis rule in order to extract TLA results.
The following areas and fields are displayed in the Text Link Rules tab, Rule Editor:
Name field. The unique name for the text link rule.
Example field. Optionally, you can include an example sentence or word sequence that would be captured by this rule. We recommend using examples. In this editor, you will be able to generate tokens from this example text to see how it matches the rule and how it will be output. A token is defined as any word or word phrase identified during the extraction process. For example, in the sentence My uncle lives in New York, the following tokens might be found during extraction: my, uncle, lives, in, and new york. Additionally, uncle could be extracted as a concept and typed as <Unknown>, and new york could also be extracted as a concept and typed as <Location>. All concepts are tokens but not all tokens are concepts. Tokens can also be other macros, literal strings, and word gaps. Only those words or word phrases that are typed can be concepts.
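The distinction between tokens and concepts can be illustrated with a short Python sketch. It is not part of the product; the Token class and its field names are hypothetical and only model the example sentence above, where a token counts as a concept when it carries a type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical model of an extracted token (not a product API)."""
    text: str
    type_name: Optional[str] = None  # None means the token was not typed

    @property
    def is_concept(self) -> bool:
        # Only typed words or word phrases can be concepts.
        return self.type_name is not None


# Tokens that might be found in "My uncle lives in New York".
tokens = [
    Token("my"),
    Token("uncle", "Unknown"),
    Token("lives"),
    Token("in"),
    Token("new york", "Location"),
]

# Every concept is a token, but only the typed tokens are concepts.
concepts = [t.text for t in tokens if t.is_concept]
print(concepts)
```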
Rule Value table. This table contains the elements of the rule that are used for matching a rule to a sentence. You can add or remove rows in the table using the buttons to its right. The table consists of 3 columns:
- Element column. Enter values as one or a combination of types, literal strings, word gaps (<Any Token>), or macros. See the topic Supported Elements for Rules and Macros for more information. Double-click the element cell to enter the information directly. Alternatively, right-click in the cell to display a contextual menu offering lists of common macros, type names, and nonlinguistic type names. Keep in mind that if you type the information into the cell, you must precede the macro or type name with a ‘$’ character, such as $mTopic for the macro mTopic. The order in which you create your element rows is critical to how the rule will be matched to the text. When combining arguments, you must use parentheses ( ) to group the arguments and the character | to indicate a Boolean OR. Keep in mind that values are case-sensitive.
- Quantity column. This indicates the minimum and maximum number of times the element must be found for a match to occur. For example, if you want to define a gap, or a series of words, between two other elements of anywhere from 0 to 3 words, you could choose Between 0 and 3 from the list or enter the numbers directly into the dialog box. The default is ‘Exactly 1’. In some cases you will want to make an element optional. If this is the case, then it will have a minimum quantity of 0 and a maximum quantity greater than 0 (for example, 0 or 1, or between 0 and 2). Note that the first element in a rule cannot be optional, meaning it cannot have a minimum quantity of 0.
- Example Token column. If you click Get Tokens, the program breaks the Example text down into tokens and uses those tokens to fill this column with those that match the elements you defined. You can also see these tokens in the output table if you choose to.
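The way ordered elements and their quantities combine can be sketched in Python. This is only an illustration under stated assumptions, not the extraction engine's actual algorithm: tokens are modeled as (text, type) tuples, each element is a (predicate, minimum, maximum) tuple, and the helper names (is_type, is_literal, any_token, find_match) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Token = Tuple[str, Optional[str]]      # (text, type name or None if untyped)
Predicate = Callable[[Token], bool]
Element = Tuple[Predicate, int, int]   # (test, minimum quantity, maximum quantity)


def is_type(*names: str) -> Predicate:
    """Element that matches a token typed as any of the given types."""
    return lambda tok: tok[1] in names


def is_literal(*words: str) -> Predicate:
    """Element that matches one of several literal strings (case-sensitive)."""
    return lambda tok: tok[0] in words


def any_token(tok: Token) -> bool:
    """Word gap: matches any token."""
    return True


def match_at(elements: Sequence[Element], tokens: Sequence[Token],
             start: int) -> Optional[List[List[Token]]]:
    """Match all elements in row order beginning at tokens[start]."""
    if not elements:
        return []
    test, lo, hi = elements[0]
    # Count how many consecutive tokens this element could consume (at most hi).
    n = 0
    while n < hi and start + n < len(tokens) and test(tokens[start + n]):
        n += 1
    # Try the longest run first, then back off down to the minimum quantity.
    for k in range(n, lo - 1, -1):
        rest = match_at(elements[1:], tokens, start + k)
        if rest is not None:
            return [list(tokens[start:start + k])] + rest
    return None


def find_match(elements: Sequence[Element],
               tokens: Sequence[Token]) -> Optional[List[List[Token]]]:
    """Return the tokens matched by each row, or None if the rule does not match."""
    if elements and elements[0][1] == 0:
        raise ValueError("the first element of a rule cannot be optional")
    for start in range(len(tokens)):
        matched = match_at(elements, tokens, start)
        if matched is not None:
            return matched
    return None


# Hypothetical rule: a location, the word "is", a gap of 0 to 3 words, a positive opinion.
rule = [
    (is_type("Location"), 1, 1),
    (is_literal("is"), 1, 1),
    (any_token, 0, 3),
    (is_type("Positive"), 1, 1),
]
sentence = [("paris", "Location"), ("is", None), ("really", None), ("great", "Positive")]
result = find_match(rule, sentence)
print(result)
```

In this sketch the word gap first tries to consume as many tokens as it can and then gives one back so that the final element can still match, which is why row 3 ends up holding only "really".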
Rule Output table. Each row in this table defines how the TLA pattern output will appear in the results. Rule output can produce patterns of up to six Concept/Type column pairs, each representing a slot. For example, the type pattern <Location> + <Positive> is a two-slot pattern, meaning that it is made up of 2 Concept/Type column pairs.
Just as language gives us the freedom to express the same basic ideas in many different ways, so you might have a number of rules defined to capture the same basic idea. For example, the text "Paris is a place I love" and the text "I really, really like Paris and Florence" represent the same basic idea -- that Paris is liked -- but are expressed differently and would require two different rules to both be captured. However, it is easier to work with the pattern results if similar ideas are grouped together. For this reason, while you might have 2 different rules to capture these 2 phrases, you could define the same output for both rules, such as the type pattern <Location> + <Positive>, so that it represents both texts. In this way, you can see that the output does not always mimic the structure or order of the words found in the original text. Furthermore, such a type pattern could match other phrases and could produce concept patterns such as paris + like and tokyo + like.
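As a rough illustration of how two differently ordered rules can share one output, the following Python sketch groups concept patterns under their type pattern. The rule names, row numbers, and matched values are hypothetical, and the sketch does not model synonym handling or any other linguistic processing.

```python
from collections import defaultdict

# Hypothetical matches: "rows" maps a Rule Value row number to (concept, type),
# and "output" lists the rows in the order the Rule Output table uses them.
matches = [
    {"rule": "location_then_opinion",
     "rows": {1: ("paris", "Location"), 5: ("love", "Positive")},
     "output": [1, 5]},
    {"rule": "opinion_then_location",
     "rows": {2: ("like", "Positive"), 3: ("paris", "Location")},
     "output": [3, 2]},  # output order differs from the word order in the text
    {"rule": "opinion_then_location",
     "rows": {2: ("like", "Positive"), 3: ("tokyo", "Location")},
     "output": [3, 2]},
]

grouped = defaultdict(list)
for match in matches:
    pairs = [match["rows"][row] for row in match["output"]]
    type_pattern = " + ".join(f"<{type_name}>" for _, type_name in pairs)
    concept_pattern = " + ".join(concept for concept, _ in pairs)
    grouped[type_pattern].append(concept_pattern)

for type_pattern, concept_patterns in grouped.items():
    print(type_pattern, concept_patterns)
```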
To help you define the output quickly with fewer errors, you can use the context menu to choose the element you want to see in the output. Alternatively, you can drag and drop elements from the Rule Value table into the output. For example, if you have a rule that contains a reference to the mTopic macro in row 2 of the Rule Value table, and you want that value to be in your output, you can simply drag the element for mTopic to the first column pair in the Rule Output table. Doing so will automatically populate both the Concept and Type for the pair you've selected. Or, if you want the output to begin with the type defined by the third element (row 3) of the Rule Value table, drag that type from the Rule Value table to the Type 1 cell in the output table. The table will update to show the row reference in parentheses: (3).
Alternatively, you can enter these references manually into the table by double-clicking the cell in each Concept column you want to output and entering the $ symbol followed by the row number, such as $2 to refer to the element defined in row 2 of the Rule Value table. When you enter the information manually, you also need to define the Type column: enter the # symbol followed by the row number, such as #2 to refer to the element defined in row 2 of the Rule Value table.
Furthermore, you might even combine methods. Let's say you had the type <Positive> in row 4 of your Rule Value table. You could drag it to the Type 2 column and then double-click the cell in the Concept 2 column and manually enter the word 'not' in front of it. The output column would then read not (4) in the table, or not $4 if you were in the edit mode or source mode. Then you could right-click in the Type 1 column and select, for example, the macro called mTopic. This output could then result in a concept pattern such as car + bad.
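The $ and # row references can be pictured as simple substitutions. The Python sketch below is an assumption-laden illustration, not the product's implementation: the resolve function and the matched rows are hypothetical, and literal text such as 'not' is kept as typed, without the further linguistic processing that the extraction engine applies.

```python
import re

# A reference is "$" (concept) or "#" (type) followed by a Rule Value row number.
REFERENCE = re.compile(r"[$#](\d+)")


def resolve(cell: str, rows: dict) -> str:
    """Replace $N with the concept and #N with the type matched by row N."""
    def substitute(found: re.Match) -> str:
        concept, type_name = rows[int(found.group(1))]
        return concept if found.group(0).startswith("$") else f"<{type_name}>"
    return REFERENCE.sub(substitute, cell)


# Hypothetical match results: row number -> (concept, type).
rows = {2: ("paris", "Location"), 4: ("like", "Positive")}

# One output row made up of two Concept/Type column pairs.
output_row = [("$2", "#2"), ("$4", "#4")]

concept_pattern = " + ".join(resolve(concept_cell, rows) for concept_cell, _ in output_row)
type_pattern = " + ".join(resolve(type_cell, rows) for _, type_cell in output_row)
negated = resolve("not $4", rows)  # literal text combined with a reference

print(concept_pattern)
print(type_pattern)
print(negated)
```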
Most rules have only one output row but there are times when more than one output is possible and desired. In this case, define one output per row in the Rule Output table.
When the output reads t$3\t#3, this means that the pattern will ultimately display the final concept for the third element and the final type for the third element after all linguistic processing is applied (synonyms and other groupings).
- Show output as. By default, the option References to row in Rule Value table is selected and the output is shown by using the numerical references to the row as defined in the Rule Value table. If you previously clicked Get Tokens and have tokens in the Example Token column in the Rule Value table, you can instead choose to see the output for these specific tokens by selecting the other option.
Example Rule
Let's suppose your resources contain the following text link analysis rule and that you have enabled the extraction of TLA results:

Whenever you extract, the extraction engine will read each sentence and will try to match the following sequence:
Element (row) | Description of the arguments |
---|---|
1 | The concept from one of the types represented by the macros mPos or mNeg or from the type <Uncertain>. |
2 | A concept typed as one of the types represented by the macro mTopic. |
3 | One of the words represented by the macro mBe. |
4 | An optional element, 0 or 1 words, also referred to as a word gap or <Any Token>. |
5 | A concept typed as one of the types represented by the macro mTopic. |
The output table shows that all that is wanted from this rule is a pattern made up of any concept or type corresponding to the mTopic macro defined in row 5 of the Rule Value table + any concept or type corresponding to mPos, mNeg, or <Uncertain> as defined in row 1 of the Rule Value table. This could be sausage + like or <Unknown> + <Positive>.