Working with Text Link Rules

A text link analysis rule is a Boolean query that is used to perform a match on a sentence. Text link analysis rules contain one or more of the following arguments: types, macros, literal strings, or word gaps. You must have at least one text link analysis rule in order to extract TLA results.

The following areas and fields are displayed in the Text Link Rules tab, Rule Editor:

Name field. The unique name for the text link rule.

Example field. Optionally, you can include an example sentence or word sequence that would be captured by this rule. We recommend using examples. In this editor, you will be able to generate tokens from this example text to see how it matches the rule and how it will be output. A token is defined as any word or word phrase identified during the extraction process. For example, in the sentence My uncle lives in New York, the following tokens might be found during extraction: my, uncle, lives, in, and new york. Additionally, uncle could be extracted as a concept and typed as <Unknown>, and new york could also be extracted as a concept and typed as <Location>. All concepts are tokens but not all tokens are concepts. Tokens can also be other macros, literal strings, and word gaps. Only those words or word phrases that are typed can be concepts.

Rule Value table. This table contains the elements of the rule that are used for matching a rule to a sentence. You can add or remove rows in the table using the buttons to its right. The table consists of 3 columns:

  • Element column. Enter values as one or a combination of types, literal strings, word gaps (<Any Token>), or macros. See the topic Supported Elements for Rules and Macros for more information. Double-click the element cell to enter the information directly. Alternatively, right-click in the cell to display a contextual menu offering lists of common macros, type names, and nonlinguistic type names. If you type the information directly into the cell, precede the macro or type name with a ‘$’ character, such as $mTopic for the macro mTopic. The order in which you create your element rows is critical to how the rule will be matched to the text. When combining arguments, you must use parentheses ( ) to group the arguments and the character | to indicate a Boolean OR. Keep in mind that values are case-sensitive.
  • Quantity column. This indicates the minimum and maximum number of times the element must be found for a match to occur. For example, if you want to define a gap, or a series of words, between two other elements of anywhere from 0 to 3 words, you could choose Between 0 and 3 from the list or enter the numbers directly into the dialog box. The default is ‘Exactly 1’. In some cases you will want to make an element optional. An optional element has a minimum quantity of 0 and a maximum quantity greater than 0 (for example, 0 or 1, or between 0 and 2). Note that the first element in a rule cannot be optional, meaning it cannot have a minimum quantity of 0.
  • Example Token column. If you click Get Tokens, the program breaks the Example text down into tokens and uses those tokens to fill this column with those that match the elements you defined. You can also see these tokens in the output table if you choose to.
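The constraints on the Element and Quantity columns can be sketched in a few lines of Python. This is only an illustration, not the product's own validation: the RuleRow class and the split_alternatives and validate_rule functions are hypothetical names introduced here, and the checks cover only what the list above states (the default quantity of Exactly 1, grouping with parentheses and |, and the first element not being optional).

```python
from dataclasses import dataclass


@dataclass
class RuleRow:
    """One row of a Rule Value table (hypothetical representation)."""
    element: str        # e.g. "$mTopic", "<Any Token>", "($mPos|$mNeg)"
    min_count: int = 1  # the default quantity is "Exactly 1"
    max_count: int = 1


def split_alternatives(element: str) -> list[str]:
    """Split a parenthesized OR group such as "($mPos|$mNeg)" into its parts."""
    text = element.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    return [part.strip() for part in text.split("|")]


def validate_rule(rows: list[RuleRow]) -> list[str]:
    """Return the problems found in a rule; an empty list means none were found."""
    problems = []
    if rows and rows[0].min_count == 0:
        problems.append("the first element cannot be optional")
    for number, row in enumerate(rows, start=1):
        if row.min_count < 0 or row.max_count < row.min_count:
            problems.append(f"row {number}: invalid quantity range")
    return problems
```

For instance, validate_rule([RuleRow("<Any Token>", 0, 3), RuleRow("$mTopic")]) reports that the first element cannot be optional, while the same two rows in the opposite order pass.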

Rule Output table. Each row in this table defines how the TLA pattern output will appear in the results. Rule output can produce patterns of up to six Concept/Type column pairs, each representing a slot. For example, the type pattern <Location> + <Positive> is a two-slot pattern, meaning that it is made up of 2 Concept/Type column pairs.

Note: Terms in the Element column of the Rule Value table, or in any of the Concept columns of the Rule Output table cannot start with any of the following characters: `, #, %, ^, *, _, -, :, <, >, /, \, or ".
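If you generate or review rules outside the editor, a small check for this restriction might look like the following Python sketch. The function name is hypothetical, and the sketch assumes the restriction applies to literal terms that you type, not to the way type names and word gaps are displayed.

```python
# Characters listed in the note above; a term may not begin with any of them.
FORBIDDEN_START = set('`#%^*_-:<>/\\"')


def starts_with_forbidden_character(term: str) -> bool:
    """Return True if the term begins with one of the disallowed characters."""
    return bool(term) and term[0] in FORBIDDEN_START
```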

Just as language gives us the freedom to express the same basic ideas in many different ways, so you might have a number of rules defined to capture the same basic idea. For example, the text "Paris is a place I love" and the text "I really, really like Paris and Florence" represent the same basic idea -- that Paris is liked -- but are expressed differently and would require two different rules to both be captured. However, it is easier to work with the pattern results if similar ideas are grouped together. For this reason, while you might have 2 different rules to capture these 2 phrases, you could define the same output for both rules, such as the type pattern <Location> + <Positive> so that it represents both texts. And in this way, you can see that the output does not always mimic the structure or order of the words found in the original text. Furthermore, such a type pattern could match other phrases and could produce concept patterns such as: paris + like and tokyo + like.

To help you define the output quickly with fewer errors, you can use the context menu to choose the element you want to see in the output. Alternatively, you can drag and drop elements from the Rule Value table into the output. For example, if you have a rule that contains a reference to the mTopic macro in row 2 of the Rule Value table, and you want that value to be in your output, you can simply drag and drop the element for mTopic to the first column pair in the Rule Output table. Doing so automatically populates both the Concept and Type for the pair you've selected. Or, if you want the output to begin with the type defined by the third element (row 3) of the Rule Value table, drag that type from the Rule Value table to the Type 1 cell in the output table. The table will update to show the row reference in parentheses (3).

Alternatively, you can enter these references manually into the table by double-clicking the cell in each Concept column you want to output and entering the $ symbol followed by the row number, such as $2 to refer to the element defined in row 2 of the Rule Value table. When you enter the information manually, you also need to define the Type column: enter the # symbol followed by the row number, such as #2 to refer to the element defined in row 2 of the Rule Value table.
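To make the $ and # references concrete, here is a minimal Python sketch of how such references could be resolved once a rule has matched. It is an illustration only, not the extraction engine's implementation: the matched_rows data and the resolve_cell function are hypothetical, and the concepts and types in the example are made up.

```python
import re

# Hypothetical match result: row number -> (final concept, final type).
matched_rows = {
    1: ("like", "<Positive>"),
    5: ("sausage", "<Unknown>"),
}


def resolve_cell(cell: str, matched: dict[int, tuple[str, str]]) -> str:
    """Replace $N with the concept and #N with the type matched by row N."""
    def substitute(ref: re.Match) -> str:
        concept, type_name = matched[int(ref.group(2))]
        return concept if ref.group(1) == "$" else type_name

    # Any other text in the cell is kept unchanged.
    return re.sub(r"([$#])(\d+)", substitute, cell)
```

With this data, resolve_cell("$5", matched_rows) gives sausage and resolve_cell("#1", matched_rows) gives <Positive>. A reference to a row that is not in the match result raises a KeyError in this sketch.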

Furthermore, you might even combine methods. Let's say you had the type <Positive> in row 4 of your Rule Value table. You could drag it to the Type 2 column, then double-click the cell in the Concept 2 column and manually enter the word 'not' in front of it. The output column would then read not (4) in the table or, if you were in edit mode or source mode, not $4. Then you could right-click in the Type 1 column and select, for example, the macro called mTopic. This output could then result in a concept pattern such as: car + bad.

Most rules have only one output row but there are times when more than one output is possible and desired. In this case, define one output per row in the Rule Output table.

Important: Keep in mind that other linguistic handling operations are performed during the extraction of TLA patterns. So when the output reads $3\t#3, this means that the pattern will ultimately display the final concept for the third element and the final type for the third element after all linguistic processing is applied (synonyms and other groupings).
  • Show output as. By default, the option References to row in Rule Value table is selected and the output is shown by using the numerical references to the rows as defined in the Rule Value table. If you previously clicked Get Tokens and have tokens in the Example Token column in the Rule Value table, you can choose to see the output for these specific tokens by choosing the option .
Note: If there are not enough Concept/Type output pairs shown in the output table, you can add another pair by clicking the Add button in the editor toolbar. If 3 pairs are currently shown and you click Add, 2 more columns (Concept 4 and Type 4) are added to the table. This means that you will now see 4 pairs in the output table for all rules. You can also remove unused pairs as long as no other rule in the set of rules in this library uses that pair.

Example Rule

Let's suppose your resources contain the following text link analysis rule and that you have enabled the extraction of TLA results:

Figure 1. Text Link Rules tab: Rule Editor

Whenever you extract, the extraction engine will read each sentence and will try to match the following sequence:

Table 1. Extraction sequence example

Element (row)    Description of the arguments
1                The concept from one of the types represented by the macros mPos or mNeg or from the type <Uncertain>.
2                A concept typed as one of the types represented by the macro mTopic.
3                One of the words represented by the macro mBe.
4                An optional element, 0 or 1 words, also referred to as a word gap or <Any Token>.
5                A concept typed as one of the types represented by the macro mTopic.

The output table shows that this rule produces a two-slot pattern. The first slot holds the concept and type matched by the mTopic macro in row 5 of the Rule Value table. The second slot holds the concept and type matched by mPos, mNeg, or <Uncertain> in row 1 of the Rule Value table. This could be sausage + like or <Unknown> + <Positive>.
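The way this rule is matched and turned into output can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the extraction engine's algorithm: the macro contents (SENTIMENT_TYPES, TOPIC_TYPES, BE_WORDS), the example tokens, and all function names are hypothetical, and the real engine also applies synonyms and other linguistic processing.

```python
from typing import Optional

Token = tuple[str, Optional[str]]  # (text, type name, or None if untyped)

# Hypothetical macro contents; the real macros are defined in your resources.
SENTIMENT_TYPES = {"<Positive>", "<Negative>", "<Uncertain>"}  # mPos, mNeg, <Uncertain>
TOPIC_TYPES = {"<Unknown>", "<Location>"}                      # mTopic
BE_WORDS = {"is", "are", "was", "were"}                        # mBe

# One entry per rule row: (test for a token, minimum quantity, maximum quantity).
RULE = [
    (lambda t: t[1] in SENTIMENT_TYPES, 1, 1),  # row 1
    (lambda t: t[1] in TOPIC_TYPES, 1, 1),      # row 2
    (lambda t: t[0] in BE_WORDS, 1, 1),         # row 3
    (lambda t: True, 0, 1),                     # row 4: word gap, 0 or 1 words
    (lambda t: t[1] in TOPIC_TYPES, 1, 1),      # row 5
]


def match_from(tokens, start, row_index=0):
    """Match RULE[row_index:] at tokens[start:]; return tokens per row, or None."""
    if row_index == len(RULE):
        return []
    test, low, high = RULE[row_index]
    available = 0
    while (available < high and start + available < len(tokens)
           and test(tokens[start + available])):
        available += 1
    # Try the longest run first, then back off down to the minimum quantity.
    for count in range(available, low - 1, -1):
        rest = match_from(tokens, start + count, row_index + 1)
        if rest is not None:
            return [tokens[start:start + count]] + rest
    return None


def rule_output(tokens):
    """Return (concept pattern, type pattern) built from rows 5 and 1, or None."""
    for start in range(len(tokens)):
        groups = match_from(tokens, start)
        if groups is not None:
            topic, sentiment = groups[4][0], groups[0][0]
            return (f"{topic[0]} + {sentiment[0]}", f"{topic[1]} + {sentiment[1]}")
    return None


example = [("favorite", "<Positive>"), ("food", "<Unknown>"), ("is", None),
           ("definitely", None), ("sausage", "<Unknown>")]
print(rule_output(example))
```

In this sketch the word gap in row 4 absorbs the token definitely. Removing that token from the example still produces a match, because row 4 allows a quantity of 0.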