Viewing and working in source mode

For each rule and macro, the TLA editor generates the underlying source code that the Extractor uses for matching and for producing TLA output. If you prefer to work with the code itself, you can view and edit this source code directly by clicking the “View Source” button at the top of the editor. The source view jumps to and highlights the currently selected rule or macro. However, we recommend using the editor panes to reduce the chance of errors.

When you have finished viewing or editing the source, click Exit Source. If a rule contains invalid syntax, you must fix it before you can exit the source view.

Important: If you edit in the source view, we strongly recommend that you edit rules and macros one at a time. After editing a macro, please validate the results by extracting. If you are satisfied with the result, we recommend that you save the template before making another change. If you are not satisfied with the result or an error occurs, revert to your saved resources.

Macros in the Source View

[macro]
name = macro_name
value = ([type_name|macro_name|literal_string|word_gap])
Table 1. Macro entries
Entry     Description
[macro]   Each macro must begin with the line marked [macro] to denote the beginning of a macro.
name      The name of the macro definition. Each name must be unique.
value     A combination of one or more types, literal strings, word gaps, or macros. See the topic Supported Elements for Rules and Macros for more information. When combining arguments, you must use parentheses ( ) to group the arguments and the character | to indicate a Boolean OR.

In addition to the guidelines and syntax covered in the section on Macros, the source view has a few additional guidelines that aren't required when working in the editor view. Macros must also respect the following when working in source mode:

  • Each macro must begin with the line marked [macro] to denote the beginning of a macro.
  • To disable an element, place a comment indicator (#) before each line.

Example. This example defines a macro called mTopic. The value for mTopic is the presence of a term matching one of the following types: <Product>, <Person>, <Location>, <Organization>, <Budget>, <Currency>, or <Unknown>.

[macro]
name=mTopic
value=($Unknown|$Product|$Person|$Location|$Organization|$Budget|$Currency)
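To make the structure of a macro entry concrete, here is a minimal Python sketch that reads one [macro] block and lists the alternatives in its value. It is an illustration only and not how the Extractor reads its resources. The function name parse_macro and the returned dictionary layout are hypothetical, and the sketch assumes a single pair of grouping parentheses around the value.

```python
def parse_macro(block: str) -> dict:
    """Parse one [macro] block into its name and its list of alternatives.

    Hypothetical helper for illustration; not part of the product.
    """
    lines = [ln.strip() for ln in block.splitlines()]
    # Skip blank lines and lines disabled with the comment indicator (#).
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines or lines[0] != "[macro]":
        raise ValueError("a macro must begin with the line [macro]")

    entries = {}
    for ln in lines[1:]:
        key, sep, val = ln.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {ln!r}")
        entries[key.strip()] = val.strip()

    value = entries["value"]
    # Assumption: one pair of grouping parentheses surrounds the alternatives.
    if value.startswith("(") and value.endswith(")"):
        value = value[1:-1]
    # The | character separates the alternatives (Boolean OR).
    return {"name": entries["name"], "alternatives": value.split("|")}


example = """
[macro]
name=mTopic
value=($Unknown|$Product|$Person|$Location|$Organization|$Budget|$Currency)
"""
macro = parse_macro(example)
print(macro["name"], len(macro["alternatives"]))  # prints: mTopic 7
```

Nested groups and quantities inside a macro value are outside the scope of this sketch.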

Rules in the Source View

[pattern(<ID>)]
name = pattern_name
value = [$type_name|macro_name|word_gaps|literal_strings]
output = $digit[\t]#digit[\t]$digit[\t]#digit[\t]$digit[\t]#digit[\t]
Table 2. Rule entries
Entry             Description
[pattern(<ID>)]   Indicates the start of a text link analysis rule and provides a unique numerical ID used to determine processing order.
name              Provides a unique name for this text link analysis rule.
value             Provides the syntax and arguments to be matched to the text. See the topic Supported Elements for Rules and Macros for more information.
output            The output format for the matched patterns discovered in the text. The order of elements in the output does not always match their original position in the source text. Additionally, a text link analysis rule can have multiple outputs; place each output on a separate line.

Syntax for output:

  • Separate output with the tab code \t, such as $1\t#1\t$3\t#3
  • $ followed by a number returns the term that matched the element in that position of the value parameter. For example, $1 is the term matching the first element defined for the value.
  • # followed by a number returns the type name of the element in that position. If the element is a list of literal strings, the type <Unknown> is assigned.
  • A value of Null\tNull will not create any output.
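The output syntax above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Extractor's implementation: the function render_output and the terms and types dictionaries are hypothetical, the sample terms are made up, and wrapping type names in angle brackets simply mirrors how types are written in this topic.

```python
import re


def render_output(template: str, terms: dict, types: dict):
    """Expand $n and #n in an output template (hypothetical helper).

    terms and types map a 1-based element position to the matched term
    and to its type name.
    """
    # Fields are separated by the two-character tab code \t.
    fields = template.split("\\t")
    if all(field == "Null" for field in fields):
        return None  # a value such as Null\tNull creates no output

    rendered = []
    for field in fields:
        m = re.fullmatch(r"([$#])(\d+)", field)
        if m is None:
            rendered.append(field)  # keep any other text as written
        elif m.group(1) == "$":
            rendered.append(terms[int(m.group(2))])  # term at that position
        else:
            rendered.append("<" + types[int(m.group(2))] + ">")  # its type
    return "\t".join(rendered)


# Made-up match: element 1 is a <Positive> term, element 3 a literal string.
terms = {1: "good", 3: "service"}
types = {1: "Positive", 3: "Unknown"}
print(render_output(r"$1\t#1\t$3\t#3", terms, types))
print(render_output(r"Null\tNull", terms, types))  # prints: None
```

The positions used in the template do not have to be consecutive; only the positions named in the output need a term and a type.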

In addition to the guidelines and syntax covered in the section on Rules, the source view has a few additional guidelines that aren't required when working in the editor view. Rules must also respect the following when working in source mode:

  • Whenever two or more elements are defined, they must be enclosed in parentheses whether or not they are optional (for example, ($Negative|$Positive) or ($mCoord|$SEP)?). $SEP represents a comma.
  • The first element in a text link analysis rule cannot be an optional element. For example, you cannot begin with value = $mTopic? or value = @{0,1}.
  • It is possible to associate a quantity (or instance count) to a token. This is useful in writing only one rule that encompasses all cases instead of writing a separate rule for each case. For example, you may use the literal string ($SEP|and) if you are trying to match either , (comma) or and. If you extend this by adding a quantity so that the literal string becomes ($SEP|and){1,2}, you will now match any of the following instances: "," "and" ", and".
  • Spaces are not supported between the macro name and the $ and ? characters in the text link analysis rule value.
  • Spaces are not supported in the text link analysis rule output.
  • To disable an element, place a comment indicator (#) before each line.
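Some of the guidelines above lend themselves to a quick automated check before you paste a rule into the source view. The following Python sketch covers three of them: the first element must not be optional, no space may separate a name from $ or ?, and the output must not contain spaces. It is a hypothetical helper (check_rule and its messages are not part of the product), and it assumes that the elements of a value are separated by whitespace and that groups in parentheses contain no spaces.

```python
import re


def check_rule(value: str, output: str) -> list:
    """Return a list of guideline problems found in a rule (hypothetical)."""
    problems = []
    # Assumption: whitespace separates elements; groups contain no spaces.
    elements = value.split()
    first = elements[0] if elements else ""

    # An element is optional if it ends in ? or its quantity allows zero.
    if first.endswith("?") or re.search(r"\{0,\d+\}$", first):
        problems.append("first element is optional")

    # A lone $ or ? token means a space separated it from its name.
    if any(tok in ("$", "?") for tok in elements):
        problems.append("space between a name and $ or ?")

    if " " in output:
        problems.append("space in output")
    return problems


print(check_rule("$mTopic? @{0,1} $Positive", r"$1\t#1\t$3\t#3"))
print(check_rule("@{0,1} $Positive", r"$2\t#2"))
print(check_rule("$Person @{0,1} $Function", r"$1\t#1\t$3\t#3"))  # prints: []
```

An empty list means only that none of these three checks failed; the source view still performs its own syntax validation when you exit.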

Example. Suppose that your resources contain the following text link analysis (TLA) rule and that you have enabled the extraction of TLA results:

## Jean Doe was the former HR director of IBM in France
[pattern(201)]
name= 1_201
value = $Person ($SEP|$mDet|$mSupport|as|then){1,2} @{0,1} $Function
 (of|with|for|in|to|at) @{0,1} $Organization @{0,2} $Location
output = $1\t#1\t$4\t#4\t$7\t#7\t$9\t#9

Whenever you extract, the extraction engine will read each sentence and will try to match the following sequence:

Table 3. Extraction sequence example
Position  Description of the arguments
1         The name of a person ($Person)
2         One or two of the following: comma ($SEP), determiner ($mDet), auxiliary verb ($mSupport), or the strings “then” or “as”
3         0 or 1 word (@{0,1})
4         A function ($Function)
5         One of the following strings: “of”, “with”, “for”, “in”, “to”, or “at”
6         0 or 1 word (@{0,1})
7         The name of an organization ($Organization)
8         0, 1, or 2 words (@{0,2})
9         The name of a location ($Location)
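The position numbers in this table are what the $ and # references in the output line count against. As a rough illustration, the Python sketch below numbers the elements of the sample rule's value by splitting it on whitespace. The helper number_elements is hypothetical, and the approach assumes that groups in parentheses contain no spaces, which holds for this rule.

```python
def number_elements(value: str) -> dict:
    """Map 1-based positions to the elements of a rule value (hypothetical)."""
    # split() treats any run of whitespace, including the line break of a
    # continuation line, as a single separator.
    return {pos: elem for pos, elem in enumerate(value.split(), start=1)}


value = """$Person ($SEP|$mDet|$mSupport|as|then){1,2} @{0,1} $Function
 (of|with|for|in|to|at) @{0,1} $Organization @{0,2} $Location"""

positions = number_elements(value)
print(len(positions))  # prints: 9
for pos in (1, 4, 7, 9):  # the positions used in the rule's output line
    print(pos, positions[pos])
```

Word gaps and literal groups count as positions too, which is why the output of this rule refers to $4, $7, and $9 rather than $2, $3, and $4.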

This sample text link analysis rule would match sentences or phrases like:

Jean Doe, the HR director of IBM in France

Jean Doe was the former HR director of IBM in France

IBM appointed Jean Doe as the HR director of IBM in France

This sample text link analysis rule would produce the following output:

jean doe <Person> hr director <Function> ibm <Organization> france <Location>

Where:

  • jean doe is the term corresponding to $1 (the first element in the text link analysis rule) and <Person> is the type for jean doe (#1),
  • hr director is the term corresponding to $4 (the 4th element in the text link analysis rule) and <Function> is the type for hr director (#4),
  • ibm is the term corresponding to $7 (the 7th element in the text link analysis rule) and <Organization> is the type for ibm (#7),
  • france is the term corresponding to $9 (the 9th element in the text link analysis rule) and <Location> is the type for france (#9)

Rule Sets in the Source View

[set(<ID>)]

Where [set(<ID>)] indicates the start of a rule set and provides a unique numerical ID used to determine the processing order of the sets.

Example. The following sentence contains information about individuals, their function within a company, and also the merge/acquisition activities of that company.

Org1 Inc has entered into a definitive merger agreement with Org2 Ltd, said John Doe, CEO of Org2 Ltd.

You could write one rule with several outputs to handle all possible output such as:

## Org1 Inc entered into a definitive merger agreement with Org2 Ltd, said John Doe, CEO of Org2 Ltd.

[pattern(020)]
name=020
value = $Organization @{0,4} $ActionNouns @{0,6}  $mOrg @{1,2}
 $Person @{0,2} $Function @{0,1} $Organization
output = $1\t#1\t$3\t#3\t$5\t#5
output = $7\t#7\t$9\t#9\t$11\t#11

which would produce the following 2 output patterns:

  • org1 inc <Organization> + merges with <ActiveVerb> + org2 ltd <Organization>
  • john doe <Person> + ceo <Function> + org2 ltd <Organization>

Important! Keep in mind that other linguistic handling operations are performed during the extraction of TLA patterns. In this case, merger is grouped under merges with during the synonym grouping phase of the extraction process. Because merges with belongs to the <ActiveVerb> type, this type name is what appears in the final TLA pattern output. So when the output reads $3\t#3, the pattern ultimately displays the final concept and the final type for the third element after all linguistic processing (synonyms and other groupings) has been applied.
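The effect described in this note can be pictured as a lookup applied to each matched element before the output is written. The Python sketch below is a toy model only: the GROUPINGS table and the finalize function are hypothetical, the real synonym grouping is driven by your linguistic resources, and the starting type ActionNouns is taken from the element name in the rule value.

```python
# Hypothetical grouping table: extracted term -> (final concept, final type).
GROUPINGS = {"merger": ("merges with", "ActiveVerb")}


def finalize(term: str, type_name: str) -> tuple:
    """Return the concept and type that would appear in the output.

    Terms without an entry in the table keep their own text and type.
    """
    return GROUPINGS.get(term, (term, type_name))


print(finalize("merger", "ActionNouns"))     # prints: ('merges with', 'ActiveVerb')
print(finalize("org1 inc", "Organization"))  # prints: ('org1 inc', 'Organization')
```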

Instead of writing a complex rule like pattern 020 above, it can be easier to manage and work with two rules. The first rule specializes in finding mergers and acquisitions between companies:

[set(1)]
## Org1 Inc has entered into a definitive merger agreement with Org2 Ltd
[pattern(44)]
name=firm + action + firm_0044
value=$mOrg @{0,20} $ActionNouns @{0,6} $mOrg
output(1)=$1\t#1\t$3\t#3\t$5\t#5

which would produce org1 inc <Organization> + merges with <ActiveVerb> + org2 ltd <Organization>

The second is specialized in individual/function/company:

[set(2)]
## said John Doe, CEO of Org2 Ltd
[pattern(52)]
name=individual + role + firm_0007
value=$Person @{0,3} $mFunction (at|of)? ($mOrg|$Media|$Unknown)
output(1)=$1\t#1\t$3\tFunction\t$5\t#5

which would produce john doe <Person> + ceo <Function> + org2 ltd <Organization>
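To see how the [set(<ID>)] and [pattern(<ID>)] headers divide the source, the following Python sketch groups pattern IDs under the set header that precedes them. It is a hypothetical helper, not the product's parser. It assumes that a pattern belongs to the most recent [set(...)] header above it, as in the two rules above, and it files any pattern that appears before the first set header under None.

```python
import re


def group_by_set(source: str) -> dict:
    """Return {set_id: [pattern_id, ...]} in order of appearance (hypothetical)."""
    sets = {}
    current = None  # no [set(...)] header seen yet
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, comment, or disabled line
        m = re.fullmatch(r"\[set\((\d+)\)\]", line)
        if m:
            current = int(m.group(1))
            sets.setdefault(current, [])
            continue
        m = re.fullmatch(r"\[pattern\((\d+)\)\]", line)
        if m:
            # Assumption: the pattern belongs to the most recent set header.
            sets.setdefault(current, []).append(int(m.group(1)))
    return sets


source = r"""
[set(1)]
## Org1 Inc has entered into a definitive merger agreement with Org2 Ltd
[pattern(44)]
name=firm + action + firm_0044
value=$mOrg @{0,20} $ActionNouns @{0,6} $mOrg
output(1)=$1\t#1\t$3\t#3\t$5\t#5

[set(2)]
## said John Doe, CEO of Org2 Ltd
[pattern(52)]
name=individual + role + firm_0007
value=$Person @{0,3} $mFunction (at|of)? ($mOrg|$Media|$Unknown)
output(1)=$1\t#1\t$3\tFunction\t$5\t#5
"""
print(group_by_set(source))  # prints: {1: [44], 2: [52]}
```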