Rule lab
The Rule lab is an interactive tool within the Text Analytics Workbench. You can find it on the Resource editor tab. Use it to test and refine text link analysis (TLA) rules before you apply them to your full dataset.
If you want to create new text link rules or understand how certain sentences are matched during text link analysis, you can run simulations in the Rule lab on a sample piece of text. The Rule lab provides a testing environment where you can validate how your TLA rules match against sample text data. It helps you understand rule behavior and refine your rules so that they capture the patterns that you intend. You can quickly modify rules and rerun simulations to improve pattern matching without processing your entire dataset.
Using the Rule lab
You run tests in the Rule lab by entering sample text data and then running the extraction process on the sample. The Rule lab uses the same set of linguistic resources and extraction settings as the full text link analysis process. Using the same settings helps ensure that the test results accurately reflect how rules are later applied to your complete dataset. You can then check the results of the text link analysis for the sample text to better understand how matching occurs.
During the extraction process, the sample text is parsed into sentences, and these sentences are then broken down into tokens. Your TLA rules are then applied to identify matching patterns in the tokenized text. The final results show the tokens for each sample and any TLA rules that match a pattern that is extracted from the text.
- Tokens
  Tokens are words or word phrases that are identified during the extraction process. For example, in the sentence “My uncle works in New York”, the following tokens might be found during extraction: my, uncle, works, in, and new york. Also, uncle could be extracted as a concept and typed as <Unknown>, and new york could also be extracted as a concept and typed as <Location>. All concepts are tokens but not all tokens are concepts. Tokens can also be macros, literal strings, and non-extracted words. Only those words or word phrases that are typed can be concepts.
- Macros
  A macro is a reusable group of text link analysis rule elements that combines types, literal strings, other macros, and similar values by using the OR operator (|). The value of a macro is case sensitive. Each macro has a unique name, and the name is prefixed with a lowercase m, such as mTopic. When a text link analysis rule references a macro, the name of the macro is prefixed with $, for example $mTopic. The macro name is case sensitive. You can reuse macros in multiple TLA rules. Building TLA rules with macros simplifies them by breaking complex rules into reusable chunks. You can then update the shared logic in one macro instead of editing multiple TLA rules.
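The relationship between tokens, concepts, and macros can be illustrated with a small Python sketch. This is only a conceptual model, not how SPSS Modeler stores or evaluates its resources: the `Token` class, the `parse_macro_value` and `resolve` helpers, the `mProduct` macro, and the macro values shown are all hypothetical. It also assumes that types are written with a `$` prefix in the same way as macros, which the text above states only for macros.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical model of a token; type_name is set only for concepts."""
    text: str
    type_name: Optional[str] = None

    @property
    def is_concept(self) -> bool:
        # Only typed words or word phrases can be concepts.
        return self.type_name is not None


# Tokens from the sample sentence "My uncle works in New York".
tokens = [
    Token("my"),
    Token("uncle", "Unknown"),
    Token("works"),
    Token("in"),
    Token("new york", "Location"),
]
concepts = [t.text for t in tokens if t.is_concept]


def parse_macro_value(value: str) -> List[str]:
    """Split a macro value into its alternatives on the OR operator (|)."""
    return [part.strip() for part in value.split("|")]


def resolve(element: str, macros: Dict[str, List[str]]) -> List[str]:
    """Expand a $-prefixed macro reference into its alternatives.

    The dictionary lookup is case sensitive, so "$mtopic" does not match
    a macro named "mTopic". Nested macros are expanded recursively; this
    sketch does not guard against macros that reference each other.
    """
    name = element[1:] if element.startswith("$") else None
    if name is not None and name in macros:
        expanded: List[str] = []
        for alternative in macros[name]:
            expanded.extend(resolve(alternative, macros))
        return expanded
    return [element]


# Hypothetical macro definitions, for illustration only.
macros = {
    "mTopic": parse_macro_value("$Unknown|$Location|$mProduct"),
    "mProduct": parse_macro_value("tv|wifi"),
}

print(concepts)
print(resolve("$mTopic", macros))
print(resolve("$mtopic", macros))  # wrong case: left unexpanded
```

In this model, a change to the hypothetical `mProduct` value is picked up by every rule element that references `$mTopic`, which mirrors the benefit of updating shared logic in one macro.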
Generating new TLA rules
You can automatically generate new rules based on the simulation results by clicking Generate rule. The new rule uses the actual tokens and types that are identified in your sample text. SPSS Modeler takes the following actions to make the new TLA rule:
- It uses the sample text as the example for the TLA rule.
- It automatically uses the first six tokens that are matched to generate the output sequentially. This automation can save you adding the basic details for the rule manually. However, the rule is likely to require further editing. For example, in the sentence “No TV, internet or wifi provided”, each token is parsed (<no><TV><,><internet><or><wifi><provided>). These tokens are then typed and used in the output for the rule in the exact order in which they appear:
      #@# no TV, internet or wifi provided
      [pattern(40)]
      name=generated_rule_40
      value=$mMiscNeg $mTopic $SEP $mTopic $mCoord $mTopic $mSupport
      output(1)=$1\t#1\t$2\t#2\t$3\t#3\t$4\t#4\t$5\t#5\t$6\t#6
- It inserts tokens that were typed or matched to macros during the simulation into the rule definition.
- It prioritizes macros over type values when a token matches both, which simplifies the rule structure. For example, in the sentence “I like pizza”, "pizza" is typed as <Unknown> and matched to the macro mTopic. In this case, mTopic is used as the element in the generated rule.
After you generate a rule, you can verify that it correctly matches the pattern by returning to the Rule lab to test the new rule against your sample.
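The generation steps described in this section can be approximated in a short Python sketch. It is an illustration of the described behavior, not the code that SPSS Modeler runs: `SimToken`, `rule_element`, and `generate_rule` are hypothetical names, the token-to-macro assignments are read off the example rule above, and writing a type element with a `$` prefix is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimToken:
    """Hypothetical simulation result for one token."""
    text: str
    type_name: Optional[str] = None  # for example "Unknown"
    macro: Optional[str] = None      # for example "mTopic"


def rule_element(token: SimToken) -> str:
    """Pick the rule element for a token, preferring a macro over a type."""
    if token.macro is not None:
        return "$" + token.macro
    if token.type_name is not None:
        # Assumption: types use the same $ prefix as macros.
        return "$" + token.type_name
    return token.text  # untyped, unmatched token kept as a literal


def generate_rule(sample: str, tokens: List[SimToken], rule_id: int,
                  max_outputs: int = 6) -> str:
    """Build rule text in the layout of the generated example."""
    value = " ".join(rule_element(t) for t in tokens)
    count = min(len(tokens), max_outputs)
    # Only the first six matched tokens are written to the output line.
    output = "\\t".join(f"${i}\\t#{i}" for i in range(1, count + 1))
    return "\n".join([
        f"#@# {sample}",
        f"[pattern({rule_id})]",
        f"name=generated_rule_{rule_id}",
        f"value={value}",
        f"output(1)={output}",
    ])


# Macro matches taken from the example rule in the text.
sample_tokens = [
    SimToken("no", macro="mMiscNeg"),
    SimToken("TV", type_name="Unknown", macro="mTopic"),
    SimToken(",", macro="SEP"),
    SimToken("internet", type_name="Unknown", macro="mTopic"),
    SimToken("or", macro="mCoord"),
    SimToken("wifi", type_name="Unknown", macro="mTopic"),
    SimToken("provided", macro="mSupport"),
]

rule = generate_rule("no TV, internet or wifi provided", sample_tokens, 40)
print(rule)
```

The sample has seven tokens, so all seven appear in the `value` line while the `output(1)` line stops at `$6`. That is one reason a generated rule usually needs further editing.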