Creating parsing rules

You can configure parsing rules that are based on sample patterns of text.

About this task

Before you create parsing rules, you must create a parsing rules database and include the compiled JAR file in the parsing rules stage of your UIMA pipeline. You can then analyze sample text that contains the patterns to use as the basis for the rules. After Content Analytics Studio analyzes the sample text, the pattern of annotations that represents the selected text is displayed in a tree format. You can modify the text pattern for the rule to match, for example by generalizing the pattern to match similar occurrences of the concept, and then define one or more annotations to create when matching text is found in a document. After you add the rule to the database, rebuild the parsing rules file.

Alternatively, you can manually add annotations to a parsing rule by using the Add Annotation option. Using this approach, you can create a parsing rule without dragging any sample text. For example, this approach might be an easier way to create rules when the exact pattern that you want to match is not available in the text.

Procedure

To create parsing rules:

Create a parsing rules database.
In the Studio Explorer view, right-click the Resources/Parsing Rules directory in your project and click New > Parsing Rules Database.
Include the parsing rules JAR file in your UIMA pipeline configuration:
1. From the Configuration/Annotators directory, open the ANNOCONFIG file for your pipeline.
2. Select the Parsing Rules stage, select the appropriate language, and add the new parsing rules JAR file to the list of rule files.
Run an initial analysis of sample text on which to base the parsing rule.
From the Documents directory, open a document that contains the sample text. Right-click the document in the editor view and click Analyze Document. Ensure that you select the UIMA pipeline to which you added the new parsing rules file.
From the Resources/Parsing Rules directory, open the new parsing rules database by double-clicking it.
Add rules to the database by using the Create Parsing Rules view:
1. Define the text pattern for the rule to match.
  Drag sample text from your annotated document to the Selection tab, where the text is displayed as a tree of UIMA annotations. Refine the match criteria by configuring the nodes of the parse tree, for example by generalizing the pattern to match similar occurrences of the same concept.
  
  For example, you create a rule to identify a person by dragging the text Sir Winston Churchill from an annotated document to the Selection tab. To generalize this rule so that it also matches names without a title, you can specify that the title is optional by right-clicking the node for the title annotation and setting the Repeats option to Occurring zero or one time.
2. Define one or more annotations to create when text matches the specified pattern.
  On the Annotation tab, select the text for which to create the new annotation. Then, right-click the selection, click Insert Annotation, and specify a name for the new annotation, such as Person. You can create an annotation over all of the text in the specified pattern or over only part of the text. For example, you can create an annotation that covers the title, given name, and surname, or covers only the given name and surname.
  Tip: You can also select one or more annotations to delete, such as annotations on negated terms. For example, suppose that you have a dictionary that annotates diseases such as diabetes. You probably do not want to annotate negated instances of the term, such as in the phrase the patient does not have diabetes. To configure a rule that deletes negated forms of the annotation, first create a dictionary of negation terms such as no and not. Then drag the sample negated text to the Selection tab, right-click the disease annotation on the Annotation tab, and click Delete Annotation.
3. Optional: Create features for the new annotations.
  For example, for a Person annotation you might create a feature for surnames. In the annotations tree on the Annotation tab, select the node that represents the surname and drag the node under the Features node of the Person annotation that you created.
4. Optional: Specify the rule set on the Selection tab.
  Rule sets group related rules together and help ensure that rules do not interfere with each other.
5. Add the rule to the database by clicking the Add/Save the current rule icon in the Create Parsing Rules view.
Rebuild the parsing rules file by clicking the Build icon.
Test the rule by reviewing the updated annotations in your sample document.
In the Outline view for the annotated document, verify that the new annotations are now displayed. If the rule did not identify all instances of the text in the document, refine the criteria that you specified for the rule in the Create Parsing Rules view.
Tip: To temporarily disable a rule, select the Properties > Omit rule from build option in the Create Parsing Rules view. By omitting rules from the build, you can compare the results of similar rules and determine which rule best identifies the text, without having to delete and re-create the rules for each test.
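To make the effect of the two example rules in this procedure more concrete, the following Python sketch imitates them outside of Content Analytics Studio. It is only an analogy and not how the product stores or runs parsing rules: the regular expression, the TITLES list, the Annotation class, and the find_persons and remove_negated functions are all hypothetical names chosen for illustration. The optional title group corresponds to setting Repeats to Occurring zero or one time, the Person annotation covers only the given name and surname, and the surname is stored as a feature. The second function mimics a rule that deletes a disease annotation when a negation term precedes it in the same sentence.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for a dictionary of titles; not part of the product.
TITLES = ("Sir", "Dr", "Mr", "Mrs", "Ms")

# The title group is optional, which mirrors setting Repeats to
# "Occurring zero or one time" on the title node of the pattern.
PERSON_PATTERN = re.compile(
    r"(?:(?P<title>" + "|".join(TITLES) + r")\.?\s+)?"
    r"(?P<given>[A-Z][a-z]+)\s+(?P<surname>[A-Z][a-z]+)"
)


@dataclass
class Annotation:
    """Minimal illustrative annotation: a type, a character span, and features."""
    type: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


def find_persons(text):
    """Create a Person annotation over the given name and surname only."""
    return [
        Annotation(
            "Person",
            m.start("given"),
            m.end("surname"),
            {"surname": m.group("surname")},  # surname stored as a feature
        )
        for m in PERSON_PATTERN.finditer(text)
    ]


def remove_negated(text, annotations, negation_terms=("no", "not")):
    """Drop Disease annotations that follow a negation term in the same sentence."""
    kept = []
    for ann in annotations:
        # Naive sentence boundary: the text after the previous period.
        sentence_start = text.rfind(".", 0, ann.begin) + 1
        preceding = re.findall(r"[a-z]+", text[sentence_start:ann.begin].lower())
        if ann.type == "Disease" and any(w in negation_terms for w in preceding):
            continue  # corresponds to Delete Annotation in the rule
        kept.append(ann)
    return kept
```

For example, calling find_persons on the text Sir Winston Churchill met Franklin Roosevelt. yields two Person annotations, one for each name, even though only the first name has a title. The capitalization-based pattern is deliberately naive and would also match other pairs of capitalized words; a real rule works on the annotations that earlier pipeline stages produce, not on raw characters.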

What to do next

Whenever you add or change parsing rules, you must rebuild the parsing rules file from the database before your pipeline can use the updated rules to analyze documents.