Example for creating rule files

In this example, you create a rule file for international phone numbers that extracts the concepts country code, area code, and extension number.
To create the rule file, follow these steps:
  1. In the Text Analysis folder, right-click the Rules folder and select New -> Rules from the popup menu.

    The New Rules dialog is displayed. It shows the data warehousing projects that you created previously and the Text Analysis Sample project.

  2. On the New Rules page, select the data warehousing project that you want to use, type PhoneNumbers in the Rules File Name entry field, and click Finish.

    The Regular Expression editor is displayed. In the Types section, the type PhoneNumbers is displayed. The Features folder is empty. A regular-expression pattern is not yet defined.

  3. In the Regular Expression editor, click New Feature to open the New Feature dialog.
  4. In the New Feature dialog, type Country_Code in the entry field and click OK.
  5. Repeat the previous step to add the features Area_Code, Extension_Number, and Country_Name.
  6. In the Rules section, select the rule PhoneNumbers to display the parameters for specifying a regular-expression pattern for the rule.
  7. In the Rule entry field, type the regular-expression pattern (\d*)-(\d+)-(\d+).

    The first pair of parentheses captures the country code, the second pair captures the area code, and the third pair captures the extension number.

    You might want to use the Regular Expression Builder to create the regular-expression patterns. With the Regular Expression Builder, you can easily create regular-expression patterns by selecting several constructs from different categories.

    You can specify the match strategy for the regular-expression pattern or test the rule in the Regular Expression Builder or in the Rule properties.

  8. In the Input text entry field of the Test Rule section, type 0033-1234-56567878 to test the rule.

    In the Matched field, the matches of the input text are displayed. These matches consist of the subpatterns that are defined for this rule. The subpatterns can be mapped to the defined features of the rule.

  9. In the Feature section, select Country_Code and select Subpattern1 from the list of subpattern references.

    When you map the feature Country_Code to subpattern1, the text that subpattern1 matches is set as the value of the feature Country_Code whenever a match is found in the text to be analyzed.

  10. Repeat the previous step to map the feature Area_Code to subpattern2 and the feature Extension_Number to subpattern3.
  11. In the Types section, click New Rule, type German_Phone_Numbers in the entry field, and click OK.
  12. In the Rule section, type (0049)-(\d+)-(\d+) in the entry field, where 0049 is the country code for Germany.
  13. In the Test Rule section, type 0049-7031-666666 to test the rule.
  14. In the Feature section, select the feature Country_Name and type Germany in the Value entry field. Do not specify a subpattern reference, because the fixed value Germany is to be set whenever this rule matches, that is, whenever the text contains a phone number that starts with the country code 0049.
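
The behavior of the two rules can be approximated outside the Regular Expression editor. The following Python sketch is only an illustration, not the product's implementation: the RULES structure and the apply_rules function are hypothetical names, and Python's re module stands in for the editor's regular-expression engine, whose match strategy might differ.

```python
import re

# Hypothetical model of the rule file: each rule has a pattern, a mapping of
# features to subpattern numbers, and a mapping of features to fixed values.
RULES = [
    {
        "name": "PhoneNumbers",
        "pattern": re.compile(r"(\d*)-(\d+)-(\d+)"),
        "subpatterns": {"Country_Code": 1, "Area_Code": 2, "Extension_Number": 3},
        "fixed": {},
    },
    {
        "name": "German_Phone_Numbers",
        "pattern": re.compile(r"(0049)-(\d+)-(\d+)"),
        "subpatterns": {},
        "fixed": {"Country_Name": "Germany"},
    },
]


def apply_rules(text):
    """Return one annotation (rule name, matched text, features) per rule match."""
    annotations = []
    for rule in RULES:
        for match in rule["pattern"].finditer(text):
            # Features mapped to a subpattern take the text of that group.
            features = {
                feature: match.group(number)
                for feature, number in rule["subpatterns"].items()
            }
            # Features with a fixed value ignore the subpatterns entirely.
            features.update(rule["fixed"])
            annotations.append(
                {"rule": rule["name"], "text": match.group(0), "features": features}
            )
    return annotations


if __name__ == "__main__":
    for sample in ("0033-1234-56567878", "0049-7031-666666"):
        for annotation in apply_rules(sample):
            print(sample, annotation["rule"], annotation["features"])
```

In this sketch, the test input 0033-1234-56567878 matches only the PhoneNumbers rule, whereas 0049-7031-666666 matches both rules: the PhoneNumbers rule fills Country_Code, Area_Code, and Extension_Number from subpatterns 1 to 3, and the German_Phone_Numbers rule sets Country_Name to the fixed value Germany.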

