In this example, you create a rule file that extracts
the concepts country code, area code, and extension number from phone numbers.
To create a rule file for international phone numbers that
extracts the concepts country code, area code, and extension number,
follow these steps:
- In the Text Analysis folder, right-click the Rules folder
and select New -> Rules from the popup menu.
The New Rules dialog is displayed. It shows the data warehousing projects that
you created previously and the Text Analysis Sample project.
- On the New Rules page, select the data warehousing project that
you want to use, type PhoneNumbers in the Rules
File Name entry field, and click Finish.
The Regular Expression editor is displayed. In the Types
section, the type PhoneNumbers is displayed.
The Features folder is empty. A regular-expression pattern is not
yet defined.
- In the Regular Expression editor, click New
Feature to open the New Feature dialog.
- In the New Feature dialog, type Country_Code in
the entry field and click OK.
- Repeat the previous step to add the features Area_Code, Extension_Number,
and Country_Name.
- In the Rules section, select the rule PhoneNumbers to
display the parameters for specifying a regular-expression pattern
for the rule.
- In the Rule entry field, type the regular-expression pattern (\d*)-(\d+)-(\d+).
The characters that are enclosed in the first pair of parentheses
denote the country code, those in the second pair denote
the area code, and those in the third pair denote the extension
number.
You might want to use the Regular Expression Builder
to create the regular-expression patterns. With the Regular Expression
Builder, you can easily create regular-expression patterns by selecting
several constructs from different categories.
You can specify
the match strategy for the regular-expression pattern or test the
rule in the Regular Expression Builder or in the Rule properties.
- In the Input text entry field of
the Test Rule section, type 0033-1234-56567878 to
test the rule.
In the Matched field,
the matches of the input text are displayed. These matches consist
of the subpatterns that are defined for this rule. The subpatterns
can be mapped to the defined features of the rule.
- In the Feature section, select Country_Code and
select Subpattern1 from the list of subpattern references.
When you map the feature Country_Code to subpattern1,
the text that subpattern1 matches is set as the value of the feature Country_Code whenever
a match is found in the text to be analyzed.
- Repeat the previous step to map the feature Area_Code to
subpattern2 and the feature Extension_Number to
subpattern3.
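The rule and the feature mapping that are defined in the previous steps can be approximated outside the editor. The following is a minimal sketch that uses the Python re module instead of the Regular Expression editor; the names PHONE_RULE, FEATURE_MAP, and extract_features are hypothetical and are not part of the product, and the regular-expression engine of the product might differ from Python in details.

```python
import re

# Pattern from the PhoneNumbers rule; each parenthesized group is one subpattern.
PHONE_RULE = re.compile(r"(\d*)-(\d+)-(\d+)")

# Hypothetical stand-in for the mapping in the Feature section:
# feature name -> number of the subpattern that supplies its value.
FEATURE_MAP = {
    "Country_Code": 1,
    "Area_Code": 2,
    "Extension_Number": 3,
}


def extract_features(text):
    """Return the feature values of the first match, or None if nothing matches."""
    match = PHONE_RULE.search(text)
    if match is None:
        return None
    return {feature: match.group(index) for feature, index in FEATURE_MAP.items()}


# Same input text as in the Test Rule step.
features = extract_features("0033-1234-56567878")
print(features)
```

For the test input 0033-1234-56567878, the sketch assigns 0033 to Country_Code, 1234 to Area_Code, and 56567878 to Extension_Number.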
- In the Types section, click New Rule,
type German_Phone_Numbers in the entry field, and
click OK.
- In the Rule section, type (0049)-(\d+)-(\d+) in
the entry field, where 0049 denotes the number for Germany.
- In the Test Rule section, type 0049-7031-666666 to
test the rule.
- In the Feature section, select the feature Country_Name and
type Germany in the Value entry
field. Do not specify a subpattern reference, because the fixed value
Germany is to be set whenever this rule matches, that is,
whenever a phone number starts with the country code 0049.
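The difference between a feature that is mapped to a subpattern and a feature with a fixed value can be illustrated with a short sketch, again in Python rather than in the editor. The names GERMAN_RULE and extract_german_features are hypothetical. The sketch also assumes that Country_Code, Area_Code, and Extension_Number are mapped to subpatterns 1 to 3 for this rule in the same way as for the PhoneNumbers rule, which the steps above do not state explicitly.

```python
import re

# Pattern from the German_Phone_Numbers rule; the first subpattern is the literal 0049.
GERMAN_RULE = re.compile(r"(0049)-(\d+)-(\d+)")


def extract_german_features(text):
    """Return the feature values if the German rule matches, otherwise None."""
    match = GERMAN_RULE.search(text)
    if match is None:
        return None
    return {
        # Assumed mapping to subpatterns, mirroring the PhoneNumbers rule.
        "Country_Code": match.group(1),
        "Area_Code": match.group(2),
        "Extension_Number": match.group(3),
        # Fixed value: set whenever the rule matches, with no subpattern reference.
        "Country_Name": "Germany",
    }


german = extract_german_features("0049-7031-666666")
french = extract_german_features("0033-1234-56567878")
print(german)
print(french)
```

Country_Name is set to Germany only because the rule matched, not because any subpattern captured that text. An input with a different country code, such as 0033-1234-56567878, does not match this rule, so no features are set.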