Character rules identify sequences of characters that represent particular entities in your text, such as telephone numbers, email addresses, or product identifiers.
For example, you might want to identify United States telephone numbers such as 704-501-1500. Because phone numbers can also be written in other ways, such as (704) 501-1500, writing a regular expression that finds all of the possible variations can be challenging.
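To illustrate why hand-written expressions become hard to maintain, the following minimal sketch uses Python's standard re module, not ICA Studio. The pattern name US_PHONE and the helper find_phone_numbers are hypothetical, and the pattern covers only the formats named in its comments.

```python
import re

# Hypothetical pattern for illustration only. It accepts:
#   704-501-1500, 704.501.1500, 704 501 1500, (704) 501-1500, (704)501-1500
# Other real-world formats (country codes, extensions) are not handled.
US_PHONE = re.compile(
    r"""
    (?<!\d)                        # do not start in the middle of a longer number
    (?:\(\d{3}\)\s?|\d{3}[-.\s])   # area code, with or without parentheses
    \d{3}[-.\s]                    # exchange followed by one separator
    \d{4}                          # line number
    (?!\d)                         # do not stop in the middle of a longer number
    """,
    re.VERBOSE,
)


def find_phone_numbers(text):
    """Return every substring of text that matches US_PHONE."""
    return [match.group(0) for match in US_PHONE.finditer(text)]


if __name__ == "__main__":
    print(find_phone_numbers("Call 704-501-1500 or (704) 501-1500."))
```

Even this small pattern needs two alternatives for the area code and guards against longer digit strings, and each additional format makes the expression harder to read.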
By using the ICA Studio character rules editor, you can generate these character rule expressions graphically by basing the rules on sample text that contains the character sequences. After ICA Studio analyzes the sample text, it displays the pattern of character classes that represents the selected text in a tree format. You can then modify the pattern to match similar sequences of characters and define one or more annotations to create when matching text is found in the document. You can also create features for the annotations. For a United States phone number annotation, for example, you might create a feature for the area code, which is the first three digits of the telephone number.
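The idea of reducing sample text to a pattern of character classes can be sketched in a few lines of Python. This is only a conceptual illustration, not how ICA Studio analyzes text: the class names (DIGIT, LETTER, SPACE, PUNCT) and the functions char_class, class_pattern, and first_digit_run are assumptions made for this example.

```python
from itertools import groupby


def char_class(ch):
    """Map one character to a coarse, hypothetical character class name."""
    if ch.isdigit():
        return "DIGIT"
    if ch.isalpha():
        return "LETTER"
    if ch.isspace():
        return "SPACE"
    return "PUNCT"


def class_pattern(sample):
    """Collapse sample text into (class, run length) pairs."""
    return [(cls, len(list(run))) for cls, run in groupby(sample, key=char_class)]


def first_digit_run(sample):
    """Return the first run of digits, a stand-in for an area code feature."""
    for cls, run in groupby(sample, key=char_class):
        if cls == "DIGIT":
            return "".join(run)
    return None


if __name__ == "__main__":
    print(class_pattern("704-501-1500"))
    print(first_digit_run("(704) 501-1500"))
```

For the sample 704-501-1500, class_pattern returns three digit runs separated by single punctuation characters. Relaxing a pattern like this, for example by making the punctuation optional, is analogous to modifying the displayed pattern so that it matches similar sequences.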
Alternatively, you can manually configure character rules by defining variables and regular expressions in CHARRULES files. This approach is for experienced users who want to create complex character rules.