Identifying simple pattern classes
Simple pattern classes identify the type of each token in the data so that a pattern can match it and the associated pattern actions can run.
Simple pattern classes are represented by single characters.
Within patterns, you must use the backslash (\) escape character to prevent the syntax of the pattern tables from interfering with certain single character classes. Use the backslash (\) escape character with the following single character classes: the hyphen (-), slash (/), number sign (#), left and right parentheses () and ampersand (&).
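
As a rough illustration of this rule, the following Python sketch adds the backslash to the characters listed above when a pattern string is assembled. The names `NEEDS_ESCAPE` and `escape_class` are hypothetical and are not part of the product.

```python
# Characters that the paragraph above lists as needing the backslash escape.
NEEDS_ESCAPE = set("-/#()&")

def escape_class(ch: str) -> str:
    """Return ch prefixed with a backslash if it is one of the listed characters."""
    return "\\" + ch if ch in NEEDS_ESCAPE else ch

print(escape_class("&"))  # prints \&
print(escape_class("^"))  # prints ^
```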
Take care when specifying SEPLIST and STRIPLIST entries. For example, to recognize the ampersand as a single token, include it in the SEPLIST but not in the STRIPLIST. If the backslash is in the SEPLIST, its class is \ (backslash). To use a backslash in a pattern, escape it by writing a double backslash (\\). See also Applying parsing rules to a list.
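
The interaction between the two lists can be sketched in Python. This is a simplified model, not the product's parser: the function name `tokenize` and the default list contents are assumptions chosen for illustration. Characters in the SEPLIST end the current token, and a separator becomes a token of its own only when it is not also in the STRIPLIST.

```python
def tokenize(text, seplist=" ,&", striplist=" ,"):
    """Split text into tokens with a separator list and a strip list (illustrative)."""
    tokens = []
    current = []
    for ch in text:
        if ch in seplist:
            # A separator always ends the token that is being built.
            if current:
                tokens.append("".join(current))
                current = []
            # A separator that is not stripped is kept as a single token.
            if ch not in striplist:
                tokens.append(ch)
        elif ch in striplist:
            # Assumption: a stripped character that is not a separator
            # is dropped without splitting the token.
            continue
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens

print(tokenize("1ST & MAIN ST"))                        # ['1ST', '&', 'MAIN', 'ST']
print(tokenize("1ST & MAIN ST", striplist=" ,&"))       # ['1ST', 'MAIN', 'ST']
print(tokenize("1,230", seplist=" ,", striplist=" "))   # ['1', ',', '230']
```

With the ampersand in both lists, it disappears from the token stream, so a pattern that uses the literal ampersand can never match.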
The NULL class (0) is not included in this list of single character classes. The NULL class is used in the classifications (.CLS) or in the RETYPE action to make a token NULL. Because a NULL class never matches to anything, it is never used in a pattern.
| Class | Description |
|---|---|
| A - Z | User-supplied class from the classifications. The classes A - Z correspond to classes that you code in the classifications. For example, if APARTMENT is given the class of U in the classifications, then APARTMENT matches a simple pattern of U. |
| ^ | Numeric. The class ^ (caret) represents a single number, for example, the number 123. However, the number 1,230 uses three tokens: the number 1, a comma, and the number 230. |
| ? | One or more consecutive words that are not in the classifications. The class ? (question mark) represents one or more consecutive alphabetic words. For example, MAIN, CHERRY HILL, and SATSUMA PLUM TREE HILL each match a single ? class, provided that none of these words are in the classifications for the rule set. Class ? is useful for street names when multi-word and single-word street names must be treated identically. |
| + | A single alphabetic word that is not in the classifications. The class + (plus sign) is useful for separating the parts of an unknown string. For example, in a name like OWAIN LIAM JONES, a pattern of three + classes matches the three words individually, so that each word can be copied to its own column: given name, middle name, and family name. |
| & | A single token of any type. The class & (ampersand) represents a single token of any class. For example, a pattern in which an apartment type class is followed by & matches a single token that follows an apartment type. SUITE 11 is recognized by such a pattern. However, in a case such as APT 1ST FLOOR, only APT 1ST is recognized. |
| \& | Type the backslash (\) escape character before the ampersand to use the ampersand as a literal. For example, 1ST & MAIN ST can be recognized by a pattern that contains \&. |
| > | Leading numeric. The class > (greater than symbol) represents a token that begins with numbers and is followed by letters. For example, in an address like 123A MAPLE AVE, the house number 123A is recognized by a pattern in which > matches the house number and T matches the street type. The token 123A contains numbers and alphabetic characters, but the numbers are leading. In this example, T represents street type. |
| < | Leading alphabetic character. The class < (less than symbol) matches tokens that begin with alphabetic characters and are followed by numbers. The token contains alphabetic characters and numbers, but the alphabetic characters are leading. |
| @ | Complex mix. The class @ (at sign) represents tokens that have a complex mixture of alphabetic characters and numerics, for example: A123B, 345BCD789. For example, area information like Hamilton ON L8N 2P1 can be matched by a pattern in which a P class is followed by two @ classes. In this example, P represents Province. The first @ represents L8N and the second @ represents 2P1. |
| ~ | Special punctuation. The class ~ (tilde) represents special characters that are not in the SEPLIST. For example, if a SEPLIST does not contain the dollar sign and percent sign, then you might use a pattern in which ~ is followed by a class that matches a single word. In this example, $ HELLO and % OFF match the pattern. |
| k | One or more Chinese numeric characters |
| / | Literal. The class / (slash) is useful for fractional addresses like 123 ½ MAPLE AVE. |
| \/ | Backslash, forward slash. You can use the backslash (\) escape character with the slash in the same manner that you use the / (slash) class. |
| - | Literal. The class - (hyphen) is often used for address ranges. For example, an address range like 123-127 matches a pattern of ^, -, and ^, in that order. |
| \- | You can use the backslash (\) escape character with the hyphen in the same manner you use the - (hyphen) class. |
| \# | Literal. You must use the number sign with the backslash (\) escape character, for example: \#. The class # (number sign) is often used as a unit prefix, for example, in an address like suite #12 or unit #9A. |
| () | Literal. The classes ( and ) (parentheses) are used to enclose operands or user variables in a pattern syntax. For example, a pattern syntax that includes a leading numeric operator and a trailing character operator can recognize the address 123A MAPLE AVE. The numbers 123 are recognized as the house number and the letter A is recognized as a house number suffix. |
| \( and \) | Use the backslash (\) escape character with the opening parenthesis or closing parenthesis to filter out parenthetical remarks. To remove a parenthetical remark such as (see Joe, Room 202), specify a pattern that matches the opening parenthesis, the contents, and the closing parenthesis, and retype these tokens to NULL. This removes the parentheses and the contents of the parenthetical remark. In addition, retyping these tokens to NULL removes the parenthetical statement from consideration by any patterns that are further down in the pattern-action file. |
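
To make the class definitions in the table concrete, the following Python sketch assigns a class to one token at a time. It is a simplified model under stated assumptions, not the product's classifier: the function name `classify`, the `classifications` dictionary, and the fallback to ~ for any other token are all hypothetical. It also works per token, so it reports + for each unknown word. The ? class, which spans one or more consecutive unknown words, is resolved during pattern matching and is not modeled here.

```python
import re

def classify(token, classifications=None):
    """Return a simple pattern class for one token (illustrative model only)."""
    classifications = classifications or {}
    # User-supplied classes (A - Z) take priority over the default classes.
    if token.upper() in classifications:
        return classifications[token.upper()]
    if re.fullmatch(r"[0-9]+", token):
        return "^"   # numeric
    if re.fullmatch(r"[A-Za-z]+", token):
        return "+"   # single alphabetic word that is not in the classifications
    if re.fullmatch(r"[0-9]+[A-Za-z]+", token):
        return ">"   # leading numeric
    if re.fullmatch(r"[A-Za-z]+[0-9]+", token):
        return "<"   # leading alphabetic
    if re.fullmatch(r"[0-9A-Za-z]+", token):
        return "@"   # complex mix of letters and numbers
    if token in ("-", "/", "#", "(", ")", "&"):
        return token  # literal classes (escaped with a backslash in patterns)
    return "~"       # assumption: anything else counts as special punctuation

# Hypothetical classifications: AVE is a street type, ON is a province.
cls = {"AVE": "T", "ON": "P"}
print([classify(t, cls) for t in "123A MAPLE AVE".split()])       # ['>', '+', 'T']
print([classify(t, cls) for t in "Hamilton ON L8N 2P1".split()])  # ['+', 'P', '@', '@']
```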