Retyping operands

The RETYPE action is used to change the class, token value, and abbreviation of an operand.

The RETYPE action has the following format:


RETYPE operand class [variable | literal] [variable | literal]
Argument            Description
operand             The operand number in the pattern
class               The new class value
variable | literal  The new token value, which can be a variable or a literal (optional)
variable | literal  The new token abbreviation, which can be a variable or a literal (optional)

Note: If you want to change the token abbreviation but leave the token value as is, you can copy the token value to a user variable and use that variable as the third argument.
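
The argument rules above can be illustrated with a small Python model. This is a sketch only, not the product's implementation: the Token class and the retype function are hypothetical names introduced here, and the model assumes that an omitted value or abbreviation argument leaves the existing one unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical stand-in for one entry in the token table."""
    cls: str      # class, for example "^", "?", "T", or "0" for NULL
    value: str    # token value
    abbrev: str   # standard abbreviation


def retype(token: Token, new_cls: str,
           new_value: Optional[str] = None,
           new_abbrev: Optional[str] = None) -> None:
    """Toy model of RETYPE: class is required, value and abbreviation are optional."""
    if new_abbrev is not None and new_value is None:
        # The abbreviation is the fourth argument, so the third must be present.
        raise ValueError("an abbreviation requires a token value argument")
    token.cls = new_cls
    if new_value is not None:
        token.value = new_value
    if new_abbrev is not None:
        token.abbrev = new_abbrev


# Change only the abbreviation by passing the saved value back in,
# mirroring the user-variable technique described in the note.
tok = Token(cls="U", value="SUITE", abbrev="SUITE")
saved_value = tok.value
retype(tok, "U", saved_value, "STE")
print(tok)
```

Passing the saved value as the third argument is what keeps the token value intact while the abbreviation changes.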

A basic concept of writing standardization rules is filtering. You can change phrases and clauses by detecting, processing, or removing them from the token table. You can remove them by retyping them to the NULL class (0). Tokens in the NULL class are ignored in all pattern matching.

For example, if you want to process apartments and remove the unit information from further processing, you can use the following pattern-action set:


*U | &
COPY_A [1] {UnitType}
COPY [2] {UnitValue}
RETYPE [1] 0
RETYPE [2] 0

Removing the apartment designation converts the address 123 MAIN ST APT 56 to a standard form, such as 123 MAIN ST. The pattern detects an apartment type followed by any single token. The actions copy both operands to dictionary fields and then retype them to NULL so that they do not match any later patterns.
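
The filtering idea can be sketched in Python as follows. This is a toy model under stated assumptions, not the actual engine: the token table is a plain list of (class, value) pairs, the class assignments for MAIN, ST, and APT are assumed for illustration, and the helper visible_tokens is a hypothetical name. The model copies token values only, whereas COPY_A in a real rule set copies the standard abbreviation.

```python
NULL_CLASS = "0"


def visible_tokens(table):
    """Return the tokens that later patterns can still match (non-NULL)."""
    return [(cls, value) for cls, value in table if cls != NULL_CLASS]


# Assumed token table for 123 MAIN ST APT 56 as (class, value) pairs.
table = [("^", "123"), ("?", "MAIN"), ("T", "ST"), ("U", "APT"), ("^", "56")]

dictionary = {}
for i in range(len(table) - 1):
    cls, value = table[i]
    if cls == "U":  # pattern *U | & : a unit type followed by any single token
        next_value = table[i + 1][1]
        dictionary["UnitType"] = value        # stands in for COPY_A [1] {UnitType}
        dictionary["UnitValue"] = next_value  # stands in for COPY [2] {UnitValue}
        table[i] = (NULL_CLASS, value)            # RETYPE [1] 0
        table[i + 1] = (NULL_CLASS, next_value)   # RETYPE [2] 0

remaining = " ".join(value for _, value in visible_tokens(table))
print(remaining)    # prints: 123 MAIN ST
print(dictionary)
```

The unit tokens stay in the table but are invisible to later matching, which is the effect of retyping to NULL.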

You can use a third argument to replace the text of a token. For example, if you want to recognize street names like ST CHARLES and replace the ST with the word SAINT, you can use the following rule:


*!? | T | + | T 
RETYPE [2] ? "SAINT"

This set scans for a type T token (the only value for type T is ST) that is preceded by any token other than an unknown alphabetic word and followed by a single alphabetic word. The RETYPE action changes the class of the type T operand to unknown alphabetic (?) and replaces its text with SAINT. If the input data is the following address:


123 ST CHARLES ST

The result corrects the ambiguity of the abbreviations for SAINT and STREET as shown in the following example:


123 SAINT CHARLES ST

The result matches the standard ^ | ? | T pattern after the final ST is retyped to T, as is done in the geocode rule set. The ? class is used for the retype. This is important because you want SAINT to be treated as the same kind of token as the neighboring unknown alphabetic tokens.

The first operand of the pattern (*!?) makes sure that ST is not preceded by an unknown alphabetic character. This prevents the input MAIN ST JUNK from being standardized into MAIN SAINT JUNK.

A fourth argument is available for the RETYPE action to change the standard abbreviation of the operand. If you include this argument, you must also include the token value replacement argument (third argument). If you do not want to replace the token value, you can use the original value as the replacement value. For example, the address ST 123 can mean SUITE 123.

If the standard abbreviation for SUITE is STE, you need to change the token abbreviation with the following pattern-action set:


S | ^
RETYPE [1] U "SUITE" "STE"

This is interpreted in future patterns as SUITE 123 and has the abbreviation STE.

For the optional third and fourth arguments, you can use a substring range. For example, consider an input record of 8 143rd Ave and the following pattern-action set:


^ | > | T
COPY [2](n) temp
RETYPE [2] ^ temp(1:2)

The 143 is copied to the variable temp, the 14 replaces the contents of the second token, and its class becomes numeric (^).
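
The substring step can be sketched in Python. This is an illustrative model only: numeric_part and substring are hypothetical helper names, and the range is assumed to be 1-based and inclusive at both ends, which is consistent with temp(1:2) yielding 14 from 143.

```python
import re


def numeric_part(value: str) -> str:
    """Model of the (n) qualifier: the leading digits of a token such as 143rd."""
    match = re.match(r"\d+", value)
    return match.group(0) if match else ""


def substring(value: str, start: int, end: int) -> str:
    """Assumed semantics of value(start:end): 1-based, inclusive range."""
    return value[start - 1:end]


temp = numeric_part("143rd")        # stands in for COPY [2](n) temp
new_value = substring(temp, 1, 2)   # stands in for temp(1:2)
print(temp, new_value)              # prints: 143 14
```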

RETYPE reclassifies all elements of a possibly concatenated alphabetic class (?) or universal class (**), if no subfield ranges (n:m) are specified. RETYPE reclassifies only those tokens within the subfield range if a range is specified. For example, if you have the following input:


15 AA BB CC DD EE FF RD

The following patterns and actions have the effects described:

Pattern          Action          Effect
^ | ? | T        RETYPE [2] 0    ; Sets AA through FF to NULL class
^ | 3 | T        RETYPE [2] 0    ; Sets CC to NULL class
^ | (2:3) | T    RETYPE [2] 0    ; Sets BB and CC to NULL class
^ | -2 | T       RETYPE [2] 0    ; Sets EE to NULL class
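
The subfield selection in these examples can be modeled with a short Python sketch. It is not the product's matching logic: select_subfield and null_words are hypothetical names, and the model assumes that a positive number counts words from the left, a negative number counts from the right, and a range is 1-based and inclusive, as the effects listed above suggest.

```python
def select_subfield(words, spec=None):
    """Return the 0-based indexes of the words that a subfield spec selects.

    spec is None for the whole ? operand, an int for a single word
    (negative counts from the right), or a (start, end) tuple for a range.
    """
    count = len(words)
    if spec is None:
        return list(range(count))
    if isinstance(spec, tuple):
        start, end = spec
        return list(range(start - 1, end))
    if spec > 0:
        return [spec - 1]
    return [count + spec]


def null_words(words, spec=None):
    """Return the words that RETYPE [2] 0 would set to the NULL class."""
    return [words[i] for i in select_subfield(words, spec)]


# The unknown alphabetic words between 15 and RD in the sample input.
words = ["AA", "BB", "CC", "DD", "EE", "FF"]

print(null_words(words))           # whole ? operand
print(null_words(words, 3))        # subfield 3
print(null_words(words, (2, 3)))   # subfield range (2:3)
print(null_words(words, -2))       # subfield -2
```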

RETYPE operates in a similar fashion to CONVERT with a TKN argument. The values that RETYPE sets (class, token value, abbreviation) are not available within the current pattern-action set; they are available only to the pattern-action sets that follow. Consequently, a pattern-action set such as the following does not produce the intended results and is therefore flagged as an error:


...
RETYPE [1] ...
COPY [1] ...