USADDR

This function is used to standardize US addresses.

The USADDR function has a length limit of 512 characters. If the combined length of the USADDR input fields (stline1, stline2, stline3, stline4) exceeds 512 characters, the operational server truncates the input to its first 512 characters.
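The truncation rule can be illustrated with a short Python sketch. The helper name usaddr_input is hypothetical. The sketch assumes that the 512-character limit applies after the fields are joined with single spaces (as in step 2 of the algorithm). The documentation does not say whether those separators count toward the limit.

```python
MAX_INPUT_LEN = 512  # documented USADDR input limit

def usaddr_input(*lines: str) -> str:
    """Join 1 to 4 address lines and keep only the first 512 characters.

    Hypothetical helper; assumes the limit is applied to the lines
    joined with single spaces.
    """
    if not 1 <= len(lines) <= 4:
        raise ValueError("USADDR takes between 1 and 4 address lines")
    combined = " ".join(lines)
    return combined[:MAX_INPUT_LEN]
```

For example, two 300-character lines produce 601 characters once joined. Only the first 512 characters are kept.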

Output Type: any type
Fldargs: stline1, stline2, stline3, stline4
MinFldArgs, MaxFldArgs: 1, 4
Number of standard roles: 1
Strword table: ADDR-TOK
CMAP strcode: CMAP table

For most addresses, the USADDR function produces a string with a street number, an optional directional token, a street name, and a unit number such as an apartment, suite, or floor number. For example, 345 NORTH ELM STREET, APT 66 becomes 345_N_ELM_66.

The main exception to this format is post office boxes, which are formatted as POBOX_##, where ## is the box number.
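The output format can be sketched in Python as a function that joins already-classified components with underscores. This is a minimal illustration, not the product's implementation. The function name, parameter names, and the small SUBSTITUTIONS table are hypothetical stand-ins for the mpi_strword substitutions. The component order (block number, direction, street name, street type, unit number) is inferred from the examples on this page. Street type is optional because one example includes it (354_N_ELM_ST) and one does not (345_N_ELM_66).

```python
from typing import Dict, Optional

# Hypothetical subset of the standard substitutions held in mpi_strword.
SUBSTITUTIONS: Dict[str, str] = {"NORTH": "N", "STREET": "ST"}

def format_usaddr(block: Optional[str] = None,
                  direction: Optional[str] = None,
                  street: Optional[str] = None,
                  street_type: Optional[str] = None,
                  unit: Optional[str] = None,
                  pobox: Optional[str] = None) -> str:
    """Build a USADDR-style key from parsed address components (sketch)."""
    # A PO box takes priority and is rendered as POBOX_<number>.
    if pobox:
        return f"POBOX_{pobox}"
    parts = [block, direction, street, street_type, unit]
    # Substitute each present component with its standard form, if any,
    # and join the non-empty components with underscores.
    return "_".join(SUBSTITUTIONS.get(p, p) for p in parts if p)
```

For example, format_usaddr("345", "NORTH", "ELM", unit="66") returns 345_N_ELM_66, and format_usaddr(pobox="123") returns POBOX_123.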

The algorithm that produces this result works as follows:

  1. The ADDR-TOK table is loaded by looking up the mpi_strhead and mpi_strword database tables.
  2. Leading spaces are trimmed, and the address input lines are joined with a space between each line.
  3. If a CMAP table is specified, the CMAP character conversion is applied.
  4. Uppercase letters, digits, and spaces are not changed. Lowercase letters are converted to uppercase. Punctuation characters are replaced by a space. A # (number sign) is kept, with a space added before and after it. Any other character is removed, and the tokens are collapsed.
  5. The address is then parsed into words, and each word is classified by its word type. The maximum number of words is 4. The classifications are found in the mpi_strword table under the ADDR-TOK labels.
  6. Roles are assigned to the address tokens according to their token type. The possible token types include:
    • MPI_TOKTYPE_BN = block number type
    • MPI_TOKTYPE_DT = direction type
    • MPI_TOKTYPE_ST = street type
    • MPI_TOKTYPE_MT = PO box type
    • MPI_TOKTYPE_UT = unit number type
    • MPI_TOKTYPE_SP = special (_MT/_UT)

      Roles are assigned as follows:

      • The box number is assigned when there is a non-MT/SP token
      • Unit number is assigned when there is a non-UT/SP token
      • Block number is assigned when there is a BN token
      • Block number and the actual number are assigned when there is a NUM token
      • Direction or street name is assigned when there is a DT token (for example, North, N, NE)
      • Street name is assigned when there is an ANT token
      • If the token type is ST and the street name is null, the street type is assigned (for example, 123 Circle Court)
      • If the token type is UT or SP, the token is assigned the role U (unit)
      • If the token type is MT, the token is assigned the role B (block)
      • If the token type is NXM, the role depends on whether the number carries an ordinal suffix (ST, ND, RD, or TH). If it does, the street name is assigned. If it does not, the unit number is assigned. For example, 32ND AVENUE assigns 32 to the street name, but 32 AVENUE ROAD assigns 32 to the unit number.
  7. If the box number and the street name are both null but there is a direction, the direction is used as the street name. If there is a street name but no other subcomponents (for example, a block number or unit number), the street name is considered unsuitable and is set to NULL. The box number (POBOX) has the highest priority among the token types.
  8. Finally, the address string is formatted. If there is a box number, POBOX is written in front of it, followed by the number. Then the street name, street direction, block number, and unit number are added. Each one is replaced by its equivalent substitution from the mpi_strword table. For example, Street is replaced with ST, and North is replaced with N.
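Steps 2 through 4 can be sketched in Python as a single normalization function. This is a minimal sketch under stated assumptions, not the product code. The function name normalize_address is hypothetical, and leading spaces are assumed to be trimmed on each input line. The CMAP table is modelled as a simple character-to-character dictionary. Only ASCII letters and digits are treated as uppercase, lowercase, or digit characters. "Collapsing" is interpreted as reducing runs of spaces to one.

```python
import string
from typing import Mapping, Optional, Sequence

def normalize_address(lines: Sequence[str],
                      cmap: Optional[Mapping[str, str]] = None) -> str:
    """Sketch of USADDR steps 2-4 (hypothetical helper)."""
    # Step 2: trim leading spaces on each line, then join lines with one space.
    text = " ".join(line.lstrip(" ") for line in lines)

    # Step 3: optional CMAP conversion, modelled here as a char -> char dict.
    if cmap:
        text = "".join(cmap.get(ch, ch) for ch in text)

    # Step 4: per-character rules.
    out = []
    for ch in text:
        if ch in string.ascii_uppercase or ch in string.digits or ch == " ":
            out.append(ch)              # kept unchanged
        elif ch in string.ascii_lowercase:
            out.append(ch.upper())      # lowercase -> uppercase
        elif ch == "#":
            out.append(" # ")           # number sign gets a space on each side
        elif ch in string.punctuation:
            out.append(" ")             # other punctuation -> space
        # Any other character (for example, a tab or non-ASCII letter) is dropped.

    # Collapse runs of spaces so tokens are separated by exactly one space.
    return " ".join("".join(out).split())
```

For example, the lines "  345 north elm street," and "apt #66" normalize to 345 NORTH ELM STREET APT # 66. A CMAP entry such as {"É": "E"} keeps a character that step 4 would otherwise drop.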
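Two of the rules above are specific enough to sketch directly: the NXM decision in step 6 and the fallbacks in step 7. The sketch follows the 32ND AVENUE and 32 AVENUE ROAD examples. It is hypothetical: the function names and the component keys (box, block, direction, street, unit) are illustrative only. It assumes that an NXM token is a run of digits, optionally followed by an ordinal suffix. It also assumes that when a direction is used as the street name, the direction is cleared from the direction slot.

```python
import re
from typing import Dict, Optional, Tuple

# Digits followed by an ordinal suffix, for example 32ND or 21ST.
ORDINAL = re.compile(r"^(\d+)(ST|ND|RD|TH)$")

def nxm_role(token: str) -> Tuple[str, str]:
    """Step 6 NXM rule: ordinal -> street name, otherwise unit number."""
    m = ORDINAL.match(token)
    if m:
        return ("street", m.group(1))   # 32ND AVENUE -> street name 32
    return ("unit", token)              # 32 AVENUE ROAD -> unit number 32

def apply_step7(parts: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Step 7 fallbacks on a dict of address components (hypothetical keys)."""
    c = dict(parts)  # work on a copy so the caller's dict is unchanged
    # No box and no street name, but a direction: use it as the street name.
    if c.get("box") is None and c.get("street") is None and c.get("direction"):
        c["street"], c["direction"] = c["direction"], None
    # A street name with no block number or unit number is unsuitable -> NULL.
    if c.get("street") and not (c.get("block") or c.get("unit")):
        c["street"] = None
    return c
```

For example, nxm_role("32ND") returns ("street", "32"), and a bare street name such as {"street": "ELM"} is set to None by apply_step7.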
USADDR example:

354 North Elm Street becomes 354_N_ELM_ST