RXNM (standardization function)
The RXNM standardization function is used to split and rewrite name tokens.
This function supports standardization of input that contains non-Romanized letters. For tokenization, RXNM can be used with the built-in rule set (RXNMARABICRULES) or with custom tokenization rules. RXNM standardization is designed for use with the QXNM comparison function. For bucketing, use the PXNM bucketing function with RXNM; EQMETA can also be used to provide nickname phonetics. Any of the bucket generation options are valid. The phonetic function should be ARABICPHONE unless you have developed a custom phonetic function.
The RXNM function has a length limit of 512 characters. If the value of an RXNM field is longer than 512 characters, the operational server truncates it to the first 512 characters.
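The truncation rule can be pictured with a small sketch. The helper name `truncate_rxnm_input` is hypothetical and not part of the operational server; it only illustrates "keep the first 512 characters". Python slicing counts Unicode code points, and it is an assumption here that the server counts characters the same way for non-Romanized input.

```python
def truncate_rxnm_input(value: str, limit: int = 512) -> str:
    """Keep only the first `limit` characters of an RXNM field value.

    Hypothetical helper. Python slicing counts Unicode code points; whether
    the operational server counts characters identically is an assumption.
    """
    return value[:limit]


# Usage: a 600-character value is cut to 512; short values are unchanged.
long_value = "A" * 600
print(len(truncate_rxnm_input(long_value)))  # 512
print(truncate_rxnm_input("ABDULSATAR"))     # ABDULSATAR
```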
- Output type
- Alphanumeric
- Fldargs
- MEMNAME, MEMATTR
- MinFldArgs, MaxFldArgs
- None
- Number of standard roles
- 1
- MinLength, MaxLength
- None
- dvdargs
- STRCODE (reference from mpi_strconfig), NOSKIPTOKEN
- mpi_strconfig strcode
- RXNMARABICRULES or user defined
- mpi_stranon strcode
- ANON
- mpi_strequi strcode
- EQUI
- mpi_strcmap strcode
- CMAP
- The lookups are performed during standardization and transform the input token into an alternate form. Unless the alternate form is anonymous, it is used to create the resulting cmpval that is stored in mpi_memcmpd. These lookups are also performed on sequences generated by user-defined rules. Lookups include:
- PFX - Prefix lookup table. Defined in mpi_strword with the strcode RXNM-PFX.
- SFX - Suffix lookup table. Defined in mpi_strword with the strcode RXNM-SFX.
- DGR - Degree lookup table. Defined in mpi_strword with the strcode RXNM-DGR.
- Two dvdargs are available: a strcode that references an encoder type from mpi_strconfig (required) and NOSKIPTOKEN (optional).
- A strcode reference to a rule set is required to use RXNM. A default rule set specific to Arabic names, RXNMARABICRULES, is provided with the operational server installation. New rule sets can be added to mpi_strconfig by associating them with the cfgtype ENCODER. The strcode that identifies the rule set must be the first argument.
- NOSKIPTOKEN is an optional argument that persists the original input token into the final cmpval along with any tokens generated by rule processing. For example, if the input name is ABDULSATAR and a rule is defined to generate ABD AL SATAR, the final cmpval is ABDULSATAR:ABD:AL:SATAR. By default, without NOSKIPTOKEN, the cmpval is ABD:AL:SATAR. In practice, NOSKIPTOKEN should not be used; it is provided for customers who might require it. You get better comparison results if you do not use the NOSKIPTOKEN option.
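The dvdargs handling and the effect of NOSKIPTOKEN on the cmpval can be sketched as follows. This is a minimal model under stated assumptions, not the operational server's implementation: `parse_dvdargs`, `build_cmpval`, `RxnmArgs`, and the `EXAMPLE_RULES` table are all hypothetical names, and the only rule in the table is the ABDULSATAR example above. The `anonymous` set is a hypothetical stand-in for the ANON lookup, modeling "unless the alternate form is anonymous" as leaving that value out of the cmpval.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical rule set standing in for RXNMARABICRULES. Only the
# ABDULSATAR -> ABD AL SATAR rule comes from the documentation example.
EXAMPLE_RULES: dict[str, list[str]] = {"ABDULSATAR": ["ABD", "AL", "SATAR"]}


@dataclass(frozen=True)
class RxnmArgs:
    strcode: str               # rule-set strcode from mpi_strconfig (required, first)
    noskiptoken: bool = False  # optional NOSKIPTOKEN flag


def parse_dvdargs(dvdargs: list[str]) -> RxnmArgs:
    """Parse RXNM dvdargs: a required strcode first, then optional NOSKIPTOKEN."""
    if not dvdargs:
        raise ValueError("RXNM requires a rule-set strcode as its first dvdarg")
    strcode, *rest = dvdargs
    unknown = [arg for arg in rest if arg != "NOSKIPTOKEN"]
    if unknown:
        raise ValueError(f"unexpected dvdargs: {unknown}")
    return RxnmArgs(strcode=strcode, noskiptoken="NOSKIPTOKEN" in rest)


def build_cmpval(
    token: str,
    rules: dict[str, list[str]],
    args: RxnmArgs,
    anonymous: frozenset[str] = frozenset(),
) -> str:
    """Assemble a colon-separated cmpval for one input token (hypothetical model)."""
    generated = rules.get(token)
    if generated is None:
        parts = [token]              # no rule fired: keep the token as-is
    elif args.noskiptoken:
        parts = [token, *generated]  # NOSKIPTOKEN: original token kept first
    else:
        parts = list(generated)      # default: only rule-generated tokens
    # Hypothetical ANON handling: anonymous values are left out of the cmpval.
    parts = [p for p in parts if p not in anonymous]
    return ":".join(parts)


default_args = parse_dvdargs(["RXNMARABICRULES"])
skip_args = parse_dvdargs(["RXNMARABICRULES", "NOSKIPTOKEN"])
print(build_cmpval("ABDULSATAR", EXAMPLE_RULES, default_args))  # ABD:AL:SATAR
print(build_cmpval("ABDULSATAR", EXAMPLE_RULES, skip_args))     # ABDULSATAR:ABD:AL:SATAR
```

Requiring the strcode as the first element and rejecting unknown trailing arguments mirrors the documented argument order; how the real server reports a malformed dvdargs list is not described here.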