IBM InfoSphere QualityStage, Version 11.3.1
A classification definition assigns a value to a class. The definition can include additional information about the value, such as a standard value and a similarity threshold, and can also apply to other values that are similar to it.
The standard value might be an abbreviation or an expanded form of the value. For example, the standard value for WEST might be W, and the standard value for POB might be "PO BOX".
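As a rough illustration of what a standard value does, the following Python sketch maps the two example values to their standard values with a plain dictionary. The `STANDARD_VALUES` table and the `standardize` function are hypothetical and are not part of QualityStage; real classification definitions also carry a class and a similarity threshold, and the lookup here is an exact, case-sensitive match.

```python
# Hypothetical lookup from a value to its standard value, using the two
# examples from the text. Class and similarity threshold are omitted.
STANDARD_VALUES = {
    "WEST": "W",      # standard value is an abbreviation
    "POB": "PO BOX",  # standard value is an expanded form
}


def standardize(token: str) -> str:
    """Return the standard value for token, or token itself if it has none."""
    return STANDARD_VALUES.get(token, token)


print(standardize("WEST"))  # W
print(standardize("POB"))   # PO BOX
print(standardize("MAIN"))  # MAIN (no definition, so unchanged)
```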
In the classifications table (previously called the .CLS file), the maximum length for a standard value is 25 characters.
In the classification definition for a value in the null class, the standard value is not required.
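The two constraints above can be expressed as validation on a hypothetical record type. This Python sketch assumes that the null class is written as the class code `0` and treats the standard value as required for every other class, which the text implies but does not state outright. `ClassificationDefinition`, its field names, and the class code `D` are illustrative only and do not reflect the actual layout of the classifications table.

```python
from dataclasses import dataclass
from typing import Optional

MAX_STANDARD_VALUE_LEN = 25  # maximum length stated for the classifications table
NULL_CLASS = "0"             # assumption: class code used here for the null class


@dataclass
class ClassificationDefinition:
    """Hypothetical record for one classification definition."""

    value: str
    class_code: str
    standard_value: Optional[str] = None
    similarity_threshold: int = 900  # default stated in the text

    def __post_init__(self) -> None:
        if self.standard_value is None:
            # Assumption: only a null-class definition may omit the standard value.
            if self.class_code != NULL_CLASS:
                raise ValueError("a standard value is required outside the null class")
        elif len(self.standard_value) > MAX_STANDARD_VALUE_LEN:
            raise ValueError(
                f"standard value exceeds {MAX_STANDARD_VALUE_LEN} characters"
            )


west = ClassificationDefinition("WEST", "D", "W")   # "D" is an illustrative class code
noise = ClassificationDefinition("THE", NULL_CLASS)  # null class: no standard value needed
```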
The similarity threshold specifies the degree of variation that can exist in the spelling or representation of the value. If you want the classification definition to affect values that differ from the value in the definition, set the similarity threshold lower than the default of 900.
When the rule set that contains a classification definition is applied to data, each value in the data is compared with the value in the definition, and a score is assigned. The score indicates the degree of similarity between the two values. The string comparison method can take into account phonetic errors; random insertion, deletion, and replacement of characters; and transposed characters.
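The comparison method itself is not described here, so the following is a stand-in rather than the method that QualityStage uses: the optimal string alignment (restricted Damerau-Levenshtein) distance, which counts an insertion, deletion, replacement, or transposition of adjacent characters as one edit each. It does not model phonetic errors, and the function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Insertions, deletions, replacements, and transpositions of two
    adjacent characters each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(osa_distance("AVENUE", "AVNUE"))        # 1: one deletion
print(osa_distance("BOULEVARD", "BOULEVERD"))  # 1: one replacement
print(osa_distance("ROAD", "RAOD"))            # 1: one transposition
```

Plain Levenshtein distance would count the ROAD/RAOD swap as two replacements; the transposition rule counts it as a single edit.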
The score is weighted by the length of the value because small errors in long values are less serious than the same errors in short values. Because errors in short values generally cannot be tolerated, do not specify a similarity threshold for short values.
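The effect of length weighting can be sketched by dividing the number of matching characters by the combined length of both values and scaling the result to 900. This Python sketch uses the standard library's `difflib.SequenceMatcher` as a swapped-in matcher; the 0 to 900 mapping and the `similarity_score` and `matches` functions are assumptions for illustration, not the QualityStage scoring formula.

```python
from difflib import SequenceMatcher


def similarity_score(a: str, b: str) -> int:
    """Hypothetical 0-900 score: 900 * 2M / T, rounded down.

    M is the number of matching characters and T is the combined length,
    so one differing character costs less in a long value than in a short one.
    """
    total = len(a) + len(b)
    if total == 0:
        return 900  # two empty values are treated as identical
    matched = sum(m.size for m in SequenceMatcher(None, a, b).get_matching_blocks())
    return 900 * 2 * matched // total


def matches(definition_value: str, data_value: str, threshold: int = 900) -> bool:
    """True if the data value scores at or above the similarity threshold.

    With the default of 900, only identical values match in this sketch.
    """
    return similarity_score(definition_value, data_value) >= threshold


print(similarity_score("MASSACHUSETTS", "MASACHUSETTS"))  # 864
print(similarity_score("RD", "RT"))                       # 450
```

In this sketch, dropping one S from MASSACHUSETTS still scores 864, while a single wrong letter in RD versus RT scores 450, so a threshold of 850 accepts the first pair and rejects the second.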