Classifications strengthen the contextual information that
patterns provide by identifying that the underlying values belong
to particular categories. Each rule set contains its own set of categories,
which are called classes.
In IBM® InfoSphere® QualityStage®,
records are represented as patterns. In the same way that a record
consists of one or more values, patterns consist of one or more abstract
characters, each of which represents a class. For example, a set of
address data might include the record 123 N CHERRY HILL ROAD,
which is represented by the pattern ^D++T. The following
table shows the contextual information that each class in the pattern
^D++T provides.
Table 1. Example of a standard address pattern with the contextual information that each class provides

| Input record | Class label | Contextual information that the class provides |
| --- | --- | --- |
| 123 | ^ | Value that includes only numbers |
| N | D | Street direction |
| Cherry | + | Value that includes only letters |
| Hill | + | Value that includes only letters |
| Road | T | Street type |

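The mapping from a record to its pattern can be illustrated with a minimal Python sketch. It is not InfoSphere QualityStage code: the `CUSTOM_CLASSES` dictionary and the `classify` and `to_pattern` functions are hypothetical names invented for this illustration, and the sketch covers only the classes that appear in Table 1.

```python
# Hypothetical classification entries for this example only:
# value (uppercase) -> one-character class label.
CUSTOM_CLASSES = {
    "N": "D",     # street direction
    "ROAD": "T",  # street type
}


def classify(value: str) -> str:
    """Return the class label for a single value."""
    key = value.upper()
    if key in CUSTOM_CLASSES:
        return CUSTOM_CLASSES[key]
    if value.isdigit():
        return "^"  # value that includes only numbers
    if value.isalpha():
        return "+"  # value that includes only letters
    # Other default classes exist but are outside the scope of this sketch.
    raise ValueError(f"No class in this sketch covers {value!r}")


def to_pattern(record: str) -> str:
    """Build a pattern by concatenating the class label of each value."""
    return "".join(classify(value) for value in record.split())


pattern = to_pattern("123 N CHERRY HILL ROAD")
print(pattern)  # ^D++T
```

The custom lookup runs before the default checks, so that N is classified as a street direction (D) rather than as a generic alphabetic value (+).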
Patterns contain the following types of classes:
- Default classes provide basic information about the type of the
value, such as whether the value consists of alphabetic characters,
numeric characters, or some combination of both.
- Custom classes provide stronger contextual information about the
type of the value. In a data set that contains retail product information,
custom classes might be used to indicate whether an alphabetic value
is the name of a product or the name of a brand. The one-character
label for custom classes can be any letter in the Latin alphabet or 0,
which indicates a null class.
Rule sets use classifications to identify and classify key values.
For example, a rule set for address data might use classifications
to categorize values that are street types (AVE, ST, RD) or directions
(N, NW, S) by providing the following information:
- Standard abbreviations for each word; for example, HWY for Highway
- A list of one-character labels that represent classes and that
are assigned to individual data elements during processing
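As a rough illustration of the two kinds of information listed above, the following Python sketch models a classification lookup as a dictionary. The entries, the `Classification` type, and the `lookup` function are assumptions made for this example; they do not reflect the actual layout of a classifications table or any product API.

```python
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    standard: str  # standard abbreviation for the word
    label: str     # one-character class label


# Hypothetical entries: T marks street types, D marks directions.
CLASSIFICATIONS = {
    "HIGHWAY": Classification("HWY", "T"),
    "HWY": Classification("HWY", "T"),
    "AVE": Classification("AVE", "T"),
    "ST": Classification("ST", "T"),
    "RD": Classification("RD", "T"),
    "N": Classification("N", "D"),
    "NW": Classification("NW", "D"),
    "S": Classification("S", "D"),
}


def lookup(value: str) -> Optional[Classification]:
    """Return the classification for a value, or None if it has no entry."""
    return CLASSIFICATIONS.get(value.upper())


for value in ["Highway", "NW", "Cherry"]:
    print(value, lookup(value))
```

A value without an entry, such as Cherry, returns `None` here; in a rule set such a value would keep its default class.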
Classifications are added and modified by editing the classifications
table (previously called the .CLS file), by enhancing a rule set in the
Standardization Rules Designer, or by using the user classification override.