Classifications
Classifications strengthen the contextual information that patterns provide by identifying that the underlying values belong to particular categories. Each rule set contains its own set of categories, which are called classes.
In DataStage®, records are represented as patterns. In the same way that a record consists of one or more values, a pattern consists of one or more abstract characters, each of which represents a class. For example, a set of address data might include the record 123 N CHERRY HILL ROAD, which is represented by the pattern ^D++T. The following table shows the contextual information that each class in the pattern ^D++T provides.
Value in input record | Class label | Contextual information that the class provides |
---|---|---|
123 | ^ | Value that includes only numbers |
N | D | Street direction |
CHERRY | + | Value that includes only letters |
HILL | + | Value that includes only letters |
ROAD | T | Street type |
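The mapping in the table can be sketched in a few lines of Python. This is only an illustration of the idea, not how DataStage implements pattern generation: the `CUSTOM_CLASSES` dictionary and the `classify` and `pattern_for` functions are hypothetical names, the list of directions and street types is a small sample, and only the two default classes shown in the table (`^` and `+`) are handled.

```python
# Hypothetical custom classes for an address rule set (illustrative sample only).
CUSTOM_CLASSES = {
    "N": "D", "S": "D", "NW": "D",          # street directions
    "AVE": "T", "RD": "T", "ROAD": "T",     # street types
}


def classify(value: str) -> str:
    """Return a one-character class label for a single value."""
    token = value.upper()
    # A custom class gives stronger context, so check it first.
    if token in CUSTOM_CLASSES:
        return CUSTOM_CLASSES[token]
    # Fall back to the two default classes from the table.
    if token.isdigit():
        return "^"   # value that includes only numbers
    if token.isalpha():
        return "+"   # value that includes only letters
    # Other default classes are outside the scope of this sketch.
    raise ValueError(f"no class defined in this sketch for {value!r}")


def pattern_for(record: str) -> str:
    """Build a pattern by classifying each whitespace-separated value."""
    return "".join(classify(value) for value in record.split())


print(pattern_for("123 N CHERRY HILL ROAD"))  # ^D++T
```

Checking custom classes before default classes reflects the point of the table: ROAD consists only of letters, but the class T says more about it than + would.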
Patterns contain the following types of classes:
- Default classes provide basic information about the type of the value, such as whether the value consists of alphabetic characters, numeric characters, or some combination of both.
- Custom classes provide stronger contextual information about the type of the value. For example, in a data set that contains retail product information, custom classes might indicate whether an alphabetic value is the name of a product or the name of a brand. The one-character label for a custom class can be any letter in the Latin alphabet or 0, which indicates a null class.
Rule sets use classifications to identify and classify key values. For example, a rule set for address data might use classifications to categorize values that are street types (AVE, ST, RD) or directions (N, NW, S) by providing the following information:
- Standard abbreviations for each word; for example, HWY for Highway
- A list of one-character labels that represent classes and that are assigned to individual data elements during processing
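As a rough sketch of the two kinds of information listed above, each classification can be thought of as an entry that pairs a value with its standard abbreviation and its class label. The `Classification` type, the `CLASSIFICATIONS` dictionary, and the `lookup` function are hypothetical and exist only for illustration; they do not reproduce the actual format of the classifications table.

```python
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    standard: str  # standard abbreviation for the word
    label: str     # one-character class label


# Hypothetical entries for an address rule set (illustrative sample only).
CLASSIFICATIONS = {
    "HIGHWAY": Classification("HWY", "T"),
    "HWY": Classification("HWY", "T"),
    "AVENUE": Classification("AVE", "T"),
    "AVE": Classification("AVE", "T"),
    "NORTH": Classification("N", "D"),
    "N": Classification("N", "D"),
}


def lookup(value: str) -> Optional[Classification]:
    """Return the classification for a value, or None if it is not listed."""
    return CLASSIFICATIONS.get(value.strip().upper())


print(lookup("Highway"))  # Classification(standard='HWY', label='T')
print(lookup("Cherry"))   # None
```

A value that has no entry, such as CHERRY, would be left to the default classes.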
Classifications are added and modified by editing the classifications table (previously called the .CLS file), enhancing a rule set in DataStage, or using the user classification override.