Classifications

Classifications strengthen the contextual information that patterns provide by identifying the categories to which the underlying values belong. Each rule set contains its own set of categories, which are called classes.

In DataStage®, records are represented as patterns. In the same way that a record consists of one or more values, patterns consist of one or more abstract characters, each of which represents a class. For example, a set of address data might include the record 123 N CHERRY HILL ROAD, which is represented by the pattern ^D++T. The following table shows the contextual information that each class in the pattern ^D++T provides.

Table 1. Example of a standard address pattern with the contextual information that each class provides

Input value   Class label   Contextual information that the class provides
123           ^             Value that includes only numbers
N             D             Street direction
CHERRY        +             Value that includes only letters
HILL          +             Value that includes only letters
ROAD          T             Street type

Patterns contain the following types of classes:
  • Default classes provide basic information about the type of the value, such as whether the value consists of alphabetic characters, numeric characters, or some combination of both.
  • Custom classes provide stronger contextual information about the type of the value. In a data set that contains retail product information, custom classes might be used to indicate whether an alphabetic value is the name of a product or the name of a brand. The one-character label for custom classes can be any letter in the Latin alphabet or 0, which indicates a null class.
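
The way default and custom classes combine into a pattern can be illustrated with a minimal Python sketch. This is not how DataStage implements classification; the names CUSTOM_CLASSES, class_label, and pattern_for are hypothetical, the custom entries are sample data, and the @ label for mixed values is an assumption that the text above does not confirm.

```python
# Hypothetical custom classes for an address rule set (sample data only).
CUSTOM_CLASSES = {
    "N": "D", "S": "D", "NW": "D",      # street directions
    "ROAD": "T", "RD": "T", "AVE": "T",  # street types
}


def class_label(value: str) -> str:
    """Return a one-character class label for a single value."""
    token = value.upper()
    if token in CUSTOM_CLASSES:
        # A custom class carries stronger context, so it is checked first.
        return CUSTOM_CLASSES[token]
    if token.isdigit():
        return "^"  # default class: value that includes only numbers
    if token.isalpha():
        return "+"  # default class: value that includes only letters
    return "@"      # assumed label for a mix of letters and numbers


def pattern_for(record: str) -> str:
    """Build a pattern by concatenating the class label of each value."""
    return "".join(class_label(value) for value in record.split())


print(pattern_for("123 N CHERRY HILL ROAD"))  # ^D++T
```

In this sketch, CHERRY and HILL fall through to the default class + because no custom class claims them, while N and ROAD receive the custom labels D and T.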

Rule sets use classifications to identify and classify key values. For example, a rule set for address data might use classifications to categorize values that are street types (AVE, ST, RD) or directions (N, NW, S) by providing the following information:

  • Standard abbreviations for each word; for example, HWY for Highway
  • A list of one-character labels that represent classes and that are assigned to individual data elements during processing
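
As a rough illustration of these two pieces of information, the following Python sketch models a classifications table as lines that hold a value, its standard abbreviation, and its class label. The three-column layout, the sample entries, and the names Classification, load_table, and lookup are assumptions made for this example; they are not the product's actual table format or API.

```python
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    standard: str  # standard abbreviation for the value
    label: str     # one-character class label


# Assumed simplified layout: VALUE STANDARD_ABBREVIATION CLASS_LABEL
TABLE_TEXT = """
HIGHWAY   HWY T
HWY       HWY T
AVENUE    AVE T
NORTH     N   D
NORTHWEST NW  D
"""


def load_table(text: str) -> dict[str, Classification]:
    """Parse the sample table text into a lookup keyed by upper-case value."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        value, standard, label = parts
        table[value.upper()] = Classification(standard, label)
    return table


def lookup(value: str, table: dict[str, Classification]) -> Optional[Classification]:
    """Return the classification for a value, or None if it is unclassified."""
    return table.get(value.upper())


table = load_table(TABLE_TEXT)
entry = lookup("Highway", table)
if entry is not None:
    print(entry.standard, entry.label)
```

A value that has no entry in the table, such as CHERRY, returns None here and would keep whatever default class applies to it.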

Classifications are added and modified by editing the classifications table (previously called the .CLS file), enhancing a rule set in DataStage, or using the user classification override.