Standardize stage
Use the Standardize stage to make your source data internally consistent, so each data type has the same kind of content and format.
The Standardize stage builds on the interpretation of the data during the Investigate stage. The Standardize stage reformats data and creates a consistent data presentation with fixed and discrete columns, according to your company requirements.
The Standardize stage uses the data content and placement within the record context to determine the meaning of each data element. Common examples of data elements that can be identified are name, address, city, state, and postal code.
To correctly parse and identify each element or value (previously called a token) and place it in the appropriate column in the output file, the Standardize stage uses rule sets that are designed to comply with standards or conventions. For example, you can standardize names (of individuals and businesses) and addresses to comply with the conventions of a specific country. The rule sets that are used by the Standardize stage can assimilate the data and append additional information that is derived from the input data, such as gender. These rule sets are the same as those used in the Investigate stage.
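The following Python sketch illustrates the general idea of classifying values and placing them in discrete columns. It is not the rule set engine or its syntax: the classification table, the class codes (`^`, `?`, `T`, `D`), the output column names, and the `standardize_address` function are all assumptions made for illustration.

```python
import re

# Hypothetical classification table: value -> (standard form, class code).
# The codes are illustrative: T = street type, D = direction.
CLASSIFICATIONS = {
    "ST": ("ST", "T"),
    "STREET": ("ST", "T"),
    "AVE": ("AVE", "T"),
    "AVENUE": ("AVE", "T"),
    "N": ("N", "D"),
    "NORTH": ("N", "D"),
}


def classify(token):
    """Return (standard_form, class_code) for a single value."""
    if token in CLASSIFICATIONS:
        return CLASSIFICATIONS[token]
    if token.isdigit():
        return token, "^"  # numeric value
    return token, "?"      # unclassified word


def standardize_address(text):
    """Parse a free-form address line into discrete, single-domain columns."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.upper())
    classified = [classify(t) for t in tokens]

    columns = {
        "HouseNumber": "",
        "Direction": "",
        "StreetName": "",
        "StreetType": "",
        # The pattern records the class code of each value, in input order.
        "Pattern": "".join(code for _, code in classified),
    }
    name_parts = []
    for value, code in classified:
        if code == "^" and not columns["HouseNumber"]:
            columns["HouseNumber"] = value
        elif code == "D" and not columns["Direction"]:
            columns["Direction"] = value
        elif code == "T":
            columns["StreetType"] = value
        else:
            name_parts.append(value)
    columns["StreetName"] = " ".join(name_parts)
    return columns


result = standardize_address("123 North Main Street")
```

In this sketch, `result` holds `123` in `HouseNumber`, `N` in `Direction`, `MAIN` in `StreetName`, and `ST` in `StreetType`, with the pattern `^D?T`. Real rule sets are far richer, but the flow is the same: classify each value, then use its class and position to decide which column receives it.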
Standardizing data is important for the following reasons:
- It enables data to be matched effectively.
- It produces a consistent format for the output data.
The Standardize stage parses free-form and fixed-format columns into single-domain columns to create a consistent representation of the input data.
- Free-form columns contain alphanumeric information of any length, up to the maximum column length that is defined for that column.
- Fixed-format columns contain only one specific type of information, such as only numeric, character, or alphanumeric information, and have a specific format.
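The distinction between the two column kinds can be sketched as two simple checks. This is an illustrative assumption, not product behavior: the function names, the 30-character limit, and the five-digit postal code format are hypothetical.

```python
import re


def is_valid_free_form(value, max_length):
    """A free-form value may hold any alphanumeric content up to max_length."""
    return len(value) <= max_length


def is_valid_fixed_format(value, pattern):
    """A fixed-format value must match one specific format in full."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical columns: a free-form address line and a five-digit postal code.
address_ok = is_valid_free_form("123 N MAIN ST APT 4", max_length=30)
postal_ok = is_valid_fixed_format("12345", r"\d{5}")
postal_bad = is_valid_fixed_format("1234A", r"\d{5}")
```

Here the address line passes because only its length is constrained, while the postal code passes only when every character fits the expected format.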
The Standardize stage takes a single input, which can be a link from any database connector supported by DataStage®, a flat file or data set, or any processing stage. It is not necessary to restrict the data to fixed-length columns.
The Standardize stage has only one output link. This link can send standardized output and the raw input to any other stage.
Standardize stage: fast path
- Go to the Stage tab of the Standardize stage properties panel, then open the Standardization processes section.
- Click Add rule to open the standardization rule page.
- Open the Regions section of the page, open a region, then open further subnodes until you can select a rule set. Click Manage to edit rule properties and lookup tables, and to edit classifications, patterns, and overrides.
- Select a rule set, then click Select.
- In the Standardization processes section, under Column name, click Add names +.
- On the Standardization columns page, add new columns or literals or both. Then, click Apply and return.
- Click Save.