Source data preparation

As you plan your project, you need to prepare the source data to realize the best results.

IBM® InfoSphere® QualityStage® accepts all basic data types (non-vector, non-aggregate) other than binary. Non-basic data types cannot be acted upon in InfoSphere QualityStage except for vectors in the match stages. However, non-basic data types can be passed through the InfoSphere DataStage® and QualityStage stages.

You can use various processing stages to construct some columns before using the columns in a stage that you use for data cleansing. In particular, create overlay column definitions, vector columns, and concatenated columns as explicit columns in the data before you use them.

For example, you do not need to declare the first three characters of a five-character postal code column as a separate additional column. Instead, you can use a Transformer stage to add those characters to the source data as an explicit column before you use the column in a data cleansing stage.
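The idea of materializing a derived column before cleansing can be sketched outside the product. The following Python sketch is not Transformer stage syntax; it only illustrates the transformation on plain dictionaries. The function name add_postal_prefix and the column names postal_code and postal_prefix are hypothetical and chosen for illustration.

```python
def add_postal_prefix(rows, source="postal_code", target="postal_prefix", length=3):
    """Return copies of the rows with an explicit column that holds the
    first `length` characters of the source column.

    Column names are illustrative assumptions, not product defaults.
    """
    prepared = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        value = row.get(source)
        # Keep null as null instead of slicing a missing value.
        new_row[target] = value[:length] if value is not None else None
        prepared.append(new_row)
    return prepared


rows = [{"postal_code": "02139"}, {"postal_code": None}]
prepared = add_postal_prefix(rows)
```

After this step, each record carries the prefix as an ordinary column, so later processing does not need to know how it was derived.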

Note: Be sure to map missing values to null.
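As a rough illustration of mapping missing values to null, the following Python sketch replaces placeholder strings with None. The set of placeholder markers is an assumption made for this example; the markers that actually occur depend on your source data. The name map_missing_to_null is hypothetical.

```python
# Placeholder strings treated as "missing". This set is an assumption for
# illustration; adjust it to the markers found in your own source data.
MISSING_MARKERS = {"", "N/A", "NA", "NULL", "UNKNOWN"}


def map_missing_to_null(row, markers=MISSING_MARKERS):
    """Return a copy of the row with placeholder strings replaced by None."""
    cleaned = {}
    for column, value in row.items():
        # Compare case-insensitively and ignore surrounding whitespace.
        if isinstance(value, str) and value.strip().upper() in markers:
            cleaned[column] = None
        else:
            cleaned[column] = value
    return cleaned


example = map_missing_to_null(
    {"name": " n/a ", "city": "Boston", "zip": "", "age": 0}
)
```

Only string placeholders are converted; legitimate values such as the number 0 are left as they are.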

Make sure that the data to be matched conforms to the following practices:

Use the Standardize stage to standardize data such as individual names or postal addresses. Complex conditions can be handled by creating new columns before matching begins.