Handling Outliers and Extreme Values
The audit report lists the number of outliers and extremes for each field, based on the detection options specified in the Data Audit node. See the topic Data Audit Quality Tab for more information. You can choose to coerce, discard, or nullify these values for specific fields as appropriate, and then generate a SuperNode to apply the transformations.
- In the Action column, specify handling for outliers and extremes for specific fields as desired.
The following actions are available for handling outliers and extremes:
- Coerce. Replaces outliers and extreme values with the nearest value that would not be considered extreme. For example, if an outlier is defined as any value more than three standard deviations above or below the mean, all outliers are replaced with the highest or lowest value within this range.
- Discard. Discards records with outlying or extreme values for the specified field.
- Nullify. Replaces outliers and extremes with the null or system-missing value.
- Coerce outliers / discard extremes. Coerces outliers and discards only records with extreme values.
- Coerce outliers / nullify extremes. Coerces outliers and nullifies only extreme values.
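The effect of each action can be illustrated outside the product with a short Python sketch. This is not the code that the generated SuperNode runs. It assumes that outliers and extremes are detected by distance from the mean in standard deviations, that coercion clamps a value to the outlier boundary, and that None stands in for the null value. The function name handle_outliers, its action keywords, and the threshold parameters are hypothetical and chosen only for illustration.

```python
import statistics

def handle_outliers(values, action, outlier_sd=3.0, extreme_sd=5.0):
    """Return a new list with the chosen action applied (illustrative only).

    action is one of: "coerce", "discard", "nullify",
    "coerce_discard", "coerce_nullify" (hypothetical keywords).
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return list(values)  # no spread, so nothing can be an outlier

    low = mean - outlier_sd * sd   # assumed coercion boundaries
    high = mean + outlier_sd * sd

    result = []
    for v in values:
        distance = abs(v - mean) / sd
        is_extreme = distance > extreme_sd
        is_outlier = distance > outlier_sd and not is_extreme

        if not (is_outlier or is_extreme):
            result.append(v)           # ordinary value, kept unchanged
            continue

        # Decide what this action does to this kind of value.
        if action == "coerce" or (is_outlier and action.startswith("coerce_")):
            result.append(min(max(v, low), high))  # clamp to the boundary
        elif action in ("nullify", "coerce_nullify"):
            result.append(None)        # replace with the null value
        elif action in ("discard", "coerce_discard"):
            pass                       # drop the record
        else:
            raise ValueError(f"unknown action: {action}")
    return result
```

For example, with values [2, 4, 4, 4, 5, 5, 7, 9] (mean 5, standard deviation 2) and deliberately low thresholds of outlier_sd=1.25 and extreme_sd=1.75, the value 2 is an outlier and 9 is an extreme. The "coerce_discard" action then returns [2.5, 4, 4, 4, 5, 5, 7]: the outlier is moved to the boundary and the extreme record is dropped.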
- To generate the SuperNode, from the menus choose:
The Outlier SuperNode dialog box is displayed.
- Select All fields or Selected fields only, and then click OK to add the generated SuperNode to the stream canvas.
- Attach the SuperNode to the stream to apply the transformations.
Optionally, you can edit the SuperNode and zoom in to browse or make changes. Within the SuperNode, values are discarded, coerced, or nullified using a series of Select and/or Filler nodes as appropriate.