Generating Nodes for the Transformations

The output viewer provides a useful starting point for your data preparation. For example, you might want to normalize the field AGE so that you can use a scoring technique (such as logistic regression or discriminant analysis) that assumes a normal distribution. Based on the initial graphs and summary statistics, you might decide to transform the AGE field according to a particular distribution (for example, log). After selecting the preferred distribution, you can generate a Derive node with a standardized transformation to use for scoring.

You can generate the following field operations nodes from the output viewer:

  • Derive
  • Filler

A Derive node creates new fields with the desired transformations, while the Filler node transforms existing fields. The nodes are placed on the canvas in the form of a SuperNode.
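The difference between the two node types can be illustrated outside the product with a minimal Python sketch. It is not the SPSS Modeler implementation: the helper names derive and fill, the new field name INCOME_Log, and the use of the natural logarithm are all assumptions made for illustration.

```python
import math

def derive(record, field, transform, new_name):
    """Return a copy of the record with an extra field holding the transformed value."""
    result = dict(record)
    result[new_name] = transform(record[field])
    return result

def fill(record, field, transform):
    """Return a copy of the record with the existing field replaced by the transformed value."""
    result = dict(record)
    result[field] = transform(record[field])
    return result

row = {"AGE": 40, "INCOME": 1000.0}

# Derive-style: INCOME is kept and a new (hypothetically named) field is added.
derived = derive(row, "INCOME", math.log, "INCOME_Log")

# Filler-style: INCOME itself is overwritten and no new field appears.
filled = fill(row, "INCOME", math.log)
```

In the derive case the original value stays available alongside the transformed one; in the fill case only the transformed value remains.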

If you select the same transformation for several fields, a single Derive or Filler node contains the formulas for every field to which that transformation applies. For example, assume that you have selected the fields and transformations shown in the following table to generate a Derive node.

Table 1. Example of Derive node generation
Field       Transformation
AGE         Current Distribution
INCOME      Log
OPEN_BAL    Inverse
BALANCE     Inverse

The following nodes are contained in the SuperNode:

Figure 1. SuperNode on canvas

In this example, the Derive_Log node has the log formula for the INCOME field, and the Derive_Inverse node has the inverse formulas for the OPEN_BAL and BALANCE fields.
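The grouping in this example, one node per transformation type covering all fields that share it, can be sketched in Python as follows. The function name group_by_transformation is hypothetical, and the sketch only groups the selections from Table 1; it does not model how the product names or builds the nodes.

```python
from collections import defaultdict

# Field/transformation selections from Table 1.
selections = [
    ("AGE", "Current Distribution"),
    ("INCOME", "Log"),
    ("OPEN_BAL", "Inverse"),
    ("BALANCE", "Inverse"),
]

def group_by_transformation(pairs):
    """Collect the fields that share each transformation, preserving selection order."""
    groups = defaultdict(list)
    for field, transformation in pairs:
        groups[transformation].append(field)
    return dict(groups)

groups = group_by_transformation(selections)
```

Here the "Inverse" group holds both OPEN_BAL and BALANCE, which mirrors the Derive_Inverse node carrying the formulas for both fields.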

To Generate a Node

  1. For each field in the output viewer, select the desired transformation.
  2. From the Generate menu, choose Derive Node or Filler Node as desired.

    Doing so displays the Generate Derive Node or Generate Filler Node dialog box, as appropriate.

    Choose Non-standardized transformation or Standardized transformation (z-score) as desired. The second option converts the transformed values to z-scores, which express each value as its distance from the mean of the variable, measured in standard deviations. For example, if you apply the log transformation to the AGE field and choose a standardized transformation, the final equation for the generated node is:

    (log(AGE)-Mean)/SD
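As a rough check on what this equation does, the following Python sketch applies it to a small list of made-up ages. Several details are assumptions not stated here: that log is the natural logarithm, that Mean and SD are computed from the log-transformed values, and that SD is the sample standard deviation. The function name standardized_log and the sample data are hypothetical.

```python
import math
import statistics

def standardized_log(values):
    """Apply (log(x) - Mean) / SD, with Mean and SD taken over the logged values."""
    logged = [math.log(v) for v in values]   # assumption: natural log
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)            # assumption: sample standard deviation
    return [(x - mean) / sd for x in logged]

ages = [23, 35, 41, 58, 64]                  # made-up example data
z_scores = standardized_log(ages)
```

Under these assumptions, the resulting values have a mean of 0 and a standard deviation of 1, which is what makes the standardized field convenient for scoring.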

Once a node is generated and appears on the stream canvas:

  1. Attach it to the stream.
  2. For a SuperNode, optionally double-click the node to view its contents.
  3. Optionally double-click a Derive or Filler node to modify options for the selected field(s).