After an initial data exploration, you will probably have to select,
clean, or construct data in preparation for analysis. The Field Operations
palette contains many nodes useful for this transformation and preparation.
For example, using a Derive node, you might create
an attribute that is not currently represented in the data. Or you
might use a Binning node to recode field values automatically for
targeted analysis. You will probably use a Type node frequently: it
lets you assign a measurement level, values, and a modeling role to
each field in the dataset. These settings are useful for handling
missing values and for downstream modeling.
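The Derive example above can be pictured with a small plain-Python analogy. This is only a sketch of the idea of creating an attribute that the data does not yet contain; it is not SPSS Modeler's scripting API, and the field names (income, debt, debt_ratio) and the derive_formula helper are hypothetical.

```python
# Plain-Python analogy only; this is not SPSS Modeler's scripting API.
# Field names (income, debt, debt_ratio) are hypothetical.
records = [
    {"income": 52000.0, "debt": 13000.0},
    {"income": 40000.0, "debt": 10000.0},
]


def derive_formula(rows, new_field, formula):
    """Return copies of rows with new_field computed from existing fields."""
    return [{**row, new_field: formula(row)} for row in rows]


# Add a field that is not present in the source data.
derived = derive_formula(records, "debt_ratio", lambda r: r["debt"] / r["income"])
```

The original records are left unchanged and the new field appears only in the derived copies, which loosely mirrors how a Derive node adds a field for downstream nodes without altering the data source.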
The Field Operations palette contains the following nodes:
|
The Automated Data Preparation (ADP) node can analyze your
data and identify fixes, screen out fields that are problematic or
not likely to be useful, derive new attributes when appropriate, and
improve performance through intelligent screening and sampling techniques.
You can use the node in fully automated fashion, allowing the node
to choose and apply fixes, or you can preview the changes before they
are made and accept, reject, or amend them as desired. |
|
The Type node specifies field metadata and properties. For
example, you can specify a measurement level (continuous, nominal,
ordinal, or flag) for each field, set options for handling missing
values and system nulls, set the role of a field for modeling purposes,
specify field and value labels, and specify values for a field. |
|
The Filter node filters (discards) fields, renames fields,
and maps fields from one source node to another. |
|
The Derive node modifies data values or creates new fields
from one or more existing fields. It creates fields of type formula,
flag, nominal, state, count, and conditional. |
|
The Ensemble node combines two or more model nuggets to obtain
more accurate predictions than can be gained from any one model. |
|
The Filler node replaces field values and changes storage.
You can choose to replace values based on a CLEM condition,
such as @BLANK(@FIELD). Alternatively, you can choose
to replace all blanks or null values with a specific value. A Filler
node is often used together with a Type node to replace missing values. |
|
The Anonymize node transforms the way field names and values
are represented downstream, thus disguising the original data. This
can be useful if you want to allow other users to build models using
sensitive data, such as customer names or other details. |
|
The Reclassify node transforms one set of categorical values
to another. Reclassification is useful for collapsing categories or
regrouping data for analysis. |
|
The Binning node automatically creates new nominal (set) fields
based on the values of one or more existing continuous (numeric range)
fields. For example, you can transform a continuous income field into
a new categorical field containing groups of income as deviations
from the mean. Once you have created bins for the new field, you can
generate a Derive node based on the cut points. |
|
The Recency, Frequency, Monetary (RFM) Analysis node enables
you to determine quantitatively which customers are likely to be the
best ones by examining how recently they last purchased from you (recency),
how often they purchased (frequency), and how much they spent over
all transactions (monetary). |
|
The Partition node generates a partition field, which splits
the data into separate subsets for the training, testing, and validation
stages of model building. |
|
The Set to Flag node derives multiple flag fields based on
the categorical values defined for one or more nominal fields. |
|
The Restructure node converts a nominal or flag field into
a group of fields that can be populated with the values of yet another
field. For example, given a field named payment type, with
values of credit, cash, and debit, three new
fields would be created (credit, cash, debit),
each of which might contain the value of the actual payment made. |
|
The Transpose node swaps the data in rows and columns so that
records become fields and fields become records. |
|
Use the Time Intervals node to specify intervals and derive a new time field for estimating or
forecasting. A full range of time intervals is supported, from seconds to years. |
|
The History node creates new fields containing data from fields
in previous records. History nodes are most often used for sequential
data, such as time series data. Before using a History node, you may
want to sort the data using a Sort node. |
|
The Field Reorder node defines the natural order used to display
fields downstream. This order affects the display of fields in a variety
of places, such as tables, lists, and the Field Chooser. This operation
is useful when working with wide datasets to make fields of interest
more visible. |
|
Within SPSS® Modeler, items such as
the Expression Builder spatial functions, the Spatio-Temporal Prediction (STP) node, and the Map
Visualization node use the projected coordinate system. Use the Reproject node to change the
coordinate system of any imported data that uses a geographic coordinate system.
|
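The difference between the Set to Flag and Restructure descriptions in the list above can be sketched in plain Python using the payment type example. This is an analogy under stated assumptions, not SPSS Modeler's scripting API: the amount field, the generated field names, and the use of None for non-matching records are all choices made for this sketch.

```python
# Plain-Python analogy only; this is not SPSS Modeler's scripting API.
# The "amount" field and the generated field names are hypothetical.
records = [
    {"payment_type": "credit", "amount": 120.0},
    {"payment_type": "cash", "amount": 15.5},
    {"payment_type": "debit", "amount": 42.0},
]
categories = ["credit", "cash", "debit"]


def set_to_flag(rows, field, values):
    """Add one true/false field per category value."""
    return [
        {**row, **{f"{field}_{v}": row[field] == v for v in values}}
        for row in rows
    ]


def restructure(rows, field, values, value_field):
    """Add one field per category value, holding value_field where the
    category matches and None elsewhere (None is this sketch's choice)."""
    return [
        {
            **row,
            **{
                f"{field}_{v}": (row[value_field] if row[field] == v else None)
                for v in values
            },
        }
        for row in rows
    ]


flags = set_to_flag(records, "payment_type", categories)
amounts = restructure(records, "payment_type", categories, "amount")
```

In the first result each new field only records whether the category applies; in the second, the new field carries the value of another field, which is what makes the restructured form convenient for later aggregation.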
Several of these nodes can be generated directly from the audit
report created by a Data Audit node. See the topic Generating Other Nodes for Data Preparation for more information.
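As an illustration of the Binning node's income example (groups of income as deviations from the mean), the following plain-Python sketch assigns each value a bin by how many whole standard deviations it lies from the mean. The labeling scheme, the use of the population standard deviation, the max_dev cap, and the sample incomes are assumptions of this sketch; the node's own binning rules and output values may differ.

```python
# Plain-Python analogy only; this is not SPSS Modeler's scripting API.
import math
import statistics

# Hypothetical sample of a continuous income field.
incomes = [20000.0, 30000.0, 40000.0, 50000.0, 60000.0]


def bin_by_deviation(values, max_dev=3):
    """Label each value with the whole number of standard deviations
    it lies from the mean, clamped to [-max_dev, max_dev]."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    if sd == 0:
        # All values are identical, so every value sits in the central bin.
        return [0 for _ in values]
    bins = []
    for v in values:
        z = math.trunc((v - mean) / sd)  # truncate toward zero
        bins.append(max(-max_dev, min(max_dev, z)))
    return bins


income_bins = bin_by_deviation(incomes)
```

Here a bin of 0 means the value lies within one standard deviation of the mean, and positive or negative bins mark values further above or below it.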