After an initial data exploration, you will probably have to select, clean, or
construct data in preparation for analysis. The Field Operations palette contains many nodes useful
for this transformation and preparation.
For example, using a Derive node, you might create an attribute that is not
currently represented in the data. Or you might use a Binning node to recode field values
automatically for targeted analysis. You will probably use a Type node frequently, because it lets
you assign a measurement level, values, and a modeling role to each field in the dataset.
These settings are useful for handling missing values and for downstream modeling.
The Field Operations palette contains the following nodes:
|
The Automated Data Preparation (ADP) node can analyze your data and identify fixes, screen
out fields that are problematic or not likely to be useful, derive new attributes when appropriate,
and improve performance through intelligent screening and sampling techniques. You can use the node
in fully automated fashion, allowing the node to choose and apply fixes, or you can preview the
changes before they are made and accept, reject, or amend them as desired.
|
|
The Type node specifies field metadata and properties. For example, you can specify a
measurement level (continuous, nominal, ordinal, or flag) for each field, set options for handling
missing values and system nulls, set the role of a field for modeling purposes, specify field and
value labels, and specify values for a field.
|
|
The Filter node filters (discards) fields, renames fields, and maps fields from one source
node to another.
|
|
The Derive node modifies data values or creates new fields from one or more existing fields.
It creates fields of type formula, flag, nominal, state, count, and conditional.
|
|
The Ensemble node combines two or more model nuggets to obtain more accurate predictions than
can be gained from any one model.
|
|
The Filler node replaces field values and changes storage. You can choose to replace values
based on a CLEM condition, such as @BLANK(@FIELD). Alternatively, you can choose to replace all blanks or null
values with a specific value. A Filler node is often used together with a Type node to replace
missing values.
|
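The blank-replacement behavior described for the Filler node can be illustrated outside the product. The following is a minimal plain-Python sketch, not SPSS Modeler scripting or CLEM. The function name fill_blanks and the rule for what counts as blank are assumptions made for illustration.

```python
def fill_blanks(records, field, replacement):
    """Return copies of the records with blank values in `field` replaced."""
    def is_blank(value):
        # Assumption: None, empty strings, and whitespace-only strings are blank.
        return value is None or (isinstance(value, str) and value.strip() == "")

    return [
        {**rec, field: replacement} if is_blank(rec.get(field)) else dict(rec)
        for rec in records
    ]


rows = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": ""},
    {"id": 3, "region": None},
]
filled = fill_blanks(rows, "region", "unknown")
```

The input records are left unchanged, so the original and filled values can be compared side by side.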
|
The Anonymize node transforms the way field names and values are represented downstream, thus
disguising the original data. This can be useful if you want to allow other users to build models
using sensitive data, such as customer names or other details.
|
|
The Reclassify node transforms one set of categorical values to another. Reclassification is
useful for collapsing categories or regrouping data for analysis.
|
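Collapsing categories, as described for the Reclassify node, amounts to a lookup from old values to new ones. The sketch below shows the idea in plain Python. The category names and the choice to pass unmapped values through unchanged are assumptions for illustration, not the node's documented defaults.

```python
def reclassify(values, mapping):
    """Map each categorical value to its new category.

    Assumption: values without an entry in `mapping` pass through unchanged.
    """
    return [mapping.get(v, v) for v in values]


# Hypothetical categories: collapse account types into broader groups.
collapse = {"checking": "deposit", "savings": "deposit", "mortgage": "loan"}
accounts = ["checking", "mortgage", "savings", "brokerage"]
regrouped = reclassify(accounts, collapse)
```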
|
The Binning node automatically creates new nominal (set) fields based on the values of one or
more existing continuous (numeric range) fields. For example, you can transform a continuous income
field into a new categorical field containing groups of income as deviations from the mean. Once you
have created bins for the new field, you can generate a Derive node based on the cut points.
|
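The income example above, grouping values as deviations from the mean, can be sketched in plain Python. This is only an illustration of the idea: the band labels, the cut points at one standard deviation, and the use of the population standard deviation are assumptions, not the Binning node's actual settings.

```python
from statistics import mean, pstdev


def bin_by_deviation(values):
    """Label each value by its distance from the mean in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    labels = []
    for v in values:
        # With no spread, every value sits at the mean.
        z = 0.0 if sigma == 0 else (v - mu) / sigma
        if z < -1:
            labels.append("below -1 sd")
        elif z > 1:
            labels.append("above +1 sd")
        else:
            labels.append("within 1 sd")
    return labels


incomes = [10, 48, 50, 52, 90]
income_bins = bin_by_deviation(incomes)
```

The cut points here (mean minus and plus one standard deviation) play the role of the cut points from which a Derive node could be generated.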
|
The Recency, Frequency, Monetary (RFM) Analysis node enables you to determine quantitatively
which customers are likely to be the best ones by examining how recently they last purchased from
you (recency), how often they purchased (frequency), and how much they spent over all transactions
(monetary).
|
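The three measures named above can be computed from a transaction list with a single pass. The following plain-Python sketch derives only the raw recency, frequency, and monetary values per customer. It does not reproduce any scoring or binning the node performs, and the tuple layout, the as_of reference date, and the function name are assumptions for illustration.

```python
from datetime import date


def rfm_summary(transactions, as_of):
    """Summarize (customer_id, purchase_date, amount) tuples per customer."""
    summary = {}
    for customer, when, amount in transactions:
        entry = summary.setdefault(
            customer, {"last": when, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], when)  # most recent purchase
        entry["frequency"] += 1                   # number of purchases
        entry["monetary"] += amount               # total spent
    return {
        customer: {
            "recency_days": (as_of - e["last"]).days,
            "frequency": e["frequency"],
            "monetary": e["monetary"],
        }
        for customer, e in summary.items()
    }


# Hypothetical transactions for two customers.
purchases = [
    ("a", date(2024, 1, 5), 20.0),
    ("a", date(2024, 3, 1), 30.0),
    ("b", date(2024, 2, 10), 100.0),
]
rfm = rfm_summary(purchases, as_of=date(2024, 3, 31))
```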
|
The Partition node generates a partition field, which splits the data into separate subsets
for the training, testing, and validation stages of model building.
|
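A partition field is simply an extra field whose value assigns each record to a subset. The sketch below shows one way to generate such a field in plain Python. The field name partition, the 50/30/20 proportions, and the fixed seed are assumptions for illustration, not the Partition node's defaults.

```python
import random


def add_partition(records, train=0.5, test=0.3, seed=42):
    """Return copies of the records with a 'partition' field added.

    Records not assigned to training or testing fall into validation.
    A fixed seed makes the assignment repeatable.
    """
    rng = random.Random(seed)
    out = []
    for rec in records:
        r = rng.random()
        if r < train:
            label = "training"
        elif r < train + test:
            label = "testing"
        else:
            label = "validation"
        out.append({**rec, "partition": label})
    return out


rows = [{"id": i} for i in range(100)]
partitioned = add_partition(rows)
```

Because the split is random, the subset sizes only approximate the requested proportions.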
|
The Set to Flag node derives multiple flag fields based on the categorical values defined for
one or more nominal fields.
|
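Deriving one flag field per category can be sketched as follows in plain Python. The naming pattern field_value for the new fields and the use of Python booleans for the flag values are assumptions for illustration.

```python
def set_to_flag(records, field):
    """Add one boolean flag field per distinct value of `field`."""
    categories = sorted({rec[field] for rec in records})
    return [
        # Hypothetical naming: "<field>_<value>" for each flag field.
        {**rec, **{f"{field}_{cat}": rec[field] == cat for cat in categories}}
        for rec in records
    ]


colors = [{"color": "red"}, {"color": "blue"}]
flagged = set_to_flag(colors, "color")
```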
|
The Restructure node converts a nominal or flag field into a group of fields that can be
populated with the values of yet another field. For example, given a field named payment
type, with values of credit, cash, and debit, three new fields would be
created (credit, cash, debit), each of which might contain the value of the
actual payment made.
|
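The payment type example above can be sketched in plain Python. The amount field that holds the actual payment, and the use of None for fields that do not apply to a record, are assumptions for illustration.

```python
def restructure(records, category_field, value_field):
    """Create one field per category, filled from `value_field` where it applies."""
    categories = sorted({rec[category_field] for rec in records})
    out = []
    for rec in records:
        new = dict(rec)
        for cat in categories:
            # Only the field matching this record's category gets the value.
            new[cat] = rec[value_field] if rec[category_field] == cat else None
        out.append(new)
    return out


payments = [
    {"payment type": "credit", "amount": 40.0},
    {"payment type": "cash", "amount": 12.5},
    {"payment type": "debit", "amount": 7.0},
]
restructured = restructure(payments, "payment type", "amount")
```

Unlike the flag fields produced by a Set to Flag operation, the new fields carry the value of another field rather than a true/false indicator.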
|
The Transpose node swaps the data in rows and columns so that records become fields and
fields become records.
|
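Swapping rows and columns can be shown with a short plain-Python sketch. It assumes every record has the same number of fields and ignores field names, which the node itself has to handle.

```python
def transpose(rows):
    """Swap rows and columns of a rectangular table given as a list of lists."""
    return [list(column) for column in zip(*rows)]


table = [
    [1, 2, 3],
    [4, 5, 6],
]
swapped = transpose(table)
```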
|
Use the Time Intervals node to specify intervals and derive a new time field
for estimating or forecasting. A full range of time intervals is supported, from seconds to
years.
|
|
The History node creates new fields containing data from fields in previous records. History
nodes are most often used for sequential data, such as time series data. Before using a History
node, you may want to sort the data using a Sort node.
|
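Creating fields from previous records can be sketched as follows in plain Python, applied to records that are sorted first. The day and sales field names, the naming pattern field_k for the new fields, and the use of None where no earlier record exists are assumptions for illustration.

```python
def add_history(records, field, span=1):
    """Add fields holding the value of `field` from the previous `span` records."""
    out = []
    for i, rec in enumerate(records):
        new = dict(rec)
        for k in range(1, span + 1):
            # Hypothetical naming: "<field>_<k>" is the value k records back.
            new[f"{field}_{k}"] = records[i - k][field] if i - k >= 0 else None
        out.append(new)
    return out


daily = [
    {"day": 2, "sales": 12},
    {"day": 1, "sales": 10},
    {"day": 3, "sales": 15},
]
# Sort first so that "previous record" means "previous day".
ordered = sorted(daily, key=lambda rec: rec["day"])
with_history = add_history(ordered, "sales", span=2)
```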
|
The Field Reorder node defines the natural order used to display fields downstream. This
order affects the display of fields in a variety of places, such as tables, lists, and the Field
Chooser. This operation is useful when working with wide datasets to make fields of interest more
visible.
|
|
Within SPSS® Modeler, items such as the Expression Builder spatial functions, the
Spatio-Temporal Prediction (STP) node, and the Map Visualization node use the projected
coordinate system. Use the Reproject node to change the coordinate system of any imported
data that uses a geographic coordinate system.
|
Several of these nodes can be generated directly from the audit report created
by a Data Audit node. See the topic Generating Other Nodes for Data Preparation for more information.