One of the most powerful features in IBM® SPSS® Modeler is the ability to modify data values
and derive new fields from existing data. During lengthy data mining projects, it is common to
perform several derivations, such as extracting a customer ID from a string of Web log data or
creating a customer lifetime value based on transaction and demographic data. All of these
transformations can be performed using a variety of field operations nodes.
Several nodes provide the ability to derive new fields:
- The Derive node modifies data values or creates new fields from one or more existing fields.
It creates fields of type formula, flag, nominal, state, count, and conditional.
- The Reclassify node transforms one set of categorical values to another. Reclassification is
useful for collapsing categories or regrouping data for analysis.
- The Binning node automatically creates new nominal (set) fields based on the values of one or
more existing continuous (numeric range) fields. For example, you can transform a continuous income
field into a new categorical field containing groups of income as deviations from the mean. Once you
have created bins for the new field, you can generate a Derive node based on the cut points.
- The Set to Flag node derives multiple flag fields based on the categorical values defined for
one or more nominal fields.
- The Restructure node converts a nominal or flag field into a group of fields that can be
populated with the values of yet another field. For example, given a field named payment
type, with values of credit, cash, and debit, three new fields would be
created (credit, cash, debit), each of which might contain the value of the
actual payment made.
- The History node creates new fields containing data from fields in previous records. History
nodes are most often used for sequential data, such as time series data. Before using a History
node, you may want to sort the data using a Sort node.
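To make the Restructure and History descriptions more concrete, the following is a minimal sketch in plain Python. It is not Modeler scripting or CLEM; it only imitates the behavior described above on a list of records. The function names, the generated field names (such as payment_type_credit and amount_1), and the use of None for an undefined value are all assumptions made for illustration.

```python
def restructure(records, key_field, value_field, categories):
    """Imitate the Restructure idea: one new field per category.

    The field matching the record's category receives the value of
    value_field; the other new fields are left as None (an assumption).
    """
    out = []
    for rec in records:
        new = dict(rec)
        for cat in categories:
            name = f"{key_field}_{cat}"  # hypothetical naming scheme
            new[name] = rec[value_field] if rec[key_field] == cat else None
        out.append(new)
    return out


def history(records, field, span):
    """Imitate the History idea: copy values from the previous `span` records.

    Records with no earlier record to read from get None (an assumption).
    """
    out = []
    for i, rec in enumerate(records):
        new = dict(rec)
        for k in range(1, span + 1):
            new[f"{field}_{k}"] = records[i - k][field] if i - k >= 0 else None
        out.append(new)
    return out


# Illustrative data, already sorted in the intended order.
payments = [
    {"payment_type": "credit", "amount": 40},
    {"payment_type": "cash", "amount": 12},
    {"payment_type": "debit", "amount": 30},
]

wide = restructure(payments, "payment_type", "amount", ["credit", "cash", "debit"])
lagged = history(payments, "amount", 1)
```

In this sketch the first record of wide carries its amount in payment_type_credit only, and each record of lagged carries the previous record's amount, which is why the input order matters for a History-style derivation.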
Using the Derive node
Using the Derive node, you can create six types of new fields from one or
more existing fields:
- Formula. The new field is the result of an arbitrary
CLEM expression.
- Flag. The new field is a flag, representing a
specified condition.
- Nominal. The new field is nominal, meaning that its
members are a group of specified values.
- State. The new field is one of two states. Switching
between these states is triggered by a specified condition.
- Count. The new field is based on the number of times
that a condition has been true.
- Conditional. The new field is the value of one of two
expressions, depending on the value of a condition.
Each of these derive types has its own set of options in the Derive node
dialog box. These options are discussed in subsequent topics.
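As a rough illustration of how the six derive types differ, the sketch below computes one field of each kind over a small list of records in plain Python. It is not CLEM and does not use the Derive node's actual options; the field names (price, qty, total, and so on), the thresholds, and the "T"/"F" and "On"/"Off" values are assumptions chosen for the example.

```python
def derive_all(records):
    """Add one illustrative field per derive type, processing records in order."""
    out = []
    state = "Off"  # State: assumed initial state
    count = 0      # Count: assumed initial value
    for rec in records:
        new = dict(rec)

        # Formula: the result of an arbitrary expression.
        new["total"] = rec["price"] * rec["qty"]

        # Flag: represents whether a specified condition holds.
        new["is_bulk"] = "T" if rec["qty"] >= 10 else "F"

        # Nominal: one member of a group of specified values.
        if rec["price"] < 5:
            new["price_band"] = "low"
        elif rec["price"] < 20:
            new["price_band"] = "mid"
        else:
            new["price_band"] = "high"

        # State: one of two states; a condition triggers each switch,
        # and the state is carried forward when neither condition holds.
        if rec["qty"] >= 10:
            state = "On"
        elif rec["qty"] == 0:
            state = "Off"
        new["bulk_state"] = state

        # Count: number of times a condition has been true so far.
        if rec["qty"] >= 10:
            count += 1
        new["bulk_count"] = count

        # Conditional: one of two expressions, chosen by a condition.
        new["unit_price"] = rec["price"] - 1 if rec["qty"] >= 10 else rec["price"]

        out.append(new)
    return out


orders = [
    {"price": 4, "qty": 12},
    {"price": 10, "qty": 2},
    {"price": 25, "qty": 0},
    {"price": 8, "qty": 10},
]
derived = derive_all(orders)
```

Formula, flag, nominal, and conditional fields depend only on the current record, whereas the state and count fields in this sketch carry information forward from earlier records.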
Note that use of the following may change row order:
- Executing in a database via SQL pushback
- Executing via remote IBM SPSS Analytic Server
- Using functions that run in embedded IBM SPSS Analytic Server
- Deriving a list (for example, see Deriving a list or geospatial field)
- Calling any of the spatial functions
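Row order matters most for derivations that depend on earlier records. The plain-Python sketch below (not Modeler code) shows a running count giving different results when the same rows arrive in a different order, and one possible safeguard: carrying an explicit sequence key and sorting on it before the order-dependent step. The seq field and the reordering are hypothetical stand-ins.

```python
def running_count(records, condition):
    """Return, for each record in order, how many records so far met the condition."""
    count = 0
    result = []
    for rec in records:
        if condition(rec):
            count += 1
        result.append(count)
    return result


def is_bulk(rec):
    # Hypothetical condition used only for this example.
    return rec["qty"] >= 10


rows = [
    {"seq": 1, "qty": 12},
    {"seq": 2, "qty": 3},
    {"seq": 3, "qty": 15},
]

# Stand-in for an execution path that returns the same rows in another order.
reordered = [rows[1], rows[2], rows[0]]

in_order = running_count(rows, is_bulk)
out_of_order = running_count(reordered, is_bulk)

# Sorting on the explicit key restores the original result.
restored = running_count(sorted(reordered, key=lambda r: r["seq"]), is_bulk)
```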