Derive node

One of the most powerful features in IBM® SPSS® Modeler is the ability to modify data values and derive new fields from existing data. During lengthy data mining projects, it is common to perform several derivations, such as extracting a customer ID from a string of Web log data or creating a customer lifetime value based on transaction and demographic data. All of these transformations can be performed using a variety of field operations nodes.

Several nodes provide the ability to derive new fields:

  • The Derive node modifies data values or creates new fields from one or more existing fields. It creates fields of type formula, flag, nominal, state, count, and conditional.
  • The Reclassify node transforms one set of categorical values to another. Reclassification is useful for collapsing categories or regrouping data for analysis.
  • The Binning node automatically creates new nominal (set) fields based on the values of one or more existing continuous (numeric range) fields. For example, you can transform a continuous income field into a new categorical field containing groups of income as deviations from the mean. Once you have created bins for the new field, you can generate a Derive node based on the cut points.
  • The Set to Flag node derives multiple flag fields based on the categorical values defined for one or more nominal fields.
  • The Restructure node converts a nominal or flag field into a group of fields that can be populated with the values of yet another field. For example, given a field named payment type, with values of credit, cash, and debit, three new fields would be created (credit, cash, debit), each of which might contain the value of the actual payment made.
  • The History node creates new fields containing data from fields in previous records. History nodes are most often used for sequential data, such as time series data. Before using a History node, you may want to sort the data using a Sort node.
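To make the record-level behavior of these nodes concrete, the following plain Python sketch mimics what Reclassify, Set to Flag, Restructure, and History do to a small table. It is not SPSS Modeler code and does not use the Modeler scripting API; records are modeled as dicts, and the field names (`payment_type`, `amount`), the flag naming scheme, the pass-through of unmapped values in Reclassify, and the use of `None` for missing values are all assumptions made for illustration.

```python
from typing import Any

# A record is modelled as a plain dict (field name -> value); this is an
# illustration only, not the SPSS Modeler scripting API.
Record = dict[str, Any]


def reclassify(records: list[Record], field: str, mapping: dict[Any, Any],
               new_field: str) -> list[Record]:
    """Reclassify: map old category values to new ones.

    Values missing from `mapping` are passed through unchanged (an assumption).
    """
    return [{**r, new_field: mapping.get(r[field], r[field])} for r in records]


def set_to_flag(records: list[Record], field: str, values: list[Any]) -> list[Record]:
    """Set to Flag: add one boolean field per listed category value."""
    return [{**r, **{f"{field}_{v}": r[field] == v for v in values}} for r in records]


def restructure(records: list[Record], key_field: str, value_field: str,
                values: list[Any]) -> list[Record]:
    """Restructure: one new field per category value of `key_field`.

    The new field holds `value_field` when the key matches and None otherwise.
    """
    return [{**r, **{v: (r[value_field] if r[key_field] == v else None) for v in values}}
            for r in records]


def history(records: list[Record], field: str, span: int) -> list[Record]:
    """History: copy `field` from the previous `span` records.

    `field_1` is the value one record back, `field_2` two back, and so on; None
    is used when no such earlier record exists (an assumption of this sketch).
    """
    out = []
    for i, r in enumerate(records):
        new = dict(r)
        for k in range(1, span + 1):
            new[f"{field}_{k}"] = records[i - k][field] if i >= k else None
        out.append(new)
    return out


# Hypothetical sample data based on the payment type example above.
payments = [
    {"payment_type": "credit", "amount": 120.0},
    {"payment_type": "cash", "amount": 35.5},
    {"payment_type": "debit", "amount": 60.0},
]

card = reclassify(payments, "payment_type", {"credit": "card", "debit": "card"}, "method")
flags = set_to_flag(payments, "payment_type", ["credit", "cash", "debit"])
wide = restructure(payments, "payment_type", "amount", ["credit", "cash", "debit"])
lagged = history(payments, "amount", 1)

print([r["method"] for r in card])      # ['card', 'cash', 'card']
print(flags[1]["payment_type_cash"])    # True
print(wide[1])  # {'payment_type': 'cash', 'amount': 35.5, 'credit': None, 'cash': 35.5, 'debit': None}
print([r["amount_1"] for r in lagged])  # [None, 120.0, 35.5]
```

Note that `history` depends on record order, which is why sorting the data first (for example with a Sort node) matters before deriving values from previous records.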

Using the Derive node

Using the Derive node, you can create six types of new fields from one or more existing fields:

  • Formula. The new field is the result of an arbitrary CLEM expression.
  • Flag. The new field is a flag, representing a specified condition.
  • Nominal. The new field is nominal, meaning that its members are a group of specified values.
  • State. The new field is one of two states. Switching between these states is triggered by a specified condition.
  • Count. The new field is based on the number of times that a condition has been true.
  • Conditional. The new field is the value of one of two expressions, depending on the value of a condition.
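The six field types differ mainly in whether the new value depends only on the current record (formula, flag, nominal, conditional) or also on earlier records (state, count). The plain Python sketch below illustrates that difference; it is not how SPSS Modeler implements the Derive node. The parameter names (`on_when`, `off_when`, `increment_when`, `reset_when`, `initial`, `step`), the rule that a nominal field takes the first matching value, and the precedence given to "on" over "off" and to "reset" over "increment" when both conditions hold are assumptions of this sketch. Conditions and expressions stand in for CLEM expressions as ordinary Python callables.

```python
from typing import Any, Callable

# Records are plain dicts; each "derive" function returns the new field's value
# for every record, in order. Illustration only, not the SPSS Modeler API.
Record = dict[str, Any]
Cond = Callable[[Record], bool]
Expr = Callable[[Record], Any]


def derive_formula(records: list[Record], expr: Expr) -> list[Any]:
    """Formula: evaluate an arbitrary expression on each record."""
    return [expr(r) for r in records]


def derive_flag(records: list[Record], cond: Cond) -> list[bool]:
    """Flag: true when the condition holds for the record."""
    return [cond(r) for r in records]


def derive_nominal(records: list[Record], rules: list[tuple[Any, Cond]],
                   default: Any) -> list[Any]:
    """Nominal: value of the first rule whose condition holds (assumed), else default."""
    return [next((value for value, cond in rules if cond(r)), default) for r in records]


def derive_conditional(records: list[Record], cond: Cond, then_expr: Expr,
                       else_expr: Expr) -> list[Any]:
    """Conditional: one of two expressions, chosen per record by `cond`."""
    return [then_expr(r) if cond(r) else else_expr(r) for r in records]


def derive_state(records: list[Record], on_when: Cond, off_when: Cond,
                 initial: bool = False) -> list[bool]:
    """State: keeps its current state until a switching condition fires.

    Unlike the per-record types above, the result depends on earlier records.
    If both conditions hold, "on" wins (an assumption of this sketch).
    """
    state, out = initial, []
    for r in records:
        if on_when(r):
            state = True
        elif off_when(r):
            state = False
        out.append(state)
    return out


def derive_count(records: list[Record], increment_when: Cond, reset_when: Cond,
                 initial: int = 0, step: int = 1) -> list[int]:
    """Count: running tally of how often a condition has been true.

    A reset returns the tally to `initial`; reset takes precedence (assumed).
    """
    count, out = initial, []
    for r in records:
        if reset_when(r):
            count = initial
        elif increment_when(r):
            count += step
        out.append(count)
    return out


# Hypothetical temperature readings, in record order.
temps = [{"temp": t} for t in [18, 25, 31, 29, 22, 16, 33]]

print(derive_formula(temps, lambda r: r["temp"] - 20))  # [-2, 5, 11, 9, 2, -4, 13]
print(derive_flag(temps, lambda r: r["temp"] > 30))
# [False, False, True, False, False, False, True]
print(derive_nominal(temps, [("cold", lambda r: r["temp"] < 20),
                             ("warm", lambda r: r["temp"] <= 28)], "hot"))
# ['cold', 'warm', 'hot', 'hot', 'warm', 'cold', 'hot']
print(derive_conditional(temps, lambda r: r["temp"] > 30,
                         lambda r: 30, lambda r: r["temp"]))  # [18, 25, 30, 29, 22, 16, 30]
print(derive_state(temps, lambda r: r["temp"] > 30, lambda r: r["temp"] < 20))
# [False, False, True, True, True, False, True]
print(derive_count(temps, lambda r: r["temp"] > 24, lambda r: r["temp"] < 17))
# [0, 1, 2, 3, 3, 0, 1]
```

Compare the flag and state outputs for the readings 29 and 22: the flag is false because the current value is not above 30, while the state stays true because no "off" condition has fired since it was switched on.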

Each of these field types has its own set of options in the Derive node dialog box. These options are discussed in subsequent topics.

Note that any of the following may change the row order:
  • Executing in a database via SQL pushback
  • Executing via remote IBM SPSS Analytic Server
  • Using functions that run in embedded IBM SPSS Analytic Server
  • Deriving a list (for example, see Deriving a list or geospatial field)
  • Calling any of the functions described in Spatial functions