Overview of Record Operations

Record operations nodes are used to make changes to data at the record level. These operations are important during the Data Understanding and Data Preparation phases of data mining because they allow you to tailor the data to your particular business need.

For example, based on the results of a data audit conducted with the Data Audit node (Output palette), you might decide to merge customer purchase records from the past three months. Using a Merge node, you can merge records based on the values of a key field, such as Customer ID. Or you might discover that a database of website hits, with over one million records, is too large to manage. Using a Sample node, you can select a subset of data for use in modeling.
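
The Merge and Sample steps in this example can be illustrated with a short Python sketch. This is not how SPSS Modeler implements these nodes. The sketch assumes that records are dictionaries keyed by field name, that the key values in the right-hand input are unique, and that an inner join on a hypothetical `CustomerID` field is wanted. The function names and data are illustrative only.

```python
import random

# Hypothetical monthly purchase records; field names are illustrative only.
purchases_jan = [
    {"CustomerID": 1, "jan_total": 120.0},
    {"CustomerID": 2, "jan_total": 80.0},
]
purchases_feb = [
    {"CustomerID": 1, "feb_total": 95.0},
    {"CustomerID": 3, "feb_total": 40.0},
]

def merge_on_key(left, right, key):
    """Inner join: keep keys present in both inputs and combine their fields.

    Assumes each key value appears at most once in `right`.
    """
    right_by_key = {rec[key]: rec for rec in right}
    merged = []
    for rec in left:
        match = right_by_key.get(rec[key])
        if match is not None:
            merged.append({**rec, **match})
    return merged

def simple_random_sample(records, n, seed=None):
    """Select up to n records at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))

print(merge_on_key(purchases_jan, purchases_feb, "CustomerID"))
# [{'CustomerID': 1, 'jan_total': 120.0, 'feb_total': 95.0}]
```

Customers 2 and 3 are dropped because each appears in only one month. Other join types would keep such records.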

The Record Operations palette contains the following nodes:

The Select node selects or discards a subset of records from the data stream based on a specific condition. For example, you might select the records that pertain to a particular sales region.
The Sample node selects a subset of records. A variety of sample types are supported, including stratified, clustered, and nonrandom (structured) samples. Sampling can improve performance, and it can also be used to select groups of related records or transactions for analysis.
The Balance node corrects imbalances in a dataset so that it conforms to a specified condition. A balancing directive adjusts the proportion of records for which a condition is true by a specified factor.
The Aggregate node replaces a sequence of input records with summarized, aggregated output records.
The Recency, Frequency, Monetary (RFM) Aggregate node enables you to take customers' historical transactional data, strip away any unused data, and combine all of their remaining transaction data into a single row that lists when they last dealt with you, how many transactions they have made, and the total monetary value of those transactions.
The Sort node sorts records into ascending or descending order based on the values of one or more fields.
The Merge node takes multiple input records and creates a single output record containing some or all of the input fields. It is useful for merging data from different sources, such as internal customer data and purchased demographic data.
The Append node concatenates sets of records. It is useful for combining datasets with similar structures but different data.
The Distinct node removes duplicate records, either by passing the first distinct record to the data stream or by discarding the first record and passing any duplicates to the data stream instead.
The Streaming Time Series node builds and scores time series models in one step. You can use the node with data in either a local or a distributed environment; in a distributed environment you can harness the power of IBM® SPSS® Analytic Server.
The Spectral Clustering algorithm uses several eigenvectors of a similarity matrix to project data into a space with fewer dimensions. A k-means clustering algorithm is then applied in the new space to separate the data into clusters. The algorithm is reasonably fast for small data sets with many fields, but computationally expensive for large data sets. The Spectral Clustering node in SPSS Modeler exposes the core features and commonly used parameters of the Spectral Clustering library. The node is implemented in Python.
Space-Time-Boxes (STB) are an extension of Geohashed spatial locations. More specifically, an STB is an alphanumeric string that represents a regularly shaped region of space and time.
The Streaming TCM node builds and scores temporal causal models in one step.
The CPLEX Optimization node lets you perform complex mathematical optimization with CPLEX, using an Optimization Programming Language (OPL) model file. This functionality was previously available in the IBM Analytical Decision Management product, which is no longer supported. You can now use the CPLEX Optimization node in SPSS Modeler without IBM Analytical Decision Management.
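
Several of the record-level behaviors described in the list above can be approximated in plain Python. This is a minimal sketch, not the way SPSS Modeler implements these nodes. Records are assumed to be dictionaries, and the function names (`select`, `balance`, `aggregate_sum`, `rfm`, `sort_records`, `append`, `distinct`), the `as_of` reference date, the field names, and the data are all hypothetical. The Balance sketch assumes that a factor above 1 duplicates matching records and a factor below 1 discards them at random.

```python
import random
from collections import defaultdict
from datetime import date

# Hypothetical transaction records (dicts keyed by field name).
transactions = [
    {"CustomerID": 1, "Region": "North", "Date": date(2024, 1, 5), "Amount": 50.0},
    {"CustomerID": 1, "Region": "North", "Date": date(2024, 3, 2), "Amount": 20.0},
    {"CustomerID": 2, "Region": "South", "Date": date(2024, 2, 10), "Amount": 75.0},
    {"CustomerID": 2, "Region": "South", "Date": date(2024, 2, 10), "Amount": 75.0},  # exact duplicate
    {"CustomerID": 3, "Region": "North", "Date": date(2024, 1, 20), "Amount": 10.0},
]

def select(records, condition, discard=False):
    """Select-like: keep records where condition is true (or discard them instead)."""
    return [r for r in records if bool(condition(r)) != discard]

def balance(records, condition, factor, seed=None):
    """Balance-like (assumed behavior): scale the count of matching records by factor.
    Each matching record is kept int(factor) times, plus one extra copy with
    probability equal to the fractional part of factor."""
    rng = random.Random(seed)
    out = []
    for r in records:
        if condition(r):
            copies = int(factor) + (1 if rng.random() < factor - int(factor) else 0)
            out.extend([r] * copies)
        else:
            out.append(r)
    return out

def aggregate_sum(records, key_field, value_field):
    """Aggregate-like: one summary record (sum and count) per key value."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in records:
        totals[r[key_field]] += r[value_field]
        counts[r[key_field]] += 1
    return [{key_field: k, value_field + "_Sum": totals[k], "Record_Count": counts[k]}
            for k in totals]

def rfm(records, id_field, date_field, value_field, as_of):
    """RFM-like: per customer, days since last transaction, count, and total value."""
    acc = {}
    for r in records:
        a = acc.setdefault(r[id_field], {"last": r[date_field], "frequency": 0, "monetary": 0.0})
        a["last"] = max(a["last"], r[date_field])
        a["frequency"] += 1
        a["monetary"] += r[value_field]
    return {cid: {"recency_days": (as_of - a["last"]).days,
                  "frequency": a["frequency"], "monetary": a["monetary"]}
            for cid, a in acc.items()}

def sort_records(records, field, descending=False):
    """Sort-like: order records by one field."""
    return sorted(records, key=lambda r: r[field], reverse=descending)

def append(*datasets):
    """Append-like: concatenate datasets that share a structure."""
    return [r for ds in datasets for r in ds]

def distinct(records, key_fields, duplicates_only=False):
    """Distinct-like: pass the first record per key, or only the later duplicates."""
    seen, out = set(), []
    for r in records:
        k = tuple(r[f] for f in key_fields)
        is_first = k not in seen
        seen.add(k)
        if is_first != duplicates_only:
            out.append(r)
    return out

deduped = distinct(transactions, ["CustomerID", "Date", "Amount"])
north = select(deduped, lambda r: r["Region"] == "North")
print(len(deduped), len(north))  # 4 3
print(rfm(deduped, "CustomerID", "Date", "Amount", as_of=date(2024, 3, 31))[2])
# {'recency_days': 50, 'frequency': 1, 'monetary': 75.0}
```

Each function returns a new list or dictionary rather than modifying its input. This makes it easy to chain the functions like nodes in a stream, for example Distinct, then Select, then RFM.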

Many of the nodes in the Record Operations palette require a CLEM expression. If you are familiar with CLEM, you can type an expression directly in the field. Alternatively, every expression field provides a button that opens the CLEM Expression Builder, which helps you construct such expressions.

Figure 1. Expression Builder button