The Radial Basis Function (RBF) procedure produces a predictive model for one or more dependent (target) variables based on values of predictor variables.

Example. A telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into four groups. An RBF network using demographic data to predict group membership allows the company to customize offers for individual prospective customers.

Data Considerations

Dependent variables. The dependent variables can be:

• Nominal. A variable can be treated as nominal when its values represent categories with no intrinsic ranking (for example, the department of the company in which an employee works). Examples of nominal variables include region, postal code, and religious affiliation.
• Ordinal. A variable can be treated as ordinal when its values represent categories with some intrinsic ranking (for example, levels of service satisfaction from highly dissatisfied to highly satisfied). Examples of ordinal variables include attitude scores representing degree of satisfaction or confidence and preference rating scores.
• Scale. A variable can be treated as scale (continuous) when its values represent ordered categories with a meaningful metric, so that distance comparisons between values are appropriate. Examples of scale variables include age in years and income in thousands of dollars.

The procedure assumes that the appropriate measurement level has been assigned to all dependent variables, although you can temporarily change the measurement level for a variable by right-clicking the variable in the source variable list and selecting a measurement level from the pop-up menu. To permanently change the level of measurement for a variable, see Variable Measurement Level.

An icon next to each variable in the variable list identifies the measurement level and data type:

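Table 1. Measurement level icons
Data types (columns): Numeric, String, Date, Time
Measurement levels (rows): Scale (Continuous), Ordinal, Nominal
Scale (Continuous) is not available (n/a) for string variables.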

Predictor variables. Predictors can be specified as factors (categorical) or covariates (scale).

Categorical variable coding. The procedure temporarily recodes categorical predictors and dependent variables using one-of-c coding for the duration of the procedure. If there are c categories of a variable, then the variable is stored as c vectors, with the first category denoted (1,0,...,0), the next category (0,1,0,...,0), ..., and the final category (0,0,...,0,1).
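The one-of-c scheme described above can be illustrated with a short Python sketch. This is not the procedure's own code; the function name one_of_c and the category labels are hypothetical and chosen only for illustration.

```python
def one_of_c(categories, value):
    """Return the one-of-c indicator vector for `value`.

    `categories` is the ordered list of the c categories; the result has a 1
    in the position of `value` and 0 everywhere else.
    """
    return [1 if value == cat else 0 for cat in categories]


# Hypothetical labels for a four-category dependent variable.
groups = ["basic", "e-service", "plus", "total"]

# Encode each category in turn: first -> (1,0,0,0), ..., last -> (0,0,0,1).
encoded = [one_of_c(groups, g) for g in groups]
for label, vector in zip(groups, encoded):
    print(label, vector)
```

A variable with c categories contributes c input or output units under this coding, which is why a large number of categories increases the number of weights.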

This coding scheme increases the number of synaptic weights and can result in slower training, but more "compact" coding methods usually lead to poorly fit neural networks. If your network training is proceeding very slowly, try reducing the number of categories in your categorical predictors by combining similar categories or dropping cases that have extremely rare categories.

All one-of-c coding is based on the training data, even if a testing or holdout sample is defined (see Partitions (Radial Basis Function)). Thus, if the testing or holdout samples contain cases with predictor categories that are not present in the training data, then those cases are not used by the procedure or in scoring. If the testing or holdout samples contain cases with dependent variable categories that are not present in the training data, then those cases are not used by the procedure but they may be scored.
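As a rough illustration of the rule for predictor categories, the following Python sketch keeps only those testing or holdout cases whose category was seen in the training data. The function usable_cases, the field name region, and the sample rows are assumptions made for this example and do not reflect how the procedure is implemented.

```python
def usable_cases(train_rows, other_rows, predictor):
    """Keep only cases whose predictor category appeared in the training data."""
    seen = {row[predictor] for row in train_rows}
    return [row for row in other_rows if row[predictor] in seen]


# Hypothetical training and holdout samples with one categorical predictor.
train = [{"region": "north"}, {"region": "south"}]
holdout = [{"region": "north"}, {"region": "west"}]

# "west" never occurs in the training data, so that case is dropped.
kept = usable_cases(train, holdout, "region")
print(kept)
```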

Rescaling. Scale dependent variables and covariates are rescaled by default to improve network training. All rescaling is performed based on the training data, even if a testing or holdout sample is defined (see Partitions (Radial Basis Function)). That is, depending on the type of rescaling, the mean, standard deviation, minimum value, or maximum value of a covariate or dependent variable is computed using only the training data. If you specify a variable to define partitions, it is important that these covariates or dependent variables have similar distributions across the training, testing, and holdout samples. Use, for example, the Explore procedure to examine the distributions across partitions.
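The following Python sketch suggests why similar distributions across partitions matter. It assumes normalized rescaling, with the minimum and maximum taken from the training sample only; the function fit_normalizer and the age values are hypothetical.

```python
def fit_normalizer(train_values):
    """Build a rescaling function from training-sample statistics only."""
    lo, hi = min(train_values), max(train_values)
    return lambda x: (x - lo) / (hi - lo)


# Hypothetical covariate values (age in years) in two partitions.
train_age = [20.0, 30.0, 40.0]
holdout_age = [50.0]

normalize = fit_normalizer(train_age)
train_scaled = [normalize(x) for x in train_age]
holdout_scaled = [normalize(x) for x in holdout_age]

print(train_scaled)    # training values lie within [0, 1]
print(holdout_scaled)  # a holdout value above the training maximum exceeds 1
```

Because the holdout value lies outside the training range, its rescaled value falls outside the interval that the network saw during training.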

Frequency weights. Frequency weights are ignored by this procedure.

Replicating results. If you want to exactly replicate your results, use the same initialization value for the random number generator and the same data order, in addition to using the same procedure settings. More details on this issue follow:

• Random number generation. The procedure uses random number generation during random assignment of partitions. To reproduce the same randomized results in the future, use the same initialization value for the random number generator before each run of the Radial Basis Function procedure. See the topic Random Number Generators for more information.
• Case order. Results are also dependent on data order because the two-step cluster algorithm is used to determine the radial basis functions. See the topic TwoStep Cluster Analysis for more information.

To minimize order effects, randomly order the cases. To verify the stability of a given solution, you may want to obtain several different solutions with cases sorted in different random orders. In situations with extremely large file sizes, multiple runs can be performed with a sample of cases sorted in different random orders.
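The idea of a reproducible random case order can be sketched in Python as follows. The sketch uses Python's own random number generator, not the one used by the procedure, and the function shuffled_cases and the seed values are assumptions for illustration.

```python
import random


def shuffled_cases(cases, seed):
    """Return a copy of `cases` in a random order determined by `seed`."""
    rng = random.Random(seed)  # a fixed seed makes the order repeatable
    order = list(cases)
    rng.shuffle(order)
    return order


# Hypothetical case identifiers.
cases = list(range(10))

# Same seed and same starting order: the shuffled order is reproduced exactly.
run_a = shuffled_cases(cases, seed=2024)
run_b = shuffled_cases(cases, seed=2024)

# A different seed gives another random order, useful for checking stability.
other = shuffled_cases(cases, seed=7)
print(run_a == run_b)
```

Fitting the network on several such orderings and comparing the solutions is one way to judge how sensitive the results are to case order.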

Creating a Radial Basis Function Network

This feature requires the Neural Networks option.

From the menus choose:

Analyze > Neural Networks > Radial Basis Function...

1. Select at least one dependent variable.
2. Select at least one factor or covariate.

Optionally, on the Variables tab you can change the method for rescaling covariates. The choices are:

• Standardized. Subtract the mean and divide by the standard deviation, (x−mean)/s.
• Normalized. Subtract the minimum and divide by the range, (x−min)/(max−min). Normalized values fall between 0 and 1.
• Adjusted Normalized. Adjusted version of subtracting the minimum and dividing by the range, [2*(x−min)/(max−min)]−1. Adjusted normalized values fall between −1 and 1.
• None. No rescaling of covariates.
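The three rescaling formulas above can be written out as a minimal Python sketch. The function and parameter names are hypothetical; in the procedure, the mean, standard deviation, minimum, and maximum come from the training data.

```python
def standardized(x, mean, s):
    """(x - mean) / s"""
    return (x - mean) / s


def normalized(x, lo, hi):
    """(x - min) / (max - min); results fall between 0 and 1."""
    return (x - lo) / (hi - lo)


def adjusted_normalized(x, lo, hi):
    """[2 * (x - min) / (max - min)] - 1; results fall between -1 and 1."""
    return 2 * (x - lo) / (hi - lo) - 1


# Hypothetical covariate with minimum 0, maximum 10, mean 10, and s = 2.
print(standardized(12, 10, 2))
print(normalized(5, 0, 10))
print(adjusted_normalized(0, 0, 10), adjusted_normalized(10, 0, 10))
```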

Fields with unknown measurement level

The Measurement Level alert is displayed when the measurement level for one or more variables (fields) in the dataset is unknown. Since measurement level affects the computation of results for this procedure, all variables must have a defined measurement level.

Scan Data. Reads the data in the active dataset and assigns a default measurement level to any fields with a currently unknown measurement level. If the dataset is large, this may take some time.

Assign Manually. Opens a dialog that lists all fields with an unknown measurement level. You can use this dialog to assign a measurement level to those fields. You can also assign a measurement level in Variable View of the Data Editor.

Since measurement level is important for this procedure, you cannot access the dialog to run this procedure until all fields have a defined measurement level.

This procedure pastes RBF command syntax.