Overview (SELECTPRED command)
The SELECTPRED procedure accepts a large number of predictor variables
and selects a smaller subset for use in predictive modeling procedures.
This procedure uses a univariable method that considers each predictor
in isolation, as opposed to the multivariable method of selecting
predictors that is used by the NAIVEBAYES procedure.
The SELECTPRED procedure supports both categorical and scale dependent
variables and accepts very large sets of predictors. SELECTPRED is
useful when rapid computing is required.
Options
Methods. The SELECTPRED procedure includes a screening step as well as
the univariable predictor selection method.
Missing values. Cases with missing values for the dependent variable or
for all predictors are excluded. The SELECTPRED procedure has an option
for treating user-missing values of categorical variables as valid.
User-missing values of scale variables are always treated as invalid.
Output. SELECTPRED displays pivot table output by default but offers an
option for suppressing most such output. The procedure does not produce
a model. Instead, it optionally writes the lists of selected categorical
and scale predictors to global macro variables. These global macro
variables can be used to represent the subset of selected predictors in
subsequent procedures.
Basic specification
The basic specification is the SELECTPRED command followed by a
dependent variable.
By default, SELECTPRED determines the measurement level of the dependent
variable based on its dictionary setting. All other variables in the
dataset (except the weight variable, if one is defined) are treated as
predictors, with the dictionary setting of each predictor determining
its measurement level. SELECTPRED performs the screening step and then
selects a subset of predictors by using a univariable method.
User-missing values are excluded, and default pivot table output is
displayed.
Syntax rules
- All subcommands are optional.
- Only a single instance of each subcommand is allowed.
- An error occurs if a keyword is specified more than once within a subcommand.
- Parentheses, equals signs, and slashes that are shown in the syntax chart are required.
- The command name, subcommand names, and keywords must be spelled in full.
- Empty subcommands are not honored.
Operations
The SELECTPRED procedure begins by excluding the following types of
cases and predictors.
- Cases with a missing value for the dependent variable.
- Cases with missing values for all predictors.
- Predictors with missing values for all cases.
- Predictors with the same value for all cases.
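The four initial exclusions can be sketched in Python as follows. This is only an illustration of the rules listed above, not the procedure's actual implementation. The function name `initial_exclusions`, the list-of-dictionaries data layout, and the use of `None` for a missing value are all assumptions made for the example. The sketch also assumes that predictors are checked on the cases that survive the case-level exclusions, and that a predictor counts as constant when its non-missing values are all equal.

```python
def initial_exclusions(rows, dependent, predictors):
    """Apply the four initial exclusion rules (hypothetical helper).

    rows       -- list of dicts, one per case; None marks a missing value
    dependent  -- name of the dependent variable
    predictors -- list of predictor names
    Returns (kept_rows, usable_predictors).
    """
    # Drop cases with a missing dependent value, and cases in which
    # every predictor is missing.
    kept = [
        r for r in rows
        if r.get(dependent) is not None
        and any(r.get(p) is not None for p in predictors)
    ]

    # Drop predictors that are missing for all remaining cases (no
    # distinct values) or that take one value only (constant).
    usable = []
    for p in predictors:
        distinct = {r.get(p) for r in kept if r.get(p) is not None}
        if len(distinct) > 1:
            usable.append(p)
    return kept, usable
```

For example, with three predictors where one is entirely missing and one is constant, only the remaining predictor is returned, together with the cases that have a valid dependent value and at least one valid predictor value.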
A further screening step can be used to exclude the following types of predictors:
- Predictors with a large percentage of missing values.
- Categorical predictors with a large percentage of cases representing a single category.
- Categorical predictors with a large percentage of categories containing one case.
- Scale predictors with a small coefficient of variation (standard deviation divided by mean).
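The screening checks above can be illustrated with a short Python sketch. All function names and the threshold values are hypothetical placeholders chosen for the example. They are not taken from this section and should not be read as the procedure's documented defaults. The sketch further assumes that category percentages are computed over non-missing cases, that the sample standard deviation is used, and that the absolute value of the mean is taken so that a negative mean does not produce a negative coefficient of variation.

```python
from collections import Counter
from statistics import mean, stdev


def pct_missing(values):
    """Percentage of cases whose value is missing (None)."""
    return 100.0 * sum(v is None for v in values) / len(values)


def pct_largest_category(values):
    """Percentage of non-missing cases in the most common category."""
    counts = Counter(v for v in values if v is not None)
    return 100.0 * max(counts.values()) / sum(counts.values())


def pct_singleton_categories(values):
    """Percentage of categories that contain exactly one case."""
    counts = Counter(v for v in values if v is not None)
    return 100.0 * sum(c == 1 for c in counts.values()) / len(counts)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean.

    Needs at least two non-missing values (statistics.stdev requirement).
    """
    present = [v for v in values if v is not None]
    m = mean(present)
    return float("inf") if m == 0 else stdev(present) / abs(m)


def passes_screening(values, is_categorical,
                     max_pct_missing=70.0,     # placeholder threshold
                     max_pct_largest=95.0,     # placeholder threshold
                     max_pct_singletons=90.0,  # placeholder threshold
                     min_cv=0.001):            # placeholder threshold
    """Return True if the predictor survives every applicable check."""
    if pct_missing(values) > max_pct_missing:
        return False
    if is_categorical:
        return (pct_largest_category(values) <= max_pct_largest
                and pct_singleton_categories(values) <= max_pct_singletons)
    return coefficient_of_variation(values) >= min_cv
```

Checking the missing-value percentage first means that a predictor with too few valid values is rejected before the category or variation statistics are computed.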
The univariable predictor selection method then ranks each remaining predictor based on its association with the dependent variable and selects the top subset of predictors to use in subsequent modeling procedures. The ranking criterion that is used for any given predictor depends on the measurement level of the dependent variable as well as the measurement level of the predictor.
Categorical dependent variable.
- For categorical predictors, the ranking criterion can be the Pearson chi-square p-value, likelihood ratio chi-square p-value, Cramér's V, or Lambda.
- For scale predictors, the F p-value from a one-way ANOVA is always used as the criterion.
- For mixed predictors, the categorical predictors use the Pearson chi-square p-value or the likelihood ratio chi-square p-value, and the scale predictors use the F p-value from a one-way ANOVA.
Scale dependent variable.
- For categorical predictors, the F p-value from a one-way ANOVA is always used as the criterion.
- For scale predictors, the p-value for a Pearson correlation coefficient is always used.
- For mixed predictors, both types of p-values are used, depending on the predictor type.
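The way the criterion depends on the two measurement levels can be summarized in a small Python sketch. The function names and the string labels for the criteria are hypothetical and exist only for this illustration. The default of `"pearson_chisq_p"` is an arbitrary choice for the example. The sketch only selects and orders criteria. It does not compute the test statistics themselves.

```python
# Hypothetical labels for the measures available when both the
# dependent variable and the predictor are categorical.
CATEGORICAL_MEASURES = {"pearson_chisq_p", "lr_chisq_p", "cramers_v", "lambda"}
LEVELS = {"categorical", "scale"}


def ranking_criterion(dependent_level, predictor_level,
                      categorical_measure="pearson_chisq_p"):
    """Return the label of the criterion used for one predictor."""
    if dependent_level not in LEVELS or predictor_level not in LEVELS:
        raise ValueError("levels must be 'categorical' or 'scale'")
    if categorical_measure not in CATEGORICAL_MEASURES:
        raise ValueError("unknown categorical measure")

    if dependent_level == "categorical" and predictor_level == "categorical":
        return categorical_measure
    if dependent_level == "scale" and predictor_level == "scale":
        return "pearson_correlation_p"
    # One variable is categorical and the other is scale.
    return "anova_f_p"


def select_top(scores, n, smaller_is_better=True):
    """Rank predictors by score and keep the best n.

    scores -- dict mapping predictor name to its criterion value
    For p-values a smaller score ranks higher; for association
    measures such as Cramer's V, pass smaller_is_better=False.
    """
    ordered = sorted(scores, key=scores.get, reverse=not smaller_is_better)
    return ordered[:n]
```

Under these assumptions, `select_top({"a": 0.2, "b": 0.001, "c": 0.04}, 2)` keeps `b` and `c`, the two predictors with the smallest p-values.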
Frequency weight
If a WEIGHT variable is specified, its values are used as frequency
weights by the SELECTPRED procedure.
- Cases with missing weights or weights that are less than 0.5 are not used in the analyses.
- The weight values are rounded to the nearest whole numbers before use. For example, 0.5 is rounded to 1, and 2.4 is rounded to 2.
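The weight handling described above can be sketched in Python. The function name `effective_weight` and the use of `None` for a missing weight are assumptions made for the example. The sketch also assumes that every half value rounds up, which matches the stated example of 0.5 becoming 1 but is not confirmed for other halves such as 2.5. Python's built-in `round()` is avoided because it rounds halves to the nearest even number, so `round(0.5)` gives 0.

```python
import math


def effective_weight(w):
    """Return the integer frequency weight, or None if the case is dropped.

    Cases with a missing weight (None or NaN) or a weight below 0.5
    are not used. Other weights are rounded to the nearest whole number.
    """
    if w is None or math.isnan(w) or w < 0.5:
        return None
    # Round half up (assumption); floor(w + 0.5) maps 0.5 to 1 and 2.4 to 2.
    return math.floor(w + 0.5)
```

A case with weight 0.49 is therefore dropped, while a case with weight 0.5 is kept and counted once.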
Limitations
SPLIT FILE settings are ignored by the SELECTPRED procedure.