MODEL Subcommand (KNN command)
The MODEL subcommand specifies the nearest neighbor “model”. By default, the procedure builds a model based on 3 nearest neighbors, using all features specified on the variables list and Euclidean distance as the measure of “nearness”.
METRIC Keyword
The METRIC keyword allows you to specify the distance metric used to measure the similarity of cases.
EUCLID. Euclidean distance. This is the default specification for METRIC. The distance between two cases, x and y, is the square root of the sum, over all dimensions, of the squared differences between the values for the cases.
CITYBLOCK. City-block or Manhattan distance. The distance between two cases is the sum, over all dimensions, of the absolute differences between the values for the cases.
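The two metrics can be illustrated with a minimal Python sketch. The function names `euclid` and `cityblock` are hypothetical and chosen only to mirror the keywords; the sketch works on raw values and ignores any rescaling or feature weighting the procedure may apply.

```python
import math

def euclid(x, y):
    """Square root of the summed squared differences over all dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cityblock(x, y):
    """Sum of the absolute differences over all dimensions."""
    return sum(abs(a - b) for a, b in zip(x, y))

# Two cases measured on three features; the differences are 3, 4, and 0.
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
print(euclid(x, y))     # 5.0
print(cityblock(x, y))  # 7.0
```

For the same pair of cases, the city-block distance is never smaller than the Euclidean distance, so the two metrics can rank neighbors differently.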
NEIGHBORS Keyword
The NEIGHBORS keyword indicates whether to use automatic selection of the number of nearest neighbors.
If no dependent variable is specified, then any specification other than NEIGHBORS=FIXED is ignored with a warning.
FIXED. Use a fixed number of neighbors. This is the default. The FIXED keyword may be followed by parentheses containing the K option, which specifies the number of neighbors. K must be a positive integer. The default value is 3. If NEIGHBORS=FIXED is specified, then any CROSSVALIDATION subcommand specifications are ignored with a warning.
AUTO. Use automatic selection to determine the “best” number of neighbors. The AUTO keyword may be followed by parentheses containing the KMIN and KMAX options, which specify the minimum and maximum number of neighbors, respectively, that automatic selection considers in determining the “best” number of neighbors. It is invalid to specify only one option; you must specify both or neither. The options may be specified in any order and must be separated by a comma or space character. Both values must be integers greater than 0, with KMIN less than KMAX. The defaults are KMIN=3 and KMAX=5. If NEIGHBORS=AUTO and FEATURES=ALL are specified, then V-fold cross-validation is used to select the “best” number of neighbors. The CROSSVALIDATION subcommand specifies the settings for V-fold cross-validation.
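The idea of choosing k by V-fold cross-validation over the range KMIN to KMAX can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure's actual implementation: the names `knn_predict`, `cv_error`, and `select_k` are hypothetical, folds are assigned by case index, the target is categorical, and ties between values of k are broken in favor of the smaller k.

```python
import math

def knn_predict(train, query, k):
    """Majority vote among the k nearest training cases (Euclidean distance)."""
    nearest = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    # Assumption: vote ties go to the smallest label, for determinism.
    return min(votes, key=lambda lab: (-votes[lab], lab))

def cv_error(data, k, folds):
    """Pooled misclassification rate over V folds (fold = case index mod V)."""
    errors = 0
    for v in range(folds):
        train = [c for i, c in enumerate(data) if i % folds != v]
        test = [c for i, c in enumerate(data) if i % folds == v]
        errors += sum(knn_predict(train, x, k) != y for x, y in test)
    return errors / len(data)

def select_k(data, kmin=3, kmax=5, folds=5):
    """Return the k in [kmin, kmax] with the lowest cross-validated error."""
    if not (isinstance(kmin, int) and isinstance(kmax, int) and 0 < kmin < kmax):
        raise ValueError("KMIN and KMAX must be integers with 0 < KMIN < KMAX")
    # Assumption: ties in error are broken by choosing the smaller k.
    return min(range(kmin, kmax + 1), key=lambda k: (cv_error(data, k, folds), k))

# Two well-separated groups of five cases each.
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"), ((0.5, 0.5), "a"),
        ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"), ((5.5, 5.5), "b")]
print(select_k(data))  # 3
```

In this toy data every k from 3 to 5 classifies all held-out cases correctly, so the tie-breaking rule returns the smallest candidate.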
FEATURES Keyword
The FEATURES keyword indicates whether to use automatic selection of features (predictors).
If no dependent variable is specified, then any specification other than FEATURES=ALL is ignored with a warning.
ALL. Use all predictors on the command line variable lists. This is the default.
AUTO. Use forward selection to determine the “best” feature set. The AUTO keyword may be followed by parentheses containing the FORCE option, which specifies the starting set of predictors that must be included in the model. There is no default variable list on the FORCE option.
- If FEATURES=AUTO is specified, then any CROSSVALIDATION subcommand specifications are ignored with a warning.
- It is invalid for the variable list on the FORCE option to include all possible predictors; that is, there must be at least one predictor available for feature selection if FEATURES=AUTO is specified.
Combined Neighbors and Features Selection
When NEIGHBORS=AUTO and FEATURES=AUTO, the following method is used for combined neighbors and features selection:
- For each k, use the forward selection method for feature selection.
- Select the k, and accompanying feature set, with the lowest error rate (for a categorical dependent variable) or the lowest sum-of-squares error (for a scale dependent variable).
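The two steps above can be sketched in Python. This is a minimal illustration, not the procedure's implementation: `forward_select`, `select_k_and_features`, and `toy_error` are hypothetical names, the error measure is supplied by the caller, and the stopping rule (stop when no remaining feature lowers the error) is an assumption made for the sketch.

```python
def forward_select(candidates, forced, k, error):
    """Greedy forward selection starting from the forced feature set."""
    selected = list(forced)
    remaining = [f for f in candidates if f not in selected]
    best = error(selected, k) if selected else float("inf")
    while remaining:
        # Try each remaining feature and keep the one with the lowest error.
        trial_err, trial = min((error(selected + [f], k), f) for f in remaining)
        if trial_err >= best:  # assumed stopping rule: no further improvement
            break
        selected.append(trial)
        remaining.remove(trial)
        best = trial_err
    return selected, best

def select_k_and_features(candidates, forced, kmin, kmax, error):
    """Run forward selection for each k, then keep the best (k, feature set)."""
    if not set(candidates) - set(forced):
        raise ValueError("FORCE must leave at least one predictor for selection")
    results = []
    for k in range(kmin, kmax + 1):
        features, err = forward_select(candidates, forced, k, error)
        results.append((err, k, features))
    # Assumption: ties in error are broken by choosing the smaller k.
    err, k, features = min(results, key=lambda r: (r[0], r[1]))
    return k, features, err

def toy_error(features, k):
    """Made-up error: 'age' and 'income' help, anything else hurts, k=4 is best."""
    useful = {"age": 0.2, "income": 0.1}
    base = 0.5 - sum(useful.get(f, -0.05) for f in features)
    return round(base + 0.01 * abs(k - 4), 4)

k, features, err = select_k_and_features(["age", "income", "noise"], [], 3, 5, toy_error)
print(k, features, err)  # 4 ['age', 'income'] 0.2
```

Passing the error measure as a function keeps the sketch neutral about whether the target is categorical (error rate) or scale (sum-of-squares error).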