PARTITION Subcommand (KNN command)

The PARTITION subcommand specifies the method of partitioning the active dataset into training and holdout samples. The training sample comprises the data records used to train the model. The holdout sample is an independent set of data records used to assess the final model.

  • The partition can be defined by specifying the ratio of cases randomly assigned to each sample (training and holdout), or by a variable that assigns each case to the training or holdout sample.
  • If the PARTITION subcommand is not specified, the default partition randomly assigns 70% of the cases to the training sample and 30% to the holdout sample. To specify a different random assignment, specify new values for the TRAINING and HOLDOUT keywords. Each value gives the relative number of cases in the active dataset to assign to that sample, so only the ratio between the two values matters. For example, /PARTITION TRAINING = 50 HOLDOUT = 50 is equivalent to /PARTITION TRAINING = 5 HOLDOUT = 5; both subcommands randomly assign 50% of the cases to the training sample and 50% to the holdout sample.
  • To reproduce results based on the TRAINING and HOLDOUT keywords later, use the SET command to set the initialization value for the random number generator before running the KNN procedure.
  • See the discussion of the relationship between rescaling and partitioning in the RESCALE subcommand section.
  • All partitioning is performed after listwise deletion of any cases with invalid data for any variable used by the procedure. See the MISSING subcommand for details about valid and invalid data.
  • It is invalid to specify both TRAINING and VARIABLE.
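
The relative-number rule and the role of the random seed described above can be illustrated with a short Python sketch. This is not SPSS syntax and not the KNN procedure's actual sampling code: the function names are hypothetical, and the per-case random draw is an assumption made only for illustration, since the text does not describe how cases are drawn.

```python
import random
from fractions import Fraction


def partition_shares(training=70, holdout=30):
    """Convert relative TRAINING/HOLDOUT values into exact sample proportions."""
    total = training + holdout
    return Fraction(training, total), Fraction(holdout, total)


def random_partition(n_cases, training=70, holdout=30, seed=None):
    """Label each case 'training' or 'holdout' at random.

    Assumption: each case is drawn independently with probability equal to
    the training share. The real procedure may sample differently.
    """
    train_share, _ = partition_shares(training, holdout)
    rng = random.Random(seed)  # a fixed seed makes the assignment repeatable
    threshold = float(train_share)
    return ["training" if rng.random() < threshold else "holdout"
            for _ in range(n_cases)]


# TRAINING = 50 HOLDOUT = 50 and TRAINING = 5 HOLDOUT = 5 give the same shares.
same_shares = partition_shares(50, 50) == partition_shares(5, 5)

# Re-running with the same seed reproduces the same partition.
same_partition = random_partition(20, seed=1) == random_partition(20, seed=1)
```

In this sketch the fixed seed plays the role that the SET command plays before running KNN: without it, each run could produce a different assignment.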

TRAINING Keyword

The TRAINING keyword specifies the relative number of cases in the active dataset to randomly assign to the training sample.

The value must be an integer greater than 0. The default (if the PARTITION subcommand is not specified) is 70.

HOLDOUT Keyword

The HOLDOUT keyword specifies the relative number of cases in the active dataset to randomly assign to the holdout sample.

The value must be an integer greater than or equal to 0. The default (if the PARTITION subcommand is not specified) is 30.
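
The value constraints on the two keywords can be summarized in a small Python sketch. The function name is hypothetical and the checks mirror only what is stated above; how the KNN procedure itself reports an invalid value is not covered here.

```python
def validate_partition_values(training, holdout):
    """Check TRAINING/HOLDOUT values against the documented ranges.

    Returns the resulting training share as a float, or raises ValueError.
    """
    for name, value in (("TRAINING", training), ("HOLDOUT", holdout)):
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
    if training <= 0:
        raise ValueError("TRAINING must be greater than 0")
    if holdout < 0:
        raise ValueError("HOLDOUT must be greater than or equal to 0")
    return training / (training + holdout)
```

Under the relative-number rule, a HOLDOUT value of 0 would give a training share of 1, that is, every case assigned to the training sample and none held out.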

VARIABLE Keyword

The VARIABLE keyword specifies a variable that assigns each case in the active dataset to the training or holdout sample. Cases with a positive value on the variable are assigned to the training sample and cases with a non-positive value to the holdout sample. Cases with a system-missing value are excluded from the analysis. (Any user-missing values for the partition variable are always treated as valid.)
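
The assignment rule for a partition variable can be expressed as a minimal Python sketch. The function name is hypothetical, and representing a system-missing value as None or NaN is an assumption of this illustration, not part of the SPSS data model.

```python
import math


def assign_by_variable(value):
    """Map a partition-variable value to a sample name.

    Returns 'training', 'holdout', or None when the case is excluded.
    Assumption: system-missing is represented here as None or NaN.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None  # system-missing: case is excluded from the analysis
    # Positive values go to training; zero and negative values go to holdout.
    return "training" if value > 0 else "holdout"


samples = [assign_by_variable(v) for v in (1, 0, -3, 2.5, None)]
```

User-missing codes need no special handling in this sketch because, as noted above, they are treated as valid and therefore follow the same positive or non-positive rule.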

The variable must be numeric. It may not be a dependent variable or any variable specified on the factor or covariate lists of the command.