PARTITION Subcommand (LINEAR_RIDGE extension command)
The PARTITION subcommand creates a holdout (test) subset of the input data for estimating the out-of-sample performance of the specified or chosen model. Note that for cross-validation, the folds (partitions) of the training data are created in Python. The holdout data created by PARTITION is not used in estimation, regardless of the mode in effect.
- The partition can be defined by specifying the ratio of cases randomly assigned to each sample (training and holdout), or by a variable that assigns each case to the training or holdout sample.
- If the PARTITION subcommand is not specified, a holdout sample of approximately 30% of the input data is created.
- A different split can be assigned by specifying other values for the TRAINING and HOLDOUT keywords. If one of these keywords is specified, the other must also be specified.
- The value specified on each keyword gives the relative number of cases in the active dataset to assign to each sample. For example, /PARTITION TRAINING = 50 HOLDOUT = 50 is equivalent to /PARTITION TRAINING = 5 HOLDOUT = 5; both subcommands randomly assign 50% of the cases to the training sample and 50% to the holdout sample.
- To use all data for training and not create a holdout sample, specify a positive value for TRAINING and 0 for HOLDOUT.
- If you want to be able to reproduce results based on the TRAINING and HOLDOUT keywords later, use the SET command to set the initialization value for the random number generator before running the LINEAR_RIDGE extension procedure.
- All partitioning is performed after listwise deletion of any cases with invalid data for any variable used by the procedure.
- It is invalid to specify both TRAINING and VARIABLE.
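As an illustration only, the Python sketch below mimics the ratio-based rules listed above: listwise deletion happens first, TRAINING and HOLDOUT are treated as relative weights, and a seeded random number generator stands in for the initialization value set with SET. The function names `training_fraction` and `split_by_ratio`, the independent per-case random assignment, and the use of `None` for invalid data are assumptions made for this sketch; it is not the procedure's actual implementation.

```python
import random


def training_fraction(training, holdout):
    """Share of cases sent to training; the keyword values are relative weights."""
    if not isinstance(training, int) or training <= 0:
        raise ValueError("TRAINING must be an integer greater than 0")
    if not isinstance(holdout, int) or holdout < 0:
        raise ValueError("HOLDOUT must be an integer greater than or equal to 0")
    return training / (training + holdout)


def split_by_ratio(cases, training=70, holdout=30, seed=None):
    """Randomly assign complete cases to training or holdout (illustrative sketch).

    Assumptions: each case is a tuple of values, None marks invalid data,
    and every case is assigned independently using the training share.
    """
    p_train = training_fraction(training, holdout)
    # Listwise deletion: drop cases with any invalid value before partitioning.
    complete = [case for case in cases if all(v is not None for v in case)]
    # A fixed seed makes the split reproducible, in the spirit of using SET
    # to initialize the random number generator before running the procedure.
    rng = random.Random(seed)
    train, hold = [], []
    for case in complete:
        # rng.random() is in [0, 1), so HOLDOUT = 0 (p_train = 1.0)
        # sends every case to the training sample.
        if rng.random() < p_train:
            train.append(case)
        else:
            hold.append(case)
    return train, hold


# 50/50 and 5/5 describe the same split; the defaults describe a 70/30 split.
print(training_fraction(50, 50), training_fraction(5, 5))  # 0.5 0.5
print(training_fraction(70, 30))  # 0.7
```

Because each case is assigned independently in this sketch, the holdout share for the default weights is only approximately 30% of the complete cases, not exactly 30%.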
TRAINING Keyword
- The TRAINING keyword specifies the relative number of cases in the active dataset to randomly assign to the training sample. The value must be an integer greater than 0. The default (if the PARTITION subcommand is specified without keywords) is 70.

HOLDOUT Keyword
- The HOLDOUT keyword specifies the relative number of cases in the active dataset to randomly assign to the holdout sample. The value must be an integer greater than or equal to 0. The default (if the PARTITION subcommand is specified without keywords) is 30.

VARIABLE Keyword
- The VARIABLE keyword specifies a variable that assigns each case in the active dataset to the training or holdout sample. Cases with a positive value on the variable are assigned to the training sample, and cases with a non-positive value are assigned to the holdout sample. Cases with a system-missing value are excluded from the analysis. (Any user-missing values for the partition variable are always treated as valid.) The variable may not be the dependent variable or any variable specified in the factor or covariate lists on the command. The variable must be numeric.
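A similar hypothetical Python sketch shows how the VARIABLE rules classify cases. The names `split_by_variable` and `check_partition_variable` are illustrative, and representing system-missing as `None` or NaN is an assumption. Because user-missing values are treated as valid, a user-missing code such as -9 is simply a non-positive value and lands in the holdout sample.

```python
import math


def split_by_variable(cases, partition_values):
    """Assign cases using a numeric partition variable (illustrative sketch).

    Assumption: system-missing is represented as None or float("nan").
    """
    if len(cases) != len(partition_values):
        raise ValueError("expected one partition value per case")
    train, hold = [], []
    for case, value in zip(cases, partition_values):
        if value is None or math.isnan(value):
            continue  # system-missing: case is excluded from the analysis
        if value > 0:
            train.append(case)  # positive -> training sample
        else:
            # zero or negative -> holdout; user-missing codes get no special handling
            hold.append(case)
    return train, hold


def check_partition_variable(name, dependent, factors, covariates):
    """Reject a partition variable that is also used in the model (hypothetical helper)."""
    if name == dependent or name in factors or name in covariates:
        raise ValueError(
            f"{name} cannot also be the dependent variable, a factor, or a covariate"
        )


train, hold = split_by_variable(["a", "b", "c", "d"], [1, 0, -9, None])
print(train, hold)  # ['a'] ['b', 'c']
```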