PARTITION Subcommand (RBF command)
The PARTITION subcommand specifies the method of partitioning the active dataset into training, testing, and holdout samples. The training sample comprises the data records used to train the neural network. The testing sample is an independent set of data records used to track prediction error during training in order to prevent overtraining. The holdout sample is another independent set of data records used to assess the final neural network.
- The partition can be defined by specifying the ratio of cases randomly assigned to each sample (training, testing, and holdout) or by a variable that assigns each case to the training, testing, or holdout sample.
- If the PARTITION subcommand is not specified, then the default partition randomly assigns 70% of the cases to the training sample, 30% to the testing sample, and 0% to the holdout sample. If you want to specify a different random assignment, then you must specify new values for the TRAINING, TESTING, and HOLDOUT keywords. The value specified on each keyword gives the relative number of cases in the active dataset to assign to each sample. For example, /PARTITION TRAINING = 50 TESTING = 30 HOLDOUT = 20 is equivalent to /PARTITION TRAINING = 5 TESTING = 3 HOLDOUT = 2; both subcommands randomly assign 50% of the cases to the training sample, 30% to the testing sample, and 20% to the holdout sample.
- If you want to be able to reproduce results based on the TRAINING, TESTING, and HOLDOUT keywords later, use the SET command to set the initialization value for the random number generator before running the RBF procedure.
- Be aware of the relationship between rescaling and partitioning. See the topic RESCALE Subcommand (RBF command) for more information.
- All partitioning is performed after listwise deletion of any cases with invalid data for any variable used by the procedure. See MISSING Subcommand (RBF command) for details about valid and invalid data.
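As a minimal sketch of the random-assignment and reproducibility points above, the following syntax sets the random number seed and then requests a 60/20/20 split. The variable names (outcome, region, age, income) and the seed value are hypothetical placeholders, not part of the RBF documentation.

```
* Hypothetical variables: outcome (dependent), region (factor), age and income (covariates).
* Set the random number seed so that the random assignment can be reproduced later.
SET SEED=20240101.

* Relative values 6, 2, and 2 request about 60%, 20%, and 20% of the cases.
RBF outcome BY region WITH age income
  /PARTITION TRAINING=6 TESTING=2 HOLDOUT=2.
```

This sketch assumes the default random number generator is active. If the Mersenne Twister generator is in use, the corresponding setting would be SET RNG=MT MTINDEX=value rather than SET SEED.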
TRAINING Keyword
The TRAINING keyword specifies the relative number of cases in the active dataset to randomly assign to the training sample. The value must be an integer greater than 0. The default (if the PARTITION subcommand is not specified) is 70.
TESTING Keyword
The TESTING keyword specifies the relative number of cases in the active dataset to randomly assign to the testing sample. The value must be an integer greater than 0. The default (if the PARTITION subcommand is not specified) is 30.
HOLDOUT Keyword
The HOLDOUT keyword specifies the relative number of cases in the active dataset to randomly assign to the holdout sample. The value must be an integer greater than 0. The default (if the PARTITION subcommand is not specified) is 0.
VARIABLE Keyword
The VARIABLE keyword specifies a variable that assigns each case in the active dataset to the training, testing, or holdout sample. Cases with a positive value on the variable are assigned to the training sample, cases with a value of 0 to the testing sample, and cases with a negative value to the holdout sample. Cases with a system-missing value are excluded from the analysis. (Any user-missing values for the partition variable are always treated as valid.)
The variable may not be the dependent variable or any variable specified on the command line factor or covariate lists. The variable must be numeric.
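One possible way to build such a partition variable is sketched below. It draws a uniform random number for each case and maps it to a positive, zero, or negative code. The names u, part, outcome, region, age, and income, the seed value, and the 60/20/20 cut points are all assumptions chosen for illustration.

```
* Hypothetical variables: u (random draw), part (partition variable), outcome, region, age, income.
SET SEED=20240101.

* Draw one uniform random number per case.
COMPUTE u = RV.UNIFORM(0,1).

* Code the partition: positive = training, 0 = testing, negative = holdout.
DO IF (u < 0.6).
COMPUTE part = 1.
ELSE IF (u < 0.8).
COMPUTE part = 0.
ELSE.
COMPUTE part = -1.
END IF.
EXECUTE.

* The partition variable is numeric and does not appear in the factor or covariate lists.
RBF outcome BY region WITH age income
  /PARTITION VARIABLE=part.
```

Because the assignment is stored in the dataset, the same split can be reused across runs without relying on the random number seed.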