Examples (CATREG command)
CATREG VARIABLES = TEST1 TEST3 TEST2 TEST4 TEST5 TEST6
TEST7 TO TEST9 STATUS01 STATUS02
/ANALYSIS TEST4 (LEVEL=NUME)
WITH TEST1 TO TEST2 (LEVEL=SPORD DEGREE=1 INKNOT=3)
TEST5 TEST7 (LEVEL=SPNOM) TEST8 (LEVEL=ORDI)
STATUS01 STATUS02 (LEVEL=NOMI)
/DISCRETIZATION = TEST1(GROUPING NCAT=5 DISTR=UNIFORM)
TEST5(GROUPING) TEST7(MULTIPLYING)
/INITIAL = RANDOM
/MAXITER = 100
/CRITITER = .000001
/RESAMPLE BOOTSTRAP (100)
/MISSING = MODEIMPU
/PRINT = R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2 STATUS01 STATUS02)
/PLOT = TRANS (TEST2 TO TEST7 TEST4)
/SAVE
/OUTFILE = '/data/qdata.sav'.
- VARIABLES defines variables. The keyword TO refers to the order of the variables in the working data file.
- The ANALYSIS subcommand defines the variables used in the analysis. TEST4 is the dependent variable, with optimal scaling level numerical, and TEST1, TEST2, TEST3, TEST5, TEST7, TEST8, STATUS01, and STATUS02 are the independent variables. (Here the keyword TO refers to the order of the variables in the VARIABLES subcommand, so TEST1 TO TEST2 includes TEST3.) The optimal scaling level for TEST1, TEST2, and TEST3 is spline ordinal; for TEST5 and TEST7, spline nominal; for TEST8, ordinal; and for STATUS01 and STATUS02, nominal. The splines for TEST1, TEST2, and TEST3 have degree 1 and three interior knots, and the splines for TEST5 and TEST7 have degree 2 and two interior knots (the defaults, because none are specified).
- DISCRETIZATION specifies that TEST5 and TEST7, which are fractional-value variables, are discretized: TEST5 by recoding into seven categories with a normal distribution (the defaults, because none are specified) and TEST7 by “multiplying.” TEST1, which is a categorical variable, is recoded into five categories with a close-to-uniform distribution.
- Because there are nominal variables, a random initial solution is requested by the INITIAL subcommand.
- MAXITER specifies the maximum number of iterations to be 100. This is the default, so this subcommand could be omitted here.
- CRITITER sets the convergence criterion to a value smaller than the default value.
- To include cases with missing values, the MISSING subcommand specifies that for each variable, missing values are replaced with the most frequent category (the mode).
- RESAMPLE specifies the .632 bootstrap for estimation of the prediction error using 100 bootstrap samples (instead of the default of 50).
- PRINT specifies the correlations, the coefficients, the descriptive statistics for all variables, the ANOVA table, the category quantifications for variables TEST1, TEST2, TEST3, STATUS01, and STATUS02, and the transformed data list of all cases.
- PLOT is used to request quantification plots for the variables TEST2, TEST5, TEST7, and TEST4.
- The SAVE subcommand adds the transformed variables to the working data file. The names of these new variables are TRANS1_1, ..., TRANS9_1.
- The OUTFILE subcommand writes the transformed data to a data file called qdata.sav in the directory /data.
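The way TO resolves against the order of the VARIABLES list can be illustrated outside SPSS. The following Python sketch is not part of CATREG: the helper name expand_to is hypothetical, and it assumes that TO simply selects the contiguous run of variables between its two endpoints in the declared order.

```python
# Hypothetical helper (not an SPSS API): expands "A TO B" using declared order.
def expand_to(declared, first, last):
    """Return the variables from first through last in declared order."""
    start = declared.index(first)
    stop = declared.index(last)
    if stop < start:
        raise ValueError(f"{last} precedes {first} in the declared order")
    return declared[start:stop + 1]

# Order taken from the VARIABLES subcommand above, with TEST7 TO TEST9 written
# out (assumption: TEST8 lies between TEST7 and TEST9 in the working data file).
declared = ["TEST1", "TEST3", "TEST2", "TEST4", "TEST5", "TEST6",
            "TEST7", "TEST8", "TEST9", "STATUS01", "STATUS02"]

print(expand_to(declared, "TEST1", "TEST2"))  # ['TEST1', 'TEST3', 'TEST2']
```

Because TEST3 sits between TEST1 and TEST2 in the VARIABLES list, it receives the spline ordinal level even though it is not named in the ANALYSIS subcommand.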
Example: Multiple Systematic Starts
CATREG
...
/INITIAL MULTISTART(ALL)('c:\data\startsigns.sav')
/PRINT = R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2 STATUS01 STATUS02)
/PLOT = TRANS (TEST2 TO TEST7 TEST4)
/SAVE TRDATA PRED RES
/OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').
- Because the ordinal and spline ordinal scaling levels are specified for some variables, there is a chance of obtaining a suboptimal solution when the numerical or random initial solution is used. To ensure that the optimal solution is obtained, all multiple systematic starts are used. Using all systematic starts is feasible here because only 3 variables have (spline) ordinal scaling, so the number of starts is 2 to the power of 3, which is 8. With a larger number of variables with (spline) ordinal scaling level, a reduced number of starts is recommended, which can be requested by specifying /INITIAL MULTISTART(value).
- The specifications at the PRINT, PLOT, SAVE, and OUTFILE subcommands will be applied to the optimal solution.
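The count of 2 to the power of 3 can be made concrete by listing one sign per (spline) ordinal variable. This Python sketch is for illustration only. It assumes that each systematic start corresponds to one sign pattern, as the count and the startsigns.sav file name suggest. The variable names are placeholders, and the order in which CATREG numbers its starts is not specified here.

```python
from itertools import product

# Placeholder names for the 3 variables with (spline) ordinal scaling.
ordinal_vars = ["X1", "X2", "X3"]

# Assumption: each systematic start assigns a sign (+1 or -1) to every such variable.
sign_patterns = list(product((+1, -1), repeat=len(ordinal_vars)))

print(len(sign_patterns))  # 8
for pattern in sign_patterns:
    print(dict(zip(ordinal_vars, pattern)))
```

The count doubles with every additional variable (10 such variables would already give 1024 patterns), which is why a reduced number of starts is recommended for larger models.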
Example: Fixing Initial Signs for Regression Coefficients
CATREG
...
/INITIAL FIXSIGNS (63) ('c:\data\startsigns.sav')
- The INITIAL subcommand specifies using a specific set of fixed signs for the regression coefficients. The signs are in the file startsigns.sav in the directory c:\data. This file was created by a previous run of CATREG with the keyword MULTISTART at the INITIAL subcommand (see the previous example). The signs of start number 63 are used.
Example: Elastic Net Regularization
CATREG
...
/REGULARIZATION ENET (.5 2.5 .25) (.01 3.8 .05)('c:\data\regu_enet.sav')
/PRINT = REGU R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2)
/PLOT = REGU (.75 1.5) TRANS (TEST2 TO TEST7 TEST4)
/SAVE TRDATA PRED RES
/OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').
- REGULARIZATION specifies application of Elastic Net regularization, with a start value of 0.01 for the Lasso penalty, a stop value of 3.8, and an increment of 0.05, resulting in 76 regularized models with Lasso penalty values 0.01, 0.06, ..., 3.76. To each of these 76 Lasso models, 9 Ridge penalties are applied (0.5, 0.75, ..., 2.5), resulting in 76 × 9 = 684 Elastic Net models.
- PRINT specifies displaying a table with the penalty values, R-squared, and the regression coefficients for each regularized model. The contents of this table are written to a data file called regu_enet.sav in the directory c:\data.
- The PLOT subcommand requests two Elastic Net plots: a Lasso plot with a fixed Ridge penalty of 0.75 and a Lasso plot with a fixed Ridge penalty of 1.50. Keywords other than REGU at the PLOT subcommand are ignored.
- Specifications other than REGU at the PRINT and PLOT subcommands, the SAVE subcommand, and the TRDATA keyword at the OUTFILE subcommand are ignored.
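The penalty grids implied by the start, stop, and increment values can be enumerated directly, which is a quick way to check how many models a specification produces. The Python sketch below is illustrative and not part of CATREG. The function name penalty_grid is hypothetical, and it assumes the grid consists of start, start + increment, and so on, for as long as the value does not exceed the stop value.

```python
from decimal import Decimal

# Hypothetical helper (not an SPSS API). Decimal avoids floating-point drift.
def penalty_grid(start, stop, step):
    """Return start, start + step, ... for all values not exceeding stop."""
    start, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    if step < 0:
        raise ValueError("increment must not be negative")
    if step == 0:
        return [start]  # increment zero: a single penalty value
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value += step
    return values

ridge = penalty_grid("0.5", "2.5", "0.25")
lasso = penalty_grid("0.01", "3.8", "0.05")

print(len(lasso), lasso[0], lasso[1], lasso[-1])
print(len(ridge), ridge[0], ridge[1], ridge[-1])
print("Elastic Net models:", len(lasso) * len(ridge))
```

For the Lasso grid this yields 76 values ending at 3.76, because the next value, 3.81, would exceed the stop value of 3.8.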
Example: Elastic Net Regularization with Crossvalidation Resampling
CATREG
...
/REGULARIZATION ENET (.5 2.5 .25)(.01 3.8 .05)('c:\data\regu_enet.sav')
/RESAMPLE CROSSVAL (5)
/PRINT = REGU R COEFF DESCRIP ANOVA
/PLOT = REGU (.75 1.5) TRANS (TEST2 TO TEST7 TEST4)
/SAVE TRDATA PRED RES
/OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').
- REGULARIZATION is the same as in the previous example.
- The RESAMPLE subcommand specifies 5-fold cross-validation to estimate the prediction error for each of the Elastic Net models.
- PRINT specifies displaying a table with the penalty values, R-squared, the regression coefficients, and the estimated prediction error for each regularized model. The contents of this table are written to a data file called regu_enet.sav in the directory c:\data.
- The specification at the PLOT subcommand results in the same plots as in the previous example.
- The other specifications at the PRINT and PLOT subcommands, and the SAVE and OUTFILE specifications, will be applied to the model with the lowest prediction error.
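The two ideas in this example, splitting the cases into 5 folds and then keeping the model with the lowest estimated prediction error, can be sketched in a few lines of Python. This is a generic illustration and not CATREG's implementation: how CATREG assigns cases to folds is not described here, the function names are hypothetical, and the error values are made up purely to show the selection step.

```python
import random

# Hypothetical helpers (not SPSS APIs).
def kfold_indices(n_cases, k, seed=0):
    """Split case indices 0..n_cases-1 into k folds of near-equal size."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def lowest_error_model(errors):
    """Return the (ridge, lasso) key with the smallest estimated error."""
    return min(errors, key=errors.get)

# Each fold serves once as the test set; the other folds form the training set.
folds = kfold_indices(n_cases=23, k=5)
print([len(fold) for fold in folds])  # [5, 5, 5, 4, 4]

# Made-up prediction errors keyed by (ridge, lasso), for illustration only.
errors = {(0.75, 0.41): 0.62, (1.25, 0.46): 0.58, (1.5, 0.51): 0.60}
print(lowest_error_model(errors))  # (1.25, 0.46)
```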
Example: Obtaining a Specific Elastic Net Model
CATREG
...
/REGULARIZATION ENET (1.25 1.25 0)(.46 .46 0)
/PRINT = R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2 STATUS01 STATUS02)
/PLOT = TRANS (TEST2 TO TEST7 TEST4)
/SAVE TRDATA PRED RES
/OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').
- REGULARIZATION is specified here (stop value equal to start value, increment zero) to obtain output for a specific Elastic Net model: the model with penalty values 1.25 (Ridge) and .46 (Lasso).