Examples (CATREG command)
CATREG VARIABLES = TEST1 TEST3 TEST2 TEST4 TEST5 TEST6
TEST7 TO TEST9 STATUS01 STATUS02
/ANALYSIS TEST4 (LEVEL=NUME)
WITH TEST1 TO TEST2 (LEVEL=SPORD DEGREE=1 INKNOT=3)
TEST5 TEST7 (LEVEL=SPNOM) TEST8 (LEVEL=ORDI)
STATUS01 STATUS02 (LEVEL=NOMI)
/DISCRETIZATION = TEST1(GROUPING NCAT=5 DISTR=UNIFORM)
TEST5(GROUPING) TEST7(MULTIPLYING)
/INITIAL = RANDOM
/MAXITER = 100
/CRITITER = .000001
/RESAMPLE BOOTSTRAP (100)
/MISSING = MODEIMPU
/PRINT = R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2 STATUS01 STATUS02)
/PLOT = TRANS (TEST2 TO TEST7 TEST4)
/SAVE
/OUTFILE = '/data/qdata.sav'.
- VARIABLES defines variables. The keyword TO refers to the order of the variables in the working data file.
- The ANALYSIS subcommand defines variables used in the analysis. It specifies that TEST4 is the dependent variable, with optimal scaling level numerical, and that the variables TEST1, TEST2, TEST3, TEST5, TEST7, TEST8, STATUS01, and STATUS02 are the independent variables to be used in the analysis. (The keyword TO refers to the order of the variables in the VARIABLES subcommand, so TEST1 TO TEST2 also includes TEST3.) The optimal scaling level for TEST1, TEST2, and TEST3 is spline ordinal; for TEST5 and TEST7, spline nominal; for TEST8, ordinal; and for STATUS01 and STATUS02, nominal. The splines for TEST1, TEST2, and TEST3 have degree 1 and three interior knots, and the splines for TEST5 and TEST7 have degree 2 and two interior knots (the defaults, because they are unspecified).
- DISCRETIZATION specifies that TEST5 and TEST7, which are fractional-value variables, are discretized: TEST5 by recoding into seven categories with a normal distribution (the default, because unspecified) and TEST7 by “multiplying.” TEST1, which is a categorical variable, is recoded into five categories with a close-to-uniform distribution.
- Because there are nominal variables, a random initial solution is requested by the INITIAL subcommand.
- MAXITER specifies the maximum number of iterations to be 100. This is the default, so this subcommand could be omitted here.
- CRITITER sets the convergence criterion to a value smaller than the default value.
- To include cases with missing values, the MISSING subcommand specifies that for each variable, missing values are replaced with the most frequent category (the mode).
- RESAMPLE specifies the .632 bootstrap for estimation of the prediction error using 100 bootstrap samples (instead of the default of 50).
- PRINT specifies the multiple R, R-squared, and adjusted R-squared (keyword R), the coefficients, the descriptive statistics for all variables, the ANOVA table, the category quantifications for variables TEST1, TEST2, TEST3, STATUS01, and STATUS02, and the transformed data list of all cases.
- PLOT is used to request quantification plots for the variables TEST2, TEST5, TEST7, and TEST4.
- The SAVE subcommand adds the transformed variables to the working data file. The names of these new variables are TRANS1_1, ..., TRANS9_1.
- The OUTFILE subcommand writes the transformed data to a data file called qdata.sav in the directory /data.
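The way the TO keyword resolves in this example can be checked mechanically. The Python sketch below uses a hypothetical helper, `expand`, which is not part of SPSS: it expands each `A TO B` range against the order on the VARIABLES subcommand. It assumes that `TEST7 TO TEST9` in the working data file stands for TEST7, TEST8, and TEST9, and that variables named on PLOT but absent from ANALYSIS (here TEST6) are skipped, with duplicates plotted once.

```python
# Variable order on the VARIABLES subcommand. Assumption: TEST7 TO TEST9
# expands to TEST7 TEST8 TEST9 in the working data file.
VARIABLES = ["TEST1", "TEST3", "TEST2", "TEST4", "TEST5", "TEST6",
             "TEST7", "TEST8", "TEST9", "STATUS01", "STATUS02"]


def expand(tokens, order):
    """Hypothetical helper: replace each 'A TO B' triple by the run of
    names from A through B as they appear in `order`."""
    result = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] == "TO":
            start = order.index(tokens[i])
            stop = order.index(tokens[i + 2])
            result.extend(order[start:stop + 1])
            i += 3
        else:
            result.append(tokens[i])
            i += 1
    return result


# ANALYSIS: the dependent variable, then the predictors after WITH
# (scaling-level specifications are left out of this sketch).
dependent = ["TEST4"]
predictors = expand(
    "TEST1 TO TEST2 TEST5 TEST7 TEST8 STATUS01 STATUS02".split(), VARIABLES)
analysis = dependent + predictors

# PLOT = TRANS(TEST2 TO TEST7 TEST4). Assumption: only analysis variables
# are plotted, each once, in order of first appearance.
requested = expand("TEST2 TO TEST7 TEST4".split(), VARIABLES)
plotted = []
for name in requested:
    if name in analysis and name not in plotted:
        plotted.append(name)

# SAVE: one transformed variable per analysis variable.
saved_names = [f"TRANS{k}_1" for k in range(1, len(analysis) + 1)]

print(predictors)
print(plotted)
print(saved_names[0], saved_names[-1])
```

Running it lists TEST3 among the predictors, because it lies between TEST1 and TEST2 on VARIABLES, and yields nine analysis variables, matching the saved names TRANS1_1 through TRANS9_1.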
Example: Multiple Systematic Starts
CATREG
...
/INITIAL MULTISTART(ALL)('c:\data\startsigns.sav')
/PRINT = R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2 STATUS01 STATUS02)
/PLOT = TRANS (TEST2 TO TEST7 TEST4)
/SAVE TRDATA PRED RES
/OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').
- Because the ordinal and spline ordinal scaling levels are specified for some variables, there is a chance of obtaining a suboptimal solution when applying the numerical or random initial solution. To ensure that the optimal solution is obtained, all multiple systematic starts are used. Using all systematic starts is feasible here because only 3 variables have (spline) ordinal scaling; the total number of starts is then 2 to the power 3, which is 8. With a larger number of variables with (spline) ordinal scaling level, a reduced number of starts is recommended, which can be requested by specifying /INITIAL MULTISTART(value).
- The specifications at the PRINT, PLOT, SAVE, and OUTFILE subcommands will be applied to the optimal solution.
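The number of systematic starts grows as 2 to the power k, where k is the number of variables with (spline) ordinal scaling. The Python sketch below assumes, hypothetically, that each start corresponds to one pattern of + and - signs for the regression coefficients of those variables (the fixed-signs example refers to such stored signs). The helper `systematic_starts`, the variable names, and the numbering of starts are illustrative only and need not match how CATREG stores starts in startsigns.sav.

```python
from itertools import product


def systematic_starts(ordinal_vars):
    """Hypothetical model of multiple systematic starts: one start per
    combination of +1/-1 signs for the listed (spline) ordinal variables."""
    return [dict(zip(ordinal_vars, signs))
            for signs in product((1, -1), repeat=len(ordinal_vars))]


# Three (spline) ordinal variables, as in the example; names are hypothetical.
starts = systematic_starts(["X1", "X2", "X3"])
print(len(starts))  # 2 ** 3 = 8 starts
for number, signs in enumerate(starts, start=1):
    print(number, signs)

# Why a reduced number of starts is recommended for larger k.
for k in (3, 6, 10, 20):
    print(k, "ordinal variables ->", 2 ** k, "starts")
```

The last loop shows the doubling per added variable: 20 such variables would already require over a million starts, which is why MULTISTART(value) is suggested in that case.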
Example: Fixing Initial Signs for Regression Coefficients
CATREG
...
/INITIAL FIXSIGNS (63) ('c:\data\startsigns.sav')
- The INITIAL subcommand specifies using a specific set of fixed signs for the regression coefficients. The signs are in the file startsigns.sav in the directory c:\data. This file was created by a previous run of CATREG with keyword MULTISTART at the INITIAL subcommand (see previous example). The signs of start number 63 are specified to be used.
Example: Elastic Net Regularization
CATREG
...
/REGULARIZATION ENET (.5 2.5 .25) (.01 3.8 .05)('c:\data\regu_enet.sav')
/PRINT = REGU R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2)
/PLOT = REGU (.75 1.5) TRANS (TEST2 TO TEST7 TEST4)
/SAVE TRDATA PRED RES
/OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').
- REGULARIZATION specifies application of Elastic Net regularization, with start value of the Lasso penalty 0.01, stop value 3.8, and increment 0.05, resulting in 76 regularized models, with Lasso penalty values 0.01, 0.06, ..., 3.76. To each of these 76 Lasso models, 9 Ridge penalties are applied (0.5, 0.75, ..., 2.5), resulting in 76 × 9 = 684 Elastic Net models.
- PRINT specifies displaying a table with the penalty values, R-squared, and the regression coefficients for each regularized model. The contents of this table are written to a data file called regu_enet.sav in the directory c:\data, as specified at the REGULARIZATION subcommand.
- The PLOT subcommand requests two Elastic Net plots: a Lasso plot with a fixed Ridge penalty of 0.75 and a Lasso plot with a fixed Ridge penalty of 1.50. Any keywords other than REGU at the PLOT subcommand are ignored.
- Specifications other than REGU at the PRINT and PLOT subcommands, the SAVE subcommand, and the TRDATA keyword at the OUTFILE subcommand are ignored.
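The model counts follow from reading each (start stop increment) triple as an inclusive arithmetic sequence that ends at the last value not exceeding the stop value, which matches the penalty values listed above. The Python sketch below uses a hypothetical helper, `penalty_grid`, not CATREG code, to enumerate both grids. It uses `decimal.Decimal` so that floating-point rounding cannot change the counts, and it treats an increment of zero as a single value, as in a REGULARIZATION specification whose stop value equals its start value.

```python
from decimal import Decimal


def penalty_grid(start, stop, increment):
    """Hypothetical helper: start, start + increment, ... up to stop.

    An increment of zero (start equal to stop) yields a single value.
    Decimal arithmetic keeps the counts exact.
    """
    start, stop, increment = Decimal(start), Decimal(stop), Decimal(increment)
    if increment == 0:
        return [start]
    n = int((stop - start) / increment) + 1
    return [start + k * increment for k in range(n)]


ridge = penalty_grid("0.5", "2.5", "0.25")   # ENET (.5 2.5 .25)
lasso = penalty_grid("0.01", "3.8", "0.05")  # (.01 3.8 .05)

print(len(lasso), lasso[0], lasso[1], lasso[-1])
print(len(ridge), ridge[0], ridge[-1])
print("Elastic Net models:", len(ridge) * len(lasso))

# A single model: ENET (1.25 1.25 0)(.46 .46 0).
print(penalty_grid("1.25", "1.25", "0"), penalty_grid(".46", ".46", "0"))
```

The Lasso grid has 76 values ending at 3.76, and the Ridge grid has 9 values from 0.5 to 2.5, so the product counts the Elastic Net models fitted.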
Example: Elastic Net Regularization with Crossvalidation Resampling
CATREG
...
/REGULARIZATION ENET (.5 2.5 .25)(.01 3.8 .05)('c:\data\regu_enet.sav')
/RESAMPLE CROSSVAL (5)
/PRINT = REGU R COEFF DESCRIP ANOVA
/PLOT = REGU (.75 1.5) TRANS (TEST2 TO TEST7 TEST4)
/SAVE TRDATA PRED RES
/OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').
- REGULARIZATION is the same as in the previous example.
- The RESAMPLE subcommand specifies 5-fold cross-validation to estimate the prediction error for each of the Elastic Net models defined at the REGULARIZATION subcommand.
- PRINT specifies displaying a table with the penalty values, R-squared, the regression coefficients, and the estimated prediction error for each regularized model. The contents of this table are written to a data file called regu_enet.sav in the directory c:\data.
- The specification at the PLOT subcommand results in the same plots as in the previous example.
- The other specifications at the PRINT and PLOT subcommands, and the SAVE and OUTFILE specifications, will be applied to the model with the lowest prediction error.
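To make the selection rule concrete, the Python sketch below runs 5-fold cross-validation over a small grid of penalties and keeps the one with the lowest estimated prediction error. It deliberately swaps in a much simpler model, a one-predictor ridge regression without intercept, instead of CATREG's optimally scaled Elastic Net. The helper names, the contiguous fold assignment, and the toy data are all assumptions, since this section does not describe how CATREG forms its folds.

```python
def ridge_slope(xs, ys, penalty):
    """Ridge estimate for y ~ b*x (no intercept): sum(xy) / (sum(x^2) + penalty)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_error(xs, ys, penalty, k=5):
    """Mean squared prediction error on held-out cases over k folds."""
    total = 0.0
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        b = ridge_slope(train_x, train_y, penalty)
        total += sum((ys[i] - b * xs[i]) ** 2 for i in fold)
    return total / len(xs)


xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x for x in xs]          # noise-free toy data, y = 2x
penalties = [0.0, 0.5, 1.0, 2.0]

errors = {p: cv_error(xs, ys, p) for p in penalties}
best = min(errors, key=errors.get)  # model with the lowest prediction error
print(errors)
print("lowest prediction error at penalty", best)
```

Because the toy data satisfy y = 2x exactly, the unpenalized fit reproduces every held-out value and the zero penalty is selected; with noisy data a positive penalty can give the lower cross-validated error.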
Example: Obtaining a Specific Elastic Net Model
CATREG
...
/REGULARIZATION ENET (1.25 1.25 0)(.46 .46 0)
/PRINT = R COEFF DESCRIP ANOVA QUANT(TEST1 TO TEST2 STATUS01 STATUS02)
/PLOT = TRANS (TEST2 TO TEST7 TEST4)
/SAVE TRDATA PRED RES
/OUTFILE = TRDATA('c:\data\qdata.sav') DISCRDATA('c:\data\discr.sav').
- REGULARIZATION is specified here (stop value equal to start value, increment zero) to obtain output for a specific Elastic Net model: the model with penalty values 1.25 (Ridge) and .46 (Lasso).